BLOOM: Revolutionizing the World of Multilingual Language Models
Large language models (LLMs) have had a major impact on AI research. These powerful, general-purpose models can take on a wide range of new language tasks from a user's instructions.
However, academia, nonprofits, and smaller companies' research labs have found it difficult to create, study, or even use LLMs, as only a few industrial labs with the necessary resources and exclusive rights have had full access to them.
Enter BLOOM, the world's largest open multilingual language model. Trained in complete transparency, it is the result of the largest collaboration of AI researchers ever involved in a single research project.
With its 176 billion parameters, BLOOM can generate text in 46 natural languages and 13 programming languages. For many of them, such as Spanish, French, and Arabic, it is the first language model with over 100B parameters ever created.
This is the culmination of work by over 1,000 researchers from 70+ countries and 250+ institutions. BLOOM was trained over 117 days (March 11 - July 6) on the Jean Zay supercomputer in the south of Paris, France, thanks to a compute grant worth approximately €3M from French research agencies CNRS and GENCI.
Researchers can now download, run, and study BLOOM to investigate the performance and behavior of recently developed large language models down to their deepest internal operations.
Moreover, any individual or institution that agrees to the terms of the model's Responsible AI License (developed during the BigScience project itself) can use and build upon the model on a local machine or via a cloud provider. Thanks to its integration with the Hugging Face ecosystem, it's as simple as importing it with transformers and running it with accelerate.
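As a rough illustration (assuming transformers and accelerate are installed, and that the chosen checkpoint fits in the available memory), loading and sampling from BLOOM can look like the sketch below; the smaller "bigscience/bloom-560m" checkpoint can stand in for the full 176B model on modest hardware:

```python
# Minimal sketch: load BLOOM with transformers and let accelerate place the weights.
# Assumes `pip install transformers accelerate` and enough memory for the chosen checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom"  # swap in "bigscience/bloom-560m" for a lightweight test

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",  # accelerate spreads the layers across the available GPUs/CPU
)

inputs = tokenizer(
    "BLOOM can generate text in 46 natural languages and", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```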
In a spirit of collaboration and continuous improvement, the intermediate checkpoints and optimizer states from training are also being released for the first time. For those without access to 8 A100s, an inference API is being finalized for large-scale use without dedicated hardware or engineering. In the meantime, an early version is available on the HF Hub for quick tests, prototyping, and lower-scale use.
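For quick tests against that hosted version, a call can be as small as the sketch below. This assumes a Hugging Face access token stored in an HF_TOKEN environment variable and the standard api-inference endpoint; the exact payload options may differ in the finalized large-scale API:

```python
# Minimal sketch: query the hosted BLOOM endpoint on the HF Hub.
# Assumes an access token in HF_TOKEN; the finalized API may expose more options.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "BLOOM est un modèle de langue multilingue qui"},
)
print(response.json())  # typically a list like [{"generated_text": "..."}]
```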
BLOOM's capabilities will continue to expand. Work is already underway to make it as instructable as T0++ and to add more languages, and the model will be compressed into a more usable version with the same level of performance. It will also serve as a starting point for more complex architectures, letting researchers and practitioners finally run the experiments they've always wanted to, starting from the power of a 100+ billion parameter model.
In essence, BLOOM is not just a one-time wonder but the seed of a growing family of models, and the community's efforts to expand it will be fully supported.