TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. - jzhang38/TinyLlama
We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint. You can find the evaluation results of TinyLlama in EVAL.md. We will be rolling out intermediate checkpoints following the below schedule. We are crafting a note offering possible explaination on why there is a significant improvement from 2T to 2.5T checkpoint (It is related to bos_id issue) Note that the learning rate of the base model has not cooled down yet so we recommend you to also use the finetuned chat model. Meanwhile, you can track the live cross entropy loss here. Tiny but strong language models are useful for many applications. Here are some potential usecases: Below are some details of our training setup: Our codebase supports the following features: The fact that TinyLlama is a relatively small model with grouped query attention means it is also fast during inference. Below are some throughputs that we measure: Please refer to PRETRAIN.md for instructions on how to pretrain TinyLlama. This project is still under active development. We are a really small team. Community feedback and contributions are highly appreciated. Here are some things we plan to work on: If you find our work valuable, please cite: Above is the training loss curve taken from the Llama 2 paper. Here I quote from that paper: "We observe that after pretraining on 2T Tokens, the models still did not show any sign of saturation". That is why we believe pretraining a 1.1B model for 3T tokens is a reasonable thing to do. Even if the loss curve does not go down eventually, we can still study the phenomenon of saturation and learn something from it. The figure from the Pythia paper displays the LAMBADA accuracy plotted against the total training tokens (300B). The term "saturation" pertains specifically to the 70M and 160M models. Notably, even the 410M model does not saturate with 300B tokens, as it continues to show an increasing trend, similar to the trend of larger models. The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. There was an error while loading. Please reload this page. There was an error while loading. Please reload this page. There was an error while loading. Please reload this page. There was an error while loading. Please reload this page.
Codestral
Empowering developers and democratising coding with Mistral AI.
Based on the provided content, there is insufficient meaningful user feedback to summarize opinions about Codestral. The social mentions consist mainly of unrelated spam posts, generic links, and simple YouTube video titles that just say "Codestral AI" without any actual reviews or user commentary. The only potentially relevant mention is a GitHub pricing update for vertex-ai that adds 70 new models, but this doesn't provide specific user sentiment about Codestral itself. More substantial user reviews and genuine social media discussions would be needed to provide an accurate assessment of user opinions about this software tool.
TinyLlama
Codestral
TinyLlama
Codestral
TinyLlama (3)
Codestral (3)
Only in TinyLlama (10)
Only in Codestral (8)
TinyLlama
No data yet
Codestral
TinyLlama
Codestral