TinyLlama vs StarCoder — Features, Pricing & Reviews Compared

TinyLlama

open-source-model

StarCoder

open-source-model

Overview

What each tool does and who it's for

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens (GitHub repository: jzhang38/TinyLlama).

We adopted exactly the same architecture and tokenizer as Llama 2, so TinyLlama can be plugged into many open-source projects built upon Llama. TinyLlama is also compact, with only 1.1B parameters, which suits applications that demand a restricted computation and memory footprint. Evaluation results are in EVAL.md.

We are rolling out intermediate checkpoints according to a release schedule, and we are preparing a note with a possible explanation of why there is a significant improvement from the 2T to the 2.5T checkpoint (it is related to a bos_id issue). Note that the learning rate of the base model has not cooled down yet, so we recommend also using the finetuned chat model. Meanwhile, the live cross-entropy loss can be tracked online.

Tiny but strong language models are useful for many applications. The repository lists potential use cases, details of the training setup, and the features the codebase supports. Because TinyLlama is a relatively small model with grouped query attention, it is also fast during inference; measured throughputs are reported in the repository. Please refer to PRETRAIN.md for instructions on how to pretrain TinyLlama.

This project is still under active development. We are a really small team, and community feedback and contributions are highly appreciated. Planned work and citation details are listed in the repository.

The training loss curve in the Llama 2 paper motivates the token budget. That paper states: "We observe that after pretraining on 2T Tokens, the models still did not show any sign of saturation". That is why we believe pretraining a 1.1B model for 3T tokens is a reasonable thing to do. Even if the loss eventually stops going down, we can still study the phenomenon of saturation and learn something from it.
The figure from the Pythia paper displays the LAMBADA accuracy plotted against the total training tokens (300B). The term "saturation" pertains specifically to the 70M and 160M models. Notably, even the 410M model does not saturate with 300B tokens, as it continues to show an increasing trend, similar to the trend of larger models.
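The "restricted computation and memory footprint" claim can be made concrete with some rough arithmetic. The sketch below uses the two figures stated above (1.1B parameters, 3T training tokens). The layer count, head counts, and head dimension are assumptions that are not stated on this page, so the KV-cache numbers are illustrative only and should be checked against the released model configuration.

```python
# Back-of-the-envelope sizing for a 1.1B-parameter model trained on 3T tokens.
PARAMS = 1.1e9        # from the text: 1.1B parameters
TRAIN_TOKENS = 3e12   # from the text: 3 trillion tokens


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (10**9 bytes); ignores activations."""
    return params * bytes_per_param / 1e9


def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of keys plus values cached per token across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


fp16_gb = weight_memory_gb(PARAMS, 2)      # 16-bit weights
int4_gb = weight_memory_gb(PARAMS, 0.5)    # 4-bit quantized weights
tokens_per_param = TRAIN_TOKENS / PARAMS   # training tokens seen per parameter

# ASSUMED configuration (not given in the text above): 22 layers,
# 32 query heads sharing 4 key/value heads, head dimension 64.
LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM = 22, 32, 4, 64
gqa_bytes = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
mha_bytes = kv_cache_bytes_per_token(LAYERS, QUERY_HEADS, HEAD_DIM)

if __name__ == "__main__":
    print(f"fp16 weights: {fp16_gb:.2f} GB, 4-bit weights: {int4_gb:.2f} GB")
    print(f"training tokens per parameter: {tokens_per_param:.0f}")
    print(f"KV cache per token: {gqa_bytes} bytes with grouped query attention, "
          f"{mha_bytes} bytes if every query head had its own K/V head")
```

Under these assumptions, grouped query attention shrinks the per-token KV cache by the ratio of query heads to key/value heads, which is one reason a small model with this attention layout can be fast at inference time.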

StarCoder


The model was trained on GitHub code as well as additional selected data sources such as arXiv and Wikipedia. As such, it is not an instruction model, and commands like "Write a function that computes the square root." do not work well. A fine-tuning script is available in StarCoder2's GitHub repository. To run the model, first install transformers from source.

The pretraining dataset was filtered to keep only permissively licensed code and code with no license. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. A search index lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.

The model has been trained on source code from 600+ programming languages. The predominant natural language in the source is English, although other languages are also present. As such, the model is capable of generating code snippets given some context, but the generated code is not guaranteed to work as intended. It can be inefficient or contain bugs or exploits. See the paper for an in-depth discussion of the model limitations.

The model is licensed under the BigCode OpenRAIL-M v1 license agreement.
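Because the model completes code rather than following instructions, a request is usually better expressed as code context for the model to continue. The following is a minimal sketch of that idea. The helper `completion_prompt` is hypothetical, and the checkpoint name `bigcode/starcoder2-3b` is an assumption, since the text above does not say which checkpoint it describes. The `generate` function downloads model weights and needs `transformers` and `torch` installed, so it is defined here but not called.

```python
def completion_prompt(signature: str, docstring: str) -> str:
    """Phrase a task as a function header plus docstring for a base model to continue.

    Hypothetical helper for illustration; not part of any library.
    """
    return f'{signature}\n    """{docstring}"""\n'


def generate(prompt: str,
             checkpoint: str = "bigcode/starcoder2-3b",  # assumed checkpoint name
             max_new_tokens: int = 64) -> str:
    """Continue `prompt` with a causal language model loaded through transformers."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Instead of the instruction "Write a function that computes the square root.",
# give the model the beginning of that function:
prompt = completion_prompt(
    "def square_root(x: float) -> float:",
    "Return the square root of x.",
)
# completion = generate(prompt)  # uncomment to run; downloads the checkpoint
```

Any code the model returns should be reviewed and tested before use, and checked against the pretraining data search index where attribution may be required.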

Key Metrics

Metric               TinyLlama   StarCoder
Avg Rating           —           —
Mentions (30d)       —           —
GitHub Stars         8,930       2,050
GitHub Forks         605         192
npm Downloads/wk     —           —
PyPI Downloads/mo    —           —

Community Sentiment

How developers feel about each tool based on mentions and reviews

TinyLlama

0% positive, 100% neutral, 0% negative

StarCoder

0% positive, 100% neutral, 0% negative

Pricing

TinyLlama

tiered

StarCoder

tiered

Use Cases

When to use each tool

TinyLlama (3)

Enabling real-time dialogue generation in video games.

Reference for enthusiasts keen on pretraining language models under 5 billion parameters.

Training Details

Features

Only in TinyLlama (10)

2023-09-28: Add a discord server.

Enabling real-time dialogue generation in video games.

Multi-GPU and multi-node distributed training with FSDP.

Flash attention 2.

Fused layernorm.

Fused swiglu.

Fused cross entropy loss.

Fused rotary positional embedding.

Evaluation

Releases Schedule

Only in StarCoder (6)

bigcode/the-stack-v2-train-full-ids

💫 StarCoder2

StarCoder 2 and The Stack v2: The Next Generation

Efficient Training of Language Models to Fill in the Middle

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Longformer: The Long-Document Transformer

Developer Ecosystem

GitHub Repos

600

GitHub Followers

1,804

—

npm Packages

—

HuggingFace Models

—

SO Reputation

—


Company Intel

             TinyLlama                           StarCoder
Industry     information technology & services   information technology & services
Employees    6,000                               690
Funding      $7.9B                               $395.7M
Stage        Other                               Series D

Supported Languages & Categories

TinyLlama

AI/ML, FinTech, DevOps, Security, Developer Tools

StarCoder

AI/ML
