Description from the site:

Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.

Mistral 7B in short

Mistral 7B is a 7.3B parameter model that:

    Outperforms Llama 2 13B on all benchmarks
    Outperforms Llama 1 34B on many benchmarks
    Approaches CodeLlama 7B performance on code, while remaining good at English tasks
    Uses Grouped-query attention (GQA) for faster inference
    Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost

We’re releasing Mistral 7B under the Apache 2.0 license; it can be used without restrictions.

    Download it and use it anywhere (including locally) with our reference implementation
    Deploy it on any cloud (AWS/GCP/Azure), using vLLM inference server and skypilot
    Use it on HuggingFace

Mistral 7B is easy to fine-tune on any task. As a demonstration, we’re providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.

  • TheChurn@kbin.social
    1 year ago

    how much VRAM you need to run this model

    It depends on how the parameters are represented. Most models support bfloat16, where each parameter is 16 bits (2 bytes). For these models, every billion parameters needs roughly 2 GB of VRAM, so the 7.3B-parameter Mistral 7B needs about 14.6 GB for the weights alone.

    It is possible to reduce the memory footprint by using 8 bits per parameter, and some models support this, but output quality starts to degrade noticeably.
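
    The rule of thumb above can be sketched as a quick calculation. This is only a rough estimate under stated assumptions: `weight_memory_gb` is a hypothetical helper name, 1 GB is taken as 10^9 bytes, and only the weights are counted. Activations, the KV cache, and framework overhead come on top, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Hypothetical helper for illustration; it ignores activations,
    the KV cache, and any framework overhead.
    """
    bytes_per_param = bits_per_param / 8
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # Mistral 7B has 7.3B parameters according to the announcement.
    for bits in (16, 8):
        gb = weight_memory_gb(7.3, bits)
        print(f"Mistral 7B at {bits} bits per parameter: ~{gb:.1f} GB for weights")
```

    So a 16 GB card is about the minimum for the bfloat16 weights, with little room left for anything else, while 8-bit weights would fit on an 8 GB card only barely.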