👋 Welcome to MLC LLM

Discord | GitHub

Machine Learning Compilation for Large Language Models (MLC LLM) is a high-performance, universal deployment solution that lets you deploy any large language model natively, using native APIs and compiler acceleration. The mission of this project is to enable everyone to develop, optimize, and deploy AI models natively on everyone’s devices with ML compilation techniques.

Getting Started

To get started, try MLC LLM with the int4-quantized Llama2-7B model. At least 6GB of free VRAM is recommended to run it.
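As a rough sanity check on the memory recommendation, the weight storage of a quantized model can be estimated from its parameter count and bits per weight. The sketch below assumes a nominal 7 billion parameters and ignores quantization scales, the KV cache, and activations, all of which add to the real footprint; `weight_memory_gb` is a hypothetical helper, not part of MLC LLM.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: every weight uses exactly `bits_per_weight` bits;
    per-group scales and runtime buffers are not counted.
    """
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 7e9  # assumed nominal size of a "7B" model

int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)
fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)

print(f"int4 weights: ~{int4_gb:.1f} GB")
print(f"fp16 weights: ~{fp16_gb:.1f} GB")
```

Under these assumptions the int4 weights alone take a bit over half of the recommended 6GB, which leaves headroom for the runtime buffers the estimate leaves out.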

Install MLC Chat Python. MLC LLM is available via pip. It is always recommended to install it in an isolated conda virtual environment.
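After installing, it can be useful to confirm that the package is visible from the active environment before going further. The sketch below uses only the standard library; `is_installed` is a hypothetical helper, and the module name `mlc_chat` is taken from the import statement used later on this page.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("mlc_chat"):
    print("mlc_chat is available in this environment.")
else:
    print("mlc_chat was not found; check that the right conda environment is active.")
```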

Download pre-quantized weights. The commands below download the int4-quantized Llama2-7B from HuggingFace:

git lfs install && mkdir -p dist/prebuilt
git clone https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4f16_1 \
                          dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1

Download pre-compiled model library. The pre-compiled model library can be cloned as follows:

git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib
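Once both clones finish, a quick check that the expected directories exist can catch a failed or misplaced download early. This is a minimal sketch that assumes the commands above were run from the current working directory; `missing_prebuilt_dirs` and `EXPECTED_DIRS` are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import List

# Target directories taken from the clone commands above.
EXPECTED_DIRS = [
    "dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1",
    "dist/prebuilt/lib",
]


def missing_prebuilt_dirs(root: str = ".") -> List[str]:
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_prebuilt_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All prebuilt directories are in place.")
```

Note that this only checks that the directories exist. It does not verify that Git LFS actually fetched the weight files rather than pointer stubs.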

Run in Python. The following Python script showcases the Python API of MLC LLM and its streaming capability:

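from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

# Load the int4-quantized Llama2-7B chat model.
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")
# Generate a response, streaming the output to stdout as it is produced.
cm.generate(prompt="What is the meaning of life?", progress_callback=StreamToStdout(callback_interval=2))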
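To illustrate what interval-based streaming means in general, the toy sketch below buffers tokens and hands them to a callback every few steps. It is not MLC LLM's implementation: `stream_with_interval`, `on_delta`, and the token list are hypothetical, and reading `callback_interval` as "generation steps between callbacks" is an assumption made here for illustration.

```python
from typing import Callable, Iterable, List


def stream_with_interval(
    tokens: Iterable[str],
    on_delta: Callable[[str], None],
    interval: int = 2,
) -> str:
    """Toy stand-in for streaming: flush buffered text every `interval` tokens."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    buffer: List[str] = []
    full: List[str] = []
    for tok in tokens:
        buffer.append(tok)
        full.append(tok)
        if len(buffer) >= interval:
            on_delta("".join(buffer))  # deliver the newly generated text
            buffer.clear()
    if buffer:
        on_delta("".join(buffer))  # flush whatever remains at the end
    return "".join(full)


chunks: List[str] = []
text = stream_with_interval(["Hel", "lo", ", ", "wor", "ld"], chunks.append, interval=2)
print(chunks)
print(text)
```

A larger interval means fewer callback invocations at the cost of choppier output, which is the usual trade-off when streaming generated text.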

Colab walkthrough. A Jupyter notebook on Colab is provided with a detailed walkthrough of the Python API.

Documentation and tutorial. The Python API reference and its tutorials are available online.

Figure: MLC LLM Python API (https://raw.githubusercontent.com/mlc-ai/web-data/main/images/mlc-llm/tutorials/python-api.jpg)