Source: 新智元; compiled and edited by 新智元
Reported by 新智元
[新智元 digest] Meta's latest large language model, LLaMA, can now run on Macs with Apple silicon!
Running LLaMA on an M1/M2 Mac
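-p 'The first person to set foot on the moon was'
The first person to set foot on the moon was 38-year-old astronaut Neil A. Armstrong. On July 20, 1969, Apollo 11 landed on the moon. Apollo 11 was the first crewed mission to land on the moon and the climax of the space race. In July 1969, Neil Armstrong and Edwin "Buzz" Aldrin became the first humans to land on the moon. Apollo 11 launched from Kennedy Space Center on July 16.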
-p 'def open_and_return_content(filename):'
def open_and_return_content(filename):
    """ Opens file (returning the content) and performs basic sanity checks """
    if os.path.isfile(filename):
        with open(filename) as f:
            content = f.read()
        return content
    else:
        print('WARNING: file "{}" does not exist'.format(filename), file=sys.stderr)
        return ''

def get_file_info(filename, fullpath):
    """ Get file information (i.e., permission, owner, group, size) """
Next, let's look at how this is actually done.
Step 1: Download the model
Step 2: Install dependencies
xcode-select --install
brew install pkgconfig cmake
/opt/homebrew/bin/python3.11 -m venv venv
. venv/bin/activate.fish
pip3 install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu
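If you are interested in using the new Metal Performance Shaders (MPS) backend for GPU-accelerated training, you can verify that it works by running the following. It is not required for running LLaMA on an M1, however.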
python
Python 3.11.2 (main, Feb 16 2023, 02:55:59) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch; torch.backends.mps.is_available()
True
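The same check can be run from a script instead of the interactive prompt. The following is a minimal sketch: the helper name `mps_status` is hypothetical, and it degrades to a plain message when `torch` is missing or was built without MPS support.

```python
def mps_status() -> str:
    """Describe whether the PyTorch MPS backend can be used on this machine."""
    try:
        import torch  # optional dependency; llama.cpp itself does not need it at run time
    except ImportError:
        return "torch is not installed"

    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        # Older PyTorch releases do not ship the MPS backend module at all.
        return "this torch build has no MPS backend"
    if mps.is_available():
        return "MPS is available"
    if not mps.is_built():
        return "torch was built without MPS support"
    return "MPS is built but not available on this machine"


print(mps_status())
```

On a Mac set up as in step 2, this would be expected to report that MPS is available; on other machines it simply reports why it is not.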
Step 3: Build llama.cpp
git clone git@github.com:ggerganov/llama.cpp.git
cd llama.cpp
make
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -c utils.cpp -o utils.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread main.cpp ggml.o utils.o -o main -framework Accelerate
./main -h
usage: ./main [options]
options:
-h, --help show this help message and exit
-s SEED, --seed SEED RNG seed (default: -1)
-t N, --threads N number of threads to use during computation (default: 4)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: random)
-n N, --n_predict N number of tokens to predict (default: 128)
--top_k N top-k sampling (default: 40)
--top_p N top-p sampling (default: 0.9)
--temp N temperature (default: 0.8)
-b N, --batch_size N batch size for prompt processing (default: 8)
-m FNAME, --model FNAME
model path (default: models/llama-7B/ggml-model.bin)
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize -framework Accelerate
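The options printed by ./main -h can also be assembled from a script. The sketch below uses only the flags and defaults shown in that help text. The helper name `build_main_command` is hypothetical, and the model path and thread count in the usage lines are the ones this article uses when running the model.

```python
import shlex


def build_main_command(prompt, model="models/llama-7B/ggml-model.bin",
                       threads=4, n_predict=128,
                       top_k=40, top_p=0.9, temp=0.8, seed=-1):
    """Build an argument list for ./main using the flags from its help text."""
    return [
        "./main",
        "-m", model,
        "-t", str(threads),
        "-n", str(n_predict),
        "--top_k", str(top_k),
        "--top_p", str(top_p),
        "--temp", str(temp),
        "-s", str(seed),
        "-p", prompt,
    ]


command = build_main_command(
    "The first president of the USA was ",
    model="./models/7B/ggml-model-q4_0.bin",
    threads=8,
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The list can be passed to subprocess.run(command, check=True) from inside the llama.cpp directory once the model has been converted and quantized.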
Step 4: Convert the model
python convert-pth-to-ggml.py models/7B 1
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': 32000}
n_parts = 1
Processing part 0
Processing variable: tok_embeddings.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: output.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.0.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
...
Done. Output file: models/7B/ggml-model-f16.bin, (part 0 )
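The shapes in the conversion log are enough to estimate how many parameters the 7B model has. The sketch below assumes that all 32 layers have the same tensor shapes as layer 0, and that each layer also has an ffn_norm.weight tensor of shape [4096]. That tensor is elided from the log above but appears in the quantization log later in this article.

```python
from math import prod

# Values taken from the params dict and shapes printed by the conversion script.
dim, n_layers, vocab_size, ffn_dim = 4096, 32, 32000, 11008

global_shapes = {
    "tok_embeddings.weight": (vocab_size, dim),
    "norm.weight": (dim,),
    "output.weight": (vocab_size, dim),
}

# Assumption: every layer repeats the shapes logged for layer 0.
per_layer_shapes = {
    "attention.wq.weight": (dim, dim),
    "attention.wk.weight": (dim, dim),
    "attention.wv.weight": (dim, dim),
    "attention.wo.weight": (dim, dim),
    "feed_forward.w1.weight": (ffn_dim, dim),
    "feed_forward.w2.weight": (dim, ffn_dim),
    "feed_forward.w3.weight": (ffn_dim, dim),
    "attention_norm.weight": (dim,),
    "ffn_norm.weight": (dim,),  # assumption: not shown in the truncated log
}

global_params = sum(prod(shape) for shape in global_shapes.values())
layer_params = sum(prod(shape) for shape in per_layer_shapes.values())

total_params = global_params + n_layers * layer_params
n_tensors = len(global_shapes) + n_layers * len(per_layer_shapes)

print(f"parameters: {total_params:,}")
print(f"tensors:    {n_tensors}")
```

Under these assumptions the count comes to roughly 6.7 billion parameters in 291 tensors, which agrees with the `num tensors = 291` line printed when the model is loaded in step 5.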
The next step is to quantize the model:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
The output looks like this:
llama_model_quantize: loading model from './models/7B/ggml-model-f16.bin'
llama_model_quantize: n_vocab = 32000
llama_model_quantize: n_ctx = 512
llama_model_quantize: n_embd = 4096
llama_model_quantize: n_mult = 256
llama_model_quantize: n_head = 32
llama_model_quantize: n_layer = 32
llama_model_quantize: f16 = 1
...
layers.31.attention_norm.weight - [ 4096, 1], type = f32 size = 0.016 MB
layers.31.ffn_norm.weight - [ 4096, 1], type = f32 size = 0.016 MB
llama_model_quantize: model size = 25705.02 MB
llama_model_quantize: quant size = 4017.27 MB
llama_model_quantize: hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022
main: quantize time = 29389.45 ms
main: total time = 29389.45 ms
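The two sizes in the quantization log give a quick sense of how much the model shrinks. The sketch below computes the compression ratio and an effective bits-per-weight figure. The latter assumes that the reported "model size" counts every weight at 32 bits, which is an assumption about how the tool reports size, not something the log states. It also checks that the 16 histogram bins, one for each possible 4-bit value, sum to about 1.

```python
# Figures copied from the quantization log.
model_size_mb = 25705.02
quant_size_mb = 4017.27
hist = [0.000, 0.022, 0.019, 0.033, 0.053, 0.078, 0.104, 0.125,
        0.134, 0.125, 0.104, 0.078, 0.053, 0.033, 0.019, 0.022]

compression_ratio = model_size_mb / quant_size_mb

# Assumption: the unquantized size is measured at 32 bits per weight.
effective_bits_per_weight = 32 / compression_ratio

print(f"compression ratio:      {compression_ratio:.2f}x")
print(f"effective bits/weight:  {effective_bits_per_weight:.2f}")
print(f"histogram bins:         {len(hist)}, sum = {sum(hist):.3f}")
```

Under that assumption the quantized model uses about 5 bits per weight rather than exactly 4, which would be consistent with each group of 4-bit values carrying some extra scaling data.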
Step 5: Run the model
./main -m ./models/7B/ggml-model-q4_0.bin \
-t 8 \
-n 128 \
-p 'The first president of the USA was '
main: seed = 1678615879
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
main: prompt: 'The first president of the USA was '
main: number of tokens in prompt = 9
     1 -> ''
  1576 -> 'The'
   937 -> ' first'
  6673 -> ' president'
   310 -> ' of'
   278 -> ' the'
  8278 -> ' USA'
   471 -> ' was'
 29871 -> ' '
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000
The first president of the USA was 57 years old when he assumed office (George Washington). Nowadays, the US electorate expects the new president to be more young at heart. President Donald Trump was 70 years old when he was inaugurated. In contrast to his predecessors, he is physically fit, healthy and active. And his fitness has been a prominent theme of his presidency. During the presidential campaign, he famously said he
would be the “most active president ever” — a statement Trump has not yet achieved, but one that fits his approach to the office. His tweets demonstrate his physical activity.
main: mem per token = 14434244 bytes
main: load time = 1311.74 ms
main: sample time = 278.96 ms
main: predict time = 7375.89 ms / 54.23 ms per token
main: total time = 9216.61 ms
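The timing lines can be turned into a throughput figure with a little arithmetic. The sketch below uses only numbers from the log above.

```python
# Figures copied from the timing summary printed by ./main.
predict_time_ms = 7375.89
ms_per_token = 54.23
load_time_ms = 1311.74
total_time_ms = 9216.61

tokens_per_second = 1000.0 / ms_per_token
evaluated_tokens = predict_time_ms / ms_per_token
load_share = load_time_ms / total_time_ms

print(f"throughput:        {tokens_per_second:.2f} tokens/s")
print(f"tokens evaluated:  {evaluated_tokens:.0f}")
print(f"load time share:   {load_share:.1%}")
```

This works out to a little over 18 tokens per second and about 136 evaluated tokens. That is in the neighborhood of the 9 prompt tokens plus the 128 tokens requested with -n.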
Resource usage
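One memory figure can be cross-checked against the load log from step 5, which reports `memory_size = 512.00 MB, n_mem = 16384`. The sketch below assumes this is the key/value cache, holding one key vector and one value vector of n_embd 32-bit floats per layer and per context position. That layout is an assumption about llama.cpp at the time of writing, not something the log itself states.

```python
# Hyperparameters copied from the llama_model_load lines.
n_layer, n_ctx, n_embd = 32, 512, 4096

BYTES_PER_FLOAT32 = 4  # assumption: cache entries are stored as 32-bit floats

# One cache slot per layer and per context position.
n_mem = n_layer * n_ctx

# Keys and values each store n_embd floats per slot, hence the factor of 2.
kv_cache_bytes = 2 * n_mem * n_embd * BYTES_PER_FLOAT32
kv_cache_mb = kv_cache_bytes / 1024**2

print(f"n_mem = {n_mem}")
print(f"cache size = {kv_cache_mb:.2f} MB")
```

Both results match the logged values. Together with the 4017.27 MB of quantized weights, this accounts for most of the 4529.34 MB ggml context size reported at load time.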
No instruction fine-tuning