當前位置：首頁 > news >正文

學生做爰網(wǎng)站微信群推廣網(wǎng)站

news 2025/7/4 10:01:44

學生做爰網(wǎng)站,微信群推廣網(wǎng)站,海門市建設(shè)局網(wǎng)站,蘇州找工作這篇博客是一篇來自 Meta AI，關(guān)于指令微調(diào) Llama 2 的擴展說明。旨在聚焦構(gòu)建指令數(shù)據(jù)集，有了它，我們則可以使用自己的指令來微調(diào) Llama 2 基礎(chǔ)模型。目標是構(gòu)建一個能夠基于輸入內(nèi)容來生成指令的模型。這么做背后的邏輯是，模型如…

這篇博客是一篇來自 Meta AI，關(guān)于指令微調(diào) Llama 2 的擴展說明。旨在聚焦構(gòu)建指令數(shù)據(jù)集，有了它，我們則可以使用自己的指令來微調(diào) Llama 2 基礎(chǔ)模型。

目標是構(gòu)建一個能夠基于輸入內(nèi)容來生成指令的模型。這么做背后的邏輯是，模型如此就可以由其他人生成自己的指令數(shù)據(jù)集。這在當想開發(fā)私人個性化定制模型，如發(fā)送推特、寫郵件等，時很方便。這也意味著你可以通過你的郵件來生成一個指令數(shù)據(jù)集，然后用它來訓練一個模型來為你寫郵件。

好，那我們來開始吧？我們將進行:

定義應(yīng)用場景細節(jié)并創(chuàng)建指令的提示詞模板
構(gòu)建指令數(shù)據(jù)集
使用 trl 與 SFTTrainer 指令微調(diào) Llama 2
測試模型、進行推理

1. 定義應(yīng)用場景細節(jié)并創(chuàng)建指令的提示詞模板

在描述應(yīng)用場景前，我們要更好的理解一下究竟什么是指令。

指令是一段文本或提供給大語言模型，類似 Llama，GPT-4 或 Claude，使用的提示詞，用來指導它去生成回復(fù)。指令可以讓人們做到把控對話，約束模型輸出更自然、實用的輸出，并使這些結(jié)果能夠?qū)R用戶的目的。制作清晰的、整潔的指令則是生成高質(zhì)量對話的關(guān)鍵。

指令的例子如下表所示。

能力	示例指令
頭腦風暴	提供一系列新口味的冰淇淋的創(chuàng)意。
分類	根據(jù)劇情概要，將這些電影歸類為喜劇、戲劇或恐怖片。
確定性問答	用一個單詞回答“法國的首都是哪里？”
生成	用羅伯特·弗羅斯特的風格寫一首關(guān)于大自然和季節(jié)變化的詩。
信息提取	從這篇短文中提取主要人物的名字。
開放性問答	為什么樹葉在秋天會變色？用科學的理由解釋一下。
摘要	用 2-3 句話概括一下這篇關(guān)于可再生能源最新進展的文章。

如開頭所述，我們想要微調(diào)模型，以便根據(jù)輸入 (或輸出) 生成指令。我們希望將其用作創(chuàng)建合成數(shù)據(jù)集的方法，以賦予 LLM 和代理個性化能力。

把這個想法轉(zhuǎn)換成一個基礎(chǔ)的提示模板，按照 Alpaca 格式.

###?Instruction:
Use?the?Input?below?to?create?an?instruction,?which?could?have?been?used?to?generate?the?input?using?an?LLM.?###?Input:
Dear?[boss?name],I'm?writing?to?request?next?week,?August?1st?through?August?4th,
off?as?paid?time?off.I?have?some?personal?matters?to?attend?to?that?week?that?require?
me?to?be?out?of?the?office.?I?wanted?to?give?you?as?much?advance?
notice?as?possible?so?you?can?plan?accordingly?while?I?am?away.Please?let?me?know?if?you?need?any?additional?information?from?me?
or?have?any?concerns?with?me?taking?next?week?off.?I?appreciate?you?
considering?this?request.Thank?you,?[Your?name]###?Response:
Write?an?email?to?my?boss?that?I?need?next?week?08/01?-?08/04?off.

2. 創(chuàng)建指令數(shù)據(jù)集

在定義了我們的應(yīng)用場景和提示模板后，我們需要創(chuàng)建自己的指令數(shù)據(jù)集。創(chuàng)建高質(zhì)量的指令數(shù)據(jù)集是獲得良好模型性能的關(guān)鍵。研究表明，“對齊，越少越好” 表明，創(chuàng)建高質(zhì)量、低數(shù)量 (大約 1000 個樣本) 的數(shù)據(jù)集可以達到與低質(zhì)量、高數(shù)量的數(shù)據(jù)集相同的性能。

創(chuàng)建指令數(shù)據(jù)集有幾種方法，包括:

使用現(xiàn)有數(shù)據(jù)集并將其轉(zhuǎn)換為指令數(shù)據(jù)集，例如 FLAN
使用現(xiàn)有的 LLM 創(chuàng)建合成指令數(shù)據(jù)集，例如 Alpaca
人力創(chuàng)建指令數(shù)據(jù)集，例如 Dolly。

每種方法都有其優(yōu)缺點，這取決于預(yù)算、時間和質(zhì)量要求。例如，使用現(xiàn)有數(shù)據(jù)集是最簡單的，但可能不適合您的特定用例，而使用人力可能是最準確的，但必然耗時、昂貴。也可以結(jié)合幾種不同方法來創(chuàng)建指令數(shù)據(jù)集，如 Orca: Progressive Learning from Complex Explanation Traces of GPT-4.。

為了簡單起見，我們將使用 **Dolly**，這是一個開源的指令跟蹤記錄數(shù)據(jù)集，由數(shù)千名 Databricks 員工在 InstructGPT paper 中描述的幾個行為類別中生成，包括頭腦風暴、分類、確定性回答、生成、信息提取、開放性回答和摘要。

開始編程吧，首先，我們來安裝依賴項。

!pip?install?"transformers==4.31.0"?"datasets==2.13.0"?"peft==0.4.0"?"accelerate==0.21.0"?"bitsandbytes==0.40.2"?"trl==0.4.7"?"safetensors>=0.3.1"?--upgrade

我們使用 🤗 Datasets library 的 load_dataset() 方法加載 databricks/databricks-dolly-15k?數(shù)據(jù)集。

from?datasets?import?load_dataset
from?random?import?randrange#?從hub加載數(shù)據(jù)集
dataset?=?load_dataset("databricks/databricks-dolly-15k",?split="train")print(f"dataset?size:?{len(dataset)}")
print(dataset[randrange(len(dataset))])
#?dataset?size:?15011

為了指導我們的模型，我們需要將我們的結(jié)構(gòu)化示例轉(zhuǎn)換為通過指令描述的任務(wù)集合。我們定義一個 formatting_function ，它接受一個樣本并返回一個符合格式指令的字符串。

def?format_instruction(sample):return?f"""###?Instruction:
Use?the?Input?below?to?create?an?instruction,?which?could?have?been?used?to?generate?the?input?using?an?LLM.?###?Input:
{sample['response']}###?Response:
{sample['instruction']}
"""

我們來在一個隨機的例子上測試一下我們的結(jié)構(gòu)化函數(shù)。

from?random?import?randrangeprint(format_instruction(dataset[randrange(len(dataset))]))

3. 使用 `trl` 和`SFTTrainer` 指令微調(diào) Llama 2

我們將使用最近在由 Tim Dettmers 等人的發(fā)表的論文“QLoRA: Quantization-aware Low-Rank Adapter Tuning for Language Generation”中介紹的方法。QLoRA 是一種新的技術(shù)，用于在微調(diào)期間減少大型語言模型的內(nèi)存占用，且并不會降低性能。QLoRA 的 TL;DR; 是這樣工作的:

將預(yù)訓練模型量化為 4bit 位并凍結(jié)它。
附加輕量化的、可訓練的適配器層。(LoRA)
在使用凍結(jié)的量化模型基于文本內(nèi)容進行微調(diào)時，僅微調(diào)適配器層參數(shù)。

如果您想了解有關(guān) QLoRA 及其工作原理的更多信息，我建議您閱讀 Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA 博客文章。

Flash Attention (快速注意力)

Flash Attention 是一種經(jīng)過重新排序的注意力計算方法，它利用經(jīng)典技術(shù) (排列、重計算) 來顯著加快速度，將序列長度的內(nèi)存使用量從二次降低到線性。它基于論文“FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness”。

TL;DR; 將訓練加速了 3 倍。在這兒獲得更多信息 FlashAttention。Flash Attention 目前僅支持 Ampere (A10, A40, A100, …) & Hopper (H100, …) GPU。你可以檢查一下你的 GPU 是否支持，并用下面的命令來安裝它:

注意: 如果您的機器的內(nèi)存小于 96GB，而 CPU 核心數(shù)足夠多，請減少 MAX_JOBS 的數(shù)量。在我們使用的 g5.2xlarge 上，我們使用了 4 。

python?-c?"import?torch;?assert?torch.cuda.get_device_capability()[0]?>=?8,?'Hardware?not?supported?for?Flash?Attention'"
pip?install?ninja?packaging
MAX_JOBS=4?pip?install?flash-attn?--no-build-isolation

_安裝 flash attention 是會需要一些時間 (10-45 分鐘)_。

該示例支持對所有 Llama 檢查點使用 Flash Attention，但默認是未啟用的。要開啟 Flash Attention，請取消代碼塊中這段的注釋， # COMMENT IN TO USE FLASH ATTENTION 。

import?torch
from?transformers?import?AutoTokenizer,?AutoModelForCausalLM,?BitsAndBytesConfiguse_flash_attention?=?False#?COMMENT?IN?TO?USE?FLASH?ATTENTION
#?replace?attention?with?flash?attention?
#?if?torch.cuda.get_device_capability()[0]?>=?8:
#?????from?utils.llama_patch?import?replace_attn_with_flash_attn
#?????print("Using?flash?attention")
#?????replace_attn_with_flash_attn()
#?????use_flash_attention?=?True#?Hugging?Face?模型id
model_id?=?"NousResearch/Llama-2-7b-hf"?#?non-gated
#?model_id?=?"meta-llama/Llama-2-7b-hf"?#?gated#?BitsAndBytesConfig?int-4?config?
bnb_config?=?BitsAndBytesConfig(load_in_4bit=True,bnb_4bit_use_double_quant=True,bnb_4bit_quant_type="nf4",bnb_4bit_compute_dtype=torch.bfloat16
)#?加載模型與分詞器
model?=?AutoModelForCausalLM.from_pretrained(model_id,?quantization_config=bnb_config,?use_cache=False,?device_map="auto")
model.config.pretraining_tp?=?1?#?通過對比doc中的字符串，驗證模型是在使用flash?attention
if?use_flash_attention:from?utils.llama_patch?import?forward????assert?model.model.layers[0].self_attn.forward.__doc__?==?forward.__doc__,?"Model?is?not?using?flash?attention"tokenizer?=?AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token?=?tokenizer.eos_token
tokenizer.padding_side?=?"right"

SFTTrainer 支持與 peft 的本地集成，這使得高效地指令微調(diào)LLM變得非常容易。我們只需要創(chuàng)建 LoRAConfig 并將其提供給訓練器。

from?peft?import?LoraConfig,?prepare_model_for_kbit_training,?get_peft_model#?基于?QLoRA?論文來配置?LoRA
peft_config?=?LoraConfig(lora_alpha=16,lora_dropout=0.1,r=64,bias="none",task_type="CAUSAL_LM",?
)#?為訓練準備好模型
model?=?prepare_model_for_kbit_training(model)
model?=?get_peft_model(model,?peft_config)

在開始訓練之前，我們需要定義自己想要的超參數(shù) (TrainingArguments)。

from?transformers?import?TrainingArgumentsargs?=?TrainingArguments(output_dir="llama-7-int4-dolly",num_train_epochs=3,per_device_train_batch_size=6?if?use_flash_attention?else?4,gradient_accumulation_steps=2,gradient_checkpointing=True,optim="paged_adamw_32bit",logging_steps=10,save_strategy="epoch",learning_rate=2e-4,bf16=True,tf32=True,max_grad_norm=0.3,warmup_ratio=0.03,lr_scheduler_type="constant",disable_tqdm=True?#?當配置的參數(shù)都正確后可以關(guān)閉tqdm
)

我們現(xiàn)在有了用來訓練模型 SFTTrainer 所需要準備的每一個模塊。

from?trl?import?SFTTrainermax_seq_length?=?2048?#?數(shù)據(jù)集的最大長度序列trainer?=?SFTTrainer(model=model,train_dataset=dataset,peft_config=peft_config,max_seq_length=max_seq_length,tokenizer=tokenizer,packing=True,formatting_func=format_instruction,?args=args,
)

通過調(diào)用 Trainer 實例上的 train() 方法來訓練我們的模型。

#?訓練
trainer.train()?#?tqdm關(guān)閉后將不顯示進度條信息#?保存模型
trainer.save_model()

不使用 Flash Attention 的訓練過程在 g5.2xlarge 上花費了 03:08:00。實例的成本為 1,212$/h ，總成本為 3.7$ 。

使用 Flash Attention 的訓練過程在 g5.2xlarge 上花費了 02:08:00。實例的成本為 1,212$/h ，總成本為 2.6$ 。

使用 Flash Attention 的結(jié)果令人滿意，速度提高了 1.5 倍，成本降低了 30%。

4. 測試模型、進行推理

在訓練完成后，我們想要運行和測試模型。我們會使用 peft 和 transformers 將 LoRA 適配器加載到模型中。

if?use_flash_attention:#?停止?flash?attentionfrom?utils.llama_patch?import?unplace_flash_attn_with_attnunplace_flash_attn_with_attn()import?torch
from?peft?import?AutoPeftModelForCausalLM
from?transformers?import?AutoTokenizerargs.output_dir?=?"llama-7-int4-dolly"#?加載基礎(chǔ)LLM模型與分詞器
model?=?AutoPeftModelForCausalLM.from_pretrained(args.output_dir,low_cpu_mem_usage=True,torch_dtype=torch.float16,load_in_4bit=True,
)?
tokenizer?=?AutoTokenizer.from_pretrained(args.output_dir)

我們來再次用隨機樣本加載一次數(shù)據(jù)集，試著來生成一條指令。

from?datasets?import?load_dataset?
from?random?import?randrange#?從hub加載數(shù)據(jù)集并得到一個樣本
dataset?=?load_dataset("databricks/databricks-dolly-15k",?split="train")
sample?=?dataset[randrange(len(dataset))]prompt?=?f"""###?Instruction:
Use?the?Input?below?to?create?an?instruction,?which?could?have?been?used?to?generate?the?input?using?an?LLM.?###?Input:
{sample['response']}###?Response:
"""input_ids?=?tokenizer(prompt,?return_tensors="pt",?truncation=True).input_ids.cuda()
#?with?torch.inference_mode():
outputs?=?model.generate(input_ids=input_ids,?max_new_tokens=100,?do_sample=True,?top_p=0.9,temperature=0.9)print(f"Prompt:\n{sample['response']}\n")
print(f"Generated?instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(),?skip_special_tokens=True)[0][len(prompt):]}")
print(f"Ground?truth:\n{sample['instruction']}")

太好了！我們的模型可以工作了！如果想要加速我們的模型，我們可以使用 Text Generation Inference 部署它。因此我們需要將我們適配器的參數(shù)合并到基礎(chǔ)模型中去。

from?peft?import?AutoPeftModelForCausalLMmodel?=?AutoPeftModelForCausalLM.from_pretrained(args.output_dir,low_cpu_mem_usage=True,
)?#?合并?LoRA?與?base?model
merged_model?=?model.merge_and_unload()#?保存合并后的模型
merged_model.save_pretrained("merged_model",safe_serialization=True)
tokenizer.save_pretrained("merged_model")#?push合并的模型到hub上
#?merged_model.push_to_hub("user/repo")
#?tokenizer.push_to_hub("user/repo")

原文作者: Philschmid

原文鏈接:?https://www.philschmid.de/instruction-tune-llama-2

譯者: Xu Haoran

查看全文

http://m.aloenet.com.cn/news/37336.html

国产亚洲精品福利在线无卡一,国产精久久一区二区三区,亚洲精品无码国模,精品久久久久久无码专区不卡

學生做爰網(wǎng)站微信群推廣網(wǎng)站

1. 定義應(yīng)用場景細節(jié)并創(chuàng)建指令的提示詞模板

2. 創(chuàng)建指令數(shù)據(jù)集

3. 使用 `trl` 和`SFTTrainer` 指令微調(diào) Llama 2

Flash Attention (快速注意力)

4. 測試模型、進行推理

相關(guān)文章：

国产亚洲精品福利在线无卡一,国产精久久一区二区三区,亚洲精品无码国模,精品久久久久久无码专区不卡

1. 定義應(yīng)用場景細節(jié)并創(chuàng)建指令的提示詞模板

2. 創(chuàng)建指令數(shù)據(jù)集

3. 使用 trl 和SFTTrainer 指令微調(diào) Llama 2

Flash Attention (快速注意力)

4. 測試模型、進行推理

相關(guān)文章：

3. 使用 `trl` 和`SFTTrainer` 指令微調(diào) Llama 2

4. 測試模型、進行推理