How 🤗 Accelerate runs very large models thanks to PyTorch
Loading and running large models
Meta AI and BigScience recently open-sourced very large language models that won't fit into the memory (RAM or GPU) of most consumer hardware. At Hugging Face, part of our mission is to make even those large models accessible, so we developed tools that let you run them even if you don't own a supercomputer. All the examples picked in this blog post run on a free Colab instance (with limited RAM and disk space); if you have access to more disk space, don't hesitate to pick larger checkpoints.
Here is how we can run OPT-6.7B:
import torch
from transformers import pipeline
# This works on a basic Colab instance!
# Pick a larger checkpoint if you have time to wait and enough disk space!
checkpoint = "facebook/opt-6.7b"
generator = pipeline("text-generation", model=checkpoint, device_map="auto", torch_dtype=torch.float16)
# Run inference
generator("More and more large language models are opensourced so Hugging Face has")
We won't explain what each of these arguments does just yet. Instead, let's look at the traditional model-loading pipeline in PyTorch. It usually goes like this:
- Create the model
- Load its weights in memory (into an object usually called a state_dict)
- Load those weights into the created model
- Move the model onto the device for inference
While this has worked pretty well so far, very large models make this approach challenging. The model we picked here has 6.7 billion parameters. In the default precision, just step 1 (creating the model) already takes 26.8GB of RAM (a float32 parameter occupies 4 bytes in memory). This doesn't even fit in the RAM you get on Colab.
Then step 2 loads a second copy of the model in memory (so another 26.8GB of RAM in the default precision). If you tried to load the largest models this way, for example BLOOM or OPT-176B (which both have 176 billion parameters), you would need 1.4 terabytes of CPU RAM. That is a bit excessive! And all of this just to move the weights onto one (or several) GPU(s).
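The memory arithmetic above is easy to reproduce. Here is a small illustrative sketch (the helper name is ours, not an Accelerate API):

```python
def mem_gb(n_params, bytes_per_param=4, copies=1):
    """Rough RAM needed to hold `copies` full copies of the weights, in GB."""
    return n_params * bytes_per_param * copies / 1e9

# OPT-6.7B in float32: one copy for the freshly created model
print(mem_gb(6.7e9))            # → 26.8

# BLOOM/OPT-176B: created model + loaded state_dict = two copies
print(mem_gb(176e9, copies=2))  # → 1408.0, i.e. ~1.4TB
```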
Clearly we need something smarter. In this blog post, we explain how Accelerate leverages PyTorch features to load and run inference with very large models, even when they don't fit in RAM or on a single GPU. In a nutshell, it changes the process above as follows:
- Create an empty (i.e. without weights) model
- Decide where each layer is going to go (when several devices are available)
- Load parts of its weights in memory
- Load those weights into the empty model
- Move the weights onto the device for inference
- Repeat from step 3 until all the weights are loaded
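The steps above can be sketched with a toy module. This is a simplified illustration, not Accelerate's actual implementation: the trivial device_map and the in-memory "pretrained" weights are our assumptions.

```python
import torch
from torch import nn

model = nn.Linear(8, 8, device="meta")         # 1. empty model, no weight data
device_map = {"weight": "cpu", "bias": "cpu"}  # 2. where each weight should go

pretrained = nn.Linear(8, 8).state_dict()      # 3. weights, as if read from disk
for name, tensor in pretrained.items():        # 4.-6. place one weight at a time
    setattr(model, name, nn.Parameter(tensor.to(device_map[name])))

out = model(torch.randn(2, 8))                 # the model is now usable
print(out.shape)  # → torch.Size([2, 8])
```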
Creating an empty model
PyTorch 1.9 introduced a new kind of device called the meta device. This allows us to create tensors with no data attached to them: on the meta device, as long as you have a shape, you can create arbitrarily large tensors without having to worry about CPU (or GPU) RAM.
For instance, the following code will crash on Colab:
import torch
large_tensor = torch.randn(100000, 100000)
This large tensor requires 4 * 10**10 bytes (the default precision is FP32, so each element of the tensor takes 4 bytes), and thus 40GB of RAM. On the meta device, however, it works just fine:
import torch
large_tensor = torch.randn(100000, 100000, device="meta")
If you try to display this tensor, PyTorch will print:
tensor(..., device='meta', size=(100000, 100000))
As mentioned above, no data is attached to this tensor; it only has a shape.
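You can check this directly: the shape is there, and so is the size the data would take, even though no memory was allocated (a small sketch):

```python
import torch

large_tensor = torch.randn(100000, 100000, device="meta")
print(large_tensor.is_meta)  # → True: no data is attached

# 4 bytes per FP32 element, 10**10 elements: the 40GB it *would* need
print(large_tensor.element_size() * large_tensor.nelement())  # → 40000000000
```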
You can instantiate a model directly on the meta device:
large_model = torch.nn.Linear(100000, 100000, device="meta")
For an existing model, however, this syntax would require rewriting all the modeling code so that each submodule accepts and passes along a device keyword argument. Since this wasn't practical for the 150 models of the Transformers library, we developed a context manager that instantiates an empty model for you.
Here is how you can instantiate an empty version of BLOOM:
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM
config = AutoConfig.from_pretrained("bigscience/bloom")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
This works on any model, but it returns a shell you can't use directly: some operations are implemented for the meta device, but not all of them yet. For instance, the large_model defined above can be used with an input, but the BLOOM model cannot. Even when you do use it, the output is a tensor on the meta device, so you get the shape of the result, but nothing more.
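For example, running a forward pass through the meta-device Linear from above yields another meta tensor that carries only a shape:

```python
import torch

large_model = torch.nn.Linear(100000, 100000, device="meta")
x = torch.randn(4, 100000, device="meta")

out = large_model(x)  # forward pass runs, but computes nothing
print(out.is_meta)    # → True
print(out.shape)      # → torch.Size([4, 100000])
```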
As further work on this, the PyTorch team is working on a new FakeTensor class, which is a bit like tensors on the meta device, but which also carries the device information (on top of the shape and dtype).
Since we know the shape of each weight, we can tell how much memory they will all consume once the pretrained tensors are fully loaded. We can therefore make a decision on how to split the model across CPUs and GPUs.
Computing a device map
Before loading the pretrained weights, we need to know where we want to put them. That way we can free the CPU RAM each time a weight has been placed in its right spot. This can be done on the empty model on the meta device, since all we need to compute how much space each weight takes in memory is its shape and dtype.
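Since a meta tensor knows its shape and dtype, the size computation needs nothing else. A minimal sketch (the helper and the dtype table are ours):

```python
import math

DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_bytes(shape, dtype="float32"):
    """Memory one weight will take once materialized, from shape + dtype alone."""
    return math.prod(shape) * DTYPE_BYTES[dtype]

# e.g. a 4096x4096 Linear weight
print(weight_bytes((4096, 4096)))             # → 67108864 bytes in float32
print(weight_bytes((4096, 4096), "float16"))  # → 33554432 bytes in float16
```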
Accelerate provides a function to automatically determine a device map from an empty model. It will try to maximize the use of all available GPUs, then CPU RAM, and finally flag the weights that don't fit for disk offload. Let's have a look at the details using OPT-13b.
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM
config = AutoConfig.from_pretrained("facebook/opt-13b")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
device_map = infer_auto_device_map(model)
This returns a dictionary mapping modules or weights to a device. On a machine with one Titan RTX for instance, we get the following:
{'model.decoder.embed_tokens': 0,
'model.decoder.embed_positions': 0,
'model.decoder.final_layer_norm': 0,
'model.decoder.layers.0': 0,
'model.decoder.layers.1': 0,
...
'model.decoder.layers.9': 0,
'model.decoder.layers.10.self_attn': 0,
'model.decoder.layers.10.activation_fn': 0,
'model.decoder.layers.10.self_attn_layer_norm': 0,
'model.decoder.layers.10.fc1': 'cpu',
'model.decoder.layers.10.fc2': 'cpu',
'model.decoder.layers.10.final_layer_norm': 'cpu',
'model.decoder.layers.11': 'cpu',
...
'model.decoder.layers.17': 'cpu',
'model.decoder.layers.18.self_attn': 'cpu',
'model.decoder.layers.18.activation_fn': 'cpu',
'model.decoder.layers.18.self_attn_layer_norm': 'cpu',
'model.decoder.layers.18.fc1': 'disk',
'model.decoder.layers.18.fc2': 'disk',
'model.decoder.layers.18.final_layer_norm': 'disk',
'model.decoder.layers.19': 'disk',
...
'model.decoder.layers.39': 'disk',
'lm_head': 'disk'}
Accelerate evaluated that the embeddings and the decoder up until the 9th block could all fit on the GPU (device 0), then part of the 10th block needs to go on the CPU, as well as the weights up until the 17th layer. The 18th layer is then split between the CPU and the disk, and all the following layers must be offloaded to disk.
Actually using this device map later on would not work, though: the layers composing this model have residual connections (where the input of a block is added to its output), so all elements of a given layer must be on the same device. We can tell Accelerate this by passing a list of modules not to split via the no_split_module_classes keyword argument:
device_map = infer_auto_device_map(model, no_split_module_classes=["OPTDecoderLayer"])
This then returns:
{'model.decoder.embed_tokens': 0,
'model.decoder.embed_positions': 0,
'model.decoder.final_layer_norm': 0,
'model.decoder.layers.0': 0,
'model.decoder.layers.1': 0,
...
'model.decoder.layers.9': 0,
'model.decoder.layers.10': 'cpu',
'model.decoder.layers.11': 'cpu',
...
'model.decoder.layers.17': 'cpu',
'model.decoder.layers.18': 'disk',
...
'model.decoder.layers.39': 'disk',
'lm_head': 'disk'}
Now, every layer is always on the same device.
In Transformers, when you use device_map in the from_pretrained() method or in a pipeline, the classes of blocks to keep on the same device are provided automatically, so you don't need to worry about them. Note that device_map has the following options (only relevant when you have more than one GPU):
"auto"
ãŸãã¯"balanced"
ïŒAccelerateã¯éã¿ãåçã«åå²ããŠåGPUãåçã«äœ¿çšããŸãã"balanced_low_0"
ïŒAccelerateã¯éã¿ãåçã«åå²ããæåã®GPUã«ã¯å¯èœãªéãå°ãªãéã¿ãå«ãŸããããã«ããŸãïŒgenerate
é¢æ°ã䜿çšããŠã¢ãã«ã®åºåã§äœæ¥ããå Žåãªã©ã«äŸ¿å©ã§ãïŒã"sequential"
ïŒAccelerateã¯GPUãé çªã«åããŸãïŒæåŸã®GPUã¯äœ¿çšãããªãå ŽåããããŸãïŒã
You can also pass your own device_map as a dictionary (mapping layer/module names to devices).
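Such a hand-written device_map is just a plain dictionary. This is an illustrative fragment: the layout is our invention, and the module names are only valid for the matching model.

```python
# Hypothetical manual map: first layers on GPU 0, then CPU, then disk.
device_map = {
    "model.decoder.embed_tokens": 0,
    "model.decoder.embed_positions": 0,
    "model.decoder.layers.0": 0,
    "model.decoder.layers.1": "cpu",
    "lm_head": "disk",
}

# Values are either a GPU index, "cpu", or "disk"
print(sorted(set(map(str, device_map.values()))))  # → ['0', 'cpu', 'disk']
```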
Finally, note that the device_map you receive depends on the selected dtype (since different floating-point types take up different amounts of space). Providing dtype="float16" gives different results:
device_map = infer_auto_device_map(model, no_split_module_classes=["OPTDecoderLayer"], dtype="float16")
In this precision, we can fit the model up to layer 21 on the GPU:
{'model.decoder.embed_tokens': 0,
'model.decoder.embed_positions': 0,
'model.decoder.final_layer_norm': 0,
'model.decoder.layers.0': 0,
'model.decoder.layers.1': 0,
...
'model.decoder.layers.21': 0,
'model.decoder.layers.22': 'cpu',
...
'model.decoder.layers.37': 'cpu',
'model.decoder.layers.38': 'disk',
'model.decoder.layers.39': 'disk',
'lm_head': 'disk'}
Now that we know where each weight is supposed to go, we can progressively load the pretrained weights into the model.
Sharding state dicts
Traditionally, PyTorch models are saved in a single file containing a map from parameter names to weights. This map is usually called a state_dict. Here is an excerpt from the PyTorch documentation on saving and loading:
# Save the model weights
torch.save(my_model.state_dict(), 'model_weights.pth')
# Reload them
new_model = ModelClass()
new_model.load_state_dict(torch.load('model_weights.pth'))
This works pretty well for models with less than 1 billion parameters, but for larger models it is very taxing on RAM. The BLOOM model has 176 billion parameters; even with the weights saved in bfloat16 to save space, it still represents 352GB as a whole. While the supercomputer that trained the model may have that much memory available, requiring it for inference is unrealistic.
This is why large models on the Hugging Face Hub are not saved and shared in one big file, but in several of them. If you go to the BLOOM model page for instance, you will see there are 72 files named pytorch_model_xxxxx-of-00072.bin, each containing part of the model's weights. Using this format, we can load one part of the state dict into memory, put its weights inside the model, move them to the right device, then discard that part before moving on to the next. Instead of needing enough RAM to accommodate the whole model, we only need enough RAM to fetch the biggest checkpoint part, which we call a shard: 7.19GB in the case of BLOOM.
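The shard-by-shard idea can be sketched with plain dictionaries standing in for checkpoint files (the function and names are illustrative, not the Accelerate API):

```python
def load_sharded(index, read_shard, place):
    """`index` maps parameter name -> shard name, `read_shard` loads one
    shard as a dict, `place` puts one weight in its final spot."""
    for shard_name in sorted(set(index.values())):
        shard = read_shard(shard_name)  # only this shard is in RAM
        for name, weight in shard.items():
            place(name, weight)         # e.g. copy into the model, move device
        del shard                       # free it before fetching the next one

# Toy "checkpoint": two shards held in dicts instead of .bin files
index = {"w1": "shard-0", "w2": "shard-0", "w3": "shard-1"}
files = {"shard-0": {"w1": 1.0, "w2": 2.0}, "shard-1": {"w3": 3.0}}

placed = {}
load_sharded(index, files.get, placed.__setitem__)
print(placed)  # → {'w1': 1.0, 'w2': 2.0, 'w3': 3.0}
```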
We call checkpoints saved in several files like this, as for BLOOM, sharded checkpoints, and we have standardized their format as follows:
- One file (called pytorch_model.bin.index.json) contains metadata and a map from parameter names to file names, indicating where to find each weight
- All the other files are standard PyTorch state dicts that just contain a part of the model instead of the whole one. You can have a look at the content of the index file here.
To load such a sharded checkpoint into a model, we just need to loop over the various shards. Accelerate provides a function called load_checkpoint_in_model that will do this for you if you have cloned one of the repos from the Hub, or you can directly use the from_pretrained method of Transformers, which handles the downloading and caching:
import torch
from transformers import AutoModelForCausalLM
# Will error
checkpoint = "facebook/opt-13b"
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.float16)
If the automatically computed device map requires some weights to be offloaded to disk because there isn't enough GPU and CPU RAM, you will get the following error:
ValueError: The current `device_map` had weights offloaded to the disk. Please provide an
`offload_folder` for them.
To resolve this error, add the following argument:
import torch
from transformers import AutoModelForCausalLM
# Will run out of RAM on Colab
checkpoint = "facebook/opt-13b"
model = AutoModelForCausalLM.from_pretrained(
checkpoint, device_map="auto", offload_folder="offload", torch_dtype=torch.float16
)
Note that if you are trying to load a very large model that requires disk offload on top of CPU offload, you might run out of RAM when the last shards of the checkpoint are loaded, since the part of the model staying on CPU still takes up space. If that is the case, use the option offload_state_dict=True to temporarily offload the part of the model sitting on CPU while the weights are being loaded, and reload it in RAM once all the weights have been processed.
import torch
from transformers import AutoModelForCausalLM
checkpoint = "facebook/opt-13b"
model = AutoModelForCausalLM.from_pretrained(
checkpoint, device_map="auto", offload_folder="offload", offload_state_dict = True, torch_dtype=torch.float16
)
This now fits in Colab, but it is so close to using all the available RAM that it runs out of RAM when you try to generate a prediction. To get a model we can actually use, we need to offload one more layer to disk. We can do so by taking the device_map computed in the previous section, tweaking it a bit, and passing it to the from_pretrained call:
import torch
from transformers import AutoModelForCausalLM
checkpoint = "facebook/opt-13b"
device_map["model.decoder.layers.37"] = "disk"
model = AutoModelForCausalLM.from_pretrained(
checkpoint, device_map=device_map, offload_folder="offload", offload_state_dict=True, torch_dtype=torch.float16
)
Running a model split on several devices
The last part we haven't touched on yet is how Accelerate runs a model with its weights spread across several GPUs, CPU RAM, and a disk folder. This is done very simply, using hooks.
Hooks are a PyTorch API that adds functions to be executed just before each forward call. We couldn't use them directly, but we took the same idea. Once the model is loaded, the dispatch_model function adds hooks to every module and submodule that are executed before and after each forward pass. These hooks will:
- make sure all the inputs of the module are on the same device as its weights
- if the weights have been offloaded to the CPU, move them to GPU 0 before the forward pass and back to the CPU just after
- if the weights have been offloaded to disk, load them into RAM and then onto GPU 0 before the forward pass, and free this memory just after
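The first of these behaviors can be sketched with a standard PyTorch pre-forward hook. This is a simplified illustration; Accelerate's real hooks do more (for instance, they handle keyword arguments and weight offloading):

```python
import torch
from torch import nn

def align_inputs(module, inputs):
    """Move every input onto the device of the module's weights."""
    device = next(module.parameters()).device
    return tuple(x.to(device) for x in inputs)

layer = nn.Linear(4, 4)                     # weights live on the CPU here
layer.register_forward_pre_hook(align_inputs)

out = layer(torch.randn(2, 4))              # inputs are aligned automatically
print(out.shape)  # → torch.Size([2, 4])
```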
The whole process is summarized in the following video:
This way, your model can be loaded and run even if you don't have enough GPU RAM and CPU RAM. All you need is disk space (and lots of patience!). While this solution is pretty naive if you have multiple GPUs (there is no clever pipeline parallelism involved, the GPUs are simply used sequentially), it still yields pretty decent results for BLOOM. And it lets you run the model on smaller setups (albeit more slowly).
To learn more about Accelerate's big model inference, see the documentation.