Introducing 🤗 Accelerate
Run your raw PyTorch training scripts on any kind of device.
Most high-level libraries above PyTorch provide support for distributed training and mixed precision, but the abstractions they introduce require a user to learn a new API in order to customize the underlying training loop. 🤗 Accelerate was created for PyTorch users who like to have full control over their training loops but are reluctant to write (and maintain) the boilerplate code needed for distributed training (multi-GPU on one or several nodes, TPUs, ...) or mixed precision training. Plans going forward include support for fairscale, DeepSpeed, and AWS SageMaker-specific data parallelism and model parallelism.
It provides two things: a simple and consistent API that abstracts that boilerplate code, and a launcher command to easily run those scripts on various setups.
Easy integration!

Let's first have a look at an example:
import torch
import torch.nn.functional as F
from datasets import load_dataset

+ from accelerate import Accelerator
+ accelerator = Accelerator()
- device = 'cpu'
+ device = accelerator.device

model = torch.nn.Transformer().to(device)
optim = torch.optim.Adam(model.parameters())
dataset = load_dataset('my_dataset')
data = torch.utils.data.DataLoader(dataset, shuffle=True)
+ model, optim, data = accelerator.prepare(model, optim, data)

model.train()
for epoch in range(10):
    for source, targets in data:
        source = source.to(device)
        targets = targets.to(device)

        optim.zero_grad()

        output = model(source)
        loss = F.cross_entropy(output, targets)
-       loss.backward()
+       accelerator.backward(loss)
        optim.step()
By just adding five lines of code to any standard PyTorch training script, you can now run that script on any kind of distributed setting, with or without mixed precision. 🤗 Accelerate even handles device placement for you, so you can simplify the training loop above even further:
import torch
import torch.nn.functional as F
from datasets import load_dataset

+ from accelerate import Accelerator
+ accelerator = Accelerator()
- device = 'cpu'

- model = torch.nn.Transformer().to(device)
+ model = torch.nn.Transformer()
optim = torch.optim.Adam(model.parameters())
dataset = load_dataset('my_dataset')
data = torch.utils.data.DataLoader(dataset, shuffle=True)
+ model, optim, data = accelerator.prepare(model, optim, data)

model.train()
for epoch in range(10):
    for source, targets in data:
-       source = source.to(device)
-       targets = targets.to(device)

        optim.zero_grad()

        output = model(source)
        loss = F.cross_entropy(output, targets)
-       loss.backward()
+       accelerator.backward(loss)
        optim.step()
In contrast, here are the changes needed to make this code run with distributed training:
+ import os
import torch
import torch.nn.functional as F
from datasets import load_dataset
+ from torch.utils.data import DistributedSampler
+ from torch.nn.parallel import DistributedDataParallel

+ local_rank = int(os.environ.get("LOCAL_RANK", -1))
- device = 'cpu'
+ device = torch.device("cuda", local_rank)

model = torch.nn.Transformer().to(device)
+ model = DistributedDataParallel(model)
optim = torch.optim.Adam(model.parameters())

dataset = load_dataset('my_dataset')
+ sampler = DistributedSampler(dataset)
- data = torch.utils.data.DataLoader(dataset, shuffle=True)
+ data = torch.utils.data.DataLoader(dataset, sampler=sampler)

model.train()
for epoch in range(10):
+   sampler.set_epoch(epoch)
    for source, targets in data:
        source = source.to(device)
        targets = targets.to(device)

        optim.zero_grad()

        output = model(source)
        loss = F.cross_entropy(output, targets)
        loss.backward()
        optim.step()
With these changes, your training script will work on multiple GPUs, but it will then stop working on CPU or a single GPU (unless you start adding if statements everywhere). Even more annoying, if you want to test your script on TPUs, you need to change different lines of code. The same goes for mixed precision training. The promise of 🤗 Accelerate is:
- to keep the changes to your training loop to the bare minimum, so you have to learn as little as possible.
- to have the same functions work for any distributed setup, so you only have to learn one API.
How does it work?

To see how the library works in practice, let's have a look at each line of code we need to add to a training loop.
accelerator = Accelerator()
On top of giving you the main object you will use, this line analyzes the type of distributed training run from the environment and performs the necessary initialization. You can force training on CPU or mixed precision training by passing cpu=True or fp16=True to this init. Both of these options can also be set using the launcher for your script.
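As a rough illustration of the kind of environment inspection this performs, here is a toy sketch in plain Python. This is not Accelerate's actual logic, and `detect_setup` is an invented name; it only shows the general idea that launchers such as torch.distributed.launch export variables like LOCAL_RANK (the same variable the vanilla DDP example above reads), and the absence of such variables means a plain single-device run.

```python
import os

def detect_setup():
    """Toy sketch: decide the run type from launcher-set environment variables.
    Distributed launchers export LOCAL_RANK for each spawned process; if it is
    absent, this is an ordinary single-device run."""
    if int(os.environ.get("LOCAL_RANK", -1)) != -1:
        return "distributed"
    return "single_device"
```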
model, optim, data = accelerator.prepare(model, optim, data)
This is the main bulk of the API and prepares the three main types of objects: models (torch.nn.Module), optimizers (torch.optim.Optimizer), and dataloaders (torch.data.dataloader.DataLoader).
Model

Preparing the model includes wrapping it in the proper container (for instance DistributedDataParallel) and putting it on the proper device. As with regular distributed training, you will need to unwrap the model to save it, or to access its specific methods, which can be done with accelerator.unwrap_model(model).
Optimizer

The optimizer is also wrapped in a special container that performs the necessary operations in the step to make mixed precision work. It also properly handles device placement of the state dict (if it is non-empty or loaded from a checkpoint).
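To make the wrapping idea concrete, here is a minimal sketch of an optimizer proxy. The class name, `steps_skipped` counter, and `grads_are_finite` flag are all invented for this illustration and are not Accelerate's real wrapper; the sketch only shows the kind of control such a container needs, e.g. skipping an update when a mixed-precision gradient scaler detects overflowed gradients.

```python
class OptimizerProxy:
    """Illustrative only: forwards calls to the wrapped optimizer, but allows
    a step to be skipped (e.g. when inf/nan gradients were detected)."""

    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.steps_skipped = 0

    def zero_grad(self):
        self.optimizer.zero_grad()

    def step(self, grads_are_finite=True):
        if grads_are_finite:
            self.optimizer.step()    # normal parameter update
        else:
            self.steps_skipped += 1  # overflow: leave the weights unchanged
```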
DataLoader

This is where most of the magic is hidden. As you have seen in the code examples, the library does not rely on DistributedSampler: it will actually work with any sampler you might pass to your dataloader (if you ever had to write a distributed version of your custom sampler, there is no more need for that!). The dataloader is wrapped in a container that only grabs the indices relevant to the current process from the sampler (or skips the batches of the other processes if you use an IterableDataset), and puts the batches on the proper device.
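The index-splitting idea can be sketched in a few lines of plain Python. This is a toy model of the behavior, not the library's actual code, and `shard_indices` is an invented name: each process keeps a disjoint slice of the sampler's output, so together the processes cover it exactly once.

```python
def shard_indices(indices, process_index, num_processes):
    """Toy sketch: each process keeps every num_processes-th index, offset by
    its own rank, so the union over all processes is the full index stream."""
    return indices[process_index::num_processes]

# With 2 processes over 8 sampled indices:
# process 0 gets [0, 2, 4, 6] and process 1 gets [1, 3, 5, 7].
```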
For this to work, Accelerate provides a utility function that synchronizes the random number generators on each of the processes run during distributed training. By default, only the generator of your sampler is synchronized, so your data augmentation will be different on each process, but the random shuffling will be the same. You can of course use this utility to synchronize more RNGs if you need to.
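The effect of a synchronized shuffling generator can be sketched with plain Python RNGs (a toy illustration only; the real utility synchronizes torch generators, and `epoch_shuffle` is an invented helper): if every process derives the shuffle RNG from the same seed and epoch, the shuffled order is identical everywhere, while any other per-process RNG, such as one used for augmentation, stays independent.

```python
import random

def epoch_shuffle(indices, epoch, shuffle_seed=42):
    """Toy sketch: derive the shuffle RNG deterministically from
    (shuffle_seed, epoch) so all processes produce the same order."""
    rng = random.Random(shuffle_seed + epoch)
    out = list(indices)
    rng.shuffle(out)
    return out
```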
accelerator.backward(loss)
This last line adds the necessary steps for the backward pass (mostly for mixed precision, though other integrations will require some custom behavior here).
What about evaluation?

Evaluation can either be run normally on all processes, or, if you just want it to run on the main process, you can use the handy test:
if accelerator.is_main_process:
    # Evaluation loop
But you can also very easily run a distributed evaluation using Accelerate; here is what you would need to add to your evaluation loop:
+ eval_dataloader = accelerator.prepare(eval_dataloader)
predictions, labels = [], []
for source, targets in eval_dataloader:
    with torch.no_grad():
        output = model(source)

-   predictions.append(output.cpu().numpy())
-   labels.append(targets.cpu().numpy())
+   predictions.append(accelerator.gather(output).cpu().numpy())
+   labels.append(accelerator.gather(targets).cpu().numpy())

predictions = np.concatenate(predictions)
labels = np.concatenate(labels)
+ predictions = predictions[:len(eval_dataloader.dataset)]
+ labels = labels[:len(eval_dataloader.dataset)]
metric_compute(predictions, labels)
As for training, you need to add one line to prepare your evaluation dataloader. Then you can just use accelerator.gather to collect the tensors of predictions and labels across processes. The last lines added truncate the predictions and labels to the number of examples in your dataset, because the prepared evaluation dataloader will return a few more elements to make sure all batches have the same size on each process.
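The need for that truncation can be checked with a bit of arithmetic. The sketch below is a plain-Python model of the padding behavior described above, under the simplifying assumption that the dataset is padded to a whole number of full, equally sized batches per process; `padded_total` is an invented helper, not part of the library.

```python
import math

def padded_total(num_examples, batch_size, num_processes):
    """Toy sketch: total examples yielded once the dataset is padded so every
    process sees the same number of equally sized batches."""
    per_step = batch_size * num_processes
    return math.ceil(num_examples / per_step) * per_step

# 100 examples, batch size 16, 4 processes: the gathered predictions contain
# 128 entries, so they must be truncated back to the first 100.
```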
One launcher to rule them all

Scripts using Accelerate are fully compatible with traditional launchers such as torch.distributed.launch. But remembering all their arguments is a bit annoying, and once you have set up your instance with 4 GPUs, you will run most of your trainings using them all. Accelerate comes with a handy CLI that works in two steps:
accelerate config
This will trigger a little questionnaire about your setup and create a config file you can edit, with all the defaults for your training commands. Then
accelerate launch path_to_script.py --args_to_the_script
will launch your training script using those defaults. The only thing you have to do is provide all the arguments needed by your training script.

To make this launcher even more awesome, you can use it to spawn an AWS instance using SageMaker. Look out for a guide on how to do this!
How to get involved?

To get started, just pip install accelerate, or check out the documentation for more install options.
Accelerate is a fully open-source project; you can find it on GitHub, have a look at its documentation, or skim through our basic examples. Please let us know if you have any issue or feature you would like the library to support. For all questions, the forums are the place to check!
For more complex situations, you can look at the official Transformers examples. Each folder contains a run_task_no_trainer.py script that leverages the Accelerate library!