Description
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ sudo -H ./build.sh
[a bunch of output here]
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ IMAGE_NAME=intel-extension-for-pytorch:gpu
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ VIDEO=$(getent group video | sed -E 's,^video:[^:]*:([^:]*):.*$,\1,')
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,')
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ test -z "$RENDER" || RENDER_GROUP="--group-add ${RENDER}"
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ sudo -H docker run --rm -v /home/tedliosu/intel_pytorch_workspace:/workspace --group-add ${VIDEO} ${RENDER_GROUP} --device=/dev/dri --ipc=host -it $IMAGE_NAME bash
[sudo] password for tedliosu:
groups: cannot find name for group ID 109
root@8e852a62c8b4:/# cd workspace/
root@d4958d53cb7c:/workspace# python3 -m trace -t ipex_f32_example.py 2>&1 | tee ipex_f32_example_py_trace.txt | grep ipex_f32_example
--- modulename: ipex_f32_example, funcname: <module>
ipex_f32_example.py(1): import torch
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(2): import torchvision
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(4): import intel_extension_for_pytorch as ipex
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(7): LR = 0.001
ipex_f32_example.py(8): DOWNLOAD = True
ipex_f32_example.py(9): DATA = 'datasets/cifar10/'
ipex_f32_example.py(11): transform = torchvision.transforms.Compose([
ipex_f32_example.py(12): torchvision.transforms.Resize((224, 224)),
ipex_f32_example.py(13): torchvision.transforms.ToTensor(),
ipex_f32_example.py(14): torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
ipex_f32_example.py(11): transform = torchvision.transforms.Compose([
ipex_f32_example.py(16): train_dataset = torchvision.datasets.CIFAR10(
ipex_f32_example.py(17): root=DATA,
ipex_f32_example.py(18): train=True,
ipex_f32_example.py(19): transform=transform,
ipex_f32_example.py(20): download=DOWNLOAD,
ipex_f32_example.py(16): train_dataset = torchvision.datasets.CIFAR10(
ipex_f32_example.py(22): train_loader = torch.utils.data.DataLoader(
ipex_f32_example.py(23): dataset=train_dataset,
ipex_f32_example.py(24): batch_size=128
ipex_f32_example.py(22): train_loader = torch.utils.data.DataLoader(
ipex_f32_example.py(27): model = torchvision.models.resnet50()
ipex_f32_example.py(28): criterion = torch.nn.CrossEntropyLoss().to("xpu")
ipex_f32_example.py(29): optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
ipex_f32_example.py(30): model.train()
ipex_f32_example.py(32): model = model.to("xpu")
ipex_f32_example.py(33): model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)
ipex_f32_example.py(36): for batch_idx, (data, target) in enumerate(train_loader):
ipex_f32_example.py(37): print("Begin 1 loop iteration")
ipex_f32_example.py(39): data = data.to("xpu")
ipex_f32_example.py(40): print("Moved data onto XPU")
ipex_f32_example.py(41): target = target.to("xpu")
ipex_f32_example.py(42): print("Moved target onto XPU")
ipex_f32_example.py(44): optimizer.zero_grad()
ipex_f32_example.py(45): print("About to apply model to data")
ipex_f32_example.py(46): output = model(data)
ipex_f32_example.py(47): print("Finished applying model to data")
ipex_f32_example.py(48): loss = criterion(output, target)
ipex_f32_example.py(49): print("About to execute loss.backward()")
ipex_f32_example.py(50): loss.backward()
ipex_f32_example.py(51): print("About to execute optimizer.step()")
ipex_f32_example.py(52): optimizer.step()
ipex_f32_example.py(53): print("Current batch id : %d" % (batch_idx))
ipex_f32_example.py(54): data = None
ipex_f32_example.py(55): target = None
ipex_f32_example.py(36): for batch_idx, (data, target) in enumerate(train_loader):
[I killed the process after ***90 minutes*** of being stuck here]
root@d4958d53cb7c:/workspace# tail -n35 ipex_f32_example_py_trace.txt
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
collate.py(81): if not all(len(elem) == elem_size for elem in it):
--- modulename: collate, funcname: <genexpr>
root@d4958d53cb7c:/workspace# pip list
Package                     Version
--------------------------- -------------------
contourpy                   1.0.6
cycler                      0.11.0
fonttools                   4.38.0
intel-extension-for-pytorch 1.10.200+gpu
kiwisolver                  1.4.4
matplotlib                  3.6.1
numpy                       1.23.4
packaging                   21.3
Pillow                      9.3.0
pip                         20.0.2
pyparsing                   3.0.9
python-dateutil             2.8.2
setuptools                  45.2.0
six                         1.16.0
torch                       1.10.0a0+git3d5f2d4
torchvision                 0.11.3
typing-extensions           4.4.0
wheel                       0.34.2
Contents of ipex_f32_example.py
(as you can see, it's basically the Float32 example from here):
import torch
import torchvision
############# code changes ###############
import intel_extension_for_pytorch as ipex
############# code changes ###############
LR = 0.001
DOWNLOAD = True
DATA = 'datasets/cifar10/'
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_dataset = torchvision.datasets.CIFAR10(
    root=DATA,
    train=True,
    transform=transform,
    download=DOWNLOAD,
)
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=128
)

model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss().to("xpu")
optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
model.train()
#################################### code changes ################################
model = model.to("xpu")
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)
#################################### code changes ################################

for batch_idx, (data, target) in enumerate(train_loader):
    print("Begin 1 loop iteration")
    ########## code changes ##########
    data = data.to("xpu")
    print("Moved data onto XPU")
    target = target.to("xpu")
    print("Moved target onto XPU")
    ########## code changes ##########
    optimizer.zero_grad()
    print("About to apply model to data")
    output = model(data)
    print("Finished applying model to data")
    loss = criterion(output, target)
    print("About to execute loss.backward()")
    loss.backward()
    print("About to execute optimizer.step()")
    optimizer.step()
    print("Current batch id : %d" % (batch_idx))
    data = None
    target = None

torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pth')
As noted in the command-line output above, when run under the tracer the ipex_f32_example.py
script froze for 90 minutes after reaching the for batch_idx, (data, target) in enumerate(train_loader):
line; when run without tracing, it froze at data = data.to("xpu")
for over 8 hours before I had to kill the process. I have no idea whether this is a driver issue, a torchvision issue, or something else, but it is really annoying, and I'd be more than happy to provide extra info about my system to help solve this freezing problem. Also note that tail -n35 ipex_f32_example_py_trace.txt
above shows the last 35 lines of the trace I ran on the script, i.e. exactly where the script's execution got stuck.
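In case it helps narrow this down, below is a minimal isolation sketch (my own rough idea, not part of the official example) that splits the problem in two: it first pulls a single batch through the CIFAR10 DataLoader on CPU only, and then moves a bare random tensor to "xpu". If the first check stalls, the problem is in the data pipeline (collate/Resize over 128 images); if the second check stalls, it points at the XPU runtime/driver stack rather than torchvision. The paths, transforms, and batch size are just copied from my script above.

import time

import torch
import torchvision
import intel_extension_for_pytorch as ipex  # needed so the "xpu" device is available, same as in the script above

# Check 1: CPU-only data pipeline (no xpu involved at all).
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_dataset = torchvision.datasets.CIFAR10(
    root='datasets/cifar10/',
    train=True,
    transform=transform,
    download=True,
)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128)
start = time.time()
data, target = next(iter(train_loader))
print("First CPU batch ready in %.1f s, shape %s" % (time.time() - start, tuple(data.shape)))

# Check 2: bare device transfer (no torchvision/DataLoader involved at all).
start = time.time()
x = torch.randn(128, 3, 224, 224).to("xpu")
print("Random tensor moved to xpu in %.1f s" % (time.time() - start))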
P.S. I already mentioned this problem here before opening this separate issue, and I saw the reply here to my initial comment, but I have no idea how to apply that person's suggestion to help solve this issue 😕