`docs/tutorials/blogs_publications.md`
Blogs & Publications
====================
* [Accelerating PyTorch with Intel® Extension for PyTorch\*](https://medium.com/pytorch/accelerating-pytorch-with-intel-extension-for-pytorch-3aef51ea3722)
* [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost's new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
* [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
```
model = model.to(memory_format=torch.channels_last)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
    data = data.to(memory_format=torch.channels_last)
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print(batch_idx)
torch.save({
```
## Inference
Channels last is a memory layout format that is more friendly to Intel® architecture, and it is recommended for computer vision workloads. Using it is as simple as invoking the `to(memory_format=torch.channels_last)` function on both the model object and the input data.
Moreover, the `optimize` function of Intel® Extension for PyTorch\* applies additional optimizations to the model and can bring performance boosts. For both computer vision and NLP workloads, it is recommended to apply the `optimize` function to the model object.
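As a minimal sketch of both recommendations (the torchvision ResNet50 model and the input shape below are illustrative placeholders, not taken from this document):

```
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Illustrative model and input; any computer vision model and 4-D tensor would do.
model = models.resnet50(pretrained=True)
model.eval()
data = torch.rand(1, 3, 224, 224)

# Convert both the model object and the input data to channels last.
model = model.to(memory_format=torch.channels_last)
data = data.to(memory_format=torch.channels_last)

# Apply Intel® Extension for PyTorch* optimizations to the model.
model = ipex.optimize(model)

with torch.no_grad():
    output = model(data)
```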
### Float32
#### Imperative Mode
#### TorchScript Mode
It is highly recommended for users to take advantage of Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
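For instance, a sketch of the FP32 TorchScript flow (again with an illustrative torchvision ResNet50):

```
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(pretrained=True)
model.eval()
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model)

data = torch.rand(1, 3, 224, 224).to(memory_format=torch.channels_last)
with torch.no_grad():
    # Trace the model into TorchScript and freeze it so that the JIT
    # fusion passes can apply further graph-level optimizations.
    model = torch.jit.trace(model, data)
    model = torch.jit.freeze(model)
    output = model(data)
```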
##### Resnet50
### BFloat16
Similar to running with the FP32 data type, the `optimize` function also works for the BFloat16 data type. The only difference is setting the `dtype` parameter to `torch.bfloat16`.
Auto Mixed Precision (AMP) is recommended when working with the BFloat16 data type.
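A sketch combining both points (illustrative ResNet50; `torch.cpu.amp.autocast` provides AMP on CPU):

```
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(pretrained=True)
model.eval()
model = model.to(memory_format=torch.channels_last)

# Same call as the FP32 path; only the dtype parameter changes.
model = ipex.optimize(model, dtype=torch.bfloat16)

data = torch.rand(1, 3, 224, 224).to(memory_format=torch.channels_last)
with torch.no_grad():
    # AMP runs eligible operators in BFloat16 automatically.
    with torch.cpu.amp.autocast():
        output = model(data)
```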
#### Imperative Mode
##### Resnet50
#### TorchScript Mode
It is highly recommended for users to take advantage of Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
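For instance, tracing under AMP (illustrative ResNet50, as above):

```
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(pretrained=True)
model.eval()
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model, dtype=torch.bfloat16)

data = torch.rand(1, 3, 224, 224).to(memory_format=torch.channels_last)
with torch.no_grad():
    with torch.cpu.amp.autocast():
        # Trace and freeze within the autocast scope so the BFloat16
        # behavior is captured in the TorchScript graph.
        model = torch.jit.trace(model, data)
        model = torch.jit.freeze(model)
        output = model(data)
```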
##### Resnet50
#### Calibration
To calibrate a model for the INT8 data type, the required code changes are shown in the snippet below.
Please follow the steps below:
1. Utilize the `torch.fx.experimental.optimization.fuse` function to perform op folding for better performance.
2. Import `intel_extension_for_pytorch` as `ipex`.
3. Instantiate a config object with the `ipex.quantization.QuantConf` function to save configuration data during calibration.
4. Iterate through the calibration dataset under the `ipex.quantization.calibrate` scope to perform the calibration.
5. Save the calibration data into a `json` file.
6. Invoke the `ipex.quantization.convert` function to apply the calibration configuration object to the FP32 model object and get an INT8 model.
7. Save the INT8 model into a `pt` file.
```
import os
import torch

model = Model()
model.eval()
data = torch.rand(<shape>)

# Applying torch.fx.experimental.optimization.fuse against model performs
# conv-batchnorm folding for better performance.
import torch.fx.experimental.optimization as optimization
```
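The snippet above covers steps 1 and 2. What follows is a sketch of steps 3 through 7, assuming the `ipex.quantization` API named in the step list; the `calibration_data_loader` and the `qscheme` argument are illustrative placeholders:

```
model = optimization.fuse(model, inplace=True)

import intel_extension_for_pytorch as ipex

# Step 3: config object that records quantization statistics.
# qscheme is an illustrative choice, not mandated by the steps above.
conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)

# Step 4: run the calibration dataset under the calibrate scope.
for d in calibration_data_loader():  # hypothetical data loader
    with ipex.quantization.calibrate(conf):
        model(d)

# Step 5: save the calibration data into a json file.
conf.save('configure.json')

# Step 6: apply the calibration configuration to the FP32 model
# to produce an INT8 model (a TorchScript module in this API).
model = ipex.quantization.convert(model, conf, data)

# Step 7: save the INT8 model into a pt file.
torch.jit.save(model, 'int8_model.pt')
```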
oneDNN provides the [oneDNN Graph Compiler](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview4/doc#onednn-graph-compiler) as a prototype feature that can boost performance for selective topologies. No code change is required. Please install [a binary](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/installation.html#installation_onednn_graph_compiler) with this feature enabled. We verified this feature with `Bert-large`, `bert-base-cased`, `roberta-base`, `xlm-roberta-base`, `google-electra-base-generator` and `google-electra-base-discriminator`.
## C++
To work with libtorch, the C++ library of PyTorch, Intel® Extension for PyTorch\* provides its C++ dynamic library as well. The C++ library is intended to handle inference workloads only, such as service deployment. For regular development, please use the Python interface. Compared to the usage of libtorch, no specific code changes are required, except for converting input data into the channels last data format. Compilation follows the recommended methodology with CMake. Detailed instructions can be found in the [PyTorch tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html#depending-on-libtorch-and-building-the-application).
## Model Zoo
Use cases that have already been optimized by Intel engineers are available in the [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.11-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.11-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running the scripts in the Model Zoo.
For pre-built wheel files with [oneDNN Graph Compiler](#installation_onednn_graph_compiler), please use the following command to perform the installation.
`docs/tutorials/releases.md`
Releases
=============
## 1.11.200
### Highlights
- Enable more fused operators to accelerate particular models.
  - Fuse `Convolution` and `LeakyReLU` ([#648](https://github.com/intel/intel-extension-for-pytorch/commit/d7603133f37375b3aba7bf744f1095b923ba979e))
  - Support [`torch.einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html) and fuse it with `add` ([#684](https://github.com/intel/intel-extension-for-pytorch/commit/b66d6d8d0c743db21e534d13be3ee75951a3771d))
  - Fuse `Linear` and `Tanh` ([#685](https://github.com/intel/intel-extension-for-pytorch/commit/f0f2bae96162747ed2a0002b274fe7226a8eb200))
- In addition to the original installation methods, this release provides Docker installation from [DockerHub](https://hub.docker.com/).
- Provide [evaluation wheel packages](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/installation.html#installation_onednn_graph_compiler) that could boost performance for selective topologies on top of the oneDNN graph compiler prototype feature.

  ***NOTE***: This is still at an early development stage and not fully mature yet, but feel free to reach out through GitHub tickets if you have any suggestions.
## 1.11.0

We are excited to announce the Intel® Extension for PyTorch\* 1.11.0-cpu release, which tightly follows the PyTorch 1.11 release. Along with extension 1.11, we focused on continually improving the out-of-box (OOB) user experience and performance. Highlights include: