Commit 5d07b1b

updated installation guide and blogs_publications (#773)
add wheel links; fine-tune docs
1 parent 627f1ef commit 5d07b1b

File tree

4 files changed: +91 −15 lines

docs/tutorials/blogs_publications.md

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 Blogs & Publications
 ====================

+* [Accelerating PyTorch with Intel® Extension for PyTorch\*](https://medium.com/pytorch/accelerating-pytorch-with-intel-extension-for-pytorch-3aef51ea3722)
 * [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost's new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
 * [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
 * *Note*: APIs mentioned in it are deprecated.

docs/tutorials/examples.md

Lines changed: 67 additions & 14 deletions
@@ -7,12 +7,20 @@ Examples

 #### Code Changes Highlight

+There are only a few lines of code change required to use Intel® Extension for PyTorch\* for training.
+
+Recommended code changes involve:
+1. `torch.channels_last` should be applied to both the model object and its input data to raise CPU resource usage efficiency.
+2. The `ipex.optimize` function applies optimizations to the model object, as well as to an optimizer object.
+
 ```
 ...
 import torch
 import intel_extension_for_pytorch as ipex
 ...
 model = Model()
+model = model.to(memory_format=torch.channels_last)
 criterion = ...
 optimizer = ...
 model.train()
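
The hunk above ends before the `ipex.optimize` call referred to in item 2, so here is a minimal, hypothetical sketch of both recommended changes together (`Model` is a placeholder and the hyperparameters are arbitrary):

```
import torch
import intel_extension_for_pytorch as ipex

model = Model()  # placeholder, as in the snippet above
model = model.to(memory_format=torch.channels_last)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
model.train()

# With an optimizer passed in, ipex.optimize returns both the optimized
# model and the optimized optimizer.
model, optimizer = ipex.optimize(model, optimizer=optimizer)
```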
@@ -56,6 +64,7 @@ train_loader = torch.utils.data.DataLoader(
 )

 model = torchvision.models.resnet50()
+model = model.to(memory_format=torch.channels_last)
 criterion = torch.nn.CrossEntropyLoss()
 optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
 model.train()
@@ -104,6 +113,7 @@ train_loader = torch.utils.data.DataLoader(
 )

 model = torchvision.models.resnet50()
+model = model.to(memory_format=torch.channels_last)
 criterion = torch.nn.CrossEntropyLoss()
 optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
 model.train()
@@ -116,7 +126,7 @@ for batch_idx, (data, target) in enumerate(train_loader):
     data = data.to(memory_format=torch.channels_last)
     output = model(data)
     loss = criterion(output, target)
-    loss.backward()
+    loss.backward()
     optimizer.step()
     print(batch_idx)
 torch.save({
@@ -193,6 +203,10 @@ torch.save({

 ## Inference

+Channels last is a memory layout format that is more friendly to Intel Architecture. Users are recommended to utilize this memory layout format for computer vision workloads. It is as simple as invoking the `to(memory_format=torch.channels_last)` method on the model object and the input data.
+
+Moreover, the `optimize` function of Intel® Extension for PyTorch\* applies optimizations to the model and can bring performance boosts. Applying the `optimize` function to the model object is recommended for both computer vision and NLP workloads.
+
 ### Float32

 #### Imperative Mode
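
Putting the two recommendations together, a minimal FP32 inference sketch (assuming the same `Model` and `<shape>` placeholders used throughout these examples) could look like this:

```
import torch
import intel_extension_for_pytorch as ipex

model = Model()  # placeholder model
model.eval()
data = torch.rand(<shape>)  # placeholder input

# Apply the channels-last memory format to both model and data.
model = model.to(memory_format=torch.channels_last)
data = data.to(memory_format=torch.channels_last)

# Apply Intel® Extension for PyTorch* optimizations to the model.
model = ipex.optimize(model)

with torch.no_grad():
    model(data)
```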
@@ -244,7 +258,7 @@ with torch.no_grad():

 #### TorchScript Mode

-It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
+It is highly recommended for users to take advantage of Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

 ##### Resnet50

@@ -301,6 +315,10 @@ with torch.no_grad():

 ### BFloat16

+Similar to running with FP32, the `optimize` function also works for the BFloat16 data type. The only difference is setting the `dtype` parameter to `torch.bfloat16`.
+
+Auto Mixed Precision (AMP) is recommended when working with the BFloat16 data type.
+
 #### Imperative Mode

 ##### Resnet50
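
To make those two points concrete, here is a minimal BFloat16 sketch under the same placeholder assumptions (`Model`, `<shape>`):

```
import torch
import intel_extension_for_pytorch as ipex

model = Model().eval()  # placeholder model
data = torch.rand(<shape>).to(memory_format=torch.channels_last)
model = model.to(memory_format=torch.channels_last)

# The only change versus FP32: set dtype to torch.bfloat16.
model = ipex.optimize(model, dtype=torch.bfloat16)

# Run inference under the CPU AMP autocast context.
with torch.no_grad(), torch.cpu.amp.autocast():
    model(data)
```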
@@ -352,7 +370,7 @@ with torch.no_grad():

 #### TorchScript Mode

-It is highly recommended for users to take advantage of Intel® Extension for PyTorch* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
+It is highly recommended for users to take advantage of Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

 ##### Resnet50

@@ -412,6 +430,18 @@ with torch.no_grad():

 #### Calibration

+For calibrating a model with the INT8 data type, the required code changes are highlighted in the code snippet below.
+
+Please follow the steps below:
+
+1. Utilize the `torch.fx.experimental.optimization.fuse` function to perform op folding for better performance.
+2. Import `intel_extension_for_pytorch` as `ipex`.
+3. Instantiate a config object with the `ipex.quantization.QuantConf` function to save configuration data during calibration.
+4. Iterate through the calibration dataset under the `ipex.quantization.calibrate` scope to perform the calibration.
+5. Save the calibration data into a `json` file.
+6. Invoke the `ipex.quantization.convert` function to apply the calibration configuration object to the FP32 model object to get an INT8 model.
+7. Save the INT8 model into a `pt` file.
+
 ```
 import os
 import torch
@@ -420,39 +450,50 @@ model = Model()
 model.eval()
 data = torch.rand(<shape>)

-# Applying torch.fx.experimental.optimization.fuse against model performs
+# Applying torch.fx.experimental.optimization.fuse against model performs
 # conv-batchnorm folding for better performance.
 import torch.fx.experimental.optimization as optimization
 model = optimization.fuse(model, inplace=True)

 #################### code changes ####################
 import intel_extension_for_pytorch as ipex
-conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)
-######################################################
+conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)

-for d in calibration_data_loader():
-    # conf will be updated with observed statistics during calibrating with the dataset
+for d in calibration_data_loader():
+    # conf will be updated with observed statistics during calibrating with the dataset
     with ipex.quantization.calibrate(conf):
-        model(d)
+        model(d)

 conf.save('int8_conf.json', default_recipe=True)
 with torch.no_grad():
-    model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
-model.save('quantization_model.pt')
+    model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
+######################################################
+
+model.save('quantization_model.pt')
 ```

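The snippet above references a `calibration_data_loader` helper that is not defined in the document; a minimal hypothetical stand-in, just to make the sketch self-contained, could be:

```
# Hypothetical helper: yields batches of representative input data.
# Any iterable of tensors matching the production input shape works.
def calibration_data_loader():
    for _ in range(100):
        yield torch.rand(<shape>)
```
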
 #### Deployment

 ##### Imperative Mode

+In imperative mode, the INT8 model conversion is done on the fly.
+
+Please follow the steps below:
+
+1. Utilize the `torch.fx.experimental.optimization.fuse` function to perform op folding for better performance.
+2. Import `intel_extension_for_pytorch` as `ipex`.
+3. Load the calibration configuration object from the saved file.
+4. Invoke the `ipex.quantization.convert` function to apply the calibration configuration object to the FP32 model object to get an INT8 model.
+5. Run inference.
+
 ```
 import torch

 model = Model()
 model.eval()
 data = torch.rand(<shape>)

-# Applying torch.fx.experimental.optimization.fuse against model performs
+# Applying torch.fx.experimental.optimization.fuse against model performs
 # conv-batchnorm folding for better performance.
 import torch.fx.experimental.optimization as optimization
 model = optimization.fuse(model, inplace=True)
@@ -463,15 +504,25 @@ conf = ipex.quantization.QuantConf('int8_conf.json')
 ######################################################

 with torch.no_grad():
-    model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
+    model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
     model(data)
 ```

 ##### Graph Mode

+In graph mode, the INT8 model is loaded from the saved local file and can be used directly for inference.
+
+Please follow the steps below:
+
+1. Import `intel_extension_for_pytorch` as `ipex`.
+2. Load the INT8 model from the saved file.
+3. Run inference.
+
 ```
 import torch
+#################### code changes ####################
 import intel_extension_for_pytorch as ipex
+######################################################

 model = torch.jit.load('quantization_model.pt')
 model.eval()
@@ -481,6 +532,8 @@ with torch.no_grad():
     model(data)
 ```

+oneDNN provides the [oneDNN Graph Compiler](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview4/doc#onednn-graph-compiler) as a prototype feature that can boost performance for selective topologies. No code change is required. Please install [a binary](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/installation.html#installation_onednn_graph_compiler) with this feature enabled. We verified this feature with `Bert-large`, `bert-base-cased`, `roberta-base`, `xlm-roberta-base`, `google-electra-base-generator` and `google-electra-base-discriminator`.
+
 ## C++

 To work with libtorch, the C++ library of PyTorch, Intel® Extension for PyTorch\* provides its C++ dynamic library as well. The C++ library is supposed to handle inference workloads only, such as service deployment. For regular development, please use the Python interface. Compared to the usage of libtorch, no specific code changes are required, except for converting input data into the channels last data format. Compilation follows the recommended methodology with CMake. Detailed instructions can be found in the [PyTorch tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html#depending-on-libtorch-and-building-the-application).
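
As a rough, hypothetical illustration of the usage described above (the model path and input shape are placeholders, and the documentation's own example-app may differ), a minimal libtorch inference program could look like:

```
#include <torch/script.h>

#include <iostream>
#include <vector>

int main() {
  // Load a TorchScript model exported from Python (path is a placeholder).
  torch::jit::script::Module module = torch::jit::load("model.pt");
  module.eval();

  // Convert the input into the channels last memory format, as noted above.
  torch::Tensor input =
      torch::rand({1, 3, 224, 224}).contiguous(torch::MemoryFormat::ChannelsLast);

  torch::NoGradGuard no_grad;  // inference only, no autograd bookkeeping
  std::vector<torch::jit::IValue> inputs{input};
  torch::Tensor output = module.forward(inputs).toTensor();
  std::cout << output.sizes() << std::endl;
  return 0;
}
```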
@@ -582,4 +635,4 @@ $ ldd example-app

 ## Model Zoo

-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.10-models). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r1.10-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.11-models). A number of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r1.11-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running scripts in the Model Zoo.

docs/tutorials/installation.md

Lines changed: 8 additions & 1 deletion
@@ -43,6 +43,7 @@ Prebuilt wheel files availability matrix for Python versions

 | Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
 | :--: | :--: | :--: | :--: | :--: | :--: |
+| 1.11.200 | | ✔️ | ✔️ | ✔️ | ✔️ |
 | 1.11.0 | | ✔️ | ✔️ | ✔️ | ✔️ |
 | 1.10.100 | ✔️ | ✔️ | ✔️ | ✔️ | |
 | 1.10.0 | ✔️ | ✔️ | ✔️ | ✔️ | |
@@ -63,6 +64,11 @@ Alternatively, you can also install the latest version with the following command
 python -m pip install intel_extension_for_pytorch -f https://software.intel.com/ipex-whl-stable
 ```

+For pre-built wheel files with [oneDNN Graph Compiler](#installation_onednn_graph_compiler), please use the following command to perform the installation.
+```
+python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-dev
+```
+
 **Note:** For versions prior to 1.10.0, please use the package name `torch_ipex`, rather than `intel_extension_for_pytorch`.

 **Note:** To install a package with a specific version, please run with the following command.
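
For instance, pinning the 1.11.200 release listed in the wheel matrix above would look like this (a representative invocation, substituting concrete values into the command template):

```
python -m pip install intel_extension_for_pytorch==1.11.200 -f https://software.intel.com/ipex-whl-stable
```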
@@ -76,7 +82,7 @@ python -m pip install <package_name>==<version_name> -f https://software.intel.com/ipex-whl-stable
 ```bash
 git clone --recursive https://github.com/intel/intel-extension-for-pytorch
 cd intel-extension-for-pytorch
-git checkout v1.11.0
+git checkout v1.11.200

 # if you are updating an existing checkout
 git submodule sync
@@ -119,6 +125,7 @@ docker pull intel/intel-optimized-pytorch:latest

 |Version|Pre-cxx11 ABI|cxx11 ABI|
 |--|--|--|
+| 1.11.200 | [libintel-ext-pt-1.11.200+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-shared-with-deps-1.11.200%2Bcpu.run) | [libintel-ext-pt-cxx11-abi-1.11.200+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-cxx11-abi-shared-with-deps-1.11.200%2Bcpu.run) |
 | 1.11.0 | [libintel-ext-pt-1.11.0+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-1.11.0%2Bcpu.run) | [libintel-ext-pt-cxx11-abi-1.11.0+cpu.run](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/libtorch_zip/libintel-ext-pt-cxx11-abi-1.11.0%2Bcpu.run) |
 | 1.10.100 | [libtorch-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/libtorch-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip) | [libtorch-cxx11-abi-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip](http://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/libtorch-cxx11-abi-shared-with-deps-1.10.0%2Bcpu-intel-ext-pt-cpu-1.10.100.zip) |
 | 1.10.0 | [intel-ext-pt-cpu-libtorch-shared-with-deps-1.10.0+cpu.zip](https://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/intel-ext-pt-cpu-libtorch-shared-with-deps-1.10.0%2Bcpu.zip) | [intel-ext-pt-cpu-libtorch-cxx11-abi-shared-with-deps-1.10.0+cpu.zip](https://intel-optimized-pytorch.s3.cn-north-1.amazonaws.com.cn/wheels/v1.10/intel-ext-pt-cpu-libtorch-cxx11-abi-shared-with-deps-1.10.0%2Bcpu.zip) |

docs/tutorials/releases.md

Lines changed: 15 additions & 0 deletions
@@ -1,6 +1,21 @@
 Releases
 =============

+## 1.11.200
+
+### Highlights
+
+- Enable more fused operators to accelerate particular models; a brief usage sketch follows below.
+  - Fuse `Convolution` and `LeakyReLU` ([#648](https://github.com/intel/intel-extension-for-pytorch/commit/d7603133f37375b3aba7bf744f1095b923ba979e))
+  - Support [`torch.einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html) and fuse it with `add` ([#684](https://github.com/intel/intel-extension-for-pytorch/commit/b66d6d8d0c743db21e534d13be3ee75951a3771d))
+  - Fuse `Linear` and `Tanh` ([#685](https://github.com/intel/intel-extension-for-pytorch/commit/f0f2bae96162747ed2a0002b274fe7226a8eb200))
+- In addition to the original installation methods, this release provides Docker installation from [DockerHub](https://hub.docker.com/).
+- Provided the [evaluation wheel packages](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/installation.html#installation_onednn_graph_compiler) that can boost performance for selective topologies, built on top of the oneDNN graph compiler prototype feature.
+  ***NOTE***: This is still at an early development stage and not fully mature yet, but feel free to reach out through GitHub tickets if you have any suggestions.
+
+**[Full Changelog](https://github.com/intel/intel-extension-for-pytorch/compare/v1.11.0...v1.11.200)**
+
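As a rough illustration of the fused-operator highlights above (not taken from the release notes themselves), a `Linear` followed by `Tanh` is one pattern the new fusions target; `Net` below is a made-up module and the fusion is exercised through the usual TorchScript path:

```
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Toy module containing a Linear -> Tanh pattern, one of the newly
# fused operator pairs listed above.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(64, 64)

    def forward(self, x):
        return torch.tanh(self.linear(x))

model = Net().eval()
data = torch.rand(8, 64)

model = ipex.optimize(model)
with torch.no_grad():
    # Operator fusion is applied on the TorchScript graph.
    traced = torch.jit.trace(model, data)
    traced = torch.jit.freeze(traced)
    traced(data)
```
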
 ## 1.11.0

 We are excited to announce the Intel® Extension for PyTorch\* 1.11.0-cpu release, tightly following the PyTorch 1.11 release. Along with extension 1.11, we focused on continually improving the OOB user experience and performance. Highlights include:
