Commit 61769fe

Added blog post "docTR joins PyTorch Ecosystem: From Pixels to Data, Building a Recognition Pipeline with PyTorch and docTR"
Signed-off-by: Chris Abraham <[email protected]>

---
layout: blog_detail
title: "docTR joins PyTorch Ecosystem: From Pixels to Data, Building a Recognition Pipeline with PyTorch and docTR"
author: Olivier Dulcy & Sebastian Olivera, Mindee
---

![docTR logo](/assets/images/doctr-joins-pytorch-ecosystem/fg1.png){:style="width:100%;display: block;max-width:400px; margin-left:auto; margin-right:auto;"}

We’re thrilled to announce that the docTR project has been integrated into the PyTorch ecosystem! This integration ensures that docTR aligns with PyTorch’s standards and practices, giving developers a reliable, community-backed solution for powerful OCR workflows.

**For more information on what it means to be a PyTorch ecosystem project, see the [PyTorch Ecosystem Tools page](https://pytorch.org/ecosystem/).**

## About docTR

docTR is an Apache 2.0-licensed project developed and distributed by [Mindee](https://www.mindee.com/) to help developers integrate OCR capabilities into their applications without any prior OCR expertise.

To extract text information quickly and efficiently, docTR uses a two-stage approach:

* First, it performs text **detection** to localize words.
* Then, it performs text **recognition** to identify all the characters in each word.
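
The two-stage flow can be sketched in plain Python to show how the stages connect. This is not docTR’s code: `crop`, `two_stage_ocr`, `fake_detect` and `fake_recognize` are hypothetical names, the image is a toy nested list of pixel rows, and boxes are assumed to be integer pixel coordinates `(xmin, ymin, xmax, ymax)` with exclusive upper bounds.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # rows of pixel values (toy grayscale image)
Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), upper bounds exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image[ymin:ymax]]


def two_stage_ocr(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], Tuple[str, float]],
) -> List[Tuple[Box, str, float]]:
    """Stage 1 localizes words; stage 2 reads each cropped word."""
    results = []
    for box in detect(image):                            # stage 1: detection
        word, confidence = recognize(crop(image, box))   # stage 2: recognition
        results.append((box, word, confidence))
    return results


# Toy usage: a 2x4 "image" containing two one-pixel-wide "words"
def fake_detect(image: Image) -> List[Box]:
    return [(0, 0, 1, 2), (2, 0, 3, 2)]


def fake_recognize(patch: Image) -> Tuple[str, float]:
    return ("A" if patch[0][0] == 1 else "B", 0.9)


toy_image = [[1, 0, 2, 0],
             [1, 0, 2, 0]]
print(two_stage_ocr(toy_image, fake_detect, fake_recognize))
# → [((0, 0, 1, 2), 'A', 0.9), ((2, 0, 3, 2), 'B', 0.9)]
```

Keeping the two stages as separate callables mirrors the design described above: either model can be swapped without touching the other.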

**Detection** and **recognition** are performed by state-of-the-art models written in PyTorch. To learn more about this approach, refer to the [docTR documentation](https://mindee.github.io/doctr/using_doctr/using_models.html).

docTR enhances the user experience in PyTorch projects by providing high-performance OCR capabilities out of the box. Its specially designed models need little or no fine-tuning for common use cases, so developers can integrate advanced document analysis features quickly.

## Local installation

docTR requires Python >= 3.10 and supports Windows, macOS and Linux. If you are using a Mac with an M1 chip, refer to our [README](https://github.com/mindee/doctr?tab=readme-ov-file#installation) for the additional dependencies it needs.

```bash
pip3 install -U pip
pip3 install "python-doctr[torch,viz]"
```

This will install docTR along with the latest version of PyTorch.
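
After installing, a quick sanity check can confirm that the interpreter meets the Python >= 3.10 requirement and that the `doctr` and `torch` packages can be located. `check_environment` is a hypothetical helper built only on the standard library (`importlib.util.find_spec`), not part of docTR; it checks that the modules can be found, not that they import cleanly.

```python
import importlib.util
import sys


def check_environment(required=("doctr", "torch"), min_python=(3, 10)):
    """Report whether the interpreter and the given top-level packages look usable."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in required:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(check_environment())
```

Every value in the returned dictionary should be `True` before moving on to the examples below.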

```
Note: docTR also provides Docker images for easy deployment, for example as part of a Kubernetes cluster.
```

## Text recognition

Now, let’s try docTR’s text recognition on this sample:

![OCR sample](/assets/images/doctr-joins-pytorch-ecosystem/fg2.png){:style="width:100%;display: block;max-width:300px; margin-left:auto; margin-right:auto;"}

The recognition model expects an image containing a single word and outputs the predicted word along with a confidence score. You can use the following snippet to test docTR’s recognition capabilities:

```python
from doctr.io import DocumentFile
from doctr.models import recognition_predictor

# Load the word image; DocumentFile returns a list of pages as NumPy arrays
doc = DocumentFile.from_images("/path/to/image")

# Load the OCR model
# This will download pre-trained models hosted by Mindee
model = recognition_predictor(pretrained=True)

# Returns one (word, confidence) tuple per input image
result = model(doc)
print(result)
```

Here, the most important line of code is `model = recognition_predictor(pretrained=True)`. It loads the default text recognition model, `crnn_vgg16_bn`, but you can select other models through the `arch` parameter. See the list of [available architectures](https://mindee.github.io/doctr/using_doctr/using_models.html).

When run on the sample, the recognition predictor returns the following: `[('MAGAZINE', 0.9872216582298279)]`
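
Because each prediction is a `(word, confidence)` pair, downstream code can discard uncertain reads. This is a minimal sketch assuming the list-of-tuples format shown above; `keep_confident` is a hypothetical helper, and the 0.8 threshold and the second, low-confidence entry are illustrative values, not docTR defaults or real output.

```python
from typing import List, Tuple


def keep_confident(predictions: List[Tuple[str, float]], threshold: float = 0.5) -> List[str]:
    """Return the words whose confidence score is at least `threshold`."""
    return [word for word, confidence in predictions if confidence >= threshold]


# First entry taken from the sample output above; the second one is made up
predictions = [("MAGAZINE", 0.9872216582298279), ("M4GAZ1NE", 0.31)]
print(keep_confident(predictions, threshold=0.8))  # → ['MAGAZINE']
```

The right threshold depends on your documents and on how costly a wrong read is compared to a missing one.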

```
Note: docTR’s DocumentFile object provides an easy way to load and manipulate PDFs or images.
```

## Text detection

The last example was a crop of a single word. What about an image with several words on it, like this one?

![photo of magazines](/assets/images/doctr-joins-pytorch-ecosystem/fg3.jpg){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

For such images, a text detection model runs before text recognition and outputs a segmentation map representing the location of the text. Text recognition is then applied to every detected patch.
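
To make the idea of a segmentation map concrete, here is a deliberately simplified sketch of turning a per-pixel text-probability map into word boxes: threshold the map, then take the bounding box of each 4-connected region. docTR’s detection models use their own, more elaborate post-processing; `boxes_from_probability_map`, the toy map and the 0.5 threshold are hypothetical and only illustrate the general principle.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), inclusive pixel indices


def boxes_from_probability_map(prob_map: List[List[float]], threshold: float = 0.5) -> List[Box]:
    """Binarize a text-probability map and return one box per 4-connected region."""
    height, width = len(prob_map), len(prob_map[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or prob_map[y][x] < threshold:
                continue
            # Breadth-first search over neighbouring "text" pixels
            queue = deque([(y, x)])
            seen[y][x] = True
            xmin, ymin, xmax, ymax = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                xmin, xmax = min(xmin, cx), max(xmax, cx)
                ymin, ymax = min(ymin, cy), max(ymax, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    in_bounds = 0 <= ny < height and 0 <= nx < width
                    if in_bounds and not seen[ny][nx] and prob_map[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((xmin, ymin, xmax, ymax))
    return boxes


prob_map = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.7, 0.9, 0.2, 0.0, 0.6],
    [0.0, 0.1, 0.0, 0.0, 0.7],
]
print(boxes_from_probability_map(prob_map))  # → [(0, 0, 1, 1), (4, 1, 4, 2)]
```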

Below is a snippet to run only the detection part:

```python
from matplotlib import pyplot as plt

from doctr.io import DocumentFile
from doctr.models import detection_predictor
from doctr.utils.geometry import detach_scores
from doctr.utils.visualization import draw_boxes

doc = DocumentFile.from_images("/path/to/image")
# Load the default pre-trained detection model
model = detection_predictor(pretrained=True)

# One dict per page, mapping each class name (here "words") to its boxes
result = model(doc)

# Split the confidence scores off the boxes, then draw the first page's boxes
boxes, _ = detach_scores([result[0]["words"]])
draw_boxes(boxes[0], doc[0])
plt.axis("off")
plt.show()
```

Running it on the full sample yields the following:

![photo of magazines](/assets/images/doctr-joins-pytorch-ecosystem/fg4.png){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

As with text recognition, `detection_predictor` loads a default model (`fast_base` here). You can load another one by passing it through the `arch` parameter.
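
The detected boxes are expressed relative to the page size. Assuming each straight-page row of `result[0]["words"]` is `(xmin, ymin, xmax, ymax, score)` with coordinates in the 0 to 1 range (check this against your docTR version), converting them to pixel coordinates only needs the page height and width. `to_pixel_boxes` is a hypothetical helper that works on plain Python sequences, so it runs without docTR; a NumPy array of rows can be passed the same way.

```python
from typing import List, Sequence, Tuple


def to_pixel_boxes(
    rows: Sequence[Sequence[float]], height: int, width: int
) -> List[Tuple[int, int, int, int, float]]:
    """Scale relative (xmin, ymin, xmax, ymax, score) rows to integer pixel boxes."""
    pixel_boxes = []
    for xmin, ymin, xmax, ymax, score in rows:
        pixel_boxes.append((
            int(round(xmin * width)),   # x values scale with the page width
            int(round(ymin * height)),  # y values scale with the page height
            int(round(xmax * width)),
            int(round(ymax * height)),
            float(score),
        ))
    return pixel_boxes


# Made-up row for a page 600 pixels high and 400 pixels wide
rows = [(0.25, 0.5, 0.75, 0.6, 0.93)]
print(to_pixel_boxes(rows, height=600, width=400))  # → [(100, 300, 300, 360, 0.93)]
```

With a loaded page, `doc[0].shape[:2]` gives the `(height, width)` pair to pass in.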

## The full implementation

Now, let’s plug both components into the same pipeline.

Conveniently, docTR provides a wrapper that does exactly that for us:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images("/path/to/image")

# Chain detection and recognition; allow rotated (non-straight) text
model = ocr_predictor(pretrained=True, assume_straight_pages=False)

result = model(doc)
# Interactive matplotlib view of the detected words
result.show()
```

![photo of magazines](/assets/images/doctr-joins-pytorch-ecosystem/fg5.png){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

The last line should open a matplotlib window showing the detected patches. Hovering the mouse over a patch displays its content.
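
For programmatic use, the same result can be turned into a nested dictionary with `result.export()`, which docTR documents as a hierarchy of pages, blocks, lines and words, where each word carries a `value`, a `confidence` and a `geometry`. The sketch below assumes that layout. Because `assume_straight_pages=False` was passed, it also assumes a word geometry may be a four-point polygon rather than a two-point box, and reduces either form to an axis-aligned box. `to_axis_aligned`, `collect_words` and the hand-written `sample` dictionary are hypothetical, so verify the keys against the export of your own docTR version.

```python
from typing import Dict, List, Sequence, Tuple

AxisBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), relative coords


def to_axis_aligned(geometry: Sequence[Sequence[float]]) -> AxisBox:
    """Collapse a 2-point box or a 4-point polygon into (xmin, ymin, xmax, ymax)."""
    xs = [point[0] for point in geometry]
    ys = [point[1] for point in geometry]
    return min(xs), min(ys), max(xs), max(ys)


def collect_words(exported: Dict) -> List[Tuple[str, float, AxisBox]]:
    """Flatten pages -> blocks -> lines -> words into (value, confidence, box) tuples."""
    words = []
    for page in exported["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    box = to_axis_aligned(word["geometry"])
                    words.append((word["value"], word["confidence"], box))
    return words


# Hand-written dictionary mimicking the assumed export() layout (values are made up)
sample = {"pages": [{"blocks": [{"lines": [{"words": [
    {"value": "MAGAZINE", "confidence": 0.98,
     "geometry": [(0.1, 0.2), (0.4, 0.18), (0.41, 0.3), (0.11, 0.32)]},
]}]}]}]}
print(collect_words(sample))  # → [('MAGAZINE', 0.98, (0.1, 0.18, 0.41, 0.32))]
```

With a real result, the equivalent call would be `collect_words(result.export())`.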

You can also do more with this output, such as reconstituting a synthetic document like so:

```python
import matplotlib.pyplot as plt

# Render the recognized words onto blank pages (one NumPy image per page)
synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0])
plt.axis("off")
plt.show()
```

![black text on white](/assets/images/doctr-joins-pytorch-ecosystem/fg6.png){:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}

The pipeline is highly customizable: you can change the behavior of the detection or recognition models by passing arguments to `ocr_predictor`. Refer to the [documentation](https://mindee.github.io/doctr/using_doctr/using_models.html) to learn more.

## Conclusion

We’re excited to welcome docTR into the PyTorch Ecosystem, where it integrates seamlessly with PyTorch pipelines to deliver state-of-the-art OCR capabilities out of the box.

By empowering developers to quickly extract text from images or PDFs using familiar tooling, docTR simplifies complex document analysis tasks and enhances the overall PyTorch experience.

We invite you to explore the [docTR GitHub repository](https://github.com/mindee/doctr), join the [docTR community on Slack](https://slack.mindee.com/), and reach out at [email protected] for inquiries or collaboration opportunities.

Together, we can continue to push the boundaries of document understanding and develop even more powerful, accessible tools for everyone in the PyTorch community.