The OCR recognition model expects an image with only one word on it and will output the predicted word with a confidence score. You can use the following snippet to test OCR capabilities from docTR:
@@ -70,7 +70,7 @@ result = model(doc)
 print(result)
 ```

-Here, the most important line of code is `model = recognition_predictor(pretrained=True)`. This will load a default text recognition model,****`crnn_vgg16_bn`, but you can select other models through the `arch` parameter. You can check out the [available architectures](https://mindee.github.io/doctr/using_doctr/using_models.html).
+Here, the most important line of code is `model = recognition_predictor(pretrained=True)`. This will load a default text recognition model,`crnn_vgg16_bn`, but you can select other models through the `arch` parameter. You can check out the [available architectures](https://mindee.github.io/doctr/using_doctr/using_models.html).

 When run on the sample, the recognition predictor retrieves the following data: `[('MAGAZINE', 0.9872216582298279)]`

@@ -86,7 +86,7 @@ Note: using the DocumentFile object docTR provides an easy way to manipulate PDF
 The last example was a crop on a single word. Now, what about an image with several words on it, like this one?


-{:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}
+{:style="width:100%;display: block;max-width:300px; margin-left:auto; margin-right:auto;"}


 A text detection model is used before the text recognition to output a segmentation map representing the location of the text. Following that, the text recognition is applied on every detected patch.
@@ -113,10 +113,10 @@ plt.show()
 Running it on the full sample yields the following:


-{:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}
+{:style="width:100%;display: block;max-width:300px; margin-left:auto; margin-right:auto;"}


-Similarly to the text recognition, `detection_predictor` will load a default model (`fast_base`here). You can also load another one by providing it through the `arch` parameter.
+Similarly to the text recognition, `detection_predictor` will load a default model (`fast_base`here). You can also load another one by providing it through the `arch` parameter.


 ## The full implementation
## The full implementation
@@ -137,7 +137,7 @@ result = model(doc)
 result.show()
 ```

-{:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}
+{:style="width:100%;display: block;max-width:300px; margin-left:auto; margin-right:auto;"}

 The last line should display a matplotlib window which shows the detected patches. Hovering the mouse over them will display their contents.

@@ -152,7 +152,7 @@ plt.axis('off')
 plt.show()
 ```

-{:style="width:100%;display: block;max-width:200px; margin-left:auto; margin-right:auto;"}
+{:style="width:100%;display: block;max-width:300px; margin-left:auto; margin-right:auto;"}


 The pipeline is highly customizable, where you can modify the detection or recognition model behaviors by passing arguments to the `ocr_predictor`. Please refer to the [documentation](https://mindee.github.io/doctr/using_doctr/using_models.html) to learn more about it.
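
As a rough illustration of the customization mentioned in the last context line, the sketch below collects the arguments one might pass to `ocr_predictor`. It assumes that `ocr_predictor` accepts `det_arch`, `reco_arch` and `pretrained` keyword arguments, as the linked docTR documentation describes; the helper names `ocr_predictor_kwargs` and `build_predictor` are hypothetical and not part of docTR or of this commit. The default architecture names are the ones the post itself mentions (`fast_base` and `crnn_vgg16_bn`).

```python
def ocr_predictor_kwargs(det_arch="fast_base", reco_arch="crnn_vgg16_bn", **extra):
    """Collect keyword arguments for docTR's ocr_predictor.

    Hypothetical helper: the defaults mirror the model names quoted in the post.
    """
    kwargs = {"det_arch": det_arch, "reco_arch": reco_arch, "pretrained": True}
    # Any further options are passed through unchanged.
    kwargs.update(extra)
    return kwargs


def build_predictor(**overrides):
    """Build an OCR predictor from the collected arguments (requires docTR)."""
    # Imported lazily so this module can be loaded without docTR installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(**ocr_predictor_kwargs(**overrides))
```

Calling `build_predictor(det_arch="db_resnet50")` would then swap only the detection model, assuming that architecture name is listed in the docTR documentation and the pretrained weights can be downloaded.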