Releases: breezedeus/Pix2Text
Releases · breezedeus/Pix2Text
feat: add new models MFD-1.5 & MFR-1.5
Update 2025.07.25: V1.1.4 Released
Major Changes:
- Upgraded the Mathematical Formula Detection (MFD) and Mathematical Formula Recognition (MFR) models to version 1.5. All default configurations, documentation, and examples now use
mfd-1.5andmfr-1.5as the standard models.
主要变更:
- 数学公式检测(MFD)和数学公式识别(MFR)模型升级到 1.5 版本,所有默认配置、文档和示例均以
mfd-1.5和mfr-1.5为标准模型。
feat: enhance image processing functions and add comprehensive tests for transparency handling
fix: update version to 1.1.3.1 for VLM model import fix
Update 2025.04.27: V1.1.3.1 Released
Major Changes:
- Bugfix: Fixed the issue of model import related to VLM.
主要变更:
- 修复了 VLM 相关的模型导入问题。
feat: add VLM support for text and table recognition
Update 2025.04.15: V1.1.3 Released
Major Changes:
- Support for
VlmTableOCRandVlmTextFormulaOCRmodels based on the VLM interface (see LiteLLM documentation) allowing the use of closed-source VLM models. Installation command:pip install pix2text[vlm].- Usage examples can be found in tests/test_vlm.py and tests/test_pix2text.py.
主要变更:
- 支持基于 VLM 接口(具体参考 LiteLLM 文档)的
VlmTableOCR和VlmTextFormulaOCR模型,可使用闭源 VLM 模型。安装命令:pip install pix2text[vlm]。- 使用方式见 tests/test_vlm.py 和 tests/test_pix2text.py。
Bugfix: Fixed issues related to downloading models on Windows
Update 2024.12.17: V1.1.2.3 Released
Major Changes:
- Bugfix: Fixed issues related to downloading models on Windows.
主要变更:
- 修复了在 Windows 环境下下载模型的问题。
bugfix
Update 2024.12.11: V1.1.2.2 Released
Major Changes:
- Bugfix: Resolved issues related to serialization errors when handling ONNX Runtime session options by ensuring that non-serializable configurations are managed appropriately.
主要变更:
- 修复了与 ONNX Runtime session options 相关的序列化错误,通过确保不可序列化的配置信息在适当的管理下进行处理。
bugfix
Update 2024.12.02: V1.1.2.1 Released
Major Changes:
- Fixed an error in
fetch_column_info()@DocYoloLayoutParser, thanks to Bin.
主要变更:
- 修复了 fetch_column_info()@DocYoloLayoutParser 中的错误,感谢网友 Bin 。
Integrated a better layout analysis model DocLayout-YOLO
Update 2024.11.17: V1.1.2 Released
Major Changes:
- A new layout analysis model DocLayout-YOLO has been integrated, improving the accuracy of layout analysis.
- Bug fixes:
- When the text language is set to English only, a dedicated English OCR model is used to avoid including Chinese in the output.
- The processing logic for PNG images has been optimized, enhancing recognition performance.
主要变更:
- 版面分析模型加入 DocLayout-YOLO,提升版面分析的准确性。
- 修复 bugs:
- 在设置文本语言只有英语时,使用专门的英文 OCR 模型,避免输出中包含中文。
- 对 PNG 图片的处理逻辑进行了优化,提升了识别效果。
Bugfixes
Fix: some formats of models require fixed-size input images
Update 2024.06.24: V1.1.1.1 Released
Major Changes:
- Added a new parameter
static_resized_shapewhen initializingMathFormulaDetector, which is used to resize the input image to a fixed size. Some formats of models require fixed-size input images during inference, such asCoreML.
主要变更:
MathFormulaDetector初始化时加入了参数static_resized_shape, 用于把输入图片 resize 为固定大小。某些格式的模型在推理时需要固定大小的输入图片,如CoreML。