Releases: pdfminer/pdfminer.six
20250506
20250416
Fixed
- TypeError when parsing font width with indirect object references (#1098)
- ValueError when loading xref with invalid position or generation numbers that cannot be parsed as int (#1099)
- Safely converting PDF stack objects to float or int in PDFInterpreter (#1100)
- TypeError when parsing font bbox with incorrect values (#1103)
- ValueError on incorrect stream lengths for ASCII85 data (#1112)
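Several of the fixes above come down to converting untrusted PDF values to numbers without raising. A minimal sketch of that idea follows; the helper names `to_float_or_none` and `to_int_or_none` are hypothetical and are not taken from the pdfminer.six source.

```python
from typing import Optional


def to_float_or_none(value: object) -> Optional[float]:
    """Return value as a float, or None if it cannot be converted.

    Hypothetical helper for illustration; not the library's API.
    """
    try:
        return float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. None or an unresolved reference)
        # ValueError: a string that is not a number
        return None


def to_int_or_none(value: object) -> Optional[int]:
    """Return value as an int, or None if it cannot be converted."""
    try:
        return int(value)  # type: ignore[call-overload]
    except (TypeError, ValueError, OverflowError):
        # OverflowError: int(float("inf"))
        return None
```

A caller can then skip or default a malformed operand instead of aborting the whole page.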
20250327
Added
- Support for Python 3.13 (#1092)
Changed
- Reduce memory overhead on runlength encoding by using lists (#1055)
- Using pyproject.toml instead of setup.py (#1028)
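To illustrate the run-length change above, here is a sketch of a RunLengthDecode routine that collects decoded pieces in a list and joins them once at the end, rather than growing a bytes object piece by piece. The function name `rl_decode` is an assumption for this example, and the code is not copied from the library.

```python
def rl_decode(data: bytes) -> bytes:
    """Decode PDF RunLengthDecode data (illustrative sketch).

    A length byte below 128 means "copy the next length+1 bytes",
    a length byte above 128 means "repeat the next byte 257-length times",
    and 128 marks end of data.
    """
    chunks = []  # decoded pieces, joined once at the end
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:  # end-of-data marker
            break
        if length < 128:
            # literal run: copy the next length+1 bytes
            chunks.append(data[i + 1 : i + 2 + length])
            i += 2 + length
        else:
            # repeated run: one byte repeated 257-length times
            chunks.append(data[i + 1 : i + 2] * (257 - length))
            i += 2
    return b"".join(chunks)
```

For example, `rl_decode(b"\x02abc\xfeX\x80")` copies three literal bytes and then repeats `X` three times.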
Fixed
- TypeError when CID character widths are not parseable as floats (#1001)
- TypeError raised by extract_text method with compressed PDF file (#1029)
- PSBaseParser can't handle tokens split across end of buffer (#1030)
- TypeError when CropBox is an indirect object reference (#1004)
- Remove redundant line to be able to recognize rectangles (#1066)
- Support indirect objects for filters (#1062)
- Make sure bytes is bytes where it counts (#1069)
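The buffer-boundary problem mentioned above (#1030) is a general one for chunked tokenizers: a token may start in one buffer and finish in the next. The sketch below shows one common way to handle it, by carrying the unfinished tail into the next read. It is a simplified whitespace tokenizer with a hypothetical name, `iter_tokens`, and does not reflect how PSBaseParser is implemented.

```python
import io
from typing import BinaryIO, Iterator


def iter_tokens(stream: BinaryIO, bufsize: int = 4096) -> Iterator[bytes]:
    """Yield whitespace-separated tokens, even when split across reads."""
    pending = b""  # tail of the previous buffer that may be an incomplete token
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        data = pending + chunk
        parts = data.split()
        if parts and not data[-1:].isspace():
            # The buffer ended mid-token: hold the last part back.
            pending = parts.pop()
        else:
            pending = b""
        yield from parts
    if pending:
        # End of stream: whatever is left is a complete token.
        yield pending


# Usage: a tiny buffer forces every token to straddle a boundary.
tokens = list(iter_tokens(io.BytesIO(b"/Type /Catalog endobj"), bufsize=4))
```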
Removed
- Support for Python 3.8 (#1091)
20250324
Changed
- Using absolute instead of relative imports (#995)
Deprecated
- The third argument (generation number) to PDFObjRef (#972)
Fixed
- TypeError when corrupt PDF object reference cannot be parsed as int (#972)
- TypeError when corrupt PDF literal cannot be converted to str (#978)
- ValueError when corrupt PDF specifies a negative xref location (#980)
- ValueError when corrupt PDF specifies an invalid mediabox (#987)
- RecursionError when corrupt PDF specifies a recursive /Pages object (#998)
- TypeError when corrupt PDF specifies text-positioning operators with invalid values (#1000)
- Inline image parsing fails when stream data contains "EI\n" (#1008)
- TypeError when parsing object reference as mediabox (#1082)
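A recursive /Pages object, as in #998, makes a naive recursive walk of the page tree loop until Python raises RecursionError. One generic guard is to walk iteratively and remember which object ids have been visited. The sketch below assumes a toy page tree stored as a dict keyed by object id; the name `iter_leaf_pages` and the data layout are illustrative assumptions, not the library's internals.

```python
from typing import Dict, Iterator, Set


def iter_leaf_pages(tree: Dict[int, dict], root_id: int) -> Iterator[int]:
    """Yield ids of /Page leaves in document order, ignoring cycles."""
    visited: Set[int] = set()
    stack = [root_id]
    while stack:
        obj_id = stack.pop()
        if obj_id in visited:
            continue  # already seen: a cycle or a duplicated reference
        visited.add(obj_id)
        node = tree.get(obj_id)
        if node is None:
            continue  # dangling reference
        if node.get("Type") == "Pages":
            # Push kids in reverse so they are popped in document order.
            stack.extend(reversed(node.get("Kids", [])))
        elif node.get("Type") == "Page":
            yield obj_id


# Usage: object 1 lists itself as a kid, and object 3 points back to 1.
corrupt_tree = {
    1: {"Type": "Pages", "Kids": [2, 3, 1]},
    2: {"Type": "Page"},
    3: {"Type": "Pages", "Kids": [4, 1]},
    4: {"Type": "Page"},
}
pages = list(iter_leaf_pages(corrupt_tree, 1))
```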
Removed
- Deprecated tools, functions and classes (#974)
20240706
Added
- Support for zipped JPEGs (#938)
- Fuzzing harnesses for integration into Google's OSS-Fuzz (#949)
- Support for setuptools-git-versioning version 2.0.0 (#957)
Fixed
- Resolving mediabox and pdffont (#834)
- Keywords that aren't terminated by the pattern END_KEYWORD before end-of-stream are parsed (#885)
- ValueError wrong error message when specifying codec for text output (#902)
- Resolve stream filter parameters (#906)
- Reading cmaps with whitespace in the name (#935)
- Optimize apply_png_predictor by using lists (#912)
Changed
20231228
Added
- Output converter for the hOCR format (#651)
- Font name aliases for Arial, Courier New and Times New Roman (#790)
- Documentation on why special characters can sometimes not be extracted (#829)
- Storing Bezier path and dashing style of line in LTCurve (#801)
Fixed
- Broken CI/CD pipeline by setting upper version limit for black, mypy, pip and setuptools (#921)
- flake8 failures (#921)
- ValueError when bmp images with 1 bit channel are decoded (#773)
- ValueError when trying to decrypt empty metadata values (#766)
- Sphinx errors during building of documentation (#760)
- TypeError when getting default width of font (#720)
- Installing typing-extensions on Python 3.6 and 3.7 (#775)
- TypeError in cmapdb.py when parsing null characters (#768)
- Color "convenience operators" now (per spec) also set color space (#794)
- ValueError when extracting images, due to breaking changes in Pillow (#827)
- Small typos and issues in the documentation (#828)
- Ignore non-Unicode cmaps in TrueType fonts (#806)
Changed
- Using non-hardcoded version string and setuptools-git-versioning to enable installation from source and building on Python 3.12 (#922)
Deprecated
- Usage of if __name__ == "__main__" where it was only intended for testing purposes (#756)
Removed
- Support for Python 3.6 and 3.7 because they are end-of-life (#923)
20221105
Added
- Output converter for the hOCR format (#651)
- Font name aliases for Arial, Courier New and Times New Roman (#790)
- Documentation on why special characters can sometimes not be extracted (#829)
Fixed
- ValueError when bmp images with 1 bit channel are decoded (#773)
- ValueError when trying to decrypt empty metadata values (#766)
- Sphinx errors during building of documentation (#760)
- TypeError when getting default width of font (#720)
- Installing typing-extensions on Python 3.6 and 3.7 (#775)
- TypeError in cmapdb.py when parsing null characters (#768)
- Color "convenience operators" now (per spec) also set color space (#794)
- ValueError when extracting images, due to breaking changes in Pillow (#827)
- Small typos and issues in the documentation (#828)
Deprecated
- Usage of if __name__ == "__main__" where it was only intended for testing purposes (#756)
20220524
20220506
Fixed
- IndexError when handling invalid bfrange code map in CMap (#731)
- TypeError in lzw.py when self.table is not set (#732)
- TypeError in encodingdb.py when name of unicode is not str (#733)
- TypeError in HTMLConverter when using a bytes fontname (#734)
Added
- Exporting images without any specific encoding (#737)
Changed
- Using charset-normalizer instead of chardet for less restrictive license (#744)
20220319
Added
- Export type annotations from pypi package per PEP561 (#679)
- Support for identity cmaps (#626)
- Add support for PDF page labels (#680)
- Installation of Pillow as an optional extra dependency (#714)
Fixed
- Handle decompression error due to CRC checksum error (#637)
- Regression (since 20191107) in LTLayoutContainer.group_textboxes that returned some text lines out of order (#659)
- Add handling of JPXDecode filter to enable extraction of images for some pdfs (#645)
- Fix extraction of jbig2 files, which was producing invalid files (#652)
- Crash in pdf2txt.py --boxes-flow=disabled (#682)
- Only use xref fallback if PDFNoValidXRef is raised and fallback is True (#684)
- Ignore empty characters when analyzing layout (#499)
Changed
- Replace warnings.warn with logging.Logger.warning in line with recommended use (#673)
- Switched from nose to pytest, from tox to nox and from Travis CI to GitHub Actions (#704)
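Because warnings now go through the logging module (#673), they can be filtered with standard logging configuration instead of the warnings filter. The sketch below assumes the library's loggers are named after its modules, so that they are children of a logger called "pdfminer"; the helper name `quiet_pdfminer` is hypothetical.

```python
import logging


def quiet_pdfminer(level: int = logging.ERROR) -> None:
    """Raise the threshold for all loggers under the "pdfminer" namespace.

    Assumption: library modules obtain their loggers via
    logging.getLogger(__name__), so they inherit from "pdfminer".
    """
    logging.getLogger("pdfminer").setLevel(level)


quiet_pdfminer()

# A child logger with no level of its own inherits the parent's threshold.
child = logging.getLogger("pdfminer.psparser")
```

Calling `quiet_pdfminer(logging.DEBUG)` instead would surface more detail while debugging a problematic file.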
Removed
- Unnecessary return statements without argument at the end of functions (#707)