A tool to extract text from documents and insert it into Craft CMS Asset Elements.
This plugin requires Craft CMS 5.0.0 or later, and PHP 8.2 or later.
- Supports PDF (.pdf) and MS Word (.docx) files
- Password-protected PDF files are not supported.
- Extracts text on Asset creation and when Asset files are replaced
- Includes an Action to extract text from the Assets index view.
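As a rough illustration of the supported-format rule above, the check can be sketched in plain PHP. The function name `isSupportedDocument` is hypothetical and is not part of the plugin's API; the plugin's own detection logic may differ. The sketch looks only at the file extension, so it cannot tell whether a PDF is password-protected.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of the plugin: reports whether a filename
// has one of the extensions listed above (.pdf or .docx).
function isSupportedDocument(string $filename): bool
{
    $supported = ['pdf', 'docx'];

    // pathinfo() returns an empty string when the name has no extension.
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    return in_array($extension, $supported, true);
}
```

The comparison is case-insensitive, so a name such as `Notes.DOCX` is treated the same as `notes.docx`, while an older `.doc` file is rejected.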
Extracted document text is saved to a custom field on the Asset, identified by the field handle set in the plugin's configuration. The default field handle is body.
You can customize the handle by adding a plugin config file at config/text-extractor.php:
<?php
/* @note config/text-extractor.php */
return [
    'fieldHandle' => 'myCustomHandle',
];
The field with this handle must be a Text field or a CKEditor field.
- Upload files with a supported extension and enjoy!
The PHPWord library (docs) and PHPOffice tools look promising, but were more complex than this project needs at this time.