mostlyserious/craft-text-extractor

Text Extractor

A tool to extract text from documents and insert it into Craft CMS Asset Elements.

Requirements

This plugin requires Craft CMS 5.0.0 or later, and PHP 8.2 or later.

Features

  • Supports PDF (.pdf) and MS Word (.docx) files
  • Extracts text on Asset creation and when Asset files are replaced
  • Includes an Action to extract text from the Assets index view

Configuration

Extracted document text is inserted into the custom field whose handle is configured for the plugin. The default field handle is body.

You can customize the handle by adding a plugin config file.

<?php

/* @note config/text-extractor.php */

return [
    'fieldHandle' => 'myCustomHandle'
];

The field with this handle must be a Text field or a CKEditor field.
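
If the handle needs to differ between environments, the same config file could read it from an environment variable. This is only a sketch: `TEXT_EXTRACTOR_FIELD_HANDLE` is a hypothetical variable name chosen for illustration, not something the plugin defines, and `App::env()` is Craft's own helper for reading environment variables.

```php
<?php

/* @note config/text-extractor.php */

use craft\helpers\App;

return [
    // TEXT_EXTRACTOR_FIELD_HANDLE is an assumed name for illustration;
    // fall back to the plugin's default handle when it is not set.
    'fieldHandle' => App::env('TEXT_EXTRACTOR_FIELD_HANDLE') ?: 'body',
];
```

Whichever handle is resolved, the field still has to exist and be one of the supported field types.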

Usage

  • Upload files with a supported extension and enjoy!

Thank you to the following packages:

Future Plans and Other Document Parsers

The PHPWord library (docs) and PHPOffice tools look promising, but were more complex than needed for this project at this time.
