This repository provides a set of tools to help generate OJS native XML files, streamlining the import of large archives into an OJS 3.4 installation.
The primary script in this project, ojs-xml-generator.py, converts article metadata from a standardized CSV file into OJS 3.4-compatible native XML files—one per issue. This is particularly useful for batch imports of legacy content, back issues, or large-scale archive migrations.
We created this tool to help migrate content efficiently and flexibly, and found it especially handy when working with incomplete or varying metadata. It’s designed to be adaptable, letting you include or exclude metadata fields as needed.
```bash
conda env create -f environment.yaml
conda activate ojs-tools
```

```
┌───────────┐    ┌─────────────┐    ┌────────────────┐    ┌────────────────────┐    ┌────────────────┐
│source data├───►│csv processor├───►│intermediate.csv├───►│ojs_xml_processor.py├───►│ojs native xml's│
└───────────┘    └─────────────┘    └────────────────┘    └────────────────────┘    └────────────────┘
```
- The source data can be in any format.
- The CSV processor converts the data into the `intermediate.csv` format.
- `ojs_xml_processor.py` converts `intermediate.csv` into multiple OJS native XML files, one per issue.
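How rows are split into one file per issue is not spelled out here. A minimal sketch of that grouping step, assuming an issue is identified by its `volume`, `year` and `issue` columns (the generator may key issues differently), could look like this; the output file names are hypothetical.

```python
from collections import defaultdict


def group_by_issue(rows):
    """Group intermediate.csv rows so that each group can become one XML file.

    Assumption: an issue is identified by its (volume, year, issue) values.
    """
    issues = defaultdict(list)
    for row in rows:
        issues[(row["volume"], row["year"], row["issue"])].append(row)
    return dict(issues)


rows = [
    {"id": "1", "volume": "45", "year": "2024", "issue": "3"},
    {"id": "2", "volume": "12", "year": "2024", "issue": "1"},
    {"id": "3", "volume": "45", "year": "2024", "issue": "3"},
]
for (volume, year, issue), articles in group_by_issue(rows).items():
    # Hypothetical naming scheme for the per-issue output file.
    print(f"vol{volume}_{year}_no{issue}.xml:", [a["id"] for a in articles])
```

Keying on all three columns keeps, for example, issue 3 of two different volumes apart.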
A custom CSV processor is needed for each archive. Its output must be validated with `output_csv_validator.py`.
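As a starting point for a new archive, the sketch below maps records from a hypothetical source export (with `id`, `title`, an `author` string written as "Family, Given", and a list of `keywords`) onto the `intermediate.csv` columns and writes them semicolon-separated. The source layout and `map_record` are illustrative assumptions; a real processor would fill every required column and then be checked with `output_csv_validator.py`.

```python
import csv
import io

# Column order of intermediate.csv, taken from the example header in this README
# (three author slots, indexed 0-2).
BASE_FIELDS = (
    "id;title;publication;abstract;file;publication_date;volume;year;issue;"
    "page_number;section_title;section_policy;section_reference;doi;keywords"
).split(";")
AUTHOR_FIELDS = ["author_given_name", "author_family_name", "author_affiliation",
                 "author_email", "author_country"]
FIELDS = BASE_FIELDS + [f"{name}_{i}" for i in range(3) for name in AUTHOR_FIELDS]


def map_record(src):
    """Map one record from a hypothetical source export to intermediate columns."""
    family, _, given = src["author"].partition(", ")
    return {
        "id": src["id"],
        "title": src["title"],
        "keywords": "[;sep;]".join(src["keywords"]),
        "author_given_name_0": given,
        "author_family_name_0": family,
    }


def write_intermediate(records, stream):
    """Write mapped records as semicolon-separated CSV; unset columns stay empty."""
    # For a real file, open it with newline="" and encoding="utf-8".
    writer = csv.DictWriter(stream, fieldnames=FIELDS, delimiter=";", restval="")
    writer.writeheader()
    for record in records:
        writer.writerow(map_record(record))


source = [{"id": "1", "title": "Example title", "author": "Johnson, Sarah",
           "keywords": ["machine learning", "healthcare"]}]
buffer = io.StringIO()
write_intermediate(source, buffer)
print(buffer.getvalue())
```

Because the `[;sep;]` token itself contains semicolons, the keywords field has to be quoted; `csv.DictWriter` does that automatically.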
`tvho_csv_processor.py` is a custom CSV processor for the TVHO project. It generates the input CSV for `ojs-xml-generator.py`.
```bash
python tvho_csv_processor.py --input_csv /path/to/input.csv --output_csv /path/to/output.csv --files_path /path/to/documents
```

`--files_path` points to a directory containing the files listed in the input CSV.
⚠️ Opening the output in spreadsheet tools (e.g., Excel) might alter its contents and cause errors during XML generation.
`output_csv_validator.py` checks that the CSV output conforms to the required schema.
```bash
python output_csv_validator.py --csv /path/to/data.csv
```

| Field | Description | Required? |
|---|---|---|
| id | Numeric ID | Yes |
| title | Article title | Yes |
| publication | Issue title (if applicable) | Yes |
| abstract | Article abstract | Yes |
| file | Full path or Base64-encoded content of the file | Yes |
| publication_date | Publication date in YYYY-MM-DD format | Yes |
| volume | Volume number | Yes |
| year | Year of publication | Yes |
| issue | Issue number (as a string) | Yes |
| page_number | Page numbers | Yes |
| section_title | Title of the section | Yes |
| section_policy | Section policy (internal use) | Yes |
| section_reference | Short section code (internal use) | Yes |
| doi | DOI (if available) | No |
| keywords | Keywords, separated by the literal token `[;sep;]` | No |
| author_given_name_x | Author first name (`x` is the author index, starting at 0) | Yes |
| author_family_name_x | Author last name (`x` is the author index, starting at 0) | Yes |
| author_affiliation_x | Author affiliation | No |
| author_email_x | Author email | No |
| author_country_x | Author country code (ISO 3166) | No |
Note: by default, the script expects a semicolon-separated CSV. The delimiter can be changed on line 302 of the script if needed.
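The checks implied by the schema can be approximated as follows. This is not the code of `output_csv_validator.py`; it is a hypothetical sketch that treats the fields marked `Yes` (with author index 0) as required, checks that `id` is numeric and that `publication_date` is a real YYYY-MM-DD date, and takes the delimiter as a parameter.

```python
import csv
import io
import re
from datetime import date

# Fields marked "Yes" in the schema table; for the numbered author columns,
# this sketch only requires the first author (index 0).
REQUIRED = ["id", "title", "publication", "abstract", "file", "publication_date",
            "volume", "year", "issue", "page_number", "section_title",
            "section_policy", "section_reference",
            "author_given_name_0", "author_family_name_0"]


def is_iso_date(value):
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_rows(stream, delimiter=";"):
    """Return a list of problems found in an intermediate CSV (empty if none)."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for n, row in enumerate(reader, start=1):  # n counts data rows, not file lines
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"row {n}: empty {col}")
        if not (row.get("id") or "").isdigit():
            problems.append(f"row {n}: id is not numeric")
        if not is_iso_date(row.get("publication_date") or ""):
            problems.append(f"row {n}: publication_date is not a valid YYYY-MM-DD date")
    return problems


sample = (
    "id;title;publication;abstract;file;publication_date;volume;year;issue;"
    "page_number;section_title;section_policy;section_reference;"
    "author_given_name_0;author_family_name_0\n"
    '1;"A title";"A journal";"An abstract";"./a.pdf";"2024-13-01";45;2024;3;'
    '"1-10";"Articles";"peer-reviewed";"RA";"Sarah";"Johnson"\n'
)
print(validate_rows(io.StringIO(sample)))
# ['row 1: publication_date is not a valid YYYY-MM-DD date']
```

Rows are counted rather than file lines because quoted abstracts may span several lines.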
Example:

```csv
id;title;publication;abstract;file;publication_date;volume;year;issue;page_number;section_title;section_policy;section_reference;doi;keywords;author_given_name_0;author_family_name_0;author_affiliation_0;author_email_0;author_country_0;author_given_name_1;author_family_name_1;author_affiliation_1;author_email_1;author_country_1;author_given_name_2;author_family_name_2;author_affiliation_2;author_email_2;author_country_2
1;"Machine Learning Applications in Healthcare Diagnostics";"Journal of Medical Informatics";"This study explores the implementation of machine learning algorithms for early disease detection in clinical settings. Our analysis shows a 15% improvement in diagnostic accuracy compared to traditional methods.";"./articles/ml_healthcare_2024.pdf";"2024-03-15";45;2024;3;"123-145";"Research Articles";"peer-reviewed";"RA";"10.1016/j.jmedinf.2024.03.015";"machine learning[;sep;]healthcare[;sep;]diagnostics[;sep;]artificial intelligence";"Sarah";"Johnson";"Stanford University Medical Center";"[email protected]";"US";"Michael";"Chen";"MIT Computer Science Lab";"[email protected]";"US";"";"";"";"";""
2;"Climate Change Impact on Coastal Ecosystems";"Environmental Science Quarterly";"A comprehensive analysis of temperature and sea level changes affecting marine biodiversity along the Pacific coast over the past three decades.";"./articles/climate_coastal_2024.pdf";"2024-01-22";12;2024;1;"67-89";"Environmental Studies";"open-access";"ES";"10.1007/s10661-024-12345";"climate change[;sep;]marine biology[;sep;]ecosystem[;sep;]biodiversity[;sep;]coastal";"Maria";"Rodriguez";"University of California San Diego";"[email protected]";"US";"";"";"";"";"";"";"";"";"";""
```
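Reading a row back requires splitting `keywords` on the `[;sep;]` token and gathering the numbered author columns. A minimal sketch, using a trimmed-down version of the second example row; `parse_keywords` and `parse_authors` are hypothetical helper names, not functions from this repository.

```python
import csv
import io


def parse_keywords(value):
    """Split the keywords column on the literal [;sep;] token, dropping blanks."""
    return [kw.strip() for kw in value.split("[;sep;]") if kw.strip()]


def parse_authors(row):
    """Collect numbered author columns (index 0, 1, ...) into dicts.

    Author slots whose given and family names are both empty are skipped.
    """
    authors = []
    index = 0
    while f"author_given_name_{index}" in row:
        given = (row.get(f"author_given_name_{index}") or "").strip()
        family = (row.get(f"author_family_name_{index}") or "").strip()
        if given or family:
            authors.append({
                "given": given,
                "family": family,
                "affiliation": row.get(f"author_affiliation_{index}") or "",
                "email": row.get(f"author_email_{index}") or "",
                "country": row.get(f"author_country_{index}") or "",
            })
        index += 1
    return authors


# Trimmed-down version of the second example row.
sample = (
    "id;keywords;author_given_name_0;author_family_name_0;"
    "author_given_name_1;author_family_name_1\n"
    '2;"climate change[;sep;]marine biology";"Maria";"Rodriguez";"";""\n'
)
row = next(csv.DictReader(io.StringIO(sample), delimiter=";"))
print(parse_keywords(row["keywords"]))            # ['climate change', 'marine biology']
print([a["family"] for a in parse_authors(row)])  # ['Rodriguez']
```

The quotes around the keywords field are what stop the semicolons inside `[;sep;]` from being read as column delimiters.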
`ojs-xml-generator.py` generates self-contained XML files for the OJS NativeImportExportPlugin.
```bash
python ojs-xml-generator.py --csv_file /path/to/data.csv --output_path /path/to/output/folder --journal_name "Journal Full Name"
```

| Parameter | Description | Default |
|---|---|---|
| --author_group | OJS user group (localized: Author / Auteur) | Author |
| --submission_file_genre | File genre (localized: Article Text / Artikeltekst) | Article Text |
| --locale | Locale used in XML (en, nl, etc.) | en |
| --file_input | Input mode for files: file_path or base64 | file_path |
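With `--file_input base64`, the `file` column carries the file content instead of a path. A sketch for converting path-based rows, assuming the column takes a plain Base64 string with no `data:` prefix; `to_base64_rows` and its `read_bytes` hook are hypothetical (the hook only exists so the example can run without real files).

```python
import base64
from pathlib import Path


def encode_bytes(data):
    """Base64-encode raw file bytes into an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def to_base64_rows(rows, read_bytes=lambda p: Path(p).read_bytes()):
    """Yield copies of rows with the `file` path replaced by Base64 content.

    `read_bytes` defaults to reading from disk; it can be swapped out for tests.
    """
    for row in rows:
        converted = dict(row)  # leave the caller's rows unchanged
        converted["file"] = encode_bytes(read_bytes(row["file"]))
        yield converted


# In-memory stand-in for a PDF so the example runs without real files.
fake_files = {"./articles/a.pdf": b"%PDF-1.4"}
rows = [{"id": "1", "file": "./articles/a.pdf"}]
print(list(to_base64_rows(rows, read_bytes=fake_files.__getitem__)))
# [{'id': '1', 'file': 'JVBERi0xLjQ='}]
```

Base64 inflates the data by about a third, so CSVs in this mode grow large quickly.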
Example with Dutch localization and Base64 file input:

```bash
python ojs-xml-generator.py --csv_file /path/to/data.csv --output_path /path/to/output/folder --journal_name "Journal Name" --author_group Auteur --submission_file_genre Artikeltekst --locale nl --file_input base64
```

`ojs_import.sh` imports the generated XML files into OJS.
📍 Place the script in the root directory of your OJS installation.
```bash
./ojs_import.sh /path/to/xmls journal_path
```

- `/path/to/xmls` contains the generated OJS XML files.
- `journal_path` is the OJS-configured journal path.
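The contents of `ojs_import.sh` are not shown here. Assuming it wraps OJS's `tools/importExport.php` command-line tool, an equivalent Python sketch runs one import per XML file from the OJS root directory. The argument list is an assumption: depending on the OJS version, an administrator user name may have to be appended (passed here via `extra_args`), so check `php tools/importExport.php NativeImportExportPlugin usage` before relying on it.

```python
import subprocess
from pathlib import Path

# Assumed base command; confirm the expected arguments for your OJS version with
# `php tools/importExport.php NativeImportExportPlugin usage`.
BASE_COMMAND = ["php", "tools/importExport.php", "NativeImportExportPlugin", "import"]


def build_import_commands(xml_files, journal_path, extra_args=()):
    """Return one import command (as an argument list) per XML file, sorted by name."""
    return [BASE_COMMAND + [str(f), journal_path, *extra_args]
            for f in sorted(xml_files, key=str)]


def import_all(xml_dir, journal_path, extra_args=()):
    """Run the imports one at a time from the OJS root; stop at the first failure."""
    files = Path(xml_dir).glob("*.xml")
    for cmd in build_import_commands(files, journal_path, extra_args):
        print("Importing", cmd[len(BASE_COMMAND)])
        subprocess.run(cmd, check=True)


# "myjournal" and "admin" are placeholder values.
cmds = build_import_commands(["issue_2.xml", "issue_1.xml"], "myjournal", ["admin"])
print(cmds[0])
```

Files are sorted as plain strings, so zero-padded file names keep issues in numeric order (`issue_10.xml` would otherwise sort before `issue_2.xml`).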