Install with requirements:
python3 -m venv venv
source venv/bin/activate
pip3 install -e .Create local config file and change access credentials to Elasticsearch and your Flink cluster:
cp warc_indexer/conf/config.py warc_indexer/conf/local_config.py
# Adjust values in local_config.pySet up indices if they don't exist yet:
chatnoir-index index-setup META_INDEX_NAME DATA_INDEX_NAMEIndex data from WARC S3 bucket:
chatnoir-index index 's3://bucket/warc-glob*.warc.gz' META_INDEX_NAME DATA_INDEX_NAME ID_PREFIXIn addition to configuring them in local_config.py, you can pass any Apache Beam args (such as --flink_master) to the index command as well.
For more information on the main parameters of chatnoir-index or any of its subcommands, run with --help.