- Install Docker and Docker Compose, see the installation instructions at https://www.docker.com/ and https://docs.docker.com/compose/
- If developing or generating the HTML documentation:
  - Install Python 3.9, see https://www.python.org/downloads/
  - Install pipenv, see the installation instructions at https://pipenv.pypa.io/en/latest/
Run the following in the root folder to start the system:

- `docker-compose up --build` to build the server and then start the Neo4j database and server
- `curl -X POST http://localhost:5000/competencies/initialize` to initialize the database and store (takes around 5 minutes), or go to http://localhost:5000/api/docs and execute the "Initialize" endpoint for competencies
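The initialization call can also be issued from Python using only the standard library; a minimal sketch (the URL matches the endpoint above, error handling is omitted):

```python
import urllib.request

# POST to the initialize endpoint; no request body is needed,
# mirroring `curl -X POST`.
req = urllib.request.Request(
    "http://localhost:5000/competencies/initialize",
    method="POST",
)

# Uncomment once the system is up; initialization takes around 5 minutes.
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```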
If you haven't already, run `pipenv install` and then run `pre-commit install`.

The first time you commit, it will take a little longer to initialize the dependencies, but usually the pre-commit hook only checks the diff, so it should be fast.
Use the following commands for development (in the root folder):
- Create a `.env` file
- Paste (and adjust if necessary) the following content into the `.env` file:
```
DB_URI=bolt://localhost:7687
DATA_FILE=./data/skills_de.csv
COURSES_FILE=./data/courses_preprocessed.csv
MODEL_FILES=./data/MLmodel
NLTK_FILES=./data/lemma_cache_data/nltk_data
MORPHYS_FILE=./data/lemma_cache_data/morphys.csv
STOPWORDS_FILE=./data/lemma_cache_data/stopwords-de.txt
ML_DIR=./ML/
LABELED_COMPETENCIES_FILE=./data/preproccessed_labels.csv
```
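Flask picks up `.env` files automatically when python-dotenv is installed; purely for illustration, a minimal sketch of how such KEY=VALUE lines can be parsed (this is not the loader the project uses):

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines (no quoting or expansion) into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = "DB_URI=bolt://localhost:7687\nDATA_FILE=./data/skills_de.csv"
config = parse_dotenv(sample)
# os.environ.update(config) would make the values visible to the app.
```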
- `docker-compose up db` to only start the Neo4j database
- `pipenv install` to install requirements
- `pipenv run python -m flask run` to start the server (for dev/debug purposes)
- `curl -X POST http://localhost:5000/competencies/initialize` to initialize the database and store (takes around 5 minutes)
After completing the general development prerequisites (make sure the database is running), use the following commands to run the tests:

- If the database is already initialized, run `pipenv run pytest tests/ -k 'not initialize'`
- If the database is not initialized and you want to test the initialization, run `pipenv run pytest tests/ -k 'initialize'`
Use the following Cypher queries to clean up the database:

- `match (a) -[r]-> () delete a, r` to clean up relations
- `match (a) delete a` to clean up nodes
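The cleanup queries can also be run programmatically; a sketch using the official `neo4j` Python driver (assumed to be installed separately; the bolt URI matches `DB_URI` from the `.env` above):

```python
# The two cleanup queries from above.
CLEANUP_QUERIES = [
    "match (a) -[r]-> () delete a, r",  # clean up relations
    "match (a) delete a",               # clean up nodes
]

def cleanup(session):
    """Run the cleanup queries on a Neo4j session-like object."""
    for query in CLEANUP_QUERIES:
        session.run(query)

# Usage with the real driver:
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687")
# with driver.session() as session:
#     cleanup(session)
```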
Use the following commands to reproduce the machine learning model used in the Machine Learning-based Competency Extractor:

- `pipenv run python app/machine_learning.py` to create the spaCy files for training and testing the model
- `cd ML` to navigate the console to the "ML" directory
- `pipenv run python -m spacy train config.cfg --output ./output` to train and test the model with the created spaCy files
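The same steps can be scripted from Python; a sketch that simply mirrors the commands above (actually running it requires the spaCy files produced by the first step):

```python
import subprocess

# Mirrors: pipenv run python -m spacy train config.cfg --output ./output
train_cmd = [
    "pipenv", "run", "python", "-m", "spacy", "train",
    "config.cfg", "--output", "./output",
]

# Uncomment to run from the repository root; cwd="ML" replaces `cd ML`.
# subprocess.run(train_cmd, cwd="ML", check=True)
```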
You can find the documentation of our API at http://localhost:5000/api/docs once you have the system up and running.
A recent version of the HTML Documentation of the Code can be found in the docs/html folder.
However, to manually generate the latest version based on the current source code, execute:
- `pipenv install` to install required dependencies
- `pipenv run make html` to generate the HTML documentation based on the current source code
Afterwards, you will find the generated HTML documentation in the build/html folder. Just drag and drop the index.html file
into a browser to start browsing the documentation.
To use the preprocessing pipeline, use the following code:

```python
from app.preprocessing_utils import PreprocessorGerman

prc_pipeline = PreprocessorGerman()
preprocessed_course_descriptions = prc_pipeline.preprocess_course_descriptions(course_descriptions)
```
If the data folder doesn't show up or cannot be opened, try `sudo chmod a+r data -R`.
To use the trained entity recognition model, use the following code:

```python
import spacy

nlp = spacy.load(path_to_model)  # path to the trained model directory, e.g. ML/output/model-best
doc = nlp(text)                  # text: the string to extract entities from
ents = doc.ents                  # the recognized entities
```