This is an FCS Endpoint implementation for the (No)SketchEngine. It uses the bonito-open API as its search backend.
It is being developed by the Leipzig Corpora Collection (LCC) and the Saxon Academy of Sciences and Humanities in Leipzig (SAW), and the code is licensed under the MIT license.
This repository should only be regarded as a basis for your own deployments. While templates and example configurations contain LCC-specific URLs, those should only be used for testing and for trying out this code base! If you want to deploy your own FCS endpoint, please check that you have permission to use the specific NoSketchEngine API. You can easily set up your own NoSketchEngine with e.g. ELTE-DH/NoSketch-Engine-Docker.
A partial (No)SketchEngine API adapter lives in d.s.t.w.f.f.noske and can be extracted and used as is. Besides its use in this endpoint, there is a test case that shows how to use it.
Note that there are some basic assumptions about the backend NoSketchEngine searcher.
Those are implementation details and can be seen in the classes d.s.t.w.f.f.NoSkESRUFCSEndpointSearchEngine and d.s.t.w.f.f.query.FCSQLtoNoSkECQLConverter.
- We assume that all corpora are freely accessible and that there are no sub-corpora. The endpoint will dynamically configure itself by listing all available corpora and setting the appropriate metadata.
- The corpus language_id is an ISO 639-3 identifier, e.g. deu.
- We only have a single (required) structure: s, meaning sentence (with optional attributes id/source/date that are not really used at this point).
- We use the following attributes: word (required), lemma, pos (with pos_ud17) and lc (required) / lemma_lc as automatic lower cased variants for word/lemma. lemma and pos are optional attributes.
  - The attributes pos and pos_ud17 are not completely integrated. At the moment, only the pos attribute is checked, which might not be UD17 (as required by FCS).
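To illustrate why both word and lc are required, here is a minimal sketch of how a single search term could be turned into a token expression on one of these attributes. The class TokenExpression and its methods are hypothetical and not part of this repository; the actual converters may escape and map terms differently. The sketch assumes that attribute values are interpreted as regular expressions, so special characters are escaped with a backslash.

```java
import java.util.Locale;

// Hypothetical helper for illustration only, not part of this repository.
class TokenExpression {
    // Assumption: attribute values are regular expressions, so these
    // characters (backslash and double quote first) must be escaped.
    private static final String SPECIAL = "\\\".?*+|()[]{}^$";

    /** Escapes special characters so that the value is matched literally. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a one-token expression on "word" or, when ignoring case, on "lc". */
    static String forTerm(String term, boolean ignoreCase) {
        String attribute = ignoreCase ? "lc" : "word";
        String value = ignoreCase ? term.toLowerCase(Locale.ROOT) : term;
        return "[" + attribute + "=\"" + escape(value) + "\"]";
    }

    public static void main(String[] args) {
        System.out.println(forTerm("Haus", false)); // [word="Haus"]
        System.out.println(forTerm("Haus", true));  // [lc="haus"]
    }
}
```

A corpus without the lc attribute would not be able to answer the second, case-insensitive form of the query in this sketch.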
Adapting the endpoint to your own corpus configuration should not be too complicated.
Dockerfile
Multi-stage Maven build and slim Jetty runtime image.
docker-compose.yml
pom.xml
Java dependencies for use with Maven.
.env.template
Template .env file for Docker deployments.
The following classes live in the de.saw_leipzig.textplus.webservices.fcs.fcs_noske_endpoint namespace.
d.s.t.w.f.f.NoSkESRUFCSConstants
Constants for accessing FCS request parameters and output generation. Can be used to store own constants.
d.s.t.w.f.f.NoSkESRUFCSEndpointSearchEngine
The glue between the FCS and our own search engine. It is the actual implementation that handles SRU/FCS explain and search requests. Here, we load and initialize our FCS endpoint. It will perform searches with our own search engine (here only with static results), and wrap results into the appropriate output (d.s.t.w.f.f.NoSkESRUFCSSearchResultSet).
d.s.t.w.f.f.NoSkESRUFCSSearchResultSet
FCS Data View output generation. Generates the basic HITS and ADVANCED Data Views. Here custom output can be generated from the result wrapper d.s.t.w.f.f.searcher.MyResults.
d.s.t.w.f.f.searcher.MyResults
Lightweight wrapper around own results that allows access to results counts and result items per index and wraps the native result entries with kwic, left and right context as well as some metadata.
d.s.t.w.f.f.query.CQLtoNoSkECQLConverter
Query conversion from simple CQL to (No)SketchEngine CQL (CQP) query.
d.s.t.w.f.f.query.FCSQLtoNoSkECQLConverter
Query conversion from FCS-QL to (No)SketchEngine CQL (CQP) query.
d.s.t.w.f.f.noske.NoSkeAPI
NoSkE Bonito API Client.
- Namespace d.s.t.w.f.f.noske.pojo
  NoSkE Bonito API response wrapper classes.
d.s.t.w.f.f.util.LanguagesISO693
Helper class (from FCS SRU Aggregator) that handles conversion between ISO 639 codes and language names.
- src/main/resources/lang/iso-639-3_20230123.tab
  Resource file for ISO 639 conversion.
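As a rough illustration of the ISO 639 conversion mentioned above, the following sketch looks up language names by their three-letter code from tab-separated rows. It is not the LanguagesISO693 class from this repository; the class Iso639Table is hypothetical, and the column layout (code in the first column, reference name in the seventh, one header row) is an assumption about the .tab resource file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the LanguagesISO693 class from this repository.
class Iso639Table {
    private final Map<String, String> nameByCode = new HashMap<>();

    /**
     * Parses tab-separated rows. Assumes the first line is a header,
     * column 0 holds the ISO 639-3 code and column 6 the language name.
     */
    static Iso639Table parse(List<String> lines) {
        Iso639Table table = new Iso639Table();
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
            if (cols.length > 6 && !cols[0].isEmpty()) {
                table.nameByCode.put(cols[0], cols[6]);
            }
        }
        return table;
    }

    /** Returns the language name for a three-letter code, or null if unknown. */
    String nameFor(String code) {
        return nameByCode.get(code);
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Id\tPart2B\tPart2T\tPart1\tScope\tLanguage_Type\tRef_Name\tComment",
                "deu\tger\tdeu\tde\tI\tL\tGerman\t",
                "eng\teng\teng\ten\tI\tL\tEnglish\t");
        Iso639Table table = parse(sample);
        System.out.println(table.nameFor("deu")); // German
    }
}
```

In a real setup the lines could come from java.nio.file.Files.readAllLines or from a classpath resource stream instead of the in-memory sample used here.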
Only log4j2.xml is relevant if you want to change the logging settings.
endpoint-description.xml
FCS Endpoint Description, like resources, capabilities etc.
This file can be used to pre-configure the endpoint, e.g., to restrict the exposed resources. Otherwise, using the FCS_RESOURCES_FROM_NOSKE parameter, resource information will be queried from the (No)SketchEngine API and all found resources are exposed. The Endpoint Description will be generated programmatically.
jetty-env.xml
Jetty environment variable settings.
sru-server-config.xml
SRU Endpoint Settings.
web.xml
Java Servlet configuration, SRU/FCS endpoint settings.
The configuration parameters (via Java environment variable context) for the endpoint are:
- NOSKE_API_URI: URI; base URI to the (No)SketchEngine Bonito endpoint, required!
- FCS_RESOURCES_FROM_NOSKE: Boolean; whether the (No)SketchEngine /corpora API endpoint should be used to automatically generate the Endpoint Description with the list of resources (corpora). If false, the embedded Endpoint Description file, or the one specified with RESOURCE_INVENTORY_URL ("de.saw_leipzig.textplus.webservices.fcs.fcs_noske_endpoint.resourceInventoryURL"), is used.
- DEFAULT_RESOURCE_PID: String; default resource PID for searches where no x-fcs-context is specified. Take care to include the resource PID prefix, if any, specified in d.s.t.w.f.f.NoSkESRUFCSConstants.
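The interplay of these parameters can be sketched as follows. This is not the configuration code of this repository: the class EndpointConfig is hypothetical, it reads from a plain map instead of the Java environment context, it assumes that an unset FCS_RESOURCES_FROM_NOSKE counts as false, and the URI in the example is only a placeholder.

```java
import java.util.Map;

// Hypothetical sketch, not the configuration code of this repository.
class EndpointConfig {
    final String noskeApiUri;
    final boolean resourcesFromNoske;
    final String defaultResourcePid; // null if not configured

    private EndpointConfig(String noskeApiUri, boolean resourcesFromNoske, String defaultResourcePid) {
        this.noskeApiUri = noskeApiUri;
        this.resourcesFromNoske = resourcesFromNoske;
        this.defaultResourcePid = defaultResourcePid;
    }

    /** Builds the configuration from key/value pairs; NOSKE_API_URI is required. */
    static EndpointConfig from(Map<String, String> env) {
        String uri = env.get("NOSKE_API_URI");
        if (uri == null || uri.isBlank()) {
            throw new IllegalArgumentException("NOSKE_API_URI is required");
        }
        // Assumption: a missing value is treated as false.
        boolean fromNoske = Boolean.parseBoolean(env.get("FCS_RESOURCES_FROM_NOSKE"));
        return new EndpointConfig(uri, fromNoske, env.get("DEFAULT_RESOURCE_PID"));
    }

    /** Describes where the Endpoint Description would come from. */
    String endpointDescriptionSource(String resourceInventoryUrl) {
        if (resourcesFromNoske) {
            return "generated from the (No)SketchEngine corpora list";
        }
        return resourceInventoryUrl != null ? resourceInventoryUrl : "embedded endpoint-description.xml";
    }

    public static void main(String[] args) {
        // Placeholder URI, replace with your own Bonito endpoint.
        EndpointConfig config = from(Map.of(
                "NOSKE_API_URI", "http://localhost/bonito/run.cgi",
                "FCS_RESOURCES_FROM_NOSKE", "true"));
        System.out.println(config.endpointDescriptionSource(null));
    }
}
```

Failing early on a missing NOSKE_API_URI mirrors the "required!" note above; how the actual endpoint reacts to a missing value is not shown here.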
Build fcs.war file for webapp deployment:
mvn [clean] package
Some endpoint/resource configurations are set using environment variables. See jetty-env.xml for details. You can set default values there.
For production use, you can set values in the .env file, which is then loaded by the docker-compose.yml configuration. Take a look at the .env.template file and save a copy as .env with your own configuration.
This SRU/FCS Endpoint project includes both a Dockerfile and a docker-compose.yml configuration.
The Dockerfile can be used to build a simple Jetty image to run the FCS endpoint. It still needs to be configured with port-mappings, environment variables etc. The docker-compose.yml file bundles all those runtime configurations to allow easier deployment. You still need to create an .env file or set the environment variables if you use the generated code as is.
# build the image and label it "fcs-endpoint"
docker build -t fcs-endpoint .
# run the image in the foreground (to see logs and interact with it) with environment variables from .env file
docker run --rm -it --name fcs-endpoint -p 8200:8080 --env-file .env fcs-endpoint
# or run in background with automatic restart
docker run -d --restart=unless-stopped --name fcs-endpoint -p 8200:8080 --env-file .env fcs-endpoint

# build
docker-compose build
# run
docker-compose up [-d]

Uses Jetty 10. See pom.xml --> plugin jetty-maven-plugin.
mvn [package] jetty:run-war

NOTE: jetty:run-war uses the built war file in the target/ folder.
A search request for the term "something" using CQL/BASIC search:
curl '127.0.0.1:8080?operation=searchRetrieve&queryType=cql&query=something&x-indent-response=1'
# or port 8200 if run with docker

Add the default debug setting Attach by Process ID, then start the Jetty server with the following command, and start debugging in VSCode while it waits to attach.
# export configuration values, see section #Configuration
MAVEN_OPTS="-Xdebug -Xnoagent -Djava.compiler=NONE -agentlib:jdwp=transport=dt_socket,server=y,address=5005" mvn jetty:run-war

There are a few basic tests in src/test/java/d.s.t.w.f.f/ with hopefully more to come...
The tests use their own custom log4j2.xml configuration file.