Skip to content

A clone of the Apache accumulo-wikisearch repo. This repo is intended to track my progress of making master actually work with 1.5.

Notifications You must be signed in to change notification settings

wjsl/accumulo-wikisearch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

 Apache Accumulo Wikipedia Search Example

 This project contains a sample application for ingesting and querying wikipedia data.
 
  
 Ingest
 ------
 
 	Prerequisites
 	-------------
 	1. Accumulo, Hadoop, and ZooKeeper must be installed and running
 	2. One or more wikipedia dump files (http://dumps.wikimedia.org/backup-index.html) placed in an HDFS directory.
	   You will want to grab the files with the link name of pages-articles.xml.bz2
        3. Though not strictly required, the ingest will go more quickly if the files are decompressed:

            $ bunzip2 < enwiki-*-pages-articles.xml.bz2 | hadoop fs -put - /wikipedia/enwiki-pages-articles.xml

 
 	INSTRUCTIONS
 	------------
	1. Copy the ingest/conf/wikipedia.xml.example to ingest/conf/wikipedia.xml and change it to specify Accumulo information. 
	2. Copy the ingest/lib/wikisearch-*.jar and ingest/lib/protobuf*.jar to $ACCUMULO_HOME/lib/ext
	3. Then run ingest/bin/ingest.sh with one argument (the name of the directory in HDFS where the wikipedia XML 
           files reside) and this will kick off a MapReduce job to ingest the data into Accumulo.
   
 Query
 -----
 
 	Prerequisites
 	-------------
	1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
 	
	NOTE: Ran into a bug (https://issues.jboss.org/browse/RESTEASY-531) that did not allow an EJB3.1 war file. The
	workaround is to separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
	
	INSTRUCTIONS
	-------------
	1. Copy the query/src/main/resources/META-INF/ejb-jar.xml.example file to 
	   query/src/main/resources/META-INF/ejb-jar.xml. Modify to the file to contain the same 
	   information that you put into the wikipedia.xml file from the Ingest step above. 
	2. Re-build the query distribution by running 'mvn package assembly:single' in the top-level directory. 
        3. Untar the resulting file in the $JBOSS_HOME/server/default directory.

              $ cd $JBOSS_HOME/server/default
              $ tar -xzf $ACCUMULO_HOME/src/examples/wikisearch/query/target/wikisearch-query*.tar.gz
 
           This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
	4. Next, copy the wikisearch*.war file in the query-war/target directory to $JBOSS_HOME/server/default/deploy. 
	5. Start JBoss ($JBOSS_HOME/bin/run.sh)
	6. Use the Accumulo shell and give the user permissions for the wikis that you loaded, for example: 
			setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
	7. Copy the following jars to the $ACCUMULO_HOME/lib/ext directory from the $JBOSS_HOME/server/default/lib directory:
	
		commons-lang*.jar
		kryo*.jar
		minlog*.jar
		commons-jexl*.jar
		google-collections*.jar
		
	8. Copy the $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.


	9. At this point you should be able to open a browser and view the page: http://localhost:8080/accumulo-wikisearch/ui/ui.jsp.
	You can issue the queries using this user interface or via the following REST urls: <host>/accumulo-wikisearch/rest/Query/xml,
	<host>/accumulo-wikisearch/rest/Query/html, <host>/accumulo-wikisearch/rest/Query/yaml, or <host>/accumulo-wikisearch/rest/Query/json.
	There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
	into the search box at ui.jsp, and the auths parameter is a comma-separated list of wikis that you want to search (i.e.
	enwiki,frwiki,dewiki, etc. Or you can use all) 

About

A clone of the Apache accumulo-wikisearch repo. This repo is intended to track my progress of making master actually work with 1.5.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 6

Languages