A tool that crawls a site and logs any resources that return a 404. Results are presented as a searchable, todo-style checklist.
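To make the idea concrete, here is a minimal sketch of how crawl results could become a searchable checklist. The record shape (`url`, `status`, `referrer`, `done`) and the function names `toChecklist` and `searchChecklist` are assumptions for illustration; they are not taken from this repo's code.

```javascript
// Hypothetical record shape: each crawl result is assumed to look like
// { url, status, referrer }. The real crawler output may differ.

// Keep only the 404 responses and turn each into an unchecked checklist item.
function toChecklist(results) {
  return results
    .filter((result) => result.status === 404)
    .map((result) => ({ url: result.url, referrer: result.referrer, done: false }));
}

// Case-insensitive substring search over the item URLs.
function searchChecklist(items, query) {
  const needle = query.toLowerCase();
  return items.filter((item) => item.url.toLowerCase().includes(needle));
}

// Made-up sample data, for illustration only.
const sampleResults = [
  { url: "http://localhost:3000/", status: 200, referrer: null },
  { url: "http://localhost:3000/img/logo.png", status: 404, referrer: "http://localhost:3000/" },
  { url: "http://localhost:3000/about", status: 404, referrer: "http://localhost:3000/" },
];

const checklist = toChecklist(sampleResults);
console.log(JSON.stringify(searchChecklist(checklist, "logo"), null, 2));
```

Each item starts with `done: false` so it can be ticked off once the broken resource is fixed.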
- Install Node
- Clone repo: `git clone [email protected]:hudakdidit/site_crawler.git`
- Install dependencies: `npm install`
- Set up the config file: run `mv config-example.json config.json`, then update the `site` and `port` properties as necessary.
- Start by running `npm run crawl` to crawl the site you added in the last step. This creates the JSON 'database' (used as the data for the React front end). Depending on the size of the site, the crawler may take some time, so check your email and get coffee. A progress bar indicates how far along the crawler is.
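Before starting a long crawl, it can help to sanity-check the config values. The sketch below assumes `site` is a non-empty string and `port` is an integer; the helper name `validateConfig` and the sample values are hypothetical, and the actual contents of `config-example.json` may include other keys.

```javascript
// Assumption: config.json holds a "site" string and a numeric "port".
// Any other keys the real config may contain are not checked here.
function validateConfig(config) {
  const errors = [];
  if (typeof config.site !== "string" || config.site.trim() === "") {
    errors.push('"site" must be a non-empty string');
  }
  if (!Number.isInteger(config.port) || config.port < 1 || config.port > 65535) {
    errors.push('"port" must be an integer between 1 and 65535');
  }
  return errors;
}

// Illustrative values only, not taken from config-example.json.
const exampleConfig = JSON.parse('{ "site": "http://localhost", "port": 8080 }');
console.log(validateConfig(exampleConfig)); // prints []
```

An empty array means both properties look usable; otherwise each entry describes one problem to fix.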
TODO
- Start the crawler script: `npm run crawl`
- Start webpack and the Express web server: `npm start`
- Start webpack, the Express web server, and the web crawler: `npm run dev-crawl`
- Start the Express web server: `npm run server`