michaelsizonenko/friedmansscraper

Written for Python 2.7.9

Check that Python is in the system PATH variable (type 'echo %PATH%' at the command prompt and press Enter).

config.json parameters:

{
    "input_file": "C:/some_dir/test2.csv",    --- absolute path to the input CSV file (use forward slashes / only, not backslashes \)
    "output_file": "C:/some_dir/result.csv",  --- absolute path to the output CSV file
    "depth": 5,                               --- parsing depth
    "name_index": 1,                          --- index of the CSV column that contains the person's name (counting starts from 0)
    "start_from": 0,                          --- first row to process; the file is processed starting from this row number
    "process_until": 3001,                    --- last row to process
    "continue_processing": false              --- false to discard old results and create a new result.csv; true to continue writing to the same file
}

The --- notes are explanations only. Leave them out of the real config.json, because JSON does not allow comments.
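The way these parameters fit together can be sketched in a few lines of Python. This is only an illustration, not the code in main.py: the helper names (load_config, select_names, output_mode) are hypothetical, and it assumes that process_until is inclusive ("the last row to process") and that the CSV has already been read into a list of rows.

```python
import io
import json

# Keys documented in this README.
REQUIRED_KEYS = ("input_file", "output_file", "depth", "name_index",
                 "start_from", "process_until", "continue_processing")


def load_config(path):
    """Read config.json and check that every documented key is present."""
    with io.open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("config.json is missing keys: " + ", ".join(missing))
    return config


def select_names(rows, config):
    """Return the person names from the configured row range.

    Assumption: process_until is inclusive; the README does not say.
    """
    first = config["start_from"]
    last = config["process_until"]
    return [row[config["name_index"]] for row in rows[first:last + 1]]


def output_mode(config):
    """Append when continuing an earlier run, otherwise overwrite."""
    return "a" if config["continue_processing"] else "w"
```

For example, with "name_index": 1, "start_from": 1 and "process_until": 2, select_names picks the second column of rows 1 and 2 (counting from 0), and output_mode returns "w" while continue_processing is false.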

Other configuration parameters:

spiders/settings.py contains:

    ROBOTSTXT_OBEY = False   -- set to True if the spider should obey robots.txt
    RETRY_ENABLED = False    -- set to True to retry URLs that failed to load
    DOWNLOAD_TIMEOUT = 30    -- download timeout for a single URL
    RETRY_TIMES = 1          -- number of retries when a URL fails to download
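These names match Scrapy's built-in settings, so the fragment below assumes the project is a Scrapy spider (the README does not say so explicitly). Under that assumption, RETRY_TIMES has no effect while RETRY_ENABLED is False, and DOWNLOAD_TIMEOUT is measured in seconds. A hypothetical variant of settings.py that retries each failed URL once might look like this:

```python
# spiders/settings.py (fragment): illustrative values, not the shipped defaults
ROBOTSTXT_OBEY = False   # do not obey robots.txt
RETRY_ENABLED = True     # turn retries on so that RETRY_TIMES takes effect
RETRY_TIMES = 1          # retry a failed URL at most once
DOWNLOAD_TIMEOUT = 30    # give up on a single URL after 30 seconds
```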

All parameters are explained above and are stored in config.json or spiders/settings.py.

To run the spider:

python main.py
