A web scraping project that extracts Premier League statistics (rankings, player data, match results) using Scrapy and Selenium for dynamic content. The collected data is stored in CSV and JSON formats for analysis.
- Comprehensive Data Collection: Scrapes rankings, player stats, and match results
- Dynamic Content Handling: Uses Selenium for JavaScript-rendered content
- Structured Output: Stores data in both CSV and JSON formats
- Production-Ready: Configured with Scrapy downloader middlewares and item pipelines
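
To illustrate the pipeline feature above, here is a minimal sketch of what an item pipeline for scraped league-table rows might look like. The class name `StatsCleaningPipeline`, the field names (`position`, `played`, `points`), and the module path `scraper.pipelines` are assumptions made for illustration, not taken from this repository. It relies only on the long-standing Scrapy convention that a pipeline is a plain class exposing `process_item(self, item, spider)`.

```python
class StatsCleaningPipeline:
    """Hypothetical item pipeline: trims strings and casts numeric fields."""

    # Assumed names of fields that should end up as integers.
    INT_FIELDS = ("position", "played", "points")

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; returning the item
        # hands it on to the next pipeline or to the feed exporter.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        for key in self.INT_FIELDS:
            # Skip fields that are missing or empty instead of failing.
            if item.get(key) not in (None, ""):
                item[key] = int(item[key])
        return item


# In settings.py the pipeline would be enabled roughly like this
# (the dotted path is an assumption about the project layout):
ITEM_PIPELINES = {"scraper.pipelines.StatsCleaningPipeline": 300}
```

The number `300` is the pipeline's order value; Scrapy runs pipelines with lower values first.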
- Web Scraping: Scrapy, Selenium
- Browser Automation: ChromeDriver
- Data Processing: Pandas, NumPy
- Data Formats: JSON, CSV
- Analysis: Jupyter Notebooks
- Clone the repository:
git clone [repository-url] && cd premier-league-scraper
- Install dependencies:
pip install -r requirements.txt
- Install ChromeDriver (for Selenium):
brew install chromedriver # macOS
or download from https://chromedriver.chromium.org/
- Run a spider:
cd scraper && scrapy crawl rankings -O ../data/raw/rank.json
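
The `-O` flag overwrites the output file on each run, whereas lowercase `-o` appends to it. Once the crawl finishes, the resulting JSON array can be inspected with the standard library alone. The sketch below assumes each row carries `team` and `position` fields; these names are hypothetical and should be adjusted to whatever the `rankings` spider actually emits.

```python
import json
from pathlib import Path


def load_rankings(path):
    """Load the JSON array written by the crawl and sort it by position.

    Assumes every row has a `position` field that can be cast to int.
    """
    with Path(path).open(encoding="utf-8") as f:
        rows = json.load(f)
    return sorted(rows, key=lambda row: int(row["position"]))


def format_table(rows):
    """Render rows as simple 'position. team' lines for a quick check."""
    return [f"{int(row['position'])}. {row['team']}" for row in rows]
```

From the repository root this could be used as `format_table(load_rankings("data/raw/rank.json"))`.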