This project uses Scrapy to build a web-scraping bot that collects data on the top Vietnamese YouTube channels across various categories for analysis.
Key features:
- Scrapes Top Vietnamese YouTube Channels: Collects data on leading YouTube channels across multiple categories.
- Automated Data Extraction: Uses Scrapy to efficiently extract channel names, subscriber counts, total views, and more.
- Customizable Categories: Allows modification of scraping targets based on user-defined categories.
- CSV Export: Saves the extracted data into a structured CSV file for further analysis.
- Error Handling & Logging: Implements basic error handling and logging to ensure smooth execution.
- Lightweight & Fast: Optimized for quick and efficient data retrieval.
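Subscriber and view counts are often displayed as abbreviated strings such as "1.2M" or "850K" rather than plain integers. The project does not say how it normalizes these values, so the following is only a minimal sketch, assuming English-style K/M/B suffixes and comma thousands separators. `parse_count` is a hypothetical helper, not part of the repository. It could be applied to scraped fields before CSV export so that the counts sort and aggregate numerically.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical helper, not part of the original project.
# Assumes counts look like "1,234,567", "850K", "1.2M" or "3B".
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_COUNT_RE = re.compile(r"^\s*([\d.,]+)\s*([KMB]?)\s*$", re.IGNORECASE)


def parse_count(text: Optional[str]) -> Optional[int]:
    """Convert a displayed count string to an int, or return None if it can't be parsed."""
    match = _COUNT_RE.match(text or "")
    if match is None:
        return None
    number, suffix = match.groups()
    # Remove thousands separators; Decimal avoids float rounding (e.g. "1.2M").
    number = number.replace(",", "")
    try:
        value = Decimal(number) * _MULTIPLIERS[suffix.upper()]
    except InvalidOperation:
        # Strings like "." or "1.2.3" match the regex but are not valid numbers.
        return None
    return int(value)
```

For example, `parse_count("1.2M")` gives `1200000`, while `parse_count("n/a")` gives `None`, so rows with unreadable counts can be logged or skipped instead of crashing the export.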
Technologies used:
- Python: Main programming language.
- Scrapy: Python framework for web scraping.
- CSV: Output format for the scraped results.
Prerequisites:
- Python 3.x
- Jupyter Notebook or a Python IDE (VS Code, PyCharm, etc.)
- A virtual environment (optional but recommended)
Installation:
- Clone the repository:
  git clone https://github.com/TheVinh-Ha-1710/Youtube-Channels-Scraper.git
  cd Youtube-Channels-Scraper
- Create and activate a virtual environment (optional but recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the dependencies:
  pip install -r requirements.txt
- Run the web-scraping bot:
  cd youtube_scraper
  scrapy crawl youtube_spider -o ../results.csv
  Note: `-o` appends to an existing file. On Scrapy 2.1 or later, use `-O` instead to overwrite results.csv on each run.
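Once the crawl finishes, the exported file can be inspected with Python's standard `csv` module. The sketch below is illustrative only: the column names `channel_name` and `subscribers` are assumptions, so check the header row of your own results.csv and pass the real names. It also assumes the subscriber column holds plain integers (optionally with comma separators) and skips any row that does not, such as a repeated header left behind by appending several runs with `-o`.

```python
import csv


def top_channels(csv_path, n=10, name_col="channel_name", count_col="subscribers"):
    """Return the n channels with the most subscribers as (name, count) pairs.

    The default column names are hypothetical; pass the ones that match
    the header row of your results.csv.
    """
    rows = []
    # Scrapy's CSV exporter writes UTF-8 by default, which keeps Vietnamese names intact.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            raw = (row.get(count_col) or "").replace(",", "").strip()
            if not raw.isdigit():
                # Skip rows whose count is missing or not a plain integer.
                continue
            rows.append((row.get(name_col, ""), int(raw)))
    # Largest subscriber counts first.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]
```

For example, calling `top_channels("results.csv", n=5)` from the repository root would return the five largest channels, provided the column names match the exported file.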
Project structure:
Youtube-Channels-Scraper
├── youtube_scraper    # Main infrastructure of the scraper
├── .gitignore         # Specifies files Git should not track
├── README.md          # Project documentation
├── requirements.txt   # Required Python packages
└── results.csv        # Output CSV file with the scraped data