TheVinh-Ha-1710/Youtube-Channels-Scraper

Youtube Channels Scraper

Description

This project uses Scrapy to build a web scraper that collects data on top Vietnamese YouTube channels across various categories for later analysis.

Features

  • Scrapes Top Vietnamese YouTube Channels: Collects data on leading YouTube channels across multiple categories.
  • Automated Data Extraction: Uses Scrapy to efficiently extract channel names, subscriber counts, total views, and more.
  • Customizable Categories: Allows modification of scraping targets based on user-defined categories.
  • CSV Export: Saves the extracted data into a structured CSV file for further analysis.
  • Error Handling & Logging: Implements basic error handling and logging to ensure smooth execution.
  • Lightweight & Fast: Optimized for quick and efficient data retrieval.
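
The extraction and CSV export features above can be illustrated with a minimal, standard-library sketch. It is not the project's actual code: the helper names `parse_count` and `rows_to_csv`, the suffix table, and the column names in the usage note are all assumptions made for illustration, and the real spider may receive numbers in a different format.

```python
import csv
import io

# Hypothetical multipliers for abbreviated counts such as "1.2M".
# The pages the real spider visits may format numbers differently.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a display string like '1.2M' or '12,345' to an integer."""
    cleaned = text.strip().replace(",", "").upper()
    if cleaned and cleaned[-1] in _SUFFIXES:
        # Scale the numeric part by the suffix multiplier.
        return round(float(cleaned[:-1]) * _SUFFIXES[cleaned[-1]])
    return int(float(cleaned))


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows_to_csv([{"channel": "Example", "subscribers": parse_count("1.2M")}], ["channel", "subscribers"])` produces a header line followed by one data row. When running the actual scraper, Scrapy's own feed export (the `-o` option) writes the CSV, so a helper like this is only needed for post-processing.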

Technologies Used

  • Python: Main programming language.
  • Scrapy: Python framework for web crawling and scraping.
  • CSV: Output file format for the results.

Installation & Setup

Prerequisites

  • Python 3.x installed
  • Jupyter Notebook or a Python IDE (VS Code, PyCharm, etc.)
  • Virtual environment (optional but recommended)

Setup

  1. Clone the repository:

    git clone https://github.com/TheVinh-Ha-1710/Youtube-Channels-Scraper.git
    cd Youtube-Channels-Scraper
  2. Create and activate a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run the web scraping bot:

    cd youtube_scraper
    scrapy crawl youtube_spider -o ../results.csv
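
Once the crawl in step 4 finishes, a quick sanity check of the output can be useful. The sketch below uses only the standard library and assumes nothing about the column names, which depend on the spider's item fields; `summarize_csv` is a hypothetical helper, not part of the repository.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Iterating consumes the data rows; the header is read on first access.
        row_count = sum(1 for _ in reader)
        return list(reader.fieldnames or []), row_count
```

Calling `summarize_csv("results.csv")` from the repository root returns the header and the number of scraped rows. In recent Scrapy versions `-o` appends to an existing output file, so re-running the crawl may leave duplicate rows; the `-O` option overwrites the file instead.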

Folder Structure

📂 Youtube-Channels-Scraper
 ├── 📂 youtube_scraper         # Scrapy project containing the scraper code
 ├── 📜 .gitignore              # Specifies files Git should ignore
 ├── 📜 README.md               # Project documentation
 ├── 📜 requirements.txt        # Python dependencies
 └── 📜 results.csv             # Scraped results in CSV format

About

Build a web scraping bot with Scrapy.
