Web Scraping Collection 🕷️

A collection of Python web scraping scripts that extract data from popular websites and save it as CSV. Each script is ready to run and doubles as a worked example of a scraping technique.

🌟 Features

Multiple Website Support: 11 scrapers covering sites such as Flipkart, IMDB, GitHub, and Hacker News
CSV Output: All scrapers export data in CSV format for easy analysis
Easy to Use: Simple Python scripts with clear documentation
Educational: Perfect for learning web scraping techniques
Open Source: Contribute and improve the collection

📋 Available Scrapers

| Scraper | Description | Output |
| --- | --- | --- |
| Flipkart (`1. flipkart.py`) | Extract Nokia smartphone data (name, rating, price, description) | `flipkart.csv` |
| YouTube (`2. youtube.py`) | Scrape YouTube video information | `youtube.csv` |
| YouTube Links (`3. youtube_links.py`) | Extract YouTube video links | `youtube_links.csv` |
| IMDB (`4. imdb.py`) | Get top-rated movies with rankings, ratings, and director info | `imdb.csv` |
| Amazon (`5. Amazon.py`) | Extract Amazon product data | `Amazon.csv` |
| GitHub (`6. Github.py`) | Scrape GitHub repository information | `github.csv` |
| Udemy (`7. Udemy.py`) | Extract Udemy course data | `udemy.csv` |
| College Notices (`8. college_notice_scrapper.py`) | Scrape college notice board | `notice.csv` |
| Sanfoundry (`9. Sanfoundry.py`) | Extract educational content | `sanfoundry.csv` |
| Hacker News (`10. HackNews.py`) | Scrape GitHub-related posts from Hacker News | `hacknews.csv` |
| Weather (`Weather.py`) | Extract weather information | `weather.csv` |

🚀 Quick Start

Prerequisites

pip install requests beautifulsoup4 lxml

Installation

Clone the repository

git clone https://github.com/amolsr/web-scrapping.git
cd web-scrapping

Run any scraper
```
python "1. flipkart.py"
```
Check the output
```
ls output/
```

📊 Sample Output

IMDB Top Movies

Rank,Name,Year,Rating,Link,Director
1,The Shawshank Redemption,1994,9.2,https://www.imdb.com/title/tt0111161/,Frank Darabont
2,The Godfather,1972,9.2,https://www.imdb.com/title/tt0068646/,Francis Ford Coppola

Flipkart Smartphones

Mobile Name,Ratings,Pricing,Description
Nokia 8.1,4.3,"₹15,999",6GB RAM | 128GB Storage
Nokia 6.1 Plus,4.2,"₹12,999",4GB RAM | 64GB Storage
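
The price column contains a comma (`₹15,999`), so it has to be quoted to stay in a single CSV field. The sketch below assumes rows are written with Python's built-in `csv` module; `rows_to_csv` is a hypothetical helper, not a function from this repository, and the row values are copied from the sample above. It shows that `csv.writer` adds the quotes on its own and that reading the text back yields four fields.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header = ["Mobile Name", "Ratings", "Pricing", "Description"]
rows = [["Nokia 8.1", "4.3", "₹15,999", "6GB RAM | 128GB Storage"]]

# The writer quotes only the field that contains the delimiter.
text = rows_to_csv(header, rows)

# Reading the text back gives four fields again, with the price intact.
parsed = list(csv.reader(io.StringIO(text)))
```

Building the line by hand with `",".join(...)` would skip this quoting step and split the price across two columns.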

🛠️ Usage Examples

Basic Usage

# Run a specific scraper
python "4. imdb.py"

# The script will automatically:
# 1. Fetch data from the website
# 2. Parse the HTML content
# 3. Extract relevant information
# 4. Save to CSV file in the output/ directory
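
The four steps above can be sketched roughly as follows. This is an illustration, not the code of any script in this collection: the function names, the CSS class names (`movie`, `title`, `rating`), and the inline sample HTML are all assumptions made for the example. It uses `requests` and `beautifulsoup4` from the prerequisites, and parses a small inline document so it can be tried without touching a live site.

```python
import csv

import requests
from bs4 import BeautifulSoup


def fetch(url):
    """Step 1: download the page and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse(html):
    """Steps 2 and 3: parse the HTML and extract the fields of interest.

    The class names used here are made up for this example; a real page
    needs lookups that match its own markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="movie"):
        title = item.find(class_="title").get_text(strip=True)
        rating = item.find(class_="rating").get_text(strip=True)
        rows.append([title, rating])
    return rows


def save(rows, path, header):
    """Step 4: write the header and rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Offline demonstration on an inline document instead of a live website.
SAMPLE_HTML = """
<ul>
  <li class="movie">
    <span class="title">The Godfather</span>
    <span class="rating">9.2</span>
  </li>
</ul>
"""
sample_rows = parse(SAMPLE_HTML)
```

Against a real site, the same pieces would be chained as `save(parse(fetch(url)), "output/example.csv", ["Name", "Rating"])`, where the URL and output path are placeholders.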

Customization

Each script can be easily modified to:

Change the target URL
Extract different data fields
Modify the output format
Add error handling
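
As one possible way to add error handling, the sketch below wraps the download step in a small retry loop. `fetch_html` and its parameters are hypothetical names chosen for this example, not part of the existing scripts; the `requests` calls it relies on (`Session.get`, `raise_for_status`, `RequestException`) are standard.

```python
import time

import requests


def fetch_html(url, session=None, retries=3, backoff=1.0):
    """Return the page text, retrying on network errors or HTTP error codes.

    Returns None if every attempt fails, so the caller can skip the page
    instead of crashing.
    """
    if session is None:
        session = requests.Session()
    for attempt in range(1, retries + 1):
        try:
            response = session.get(url, timeout=10)
            response.raise_for_status()  # turn 4xx/5xx responses into errors
            return response.text
        except requests.RequestException as error:
            print(f"Attempt {attempt} of {retries} failed: {error}")
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(backoff * attempt)
    return None
```

A caller can then check for `None` before parsing, for example `html = fetch_html(url)` followed by `if html is None: continue` inside a loop over pages.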

📁 Project Structure

web-scrapping/
├── 1. flipkart.py          # Flipkart smartphone scraper
├── 2. youtube.py           # YouTube video scraper
├── 3. youtube_links.py     # YouTube links extractor
├── 4. imdb.py              # IMDB top movies scraper
├── 5. Amazon.py            # Amazon product scraper
├── 6. Github.py            # GitHub repository scraper
├── 7. Udemy.py             # Udemy course scraper
├── 8. college_notice_scrapper.py  # College notices scraper
├── 9. Sanfoundry.py        # Sanfoundry educational content
├── 10. HackNews.py         # Hacker News GitHub posts
├── Weather.py              # Weather information scraper
├── output/                 # Generated CSV files
│   ├── flipkart.csv
│   ├── imdb.csv
│   ├── github.csv
│   └── ...
└── README.md               # This file

🔧 Dependencies

requests: HTTP library for making web requests
beautifulsoup4: HTML/XML parsing library
lxml: XML and HTML processing library
csv: Python standard-library module for CSV export (ships with Python, no installation needed)

🤝 Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create a new scraper or improve existing ones
Add proper documentation and comments
Test your changes
Submit a pull request

Contribution Ideas

Add new website scrapers
Improve error handling
Add data validation
Create web interface
Add support for different output formats (JSON, XML)
Implement rate limiting and respect robots.txt
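
For contributors interested in the JSON output idea, one simple starting point might be to convert the CSV a scraper already produces. The sketch below uses only the standard library; `csv_text_to_json` is a hypothetical helper name, and the sample row is taken from the IMDB sample output with the link and director columns left out for brevity.

```python
import csv
import io
import json


def csv_text_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    # ensure_ascii=False keeps characters such as the rupee sign readable.
    return json.dumps(records, indent=2, ensure_ascii=False)


sample = (
    "Rank,Name,Year,Rating\n"
    "1,The Shawshank Redemption,1994,9.2\n"
)
json_text = csv_text_to_json(sample)
```

Every value comes out as a string because CSV carries no type information; a fuller version would probably convert numeric columns explicitly.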

⚠️ Important Notes

Respect robots.txt: Always check the website's robots.txt file
Rate Limiting: Add delays between requests so you do not overload the target server
Terms of Service: Ensure you comply with each website's terms
Data Usage: Use scraped data responsibly and ethically
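
The first two notes can be sketched with the standard library alone. The helper names (`build_robots_checker`, `polite_urls`), the example robots.txt rules, and the `example.com` URLs below are assumptions for illustration; `urllib.robotparser.RobotFileParser` is the standard-library parser for robots.txt files.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent="*"):
    """Return a function that reports whether a URL may be fetched.

    `robots_txt` is the text of the site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


def polite_urls(urls, allowed, delay_seconds=1.0):
    """Yield only the allowed URLs, pausing between consecutive ones."""
    first = True
    for url in urls:
        if not allowed(url):
            continue  # skip anything robots.txt disallows
        if not first:
            time.sleep(delay_seconds)
        first = False
        yield url


# Made-up rules and URLs, used here so the example runs offline.
EXAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
"""
allowed = build_robots_checker(EXAMPLE_ROBOTS)
urls = [
    "https://example.com/movies",
    "https://example.com/private/admin",
    "https://example.com/top",
]
to_fetch = list(polite_urls(urls, allowed, delay_seconds=0))
```

For a live site, `RobotFileParser` can also download the file itself through its `set_url()` and `read()` methods, and a delay of a second or more between requests is a common starting point.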

📝 License

This project is open source and available under the MIT License.

🙏 Acknowledgments

Beautiful Soup for HTML parsing
Requests library for HTTP handling
All contributors who help improve this collection

📞 Support

If you have questions or need help:

Open an issue on GitHub
Check the code comments for implementation details
Review the output files for expected data format

Happy Scraping! 🕷️✨
