WebCrawler

A powerful and extensible C# console web crawler that recursively visits URLs, supports filtering, and exports discovered links to a file. It's useful for site mapping, link analysis, and content discovery.

Features

  • Recursive link crawling, with relative links expanded against the starting domain
  • URL filtering (domain, file extensions, keywords to include/exclude)
  • Queue-based scheduling with concurrency control (see the sketch below)
  • Export results to crawled_links.txt
  • Interactive CLI for user-defined filters
  • Console output with colored highlights
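
The crawler's internals are not reproduced in this README, so the snippet below is only a minimal sketch of how queue-based scheduling with a concurrency limit can look in C#. The starting URL, the limit of four simultaneous requests, and every name in the snippet are illustrative; none of them are taken from QueueCrawlerService.cs.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Minimal sketch (.NET 6, top-level statements); not the project's actual implementation.
var http = new HttpClient();
var start = new Uri("https://example.com/");            // placeholder starting URL
var seen = new HashSet<string> { start.AbsoluteUri };
var frontier = new Queue<Uri>();
frontier.Enqueue(start);
var gate = new SemaphoreSlim(4);                        // hypothetical limit: 4 requests in flight

while (frontier.Count > 0)
{
    // Take the current wave of queued URLs and fetch them concurrently.
    var wave = new List<Uri>();
    while (frontier.Count > 0) wave.Add(frontier.Dequeue());

    var pages = await Task.WhenAll(wave.Select(async url =>
    {
        await gate.WaitAsync();
        try { return (url, html: await http.GetStringAsync(url)); }
        catch (HttpRequestException) { return (url, html: ""); }   // skip unreachable pages
        finally { gate.Release(); }
    }));

    // Extract links on the main thread, expand relative URLs, and queue unseen ones.
    foreach (var (url, html) in pages)
        foreach (Match m in Regex.Matches(html, "href=\"([^\"]+)\""))
            if (Uri.TryCreate(url, m.Groups[1].Value, out var link)
                && link.Host == start.Host                          // stay on the starting domain
                && seen.Add(link.AbsoluteUri))
                frontier.Enqueue(link);
}

File.WriteAllLines("crawled_links.txt", seen.OrderBy(u => u));
Console.WriteLine($"Saved {seen.Count} links.");

A real crawler would also add depth or page-count limits and use an HTML parser instead of a regular expression, but the queue-plus-semaphore shape is the part the feature list refers to.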

Getting Started

Prerequisites

  • .NET 6 SDK or newer
  • Internet connection

Build and Run

cd src/WebCrawler
dotnet run
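
If you prefer a standalone build to running from source, the standard .NET CLI publish command also works from the same directory (this is generic dotnet tooling, not a project-specific script):

dotnet publish -c Release

The published binaries typically end up under bin/Release/<framework>/publish.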

Usage

  1. You will be prompted to enter a starting URL.

  2. Optionally, enter filtering criteria (a sketch of how they might be combined follows this list):

    • Allowed domain (e.g., example.com)
    • Allowed extensions (.html, .php, etc.)
    • Keywords to include or exclude in URLs
  3. The crawler will process the site and save all valid links to crawled_links.txt.
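
The README does not spell out how the filtering criteria are combined, so the following is only a rough sketch of such a check in C#; the class name, method name, and parameters are hypothetical and are not the actual API of UrlHelper.cs.

using System;
using System.Linq;

// Hypothetical filter predicate; the real UrlHelper.cs may combine the criteria differently.
static class UrlFilterSketch
{
    public static bool PassesFilters(
        Uri url,
        string allowedDomain,         // e.g. "example.com"; empty string means any domain
        string[] allowedExtensions,   // e.g. ".html", ".php"; empty means any extension
        string[] includeKeywords,     // the URL must contain at least one, if any are given
        string[] excludeKeywords)     // the URL must contain none of these
    {
        string u = url.AbsoluteUri;

        if (allowedDomain.Length > 0 &&
            !url.Host.EndsWith(allowedDomain, StringComparison.OrdinalIgnoreCase))
            return false;

        if (allowedExtensions.Length > 0 &&
            !allowedExtensions.Any(ext => url.AbsolutePath.EndsWith(ext, StringComparison.OrdinalIgnoreCase)))
            return false;

        if (includeKeywords.Length > 0 &&
            !includeKeywords.Any(k => u.Contains(k, StringComparison.OrdinalIgnoreCase)))
            return false;

        return !excludeKeywords.Any(k => u.Contains(k, StringComparison.OrdinalIgnoreCase));
    }
}

For example, with allowed domain example.com, allowed extension .html, and excluded keyword logout, https://example.com/docs/index.html would be kept, while https://example.com/account/logout.html and https://cdn.other.com/logo.png would be skipped.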

Customization

You can modify filters or concurrency settings inside:

  • QueueCrawlerService.cs — crawling logic
  • UrlHelper.cs — filtering logic

Screenshots

[Screenshot: WebCrawler]

License

MIT License — use freely, modify boldly.
