A powerful and extensible C# console web crawler that recursively visits URLs, supports filtering, and exports discovered links to a file.
## Features

- Recursive link crawling with domain-relative expansion (see the sketch after this list)
- URL filtering (domain, file extensions, keywords to include/exclude)
- Queue-based scheduling with concurrency control
- Export of results to `crawled_links.txt`
- Interactive CLI for user-defined filters
- Console output with colored highlights
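"Domain-relative expansion" means that links such as `/about` or `contact.html` are resolved against the page they were found on before being queued. Below is a minimal sketch of that resolution using `System.Uri`; the helper name `ResolveLink` is hypothetical and not taken from this repository.

```csharp
using System;

public static class LinkResolution
{
    // Resolve a possibly-relative href against the page it appeared on.
    // Returns null for anything that is not an http(s) link (mailto:, javascript:, ...).
    public static string? ResolveLink(string pageUrl, string href)
    {
        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out var baseUri)) return null;
        if (!Uri.TryCreate(baseUri, href, out var absolute)) return null;
        return absolute.Scheme is "http" or "https" ? absolute.ToString() : null;
    }
}

// Example: ResolveLink("https://example.com/blog/", "../about.html")
// returns "https://example.com/about.html".
```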
## Requirements

- .NET 6 SDK or newer
- Internet connection
## Usage

```bash
cd src/WebCrawler
dotnet run
```
- You will be prompted to enter a starting URL.
- Optionally, enter filtering criteria (see the filtering sketch after this list):
  - Allowed domain (e.g., `example.com`)
  - Allowed extensions (`.html`, `.php`, etc.)
  - Keywords to include or exclude in URLs
- The crawler will process the site and save all valid links to `crawled_links.txt`.
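Each of the prompts above corresponds to a simple predicate applied to every discovered URL. The project's actual checks live in `UrlHelper.cs`; the class below is only an illustrative sketch of how such filtering can work, and `UrlFilter` / `IsAllowed` are hypothetical names, not the repository's API.

```csharp
using System;
using System.Linq;

// Hypothetical container for the criteria entered at the interactive prompt.
public sealed class UrlFilter
{
    public string? AllowedDomain { get; init; }                               // e.g. "example.com"
    public string[] AllowedExtensions { get; init; } = Array.Empty<string>(); // e.g. ".html", ".php"
    public string[] IncludeKeywords { get; init; } = Array.Empty<string>();
    public string[] ExcludeKeywords { get; init; } = Array.Empty<string>();

    public bool IsAllowed(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri)) return false;

        // Domain filter: the host must end with the allowed domain (covers subdomains).
        if (!string.IsNullOrEmpty(AllowedDomain) &&
            !uri.Host.EndsWith(AllowedDomain, StringComparison.OrdinalIgnoreCase))
            return false;

        // Extension filter: only applied when the path actually has an extension.
        var ext = System.IO.Path.GetExtension(uri.AbsolutePath);
        if (AllowedExtensions.Length > 0 && ext.Length > 0 &&
            !AllowedExtensions.Contains(ext, StringComparer.OrdinalIgnoreCase))
            return false;

        // Keyword filters on the full URL string.
        if (IncludeKeywords.Length > 0 &&
            !IncludeKeywords.Any(k => url.Contains(k, StringComparison.OrdinalIgnoreCase)))
            return false;
        if (ExcludeKeywords.Any(k => url.Contains(k, StringComparison.OrdinalIgnoreCase)))
            return false;

        return true;
    }
}
```

A crawler would typically evaluate something like `filter.IsAllowed(link)` before enqueueing each link.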
## Customization

You can modify filters or concurrency settings inside:

- `QueueCrawlerService.cs` — crawling logic (a sketch of the queue-based loop follows below)
- `UrlHelper.cs` — filtering logic
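As a rough mental model of queue-based crawling with concurrency control (not the repository's actual `QueueCrawlerService` code; `CrawlAsync`, `maxConcurrency`, and `maxPages` are hypothetical names), a frontier queue is drained in batches while a `SemaphoreSlim` caps the number of simultaneous HTTP requests:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

public static class CrawlerSketch
{
    private static readonly HttpClient Http = new();

    // Hypothetical entry point: breadth-first crawl with a cap on in-flight requests.
    public static async Task<HashSet<string>> CrawlAsync(
        string startUrl, int maxConcurrency = 4, int maxPages = 100)
    {
        var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { startUrl };
        var queue = new ConcurrentQueue<string>();
        queue.Enqueue(startUrl);
        var gate = new SemaphoreSlim(maxConcurrency); // limits concurrent downloads

        while (!queue.IsEmpty && visited.Count < maxPages)
        {
            // Drain the current frontier, then fetch its pages concurrently.
            var batch = new List<string>();
            while (queue.TryDequeue(out var url)) batch.Add(url);

            var tasks = batch.Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    var html = await Http.GetStringAsync(url);
                    // Naive absolute-link extraction; the real crawler also expands
                    // relative links and applies the user's filters before queueing.
                    foreach (Match m in Regex.Matches(html, "href=\"(https?://[^\"]+)\""))
                    {
                        var link = m.Groups[1].Value;
                        lock (visited)
                        {
                            if (visited.Add(link)) queue.Enqueue(link);
                        }
                    }
                }
                catch (Exception) { /* skip pages that fail to download or parse */ }
                finally { gate.Release(); }
            });

            await Task.WhenAll(tasks);
        }

        return visited;
    }
}
```

Raising or lowering the semaphore's initial count is the usual way to trade crawl speed for politeness toward the target site.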
## License

MIT License — use freely, modify boldly.