This repository contains a scraper and dataset for extracting and publishing the phone directory of University of Kashan staff. It includes tools to parse a saved HTML file and export the extracted data as JSON.
- HTML parsing to extract structured data.
- Export of extracted data in JSON format.
- Modular and adaptable code for similar scraping tasks.
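
As a rough illustration of the parse-and-export flow listed above, here is a minimal sketch. It is not the actual contents of extract.php: the function name `extractRows`, the assumption that the directory is laid out as an HTML table, and the inline sample markup are all hypothetical. It relies on PHP's bundled DOM extension.

```php
<?php
// Hypothetical sketch; the real extract.php may work differently.

/**
 * Return every table row in an HTML string as a list of trimmed cell texts.
 * Assumes the directory is rendered as <tr> rows with <td>/<th> cells.
 */
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world markup is rarely perfectly valid.
    libxml_use_internal_errors(true);
    // The XML prolog hints UTF-8 so non-ASCII text (e.g. Persian names) decodes correctly.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Skip whitespace text nodes; keep only header and data cells.
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['td', 'th'], true)) {
                // Collapse runs of whitespace and trim the cell text.
                $cells[] = trim(preg_replace('/\s+/u', ' ', $cell->textContent));
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Inline sample standing in for demo.html (made-up data).
$html = <<<HTML
<table>
  <tr><th>Name</th><th>Position</th><th>Phone Number</th></tr>
  <tr><td>Example User</td><td>Lecturer</td><td>123456789</td></tr>
</table>
HTML;

$rows = extractRows($html);

// Keep Unicode characters readable rather than \u-escaped in the JSON output.
echo json_encode($rows, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

To adapt the sketch to a real page, the inline sample could be replaced with `file_get_contents('demo.html')` and the encoded string written out with `file_put_contents('output.json', ...)`.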
organization-phone-118
.
├── demo.html # Sample HTML data file.
├── extract.php # Script for extracting data from HTML.
├── output.json # Extracted data in JSON format.
└── load.php # Configuration and utility script.
- PHP: Version 7.4 or higher.
- Web Server: Optional, such as Apache or Nginx.
Clone the repository:
git clone https://github.com/BaseMax/kashan-university-phone-directory.git
cd kashan-university-phone-directory
Place the HTML file to be parsed in the root directory and name it demo.html.
Run the extraction script:
php extract.php
View the output in output.json:
cat output.json
The extracted data is stored in a JSON file with a structure similar to this:
[
["Name", "Position", "Phone Number"],
["Example User", "Lecturer", "123456789"]
]
Contributions are welcome! Please submit issues or pull requests on the GitHub repository.
This project is licensed under the MIT License.
Ensure compliance with local laws and regulations regarding the publication of personal data. Obtain permission if necessary before sharing extracted information.
Data source: 118 Kashan University Directory. https://118.kashanu.ac.ir/
Developed by BaseMax.
Copyright 2024-2025, Max Base