University of Kashan Phone Directory Scraper

This repository contains a scraper and dataset for the phone directory of staff at the University of Kashan. It includes a PHP script that parses a saved HTML copy of the directory and exports the entries to JSON.

Features

HTML parsing to extract structured data.
Export of extracted data in JSON format.
Modular and adaptable code for similar scraping tasks.
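
The parse-and-export flow listed above can be sketched roughly as follows. This is a minimal illustration and not the code in extract.php: it assumes the directory is laid out as an HTML table with one entry per row, and the function name `extractRows` and the inline sample markup are hypothetical.

```php
<?php
// Hypothetical sketch: assumes one directory entry per <tr> and one field per cell.
// The real layout of demo.html may differ.

function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for imperfect markup.
    libxml_use_internal_errors(true);
    // The encoding hint keeps non-ASCII names intact during parsing.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only header and data cells; skip whitespace text nodes.
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Inline sample standing in for the contents of demo.html.
$sample = '<table>
  <tr><th>Name</th><th>Position</th><th>Phone Number</th></tr>
  <tr><td>Example User</td><td>Lecturer</td><td>123456789</td></tr>
</table>';

$rows = extractRows($sample);
echo json_encode($rows, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

To adapt the sketch to another page, only the element selection inside `extractRows` should need to change; the JSON export step stays the same.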

Project Structure

kashan-university-phone-directory
.
├── LICENSE # License text.
├── README.md # Project documentation.
├── demo.html # Sample HTML data file.
├── extract.php # Script for extracting data from HTML.
└── output.json # Extracted data in JSON format.

Prerequisites

PHP: Version 7.4 or higher.
Web Server: Optional (for example Apache or Nginx); the extraction script runs from the command line.
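
A quick way to confirm the environment meets the version requirement is a short check like the one below. It is only a sketch: the list of extensions is an assumption (a DOM-based parser and JSON export would typically need `dom` and `json`), not something this repository documents.

```php
<?php
// Minimum version taken from the prerequisites above.
$minimum = '7.4.0';
$ok = version_compare(PHP_VERSION, $minimum, '>=');
echo 'PHP ', PHP_VERSION, $ok ? ' meets' : ' does not meet', ' the minimum of ', $minimum, PHP_EOL;

// Assumption: these extensions are what a DOM-based HTML parser with JSON export would use.
foreach (['dom', 'json'] as $ext) {
    echo $ext, ': ', extension_loaded($ext) ? 'loaded' : 'missing', PHP_EOL;
}
```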

Usage

Clone the repository:

git clone https://github.com/BaseMax/kashan-university-phone-directory.git
cd kashan-university-phone-directory

Place the HTML file to be parsed in the root directory and name it demo.html.

Run the extraction script:

php extract.php

View the output in output.json:

cat output.json

Output Format

The extracted data is stored in a JSON file with a structure similar to this:

[
    ["Name", "Position", "Phone Number"],
    ["Example User", "Lecturer", "123456789"]
]
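
Consumers of the file may find it easier to work with keyed records than with positional arrays. The sketch below assumes, as in the example above, that the first row holds the column names; the helper name `rowsToRecords` is hypothetical and the JSON is inlined so the snippet runs on its own.

```php
<?php
// Hypothetical helper: turns [header, row, row, ...] into a list of associative arrays.
function rowsToRecords(array $rows): array
{
    if ($rows === []) {
        return [];
    }
    // Assumption: the first row contains the column names.
    $header = array_shift($rows);
    $records = [];
    foreach ($rows as $row) {
        // Skip rows whose length does not match the header.
        if (count($row) !== count($header)) {
            continue;
        }
        $records[] = array_combine($header, $row);
    }
    return $records;
}

// Inline copy of the example structure shown above.
$json = '[["Name","Position","Phone Number"],["Example User","Lecturer","123456789"]]';
$rows = json_decode($json, true);
$records = rowsToRecords($rows);

echo $records[0]['Name'], ': ', $records[0]['Phone Number'], PHP_EOL;
// prints "Example User: 123456789"
```

To read the real data, replace the inline string with `file_get_contents('output.json')`.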

Contribution

Contributions are welcome! Please submit issues or pull requests on the GitHub repository.

License

This project is licensed under the MIT License.

Disclaimer

Ensure compliance with local laws and regulations regarding the publication of personal data. Obtain permission if necessary before sharing extracted information.

Copyright

Data source: 118 Kashan University Directory. https://118.kashanu.ac.ir/

Author

Developed by BaseMax.
