A Python package for processing and analyzing Taiwan air quality observation data. This package provides tools to merge, transform, and analyze air quality data from Taiwan's monitoring stations.
- Data Merging: Combine air quality data files by year from multiple monitoring stations
- Data Transformation: Convert wide-format data to long-format for analysis
- CLI Tools: Command-line interfaces for batch processing
- Type Hints: Full type hint support for better development experience
- Modern Python: Built with modern Python practices using uv for dependency management
This project uses uv for dependency management. First, install uv if you haven't already:
# On Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
Then clone and install the project:
git clone https://github.com/whats2000/TaiwanAirQualityObservationData.git
cd TaiwanAirQualityObservationData
uv sync
The package provides two main CLI commands, merge-data and transform-data.
Before running them, download the air quality data from the 歷年監測資料 (Historical Monitoring Data) page and place it in the data/ directory. Organize the data by year, with each year's files in a separate subdirectory:
data/
├── 2023/
│ ├── 三義_2023.csv
│ ├── 三重_2023.csv
│ └── ... (other monitoring stations)
├── 2024/
│ ├── 三義_2024.csv
│ ├── 三重_2024.csv
│ └── ... (other monitoring stations)
└── ...
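To check that a local data/ directory matches the layout above before running the CLI, a small standard-library helper can group the station files by year. This is a minimal sketch, not the package's own code: the function names `discover_station_files` and `station_name` are hypothetical, and it assumes every file follows the `<station>_<year>.csv` naming shown in the tree.

```python
from collections import defaultdict
from pathlib import Path


def discover_station_files(data_dir):
    """Group station CSV paths by year, following the data/<year>/ layout."""
    files_by_year = defaultdict(list)
    for year_dir in sorted(Path(data_dir).iterdir()):
        # Assumption: only four-digit directory names are treated as years.
        name = year_dir.name
        if not (year_dir.is_dir() and name.isdigit() and len(name) == 4):
            continue
        for csv_path in sorted(year_dir.glob("*.csv")):
            files_by_year[int(name)].append(csv_path)
    return dict(files_by_year)


def station_name(csv_path):
    """Return the station part of a '<station>_<year>.csv' file name."""
    return Path(csv_path).stem.rsplit("_", 1)[0]
```

For example, `discover_station_files("data")` would return a dictionary keyed by year, and `station_name` applied to `data/2023/三義_2023.csv` would return `三義`.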
Merge air quality data files by year:
# Merge all data in the 'data' directory and write the results to the 'output' directory
uv run merge-data
# Specify custom data directory, output directory, and stations file
uv run merge-data --data-dir /path/to/data --output-dir /path/to/output --stations-file custom_stations.csv
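Conceptually, merging by year means concatenating every station file in one year directory into a single CSV. The sketch below illustrates that idea with pandas; it is not the implementation behind merge-data. The function name `merge_year`, the added `source_station` column, and the UTF-8 encoding are assumptions, and the stations file is not used here.

```python
from pathlib import Path

import pandas as pd


def merge_year(year_dir, output_dir):
    """Concatenate all station CSVs in one year directory into <output_dir>/<year>.csv."""
    year_dir = Path(year_dir)
    frames = []
    for csv_path in sorted(year_dir.glob("*.csv")):
        # Assumption: files are UTF-8 (with or without BOM); values are kept as text.
        frame = pd.read_csv(csv_path, encoding="utf-8-sig", dtype=str)
        # Assumption: tag each row with the station taken from '<station>_<year>.csv'.
        frame["source_station"] = csv_path.stem.rsplit("_", 1)[0]
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"No CSV files found in {year_dir}")

    merged = pd.concat(frames, ignore_index=True)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / f"{year_dir.name}.csv"
    merged.to_csv(output_path, index=False, encoding="utf-8-sig")
    return output_path
```

Reading every column as text avoids silently coercing quality flags or placeholder strings in the raw files; numeric conversion can then happen in a later step.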
Transform data from wide format to long format:
# Transform all CSV files in the 'output' directory
uv run transform-data
# Transform a specific file
uv run transform-data --file output/2023.csv
# Specify custom data directory
uv run transform-data --data-dir /path/to/data
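The wide-to-long step can be pictured as a pandas `melt` over hourly columns. The following is a minimal sketch under stated assumptions, not the code behind transform-data: it assumes each row carries hypothetical `station`, `date`, and `item` columns plus one column per hour named `00` to `23`, and the function name `wide_to_long` is made up for illustration.

```python
import pandas as pd

# Assumption: hourly readings live in columns named "00", "01", ..., "23".
HOUR_COLUMNS = [f"{hour:02d}" for hour in range(24)]


def wide_to_long(wide):
    """Turn one-row-per-day data into one-row-per-hour data with a datetime column."""
    hour_columns = [col for col in HOUR_COLUMNS if col in wide.columns]
    long = wide.melt(
        id_vars=["station", "date", "item"],
        value_vars=hour_columns,
        var_name="hour",
        value_name="value",
    )
    # Combine the date and the hour into a single timestamp.
    long["datetime"] = pd.to_datetime(long["date"]) + pd.to_timedelta(
        long["hour"].astype(int), unit="h"
    )
    # Non-numeric entries (for example, invalid-reading markers) become NaN.
    long["value"] = pd.to_numeric(long["value"], errors="coerce")
    long = long[["station", "item", "datetime", "value"]]
    return long.sort_values(["station", "item", "datetime"]).reset_index(drop=True)
```

Each input row with 24 hourly columns becomes 24 output rows, which is the shape most time-series tools expect.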
You can also use the package programmatically:
from taiwan_air_quality import merge_data, transform_data, process_all_files
# Merge data
merge_data(data_dir='data', stations_file='monitoring_stations.csv')
# Transform a specific file
transformed_df = transform_data('data/2023.csv')
# Transform all files in a directory
process_all_files('data')
The package expects air quality data in the following structure:
- Raw data organized in yearly directories under data/
- Each year directory contains CSV files from monitoring stations
- Monitoring stations metadata in monitoring_stations.csv
- Merged Data: Combined yearly CSV files with standardized columns
- Transformed Data: Long-format data with datetime columns for time-series analysis
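Long-format data with a datetime column lends itself to simple time-series aggregation. As a hedged illustration, the sketch below computes daily means per station for one pollutant. The column names `station`, `item`, `datetime`, and `value` are assumptions about the transformed output, and `daily_mean` is a hypothetical helper rather than part of the package.

```python
import pandas as pd


def daily_mean(long, item):
    """Average hourly values per station and calendar day for one measured item."""
    selected = long[long["item"] == item]
    return (
        selected.groupby(["station", pd.Grouper(key="datetime", freq="D")])["value"]
        .mean()  # NaN readings are skipped by default
        .reset_index()
    )
```

The result has one row per station and day, with columns `station`, `datetime`, and `value`.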
# Clone the repository
git clone https://github.com/whats2000/TaiwanAirQualityObservationData.git
cd TaiwanAirQualityObservationData
# Install with development dependencies
uv sync --extra dev
# Install additional data science tools
uv sync --extra data
# Run the test suite
uv run pytest
The project uses several tools for code quality:
# Format code with Black
uv run black src/ tests/
# Sort imports with isort
uv run isort src/ tests/
# Lint with flake8
uv run flake8 src/ tests/
# Type checking with mypy
uv run mypy src/
TaiwanAirQualityObservationData/
├── src/
│ └── taiwan_air_quality/
│ ├── __init__.py # Package initialization
│ ├── merge.py # Data merging functionality
│ ├── transform.py # Data transformation functionality
│ └── py.typed # Type hint marker
├── tests/
│ ├── __init__.py
│ └── test_taiwan_air_quality.py
├── data/ # Data directory (not in repo)
├── monitoring_stations.csv # Station metadata
├── pyproject.toml # Project configuration
├── uv.lock # Dependency lock file
├── README.md # This file
└── .gitignore # Git ignore rules
- pandas: Data manipulation and analysis
- tqdm: Progress bars for long-running operations
- pytest: Testing framework
- black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
- matplotlib: Plotting and visualization
- seaborn: Statistical data visualization
- jupyter: Interactive notebooks
- plotly: Interactive plots
- numpy: Numerical computing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests and ensure code quality (uv run pytest && uv run black . && uv run flake8)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Taiwan Environmental Protection Administration for providing air quality data
- The Python community for excellent tools and libraries
- Contributors to this project