AutoCSV Profiler Suite

Multi-environment CSV data analysis orchestrator with isolated profiling engines

AutoCSV Profiler Suite resolves dependency conflicts in data science through a multi-environment architecture that isolates profiling engines while providing a unified interface.

Overview

AutoCSV Profiler Suite is a multi-environment CSV data analysis orchestrator. It runs YData Profiling, SweetViz, DataPrep, and custom statistical analysis in isolated conda environments, so their package versions cannot conflict, and exposes all of them through one unified interface.

Problem Statement

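Profiling engines and statistical libraries often pin incompatible versions of the same packages. In this project, the statistical analysis environment uses numpy 2.2.6, YData Profiling requires numpy <2.2, and DataPrep requires pandas 1.5.3. Running each group in its own conda environment lets all of them be installed side by side without losing functionality.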

Key Features

Dependency Conflict Resolution: Three specialized conda environments plus base environment prevent library conflicts
Multiple Profiling Engines: YData Profiling, SweetViz, DataPrep, and custom statistical analysis
Memory Management: Chunking for large files with 1GB default limit
Interface: Console interface with progress tracking and error handling
Lazy Loading: Engines load only when needed for performance
Graceful Degradation: Analysis continues with the engines that are available when some engines are missing
Cross-Platform Support: Windows, Linux, macOS with conda environment isolation
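
The lazy loading and degradation features above can be illustrated with a small sketch. It is not the suite's implementation: the registry, the function names, and the single-process design are assumptions made for illustration (the suite itself runs engines in separate conda environments). The idea is that an engine module is imported only on first request, and an engine that fails to import is skipped rather than aborting the run.

```python
import importlib

# Hypothetical registry: engine label -> importable module name.
ENGINE_MODULES = {
    "ydata": "ydata_profiling",
    "sweetviz": "sweetviz",
    "dataprep": "dataprep",
}

# Cache of import results, keyed by module name (None means unavailable).
_loaded = {}


def load_engine(name, registry=ENGINE_MODULES):
    """Import an engine module on first use; return None if it is missing."""
    module_name = registry[name]
    if module_name not in _loaded:
        try:
            _loaded[module_name] = importlib.import_module(module_name)
        except ImportError:
            _loaded[module_name] = None
    return _loaded[module_name]


def available_engines(names, registry=ENGINE_MODULES):
    """Return the requested engines that can actually be loaded."""
    return [n for n in names if load_engine(n, registry) is not None]
```

Nothing is imported until `load_engine` is called, and a caller can run whatever `available_engines` returns instead of failing when one engine is not installed.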

Architecture and dependency details: Architecture Guide

Demo: Interactive demonstration of the complete analysis workflow, from setup to results.

Start Guide

Prerequisites

Anaconda or Miniconda
Python 3.10 or higher
At least 3GB free disk space (2GB for conda environments, 1GB for data/outputs)
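
The prerequisites above can be checked before running setup. The following is a minimal sketch using only the standard library; the function name and the messages are hypothetical, and it is not part of the suite.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)          # Python 3.10 or higher
MIN_FREE_BYTES = 3 * 1024**3  # 3GB free disk space


def check_prerequisites(path="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python 3.10 or higher is required")
    if shutil.which("conda") is None:
        problems.append("conda was not found on PATH (install Anaconda or Miniconda)")
    free_bytes = shutil.disk_usage(path).free
    if free_bytes < MIN_FREE_BYTES:
        problems.append(f"only {free_bytes / 1024**3:.1f}GB free, at least 3GB needed")
    return problems
```

Calling `check_prerequisites()` from the repository root and printing the returned list gives a quick report of what is missing.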

Setup Steps

Complete installation instructions: Installation Guide

Quick setup:

# 1. Clone and navigate
git clone https://github.com/dhaneshbb/autocsv-profiler-suite.git
cd autocsv-profiler-suite

# 2. Install requirements and setup environments
pip install -r requirements.txt
python bin/setup_environments.py create --parallel

Run analysis:

First, explore available analysis options:

python bin/run_analysis.py --help

Then start the interactive analysis:

python bin/run_analysis.py

The interface provides file selection, delimiter detection, and engine selection.

[Screenshots: analysis command options, interactive welcome interface, engine selection, successful report generation]

Setup guide: Installation Guide | Issues: Troubleshooting Guide

Architecture & Requirements

System Diagram

graph TD
    A[User Interface] --> B[Main Orchestrator]
    B --> C[Base Environment<br/>Python 3.10+]
    C --> D[Environment Manager]
    D --> E[Main Environment<br/>Python 3.11]
    D --> F[Profiling Environment<br/>Python 3.10]
    D --> G[DataPrep Environment<br/>Python 3.10]

    E --> H[Statistical Analysis<br/>numpy 2.2.6, scipy 1.13.1]
    F --> I[YData Profiling<br/>numpy <2.2]
    F --> J[SweetViz<br/>legacy pandas]
    G --> K[DataPrep Engine<br/>pandas 1.5.3]

    H --> L[Analysis Reports]
    I --> L
    J --> L
    K --> L

Architecture details: Architecture Guide

Summary: Multi-environment conda architecture resolves dependency conflicts between profiling engines. Requires conda, Python 3.10+, and 3GB free disk space.
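
One way an orchestrator can dispatch work to isolated environments is `conda run -n <env>`, which executes a command inside a named environment. The sketch below only illustrates that idea. The environment names, the engine labels, and the script argument are hypothetical and are not taken from the suite's code. Following the diagram, YData Profiling and SweetViz share one environment.

```python
import subprocess

# Hypothetical environment names; the suite's real names may differ.
ENGINE_ENVS = {
    "statistical": "csv-profiler-main",
    "ydata": "csv-profiler-profiling",
    "sweetviz": "csv-profiler-profiling",
    "dataprep": "csv-profiler-dataprep",
}


def build_engine_command(engine, script, csv_path):
    """Build the command that runs one engine script inside its conda environment."""
    env_name = ENGINE_ENVS[engine]
    return ["conda", "run", "-n", env_name, "python", script, csv_path]


def run_engine(engine, script, csv_path):
    """Run the engine in a child process and return its exit code."""
    completed = subprocess.run(
        build_engine_command(engine, script, csv_path),
        capture_output=True,
        text=True,
    )
    return completed.returncode
```

Because each engine runs in its own child process, the parent process never imports the conflicting numpy or pandas versions.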

Usage Examples

Interactive Mode

python bin/run_analysis.py

Interactive mode includes:

File selection with validation
Delimiter detection (with manual override)
Engine selection based on availability
Progress monitoring with updates
Result summary and output locations
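
Delimiter detection with a manual override, as listed above, can be sketched with the standard library's `csv.Sniffer`. This shows the general technique, not the suite's own detection code. The function name, the candidate set, and the comma fallback are assumptions.

```python
import csv


def detect_delimiter(sample, candidates=",;\t|", override=None):
    """Guess the delimiter of a CSV text sample, unless the user supplied one."""
    if override is not None:
        return override
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide; fall back to a comma (assumed default).
        return ","
```

In practice the sample would be the first few kilobytes of the file, for example `detect_delimiter(open(path, newline="").read(4096))`.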

Direct Command Line

# Analyze specific file directly
python bin/run_analysis.py path/to/data.csv

# Debug mode with detailed output
python bin/run_analysis.py --debug

# Direct analysis with debug mode
python bin/run_analysis.py path/to/data.csv --debug

For the full list of command options, run python bin/run_analysis.py --help as described in the Start Guide above.

Programmatic Usage

The command-line interface is the most reliable entry point in a multi-environment setup:

# Interactive mode (recommended) - guides through file selection
python bin/run_analysis.py

# Direct file analysis
python bin/run_analysis.py data.csv

# Debug mode for troubleshooting
python bin/run_analysis.py --debug
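
Since the CLI is the recommended entry point, a script can drive it through `subprocess` rather than importing the package. The sketch below uses only the invocations shown above (an optional file path and `--debug`). The helper names are hypothetical. It also assumes that the CLI exits with a nonzero code on failure and that direct file analysis runs without further prompts, neither of which is confirmed here.

```python
import subprocess
import sys
from pathlib import Path


def build_cli_command(csv_path=None, debug=False, script="bin/run_analysis.py"):
    """Build the argument list for one CLI invocation."""
    cmd = [sys.executable, script]
    if csv_path is not None:
        cmd.append(str(csv_path))
    if debug:
        cmd.append("--debug")
    return cmd


def analyze_directory(directory, debug=False):
    """Run the CLI once per CSV file and collect exit codes by file name."""
    results = {}
    for csv_file in sorted(Path(directory).glob("*.csv")):
        completed = subprocess.run(build_cli_command(csv_file, debug))
        results[csv_file.name] = completed.returncode
    return results
```

Run it from the repository root so that the relative path `bin/run_analysis.py` resolves.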

Python API (requires proper environment setup):

# Note: ImportError may occur in multi-environment setup
# CLI interface recommended for production workflows
try:
    from autocsv_profiler import profile_csv
    report_path = profile_csv("data.csv", "output_directory/")
except ImportError:
    print("Usage: python bin/run_analysis.py data.csv")

Documentation

Getting Started:

Installation Guide - Environment setup instructions
Getting Started Tutorial - Step-by-step walkthrough
User Guide - Reference for daily usage

Technical Reference:

API Documentation - Technical API reference
Architecture Guide - Multi-environment design
Performance Guide - Optimization and benchmarks

Development:

Development Guide - Environment setup and workflow
Design Decisions - Architectural decision records
Troubleshooting Guide - Common issues and solutions

Examples:

Examples Directory - samples
Engine Testing Guide - Engine testing procedures

Contributing

See Contributing Guide for workflow details.

License

MIT License - see LICENSE file. Third-party dependencies have various licenses - see NOTICE for details.

Important: See DISCLAIMER for liability limitations and dependency responsibility.

Links

Repository: https://github.com/dhaneshbb/autocsv-profiler-suite
Documentation: https://github.com/dhaneshbb/autocsv-profiler-suite/tree/main/docs
Issues: https://github.com/dhaneshbb/autocsv-profiler-suite/issues
Changelog: CHANGELOG.md
License: MIT License

Version 2.0.0 | Beta | Python 3.10-3.13 | Cross-Platform
