Multi-environment CSV data analysis orchestrator that resolves dependency conflicts between profiling engines through isolated conda environments while providing a unified interface.

dhaneshbb/autocsv-profiler-suite

AutoCSV Profiler Suite

Multi-environment CSV data analysis orchestrator with isolated profiling engines

Python 3.10+ | License: MIT | Conda Required | Status: Beta | Platform Support

AutoCSV Profiler Suite resolves dependency conflicts in data science through a multi-environment architecture that isolates profiling engines while providing a unified interface.

Overview

AutoCSV Profiler Suite runs YData Profiling, SweetViz, DataPrep, and custom statistical analysis in separate conda environments, so that the package versions each engine needs cannot conflict with one another. A single orchestrator in the base environment provides a unified interface to all of them.

Problem Statement

Data science projects face dependency conflicts between profiling engines and statistical libraries. This project uses isolated conda environments to prevent conflicts while maintaining functionality.

Key Features

  • Dependency Conflict Resolution: Three specialized conda environments plus base environment prevent library conflicts
  • Multiple Profiling Engines: YData Profiling, SweetViz, DataPrep, and custom statistical analysis
  • Memory Management: Chunking for large files with 1GB default limit
  • Interface: Console interface with progress tracking and error handling
  • Lazy Loading: Each engine loads only when it is needed, which improves performance
  • Degradation: Analysis continues with the engines that are available when others are missing
  • Cross-Platform Support: Windows, Linux, macOS with conda environment isolation
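The chunking idea behind the Memory Management feature can be illustrated with a minimal, standard-library sketch. It is not the suite's implementation: the function names `iter_chunks` and `count_missing` and the row-based chunk size are assumptions made for illustration, and the sketch does not model the 1GB default limit mentioned above.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_rows=50_000, delimiter=","):
    """Yield (header, rows) pairs holding at most chunk_rows rows each.

    Hypothetical helper: only one chunk is held in memory at a time.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return  # empty file, nothing to yield
        while True:
            rows = list(islice(reader, chunk_rows))
            if not rows:
                break
            yield header, rows


def count_missing(path, chunk_rows=50_000):
    """Count empty cells per column by accumulating over chunks."""
    totals = {}
    for header, rows in iter_chunks(path, chunk_rows):
        for name in header:
            totals.setdefault(name, 0)
        for row in rows:
            for name, value in zip(header, row):
                if value == "":
                    totals[name] += 1
    return totals
```

Because each statistic is accumulated chunk by chunk, peak memory depends on `chunk_rows` rather than on the size of the file.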

Architecture and dependency details: Architecture Guide

Demo: Interactive demonstration of the complete analysis workflow from setup to results

Quick Start Guide

Prerequisites

  • Anaconda or Miniconda
  • Python 3.10 or higher
  • At least 3GB free disk space (2GB for conda environments, 1GB for data/outputs)
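Before running the setup steps, the three prerequisites above can be checked with a short script. This is a sketch using only the standard library; the function name `check_prerequisites` is hypothetical and is not part of the suite.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)            # Python 3.10 or higher
MIN_FREE_BYTES = 3 * 1024 ** 3  # at least 3GB of free disk space


def check_prerequisites(path="."):
    """Return a dict mapping each prerequisite to True or False."""
    free_bytes = shutil.disk_usage(path).free
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "conda": shutil.which("conda") is not None,  # Anaconda or Miniconda on PATH
        "disk": free_bytes >= MIN_FREE_BYTES,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```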

Setup Steps

Complete installation instructions: Installation Guide

Quick setup:

# 1. Clone and navigate
git clone https://github.com/dhaneshbb/autocsv-profiler-suite.git
cd autocsv-profiler-suite

# 2. Install requirements and set up environments
pip install -r requirements.txt
python bin/setup_environments.py create --parallel

Run analysis:

First, explore available analysis options:

python bin/run_analysis.py --help

Then start the interactive analysis:

python bin/run_analysis.py

Analysis Command Options: Analysis Help

Interactive Welcome Interface: Analysis Interface

The interface provides file selection, delimiter detection, and engine selection:

Choose Analysis Engines: Engine Selection

All Reports Generated Successfully: Analysis Complete

Setup guide: Installation Guide | Issues: Troubleshooting Guide

Architecture & Requirements

System Diagram

graph TD
    A[User Interface] --> B[Main Orchestrator]
    B --> C[Base Environment<br/>Python 3.10+]
    C --> D[Environment Manager]
    D --> E[Main Environment<br/>Python 3.11]
    D --> F[Profiling Environment<br/>Python 3.10]
    D --> G[DataPrep Environment<br/>Python 3.10]

    E --> H[Statistical Analysis<br/>numpy 2.2.6, scipy 1.13.1]
    F --> I[YData Profiling<br/>numpy <2.2]
    F --> J[SweetViz<br/>legacy pandas]
    G --> K[DataPrep Engine<br/>pandas 1.5.3]

    H --> L[Analysis Reports]
    I --> L
    J --> L
    K --> L

Architecture details: Architecture Guide

Summary: Multi-environment conda architecture resolves dependency conflicts between profiling engines. Requires conda, Python 3.10+, and 3GB free disk space.
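One way to picture the isolation is an orchestrator that launches each engine as a subprocess inside its own environment with `conda run -n <env>`. The sketch below assumes that approach. The environment names and script paths are hypothetical placeholders, and the suite's actual Environment Manager may work differently.

```python
import shutil
import subprocess

# Hypothetical environment names and engine scripts, for illustration only.
ENGINES = {
    "statistics": ("example-main-env", "engines/stats.py"),
    "ydata": ("example-profiling-env", "engines/ydata_report.py"),
    "sweetviz": ("example-profiling-env", "engines/sweetviz_report.py"),
    "dataprep": ("example-dataprep-env", "engines/dataprep_report.py"),
}


def build_command(engine, csv_path, output_dir):
    """Build the argv list that runs one engine inside its own conda environment."""
    env_name, script = ENGINES[engine]
    return ["conda", "run", "-n", env_name, "python", script, csv_path, output_dir]


def run_engine(engine, csv_path, output_dir):
    """Run one engine and report success; a failure does not stop other engines."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on PATH")
    completed = subprocess.run(
        build_command(engine, csv_path, output_dir),
        capture_output=True,
        text=True,
    )
    return completed.returncode == 0
```

Since every engine runs in a separate process, the base environment never imports the engines' conflicting numpy or pandas versions.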

Usage Examples

Interactive Mode

python bin/run_analysis.py

Interactive mode includes:

  1. File selection with validation
  2. Delimiter detection (with manual override)
  3. Engine selection based on availability
  4. Progress monitoring with updates
  5. Result summary and output locations
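Delimiter detection with a manual override (step 2) can be sketched with the standard library's `csv.Sniffer`. This is an assumption about how such a step could work, not the suite's code; the function name `detect_delimiter` and the candidate list are hypothetical.

```python
import csv


def detect_delimiter(path, override=None, candidates=",;\t|", sample_size=64 * 1024):
    """Guess the delimiter from the start of the file unless the user overrides it."""
    if override is not None:
        return override  # a manual choice always wins
    with open(path, newline="", encoding="utf-8") as handle:
        sample = handle.read(sample_size)
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return ","  # fall back to a comma when sniffing fails
```

Reading only a sample keeps detection fast on large files, and the fallback means an undetectable format still gets a usable default that the user can correct.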

Direct Command Line

# Analyze specific file directly
python bin/run_analysis.py path/to/data.csv

# Debug mode with detailed output
python bin/run_analysis.py --debug

# Direct analysis with debug mode
python bin/run_analysis.py path/to/data.csv --debug
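The three invocations above imply an optional positional file path plus a `--debug` flag. A parser with that shape could look like the sketch below. It only mirrors the commands shown here and is not the actual parser in `bin/run_analysis.py`; the names `build_parser` and `describe` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that accepts an optional CSV path and a --debug flag."""
    parser = argparse.ArgumentParser(
        prog="run_analysis.py",
        description="Illustrative parser mirroring the commands shown above.",
    )
    parser.add_argument("csv_path", nargs="?", default=None,
                        help="CSV file to analyze; omit for interactive mode")
    parser.add_argument("--debug", action="store_true",
                        help="print detailed output")
    return parser


def describe(argv):
    """Return (mode, debug) for a list of command-line arguments."""
    args = build_parser().parse_args(argv)
    mode = "interactive" if args.csv_path is None else "direct"
    return mode, args.debug
```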

Command options are documented in the Quick Start Guide above.

Programmatic Usage

The command-line interface is the most reliable entry point in a multi-environment setup:

# Interactive mode (recommended) - guides through file selection
python bin/run_analysis.py

# Direct file analysis
python bin/run_analysis.py data.csv

# Debug mode for troubleshooting
python bin/run_analysis.py --debug

Python API (requires proper environment setup):

# Note: ImportError may occur in multi-environment setup
# CLI interface recommended for production workflows
try:
    from autocsv_profiler import profile_csv
    report_path = profile_csv("data.csv", "output_directory/")
except ImportError:
    print("Usage: python bin/run_analysis.py data.csv")
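The Lazy Loading and Degradation features listed under Key Features follow the same idea as the ImportError fallback above: check what is importable, and import an engine only when it is requested. The sketch below shows that pattern within a single environment. The helper names are hypothetical, and in the real suite the engines live in separate conda environments, so the actual availability check may differ.

```python
import importlib
import importlib.util

# Import names of the profiling libraries named in this README.
ENGINE_MODULES = {
    "ydata": "ydata_profiling",
    "sweetviz": "sweetviz",
    "dataprep": "dataprep",
}


def available_engines(modules=ENGINE_MODULES):
    """Return the engines whose module can be located, without importing them."""
    return [
        name
        for name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is not None
    ]


def load_engine(name, modules=ENGINE_MODULES):
    """Import an engine module on first use; later calls reuse the cached module."""
    return importlib.import_module(modules[name])


if __name__ == "__main__":
    found = available_engines()
    print("Available engines:", ", ".join(found) if found else "none")
```

Locating a module with `find_spec` is cheap compared with importing it, so the list of available engines can be shown to the user before any heavy library is loaded.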

Documentation

Getting Started:

Technical Reference:

Development:

Examples:

Contributing

See Contributing Guide for workflow details.

License

MIT License - see LICENSE file. Third-party dependencies have various licenses - see NOTICE for details.

Important: See DISCLAIMER for liability limitations and dependency responsibility.

Links


Version 2.0.0 | Beta | Python 3.10-3.13 | Cross-Platform

Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/autocsv-profiler-suite