This repository documents a battery intelligence and automation project and walks through the process behind its main results.
Automating the Python pipeline with the AWS stack (Redshift, QuickSight) and data science libraries (NumPy, Pandas, Scikit-learn) saves $85K annually. The automation replaced a full-time Data Analyst role and cut weekly analysis time by 95%, from 40 hours to 2 hours.
Battery life prediction accuracy rose by 10 percentage points, from 79% to 89%, through ensemble modeling (XGBoost, RandomForest, SVM) and feature engineering that includes cell-pack resistance ratios and thermodynamic calculations.
Testing costs fell by $500K annually because failing batteries are detected within their first 100 test cycles. Statistical modeling of key electrical parameters made this possible and reduced overall testing time by 80%.
I'm happy to discuss more about this project and the fascinating techniques employed. If you have any specific questions or need further information, feel free to reach out! 😊
Disclaimer: This is novel work in battery intelligence. Because of a Non-Disclosure Agreement (NDA), I cannot share every detail. This repository is a redacted version that shows how the automation works in general and how feature engineering on electrical parameter measurements improves the predictive power of machine learning models for batteries.
- Python 3.x
- Pandas
- NumPy
- Seaborn
- Matplotlib
- Amazon Web Services (AWS) Command Line Interface (CLI)
- Redshift
- tqdm
- boto3
- bz2 (Python standard library)
- json (Python standard library)
- pingouin
You can install all required dependencies using:
pip install -r requirements.txt
The parse_bz2 function reads and processes compressed JSON data from Rockpi raw log messages stored in .bz2 files. It transforms the data into a structured DataFrame for analysis.
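A minimal sketch of this kind of parser is shown below. It is not the redacted implementation: it assumes each line of the .bz2 file holds one JSON object, which may not match the actual Rockpi log layout, and it skips lines that fail to decode.

```python
import bz2
import json

import pandas as pd


def parse_bz2(path):
    """Read a .bz2 file of JSON lines into a flat DataFrame (assumed layout)."""
    records = []
    # bz2.open in text mode decompresses and decodes on the fly.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # assumption: malformed lines are dropped, not repaired
    # json_normalize flattens nested objects into dot-separated column names.
    return pd.json_normalize(records)
```

Nested fields end up as columns such as `payload.voltage` (a hypothetical field name), which keeps the result usable with ordinary pandas filtering.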
Dependencies:
- bz2
- json
- pandas
Installation:
pip install pandas
The VWCycloid class interacts with AWS Redshift to retrieve and process cycloid data using boto3 and pandas.
Dependencies:
- boto3
- pandas
- tqdm
Installation:
pip install boto3 pandas tqdm
The adhoc class streamlines and automates daily ad-hoc analysis requests. Below are its key methods:
Handles outliers in a pandas Series using rolling median replacement.
- Parameters:
  - series: Input Series
  - window: Window size (default: 5)
  - sigma_multiplier: Outlier boundary definition (default: 0.1)
- Returns: Series with handled outliers
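The rolling median replacement can be sketched as follows. How `sigma_multiplier` defines the outlier boundary is not stated here, so this sketch assumes a point is an outlier when it deviates from the centered rolling median by more than `sigma_multiplier` times that median; the function name `handle_outliers` is also an assumption.

```python
import pandas as pd


def handle_outliers(series, window=5, sigma_multiplier=0.1):
    """Replace outliers with the rolling median (assumed boundary rule)."""
    # Centered window; min_periods=1 keeps the edges defined.
    rolling_median = series.rolling(window=window, center=True, min_periods=1).median()
    deviation = (series - rolling_median).abs()
    # Assumption: the boundary is a fraction of the local median.
    bound = sigma_multiplier * rolling_median.abs()
    # Keep values inside the boundary, substitute the median elsewhere.
    return series.where(deviation <= bound, rolling_median)
```

A median is used instead of a mean so that a single spike does not shift the reference value it is compared against.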
Retrieves discharge energy data with median and lower bound energy.
- Parameters:
  - df: Input cycloid DataFrame
  - window: Window size (default: 7)
  - sigma_multiplier: Outlier boundary (default: 0.1)
  - disclaimer: Print energy jump disclaimer (default: True)
  - raw: Return raw discharge energy (default: False)
- Returns: DataFrame with discharge energy data
Performs t-test for A/B testing.
- Parameters:
  - sample1: First sample
  - sample2: Second sample
  - alternative: Test hypothesis ("two-sided", "greater", "less")
  - interprete: Print interpretation (default: True)
- Returns: DataFrame with statistical parameters
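A rough sketch of such an A/B t-test is given below. The dependency list names pingouin, but this example swaps in SciPy's `ttest_ind` (Welch's variant) as a stand-in; the function name `t_test`, the `alpha` parameter, and the output column names are assumptions made for illustration.

```python
import pandas as pd
from scipy import stats


def t_test(sample1, sample2, alternative="two-sided", interprete=True, alpha=0.05):
    """Welch's t-test between two samples, returned as a one-row DataFrame."""
    # equal_var=False selects Welch's test, which tolerates unequal variances.
    result = stats.ttest_ind(sample1, sample2, equal_var=False, alternative=alternative)
    significant = bool(result.pvalue < alpha)
    if interprete:
        verdict = "reject" if significant else "fail to reject"
        print(f"p = {result.pvalue:.4f}: {verdict} the null hypothesis at alpha = {alpha}")
    return pd.DataFrame([{
        "t_statistic": result.statistic,
        "p_value": result.pvalue,
        "alternative": alternative,
        "significant": significant,
    }])
```

With `alternative="greater"`, the test asks whether the mean of `sample1` exceeds the mean of `sample2`, which suits comparisons where only an improvement matters.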
Calculates statistics around RPT cycles.
- Parameters:
  - df: Input DataFrame
  - rpt_cycle: RPT cycle number (default: 201)
  - neighbors_cycle: Cycles to consider (default: 3)
  - window: Rolling window size (default: 5)
  - sigma_multiplier: Outlier detection (default: 0.1)
  - avoid_close_cycle: Cycles to avoid (default: 0)
- Returns: DataFrame with cycle statistics
Calculates energy gains around RPT cycles.
- Parameters:
  - df: Input DataFrame
  - rpt_cycle: Reference cycle (default: 201)
  - neighbors_cycle: Cycles to consider (default: 5)
  - avoid_close_cycle: Cycles to avoid (default: 0)
  - window: Window size (default: 5)
  - sigma_multiplier: Outlier detection (default: 0.1)
  - show_plot: Display plot (default: True)
- Returns: DataFrame with energy statistics
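One plausible reading of the energy gain calculation is a comparison of mean discharge energy in the cycles just before and just after the RPT cycle. The sketch below follows that reading only; the function name, the column names `cycle` and `discharge_energy`, and the output columns are hypothetical, and the outlier handling and plotting steps are left out.

```python
import pandas as pd


def energy_gain_around_rpt(df, rpt_cycle=201, neighbors_cycle=5, avoid_close_cycle=0,
                           cycle_col="cycle", energy_col="discharge_energy"):
    """Compare mean energy before and after an RPT cycle (assumed definition)."""
    # Windows exclude the RPT cycle itself plus avoid_close_cycle cycles on each side.
    before_lo = rpt_cycle - avoid_close_cycle - neighbors_cycle
    before_hi = rpt_cycle - avoid_close_cycle - 1
    after_lo = rpt_cycle + avoid_close_cycle + 1
    after_hi = rpt_cycle + avoid_close_cycle + neighbors_cycle
    # Series.between is inclusive on both ends by default.
    before = df.loc[df[cycle_col].between(before_lo, before_hi), energy_col]
    after = df.loc[df[cycle_col].between(after_lo, after_hi), energy_col]
    mean_before, mean_after = before.mean(), after.mean()
    gain = mean_after - mean_before
    return pd.DataFrame([{
        "mean_before": mean_before,
        "mean_after": mean_after,
        "energy_gain": gain,
        "energy_gain_pct": 100.0 * gain / mean_before,
    }])
```

Skipping cycles close to the RPT cycle via `avoid_close_cycle` keeps transient readings right after the test from distorting the comparison.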
Retrieves DCIR data.
- Parameters:
  - df: Input cycloid DataFrame
- Returns: DataFrame with DCIR data
Plots DCIR data over cycles.
- Parameters:
  - df: Input DataFrame
  - hue: Color grouping (default: 'instance_number')
  - label_title: Legend title (default: 'Instance Number')
  - ax: Matplotlib axes object
Plots mean voltage data.
- Parameters:
  - df: Input DataFrame
  - hue: Color grouping (default: 'instance_number')
  - charging_state_name: Substate to plot (default: 'CHARGE')
  - label_title: Legend title
  - raw_data: Plot raw data (default: True)
  - window: Window size (default: 5)
  - sigma_multiplier: Outlier detection (default: 0.2)
  - ax: Matplotlib axes object
Creates subplot dashboard of up to 6 metrics.
- Parameters:
  - df: Input DataFrame
  - x: X-axis column (default: 'elapsed_minutes')
  - charging_state_name: Charging state (default: 'CHARGE')
  - hue: Color grouping
  - suptitle: Overall title
  - label_title: Legend title
  - legend: Show legend (default: True)
- read_data: Reads and processes data files
- filter_by_doe: Filters by DOE values
- filter_by_instance: Filters by instance values
- filter_by_cycle: Filters by cycle numbers
- filter_by_doe_instance: Filters by DOE-instance combinations
- get_cells_voltage_std: Calculates voltage standard deviation
- plot_cells_voltage_std: Plots voltage standard deviation
- plot_cells_voltage: Visualizes cell voltages
- get_polarization_voltage: Calculates polarization voltage
- plot_polarization_voltage: Plots polarization voltage
- plot_temperature: Visualizes temperature data across cycling states
Example usage of adhoc class methods:
# User input configuration
doe_number = 'MTSOWPhase2_Pack25CValidation'
instance_number = None
cycles = None
default_columns = True
override_query = None
# Fetch data
dfc = redshift.get_cycloid_data(
doe=doe_number,
instances=instance_number,
cycles=cycles,
default_columns=default_columns,
override_query=override_query
)
# Analysis
dcir = adhoc.get_dcir(dfc)
discharge_energy = adhoc.get_discharge_energy(dfc)
Handles data dumping to Redshift database with these key components:
- DBCredentials: Dataclass for database credentials
- load_credentials: Loads credentials from JSON
- check_database_exists: Verifies database existence
- create_db_and_dump_data: Creates database and imports data
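The first two components can be sketched with the standard library alone. The field names below (`host`, `port`, `database`, `user`, `password`) are assumptions about what the credentials file contains, not the actual schema used in this project.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DBCredentials:
    """Connection settings for the database (hypothetical field names)."""
    host: str
    port: int
    database: str
    user: str
    password: str


def load_credentials(path):
    """Load credentials from a JSON file into a DBCredentials instance."""
    with open(path, "r", encoding="utf-8") as handle:
        raw = json.load(handle)
    # Explicit key access raises KeyError early if a field is missing.
    return DBCredentials(
        host=raw["host"],
        port=int(raw["port"]),
        database=raw["database"],
        user=raw["user"],
        password=raw["password"],
    )
```

Keeping credentials in a frozen dataclass prevents accidental mutation after loading, and keeping the JSON file out of version control avoids leaking the password.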
Features the CapacityTimeUncertainty class for analyzing cycloid data:
- Calculates capacity and time uncertainties
- Handles data loading and parameter management
- Processes capacity bounds and uncertainties
- Implements data cleaning and outlier removal
Contains detailed data preprocessing for CapacityTimeUncertainty calculations.
Explains error limit variations across different current regimes for charge time and capacity uncertainty estimation.
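To illustrate how a regime-dependent current error limit can feed into capacity bounds, the sketch below integrates current over time and accumulates the worst-case error alongside it. The regime table format, the function names, and the linear worst-case summation are all assumptions for illustration; the actual error limits and propagation rule are not shown in this redacted version.

```python
def current_error_limit(current_a, regimes):
    """Return the absolute current error (A) for the regime covering |current_a|.

    regimes is a list of (upper_bound_a, error_a) pairs sorted by upper bound.
    """
    magnitude = abs(current_a)
    for upper_bound_a, error_a in regimes:
        if magnitude <= upper_bound_a:
            return error_a
    raise ValueError(f"current {current_a} A exceeds all configured regimes")


def capacity_with_bounds(currents_a, dt_s, regimes):
    """Integrate current samples into capacity (Ah) with worst-case bounds."""
    capacity_ah = 0.0
    uncertainty_ah = 0.0
    for current in currents_a:
        # Rectangle-rule integration with a fixed sample interval dt_s.
        capacity_ah += current * dt_s / 3600.0
        # Assumption: errors add linearly, giving a conservative bound.
        uncertainty_ah += current_error_limit(current, regimes) * dt_s / 3600.0
    return capacity_ah - uncertainty_ah, capacity_ah, capacity_ah + uncertainty_ah
```

A call such as `capacity_with_bounds(samples, 1.0, regimes)` returns the lower bound, the nominal capacity, and the upper bound in ampere-hours, where `samples` and `regimes` are supplied by the caller.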