This project analyzes LEGO sets and their associated themes to determine the most popular themes based on the number of sets available. The analysis is performed using SQL queries to extract relevant data from a database, and the results are visualized using matplotlib.
- Technologies Used
- Database Schema
- Installation
- Usage
- SQL Queries
- Visualization
- Contributing
- License
- Python
- SQLite (or any other SQL database)
- Pandas (for data analysis and manipulation)
- Matplotlib (for data visualization)
create table if not exists themes (
id smallint primary key,
name varchar(64),
parent_id smallint
);
create table if not exists colors (
id smallint primary key,
name varchar(64),
rgb varchar(6),
is_trans varchar(1)
);
create table if not exists part_categories (
id smallint primary key,
name varchar(64)
);
create table if not exists parts (
part_num varchar(16) primary key,
name varchar(255),
part_cat_id smallint,
part_material_id smallint,
foreign key(part_cat_id) references part_categories(id)
);
create table if not exists part_relationships (
rel_type varchar(1),
child_part_num varchar(20),
parent_part_num varchar(20)
);
create table if not exists elements (
element_id varchar(16) primary key,
part_num varchar(16),
color_id smallint,
foreign key(part_num) references parts(part_num),
foreign key(color_id) references colors(id)
);
create table if not exists minifigs (
fig_num varchar(20) primary key,
name varchar(256),
num_parts smallint
);
create table if not exists inventories (
id int primary key,
version smallint,
set_num varchar(16),
foreign key(set_num) references sets(set_num)
);
create table if not exists sets (
set_num varchar(16) primary key,
name varchar(128),
year smallint,
theme_id smallint,
num_parts int,
img_url varchar(128),
foreign key(theme_id) references themes(id)
);
create table if not exists inventory_parts (
inventory_id int,
part_num varchar(16),
color_id smallint,
quantity smallint,
is_spare varchar(1),
foreign key(inventory_id) references inventories(id),
foreign key(color_id) references colors(id)
);
create table if not exists inventory_sets (
inventory_id int,
set_num varchar(16),
quantity smallint,
foreign key(inventory_id) references inventories(id),
foreign key(set_num) references sets(set_num)
);
create table if not exists inventory_minifigs (
inventory_id int,
fig_num varchar(20),
quantity smallint,
foreign key(inventory_id) references inventories(id),
foreign key(fig_num) references minifigs(set_num)
);
- Clone the repository:
git clone https://github.com/mgadhvi/lego-sets-analysis.git cd lego-sets-analysis
- Install the required Python packages from the
requirements.txt
file:pip install -r requirements.txt
- Set up your SQLite database with the lego_sets and lego_themes tables from here
- Update the database connection string in the Python script to point to your SQLite database file.
- Run the Python script to execute the SQL queries and generate the pie chart:
python analyze_lego_themes.py
The main SQL query used to retrieve the top 10 most popular LEGO themes is as follows:
SELECT theme_name, COUNT(set_id) AS set_count
FROM lego_themes
LEFT JOIN lego_sets ON lego_themes.theme_id = lego_sets.theme_id
GROUP BY theme_name
ORDER BY set_count DESC
LIMIT 10;
This query counts the number of sets for each theme and orders the results to find the top 10 themes.
The results are visualized using a pie chart created with Matplotlib. The pie chart displays the distribution of the top 10 LEGO themes based on the number of sets.
Contributions are welcome! If you have suggestions for improvements or new features, please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.