[Edit] Python: Matplotlib: .boxplot() #7300

Draft
wants to merge 3 commits into
base: main
171 changes: 129 additions & 42 deletions content/matplotlib/concepts/pyplot/terms/boxplot/boxplot.md
@@ -1,86 +1,173 @@
---
Title: '.boxplot()'
Description: 'Returns a box and whisker plot.'
Description: 'Creates box-and-whisker plots to display a statistical summary of one or more datasets.'
Subjects:
- 'Data Science'
- 'Data Visualization'
Tags:
- 'Graphs'
- 'Libraries'
- 'Charts'
- 'Matplotlib'
- 'Statistics'
CatalogContent:
- 'learn-python-3'
- 'paths/computer-science'
- 'paths/data-science'
---

The **`.boxplot()`** is a method in the Matplotlib library that returns a box and whisker plot based on one or more arrays of data as input.
The **`matplotlib.pyplot.boxplot()`** function in Matplotlib's pyplot module creates box-and-whisker plots that summarize the distribution of one or more datasets. Each plot shows the median, the first quartile (Q1), the third quartile (Q3), and potential outliers in a compact visual format.

Box plots are widely used in statistical analysis and data science for comparing distributions across different groups, identifying outliers, and understanding the spread and central tendency of data. They are particularly valuable when analyzing multiple datasets simultaneously, as they provide a clear visual comparison of statistical properties across different categories or groups.

## Syntax

```pseudo
matplotlib.pyplot.boxplot(x, notch, sym, vert, whis, bootstrap, usermedians, conf_intervals, positions, widths, patch_artist, labels, manage_ticks, autorange, meanline, zorder )
matplotlib.pyplot.boxplot(x, notch=None, sym=None, vert=None, ...)
```

The `x` parameter is required, and represents an array or a sequence of vectors. Other parameters are optional and used to modify the features of the boxplot.
> **Note:** The ellipsis (`...`) indicates that many additional optional parameters are available, such as `widths`, `patch_artist`, `showmeans`, and `boxprops`. These parameters provide detailed control over the style, layout, and display of the boxplot.

**Parameters:**

`.boxplot()` takes the following arguments:
- `x`: The input data (array-like or sequence of arrays). Can be a 1D array for single boxplot or sequence of arrays for multiple boxplots.
- `notch`: Boolean, optional. If True, creates a notched boxplot to indicate confidence intervals around the median.
- `sym`: String, optional. Default symbol for outlier points. An empty string hides the outliers.
- `vert`: Boolean, optional. If True (default), plots boxes vertically. If False, plots horizontally.

- `x` : Takes in the data to be plotted as a list, array, or a sequence of arrays. Each array represents a different dataset to be plotted in the boxplot.
- `notch`: If `True`, a notch is drawn around the median to represent the median uncertainty.
- `sym`: A parameter is used to modify the designation of outliers. By default, outliers are represented as dots, if an empty string is passed any outliers in the data will not be visible in the plot.
- `vert`: If `True`, the boxplot is drawn vertically (default). If `False`, it is drawn horizontally.
- `whis`: This parameter is used to specify the whisker length as a multiple of the IQR. The default is 1.5, which is the standard length.
- `bootstrap`: Specifies whether to bootstrap the confidence intervals around the median for notched boxplots.
- `usermedians`: This parameter is used to pass in a sequence of medians to be used for each dataset.
- `conf_intervals`: If `True`, the confidence intervals around the median are drawn as notches.
- `positions`: This parameter is used to specify the positions of the boxes in the plot.
- `widths`: This parameter is used to specify the width of the boxes.
- `patch_artist`: If `True`, the boxes will be filled with color.
- `labels`: This parameter is used to pass in a list of labels to be used for each dataset.
- `meanline`: If `True`, a line is drawn at the mean value of each dataset.
- `zorder`: This parameter is used to specify the z-order of the plot. By default, the boxplot is drawn on top of other plot elements.
**Return value:**

## Examples
The method returns a dictionary containing the matplotlib artists used in the boxplot. The dictionary includes keys for 'boxes', 'medians', 'whiskers', 'caps', 'fliers', and 'means'.
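As a minimal sketch of working with this return value, the snippet below draws one boxplot and inspects the dictionary. The variable names (`rng`, `data`, `result`) are illustrative, and the `Agg` backend is an assumption made only so the code runs without a display:

```py
import matplotlib
matplotlib.use("Agg")  # Assumption: headless backend; remove to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np

# Illustrative sample data
rng = np.random.default_rng(42)
data = rng.normal(100, 15, 200)

# boxplot() returns a dictionary mapping component names to lists of artists
result = plt.boxplot(data)

# Print each key and how many artists it holds
for key, artists in result.items():
    print(key, len(artists))

# The artists can be restyled after the plot is drawn
result['medians'][0].set_color('red')

plt.close()
```

A single dataset produces one box, one median line, two whiskers, and two caps. The `'means'` list is only populated when `showmeans=True` is passed.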

Below are the examples demonstrating the use of `.boxplot()`.
## Example 1: Creating a Basic Boxplot using `matplotlib.pyplot.boxplot()`

This example demonstrates how to create a simple boxplot using randomly generated data:

```py
import matplotlib.pyplot as plt
import numpy as np

# Generate some random data
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
# Set random seed for reproducibility
np.random.seed(42)

# Create a box and whisker plot
plt.boxplot(data)
# Generate sample data
data = np.random.normal(100, 15, 200)

# Show the plot
# Create the boxplot
plt.figure(figsize=(8, 6))
plt.boxplot(data)
plt.title('Basic Boxplot Example')
plt.ylabel('Values')
plt.show()
```

Output:
The output of this code is:

![A simple boxplot showing the distribution of normally distributed data with median line, quartile box, whiskers, and outlier points](https://raw.githubusercontent.com/Codecademy/docs/main/media/boxplot1.png)

The code generates a dataset with 200 values following a normal distribution with mean 100 and standard deviation 15. The resulting boxplot displays the median as a horizontal line, the box representing the interquartile range (IQR), whiskers extending to the most extreme non-outlier data points, and any outliers as individual points.
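The quantities described above can also be computed by hand. The following sketch assumes the default whisker factor of 1.5 and compares a manual NumPy calculation with `matplotlib.cbook.boxplot_stats()`, the helper that computes boxplot statistics. The names `lower_fence` and `upper_fence` are illustrative:

```py
import numpy as np
from matplotlib import cbook

np.random.seed(42)
data = np.random.normal(100, 15, 200)

# Quartiles and interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box (the default whisker factor)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that lie inside the fences
whisker_low = data[data >= lower_fence].min()
whisker_high = data[data <= upper_fence].max()

# Points outside the fences are drawn as outliers (fliers)
outliers = data[(data < lower_fence) | (data > upper_fence)]

# Statistics as computed by Matplotlib for the same data
stats = cbook.boxplot_stats(data)[0]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"Whiskers: {whisker_low:.2f} to {whisker_high:.2f}")
print(f"Number of outliers: {len(outliers)}")
```

The manual values should agree with the `q1`, `med`, `q3`, `whislo`, `whishi`, and `fliers` entries in `stats`.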

![Output of matplotlib.pyplot.boxplot() method example 1](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplotlib-boxplot-example-1.png)
## Example 2: Multiple Dataset Comparison using the `matplotlib.pyplot.boxplot()` method

This example shows how to create boxplots for multiple datasets to compare their distributions:

```py
import matplotlib.pyplot as plt
import numpy as np

# Generate some random data
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
# Set random seed for reproducibility
np.random.seed(42)

# Generate multiple datasets with different characteristics
dataset1 = np.random.normal(80, 10, 100) # Lower mean, smaller spread
dataset2 = np.random.normal(100, 20, 100) # Higher mean, larger spread
dataset3 = np.random.exponential(25, 100) # Exponential distribution
dataset4 = np.random.uniform(50, 150, 100) # Uniform distribution

# Combine datasets
data = [dataset1, dataset2, dataset3, dataset4]

# Create multiple boxplots
plt.figure(figsize=(10, 6))
# Note: in Matplotlib 3.9 and later, `labels` is deprecated in favor of `tick_labels`
box_plot = plt.boxplot(data, labels=['Normal (80,10)', 'Normal (100,20)',
                                     'Exponential (25)', 'Uniform (50,150)'])
plt.title('Comparison of Different Distributions')
plt.ylabel('Values')
plt.xlabel('Distribution Type')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

The output of this code is:

![Four side-by-side boxplots comparing normal, exponential, and uniform distributions with different means and spreads](https://raw.githubusercontent.com/Codecademy/docs/main/media/boxplot2.png)

This example creates four different datasets with distinct statistical properties and displays them side by side. The boxplots make it easy to compare the medians, spreads, and presence of outliers across the different distributions.

# Create a box and whisker plot with some custom parameters
plt.boxplot(data, notch=True, sym='g+', vert=False, whis=0.75, bootstrap=10000, usermedians=[np.mean(d) for d in data], conf_intervals=None, patch_artist=True)
## Example 3: Customized Sales Performance Analysis on Boxplot

# Add labels and title
plt.xlabel('Value')
plt.ylabel('Group')
plt.title('Customized box and whisker plot')
This example demonstrates a real-world scenario analyzing quarterly sales performance across different product categories:

# Show the plot
```py
import matplotlib.pyplot as plt
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Simulate quarterly sales data (in thousands)
electronics = np.random.normal(150, 25, 50) # Electronics sales
clothing = np.random.normal(120, 30, 50) # Clothing sales
home_goods = np.random.normal(100, 20, 50) # Home goods sales
sports = np.random.normal(80, 15, 50) # Sports equipment sales

# Add some outliers to make it more realistic
electronics = np.append(electronics, [220, 250]) # High-performance months
clothing = np.append(clothing, [200, 40]) # Seasonal variations
home_goods = np.append(home_goods, [180]) # Holiday boost
sports = np.append(sports, [150, 30]) # Seasonal impact

# Combine all sales data
sales_data = [electronics, clothing, home_goods, sports]
categories = ['Electronics', 'Clothing', 'Home Goods', 'Sports']

# Create customized boxplot
plt.figure(figsize=(12, 8))
box_plot = plt.boxplot(sales_data,
                       labels=categories,
                       patch_artist=True,  # Fill with colors
                       notch=True,         # Show confidence intervals
                       showmeans=True)     # Show mean values

# Customize colors for each category
colors = ['lightblue', 'lightgreen', 'lightcoral', 'lightyellow']
for patch, color in zip(box_plot['boxes'], colors):
    patch.set_facecolor(color)

# Customize the plot appearance
plt.title('Quarterly Sales Performance Analysis by Product Category',
          fontsize=16, fontweight='bold')
plt.ylabel('Sales (in thousands USD)', fontsize=12)
plt.xlabel('Product Categories', fontsize=12)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
```

Output:
The output of this code is:

![Colorful customized boxplots showing quarterly sales performance across four product categories with notches and mean indicators](https://raw.githubusercontent.com/Codecademy/docs/main/media/boxplot3.png)

This example simulates a business scenario where sales data is analyzed across different product categories. The customized boxplot uses colors to distinguish categories, shows confidence intervals through notches, and displays mean values alongside medians. This visualization helps identify which product categories perform best and have the most consistent sales patterns.
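To compare means and medians numerically rather than only visually, the values can be read back from the returned artists. This is a minimal sketch assuming vertical boxes and two of the simulated categories. The `Agg` backend is an assumption made so the code runs without a display:

```py
import matplotlib
matplotlib.use("Agg")  # Assumption: headless backend; remove to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
electronics = np.random.normal(150, 25, 50)
clothing = np.random.normal(120, 30, 50)

# showmeans=True adds one mean marker per dataset to result['means']
result = plt.boxplot([electronics, clothing], showmeans=True)

names = ['Electronics', 'Clothing']
for name, median_line, mean_marker in zip(names, result['medians'], result['means']):
    # For vertical boxes, the y-data of each artist holds the statistic
    median_value = median_line.get_ydata()[0]
    mean_value = mean_marker.get_ydata()[0]
    print(f"{name}: median={median_value:.1f}, mean={mean_value:.1f}")

plt.close()
```

A large gap between a category's mean and median suggests a skewed distribution or influential outliers.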

## Frequently Asked Questions

### 1. What is a box plot in Matplotlib?

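A box plot summarizes a data distribution with a box spanning Q1 to Q3, a line at the median, and whiskers that by default extend to the most extreme data points within 1.5 times the IQR of the box. Points beyond the whiskers are shown individually as outliers.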

### 2. What is the difference between Seaborn Boxplot and Matplotlib Boxplot?

Seaborn's boxplot offers better default styling and easier categorical data handling, while Matplotlib's boxplot provides more low-level control and customization options.

### 3. How to plot a boxplot from a pandas DataFrame?

![Output of matplotlib.pyplot.boxplot() method example 2](https://raw.githubusercontent.com/Codecademy/docs/main/media/matplotlib-boxplot-example-2.png)
Pass DataFrame columns to `plt.boxplot([df['col1'], df['col2']])` or use pandas' built-in `df.boxplot()` method.
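Both options can be sketched as follows. The DataFrame and its column names `col1` and `col2` are hypothetical, pandas is assumed to be installed, and the `Agg` backend is used only so the code runs without a display:

```py
import matplotlib
matplotlib.use("Agg")  # Assumption: headless backend; remove to view the figures interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'col1': rng.normal(50, 5, 100),
    'col2': rng.normal(60, 10, 100),
})

# Option 1: pass the columns to Matplotlib as a list of Series
fig1, ax1 = plt.subplots()
result = ax1.boxplot([df['col1'], df['col2']])
ax1.set_xticklabels(['col1', 'col2'])

# Option 2: use the pandas wrapper, which labels the boxes by column name
fig2, ax2 = plt.subplots()
df.boxplot(column=['col1', 'col2'], ax=ax2)

plt.close('all')
```

Option 1 gives direct access to the returned artist dictionary, while option 2 is shorter when the default styling is sufficient.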
Binary file added media/boxplot1.png
Binary file added media/boxplot2.png
Binary file added media/boxplot3.png