
Commit 6d8ee89

Chapter 14: Add Example and Fix Index (#202)
* Fix dot points
* improve structure by adding an example
* Update DEPRECATED Line Magic
* fix typos and add emphasis
* minor improvements
* Update on !
1 parent 09de43a commit 6d8ee89

1 file changed (+20, -15 lines)

lectures/pandas.md

Lines changed: 20 additions & 15 deletions
````diff
@@ -354,21 +354,22 @@ df.loc[complexCondition]
 
 The ability to make changes in dataframes is important to generate a clean dataset for future analysis.
 
-1. We can use `df.where()` conveniently to "keep" the rows we have selected and replace the rest rows with any other values
+
+**1.** We can use `df.where()` conveniently to "keep" the rows we have selected and replace the rest rows with any other values
 
 ```{code-cell} python3
 df.where(df.POP >= 20000, False)
 ```
 
 
-2. We can simply use `.loc[]` to specify the column that we want to modify, and assign values
+**2.** We can simply use `.loc[]` to specify the column that we want to modify, and assign values
 
 ```{code-cell} python3
 df.loc[df.cg == max(df.cg), 'cg'] = np.nan
 df
 ```
 
-3. We can use the `.apply()` method to modify rows/columns as a whole
+**3.** We can use the `.apply()` method to modify *rows/columns as a whole*
 
 ```{code-cell} python3
 def update_row(row):
````
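The first three techniques in this hunk (`df.where()`, assignment through `.loc[]`, and row-wise `.apply()`) can be tried on a small stand-in frame. This is a minimal sketch, assuming a toy DataFrame with made-up values whose `POP` and `cg` column names only mirror the lecture's dataset; `double_pop` is a hypothetical row function, since the body of the lecture's `update_row` is not shown in this diff.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values); POP and cg only mirror the lecture's column names
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "POP": [10000.0, 25000.0, 40000.0],
    "cg": [0.1, 0.3, 0.2],
})

# 1. where(): rows meeting the condition are kept; every cell of the other rows becomes False
kept = df.where(df.POP >= 20000, False)

# 2. .loc[]: select the row(s) holding the largest cg and overwrite that column with NaN
df.loc[df.cg == max(df.cg), "cg"] = np.nan

# 3. apply(axis=1): the function receives each row as a Series and returns a new one
def double_pop(row):
    # hypothetical update: double the population, leave the other fields unchanged
    row = row.copy()
    row["POP"] = row["POP"] * 2
    return row

updated = df.apply(double_pop, axis=1)
print(kept)
print(updated)
```

Note that `where()` returns a new frame, while the `.loc[]` assignment modifies `df` in place.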
````diff
@@ -382,25 +383,29 @@ def update_row(row):
 df.apply(update_row, axis=1)
 ```
 
-4. We can use the `.applymap()` method to modify all individual entries in the dataframe altogether.
+**4.** We can use the `.applymap()` method to modify all *individual entries* in the dataframe altogether.
 
 ```{code-cell} python3
+# Round all decimal numbers to 2 decimal places
+df.applymap(lambda x : round(x,2) if type(x)!=str else x)
+```
+
+**Application: Missing Value Imputation**
+
+Replacing missing values is an important step in data munging.
+
+Let's randomly insert some NaN values
 
-# Let us randomly insert some NaN values
+```{code-cell} python3
 for idx in list(zip([0, 3, 5, 6], [3, 4, 6, 2])):
     df.iloc[idx] = np.nan
 
 df
 ```
 
-The `zip` function here creates pairs of values at the corresponding position of the two lists (i.e. [0,3], [3,4] ...)
-
-
-**Application: Missing Value Imputation**
-
-Replacing missing values is an important step in data munging.
+The `zip()` function here creates pairs of values from the two lists (i.e. [0,3], [3,4] ...)
 
-We can use the functions above to replace missing values
+We can use the `.applymap()` method again to replace all missing values with 0
 
 ```{code-cell} python3
 # replace all NaN values by 0
````
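The element-wise rounding, the `zip()`-driven NaN insertion and the NaN-to-0 replacement in this hunk can be chained on another toy frame. A minimal sketch under these assumptions: the values are made up; `elementwise` is a hypothetical helper that calls `DataFrame.map` when it exists (pandas 2.1 deprecated `applymap` in favour of `map`) and falls back to `applymap` on older versions; `nan_to_zero` is a hypothetical stand-in for the lecture's `replace_nan`, whose body lies outside this diff.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values)
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "POP": [1.234, 2.345, 3.456, 4.567],
    "cg": [0.111, 0.222, 0.333, 0.444],
})

def elementwise(frame, func):
    """Apply func to every cell; DataFrame.map superseded applymap in pandas 2.1."""
    if hasattr(frame, "map"):
        return frame.map(func)
    return frame.applymap(func)

# Round every non-string entry to 2 decimal places; strings pass through untouched
rounded = elementwise(df, lambda x: x if isinstance(x, str) else round(x, 2))

# zip pairs the i-th row position with the i-th column position: (0, 1), (2, 2)
positions = list(zip([0, 2], [1, 2]))
for idx in positions:
    df.iloc[idx] = np.nan  # idx is a (row, column) tuple of integer positions

def nan_to_zero(x):
    # hypothetical helper: a float NaN becomes 0, every other value is kept
    if isinstance(x, float) and np.isnan(x):
        return 0
    return x

filled = elementwise(df, nan_to_zero)
print(filled)
```

For this particular replacement the built-in `df.fillna(0)` gives the same values without a custom function; the element-wise route is useful when the rule depends on each cell's content.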
````diff
@@ -413,9 +418,9 @@ def replace_nan(x):
 df.applymap(replace_nan)
 ```
 
-Pandas also provides us with convenient methods to replace missing values
+Pandas also provides us with convenient methods to replace missing values.
 
-for example, single imputation using variable means can be easily done in pandas
+For example, single imputation using variable means can be easily done in pandas
 
 ```{code-cell} python3
 df = df.fillna(df.iloc[:,2:8].mean())
````
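Mean imputation with `fillna()` can be reproduced on a toy frame. A minimal sketch, assuming made-up values; the lecture's `df.iloc[:,2:8]` picks the numeric columns by position in its own dataset, so the positional slice here (`1:3`) is adjusted to the toy layout. Passing a Series to `DataFrame.fillna()` fills each column's NaNs with the entry under the matching column label, and columns absent from that Series are left unfilled.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one gap in each numeric column
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "POP": [10.0, np.nan, 30.0],
    "cg": [0.2, 0.4, np.nan],
})

# Column means of the numeric block; mean() skips NaN by default
means = df.iloc[:, 1:3].mean()

# A Series argument is matched to column labels, so each gap gets its own column's mean
imputed = df.fillna(means)
print(imputed)
```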
````diff
@@ -426,7 +431,7 @@ Missing value imputation is a big area in data science involving various machine
 
 There are also more [advanced tools](https://scikit-learn.org/stable/modules/impute.html) in python to impute missing values.
 
-### Standardization and Summarization
+### Standardization and Visualization
 
 Let's imagine that we're only interested in the population (`POP`) and total GDP (`tcgdp`).
 
````