@@ -354,21 +354,22 @@ df.loc[complexCondition]

The ability to make changes in dataframes is important to generate a clean dataset for future analysis.

- 1. We can use `df.where()` conveniently to "keep" the rows we have selected and replace the rest rows with any other values
+
+ **1.** We can use `df.where()` conveniently to "keep" the rows we have selected and replace the rest rows with any other values

```{code-cell} python3
df.where(df.POP >= 20000, False)
```
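To see what `df.where()` does to the rows that fail the condition, here is a minimal sketch on a toy dataframe. The column names mirror the lecture's dataset, but the values are made up for illustration and `toy`, `kept` and `masked` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data (values invented for illustration only)
toy = pd.DataFrame({"POP": [37335.653, 19053.186, 1006300.297],
                    "tcgdp": [295072.2, 541804.6, 1728144.3]})

# Rows where the condition holds are kept as they are;
# every entry of the other rows is replaced by the second argument.
kept = toy.where(toy.POP >= 20000, False)

# If no replacement value is given, the non-matching rows become NaN.
masked = toy.where(toy.POP >= 20000)

print(kept)
print(masked)
```

Note that `where` does not drop rows: the result has the same shape as the input, which is what distinguishes it from boolean selection with `.loc[]`.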


- 2. We can simply use `.loc[]` to specify the column that we want to modify, and assign values
+ **2.** We can simply use `.loc[]` to specify the column that we want to modify, and assign values

```{code-cell} python3
df.loc[df.cg == max(df.cg), 'cg'] = np.nan
df
```
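The same assignment can be sketched on a small, self-contained example. This variant uses the `Series.max()` method rather than the built-in `max`, which is a stylistic choice here and not the lecture's code. The data and the name `toy` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (values invented for illustration only)
toy = pd.DataFrame({"country": ["A", "B", "C"],
                    "cg": [5.2, 25.9, 11.7]})

# The boolean mask selects the rows, 'cg' selects the column,
# and the assignment overwrites only the selected cells.
toy.loc[toy.cg == toy.cg.max(), "cg"] = np.nan

print(toy)
```

Unlike `df.where()`, this modifies the dataframe in place.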

- 3. We can use the `.apply()` method to modify rows/columns as a whole
+ **3.** We can use the `.apply()` method to modify *rows/columns as a whole*

```{code-cell} python3
def update_row(row):
@@ -382,25 +383,29 @@ def update_row(row):
df.apply(update_row, axis=1)
```
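As a rough illustration of how `axis=1` passes each row to the function, here is a sketch with a hypothetical helper `halve_xrat`. It is not the `update_row` function from the lecture, whose body is not shown here, and the data are made up.

```python
import pandas as pd

# Hypothetical toy data (values invented for illustration only)
toy = pd.DataFrame({"POP": [9000.0, 37335.0],
                    "XRAT": [10.0, 2.0]})

def halve_xrat(row):
    # With axis=1, `row` is a Series indexed by the column names.
    row = row.copy()              # work on a copy, leave the input row alone
    row["XRAT"] = row["XRAT"] / 2
    return row

# The returned rows are stacked back into a new dataframe.
result = toy.apply(halve_xrat, axis=1)

print(result)
```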

- 4. We can use the `.applymap()` method to modify all individual entries in the dataframe altogether.
+ **4.** We can use the `.applymap()` method to modify all *individual entries* in the dataframe altogether.

```{code-cell} python3
+ # Round all decimal numbers to 2 decimal places
+ df.applymap(lambda x : round(x,2) if type(x)!=str else x)
+ ```
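One caveat when running this on a recent installation: from pandas 2.1 onwards, `DataFrame.applymap` is deprecated in favour of `DataFrame.map`, which does the same element-wise job. The sketch below picks whichever method is available. The toy data and the names `round_if_number` and `elementwise` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (values invented for illustration only)
toy = pd.DataFrame({"country": ["A", "B"],
                    "cc": [75.71581, 64.08922]})

def round_if_number(x):
    # Leave strings untouched, round everything else to 2 decimal places
    return x if isinstance(x, str) else round(x, 2)

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
rounded = elementwise(round_if_number)

print(rounded)
```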
+
+ **Application: Missing Value Imputation**
+
+ Replacing missing values is an important step in data munging.
+
+ Let's randomly insert some NaN values

- # Let us randomly insert some NaN values
+ ```{code-cell} python3
for idx in list(zip([0, 3, 5, 6], [3, 4, 6, 2])):
    df.iloc[idx] = np.nan

df
```
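To make the indexing explicit, the loop can be reproduced on a small numeric frame. Each element produced by `zip` is a `(row, column)` tuple, and `.iloc` reads such a tuple as one cell position. The frame `toy` and the names `rows`, `cols` and `pairs` are hypothetical.

```python
import numpy as np
import pandas as pd

rows = [0, 3, 5, 6]
cols = [3, 4, 6, 2]

# zip pairs the i-th element of each list
pairs = list(zip(rows, cols))
print(pairs)  # [(0, 3), (3, 4), (5, 6), (6, 2)]

# Hypothetical 8 x 7 frame of floats, large enough for every position above
toy = pd.DataFrame(np.arange(56, dtype=float).reshape(8, 7))

for idx in pairs:
    # idx is a (row position, column position) tuple
    toy.iloc[idx] = np.nan

print(toy.isna().sum().sum())  # 4
```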

- The `zip` function here creates pairs of values at the corresponding position of the two lists (i.e. [0,3], [3,4] ...)
-
-
- **Application: Missing Value Imputation**
-
- Replacing missing values is an important step in data munging.
+ The `zip()` function here creates pairs of values from the two lists (i.e. [0,3], [3,4] ...)

- We can use the functions above to replace missing values
+ We can use the `.applymap()` method again to replace all missing values with 0

```{code-cell} python3
# replace all NaN values by 0
@@ -413,9 +418,9 @@ def replace_nan(x):
df.applymap(replace_nan)
```

- Pandas also provides us with convenient methods to replace missing values
+ Pandas also provides us with convenient methods to replace missing values.

- for example, single imputation using variable means can be easily done in pandas
+ For example, single imputation using variable means can be easily done in pandas

```{code-cell} python3
df = df.fillna(df.iloc[:,2:8].mean())
@@ -426,7 +431,7 @@ Missing value imputation is a big area in data science involving various machine

There are also more [advanced tools](https://scikit-learn.org/stable/modules/impute.html) in python to impute missing values.
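Before reaching for those tools, it may help to see the mean imputation above on a frame small enough to check by hand. This is a minimal sketch with made-up data. The names `toy`, `col_means` and `imputed` are hypothetical, and the numeric columns are selected by name rather than by position.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (values invented for illustration only)
toy = pd.DataFrame({"country": ["A", "B", "C"],
                    "POP": [100.0, np.nan, 300.0],
                    "cg": [np.nan, 4.0, 6.0]})

# Column means over the numeric columns; NaN entries are skipped by default
col_means = toy[["POP", "cg"]].mean()

# Passing a Series to fillna fills each column with the value
# stored under that column's label.
imputed = toy.fillna(col_means)

print(imputed)
```

Here the missing `POP` becomes 200.0 and the missing `cg` becomes 5.0, the means of the observed values in each column.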

- ### Standardization and Summarization
+ ### Standardization and Visualization

Let's imagine that we're only interested in the population (`POP`) and total GDP (`tcgdp`).