Data Cleaning
In practice, data often arrives with missing values, null values, incorrect values and inappropriate values.
The biggest problem is missing values, which are very common in real-world data.
How can we handle them in Python? Let us see.
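# import the pandas and NumPy libraries
import pandas as pd
import numpy as np
# 5 rows of random numbers with the index labels a, c, e, f, h
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
# reindexing adds the labels b, d, g, which have no data, so their values become NaN
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df)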
Result:
        one       two     three
a  0.335319 -0.298568 -2.062935
b       NaN       NaN       NaN
c -1.739043 -0.912386 -0.675446
d       NaN       NaN       NaN
e -0.462957 -1.445715  1.483821
f  0.901405 -1.162616  0.173550
g       NaN       NaN       NaN
h -0.736636  1.685347  1.091092
In the above DataFrame, rows b, d and g show NaN (Not a Number), which pandas uses to mark missing data.
Next, let us check which values in a column are missing, using isnull().
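Why not just compare values with NaN? NaN is a special floating-point value that does not compare equal to anything, not even itself. That is why pandas provides dedicated functions such as isnull() (alias isna()). Here is a minimal sketch on a small made-up Series:

```python
import numpy as np
import pandas as pd

# Small made-up Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])

# NaN is not equal to itself, so == cannot find missing values
print(np.nan == np.nan)          # False
print((s == np.nan).tolist())    # [False, False, False]

# isnull() detects the missing value correctly
print(s.isnull().tolist())       # [False, True, False]
```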
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f',
                                                'h'], columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# isnull() returns True wherever a value is missing (NaN)
print(df['one'].isnull())
Result: 
a    False
b     True
c    False
d     True
e    False
f    False
g     True
h    False
Name: one, dtype: bool
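In practice we usually want a summary rather than a whole column of True/False values. Because True counts as 1, chaining sum() onto isnull() gives the number of missing values per column. The small DataFrame below is made-up example data:

```python
import numpy as np
import pandas as pd

# Small, fixed example data (made up for illustration) with a few gaps
df = pd.DataFrame(
    {"one": [1.0, np.nan, 3.0, np.nan],
     "two": [np.nan, 2.0, 3.0, 4.0]},
    index=["a", "b", "c", "d"],
)

# Number of missing values in each column
print(df.isnull().sum())

# Rows that contain at least one missing value (here a, b and d)
print(df[df.isnull().any(axis=1)])
```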
Now we know where the problem is. How do we clean it up?
Replace NaN with 0 using fillna().
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 3), index=['a', 'c', 'e'],
                  columns=['one', 'two', 'three'])
# keep a and c, add b (all NaN); e is dropped by the reindex
df = df.reindex(['a', 'b', 'c'])
print(df)
print("NaN replaced with '0':")
# fillna() returns a new DataFrame; df itself is unchanged
print(df.fillna(0))
Result:
        one       two     three
a  0.373935 -1.487100 -0.272034
b       NaN       NaN       NaN
c  0.686059  0.286542 -0.093683
NaN replaced with '0':
        one       two     three
a  0.373935 -1.487100 -0.272034
b  0.000000  0.000000  0.000000
c  0.686059  0.286542 -0.093683
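Filling with 0 can distort statistics such as the column mean. fillna() also accepts a Series or dict that maps column names to fill values, so a common alternative is to fill each column with its own mean. Here is a sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data with one gap per column
df = pd.DataFrame({"one": [1.0, np.nan, 3.0],
                   "two": [10.0, 20.0, np.nan]})

# df.mean() skips NaN by default and returns one value per column;
# fillna() then fills each gap with the mean of its own column
filled = df.fillna(df.mean())
print(filled)
```

A plain dict such as {'one': 0, 'two': 99} works the same way when you want a different constant per column.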
Instead of a constant, we can fill each gap with the last valid value above it. This is called forward fill, or 'pad', as shown in the script below.
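import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# forward fill ('pad'): each NaN row copies the last valid row above it.
# Older code writes df.fillna(method='pad'); that form is deprecated in
# recent pandas versions, and df.ffill() does the same thing.
print(df.ffill())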
Result:
        one       two     three
a -1.764189  1.336129  0.512163
b -1.764189  1.336129  0.512163
c  1.495126 -0.165035 -1.719821
d  1.495126 -0.165035 -1.719821
e  1.273926  0.606101  1.416004
f  1.901047  1.813446 -0.263735
g  1.901047  1.813446 -0.263735
h -1.900605  0.052075 -2.418204
Drop missing values with dropna(), as in the following example.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# dropna() removes every row that contains at least one NaN
print(df.dropna())
Result:
        one       two     three
a  1.177113 -0.471903 -0.779807
c -0.917548 -0.478030  0.128027
e -1.579338  0.950953 -2.017034
f -0.050153 -0.419798 -0.007029
h  1.207687 -1.491949 -0.895676
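Comparing this output with the reindexed DataFrame, we can see that rows b, d and g, which contained NaN, have been dropped.
The replace() method works much like fillna(), but instead of filling NaN it swaps specific values for new scalar values. Here the incorrect values 1000 and 2000 are replaced with 10 and 60: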
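By default dropna() removes any row with at least one NaN (how='any'). Its how, thresh and subset parameters give finer control, which helps when only some values in a row are missing. Here is a sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data: a has 2 valid values, b has none, c has 1, d has 3
df = pd.DataFrame(
    {"one": [np.nan, np.nan, 3.0, 4.0],
     "two": [2.0, np.nan, np.nan, 4.0],
     "three": [2.0, np.nan, np.nan, 4.0]},
    index=["a", "b", "c", "d"],
)

print(list(df.dropna().index))                # ['d']  any NaN -> dropped
print(list(df.dropna(how="all").index))       # ['a', 'c', 'd']  only all-NaN rows dropped
print(list(df.dropna(thresh=2).index))        # ['a', 'd']  need at least 2 valid values
print(list(df.dropna(subset=["one"]).index))  # ['c', 'd']  only column 'one' is checked
```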
import pandas as pd
import numpy as np
df = pd.DataFrame({'one': [10, 20, 30, 40, 50, 2000],
                   'two': [1000, 0, 30, 40, 50, 60]})
# the dict maps each old value to its replacement, in every column
print(df.replace({1000: 10, 2000: 60}))
Result:
   one  two
0   10   10
1   20    0
2   30   30
3   40   40
4   50   50
5   60   60
Hopefully these examples make the functions clear.
Happy data cleaning, and enjoy learning Python!