AGGREGATE
Data Frame Aggregation
Python offers several ways to aggregate data, mainly through the pandas and NumPy libraries. A pandas DataFrame supports aggregation directly through its agg() method. Let us see how to apply it:
import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [4, 5, 6, 7, 8],
                   [7, 8, 9, 10, 11],
                   [np.nan, np.nan, np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C', 'D', 'E'])
# Aggregate over the rows (axis=0, the default): one result per column
dfagg = df.agg(['sum', 'min'])
print(dfagg)
"""
        A     B     C     D     E
sum  12.0  15.0  18.0  21.0  24.0
min   1.0   2.0   3.0   4.0   5.0
"""
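The same call can also aggregate in the other direction. As a minimal sketch, assuming the DataFrame above is rebuilt so the snippet runs on its own, passing axis='columns' gives one result per row instead of one per column. The name row_sums is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [4, 5, 6, 7, 8],
                   [7, 8, 9, 10, 11],
                   [np.nan] * 5],
                  columns=['A', 'B', 'C', 'D', 'E'])

# axis='columns' aggregates across the columns, giving one value per row
row_sums = df.agg('sum', axis='columns')
print(row_sums)
```

The last row sums to 0.0 rather than NaN because sum skips missing values by default.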
Applying different aggregate functions to different columns
# Different aggregate functions per column.
# NaN in the result marks a function that was not requested for that column.
print(df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']}))
"""
A B
sum 12.0 NaN
min 1.0 2.0
max NaN 8.0
"""
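When each column needs only one function, the dictionary can map a column name to a single function name instead of a list. The sketch below assumes the same sample DataFrame, with col_stats as an illustrative name. The result is then a Series with one entry per column, so no NaN padding appears.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [4, 5, 6, 7, 8],
                   [7, 8, 9, 10, 11],
                   [np.nan] * 5],
                  columns=['A', 'B', 'C', 'D', 'E'])

# One function per column: the result is a Series indexed by column name
col_stats = df.agg({'A': 'sum', 'B': 'max'})
print(col_stats)
```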
The describe() method displays summary statistics for each column: count, mean, std, min, max, and the 25%, 50% and 75% quartiles. Missing values are excluded, which is why count is 3 here.
print(df.describe())
"""
A B C D E
count 3.0 3.0 3.0 3.0 3.0
mean 4.0 5.0 6.0 7.0 8.0
std 3.0 3.0 3.0 3.0 3.0
min 1.0 2.0 3.0 4.0 5.0
25% 2.5 3.5 4.5 5.5 6.5
50% 4.0 5.0 6.0 7.0 8.0
75% 5.5 6.5 7.5 8.5 9.5
max 7.0 8.0 9.0 10.0 11.0
"""
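The quartiles are only the default. As a hedged sketch, assuming the same sample DataFrame, describe() accepts a percentiles argument with values between 0 and 1, and the matching rows replace the 25% and 75% lines. The name summary is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [4, 5, 6, 7, 8],
                   [7, 8, 9, 10, 11],
                   [np.nan] * 5],
                  columns=['A', 'B', 'C', 'D', 'E'])

# Ask for the 10th, 50th and 90th percentiles instead of the quartiles
summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```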
Transforming and manipulating elements is just as easy. Let us look at some code snippets.
Suppose we want to add 1 to every element of the DataFrame above.
print(df.transform(lambda x: x + 1))
"""
A B C D E
0 2.0 3.0 4.0 5.0 6.0
1 5.0 6.0 7.0 8.0 9.0
2 8.0 9.0 10.0 11.0 12.0
3 NaN NaN NaN NaN NaN
"""
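Unlike agg(), transform() returns a result with the same shape as its input. The sketch below, which assumes the same sample DataFrame and uses illustrative variable names, checks this and confirms that the lambda gives the same values as plain arithmetic on the DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [4, 5, 6, 7, 8],
                   [7, 8, 9, 10, 11],
                   [np.nan] * 5],
                  columns=['A', 'B', 'C', 'D', 'E'])

plus_one = df.transform(lambda x: x + 1)

# transform keeps the original shape: same rows, same columns
same_shape = plus_one.shape == df.shape

# For simple element-wise arithmetic, df + 1 produces the same values
same_values = plus_one.equals(df + 1)
print(same_shape, same_values)
```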
We can use groupby() too.
df = pd.DataFrame({
    "Date": [
        "2019-05-08", "2019-05-07", "2019-05-06", "2019-05-05",
        "2019-05-08", "2019-05-07", "2019-05-06", "2019-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})
print(df)
"""
Date Data
0 2019-05-08 5
1 2019-05-07 8
2 2019-05-06 6
3 2019-05-05 1
4 2019-05-08 50
5 2019-05-07 100
6 2019-05-06 60
7 2019-05-05 120
"""
print(df.groupby('Date')['Data'].transform('sum'))
"""
0 55
1 108
2 66
3 121
4 55
5 108
6 66
7 121
Name: Data, dtype: int64
"""
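Because transform('sum') repeats each group total on every row of that group, it lines up with the original column. A minimal sketch, assuming the same Date/Data frame and using the illustrative names daily_totals and share, contrasts it with a plain aggregation and uses it to compute each row's share of its day's total.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [
        "2019-05-08", "2019-05-07", "2019-05-06", "2019-05-05",
        "2019-05-08", "2019-05-07", "2019-05-06", "2019-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})

# Aggregation: one row per group (4 dates)
daily_totals = df.groupby('Date')['Data'].sum()

# Transformation: one row per original row (8 rows), aligned with df
group_totals = df.groupby('Date')['Data'].transform('sum')

# Each row's share of its date's total
share = df['Data'] / group_totals
print(daily_totals)
print(share)
```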
We can group by different levels of a hierarchical index using the level parameter. Note the use of the pd.MultiIndex.from_arrays() method to build the index. A MultiIndex can be grouped by level position or by level name.
# Example arrays, for illustration only
arrays = [['male', 'male', 'female', 'female'],
          ['young', 'old', 'young', 'old']]
index = pd.MultiIndex.from_arrays(arrays, names=('Gender', 'Type'))
df = pd.DataFrame({'Max Enthu': [390., 350., 30., 20.]},
                  index=index)
print(df)
"""
Max Enthu
Gender Type
male young 390.0
old 350.0
female young 30.0
old 20.0
"""
# using level 0
print(df.groupby(level=0).mean())
"""
Max Enthu
Gender
female 25.0
male 370.0
"""
# Using level 1, selected by its name "Type"
print(df.groupby(level="Type").mean())
"""
Max Enthu
Type
old 185.0
young 210.0
"""
# Level 0 again, this time selected by its name "Gender"
print(df.groupby(level="Gender").mean())
"""
Max Enthu
Gender
female 25.0
male 370.0
"""
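Grouping by level combines naturally with agg(). As a hedged sketch, assuming the same MultiIndex frame and using the illustrative name by_type, several statistics can be computed per Type in one call.

```python
import pandas as pd

arrays = [['male', 'male', 'female', 'female'],
          ['young', 'old', 'young', 'old']]
index = pd.MultiIndex.from_arrays(arrays, names=('Gender', 'Type'))
df = pd.DataFrame({'Max Enthu': [390., 350., 30., 20.]}, index=index)

# Group on the "Type" index level and compute several aggregates at once
by_type = df.groupby(level='Type')['Max Enthu'].agg(['min', 'max', 'mean'])
print(by_type)
```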
Happy Learning at AMET!!!