PANDAS [Basics]
Pandas is a Python library for working with data series (one-dimensional), data frames (two-dimensional tables), and data sets in general.
Pandas is useful for statistical analysis, cleaning messy data sets, and exploring and manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
Pandas can find the correlation between two or more columns. Aggregates such as sum, average, min and max are easy to compute.
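As a minimal sketch of the correlation and aggregate functions just mentioned, the snippet below uses a small made-up table. The column names Hours and Marks and their values are assumptions for illustration only; they are not part of the original examples.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
scores = pd.DataFrame({
    'Hours': [1, 2, 3, 4],
    'Marks': [10, 20, 30, 40],
})

# Pairwise (Pearson) correlation between the numeric columns.
corr_matrix = scores.corr()
print(corr_matrix)

# Common aggregates on a single column.
print(scores['Marks'].sum())   # 100
print(scores['Marks'].mean())  # 25.0
print(scores['Marks'].min())   # 10
print(scores['Marks'].max())   # 40
```

Because Marks is exactly ten times Hours here, the correlation between the two columns comes out as 1.0.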
To use it in Python, import it with import pandas (by convention, import pandas as pd).
It is used to create, manipulate and extract data from DataFrames.
import pandas as pd
biodata = {
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    'Age': [23, 23, 24, 21],
}
biof1 = pd.DataFrame(biodata)
print('bio 1=', biof1)
"""
bio 1=    Name  Age
0   Raj   23
1   Ram   23
2  Sita   24
3  Laks   21
"""
The DataFrame can be saved to a CSV file with to_csv(). By default the row index is written as an extra first column.
biof1.to_csv('bio.csv')  # save to the file bio.csv
biodata1 = {
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    # 'Age': [23, 23, 24, 21],
    'Qual': ['UG', 'PG', 'Phd', 'Phd'],
    'Status': ['s', 's', 'm', 'b'],
}
Note that a dictionary is converted to a DataFrame by pd.DataFrame().
If we build a second DataFrame, biof2, from this dictionary, we can combine the two with pd.concat() as below:
biof2 = pd.DataFrame(biodata1)
print(biof2)
"""
   Name  Qual  Status
0   Raj    UG       s
1   Ram    PG       s
2  Sita   Phd       m
3  Laks   Phd       b
"""
result = pd.concat([biof1, biof2])
print(result)
"""
   Name   Age  Qual  Status
0   Raj  23.0   NaN     NaN
1   Ram  23.0   NaN     NaN
2  Sita  24.0   NaN     NaN
3  Laks  21.0   NaN     NaN
0   Raj   NaN    UG       s
1   Ram   NaN    PG       s
2  Sita   NaN   Phd       m
3  Laks   NaN   Phd       b
"""
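biof1 and biof2 are now stacked row by row. Notice the NaN values in the printout: each DataFrame lacks some of the other's columns, so pandas fills the gaps with NaN (and Age becomes a float column). How can we combine them without this problem?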
result1 = pd.concat([biof1, biof2], axis = 1)
print(result1)
"""
   Name  Age  Name  Qual  Status
0   Raj   23   Raj    UG       s
1   Ram   23   Ram    PG       s
2  Sita   24  Sita   Phd       m
3  Laks   21  Laks   Phd       b
"""
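# If both DataFrames have the same columns, pd.concat() with the default axis=0 stacks the rows without NaN.
# DataFrame.append() was removed in pandas 2.0, so use pd.concat() instead.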
So easy. axis=1 means the DataFrames are joined side by side, by columns. Note that the Name column now appears twice.
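With axis=1 the Name column is duplicated in the output above. One possible way to avoid that, sketched here on the assumption that Name identifies each row in both frames, is to join the frames on that shared column with pd.merge(). The post itself does not use this function; it is offered only as an alternative.

```python
import pandas as pd

biof1 = pd.DataFrame({
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    'Age': [23, 23, 24, 21],
})
biof2 = pd.DataFrame({
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    'Qual': ['UG', 'PG', 'Phd', 'Phd'],
    'Status': ['s', 's', 'm', 'b'],
})

# Join on the shared 'Name' column so that it appears only once.
merged = pd.merge(biof1, biof2, on='Name')
print(merged)
```

The result has the four columns Name, Age, Qual and Status, with one row per person.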
Now let us look at Series and at how a DataFrame can be accessed by row using df.loc[] and by column using df["colname"]. A single row or a single column selected this way is returned as a Series:
#refer to the row index:
print('\n',df.loc[0]) # First Row only
"""First Row
Name Annamalai
Age 90
Name: 0, dtype: object
"""
print('\n',df.loc[0:1]) # First, Second Row only
"""
Name Age
0 Annamalai 90
1 AMET 29
"""
print('\n',df["Age"]) # Accessing only Age Column values
"""
0 90
1 29
2 25
3 20
Name: Age, dtype: object
"""
To read a CSV file and load it into the DataFrame df:
df = pd.read_csv('bio.csv')
print(df)
"""
   Unnamed: 0  Name  Age
0           0   Raj   23
1           1   Ram   23
2           2  Sita   24
3           3  Laks   21
"""
To read a JSON file and print its first and last rows:
import pandas as pd
df = pd.read_json('data.json')
print(df.head(5).to_string())
"""
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
"""
print(df.tail(5).to_string())
"""
     Duration  Pulse  Maxpulse  Calories
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4
"""
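Please note that a JSON object maps naturally to a Python dictionary. The two formats look very similar but are not identical: JSON uses true, false and null where Python uses True, False and None, and JSON strings must be in double quotes.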
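A short sketch of how a JSON object and a Python dictionary relate, using the standard json module. The record contents are made up for illustration.

```python
import json

# A JSON object as text. Note the lowercase true and the null.
text = '{"name": "Raj", "passed": true, "score": null}'

record = json.loads(text)  # JSON text -> Python dict
print(record)              # {'name': 'Raj', 'passed': True, 'score': None}

back = json.dumps(record)  # Python dict -> JSON text
print(back)                # {"name": "Raj", "passed": true, "score": null}
```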
Reading a JSON file with load() and writing one with dump():
import json
# Read data.json into a Python object.
with open('data.json') as f:
    data = json.load(f)
# Write the same data to data_new.json, indented by two spaces.
with open('data_new.json', 'w') as f:
    json.dump(data, f, indent=2)
print("JSON file created from data.json file")