PANDAS [Basics]..

Pandas is a Python library for working with data sets: one-dimensional Series and two-dimensional, table-like DataFrames.

Pandas is useful for statistical analysis, cleaning messy data sets, and exploring and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

Pandas can find the correlation between two or more columns. Aggregate values such as sum, average, min and max can be computed easily.
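
The correlation and aggregate functions mentioned above can be sketched as follows. This is only a minimal illustration: the scores DataFrame and its Hours and Marks columns are made-up sample data, not part of the original post.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
scores = pd.DataFrame({
    'Hours': [1, 2, 3, 4],
    'Marks': [10, 20, 30, 40],
})

# Pearson correlation between two columns.
corr = scores['Hours'].corr(scores['Marks'])
print('correlation =', corr)

# Aggregate functions on a single column.
print('sum  =', scores['Marks'].sum())
print('mean =', scores['Marks'].mean())
print('min  =', scores['Marks'].min())
print('max  =', scores['Marks'].max())

# Several aggregates at once, computed for every column.
summary = scores.agg(['sum', 'mean', 'min', 'max'])
print(summary)
```

Calling scores.corr() without arguments would instead return the full correlation matrix for all numeric columns.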

To use it in Python, import it with import pandas (by convention, import pandas as pd).

It is used to create and manipulate DataFrames and to extract data from them.

import pandas as pd
biodata = {
'Name':['Raj', 'Ram','Sita', 'Laks'],
'Age' :[ 23, 23, 24, 21],
}

biof1 = pd.DataFrame(biodata)
print('bio 1=', biof1)

"""
bio 1=    Name  Age
0   Raj   23
1   Ram   23
2  Sita   24
3  Laks   21
"""

biof1.to_csv('bio.csv') #to save in a bio.csv file

The to_csv() call above saves the DataFrame to the file bio.csv in the current working directory.
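
By default to_csv() also writes the row index as an unnamed first column. A minimal sketch of saving without it, assuming the same biodata as above; the temporary folder and the variable names path and restored are illustrative choices, not part of the original post:

```python
import os
import tempfile

import pandas as pd

biof1 = pd.DataFrame({
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    'Age': [23, 23, 24, 21],
})

# A temporary folder is used only so that this sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bio.csv')

    # index=False skips the row labels, so only Name and Age are written.
    biof1.to_csv(path, index=False)

    # Reading the file back gives the same two columns.
    restored = pd.read_csv(path)

print(restored)
```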

biodata1 = {
'Name':['Raj', 'Ram','Sita', 'Laks'],
# 'Age' :[ 23, 23, 24, 21],
'Qual' :[ 'UG','PG', 'Phd','Phd'],
"Status" :['s','s','m','b']}

Note that a dictionary is converted to a DataFrame by pd.DataFrame().

If we have another DataFrame biof2, the two can be combined with pd.concat() as below:


biof2 = pd.DataFrame(biodata1)
print(biof2)
"""
   Name Qual Status
0   Raj   UG      s
1   Ram   PG      s
2  Sita  Phd      m
3  Laks  Phd      b
"""


result = pd.concat([biof1, biof2])
print(result)
"""
   Name   Age Qual Status
0   Raj  23.0  NaN    NaN
1   Ram  23.0  NaN    NaN
2  Sita  24.0  NaN    NaN
3  Laks  21.0  NaN    NaN
0   Raj   NaN   UG      s
1   Ram   NaN   PG      s
2  Sita   NaN  Phd      m
3  Laks   NaN  Phd      b
"""

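biof1 and biof2 are combined, but notice the NaN values in the printout. By default pd.concat() stacks the rows of one DataFrame below the other (axis=0), and any column missing from one of them is filled with NaN. To place the two DataFrames side by side instead, pass axis=1: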

result1 = pd.concat([biof1, biof2], axis = 1)
print(result1)
"""
   Name  Age  Name Qual Status
0   Raj   23   Raj   UG      s
1   Ram   23   Ram   PG      s
2  Sita   24  Sita  Phd      m
3  Laks   21  Laks  Phd      b
"""
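# If both DataFrames have the same columns, pd.concat([df_a, df_b]) stacks them without NaN values.
# Older pandas used DataFrame.append() for this, but it was removed in pandas 2.0.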

So easy. axis=1 means the DataFrames are joined by columns (side by side), with rows matched by their index labels.
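
With axis=1 the Name column appears twice, because concat only places the two DataFrames next to each other. When both share a key column, one alternative is pd.merge(), which the original post does not use. A minimal sketch, assuming the same biodata and biodata1 as above, with merged as an illustrative name:

```python
import pandas as pd

biof1 = pd.DataFrame({
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    'Age': [23, 23, 24, 21],
})
biof2 = pd.DataFrame({
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    'Qual': ['UG', 'PG', 'Phd', 'Phd'],
    'Status': ['s', 's', 'm', 'b'],
})

# Match rows by the shared Name column instead of by position,
# so Name is kept only once in the result.
merged = pd.merge(biof1, biof2, on='Name')
print(merged)
```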

Now let us see how a DataFrame can be accessed by row using df.loc[] and by column using df["colname"]. A single row or a single column is returned as a Series. (Here df is a four-row DataFrame with Name and Age columns.)

#refer to the row index:
print('\n',df.loc[0]) # First Row only
"""First Row 
Name    Annamalai
Age            90
Name: 0, dtype: object
"""
print('\n',df.loc[0:1]) # First, Second Row only
"""

         Name Age
0  Annamalai  90
1       AMET  29
"""


print('\n',df["Age"])  # Accessing only Age Column values
"""
0    90
1    29
2    25
3    20
Name: Age, dtype: object
"""

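The df used above is not defined in the post. A minimal sketch of how a Series can be defined and how rows and columns are selected, assuming made-up names and ages in place of the original data:

```python
import pandas as pd

# A Series is a one-dimensional labelled array; these values are made up.
ages = pd.Series([23, 23, 24, 21],
                 index=['Raj', 'Ram', 'Sita', 'Laks'],
                 name='Age')
print(ages['Sita'])          # access a value by its label

# A small DataFrame with the default integer index 0..3.
df = pd.DataFrame({
    'Name': ['Raj', 'Ram', 'Sita', 'Laks'],
    'Age': [23, 23, 24, 21],
})

first_row = df.loc[0]        # one row, returned as a Series
first_two = df.loc[0:1]      # label slice; both end points are included
age_column = df['Age']       # one column, returned as a Series

print(first_row)
print(first_two)
print(age_column)
```

Note that .loc is used with square brackets, not parentheses, and that its slices include the last label, unlike ordinary Python slices.
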
To read the CSV file and load it into a DataFrame df (the Unnamed: 0 column is the row index that to_csv() saved):

df = pd.read_csv('bio.csv')
print(df)
"""
   Unnamed: 0  Name  Age
0           0   Raj   23
1           1   Ram   23
2           2  Sita   24
3           3  Laks   21
"""

To read a JSON file and print its first and last five rows:

import pandas as pd
df = pd.read_json('data.json')
print(df.head(5).to_string())
"""
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
"""
print(df.tail(5).to_string())
"""
     Duration  Pulse  Maxpulse  Calories
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4
"""

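Please note that JSON objects have nearly the same format as Python dictionaries, and a JSON object is loaded as a Python dictionary. JSON itself is a text format, though, and differs in a few details (true, false and null instead of True, False and None).

Reading a JSON file with json.load() and writing one with json.dump():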
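
The relation between JSON text and a Python dictionary can be sketched as below. The JSON string and its field names are made up for illustration, and json.loads() is used (rather than json.load()) only because the data is in a string instead of a file.

```python
import json

import pandas as pd

# A made-up JSON document held in a string; the field names are illustrative.
text = '{"Duration": [60, 45], "Pulse": [110, null], "Resting": [true, false]}'

# json.loads() parses JSON text into Python objects.
data = json.loads(text)
print(type(data))            # the JSON object is now a Python dict
print(data['Pulse'])         # JSON null is converted to Python None
print(data['Resting'])       # JSON true/false are converted to True/False

# The resulting dict can be passed to pd.DataFrame() like any other dict.
df = pd.DataFrame(data)
print(df)
```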


import json
with open('data.json') as f:
    data = json.load(f)


with open('data_new.json', 'w') as f:
    json.dump(data, f, indent=2)
    print("JSON file created from data.json file")

AMET-SOLID

Tuesday, 22 March 2022

Pandas#01
