Sunday, 17 April 2022

P#26 Natural Language Processing

NLTK

It is very easy to tokenize a statement with Python's NLTK library.

import nltk

# One-time download of the tokenizer data (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')

word_data = 'Sitting pretty,impatient, work from home, we can work from home'

tokens = nltk.word_tokenize(word_data)

print(tokens)

# ['Sitting', 'pretty', ',', 'impatient', ',', 'work', 'from', 'home', ',', 'we', 'can', 'work', 'from', 'home']

Another Exercise:

sentence_data = "Johhny Jonny Yes papa. Eating Sugar No. NO. papa "

nltk_tokens = nltk.sent_tokenize(sentence_data)

print(nltk_tokens)

# ['Johhny Jonny Yes papa.', 'Eating Sugar No.', 'NO.', 'papa']
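To see roughly what the two tokenizers are doing, here is a minimal sketch that uses only Python's built-in re module. It is an approximation for illustration, not NLTK's actual algorithm, and the helper names simple_word_tokenize and simple_sent_tokenize are made up for this example.

```python
import re

def simple_word_tokenize(text):
    """Split text into words and single punctuation marks (rough approximation)."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation character
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text):
    """Split text after '.', '!' or '?' when whitespace follows (rough approximation)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

words = simple_word_tokenize('Sitting pretty,impatient, work from home')
sentences = simple_sent_tokenize("Johhny Jonny Yes papa. Eating Sugar No. NO. papa ")

print(words)
print(sentences)
```

Real text has abbreviations, decimals and quotes that this sketch would split wrongly, which is why a trained tokenizer such as the one in NLTK is the better choice in practice.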

Happy Learning @AMET ODL!!!

Friday, 15 April 2022

Python - Date and Time


import datetime

print('The Date Today is  :', datetime.datetime.today())


date_today = datetime.date.today()

print(date_today)

print('This Year   :', date_today.year)

print('This Month    :', date_today.month)

print('Month Name:',date_today.strftime('%B'))

print('Day of Month    :', date_today.day)

print('Week Day Name:',date_today.strftime('%A'))

"""
The Date Today is  : 2022-04-15 22:56:14.461386
2022-04-15
This Year   : 2022
This Month    : 4
Month Name: April
Day of Month    : 15
Week Day Name: Friday
"""

DATE TIME ARITHMETIC

import datetime

# First Date
day1 = datetime.date(2020, 2, 12)
print('day1:', day1.ctime())

#  Second Date
day2 = datetime.date(2019, 8, 18)
print ('day2:', day2.ctime())

# Difference between the dates
print('Number of Days:', day1 - day2)

date_today = datetime.date.today()

# Create a delta of Four Days
no_of_days = datetime.timedelta(days=4)

# Use Delta for Past Date -
before_four_days = date_today - no_of_days
print('Before Four Days:', before_four_days)

# Use Delta for future Date +
after_four_days = date_today + no_of_days
print('After Four Days:', after_four_days)

"""
day1: Wed Feb 12 00:00:00 2020
day2: Sun Aug 18 00:00:00 2019
Number of Days: 178 days, 0:00:00
Before Four Days: 2022-04-11
After Four Days: 2022-04-19
"""

Happy Learning at AMET ODL!👦👧👥👭👬

P#25 Pandas Data Frame set_index() and reset_index()

Data Frame set_index() and reset_index()

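These methods are very useful: set_index() turns a column into the index of the Data Frame, and reset_index() turns the index back into an ordinary column, so we can change the index whenever we need to.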

import pandas as pd


drinks = pd.read_csv('http://bit.ly/drinksbycountry')

print(drinks.shape) #(193, 6)

print(drinks.index) #RangeIndex(start=0, stop=193, step=1)

print(drinks.columns)
"""Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')
"""

drinks.set_index('continent',inplace=True)


print(drinks.index)

"""
Index(['Asia', 'Europe', 'Africa', 'Europe', 'Africa', 'North America',
       'South America', 'Europe', 'Oceania', 'Europe',
       ...
       'Africa', 'North America', 'South America', 'Asia', 'Oceania',
       'South America', 'Asia', 'Asia', 'Africa', 'Africa'],
      dtype='object', name='continent', length=193)
"""

print(drinks.head())

"""
              country  ...  total_litres_of_pure_alcohol
continent               ...                              
Asia       Afghanistan  ...                           0.0
Europe         Albania  ...                           4.9
Africa         Algeria  ...                           0.7
Europe         Andorra  ...                          12.4
Africa          Angola  ...                           5.9

[5 rows x 5 columns]
"""

print(drinks.shape) #(193, 5)

print(drinks.loc['Europe', 'spirit_servings'])
"""
continent
Europe    132
Europe    138
....
Europe    237
Europe    126
Name: spirit_servings, dtype: int64
"""

drinks.index.name = None


print(drinks.index)

"""
Index(['Asia', 'Europe', 'Africa', 'Europe', 'Africa', 'North America',
       'South America', 'Europe', 'Oceania', 'Europe',
       ...
       'Africa', 'North America', 'South America', 'Asia', 'Oceania',
       'South America', 'Asia', 'Asia', 'Africa', 'Africa'],
      dtype='object', length=193)

"""

drinks.index.name='continent'

drinks.reset_index(inplace=True)  # Resetting Index inplace

print(drinks.head(2))

"""
 continent      country  ...  wine_servings  total_litres_of_pure_alcohol
0      Asia  Afghanistan  ...              0                           0.0
1    Europe      Albania  ...             54                           4.9

[2 rows x 6 columns]
"""

The above code snippets and outputs show how easy it is to work with Data Frames.
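The same two methods also work without inplace=True, in which case they return a new Data Frame and leave the original untouched. Here is a minimal sketch on a tiny hand-made frame; the three rows are illustrative sample values typed in for this example, not a download of the drinks file.

```python
import pandas as pd

# Small illustrative sample standing in for the drinks dataset
sample = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria'],
    'beer_servings': [0, 89, 25],
    'continent': ['Asia', 'Europe', 'Africa'],
})

indexed = sample.set_index('continent')    # new frame; continent is now the index
europe_beer = indexed.loc['Europe', 'beer_servings']
restored = indexed.reset_index()           # continent is an ordinary column again

print(indexed)
print(europe_beer)
print(restored)
```

Because nothing is changed in place, sample still has its continent column after these calls.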

Happy Learning @ AMET ODL!!!!

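P#24 Pandas loc() and iloc()

Differences between loc() and iloc()

With these indexers, we can filter rows and columns much as we would with SQL: loc selects by label, while iloc selects by integer position. Both are used with square brackets, as in df.loc[...] and df.iloc[...]. Let us dive into this exercise.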

import pandas as pd

df = pd.read_csv('uforeports.csv')

# df = pd.read_csv('http://bit.ly/uforeports')  # Alternatively, read directly from the web

print(df.shape)  # (rows, columns): (18241, 5)

# Print the frame itself. Note that describe needs parentheses: df.describe() returns
# summary statistics, while df.describe alone is only the method object.
print(df)
"""
                       City Colors Reported  ... State              Time
0                    Ithaca             NaN  ...    NY    6/1/1930 22:00
1               Willingboro             NaN  ...    NJ   6/30/1930 20:00
2                   Holyoke             NaN  ...    CO   2/15/1931 14:00
3                   Abilene             NaN  ...    KS    6/1/1931 13:00
4      New York Worlds Fair             NaN  ...    NY   4/18/1933 19:00
...                     ...             ...  ...   ...               ...
18236            Grant Park             NaN  ...    IL  12/31/2000 23:00
18237           Spirit Lake             NaN  ...    IA  12/31/2000 23:00
18238           Eagle River             NaN  ...    WI  12/31/2000 23:45
18239           Eagle River             RED  ...    WI  12/31/2000 23:45
18240                  Ybor             NaN  ...    FL  12/31/2000 23:59

[18241 rows x 5 columns]
"""



# print(df.head(5))  # to Print top 5 rows
"""
                   City Colors Reported Shape Reported State             Time
0                Ithaca             NaN       TRIANGLE    NY   6/1/1930 22:00
1           Willingboro             NaN          OTHER    NJ  6/30/1930 20:00
2               Holyoke             NaN           OVAL    CO  2/15/1931 14:00
3               Abilene             NaN           DISK    KS   6/1/1931 13:00
4  New York Worlds Fair             NaN          LIGHT    NY  4/18/1933 19:00
"""

# print(df.tail(5))  # to Print bottom 5 rows

"""
              City Colors Reported Shape Reported State              Time
18236   Grant Park             NaN       TRIANGLE    IL  12/31/2000 23:00
18237  Spirit Lake             NaN           DISK    IA  12/31/2000 23:00
18238  Eagle River             NaN            NaN    WI  12/31/2000 23:45
18239  Eagle River             RED          LIGHT    WI  12/31/2000 23:45
18240         Ybor             NaN           OVAL    FL  12/31/2000 23:59
"""


# To select a single row by its index label

# print(df.loc[6])

"""
City                  Crater Lake
Colors Reported               NaN
Shape Reported             CIRCLE
State                          CA
Time               6/15/1935 0:00
Name: 6, dtype: object
"""

print(df.loc[0:2])

"""
          City Colors Reported Shape Reported State             Time
0       Ithaca             NaN       TRIANGLE    NY   6/1/1930 22:00
1  Willingboro             NaN          OTHER    NJ  6/30/1930 20:00
2      Holyoke             NaN           OVAL    CO  2/15/1931 14:00

"""

print(df.loc[0:2, :])  # same output as above


print(df.loc[0:2, 'City':'State'])  # Rows 0 to 2, columns City through State (the Time column is left out)
"""
          City Colors Reported Shape Reported State
0       Ithaca             NaN       TRIANGLE    NY
1  Willingboro             NaN          OTHER    NJ
2      Holyoke             NaN           OVAL    CO
"""

print(df.loc[: , 'City':'State'])  # To print all rows
"""
                       City Colors Reported Shape Reported State
0                    Ithaca             NaN       TRIANGLE    NY
1               Willingboro             NaN          OTHER    NJ
2                   Holyoke             NaN           OVAL    CO
3                   Abilene             NaN           DISK    KS
4      New York Worlds Fair             NaN          LIGHT    NY
...                     ...             ...            ...   ...
18236            Grant Park             NaN       TRIANGLE    IL
18237           Spirit Lake             NaN           DISK    IA
18238           Eagle River             NaN            NaN    WI
18239           Eagle River             RED          LIGHT    WI
18240                  Ybor             NaN           OVAL    FL

[18241 rows x 4 columns]

"""
print(df.head(3).drop('Time', axis=1))  # First 3 rows with the Time column dropped

"""
          City Colors Reported Shape Reported State
0       Ithaca             NaN       TRIANGLE    NY
1  Willingboro             NaN          OTHER    NJ
2      Holyoke             NaN           OVAL    CO
"""
print(df.loc[:, 'City':'State'])  # All rows; the Time column is left out of the slice

print(df.loc[: , 'City'])  # To print all rows and city column only

"""
0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object

"""

print(df[df.City == 'Abilene'])  # Filter rows by value

"""
          City Colors Reported Shape Reported State              Time
3      Abilene             NaN           DISK    KS    6/1/1931 13:00
6654   Abilene             NaN       TRIANGLE    TX     9/1/1991 1:00
8357   Abilene             NaN         SPHERE    TX    7/15/1995 0:00
8783   Abilene             NaN            NaN    KS  10/14/1995 23:20
10883  Abilene             NaN            NaN    TX  10/19/1997 20:45

"""

print(df[df.City == 'Abilene'].State) # Filter by value, select column

"""
3        KS
6654     TX
8357     TX
8783     KS
10883    TX
"""

print(df.columns)
# Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

print(df.iloc[0:3, 0:3])  # First 3 rows and first 3 columns, selected by position

"""
          City Colors Reported Shape Reported
0       Ithaca             NaN       TRIANGLE
1  Willingboro             NaN          OTHER
2      Holyoke             NaN           OVAL

"""

print(df.iloc[0:3 , :]) #Filter 3 rows and all columns
"""
          City Colors Reported Shape Reported State             Time
0       Ithaca             NaN       TRIANGLE    NY   6/1/1930 22:00
1  Willingboro             NaN          OTHER    NJ  6/30/1930 20:00
2      Holyoke             NaN           OVAL    CO  2/15/1931 14:00

"""

print(df.loc[:, ['City','Time']])  # to filter only two columns for all rows

"""
                       City              Time
0                    Ithaca    6/1/1930 22:00
1               Willingboro   6/30/1930 20:00
2                   Holyoke   2/15/1931 14:00
3                   Abilene    6/1/1931 13:00
4      New York Worlds Fair   4/18/1933 19:00
...                     ...               ...
18236            Grant Park  12/31/2000 23:00
18237           Spirit Lake  12/31/2000 23:00
18238           Eagle River  12/31/2000 23:45
18239           Eagle River  12/31/2000 23:45
18240                  Ybor  12/31/2000 23:59

[18241 rows x 2 columns]
"""

df1 = pd.read_csv('http://bit.ly/uforeports', index_col='City')

print(df1.head(5))

"""
                     Colors Reported Shape Reported State             Time
City                                                                      
Ithaca                           NaN       TRIANGLE    NY   6/1/1930 22:00
Willingboro                      NaN          OTHER    NJ  6/30/1930 20:00
Holyoke                          NaN           OVAL    CO  2/15/1931 14:00
Abilene                          NaN           DISK    KS   6/1/1931 13:00
New York Worlds Fair             NaN          LIGHT    NY  4/18/1933 19:00

"""

Happy Learning @AMET!!!

Tuesday, 12 April 2022

Data Science

Introduction

Data? "Data indeed is the new oil"

1. Facts about something that can be used in calculating, reasoning, or planning.

2. Information expressed as numbers for use especially in a computer. Hint: "data" can be used as a singular or a plural in writing and speaking, e.g. "This data is useful."

e.g. everything about you, me, and the world

Science?

Science is the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence. Scientific methodology includes objective observation, measurement and data (possibly, although not necessarily, using mathematics as a tool), and evidence.

eg: Anthropology, archaeology, astronomy, biology, botany, chemistry, cybernetics, geography, geology, mathematics, medicine, physics, physiology, psychology, social science, sociology, and zoology

e.g. questions:

What is the Universe made of?

How did life begin?

Are we alone in the Universe?

What makes us human?

DS Definition(s):

Courtesy : https://www.heavy.ai/learn/data-science

Data Science

Data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data to perform advanced data analysis. Analytic applications and data scientists can then review the results to uncover patterns and enable business leaders to draw informed insights.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.

What a data scientist does:

Apply mathematics, statistics, and the scientific method

Use a wide range of tools in R/Python and techniques for capturing, cleaning, evaluating and preparing data—everything from multi input channels to data mining to data integration methods

Extract insights from data using predictive analytics and artificial intelligence (AI), including machine learning and deep learning models

Write applications that automate data processing and calculations

Tell—and illustrate—stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical knowledge and understanding

By using Data Science, companies are able to make:

Better decisions (should we marry/study/start a company/A or B)

Predictive analysis (what will happen next?)

Pattern discoveries (a deep dive into the past to find patterns, or maybe hidden information, in the data)

Courtesy: Data Science Skills, GeeksforGeeks

Applications of Data Science:

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by today’s organizations. Data science encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.

DS means discovering actionable insights and patterns in structured, unstructured, and semi-structured data sets. It involves statistics, inference, computer science, predictive analytics, machine learning algorithm development, and new technologies to gain insights from big data (Volume, Variety, Velocity).

First Stage: Data capture, which means acquiring data, sometimes extracting it, and entering it into the system.

Second Stage: Maintenance, which includes data warehousing, data cleansing, data processing, data staging, and data architecture.

Data processing follows, and constitutes one of the data science fundamentals.

It is during data exploration and processing that data scientists stand apart from data engineers.

This stage involves data mining, data classification and clustering, data modeling, and summarizing insights gleaned from the data—the processes that create effective data.

Third Stage: Data analysis, an equally critical stage.

Here data scientists conduct exploratory and confirmatory work, regression, predictive analysis, qualitative analysis, and text mining.

Fourth/Final Stage: Insights

This stage involves data visualization, data reporting, the use of various business intelligence tools, and assisting businesses, policymakers, and others in smarter decision making.
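The stages above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the raw readings are invented for the example, the helper to_float is hypothetical, and a real project would use far richer tools at each stage.

```python
import statistics

# Stage 1, capture: hypothetical raw readings, some of them missing or malformed
raw = ['12.5', '13.1', '', 'n/a', '12.9', '40.0']

# Stage 2, maintenance and processing: keep only the values that parse as numbers
def to_float(text):
    try:
        return float(text)
    except ValueError:
        return None

clean = [value for value in map(to_float, raw) if value is not None]

# Stage 3, analysis: simple descriptive statistics
mean_value = statistics.mean(clean)
median_value = statistics.median(clean)

# Stage 4, insights: a one-line report for a decision-maker
report = f"{len(clean)} valid readings, mean {mean_value:.2f}, median {median_value:.2f}"
print(report)
```

The gap between the mean and the median here hints that one reading is an outlier, which is the kind of pattern the analysis stage is meant to surface.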

As a result, data scientists (as data science practitioners are called) require computer science and pure science skills beyond those of a typical data analyst.

The following are some of the applications that make use of data science in their services:

To create intelligent digital assistants (Google Assistant)
To drive driverless vehicles (Waymo)
To filter spam (Gmail)
For Internet search results (Google)
For recommendation engines (Spotify in music)
For finding and filtering abusive content and hate speech (Facebook)
For robotics (Boston Dynamics)
For automatic piracy detection (YouTube)
To plan routes: to discover the best routes to ship
To estimate delays for flights, ships, trains, etc. (through predictive analysis)
To create promotional offers for products
To find the best-suited time to deliver goods
To forecast the next year's revenue for a company
To analyze the health benefits of training
To predict who will win elections

Data Science Functions

Types of Data

Structured data is highly specific and is stored in a predefined format, whereas unstructured data is a conglomeration of many varied types of data that are stored in their native formats.

Structured data vs. unstructured data

Structured data vs. unstructured data comes down to data types that can be used, the level of data expertise required to use it, and on-write versus on-read schema.

	Structured Data	Unstructured Data
Who	Self-service access	Requires data science expertise
What	Only select data types	Many varied types conglomerated
When	Schema-on-write	Schema-on-read
Where	Commonly stored in data warehouses	Commonly stored in data lakes
How	Predefined format	Native format

Courtesy: talend.com

Let's see the comparison chart between structured and unstructured data. Here, we are tabulating the difference between both terms based on some characteristics.

On the basis of	Structured data	Unstructured data
Technology	It is based on a relational database.	It is based on character and binary data.
Flexibility	Structured data is less flexible and schema-dependent.	There is an absence of schema, so it is more flexible.
Scalability	It is hard to scale database schema.	It is more scalable.
Robustness	It is very robust.	It is less robust.
Performance	Here, we can perform a structured query that allows complex joining, so the performance is higher.	While in unstructured data, textual queries are possible, the performance is lower than semi-structured and structured data.
Nature	Structured data is quantitative, i.e., it consists of hard numbers or things that can be counted.	It is qualitative, as it cannot be processed and analyzed using conventional tools.
Format	It has a predefined format.	It has a variety of formats, i.e., it comes in a variety of shapes and sizes.
Analysis	It is easy to search.	Searching for unstructured data is more difficult.

Courtesy: https://www.javatpoint.com/structured-data-vs-unstructured-data

Semi Structured data

Semi-structured data refers to data that is not captured or formatted in conventional ways. Semi-structured data does not follow the format of a tabular data model or relational databases because it does not have a fixed schema.

e.g. Hypertext Markup Language (HTML) files, JavaScript Object Notation (JSON) files, and Extensible Markup Language (XML) files

The following table gives a brief overview of structured, semi structured and unstructured data.

	Structured data	Semi-structured data	Unstructured data
What is it?	Data with a high degree of organization, typically stored in a spreadsheet-like manner	Data with some degree of organization	Data with no predefined organizational form and no specific format
To put it simply	Think of a spreadsheet (e.g. Excel) or data in a tabular format	Think of a TXT file with text that has some structure (headers, paragraphs, etc.)	Essentially anything that is not structured or semi-structured data (which is a lot)
Example formats	Excel spreadsheets; comma-separated value files (.csv); relational database tables	Hypertext Markup Language (HTML) files; JavaScript Object Notation (JSON) files; Extensible Markup Language (XML) files	Images such as .jpeg or .png files; videos such as .mp4 or m4a files; sound files such as .mp3 or .wav files; plain text files; Word files; PDF files
Characteristics	Data is structured in a spreadsheet-like manner (e.g. in a table); within that table, entries have the same format and a predefined length and follow the same order; is easily machine-readable and can therefore be analysed without major pre-processing of the data; it is commonly said that around 20% of the world’s data is structured	Data is stored in files that have some degree of organization and structure; tags or other markers separate elements and enforce hierarchies, but the size of elements can vary and their order is not important; needs some pre-processing before it can be analysed by a computer; has gained importance with the emergence of the World Wide Web
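To make the "needs some pre-processing" point concrete, here is a minimal sketch that turns a small piece of semi-structured JSON into a structured table of rows and columns. The records and field names are invented for this illustration, and missing fields are simply filled with None.

```python
import json

# Semi-structured input: every record has tags (keys), but not the same ones
raw = '''[
  {"city": "Ithaca", "state": "NY", "shape": "TRIANGLE"},
  {"city": "Holyoke", "state": "CO"},
  {"city": "Abilene", "shape": "DISK", "colors": ["RED"]}
]'''

records = json.loads(raw)

# Structured output: one fixed set of columns, one row per record
columns = sorted({key for record in records for key in record})
rows = [[record.get(column) for column in columns] for record in records]

print(columns)
for row in rows:
    print(row)
```

After this step every row has the same columns in the same order, which is what makes the data easy to load into a spreadsheet, a CSV file or a Data Frame.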

Wednesday, 6 April 2022

P#23 Jokes Apart

Jokes Chokes

You are bored.

pip install pyjokes

import pyjokes
joke = pyjokes.get_joke(language='en', category='neutral')
print(joke)

Output: Enjoy!!!

I've been using Vim for a long time now, mainly because I can't figure out how to exit.

Hey, this is the code that will read the joke aloud, so you can listen to it!

pip install gTTS playsound

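import os
import pyjokes
from gtts import gTTS
from playsound import playsound

# Pick a random neutral joke in English
joke = pyjokes.get_joke(language='en', category='neutral')

print(joke)

# Convert the joke text to speech (gTTS needs an internet connection)
myobj = gTTS(text=joke, lang='en', slow=False)

# Save the audio to a file and play it
myobj.save("joke.mp3")

# os.system('mpg321 joke.mp3')  # alternative player, if mpg321 is installed
playsound('joke.mp3')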

Be a Columbus in discovering Python abilities!!!! Happy learning @ AMET ODL...👧👧👧👧👧
