Sunday 17 April 2022

P#26 Natural Language Processing

 NLTK

It is very easy to tokenize a statement with Python's NLTK.


import nltk
# nltk.download('punkt')  # run once if the 'punkt' tokenizer data is not yet installed

word_data = 'Sitting pretty,impatient, work from home, we can work from home'

tokens = nltk.word_tokenize(word_data)

print(tokens)

# ['Sitting', 'pretty', ',', 'impatient', ',', 'work', 'from', 'home', ',', 'we', 'can', 'work', 'from', 'home']
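Notice that the token list above mixes words and punctuation. A small pure-Python filter (a sketch, no extra NLTK calls needed) keeps only the word tokens:

```python
# Tokens as produced by nltk.word_tokenize above
tokens = ['Sitting', 'pretty', ',', 'impatient', ',', 'work',
          'from', 'home', ',', 'we', 'can', 'work', 'from', 'home']

# Keep only alphabetic tokens, dropping the comma tokens
words = [t for t in tokens if t.isalpha()]
print(words)
# ['Sitting', 'pretty', 'impatient', 'work', 'from', 'home', 'we', 'can', 'work', 'from', 'home']
```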


Another Exercise:


sentence_data = "Johhny Jonny Yes papa. Eating Sugar No. NO. papa "

nltk_tokens = nltk.sent_tokenize(sentence_data)

print(nltk_tokens)
# ['Johhny Jonny Yes papa.', 'Eating Sugar No.', 'NO.', 'papa']




Happy Learning @AMET ODL!!!


Friday 15 April 2022

P#25 DATE TIME

 Python - Date and Time


import datetime

print('The Date Today is :', datetime.datetime.today())


date_today = datetime.date.today()

print(date_today)

print('This Year :', date_today.year)

print('This Month :', date_today.month)

print('Month Name:',date_today.strftime('%B'))

print('This Day of Month :', date_today.day)

print('Week Day Name:',date_today.strftime('%A'))

"""
The Date Today is : 2022-04-15 22:56:14.461386
2022-04-15
This Year : 2022
This Month : 4
Month Name: April
This Day of Month : 15
Week Day Name: Friday
"""


DATE TIME ARITHMETIC

import datetime

# First Date
day1 = datetime.date(2020, 2, 12)
print('day1:', day1.ctime())

# Second Date
day2 = datetime.date(2019, 8, 18)
print ('day2:', day2.ctime())

# Difference between the dates
print('Number of Days:', day1 - day2)

date_today = datetime.date.today()

# Create a delta of Four Days
no_of_days = datetime.timedelta(days=4)

# Use Delta for Past Date -
before_four_days = date_today - no_of_days
print('Before Four Days:', before_four_days)

# Use Delta for future Date +
after_four_days = date_today + no_of_days
print('After Four Days:', after_four_days)

"""
day1: Wed Feb 12 00:00:00 2020
day2: Sun Aug 18 00:00:00 2019
Number of Days: 178 days, 0:00:00
Before Four Days: 2022-04-11
After Four Days: 2022-04-19
"""

Happy Learning at AMET ODL!👦👧👥👭👬

P#25 Pandas Data Frame set_index() and reset_index()

 Data Frame set_index() and reset_index()

These methods are very useful. We can dynamically set a new index and reset it back without any problem.

import pandas as pd


drinks = pd.read_csv('http://bit.ly/drinksbycountry')

print(drinks.shape) #(193, 6)

print(drinks.index) #RangeIndex(start=0, stop=193, step=1)

print(drinks.columns)
"""Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
'total_litres_of_pure_alcohol', 'continent'],
dtype='object')
"""

drinks.set_index('continent',inplace=True)


print(drinks.index)

"""
Index(['Asia', 'Europe', 'Africa', 'Europe', 'Africa', 'North America',
'South America', 'Europe', 'Oceania', 'Europe',
...
'Africa', 'North America', 'South America', 'Asia', 'Oceania',
'South America', 'Asia', 'Asia', 'Africa', 'Africa'],
dtype='object', name='continent', length=193)
"""

print(drinks.head())

"""
country ... total_litres_of_pure_alcohol
continent ...
Asia Afghanistan ... 0.0
Europe Albania ... 4.9
Africa Algeria ... 0.7
Europe Andorra ... 12.4
Africa Angola ... 5.9

[5 rows x 5 columns]
"""

print(drinks.shape) #(193, 5)

print(drinks.loc['Europe', 'spirit_servings'])
"""
continent
Europe 132
Europe 138
....
Europe 237
Europe 126
Name: spirit_servings, dtype: int64
"""

drinks.index.name = None


print(drinks.index)

"""
Index(['Asia', 'Europe', 'Africa', 'Europe', 'Africa', 'North America',
'South America', 'Europe', 'Oceania', 'Europe',
...
'Africa', 'North America', 'South America', 'Asia', 'Oceania',
'South America', 'Asia', 'Asia', 'Africa', 'Africa'],
dtype='object', length=193)

"""

drinks.index.name='continent'

drinks.reset_index(inplace=True) # Resetting Index inplace

print(drinks.head(2))

"""
continent country ... wine_servings total_litres_of_pure_alcohol
0 Asia Afghanistan ... 0 0.0
1 Europe Albania ... 54 4.9

[2 rows x 6 columns]
"""

The above code snippets and their output show how easy it is to operate on DataFrames.

Happy Learning @ AMET ODL!!!!

P#24 Pandas loc[] and iloc[]

Differences between loc[] and iloc[]

With these indexers, we can filter rows and columns much like SQL. Let us dive into this exercise.

import pandas as pd

df = pd.read_csv('uforeports.csv')

# df = pd.read_csv('http://bit.ly/uforeports')  # alternatively, read directly from the web

print(df.shape) # prints (rows, columns): (18241, 5)

print(df.describe)  # note: without parentheses this prints the bound method; use df.describe() for summary statistics
"""
<bound method NDFrame.describe of City Colors Reported ... State Time
0 Ithaca NaN ... NY 6/1/1930 22:00
1 Willingboro NaN ... NJ 6/30/1930 20:00
2 Holyoke NaN ... CO 2/15/1931 14:00
3 Abilene NaN ... KS 6/1/1931 13:00
4 New York Worlds Fair NaN ... NY 4/18/1933 19:00
... ... ... ... ... ...
18236 Grant Park NaN ... IL 12/31/2000 23:00
18237 Spirit Lake NaN ... IA 12/31/2000 23:00
18238 Eagle River NaN ... WI 12/31/2000 23:45
18239 Eagle River RED ... WI 12/31/2000 23:45
18240 Ybor NaN ... FL 12/31/2000 23:59

[18241 rows x 5 columns]>
"""



# print(df.head(5)) # to Print top 5 rows
"""
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00
3 Abilene NaN DISK KS 6/1/1931 13:00
4 New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00
"""

# print(df.tail(5)) # to Print bottom 5 rows

"""
City Colors Reported Shape Reported State Time
18236 Grant Park NaN TRIANGLE IL 12/31/2000 23:00
18237 Spirit Lake NaN DISK IA 12/31/2000 23:00
18238 Eagle River NaN NaN WI 12/31/2000 23:45
18239 Eagle River RED LIGHT WI 12/31/2000 23:45
18240 Ybor NaN OVAL FL 12/31/2000 23:59
"""


# To filter rows()

# print(df.loc[6])

"""
City Crater Lake
Colors Reported NaN
Shape Reported CIRCLE
State CA
Time 6/15/1935 0:00
Name: 6, dtype: object
"""

print(df.loc[0:2])

"""
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00

"""

print(df.loc[0:2, :]) # same output as above


print(df.loc[0:2 , 'City':'State']) # filter rows 0-2 and columns 'City' through 'State' (the 'Time' column is dropped)
"""
City Colors Reported Shape Reported State
0 Ithaca NaN TRIANGLE NY
1 Willingboro NaN OTHER NJ
2 Holyoke NaN OVAL CO
"""

print(df.loc[: , 'City':'State']) # To print all rows
"""
City Colors Reported Shape Reported State
0 Ithaca NaN TRIANGLE NY
1 Willingboro NaN OTHER NJ
2 Holyoke NaN OVAL CO
3 Abilene NaN DISK KS
4 New York Worlds Fair NaN LIGHT NY
... ... ... ... ...
18236 Grant Park NaN TRIANGLE IL
18237 Spirit Lake NaN DISK IA
18238 Eagle River NaN NaN WI
18239 Eagle River RED LIGHT WI
18240 Ybor NaN OVAL FL

[18241 rows x 4 columns]

"""
print(df.head(3).drop('Time', axis = 1))

"""
City Colors Reported Shape Reported State
0 Ithaca NaN TRIANGLE NY
1 Willingboro NaN OTHER NJ
2 Holyoke NaN OVAL CO
"""

print(df.loc[: , 'City']) # To print all rows and city column only

"""
0 Ithaca
1 Willingboro
2 Holyoke
3 Abilene
4 New York Worlds Fair
...
18236 Grant Park
18237 Spirit Lake
18238 Eagle River
18239 Eagle River
18240 Ybor
Name: City, Length: 18241, dtype: object

"""

print(df[df.City == 'Abilene']) # filter rows by value

"""
City Colors Reported Shape Reported State Time
3 Abilene NaN DISK KS 6/1/1931 13:00
6654 Abilene NaN TRIANGLE TX 9/1/1991 1:00
8357 Abilene NaN SPHERE TX 7/15/1995 0:00
8783 Abilene NaN NaN KS 10/14/1995 23:20
10883 Abilene NaN NaN TX 10/19/1997 20:45

"""

print(df[df.City == 'Abilene'].State) # Filter by value, select column

"""
3 KS
6654 TX
8357 TX
8783 KS
10883 TX
"""

print(df.columns)
# Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

print(df.iloc[0:3 , 0:3]) # filter first 3 rows and first 3 columns (iloc slices exclude the end position)

""" City Colors Reported Shape Reported
0 Ithaca NaN TRIANGLE
1 Willingboro NaN OTHER
2 Holyoke NaN OVAL

"""

print(df.iloc[0:3 , :]) #Filter 3 rows and all columns
"""
City Colors Reported Shape Reported State Time
0 Ithaca NaN TRIANGLE NY 6/1/1930 22:00
1 Willingboro NaN OTHER NJ 6/30/1930 20:00
2 Holyoke NaN OVAL CO 2/15/1931 14:00

"""

print(df.loc[:, ['City','Time']]) # to filter only two columns for all rows

"""
City Time
0 Ithaca 6/1/1930 22:00
1 Willingboro 6/30/1930 20:00
2 Holyoke 2/15/1931 14:00
3 Abilene 6/1/1931 13:00
4 New York Worlds Fair 4/18/1933 19:00
... ... ...
18236 Grant Park 12/31/2000 23:00
18237 Spirit Lake 12/31/2000 23:00
18238 Eagle River 12/31/2000 23:45
18239 Eagle River 12/31/2000 23:45
18240 Ybor 12/31/2000 23:59

[18241 rows x 2 columns]
"""

df1 = pd.read_csv('http://bit.ly/uforeports', index_col='City')

print(df1.head(5))

"""
Colors Reported Shape Reported State Time
City
Ithaca NaN TRIANGLE NY 6/1/1930 22:00
Willingboro NaN OTHER NJ 6/30/1930 20:00
Holyoke NaN OVAL CO 2/15/1931 14:00
Abilene NaN DISK KS 6/1/1931 13:00
New York Worlds Fair NaN LIGHT NY 4/18/1933 19:00

"""

 Happy Learning @AMET!!!

Tuesday 12 April 2022

DS#1

Data Science 

Introduction


Data? "Data indeed is the new oil."


1. Facts about something that can be used in calculating, reasoning, or planning.
2. Information expressed as numbers for use especially in a computer.

Hint: "data" can be used as a singular or a plural in writing and speaking: "This data is useful."

e.g., everything about you, me, the world.



Science?

Science is the pursuit and application of knowledge and understanding of the natural and social world, following a systematic methodology based on evidence. Scientific methodology includes:
  • Objective observation: measurement and data (possibly, although not necessarily, using mathematics as a tool)
  • Evidence

e.g., anthropology, archaeology, astronomy, biology, botany, chemistry, cybernetics, geography, geology, mathematics, medicine, physics, physiology, psychology, social science, sociology, and zoology.


e.g., questions:
What is the Universe made of?
How did life begin?
Are we alone in the Universe?
What makes us human?


DS Definition(s):

Courtesy : https://www.heavy.ai/learn/data-science

Data Science

Data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data to perform advanced data analysis. Analytic applications and data scientists can then review the results to uncover patterns and enable business leaders to draw informed insights.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.

Stages of Data Science:

  • Apply mathematics, statistics, and the scientific method
  • Use a wide range of tools in R/Python  and techniques for capturing, cleaning, evaluating and preparing data—everything from multi input channels to data mining to data integration methods
  • Extract insights from data using predictive analytics and artificial intelligence (AI), including machine learning and deep learning models
  • Write applications that automate data processing and calculations
  • Tell—and illustrate—stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical knowledge and understanding
By using Data Science, companies are able to make:
  • Better decisions (should we marry/study/start a company/A or B)
  • Predictive analysis (what will happen next?)
  • Pattern discoveries (a deep dive into the past to find patterns, or maybe hidden information in the data)

Courtesy: Data Science Skills, GeeksforGeeks

Applications of Data Science:

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by today’s organizations. Data science encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.

Data science discovers actionable insights and patterns in structured, unstructured, and semi-structured data sets. It involves statistics, inference, computer science, predictive analytics, machine-learning algorithm development, and new technologies to gain insights from big data (Volume, Variety, Velocity).

First stage of DS:

Data Capture: acquiring data, sometimes extracting it, and entering it into the system. 

Second Stage: Maintenance, which includes data warehousing, data cleansing, data processing, data staging, and data architecture.

Data processing follows, and constitutes one of the data science fundamentals. 

It is during data exploration and processing that data scientists stand apart from data engineers. 

This stage involves data mining, data classification and clustering, data modeling, and summarizing insights gleaned from the data—the processes that create effective data.

Third Stage : Data analysis, an equally critical stage. 

Here data scientists conduct exploratory and confirmatory work, regression, predictive analysis, qualitative analysis, and text mining. 

Fourth/Final Stage: Insights

Involves Data visualization, data reporting, the use of various business intelligence tools, and assisting businesses, policymakers, and others in smarter decision making.

As a result, data scientists (as data science practitioners are called) require computer science and pure science skills beyond those of a typical data analyst.

Following are some of the applications that make use of Data Science for their services:

        • To create Intelligent Digital Assistants (Google Assistant)
        • To drive Driverless  Vehicle (Waymo)
        • To put Spam Filter (Gmail)
        • For Internet Search Results (Google)
        • For Recommendation Engine (Spotify in Music)
        • For finding Abusive Content and Hate Speech Filter (Facebook)
        • For Robotics (Boston Dynamics)
        • For Automatic Piracy Detection (YouTube)
        • For route planning: to discover the best routes to ship
        • To estimate delays for flights/ships/trains etc. (through predictive analysis)
        • To create promotional offers for products
        • To find the best suited time to deliver goods 
        • To forecast the next year's revenue for a company
        • To analyze health benefit of training
        • To predict who will win elections

Data Science Functions


Types of Data

Structured data is highly specific and is stored in a predefined format, whereas unstructured data is a conglomeration of many varied types of data that are stored in their native formats.


Structured data vs. unstructured data

Structured data vs. unstructured data comes down to the data types that can be used, the level of data expertise required to use it, and schema-on-write versus schema-on-read.

       Structured Data                      Unstructured Data
Who    Self-service access                  Requires data science expertise
What   Only select data types               Many varied types conglomerated
When   Schema-on-write                      Schema-on-read
Where  Commonly stored in data warehouses   Commonly stored in data lakes
How    Predefined format                    Native format

Courtesy: talend.com

Let's see the comparison chart between structured and unstructured data. Here, we are tabulating the difference between both terms based on some characteristics.

  • Technology: structured data is based on a relational database; unstructured data is based on character and binary data.
  • Flexibility: structured data is less flexible and schema-dependent; unstructured data has no schema, so it is more flexible.
  • Scalability: it is hard to scale a structured database schema; unstructured data is more scalable.
  • Robustness: structured data is very robust; unstructured data is less robust.
  • Performance: with structured data we can perform structured queries that allow complex joins, so the performance is higher; with unstructured data only textual queries are possible, and performance is lower than with semi-structured and structured data.
  • Nature: structured data is quantitative, i.e., it consists of hard numbers or things that can be counted; unstructured data is qualitative, as it cannot be processed and analyzed using conventional tools.
  • Format: structured data has a predefined format; unstructured data comes in a variety of shapes and sizes.
  • Analysis: structured data is easy to search; searching unstructured data is more difficult.

Courtesy: https://www.javatpoint.com/structured-data-vs-unstructured-data 

Semi-Structured Data

Semi-structured data refers to data that is not captured or formatted in conventional ways. It does not follow the format of a tabular data model or relational databases because it does not have a fixed schema.

e.g., Hypertext Markup Language (HTML) files, JavaScript Object Notation (JSON) files, Extensible Markup Language (XML) files.


The following table gives a brief overview of structured, semi structured and unstructured data.

 

Structured data
  • What is it? Data with a high degree of organization, typically stored in a spreadsheet-like manner.
  • To put it simply: think of a spreadsheet (e.g. Excel) or data in a tabular format.
  • Example formats:
    • Excel spreadsheets
    • Comma-separated value files (.csv)
    • Relational database tables
  • Characteristics:
    • Data is structured in a spreadsheet-like manner (e.g. in a table)
    • Within that table, entries have the same format and a predefined length and follow the same order
    • Is easily machine-readable and can therefore be analysed without major pre-processing of the data
    • It is commonly said that around 20% of the world's data is structured

Semi-structured data
  • What is it? Data with some degree of organization.
  • To put it simply: think of a TXT file with text that has some structure (headers, paragraphs, etc.).
  • Example formats:
    • Hypertext Markup Language (HTML) files
    • JavaScript Object Notation (JSON) files
    • Extensible Markup Language (XML) files
  • Characteristics:
    • Data is stored in files that have some degree of organization and structure
    • Tags or other markers separate elements and enforce hierarchies, but the size of elements can vary and their order is not important
    • Needs some pre-processing before it can be analysed by a computer
    • Has gained importance with the emergence of the World Wide Web

Unstructured data
  • What is it? Data with no predefined organizational form and no specific format.
  • To put it simply: essentially anything that is not structured or semi-structured data (which is a lot).
  • Example formats:
    • Images such as .jpeg or .png files
    • Videos such as .mp4 or .m4a files
    • Sound files such as .mp3 or .wav files
    • Plain text files
    • Word files
    • PDF files
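To make the semi-structured idea concrete, here is a small sketch parsing a JSON record with Python's standard json module (the record itself is made up for illustration):

```python
import json

# A hypothetical semi-structured record: nested, with no fixed tabular schema
record = '{"city": "Ithaca", "state": "NY", "reports": [{"shape": "TRIANGLE"}, {"shape": "OVAL"}]}'

data = json.loads(record)
print(data['city'])                 # Ithaca
print(len(data['reports']))         # 2
print(data['reports'][0]['shape'])  # TRIANGLE
```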



Wednesday 6 April 2022

P#23 Jokes Apart

 Jokes Chokes

Are you bored?

pip install pyjokes

import pyjokes
joke=pyjokes.get_joke(language='en', category= 'neutral')
print(joke)

Output: Enjoy!!! 

I've been using Vim for a long time now, mainly because I can't figure out how to exit.
Here is code that will play the joke aloud, so you can listen:
pip install gTTS playsound
import os
import pyjokes
from gtts import gTTS
from playsound import playsound

joke=pyjokes.get_joke(language='en', category= 'neutral')

print(joke)

myobj = gTTS(text=joke, lang='en', slow=False)

myobj.save("joke.mp3")

#os.system('mpg321 joke.mp3')
playsound('joke.mp3')

Be a Columbus in discovering Python abilities!!!!  Happy learning @ AMET ODL...👧👧👧👧👧
