Tuesday 12 April 2022

DS#1

Data Science 

Introduction


Data? "Data is indeed the new oil."


1. Facts about something that can be used in calculating, reasoning, or planning.
2. Information expressed as numbers, for use especially in a computer. Hint: "data" can be used as a singular or a plural in writing and speaking, e.g. "This data is useful."

e.g. Everything about you, me, and the world



Science?

Science is the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence. Scientific methodology includes the following:
  • Objective observation: measurement and data (possibly, although not necessarily, using mathematics as a tool)
  • Evidence

e.g. anthropology, archaeology, astronomy, biology, botany, chemistry, cybernetics, geography, geology, mathematics, medicine, physics, physiology, psychology, social science, sociology, and zoology


e.g. questions science asks:
What is the Universe made of?
How did life begin?
Are we alone in the Universe?
What makes us human?


DS Definition(s):

Courtesy: https://www.heavy.ai/learn/data-science

Data Science

Data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data to perform advanced data analysis. Analytic applications and data scientists can then review the results to uncover patterns and enable business leaders to draw informed insights.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data,[1][2] and to apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
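The following minimal Python sketch (standard library only) makes "cleansing, aggregating, and manipulating" concrete. The sales records, the `cleanse` and `aggregate` helpers, and the rule of dropping unparseable amounts are hypothetical illustrations, not taken from the cited source. Real projects would usually use a library such as pandas for this.

```python
from collections import defaultdict

# Hypothetical raw sales records: (region, amount as text); some amounts are missing or malformed.
raw_records = [
    ("North", "120.5"),
    ("south ", "80"),
    ("north", None),   # missing value
    ("east", "abc"),   # malformed value
    ("South", "40.0"),
]


def cleanse(records):
    """Normalize region names and drop records whose amount is missing or not numeric."""
    clean = []
    for region, amount in records:
        try:
            clean.append((region.strip().lower(), float(amount)))
        except (TypeError, ValueError):
            continue  # float(None) raises TypeError, float("abc") raises ValueError
    return clean


def aggregate(records):
    """Sum the amounts per region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


totals = aggregate(cleanse(raw_records))
# Manipulate: rank regions by total, highest first.
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```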

Stages of Data Science:

  • Apply mathematics, statistics, and the scientific method
  • Use a wide range of tools (in R/Python) and techniques for capturing, cleaning, evaluating and preparing data: everything from multiple input channels to data mining to data integration methods
  • Extract insights from data using predictive analytics and artificial intelligence (AI), including machine learning and deep learning models
  • Write applications that automate data processing and calculations
  • Tell, and illustrate, stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical knowledge and understanding
By using Data Science, companies are able to:
  • Make better decisions (should we marry/study/start a company? choose A or B?)
  • Perform predictive analysis (what will happen next?)
  • Discover patterns (dive deep into the past to find patterns, or possibly hidden information, in the data)

Courtesy: Data Science Skills, GeeksforGeeks

Applications of Data Science:

Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.

Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by today’s organizations. Data science encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.

DS discovers actionable insights and patterns in structured, unstructured, and semi-structured data sets. It involves statistics, inference, computer science, predictive analytics, machine learning algorithm development, and new technologies to gain insights from big data (Volume, Variety, Velocity).

First Stage of DS:

Data Capture: acquiring data, sometimes extracting it, and entering it into the system. 
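Data capture can be as simple as parsing an export into typed records. The sketch below assumes a hypothetical CSV export (`RAW_EXPORT`) held in memory. In practice the text would come from a file, an API, or a sensor feed, and "entering it into the system" would usually mean loading it into a database rather than a Python list.

```python
import csv
import io

# Hypothetical export standing in for a real source (file, API response, sensor feed).
RAW_EXPORT = """timestamp,sensor_id,temperature_c
2022-04-12T10:00,s1,21.4
2022-04-12T10:05,s2,22.0
2022-04-12T10:10,s1,21.9
"""


def capture(text):
    """Parse raw CSV text into typed records ready to be stored."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "timestamp": row["timestamp"],
            "sensor_id": row["sensor_id"],
            "temperature_c": float(row["temperature_c"]),  # convert text to a number
        })
    return records


records = capture(RAW_EXPORT)
print(f"captured {len(records)} records; first: {records[0]}")
```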

Second Stage: Maintenance, which includes data warehousing, data cleansing, data processing, data staging, and data architecture.
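Data cleansing, one of the maintenance tasks above, often comes down to a few recurring steps: removing duplicates, normalizing inconsistent values, and filling or dropping missing ones. Here is a minimal sketch. The customer rows are hypothetical, and median imputation for missing ages is just one possible strategy chosen for illustration.

```python
from statistics import median

# Hypothetical customer rows with typical quality problems.
rows = [
    {"id": 1, "city": " Pune ", "age": 34},
    {"id": 2, "city": "MUMBAI", "age": None},  # missing age
    {"id": 1, "city": "pune", "age": 34},      # duplicate of id 1
    {"id": 3, "city": "Delhi", "age": 29},
]


def clean_rows(rows):
    # 1. Remove duplicates by "id", keeping the first occurrence (copies avoid mutating the input).
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))
    # 2. Normalize text: trim whitespace and use consistent capitalization.
    for row in unique:
        row["city"] = row["city"].strip().title()
    # 3. Fill missing ages with the median of the known ages (None if no ages are known).
    known = [row["age"] for row in unique if row["age"] is not None]
    fill = median(known) if known else None
    for row in unique:
        if row["age"] is None:
            row["age"] = fill
    return unique


cleaned = clean_rows(rows)
print(cleaned)
```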

Data processing follows, and constitutes one of the data science fundamentals. 

It is during data exploration and processing that data scientists stand apart from data engineers. 

This stage involves data mining, data classification and clustering, data modeling, and summarizing insights gleaned from the data—the processes that create effective data.
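Clustering, one of the processes named above, groups similar observations without predefined labels. The example below uses k-means (Lloyd's algorithm) on one-dimensional data. It is an illustration, not a method the source prescribes. The monthly spend values and the evenly spaced initialization are assumptions made for simplicity.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster 1-D values into k groups with Lloyd's k-means algorithm."""
    data = sorted(values)
    # Deterministic initialization: k centroids spread evenly across the sorted data.
    if k == 1:
        centroids = [data[0]]
    else:
        centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


monthly_spend = [52, 5, 210, 7, 50, 200, 6, 55]  # hypothetical values
centroids, clusters = kmeans_1d(monthly_spend, k=3)
print(clusters)
```

On this data the three groups correspond to low, medium and high spenders. Real work would typically use a library implementation with several random restarts.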

Third Stage: Data analysis, an equally critical stage.

Here data scientists conduct exploratory and confirmatory work, regression, predictive analysis, qualitative analysis, and text mining. 
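Regression, mentioned above, fits a model that can then be used for prediction. Below is a minimal sketch of simple linear regression by ordinary least squares:

slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
intercept = ȳ − slope · x̄

The yearly revenue figures are hypothetical, and the fitted line is used for a typical predictive-analysis task: forecasting the following year.

```python
def fit_line(xs, ys):
    """Simple linear regression by ordinary least squares: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = [2018, 2019, 2020, 2021]
revenue = [10.0, 12.0, 14.0, 16.0]  # hypothetical revenue, in millions
slope, intercept = fit_line(years, revenue)
forecast_2022 = slope * 2022 + intercept
print(f"trend: {slope:+.2f} per year, forecast for 2022: {forecast_2022:.2f}")
```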

Fourth/Final Stage: Insights

This stage involves data visualization, data reporting, the use of various business intelligence tools, and assisting businesses, policymakers, and others in smarter decision-making.
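Reporting results does not always require a business intelligence tool; even a plain-text chart can convey the shape of the data. The sketch below uses a hypothetical `text_bar_chart` helper (not a library function). It assumes non-negative values and scales each bar relative to the largest one.

```python
def text_bar_chart(data, width=20):
    """Render a dict of label -> non-negative value as a horizontal text bar chart."""
    peak = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to value / peak; an all-zero chart gets empty bars.
        bar = "#" * round(width * value / peak) if peak else ""
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


print(text_bar_chart({"north": 120.5, "south": 120.0, "east": 60.0}))
```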

As a result, data scientists (as data science practitioners are called) require computer science and pure science skills beyond those of a typical data analyst. A data scientist must be able to carry out each of the stages described above, from data capture to insights.

    • Following are some of the applications that make use of Data Science for their services:

        • To create intelligent digital assistants (Google Assistant)
        • To drive driverless vehicles (Waymo)
        • To filter spam (Gmail)
        • To produce Internet search results (Google)
        • To build recommendation engines (Spotify in music)
        • To find abusive content and filter hate speech (Facebook)
        • For robotics (Boston Dynamics)
        • For automatic piracy detection and identification (YouTube)
        • For route planning: to discover the best shipping routes
        • To estimate delays for flights, ships, trains, etc. (through predictive analysis)
        • To create promotional offers for products
        • To find the best time to deliver goods
        • To forecast next year's revenue for a company
        • To analyze the health benefits of training
        • To predict who will win elections
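Here is a toy spam filter to give a feel for how one of these applications can work. It uses multinomial naive Bayes with add-one (Laplace) smoothing, a classic classroom technique, trained on four made-up messages. It should not be read as a description of Gmail's production filter. The sketch assumes both labels appear in the training data.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs with label "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior plus the sum of smoothed log likelihoods of each word.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


training = [  # hypothetical training messages
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
model = train(training)
print(classify("claim your free money", model))
print(classify("team meeting tomorrow", model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.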

Data Science Functions


Types of Data

Structured data is highly specific and is stored in a predefined format, whereas unstructured data is a conglomeration of many varied types of data that are stored in their native formats.


Structured data vs. unstructured data

Structured data vs. unstructured data comes down to the data types that can be used, the level of data expertise required to use them, and schema-on-write versus schema-on-read.

Aspect | Structured data | Unstructured data
Who | Self-service access | Requires data science expertise
What | Only select data types | Many varied types conglomerated
When | Schema-on-write | Schema-on-read
Where | Commonly stored in data warehouses | Commonly stored in data lakes
How | Predefined format | Native format

Courtesy: talend.com
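The "When" row (schema-on-write versus schema-on-read) is easiest to understand side by side. In this sketch, an in-memory SQLite table stands in for a data warehouse and a list of raw JSON strings stands in for a data lake. Both stand-ins and the `orders` example are assumptions made for illustration.

```python
import json
import sqlite3

# Schema-on-write: declare the schema first; the database enforces it on every insert.
warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
warehouse.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
warehouse.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (1, 9.99))
try:
    warehouse.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (2, None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the NOT NULL constraint refuses the row at write time

# Schema-on-read: store raw records as they arrive; impose structure only when reading.
lake = [  # stand-in for a data lake
    '{"id": 1, "amount": 9.99}',
    '{"id": 2, "note": "customer called about delivery"}',
]


def read_orders(raw_lines):
    """Apply an 'orders' schema at read time, skipping records without a numeric amount."""
    orders = []
    for line in raw_lines:
        record = json.loads(line)
        if isinstance(record.get("amount"), (int, float)):
            orders.append((record["id"], record["amount"]))
    return orders


print(rejected, read_orders(lake))
```

The warehouse rejects bad data up front. The lake accepts anything, and each reader decides how to interpret it.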

Let's see the comparison chart between structured and unstructured data. Here, we are tabulating the difference between both terms based on some characteristics.

On the basis of | Structured data | Unstructured data
Technology | It is based on a relational database. | It is based on character and binary data.
Flexibility | Structured data is less flexible and schema-dependent. | There is no schema, so it is more flexible.
Scalability | It is hard to scale the database schema. | It is more scalable.
Robustness | It is very robust. | It is less robust.
Performance | Structured queries allow complex joins, so performance is higher. | Textual queries are possible, but performance is lower than for semi-structured and structured data.
Nature | Structured data is quantitative, i.e., it consists of hard numbers or things that can be counted. | It is qualitative, as it cannot be processed and analyzed using conventional tools.
Format | It has a predefined format. | It comes in a variety of formats, shapes, and sizes.
Analysis | It is easy to search. | Searching unstructured data is more difficult.

Courtesy: https://www.javatpoint.com/structured-data-vs-unstructured-data 

Semi-structured data

Semi-structured data refers to data that is not captured or formatted in conventional ways. Semi-structured data does not follow the format of a tabular data model or relational databases because it does not have a fixed schema.

e.g. Hypertext Markup Language (HTML) files, JavaScript Object Notation (JSON) files, Extensible Markup Language (XML) files
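Semi-structured formats such as JSON and XML carry their own keys or tags. This means they can be parsed into rows even when records do not all share the same fields. Below is a minimal sketch with Python's standard `json` and `xml.etree.ElementTree` modules; the user records are made up.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records: the two users do not have the same fields.
json_text = '{"users": [{"name": "Asha", "skills": ["python", "sql"]}, {"name": "Ravi", "email": "ravi@example.com"}]}'
xml_text = "<users><user name='Asha'><skill>python</skill><skill>sql</skill></user><user name='Ravi'/></users>"


def rows_from_json(text):
    """Flatten JSON users into (name, email, skill_count) rows; absent keys become None or 0."""
    data = json.loads(text)
    return [(u["name"], u.get("email"), len(u.get("skills", []))) for u in data["users"]]


def rows_from_xml(text):
    """Flatten XML users into (name, skill_count) rows using attributes and child tags."""
    root = ET.fromstring(text)
    return [(u.get("name"), len(u.findall("skill"))) for u in root.findall("user")]


print(rows_from_json(json_text))
print(rows_from_xml(xml_text))
```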


The following table gives a brief overview of structured, semi-structured and unstructured data.

 

Aspect | Structured data | Semi-structured data | Unstructured data
What is it? | Data with a high degree of organization, typically stored in a spreadsheet-like manner | Data with some degree of organization | Data with no predefined organizational form and no specific format
To put it simply | Think of a spreadsheet (e.g. Excel) or data in a tabular format | Think of a TXT file with text that has some structure (headers, paragraphs, etc.) | Essentially anything that is not structured or semi-structured data (which is a lot)
Example formats | Excel spreadsheets; comma-separated values files (.csv); relational database tables | Hypertext Markup Language (HTML) files; JavaScript Object Notation (JSON) files; Extensible Markup Language (XML) files | Images such as .jpeg or .png files; videos such as .mp4 files; sound files such as .mp3, .wav or .m4a files; plain text files; Word files; PDF files

Characteristics of structured data:
  • Data is structured in a spreadsheet-like manner (e.g. in a table)
  • Within that table, entries have the same format and a predefined length and follow the same order
  • It is easily machine-readable and can therefore be analysed without major pre-processing of the data
  • It is commonly said that around 20% of the world’s data is structured

Characteristics of semi-structured data:
  • Data is stored in files that have some degree of organization and structure
  • Tags or other markers separate elements and enforce hierarchies, but the size of elements can vary and their order is not important
  • It needs some pre-processing before it can be analysed by a computer
  • It has gained importance with the emergence of the World Wide Web


