Data Science
Introduction
Why data? Data is indeed "the new oil."
Data Science (DS) Definition(s):
Courtesy : https://www.heavy.ai/learn/data-science
Data Science
Data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data to perform advanced data analysis. Analytic applications and data scientists can then review the results to uncover patterns and enable business leaders to draw informed insights.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data,[1][2] and apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
A data scientist must be able to:
- Apply mathematics, statistics, and the scientific method
- Use a wide range of tools (in R or Python) and techniques for capturing, cleaning, evaluating, and preparing data, from multiple input channels to data mining to data integration methods
- Extract insights from data using predictive analytics and artificial intelligence (AI), including machine learning and deep learning models
- Write applications that automate data processing and calculations
- Tell (and illustrate) stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical knowledge and understanding
Data science helps with:
- Better decisions (should we marry, study, start a company, or choose A over B?)
- Predictive analysis (what will happen next?)
- Pattern discovery (a deep dive into past data to find patterns or hidden information)
Courtesy: GeeksforGeeks, "Data Science Skills"
More definitions of data science:
Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by today’s organizations. Data science encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.
DS is about discovering actionable patterns and insights in structured, unstructured, and semi-structured data sets. It involves statistics, inference, computer science, predictive analytics, machine learning algorithm development, and new technologies to gain insights from big data (volume, variety, velocity).
First Stage: Data capture, which means acquiring data, sometimes extracting it, and entering it into the system.
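As a rough illustration of the capture step, the sketch below reads a small CSV export into Python records using only the standard library. The data and the `capture` function name are hypothetical; in practice the source could just as well be an API, a database, or sensor logs.

```python
import csv
import io

# Hypothetical raw export, e.g. the body of a downloaded CSV file.
RAW_CSV = """order_id,city,amount
1,Pune,250
2,Delhi,
3,Pune,400
"""

def capture(text: str) -> list[dict[str, str]]:
    """Read CSV text into a list of row dictionaries (all values stay strings)."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

rows = capture(RAW_CSV)
print(len(rows))  # 3
print(rows[1])    # {'order_id': '2', 'city': 'Delhi', 'amount': ''}
```

Note that nothing is cleaned yet: the missing amount in row 2 is captured as an empty string and left for the next stage.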
Second Stage: Maintenance, which includes data warehousing, data cleansing, data processing, data staging, and data architecture.
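Part of maintenance is data cleansing. Here is a minimal sketch, assuming orders arrive as dictionaries of strings (a hypothetical layout). It drops exact duplicates and rows with a missing amount, normalizes the city name, and converts fields to proper types.

```python
def cleanse(rows):
    """Drop exact duplicates and rows with a missing amount; convert types."""
    seen = set()
    clean = []
    for row in rows:
        key = (row["order_id"], row["city"], row["amount"])
        if key in seen or not row["amount"].strip():
            continue  # skip duplicates and incomplete records
        seen.add(key)
        clean.append({
            "order_id": int(row["order_id"]),
            "city": row["city"].strip().title(),  # normalize spacing and case
            "amount": float(row["amount"]),
        })
    return clean

raw = [
    {"order_id": "1", "city": "pune ", "amount": "250"},
    {"order_id": "1", "city": "pune ", "amount": "250"},  # duplicate
    {"order_id": "2", "city": "Delhi", "amount": ""},     # missing amount
    {"order_id": "3", "city": "PUNE", "amount": "400"},
]
result = cleanse(raw)
print(result)
# [{'order_id': 1, 'city': 'Pune', 'amount': 250.0}, {'order_id': 3, 'city': 'Pune', 'amount': 400.0}]
```

Dropping incomplete rows is only one policy. Filling in or flagging missing values are common alternatives, depending on the analysis.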
Data processing follows, and constitutes one of the data science fundamentals.
It is during data exploration and processing that data scientists stand apart from data engineers.
This stage involves data mining, data classification and clustering, data modeling, and summarizing insights gleaned from the data—the processes that create effective data.
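Clustering is one of the processing techniques named above. The sketch below is plain k-means (Lloyd's algorithm) on one-dimensional data, written without libraries to show the idea. The order amounts are made up, and a real project would more likely use a library implementation such as scikit-learn's `KMeans`.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means (Lloyd's algorithm) on one-dimensional data."""
    # Initialise centroids with k evenly spaced values from the sorted data.
    ordered = sorted(values)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break  # converged: assignments no longer change
        centroids = new
    return centroids, clusters

order_amounts = [120, 135, 150, 980, 1010, 1050]
centroids, clusters = kmeans_1d(order_amounts, k=2)
print(clusters)  # [[120, 135, 150], [980, 1010, 1050]]
```

The two clusters separate small orders from large ones without anyone labelling them in advance, which is what distinguishes clustering from classification.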
Third Stage: Data analysis, an equally critical stage.
Here data scientists conduct exploratory and confirmatory work, regression, predictive analysis, qualitative analysis, and text mining.
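Regression, one of the analysis methods listed, can be sketched as an ordinary least-squares line fit. The spend and sales numbers below are hypothetical and were chosen so that the fit is exact (y = 2x + 1). The fitted line is then used for a simple prediction, which is the essence of predictive analysis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend vs. sales (both in thousands).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # predicted sales for spend = 6: 13.0
```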
Fourth/Final Stage: Insights
This stage involves data visualization, data reporting, the use of various business intelligence tools, and helping businesses, policymakers, and others make smarter decisions.
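Communicating insights usually means charts and dashboards. As a library-free stand-in (an assumption, not a real BI tool), the sketch below renders hypothetical per-region revenue totals as a text bar chart, largest first.

```python
def text_report(totals: dict[str, float], width: int = 30) -> str:
    """Render per-category totals as a simple text bar chart, largest first."""
    largest = max(totals.values())
    lines = []
    for name, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / largest)  # bar length scaled to the maximum
        lines.append(f"{name:<8} {bar} {value:,.0f}")
    return "\n".join(lines)

# Hypothetical revenue per region, e.g. the output of an earlier analysis step.
revenue = {"North": 1200, "South": 600, "East": 900}
report = text_report(revenue, width=20)
print(report)
```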
As a result, data scientists (as data science practitioners are called) require computer science and pure science skills beyond those of a typical data analyst.
The following are some of the applications and services that make use of data science:
- Intelligent digital assistants (Google Assistant)
- Driverless vehicles (Waymo)
- Spam filtering (Gmail)
- Internet search results (Google)
- Recommendation engines (Spotify for music)
- Detecting abusive content and filtering hate speech (Facebook)
- Robotics (Boston Dynamics)
- Automatic piracy detection and identification (YouTube)
- Route planning: discovering the best shipping routes
- Estimating delays for flights, ships, trains, etc. (through predictive analysis)
- Creating promotional offers for products
- Finding the best time to deliver goods
- Forecasting a company's revenue for the next year
- Analyzing the health benefits of training
- Predicting who will win elections
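Spam filtering is a classic example from this list. The sketch below is a tiny multinomial naive Bayes classifier with add-one (Laplace) smoothing, a common textbook technique. It is not a description of how Gmail actually works, and the four training messages are invented.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class from (text, label) pairs; labels are 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Multinomial naive Bayes with add-one smoothing; returns the likelier label."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_msgs = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # log P(label) + sum over words of log P(word | label)
        score = math.log(class_counts[label] / total_msgs)
        for word in text.lower().split():
            count = word_counts[label][word]  # Counter returns 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("see you at lunch monday", "ham"),
]
model = train(training)
print(classify("claim your free prize", *model))    # spam
print(classify("lunch meeting on monday", *model))  # ham
```

Working in log probabilities avoids numeric underflow when a message contains many words.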
Data Science Functions
Structured data vs. unstructured data
The difference between structured and unstructured data comes down to the types of data that can be used, the level of data expertise required to use it, and schema-on-write versus schema-on-read.
Aspect | Structured Data | Unstructured Data |
---|---|---|
Who | Self-service access | Requires data science expertise |
What | Only select data types | Many varied types conglomerated |
When | Schema-on-write | Schema-on-read |
Where | Commonly stored in data warehouses | Commonly stored in data lakes |
How | Predefined format | Native format |
Courtesy: talend.com
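The schema-on-write versus schema-on-read row can be illustrated with the Python standard library. In this sketch (hypothetical table and records), SQLite enforces a declared schema and rejects a bad row at write time. The JSON records, by contrast, are stored as-is, and the reader decides how to handle a missing field.

```python
import json
import sqlite3

# Schema-on-write: structure is declared up front and enforced when data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.execute("INSERT INTO orders (id, amount) VALUES (1, 250.0)")
try:
    # This row breaks the declared schema (amount is required), so it is refused.
    conn.execute("INSERT INTO orders (id, amount) VALUES (2, NULL)")
except sqlite3.IntegrityError as err:
    print("rejected at write time:", err)

stored = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(stored)  # 1

# Schema-on-read: raw records are kept as-is; structure is imposed by the reader.
raw_records = [
    '{"id": 1, "amount": 250.0}',
    '{"id": 2, "note": "customer called, no amount yet"}',
]
parsed = []
for line in raw_records:
    record = json.loads(line)
    # The reader decides at query time how to treat the missing field.
    parsed.append((record["id"], record.get("amount")))
print(parsed)  # [(1, 250.0), (2, None)]
```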
Let's look at a comparison chart of structured and unstructured data. The table below lists the differences between the two based on several characteristics.
On the basis of | Structured data | Unstructured data |
---|---|---|
Technology | It is based on a relational database. | It is based on character and binary data. |
Flexibility | Structured data is less flexible and schema-dependent. | There is an absence of schema, so it is more flexible. |
Scalability | It is hard to scale the database schema. | It is more scalable. |
Robustness | It is very robust. | It is less robust. |
Performance | Structured queries allow complex joins, so performance is higher. | Textual queries are possible, but performance is lower than with semi-structured and structured data. |
Nature | Structured data is quantitative, i.e., it consists of hard numbers or things that can be counted. | It is qualitative and cannot be processed and analyzed using conventional tools. |
Format | It has a predefined format. | It has a variety of formats, i.e., it comes in a variety of shapes and sizes. |
Analysis | It is easy to search. | Searching for unstructured data is more difficult. |
Courtesy: https://www.javatpoint.com/structured-data-vs-unstructured-data
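To illustrate the "complex joining" point in the Performance row, the sketch below joins and aggregates two small SQLite tables, then contrasts that with a plain text search over free-text reviews. The tables and reviews are hypothetical. The example shows the kind of query each form supports, not their relative speed.

```python
import re
import sqlite3

# Structured: two related tables that can be joined and aggregated with SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 400.0), (12, 2, 300.0);
""")
totals = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders AS o JOIN customers AS c ON o.customer_id = c.id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(totals)  # [('Delhi', 300.0), ('Pune', 650.0)]

# Unstructured: free text has no columns to join on; only textual search applies.
reviews = [
    "Delivery to Pune was quick, paid 250 rupees.",
    "Order arrived late in Delhi.",
    "Second Pune order, 400 rupees, also fine.",
]
pune_mentions = [text for text in reviews if re.search(r"\bPune\b", text)]
print(len(pune_mentions))  # 2
```

Getting a per-city total out of the reviews would require extracting the amounts from the text first, which is exactly the extra work unstructured data demands.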
Semi-structured data
The following table gives a brief overview of structured, semi-structured, and unstructured data.
Aspect | Structured data | Semi-structured data | Unstructured data |
---|---|---|---|
What is it? | Data with a high degree of organization, typically stored in a spreadsheet-like manner | Data with some degree of organization | Data with no predefined organizational form and no specific format |
To put it simply | Think of a spreadsheet (e.g. Excel) or data in a tabular format | Think of a TXT file with text that has some structure (headers, paragraphs, etc.) | Essentially anything that is not structured or semi-structured data (which is a lot) |
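Semi-structured data such as JSON carries its own field names, but records need not share the same fields. Here is a minimal sketch, with made-up records and a hypothetical `flatten` helper, of turning such records into a table-like shape with dotted column names and None for missing values.

```python
import json

# Semi-structured: each record carries its own field names, and fields may vary.
RAW = """
[
  {"name": "Asha", "email": "asha@example.com", "skills": ["python", "sql"]},
  {"name": "Ravi", "skills": ["r"], "address": {"city": "Pune"}}
]
"""

def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names; lists become comma-joined text."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        elif isinstance(value, list):
            flat[column] = ",".join(map(str, value))
        else:
            flat[column] = value
    return flat

rows = [flatten(r) for r in json.loads(RAW)]
columns = sorted({col for row in rows for col in row})
print(columns)  # ['address.city', 'email', 'name', 'skills']
for row in rows:
    # Missing fields become None so that every row fits the same tabular shape.
    print([row.get(col) for col in columns])
```

The result is effectively structured data, which is why semi-structured sources are often flattened before analysis.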