Friday, 29 April 2022

Distribution Plots

 

First, look at the histogram

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

# Load the data
df = pd.read_csv('nft.csv')
# Extract feature we're interested in
data = df['release_year']

# Generate histogram/distribution plot
sns.displot(data)
plt.savefig('hist1.png')
plt.show()


data = df['release_year']
sns.displot(data, discrete = True, kde = True)
plt.savefig('dist_hist.png')
plt.show()

See the histogram with Distribution Curve


In this way we can combine a histogram and a distribution curve to understand the data better.

Code for a joint plot:


df.dropna(inplace=True)  # drop rows with null values; otherwise seaborn has trouble plotting them
sns.jointplot(x = "rating", y = "release_year", data = df)
plt.savefig('joint.png')
plt.show()


Seaborn makes it very easy to plot these types of charts.

Tuesday, 26 April 2022

Lab Exercises for MBA DS #1

 1. Draw charts of TEU (twenty-foot equivalent units) for Maersk and SCI (Shipping Corporation of India)

Aim: Infer TEU trends for Maersk and SCI

Procedure:

import libraries 

import matplotlib.pyplot as plt
import numpy as np

define dataset


year = [2016, 2017,2018,2019,2020, 2021]
Maersk = [12000, 14000, 17000, 21000, 9000, 12000 ]
sci = [200,800, 1500, 1700, 150, 600]
scin = [2,8, 15, 17, 15, 6]

Maersk_new = np.log10(Maersk)


print(Maersk_new) #[4.07918125 4.14612804 4.23044892 4.32221929 3.95424251 4.07918125]


sci_new = np.log10(sci)


print(sci_new) #[2.30103 2.90308999 3.17609126 3.23044892 2.17609126 2.77815125]
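
The log10 transform matters here because the two carriers differ by more than an order of magnitude, so on a shared linear axis the SCI line is squeezed flat. A minimal sketch, using only the standard library and the same figures as above, shows how much the transform narrows the gap. The helper name `spread` is hypothetical and is only one rough way to measure this.

```python
import math

maersk = [12000, 14000, 17000, 21000, 9000, 12000]
sci = [200, 800, 1500, 1700, 150, 600]


def spread(a, b):
    """Ratio of the largest value in either series to the smallest."""
    values = list(a) + list(b)
    return max(values) / min(values)


maersk_log = [math.log10(v) for v in maersk]
sci_log = [math.log10(v) for v in sci]

raw_spread = spread(maersk, sci)          # 21000 / 150
log_spread = spread(maersk_log, sci_log)  # log10(21000) / log10(150)

print(f"raw spread: {raw_spread:.1f}, log10 spread: {log_spread:.2f}")
```

On the raw scale the largest value is 140 times the smallest; after the transform the ratio is about 2, which is why both lines become readable on one chart.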

Plot graphs with actual values,

def plot1():
    fig = plt.figure(figsize=(5, 4))
    plt.plot(year, Maersk, 'b-', year, sci, 'r-')
    plt.title('Multiple Plots for original values')
    plt.xlabel('Year')
    plt.ylabel('TEU')
    plt.grid()
    plt.savefig('renga1.png')
    plt.show()

plot1()



Plot with modified values

def plot2():
    fig = plt.figure(figsize=(5, 4))
    plt.plot(year, Maersk_new, 'b-', year, sci_new, 'r-')
    plt.title('Multiple Plots for log10 Values')
    plt.xlabel('Year')
    plt.ylabel('log10(TEU)')
    plt.grid()
    plt.savefig('renga2.png')
    plt.show()

plot2()



Plot with log values for Maersk and small values for SCI

def plot3():
    fig = plt.figure(figsize=(5, 4))
    plt.plot(year, Maersk_new, 'b-', year, scin, 'r-')
    plt.title('Multiple Plots values for SCI-Small Values')
    plt.xlabel('Year')
    plt.ylabel('TEU')
    plt.grid()
    plt.savefig('renga3.png')
    plt.show()

plot3()




Multiple sub plots for comparison in a column

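def plot4():
    plt.figure(figsize=(5, 8))
    # suptitle sets one title for the whole figure, above all subplots
    plt.suptitle('Multiple Subplots Column-wise')

    # plot 1: original values
    plt.subplot(3, 1, 1)
    plt.plot(year, Maersk, year, sci)
    plt.ylabel('TEU')
    plt.title('Year Vs Maersk TEU')

    # plot 2: log10 values
    plt.subplot(3, 1, 2)
    plt.plot(year, Maersk_new, year, sci_new)
    plt.title('Year Vs Sci_new')

    # plot 3: log10 Maersk against small SCI values
    plt.subplot(3, 1, 3)
    plt.plot(year, Maersk_new, year, scin)
    plt.xlabel('Year')
    plt.title('Year Vs Scin')

    plt.tight_layout()  # keep subplot titles and labels from overlapping
    plt.savefig('renga4.png')
    plt.show()

plot4()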



From the above charts, can we infer which performs better, Maersk or SCI?

Happy Learning with AMET ODL🚢🚢🚢🚢🛳🛳🛳🛳😆

DS#7 EDA Notes

Exploratory Data Analysis

EDA:

Exploratory Data Analysis (EDA) is an approach to analyze the data using visual techniques. It is used to discover trends, patterns, or to check assumptions with the help of statistical summary and graphical representations.

Get maximum insights from a data set.
Uncover underlying structure.
Extract important variables from the dataset.
Detect outliers and anomalies (if any).
Test underlying assumptions.
Determine the optimal factor settings.

EDA Techniques

Univariate Non-graphical.
Multivariate Non-graphical.
Univariate graphical.
Multivariate graphical.
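
As a small illustration of the univariate non-graphical technique, the sketch below computes summary statistics and flags outliers with the common 1.5 × IQR rule. The sample values are hypothetical, and the IQR rule is only one of several outlier conventions.

```python
import statistics

# Hypothetical sample with one suspiciously large entry
values = [200, 800, 1500, 1700, 150, 600, 9000]

mean = statistics.mean(values)
median = statistics.median(values)

# Quartiles (default "exclusive" method of statistics.quantiles)
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Flag anything beyond 1.5 * IQR from the quartiles
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print("mean:", mean, "median:", median)
print("IQR:", iqr, "outliers:", outliers)
```

The gap between the mean and the median already hints at the outlier before any chart is drawn.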

Types of Data Analysis

Several data analysis techniques exist encompassing various domains such as business, science, social science, etc. with a variety of names. 

The major data analysis approaches are −
  • Data Mining
  • Business Intelligence
  • Statistical Analysis
  • Predictive Analytics
  • Text Analytics
  • Data Mining
    • Data Mining is the analysis of large quantities of data to extract previously unknown, interesting patterns, unusual data, and dependencies. Note that the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction of the data itself. Data mining involves computer science methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The patterns obtained from data mining can be considered a summary of the input data that can be used in further analysis or to obtain more accurate prediction results from a decision support system.
  • Business Intelligence
    • Business Intelligence techniques and tools are for acquisition and transformation of large amounts of unstructured business data to help identify, develop and create new strategic business opportunities.
    • The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities. It helps in implementing an effective strategy based on insights that can provide businesses with a competitive market-advantage and long-term stability.
  • Statistical Analysis
    • Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.
    • In data analysis, two main statistical methodologies are used −
      • Descriptive statistics − In descriptive statistics, data from the entire population or a sample is summarized with numerical descriptors such as −
        • Mean, Standard Deviation for Continuous Data
        • Frequency, Percentage for Categorical Data
      • Inferential statistics − It uses patterns in the sample data to draw inferences about the represented population while accounting for randomness. These inferences can be −
        • answering yes/no questions about the data (hypothesis testing)
        • estimating numerical characteristics of the data (estimation)
        • describing associations within the data (correlation)
        • modeling relationships within the data (e.g. regression analysis)
  • Predictive Analytics
    • Predictive Analytics uses statistical models to analyze current and historical data for forecasting (predictions) about future or otherwise unknown events. In business, predictive analytics is used to identify risks and opportunities that aid in decision-making.
  • Text Analytics
    • Text Analytics, also referred to as Text Mining or Text Data Mining, is the process of deriving high-quality information from text. Text mining usually involves structuring the input text, deriving patterns within the structured data using means such as statistical pattern learning, and finally evaluating and interpreting the output.
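
The descriptive and inferential ideas under Statistical Analysis can be sketched with the standard library alone. The data below is hypothetical, and the formulas are the textbook definitions of the sample mean, the Pearson correlation, and the least-squares line, written out by hand rather than taken from a statistics package.

```python
from math import sqrt

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Descriptive: sample standard deviation of y
std_y = sqrt(syy / (n - 1))

# Association: Pearson correlation coefficient
r = sxy / sqrt(sxx * syy)

# Modeling: least-squares regression line y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"mean_y={mean_y}, std_y={std_y:.3f}, r={r:.3f}")
print(f"y = {intercept:.1f} + {slope:.1f} * x")
```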
Definition:

Data Analysis is defined by the statistician John Tukey in 1961 as "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."


Thus, data analysis is a process for obtaining large, unstructured data from various sources and converting it into information that is useful for −
  • Answering questions
  • Testing hypotheses
  • Decision-making
  • Disproving theories

EDA Quantitative Techniques: to be completed


Monday, 25 April 2022

DS#6 Components of Python for data Science

 Components of Python for DS

1.3.1 Python Components for Data Science

“Data science” is about as broad a term as they come. It may be easiest to describe what it is by listing its more concrete components:

Data exploration & analysis.

  • Included here: Pandas; NumPy; SciPy; a helping hand from Python’s Standard Library.

Data visualization. A pretty self-explanatory name. Taking data and turning it into something colorful.

  • Included here: Matplotlib; Seaborn; Datashader; others.

Classical machine learning. Conceptually, we could define this as any supervised or unsupervised learning task that is not deep learning (see below). Scikit-learn is far-and-away the go-to tool for implementing classification, regression, clustering, and dimensionality reduction, while StatsModels is less actively developed but still has a number of useful features.

  • Included here: Scikit-Learn, StatsModels.

Deep learning. This is a subset of machine learning that is seeing a renaissance, and is commonly implemented with Keras, among other libraries. It has seen monumental improvements over the last ~5 years, such as AlexNet in 2012, which was the first design to incorporate consecutive convolutional layers.

  • Included here: Keras, TensorFlow, and a whole host of others.

Data storage and big data frameworks. Big data is best defined as data that is either literally too large to reside on a single machine, or can’t be processed in the absence of a distributed environment. The Python bindings to Apache technologies play heavily here.

  • Apache Spark; Apache Hadoop; HDFS; Dask; h5py/pytables.

Odds and ends. Includes subtopics such as natural language processing, and image manipulation with libraries such as OpenCV.

  • Included here: nltk; Spacy; OpenCV/cv2; scikit-image; Cython.

Among the components listed above, we will discuss the most important ones in Python for Data Science.

1.3.1.1. pandas (Python Data Analysis Library)

Pandas is a Python library used for working with data sets. It has functions for analyzing, cleaning, exploring, and manipulating data. The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.

Features:

·         Provides a fast and efficient way to manage and explore data

·         Makes alignment and indexing easy

·         Handles missing data

·         Simplifies data cleaning

·         Includes input and output tools

·         Supports multiple file formats

·         Makes merging and joining datasets easy

·         Offers extensive time series support

·         Is optimized for array performance

·         Works well with visualization tools

·         Supports grouping

·         Supports masking data

·         Supports selecting unique values

·         Performs mathematical operations on the data

 

Installation:

            pip install pandas

Usage:

            import pandas as pd
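
Several of the features listed above (missing data, unique values, grouping, merging, masking) can be seen in a few lines. This is a minimal sketch on hypothetical data: the column names, region names, and manager names are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "units": [10, np.nan, 30, 40, 50],
})
managers = pd.DataFrame({
    "region": ["North", "South", "East"],
    "manager": ["Asha", "Ravi", "Meena"],
})

# Missing data: count the gaps, then fill them with the column mean
n_missing = int(sales["units"].isna().sum())
filled = sales.fillna({"units": sales["units"].mean()})

# Unique values in a column
regions = sorted(sales["region"].unique())

# Grouping: total units per region (groups come back sorted by key)
totals = filled.groupby("region", as_index=False)["units"].sum()

# Merging: attach the manager for each region
report = totals.merge(managers, on="region", how="left")

# Masking: keep only rows that satisfy a condition
high = report[report["units"] > 45]

print(report)
print(high)
```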

1.3.1.2. NumPy:

NumPy is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

NumPy (Numerical Python) is an open-source core Python library for scientific computing. It is a general-purpose array and matrix processing package.

Python loops are slow compared with Fortran and other compiled languages. To overcome this, NumPy performs repetitive array operations in precompiled code.

NumPy arrays are faster than Python's built-in list, take less storage, and are more compact.

 

Important Features of numpy

  1.        A high-performance N-dimensional array object (1D, 2D, and multi-dimensional arrays)
  2.         Tools for integrating code written in languages such as C and C++
  3.         A multidimensional container for generic data
  4.         Linear algebra, Fourier transform, and random number capabilities
  5.         Broadcasting functions
  6.         Data type definition capability, which helps it work with varied databases

Installation:

            pip install numpy

 Usage:

            import numpy as np
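
The features above (vectorized arithmetic, broadcasting, linear algebra) can be sketched briefly. The TEU figures are reused from the lab exercise; the small matrices are hypothetical values chosen so the results are easy to check by hand.

```python
import numpy as np

# Vectorized arithmetic: one expression instead of a Python loop
teu = np.array([12000, 14000, 17000, 21000, 9000, 12000])
teu_thousands = teu / 1000

# Broadcasting: a (2, 3) matrix minus a (3,) vector of column means
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
centered = m - m.mean(axis=0)

# Linear algebra: solve 2x + y = 5 and x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)

print(teu_thousands)
print(centered)
print(solution)
```

Broadcasting stretches the column means across both rows without copying data, so no explicit loop over rows is needed.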


 Which is better, NumPy or pandas?

NumPy arrays are generally more memory efficient than pandas objects, because they carry no index or per-column metadata.

NumPy works natively with N-dimensional data, whereas a pandas DataFrame is two-dimensional.

In data science work, NumPy arrays can be passed directly to libraries such as TensorFlow and scikit-learn, and pandas itself is built on top of NumPy.

NumPy is also usually faster than a pandas Series for element-wise numerical operations, because indexing and label alignment add overhead in pandas.

pandas has its own importance as a Python library, particularly for labeled, tabular data, but for the raw numerical work described above the conclusion is that NumPy is the better choice.

1.3.1.3. matplotlib

Matplotlib is a low level graph plotting library in python that serves as a visualization utility.

Matplotlib was created by John D. Hunter.

Matplotlib is open source and we can use it freely.

Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and JavaScript for platform compatibility.

Matplotlib is one of the most powerful libraries in Python for data visualization.

Data visualization is the process of translating numbers, text, or large data sets into graphs such as histograms, maps, bar plots, and pie charts. Visualization needs tools, and Matplotlib is a library that helps draw these charts.

Matplotlib is an open-source drawing library that supports many drawing types. Plots, histograms, bar charts, scatter plots, and other charts can be generated with just a few lines of code.

It is often used in web application servers, shells, and Python scripts.


Figure: Various drawing types supported by matplotlib.pyplot

Features

·        Supports 14 types of charts

·        Charting is easy, fun and child’s play with matplotlib.

·        Supports Titles, Labels, gridlines, legends, color and some cool visual aids for charts

Installation

            pip install  matplotlib

Usage 

            import matplotlib.pyplot as plt
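
The features listed above (titles, labels, gridlines, legends, colors) fit into one short script. This is a minimal sketch that reuses the TEU figures from the lab exercise. The output file name `example_chart.png` is an assumption, and the `Agg` backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt

year = [2016, 2017, 2018, 2019, 2020, 2021]
maersk = [12000, 14000, 17000, 21000, 9000, 12000]
sci = [200, 800, 1500, 1700, 150, 600]

fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(year, maersk, "b-", label="Maersk")  # blue solid line
ax.plot(year, sci, "r-", label="SCI")        # red solid line
ax.set_title("TEU by Year")
ax.set_xlabel("Year")
ax.set_ylabel("TEU")
ax.grid(True)
ax.legend()
fig.savefig("example_chart.png")  # hypothetical file name
```

Passing `label=` to each `plot` call lets `legend()` name the lines directly instead of listing colors.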

1.3.1.4. scikit  (scikit-learn)

Scikit-learn (sklearn) is one of the most useful and robust Python libraries for machine learning.

It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.

This library is built upon NumPy, SciPy and Matplotlib.

Through scikit-learn, we can implement various machine learning models for regression, classification, clustering, and statistical tools for analyzing these models.


It also provides functionality for dimensionality reduction, feature selection, feature extraction, ensemble techniques, and inbuilt datasets. We will be looking into these features one by one.



Sci-kit Features

 

  • Data Splitting -  Splitting the dataset is essential for an unbiased evaluation of prediction performance. 
  • Linear Regression - This supervised ML model is used when the output variable is continuous and has a linear relationship with the independent variables. It can be used to forecast sales in the coming months by analyzing the sales data for previous months.
  • Logistic Regression : Despite its name, logistic regression is a supervised classification algorithm. It is modeled much like linear regression, but the output variable is categorical. It can be used to predict whether a patient has heart disease or not.
  • Decision Trees : A Decision Tree is a powerful tool that can be used for both classification and regression problems. It uses a tree-like model to make decisions and predict the output. It consists of internal nodes and leaf nodes. Internal nodes, starting from the root, represent the decision to split, and leaf nodes represent an output variable value. Decision trees are useful when the dependent variable does not follow a linear relationship with the independent variables, i.e. when linear regression does not give accurate results.
  • Ensemble Methods : The ensemble method is a technique in which multiple models are used to predict the output variable instead of a single one. The dataset is randomly divided into subsets, which are passed to different models to train them. The average of all the models is considered when we predict the output. Ensemble techniques are used to manage the bias-variance trade-off. There are generally two types of ensembling techniques:
    • Bagging is a technique in which multiple models of the same type are trained with random samples from the training set. The inputs to different models are independent of each other.
    • Boosting is a technique in which multiple models are trained in such a way that the input of a model is dependent on the output of the previous model. In Boosting, the data which is predicted incorrectly is given more weight.
  • Random Forest
    • Random Forest is a bagging technique in which hundreds/thousands of decision trees are used to build the model. Random Forest can be used for both classification and regression problems. It can be used to classify loan applicants, identify fraudulent activity and predict diseases.
  • XG Boost
    • XGBoost stands for eXtreme Gradient Boosting. It is a boosting technique that provides a high-performance implementation of gradient boosted decision trees. It is distributed as a separate library (xgboost) with a scikit-learn-compatible interface. Its main features are that it can handle missing data on its own, supports regularization, and often gives very accurate results.
  • Support Vector Machines(SVM)
    • Support Vector Machine is a supervised ML algorithm in which we plot each data item as a point in n-dimensional space, where n is the number of features in the dataset. We then perform classification by finding the hyperplane that best separates the classes. The data points closest to the hyperplane are called support vectors. It can also be used for regression problems but is generally used for classification. It is used in many applications such as face detection and classification of mails.
  • Feature Extraction
  • Cross validation
  • Clustering
  • Scaling – Standardization, and Normalisation

Installation : 

            pip install -U scikit-learn

Usage : 

            import sklearn as skl
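
A few of the features listed above (data splitting, scaling, logistic regression, a random forest, and cross-validation) can be combined in one short script. This is a minimal sketch on a synthetic dataset from `make_classification`; the sample size, feature count, and `random_state` values are arbitrary assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (hypothetical, for illustration only)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Data splitting: hold out 20% for an unbiased evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling + logistic regression chained in a pipeline, so the scaler
# is fitted on the training data only
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X_train, y_train)
logreg_acc = logreg.score(X_test, y_test)

# Random forest (a bagging ensemble) scored with 5-fold cross-validation
forest = RandomForestClassifier(n_estimators=100, random_state=0)
cv_scores = cross_val_score(forest, X_train, y_train, cv=5)

print("logistic regression test accuracy:", logreg_acc)
print("random forest CV accuracy:", cv_scores.mean())
```

Note that the estimators are imported from submodules such as `sklearn.linear_model`; a bare `import sklearn` does not load them.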

# For Reference


DEEP LEARNING FOR UNIVARIATE SERIES WORKOUT

 Reference run in PyCharm:

# From Reference https://machinelearningmastery.com/how-to-develop-deep-learning-models-for-univariate-time-series-forecasting/

import pandas as pd
import pandas_ta
import matplotlib.pyplot as plt
import statistics

#for DL
from math import sqrt
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from statistics import median


def linReg():
    # Load .csv file as DataFrame
    df = pd.read_csv('TSLA.csv')

    # print the data
    print(df)

    # print some summary statistics
    print(df.describe())

    # Indexing data using a DatetimeIndex
    df.set_index(pd.DatetimeIndex(df['Date']), inplace=True)

    # Keep only the 'Adj Close' Value (copy so new columns can be appended safely)
    df = df[['Adj Close']].copy()

    # Re-inspect data
    print(df)

    print(df.info())

    plt.plot(df[['Adj Close']])
    plt.title('TESLA Share Price')
    plt.xlabel('Year')
    plt.ylabel('Adj Close Price')
    plt.savefig('TESLA.png')
    plt.show()

    # Append a 10-period exponential moving average as column 'EMA_10'
    df.ta.ema(close='Adj Close', length=10, append=True)

    # The EMA is NaN for the first rows (not enough history yet),
    # so drop the first 10 rows
    df = df.iloc[10:]

    print(df.head(10))

    plt.plot(df['Adj Close'])
    plt.plot(df['EMA_10'])
    plt.xlabel('Year')
    plt.ylabel('Adj Close/EMA_10')
    plt.title('TESLA Share Price with EMA overlaid')
    plt.legend(['Adj Close', 'EMA_10'], loc=0)
    plt.savefig('TESLA_EMA_10.png')
    plt.show()

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(df[['Adj Close']], df[['EMA_10']], test_size=.2)

    from sklearn.linear_model import LinearRegression
    # Create Regression Model
    model = LinearRegression()
    # Train the model
    model.fit(X_train, y_train)
    # Use model to make predictions
    y_pred = model.predict(X_test)

    # Test Set
    print(X_test.describe())
    # Training set
    print(X_train.describe())

    # y_pred_1000=model.predict([['1000']])
    # print('...........',y_pred_1000)
    #
    # from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
    # # Printout relevant metrics
    # print("Model Coefficients:", model.coef_) # [[0.98176283]]
    # print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred)) #6.21531704292117
    # print("Coefficient of Determination:", r2_score(y_test, y_pred)) #0.9942788743625711

#https://machinelearningmastery.com/how-to-develop-deep-learning-models-for-univariate-time-series-forecasting/

def drawSeries():
    df = pd.read_csv('carsales.csv', header=0, index_col=0)
    print(df.shape)

    plt.plot(df)
    plt.show()

################################################DL
def deepLearn1():
    # persistence

    # split a univariate dataset into train/test sets
    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

    # transform list into supervised learning format
    def series_to_supervised(data, n_in=1, n_out=1):
        df = DataFrame(data)
        cols = list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
        # put it all together
        agg = concat(cols, axis=1)
        # drop rows with NaN values
        agg.dropna(inplace=True)
        return agg.values

    # root mean squared error or rmse
    def measure_rmse(actual, predicted):
        return sqrt(mean_squared_error(actual, predicted))

    # difference dataset
    def difference(data, interval):
        return [data[i] - data[i - interval] for i in range(interval, len(data))]

    # fit a model (persistence needs no fitting)
    def model_fit(train, config):
        return None

    # forecast with a pre-fit model
    def model_predict(model, history, config):
        values = list()
        for offset in config:
            values.append(history[-offset])
        return median(values)

    # walk-forward validation for univariate data
    def walk_forward_validation(data, n_test, cfg):
        predictions = list()
        # split dataset
        train, test = train_test_split(data, n_test)
        # fit model
        model = model_fit(train, cfg)
        # seed history with training dataset
        history = [x for x in train]
        # step over each time-step in the test set
        for i in range(len(test)):
            # fit model and make forecast for history
            yhat = model_predict(model, history, cfg)
            # store forecast in list of predictions
            predictions.append(yhat)
            # add actual observation to history for the next loop
            history.append(test[i])
        # estimate prediction error
        error = measure_rmse(test, predictions)
        print(' > %.3f' % error)
        return error

    # repeat evaluation of a config
    def repeat_evaluate(data, config, n_test, n_repeats=30):
        # fit and evaluate the model n times
        scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
        return scores

    # summarize model performance
    def summarize_scores(name, scores):
        # print a summary
        scores_m, score_std = mean(scores), std(scores)
        print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
        # box and whisker plot
        pyplot.boxplot(scores)
        pyplot.show()

    series = read_csv('carsales.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # define config
    config = [12, 24, 36]
    # grid search
    scores = repeat_evaluate(data, config, n_test)
    # summarize scores
    summarize_scores('persistence', scores)
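
The persistence baseline and walk-forward validation used above can be reduced to a few lines of plain Python. This is a minimal sketch, not the referenced tutorial's code: the function names `persistence_forecast` and `walk_forward_rmse` and both test series are hypothetical. The forecast is the median of the observations 12, 24, and 36 steps back, matching the `config = [12, 24, 36]` offsets.

```python
from math import sqrt
from statistics import median


def persistence_forecast(history, offsets):
    """Forecast the next value as the median of earlier observations."""
    return median(history[-o] for o in offsets)


def walk_forward_rmse(series, n_test, offsets):
    """Forecast one step at a time, then reveal the true value."""
    train, test = series[:-n_test], series[-n_test:]
    history = list(train)
    squared_errors = []
    for actual in test:
        yhat = persistence_forecast(history, offsets)
        squared_errors.append((actual - yhat) ** 2)
        history.append(actual)  # the observation becomes known history
    return sqrt(sum(squared_errors) / len(squared_errors))


# Perfectly seasonal series (period 12): the baseline is exact
season = [10, 12, 15, 20, 26, 30, 33, 31, 25, 18, 13, 11]
seasonal_rmse = walk_forward_rmse(season * 4, n_test=12, offsets=[12, 24, 36])

# Pure upward trend: the median offset lags by 24 steps at every forecast
trend_rmse = walk_forward_rmse(list(range(48)), n_test=12, offsets=[12, 24, 36])

print(seasonal_rmse, trend_rmse)
```

The two cases show the baseline's character: it captures repeating seasonality perfectly but cannot follow a trend, which is the gap the neural models below try to close.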

##############################################MLP
#for MLP
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot

def MLP():
    # evaluate mlp

    # split a univariate dataset into train/test sets
    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

    # transform list into supervised learning format
    def series_to_supervised(data, n_in=1, n_out=1):
        df = DataFrame(data)
        cols = list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
        # put it all together
        agg = concat(cols, axis=1)
        # drop rows with NaN values
        agg.dropna(inplace=True)
        return agg.values

    # root mean squared error or rmse
    def measure_rmse(actual, predicted):
        return sqrt(mean_squared_error(actual, predicted))

    # fit a model
    def model_fit(train, config):
        # unpack config
        n_input, n_nodes, n_epochs, n_batch = config
        # prepare data
        data = series_to_supervised(train, n_in=n_input)
        train_x, train_y = data[:, :-1], data[:, -1]
        # define model
        model = Sequential()
        model.add(Dense(n_nodes, activation='relu', input_dim=n_input))
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        # fit
        model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
        return model

    # forecast with a pre-fit model
    def model_predict(model, history, config):
        # unpack config
        n_input, _, _, _ = config
        # prepare data
        x_input = array(history[-n_input:]).reshape(1, n_input)
        # forecast
        yhat = model.predict(x_input, verbose=0)
        return yhat[0]

    # walk-forward validation for univariate data
    def walk_forward_validation(data, n_test, cfg):
        predictions = list()
        # split dataset
        train, test = train_test_split(data, n_test)
        # fit model
        model = model_fit(train, cfg)
        # seed history with training dataset
        history = [x for x in train]
        # step over each time-step in the test set
        for i in range(len(test)):
            # fit model and make forecast for history
            yhat = model_predict(model, history, cfg)
            # store forecast in list of predictions
            predictions.append(yhat)
            # add actual observation to history for the next loop
            history.append(test[i])
        # estimate prediction error
        error = measure_rmse(test, predictions)
        print(' > %.3f' % error)
        return error

    # repeat evaluation of a config
    def repeat_evaluate(data, config, n_test, n_repeats=30):
        # fit and evaluate the model n times
        scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
        return scores

    # summarize model performance
    def summarize_scores(name, scores):
        # print a summary
        scores_m, score_std = mean(scores), std(scores)
        print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
        # box and whisker plot
        pyplot.boxplot(scores)
        pyplot.show()

    series = read_csv('carsales.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # define config: [n_input, n_nodes, n_epochs, n_batch]
    config = [24, 500, 100, 100]
    # grid search
    scores = repeat_evaluate(data, config, n_test)
    # summarize scores
    summarize_scores('mlp', scores)
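
The `series_to_supervised` helper used in each model is easier to follow on a tiny input. The sketch below restates the same shift-and-concatenate logic in compact form and applies it to a hypothetical five-value series with `n_in=2`, so each row holds two lagged inputs followed by the value to predict.

```python
from pandas import DataFrame, concat


def series_to_supervised(data, n_in=1, n_out=1):
    """Turn a series into rows of [lagged inputs..., outputs...]."""
    df = DataFrame(data)
    # input sequence (t-n, ... t-1)
    cols = [df.shift(i) for i in range(n_in, 0, -1)]
    # forecast sequence (t, t+1, ... t+n)
    cols += [df.shift(-i) for i in range(n_out)]
    agg = concat(cols, axis=1)
    # rows at the edges have no complete window, so drop them
    agg.dropna(inplace=True)
    return agg.values


rows = series_to_supervised([1, 2, 3, 4, 5], n_in=2)
train_x, train_y = rows[:, :-1], rows[:, -1]

print(rows)
# [[1. 2. 3.]
#  [2. 3. 4.]
#  [3. 4. 5.]]
```

Splitting off the last column as `train_y` is what `model_fit` does before training: the first `n_in` columns are the features and the final column is the target.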


#CNN
#for CNN
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from matplotlib import pyplot

def ConvNet():
    # evaluate cnn

    # split a univariate dataset into train/test sets
    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

    # transform list into supervised learning format
    def series_to_supervised(data, n_in=1, n_out=1):
        df = DataFrame(data)
        cols = list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
        # put it all together
        agg = concat(cols, axis=1)
        # drop rows with NaN values
        agg.dropna(inplace=True)
        return agg.values

    # root mean squared error or rmse
    def measure_rmse(actual, predicted):
        return sqrt(mean_squared_error(actual, predicted))

    # fit a model
    def model_fit(train, config):
        # unpack config
        n_input, n_filters, n_kernel, n_epochs, n_batch = config
        # prepare data
        data = series_to_supervised(train, n_in=n_input)
        train_x, train_y = data[:, :-1], data[:, -1]
        # reshape to [samples, timesteps, features]
        train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))
        # define model
        model = Sequential()
        model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, 1)))
        model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu'))
        model.add(MaxPooling1D(pool_size=2))
        model.add(Flatten())
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        # fit
        model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
        return model

    # forecast with a pre-fit model
    def model_predict(model, history, config):
        # unpack config
        n_input, _, _, _, _ = config
        # prepare data
        x_input = array(history[-n_input:]).reshape((1, n_input, 1))
        # forecast
        yhat = model.predict(x_input, verbose=0)
        return yhat[0]

    # walk-forward validation for univariate data
    def walk_forward_validation(data, n_test, cfg):
        predictions = list()
        # split dataset
        train, test = train_test_split(data, n_test)
        # fit model
        model = model_fit(train, cfg)
        # seed history with training dataset
        history = [x for x in train]
        # step over each time-step in the test set
        for i in range(len(test)):
            # fit model and make forecast for history
            yhat = model_predict(model, history, cfg)
            # store forecast in list of predictions
            predictions.append(yhat)
            # add actual observation to history for the next loop
            history.append(test[i])
        # estimate prediction error
        error = measure_rmse(test, predictions)
        print(' > %.3f' % error)
        return error

    # repeat evaluation of a config
    def repeat_evaluate(data, config, n_test, n_repeats=30):
        # fit and evaluate the model n times
        scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
        return scores

    # summarize model performance
    def summarize_scores(name, scores):
        # print a summary
        scores_m, score_std = mean(scores), std(scores)
        print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
        # box and whisker plot
        pyplot.boxplot(scores)
        pyplot.show()

    series = read_csv('carsales.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # define config: [n_input, n_filters, n_kernel, n_epochs, n_batch]
    config = [36, 256, 3, 100, 100]
    # grid search
    scores = repeat_evaluate(data, config, n_test)
    # summarize scores
    summarize_scores('cnn', scores)

# #LSTM
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from matplotlib import pyplot
#
#
def LsTM():
    # evaluate lstm

    # split a univariate dataset into train/test sets
    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

    # transform list into supervised learning format
    def series_to_supervised(data, n_in=1, n_out=1):
        df = DataFrame(data)
        cols = list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
        # put it all together
        agg = concat(cols, axis=1)
        # drop rows with NaN values
        agg.dropna(inplace=True)
        return agg.values

    # root mean squared error or rmse
    def measure_rmse(actual, predicted):
        return sqrt(mean_squared_error(actual, predicted))

    # difference dataset
    def difference(data, interval):
        return [data[i] - data[i - interval] for i in range(interval, len(data))]

    # fit a model
    def model_fit(train, config):
        # unpack config
        n_input, n_nodes, n_epochs, n_batch, n_diff = config
        # prepare data
        if n_diff > 0:
            train = difference(train, n_diff)
        data = series_to_supervised(train, n_in=n_input)
        train_x, train_y = data[:, :-1], data[:, -1]
        # reshape to [samples, timesteps, features]
        train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))
        # define model
        model = Sequential()
        model.add(LSTM(n_nodes, activation='relu', input_shape=(n_input, 1)))
        model.add(Dense(n_nodes, activation='relu'))
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        # fit
        model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
        return model

    # forecast with a pre-fit model
    def model_predict(model, history, config):
        # unpack config
        n_input, _, _, _, n_diff = config
        # prepare data
        correction = 0.0
        if n_diff > 0:
            # value to add back so the forecast is on the original scale
            correction = history[-n_diff]
            history = difference(history, n_diff)
        x_input = array(history[-n_input:]).reshape((1, n_input, 1))
        # forecast
        yhat = model.predict(x_input, verbose=0)
        return correction + yhat[0]

    # walk-forward validation for univariate data
    def walk_forward_validation(data, n_test, cfg):
        predictions = list()
        # split dataset
        train, test = train_test_split(data, n_test)
        # fit model
        model = model_fit(train, cfg)
        # seed history with training dataset
        history = [x for x in train]
        # step over each time-step in the test set
        for i in range(len(test)):
            # make a forecast from the history so far (the model is not refit)
            yhat = model_predict(model, history, cfg)
            # store forecast in list of predictions
            predictions.append(yhat)
            # add actual observation to history for the next loop
            history.append(test[i])
        # estimate prediction error
        error = measure_rmse(test, predictions)
        print(' > %.3f' % error)
        return error

    # repeat evaluation of a config
    def repeat_evaluate(data, config, n_test, n_repeats=30):
        # fit and evaluate the model n times
        scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
        return scores

    # summarize model performance
    def summarize_scores(name, scores):
        # print a summary
        scores_m, score_std = mean(scores), std(scores)
        print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
        # box and whisker plot
        pyplot.boxplot(scores)
        pyplot.show()

    series = read_csv('carsales.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # define config: [n_input, n_nodes, n_epochs, n_batch, n_diff]
    config = [36, 50, 100, 100, 12]
    # evaluate the config repeatedly
    scores = repeat_evaluate(data, config, n_test)
    # summarize scores
    summarize_scores('lstm', scores)
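
To see what the data preparation in the LSTM listing does, here is a small sketch on a made-up series of six numbers. The toy values are an assumption for illustration only, they are not the car sales data. The two helpers are copied from the listing in slightly shorter form.

```python
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1):
    # build lagged copies of the series and put them side by side
    df = DataFrame(data)
    cols = [df.shift(i) for i in range(n_in, 0, -1)]
    cols += [df.shift(-i) for i in range(n_out)]
    agg = concat(cols, axis=1)
    # the first n_in rows have no complete history, so drop them
    agg.dropna(inplace=True)
    return agg.values

def difference(data, interval):
    # subtract the value observed `interval` steps earlier
    return [data[i] - data[i - interval] for i in range(interval, len(data))]

toy = [10, 20, 30, 40, 50, 60]  # hypothetical values

# each row holds three lagged inputs followed by the value to predict
windows = series_to_supervised(toy, n_in=3)
print(windows)

# differencing with interval 2 removes a repeating pattern of length 2
diffed = difference(toy, 2)
print(diffed)
```

With n_in=3 the six values give three rows, and the last column of each row is the target. In the listing the same idea is used with n_input=36 and a differencing interval of 12.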


#--------------------------------------------------------------------------------------

from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import TimeDistributed
from keras.layers import Flatten
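# Conv1D and MaxPooling1D are imported from keras.layers;
# the keras.layers.convolutional path is not available in recent Keras releases
from keras.layers import Conv1D
from keras.layers import MaxPooling1D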
from matplotlib import pyplot

def CNNLSTM():
    # evaluate cnn lstm

    # split a univariate dataset into train/test sets
    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

    # transform list into supervised learning format
    def series_to_supervised(data, n_in=1, n_out=1):
        df = DataFrame(data)
        cols = list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
        # put it all together
        agg = concat(cols, axis=1)
        # drop rows with NaN values
        agg.dropna(inplace=True)
        return agg.values

    # root mean squared error or rmse
    def measure_rmse(actual, predicted):
        return sqrt(mean_squared_error(actual, predicted))

    # fit a model
    def model_fit(train, config):
        # unpack config
        n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch = config
        n_input = n_seq * n_steps
        # prepare data
        data = series_to_supervised(train, n_in=n_input)
        train_x, train_y = data[:, :-1], data[:, -1]
        # reshape to [samples, subsequences, steps per subsequence, features]
        train_x = train_x.reshape((train_x.shape[0], n_seq, n_steps, 1))
        # define model
        model = Sequential()
        model.add(TimeDistributed(
            Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(None, n_steps, 1))))
        model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu')))
        model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
        model.add(TimeDistributed(Flatten()))
        model.add(LSTM(n_nodes, activation='relu'))
        model.add(Dense(n_nodes, activation='relu'))
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        # fit
        model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
        return model

    # forecast with a pre-fit model
    def model_predict(model, history, config):
        # unpack config
        n_seq, n_steps, _, _, _, _, _ = config
        n_input = n_seq * n_steps
        # prepare data
        x_input = array(history[-n_input:]).reshape((1, n_seq, n_steps, 1))
        # forecast
        yhat = model.predict(x_input, verbose=0)
        return yhat[0]

    # walk-forward validation for univariate data
    def walk_forward_validation(data, n_test, cfg):
        predictions = list()
        # split dataset
        train, test = train_test_split(data, n_test)
        # fit model
        model = model_fit(train, cfg)
        # seed history with training dataset
        history = [x for x in train]
        # step over each time-step in the test set
        for i in range(len(test)):
            # make a forecast from the history so far (the model is not refit)
            yhat = model_predict(model, history, cfg)
            # store forecast in list of predictions
            predictions.append(yhat)
            # add actual observation to history for the next loop
            history.append(test[i])
        # estimate prediction error
        error = measure_rmse(test, predictions)
        print(' > %.3f' % error)
        return error

    # repeat evaluation of a config
    def repeat_evaluate(data, config, n_test, n_repeats=30):
        # fit and evaluate the model n times
        scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
        return scores

    # summarize model performance
    def summarize_scores(name, scores):
        # print a summary
        scores_m, score_std = mean(scores), std(scores)
        print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
        # box and whisker plot
        pyplot.boxplot(scores)
        pyplot.show()

    series = read_csv('carsales.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # define config: [n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch]
    config = [3, 12, 64, 3, 100, 200, 100]
    # evaluate the config repeatedly
    scores = repeat_evaluate(data, config, n_test)
    # summarize scores
    summarize_scores('cnn-lstm', scores)
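
The reshape step in the CNN-LSTM listing is easy to get wrong, so here is a small NumPy-only sketch of it. The values n_seq=3 and n_steps=12 come from the config in the listing. The array train_x is a hypothetical stand-in filled with consecutive integers, so that it is easy to see where each lag ends up.

```python
import numpy as np

n_seq, n_steps = 3, 12       # from the config in the listing
n_input = n_seq * n_steps    # 36 lagged observations per sample

# hypothetical stand-in for train_x: 5 samples, each with 36 lags
train_x = np.arange(5 * n_input).reshape(5, n_input)

# [samples, subsequences, steps per subsequence, features]
cnn_lstm_x = train_x.reshape((train_x.shape[0], n_seq, n_steps, 1))
print(cnn_lstm_x.shape)

# the first subsequence of the first sample holds lags 0..11,
# the second subsequence starts at lag 12, the third at lag 24
print(cnn_lstm_x[0, 0, :, 0])
print(cnn_lstm_x[0, 1, 0, 0], cnn_lstm_x[0, 2, 0, 0])
```

So every sample of 36 months is read as 3 blocks of 12 months. The Conv1D layers look inside each block, and the LSTM layer then reads the 3 blocks in order.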
#-----------------------------------------------------------------------------------------------------------------------

# evaluate convlstm
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import ConvLSTM2D
from matplotlib import pyplot

def CONVLSTM():
    # split a univariate dataset into train/test sets
    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

    # transform list into supervised learning format
    def series_to_supervised(data, n_in=1, n_out=1):
        df = DataFrame(data)
        cols = list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
        # put it all together
        agg = concat(cols, axis=1)
        # drop rows with NaN values
        agg.dropna(inplace=True)
        return agg.values

    # root mean squared error or rmse
    def measure_rmse(actual, predicted):
        return sqrt(mean_squared_error(actual, predicted))

    # difference dataset (defined here but not used by this config)
    def difference(data, interval):
        return [data[i] - data[i - interval] for i in range(interval, len(data))]

    # fit a model
    def model_fit(train, config):
        # unpack config
        n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch = config
        n_input = n_seq * n_steps
        # prepare data
        data = series_to_supervised(train, n_in=n_input)
        train_x, train_y = data[:, :-1], data[:, -1]
        # reshape to [samples, subsequences, rows, columns, features]
        train_x = train_x.reshape((train_x.shape[0], n_seq, 1, n_steps, 1))
        # define model
        model = Sequential()
        model.add(
            ConvLSTM2D(filters=n_filters, kernel_size=(1, n_kernel), activation='relu', input_shape=(n_seq, 1, n_steps, 1)))
        model.add(Flatten())
        model.add(Dense(n_nodes, activation='relu'))
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        # fit
        model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
        return model

    # forecast with a pre-fit model
    def model_predict(model, history, config):
        # unpack config
        n_seq, n_steps, _, _, _, _, _ = config
        n_input = n_seq * n_steps
        # prepare data
        x_input = array(history[-n_input:]).reshape((1, n_seq, 1, n_steps, 1))
        # forecast
        yhat = model.predict(x_input, verbose=0)
        return yhat[0]

    # walk-forward validation for univariate data
    def walk_forward_validation(data, n_test, cfg):
        predictions = list()
        # split dataset
        train, test = train_test_split(data, n_test)
        # fit model
        model = model_fit(train, cfg)
        # seed history with training dataset
        history = [x for x in train]
        # step over each time-step in the test set
        for i in range(len(test)):
            # make a forecast from the history so far (the model is not refit)
            yhat = model_predict(model, history, cfg)
            # store forecast in list of predictions
            predictions.append(yhat)
            # add actual observation to history for the next loop
            history.append(test[i])
        # estimate prediction error
        error = measure_rmse(test, predictions)
        print(' > %.3f' % error)
        return error

    # repeat evaluation of a config
    def repeat_evaluate(data, config, n_test, n_repeats=30):
        # fit and evaluate the model n times
        scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
        return scores

    # summarize model performance
    def summarize_scores(name, scores):
        # print a summary
        scores_m, score_std = mean(scores), std(scores)
        print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
        # box and whisker plot
        pyplot.boxplot(scores)
        pyplot.show()

    series = read_csv('carsales.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # define config: [n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch]
    config = [3, 12, 256, 3, 200, 200, 100]
    # evaluate the config repeatedly
    scores = repeat_evaluate(data, config, n_test)
    # summarize scores
    summarize_scores('convlstm', scores)
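
All three listings share the same walk-forward validation loop. The sketch below shows only that loop, without Keras. As an assumption for illustration, the network is swapped for a naive forecaster that repeats the last observed value, and the series is a made-up list of six numbers.

```python
from math import sqrt

def naive_predict(history):
    # hypothetical stand-in for model_predict: repeat the last known value
    return history[-1]

def walk_forward_validation(data, n_test, predict):
    # hold back the last n_test observations for testing
    train, test = data[:-n_test], data[-n_test:]
    history = list(train)
    predictions = []
    for actual in test:
        # forecast one step ahead using only what is known so far
        predictions.append(predict(history))
        # reveal the true value only after the forecast is stored
        history.append(actual)
    mse = sum((a - p) ** 2 for a, p in zip(test, predictions)) / len(test)
    return sqrt(mse), predictions

toy = [10, 12, 15, 14, 18, 21]  # hypothetical values
rmse, preds = walk_forward_validation(toy, 3, naive_predict)
print(preds)
print('%.3f' % rmse)
```

The forecasts here are 15, 14 and 18, because each step sees the true value of the previous step before it predicts. The deep learning listings work the same way, only the predict step is a trained network.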



#################################################### Results below may vary between runs because of the stochastic nature of training
drawSeries() #Univariate Series
deepLearn1() #persistence: 1841.156 RMSE (+/- 0.000)
MLP() #mlp: 1573.869 RMSE (+/- 121.720)
ConvNet() #cnn: 1557.113 RMSE (+/- 59.820)
LsTM() #lstm: 2091.674 RMSE (+/- 72.054)
CNNLSTM() #cnn-lstm: 1630.113 RMSE (+/- 180.719)
CONVLSTM() #convlstm: 1768.205 RMSE (+/- 230.159)
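
To compare the runs, the reported mean RMSE values can be ranked against the persistence baseline. This is a small sketch that only reuses the numbers printed in the comments of the calls here. Since the results are stochastic, the ranking can change on another run.

```python
# mean RMSE values copied from the comments next to each call
reported_rmse = {
    'persistence': 1841.156,
    'mlp': 1573.869,
    'cnn': 1557.113,
    'lstm': 2091.674,
    'cnn-lstm': 1630.113,
    'convlstm': 1768.205,
}

baseline = reported_rmse['persistence']

# lowest RMSE first
ranked = sorted(reported_rmse.items(), key=lambda kv: kv[1])
for name, rmse in ranked:
    # positive change means the model beats the persistence forecast
    change = 100.0 * (baseline - rmse) / baseline
    print('%-12s %9.3f  %+6.1f%% vs persistence' % (name, rmse, change))
```

In this run the CNN and MLP have the lowest error, and the plain LSTM is the only model that does worse than persistence.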

Just for Your Information. Happy Deep Learning with AMET ODL.
