Thursday, 19 December 2024

ETo Prediction with Other ML Algorithms

In continuation of the earlier ETo Prediction blog post ...

Prompt:

Can you give me code for the above latest ML algorithms suitable for the above prediction ETo

Here is Python code for the Extreme Learning Machine (ELM), a fast and efficient ML algorithm, applied to the prediction of reference evapotranspiration (ETo) from the input meteorological parameters.


Extreme Learning Machine (ELM) Code for ETo Prediction

Steps:

  1. Install required libraries.
  2. Load and preprocess the dataset.
  3. Split the dataset into training and testing sets.
  4. Build the ELM model using a simple feedforward neural network.
  5. Train and test the model for ETo prediction.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

# Step 1: Load the dataset
file_path = "Sample.xlsx"  # Replace with your file path
data = pd.read_excel(file_path)

print(data)

# Step 2: Data Preprocessing
# Assuming 'ET0' is the target column
X = data.drop("ET0", axis=1)  # Features
y = data["ET0"]  # Target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=42)

# Step 3: Define the Extreme Learning Machine (ELM) model
class ELM:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize random weights and biases
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.input_weights = np.random.uniform(-1, 1, (self.hidden_size, self.input_size))
        self.bias = np.random.uniform(-1, 1, (self.hidden_size, 1))
        self.output_weights = None

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def fit(self, X, y):
        # Compute hidden layer output
        H = self.sigmoid(np.dot(self.input_weights, X.T) + self.bias)
        H = H.T

        # Compute output weights using Moore-Penrose pseudoinverse
        self.output_weights = np.dot(np.linalg.pinv(H), y)

    def predict(self, X):
        # Compute hidden layer output
        H = self.sigmoid(np.dot(self.input_weights, X.T) + self.bias)
        H = H.T
        # Compute predictions
        return np.dot(H, self.output_weights)

# Step 4: Train the ELM model
input_size = X_train.shape[1]
hidden_size = 50  # Number of hidden neurons (adjustable)
output_size = 1

elm = ELM(input_size, hidden_size, output_size)
elm.fit(X_train, y_train)

# Step 5: Predict and evaluate
y_pred_train = elm.predict(X_train)
y_pred_test = elm.predict(X_test)

# Metrics
train_mse = mean_squared_error(y_train, y_pred_train)
test_mse = mean_squared_error(y_test, y_pred_test)
test_r2 = r2_score(y_test, y_pred_test)

print("Training MSE:", train_mse)
print("Testing MSE:", test_mse)
print("Testing R2 Score:", test_r2)

# Predict for a new sample
sample = np.array([[25, 50, 3, 20, 10, 5]])  # Replace with new input values (same order as the training feature columns)
sample_scaled = scaler.transform(sample)
eto_prediction = elm.predict(sample_scaled)
print("Predicted ETo for the sample:", eto_prediction[0])

Results


Training MSE: 0.0020746283679554164
Testing MSE: 0.007839549128315061
Testing R2 Score: 0.9869685797505748

Predicted ETo for the sample: 3.9307827153083634 # Please check


Explanation:

  1. Model Architecture:

    • Input Layer: Takes the input features (temperature, humidity, solar radiation, etc.).
    • Hidden Layer: Weights and biases are initialized randomly and never trained; the sigmoid activation is applied to the hidden-layer outputs.
    • Output Layer: Weights are computed analytically from the Moore-Penrose pseudoinverse of the hidden-layer output matrix (see the short numerical sketch after this list).
  2. Advantages of ELM:

    • Very fast training, because only the output weights are computed, in a single analytical step rather than by iterative optimization.
    • Efficient for large datasets.
  3. Metrics Evaluated:

    • Mean Squared Error (MSE): Measures the prediction error.
    • R² Score: Measures how well the predictions fit the data.
  4. Sample Input: Replace with your data's actual features and test its predictions.


Additional Algorithms for Comparison

If you're interested in trying Gradient Boosting Machines (GBMs), let me know, and I can provide additional implementations. Would you like me to extend the solution?

Prompt:

yes extend code so that i compare the performance of the above algorithm for the prediction of ET0

Here is an extended version of the code to compare the performance of Extreme Learning Machine (ELM) with Gradient Boosting Machine (GBM), XGBoost, and Random Forest for predicting reference evapotranspiration (ETo).


Full Comparison Code


import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
import xgboost as xgb

# Step 1: Load the dataset
file_path = "Sample.xlsx"  # Replace with your file path
data = pd.read_excel(file_path)

# Assuming 'ET0' is the target column
X = data.drop("ET0", axis=1)  # Features
y = data["ET0"]  # Target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=42)

# Step 2: Define the Extreme Learning Machine (ELM)
class ELM:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.input_weights = np.random.uniform(-1, 1, (self.hidden_size, self.input_size))
        self.bias = np.random.uniform(-1, 1, (self.hidden_size, 1))
        self.output_weights = None

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def fit(self, X, y):
        H = self.sigmoid(np.dot(self.input_weights, X.T) + self.bias)
        H = H.T
        self.output_weights = np.dot(np.linalg.pinv(H), y)

    def predict(self, X):
        H = self.sigmoid(np.dot(self.input_weights, X.T) + self.bias)
        H = H.T
        return np.dot(H, self.output_weights)

# Step 3: Initialize Models
elm = ELM(input_size=X_train.shape[1], hidden_size=50, output_size=1)
gbm = GradientBoostingRegressor(random_state=42)
rf = RandomForestRegressor(random_state=42)
xgb_model = xgb.XGBRegressor(random_state=42)

# Step 4: Train Models
# ELM
elm.fit(X_train, y_train)

# Gradient Boosting Machine
gbm.fit(X_train, y_train)

# Random Forest
rf.fit(X_train, y_train)

# XGBoost
xgb_model.fit(X_train, y_train)

# Step 5: Make Predictions
y_pred_elm = elm.predict(X_test)
y_pred_gbm = gbm.predict(X_test)
y_pred_rf = rf.predict(X_test)
y_pred_xgb = xgb_model.predict(X_test)

# Step 6: Evaluate Models
def evaluate_model(name, y_test, y_pred):
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"{name} - MSE: {mse:.2f}, R²: {r2:.2f}")
    return mse, r2

print("Model Performance Comparison:")
evaluate_model("Extreme Learning Machine (ELM)", y_test, y_pred_elm)
evaluate_model("Gradient Boosting Machine (GBM)", y_test, y_pred_gbm)
evaluate_model("Random Forest (RF)", y_test, y_pred_rf)
evaluate_model("XGBoost", y_test, y_pred_xgb)

# Step 7: Predict for a New Sample
sample = np.array([[25, 50, 3, 20, 10, 5]])  # Replace with new input values
sample_scaled = scaler.transform(sample)

pred_elm = elm.predict(sample_scaled)
pred_gbm = gbm.predict(sample_scaled)
pred_rf = rf.predict(sample_scaled)
pred_xgb = xgb_model.predict(sample_scaled)

print("\nPredicted ETo for a new sample:")
print(f"ELM: {pred_elm[0]:.2f}")
print(f"GBM: {pred_gbm[0]:.2f}")
print(f"RF: {pred_rf[0]:.2f}")
print(f"XGBoost: {pred_xgb[0]:.2f}")


Explanation:

  1. Models Included:

    • ELM: Efficient neural network-based approach.
    • Gradient Boosting Machine (GBM): Ensemble method focused on reducing errors iteratively.
    • Random Forest (RF): Ensemble of decision trees for robust predictions.
    • XGBoost: Optimized gradient boosting algorithm known for high performance.
  2. Steps:

    • Models are trained and evaluated on the same train-test split for a fair comparison (an optional cross-validation sketch follows this list).
    • Performance metrics include Mean Squared Error (MSE) and R² Score.
    • Predicted ETo values for a new sample are displayed for each model.
  3. Results Comparison:

    • Look for the model with the lowest MSE and highest R² for optimal predictions.
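
For a check that does not depend on a single 25% hold-out split, a short k-fold cross-validation sketch follows (an optional addition, not in the original code); it reuses the X_scaled, y, gbm, rf, and xgb_model objects defined above and skips the custom ELM class, which does not implement the scikit-learn estimator interface.

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation for the scikit-learn-compatible models;
# cross_val_score returns negative MSE, so the sign is flipped for reporting.
for name, model in [("GBM", gbm), ("RF", rf), ("XGBoost", xgb_model)]:
    scores = cross_val_score(model, X_scaled, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{name}: mean CV MSE = {-scores.mean():.3f} (+/- {scores.std():.3f})")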


Model Performance Comparison:
Extreme Learning Machine (ELM) - MSE: 0.01, R²: 0.99
Gradient Boosting Machine (GBM) - MSE: 0.01, R²: 0.98
Random Forest (RF) - MSE: 0.02, R²: 0.97
XGBoost - MSE: 0.02, R²: 0.97

Predicted ETo for a new sample:
ELM: 3.05
GBM: 3.06
RF: 3.17
XGBoost: 3.08


Next Steps:

  • If you want detailed visualizations (e.g., bar charts for comparison), we can use matplotlib or seaborn.
  • Let me know if you want assistance with hyperparameter tuning or extending to other algorithms like CatBoost or LightGBM!

Prompt:

Yes. Please give me code for hyperparameter tuning or extending to other algorithms like CatBoost or LightGBM!

Below is the extended code, which adds hyperparameter tuning with scikit-learn's GridSearchCV and a bar-chart comparison of the results; grids for GBM and Random Forest are active, while the XGBoost, CatBoost, and LightGBM entries are left commented out and can be enabled as needed.


Extended Code for Additional Models and Hyperparameter Tuning


import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
#from catboost import CatBoostRegressor
# import lightgbm as lgb  # un-comment if LightGBM is installed and used
import xgboost as xgb
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Step 1: Load Dataset
file_path = "Sample.xlsx"  # Replace with your file path
data = pd.read_excel(file_path)

# Assuming 'ET0' is the target column
X = data.drop("ET0", axis=1)  # Features
y = data["ET0"]  # Target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=42)

# Step 2: Define Models
gbm = GradientBoostingRegressor(random_state=42)
rf = RandomForestRegressor(random_state=42)
xgb_model = xgb.XGBRegressor(random_state=42)
#catboost_model = CatBoostRegressor(verbose=0, random_state=42)
# lightgbm_model = lgb.LGBMRegressor(random_state=42)

# Step 3: Define Hyperparameter Grids
param_grids = {
    "gbm": {
        "n_estimators": [100, 200],
        "learning_rate": [0.05, 0.1],
        "max_depth": [3, 5],
    },
    "rf": {
        "n_estimators": [100, 200],
        "max_depth": [None, 10, 20],
        "min_samples_split": [2, 5],
    },
    # "xgb": {
    #     "n_estimators": [100, 200],
    #     "learning_rate": [0.05, 0.1],
    #     "max_depth": [3, 5],
    # },
    # "catboost": {
    #     "iterations": [100, 200],
    #     "learning_rate": [0.05, 0.1],
    #     "depth": [3, 5],
    # },
    #  "lightgbm": {
    #     "n_estimators": [100, 200],
    #     "learning_rate": [0.05, 0.1],
    #     "max_depth": [3, 5, -1],
    # },
}

# Step 4: Hyperparameter Tuning
def tune_model(model, param_grid, X_train, y_train):
    grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring="neg_mean_squared_error")
    grid_search.fit(X_train, y_train)
    return grid_search.best_estimator_, grid_search.best_params_

# Tune all models
print("Tuning models...")
best_models = {}
best_params = {}
for model_name, model in zip(
    ["gbm", "rf"], [gbm, rf]
):
    best_model, best_param = tune_model(model, param_grids[model_name], X_train, y_train)
    best_models[model_name] = best_model
    best_params[model_name] = best_param
    print(f"Best parameters for {model_name}: {best_param}")

# Step 5: Evaluate All Models
print("\nModel Performance Comparison:")
def evaluate_model(name, model, X_test, y_test):
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"{name} - MSE: {mse:.2f}, R²: {r2:.2f}")
    return mse, r2

results = {}
for model_name, model in best_models.items():
    print(f"Evaluating {model_name}...")
    results[model_name] = evaluate_model(model_name.upper(), model, X_test, y_test)



# Step 6: Predict for a New Sample
sample = np.array([[25, 50, 3, 20, 10, 5]])  # Replace with new input values
sample_scaled = scaler.transform(sample)

print("\nPredicted ETo for a new sample:")
for model_name, model in best_models.items():
    prediction = model.predict(sample_scaled)
    print(f"{model_name.upper()}: {prediction[0]:.2f}")

# Visualization of the results for the two tuned models

# Step 1: Extract Model Names and Metrics
model_names = list(results.keys())
mse_values = [results[model][0] for model in model_names]  # Extract MSE
r2_values = [results[model][1] for model in model_names]   # Extract R²

# Step 2: Create a Bar Plot for MSE
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.bar(model_names, mse_values, color='skyblue')
plt.title('Model Comparison - Mean Squared Error (MSE)', fontsize=14)
plt.ylabel('MSE', fontsize=12)
plt.xlabel('Models', fontsize=12)
plt.xticks(rotation=45)
for i, v in enumerate(mse_values):
    plt.text(i, v + 0.01 * max(mse_values), f'{v:.2f}', ha='center', fontsize=10)

# Step 3: Create a Bar Plot for R²
plt.subplot(1, 2, 2)
plt.bar(model_names, r2_values, color='salmon')
plt.title('Model Comparison - R² Score', fontsize=14)
plt.ylabel('R²', fontsize=12)
plt.xlabel('Models', fontsize=12)
plt.xticks(rotation=45)
for i, v in enumerate(r2_values):
    plt.text(i, v + 0.01 * max(r2_values), f'{v:.2f}', ha='center', fontsize=10)

# Step 4: Display the Plots
plt.tight_layout()
plt.show()


Results


Tuning models...
Best parameters for gbm: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 200}
Best parameters for rf: {'max_depth': 10, 'min_samples_split': 2, 'n_estimators': 100}

Model Performance Comparison:
Evaluating gbm...
GBM - MSE: 0.01, R²: 0.98
Evaluating rf...
RF - MSE: 0.02, R²: 0.97

Predicted ETo for a new sample:
GBM: 3.03
RF: 3.18


Explanation:

  1. Hyperparameter Tuning:

    • GridSearchCV is used to find the optimal hyperparameters for each model.
    • Hyperparameter grids are defined for Gradient Boosting (GBM) and Random Forest (RF); grids for XGBoost, CatBoost, and LightGBM are included but commented out (a sketch for enabling them follows this list).
  2. Comparison Metrics: 

    • Models are evaluated using Mean Squared Error (MSE) and R² Score on the test set.
  3. Prediction for a New Sample:

    • Predicts ETo for a hypothetical new input using each tuned model (extending this to the other algorithms is left as a student exercise).
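
Since the prompt also asked about CatBoost and LightGBM, here is a minimal sketch for enabling them in the same tuning loop; it assumes the catboost and lightgbm packages are installed and reuses the tune_model function, the best_models dictionary, and X_train/y_train from the code above.

from catboost import CatBoostRegressor
import lightgbm as lgb

catboost_model = CatBoostRegressor(verbose=0, random_state=42)
lightgbm_model = lgb.LGBMRegressor(random_state=42)

# Same grids as the commented-out entries in param_grids above
extra_grids = {
    "catboost": {"iterations": [100, 200], "learning_rate": [0.05, 0.1], "depth": [3, 5]},
    "lightgbm": {"n_estimators": [100, 200], "learning_rate": [0.05, 0.1], "max_depth": [3, 5, -1]},
}

for model_name, model in zip(["catboost", "lightgbm"], [catboost_model, lightgbm_model]):
    best_model, best_param = tune_model(model, extra_grids[model_name], X_train, y_train)
    best_models[model_name] = best_model
    print(f"Best parameters for {model_name}: {best_param}")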

