Tuesday, 10 May 2022

DS#09 Feature Selection

 Feature selection

What ?

Feature selection is a very important step: it picks the features (refined input variables) that are most predictive of a given outcome.

"Variable selection is a problem of selecting the subset of features such that accuracy of the induced classifier is maximal."

Why ?

  • To reduce the cost or risk associated with observing variables.
  • To increase predictive power.
  • To reduce the size of models, so they are easier to trust and understand.
  • To understand the domain.


Sample Problem:

Let M be a metric that scores a model and a feature subset according to the predictions made and the features used.

Let A be the learning algorithm used to build the model.

FSS problem 1: Select a feature subset s that maximizes the score M gives to the model learned by A using the features in s.

Problem 2: Select a feature subset s and a learner A' that maximize the score M gives to the model learned by A' using the features in s.

Example: M is accuracy plus a preference for smaller models, and A is an SVM.

The task then becomes: find the minimal feature subset that maximizes the accuracy of an SVM.

Other possibilities for M: calibrated accuracy, AUC, or a trade-off between accuracy and the cost of features.
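FSS problem 1 can be sketched as a small exhaustive search. This is only an illustration under assumptions that are not in the post: the Iris dataset stands in for real data, M is taken to be 5-fold cross-validated accuracy minus a hypothetical penalty of 0.01 per feature, and A is scikit-learn's SVC with default settings. The helper names score_subset and best_subset are made up for this example.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def score_subset(X, y, subset, penalty=0.01):
    """Metric M (assumed form): mean CV accuracy minus a penalty per feature used."""
    accuracy = cross_val_score(SVC(), X[:, list(subset)], y, cv=5).mean()
    return accuracy - penalty * len(subset)


def best_subset(X, y, penalty=0.01):
    """Score every non-empty feature subset with M and return the best one."""
    n_features = X.shape[1]
    candidates = (
        subset
        for size in range(1, n_features + 1)
        for subset in combinations(range(n_features), size)
    )
    return max(candidates, key=lambda s: score_subset(X, y, s, penalty))


X, y = load_iris(return_X_y=True)
subset = best_subset(X, y)
print("selected feature indices:", subset)
print("score M:", round(score_subset(X, y, subset), 3))
```

Exhaustive search is only feasible for a handful of features, because the number of subsets doubles with every feature added. With more features, a greedy or heuristic search would normally replace the loop over all combinations.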

Ref :   Find the importance of feature: 

Methods

  1. Filters
  2. Wrappers
  3. Intrinsic
  4. Hybrid
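
The four families above can be compared side by side with scikit-learn. The sketch below is one possible reading of each family, not a definition: the synthetic dataset, the choice of 5 features to keep, and the specific estimators (logistic regression for the wrapper, a random forest for the intrinsic method) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, of which 5 carry signal (assumed setup).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# 1. Filter: rank features by a statistic computed without training a model.
filter_idx = SelectKBest(f_classif, k=5).fit(X, y).get_support(indices=True)

# 2. Wrapper: repeatedly fit a model and drop the weakest feature (RFE).
wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
wrapper_idx = wrapper.get_support(indices=True)

# 3. Intrinsic: use the importances a model learns during its own training.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
intrinsic = SelectFromModel(forest, threshold=-np.inf, max_features=5).fit(X, y)
intrinsic_idx = intrinsic.get_support(indices=True)

# 4. Hybrid: cheap filter first (keep 10), then a wrapper on the survivors.
prefilter_idx = SelectKBest(f_classif, k=10).fit(X, y).get_support(indices=True)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X[:, prefilter_idx], y)
hybrid_idx = prefilter_idx[rfe.get_support(indices=True)]

print("filter:   ", filter_idx)
print("wrapper:  ", wrapper_idx)
print("intrinsic:", intrinsic_idx)
print("hybrid:   ", hybrid_idx)
```

The four index lists need not agree, since each method measures feature usefulness in a different way.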

Selecting features based on the types of input and output (target) data

Data can be numerical (integer or float) or categorical (nominal, ordinal, or dichotomous).

The statistical method to use depends on the combination of input and output types:
  1. Numerical input, numerical output: Pearson's correlation coefficient (linear), Spearman's rank correlation (non-linear, monotonic).
  2. Numerical input, categorical output: ANOVA F-test (linear), Kendall's rank correlation (non-linear; it assumes the categorical variable is ordinal).
  3. Categorical input, categorical output: Chi-squared test (contingency tables), mutual information.
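  4. Categorical input, numerical output: no separate method is listed; according to the reference below, the methods from case 2 can be used with the roles of input and output reversed.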
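
The statistics named in cases 1 to 3 are available in SciPy and scikit-learn. The sketch below computes each one on synthetic data; the data-generating choices (sample size, noise levels, the 80% agreement rate) are assumptions made only so that every statistic has a signal to detect.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

rng = np.random.default_rng(0)
n = 500

# Case 1: numerical input, numerical output.
x_num = rng.normal(size=n)
y_num = 2.0 * x_num + rng.normal(scale=0.5, size=n)
pearson_r, _ = pearsonr(x_num, y_num)
spearman_rho, _ = spearmanr(x_num, y_num)

# Case 2: numerical input, categorical (binary) output.
y_cat = (x_num + rng.normal(scale=0.5, size=n) > 0).astype(int)
f_stat, f_pvalue = f_classif(x_num.reshape(-1, 1), y_cat)
tau, _ = kendalltau(x_num, y_cat)

# Case 3: categorical input, categorical output.
# x_cat copies y_cat about 80% of the time and flips it otherwise.
x_cat = np.where(rng.random(n) < 0.8, y_cat, 1 - y_cat).reshape(-1, 1)
chi2_stat, chi2_pvalue = chi2(x_cat, y_cat)
mi = mutual_info_classif(x_cat, y_cat, discrete_features=True, random_state=0)

print("Pearson r:", round(pearson_r, 3), "Spearman rho:", round(spearman_rho, 3))
print("ANOVA F:", f_stat, "Kendall tau:", round(tau, 3))
print("Chi-squared:", chi2_stat, "Mutual information:", mi)
```

Note that scikit-learn's chi2 expects non-negative feature values, so categorical inputs usually need to be encoded as counts or indicator columns first.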

To select the top variables, use the scikit-learn library's SelectKBest() and SelectPercentile().
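
A minimal sketch of both selectors follows. The Iris dataset, the ANOVA F-test (f_classif) as the scoring function, and the values k=2 and percentile=50 are assumptions chosen for illustration; any scoring function suited to the data types can be passed instead.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-scores.
top_k = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_top_k = top_k.transform(X)

# Keep the top 50% of features by the same score.
top_pct = SelectPercentile(score_func=f_classif, percentile=50).fit(X, y)
X_top_pct = top_pct.transform(X)

print("F-scores per feature:", top_k.scores_)
print("indices kept by SelectKBest:", top_k.get_support(indices=True))
print("shapes after selection:", X_top_k.shape, X_top_pct.shape)
```

SelectKBest fixes the number of features kept, while SelectPercentile fixes the fraction, which is convenient when the total number of features varies between datasets.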

Ref : https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/

Customer churn refers to the loss of existing customers, which incurs heavy losses for any business.

References :  

  Feature selection: General, Customer Churn Case Study

  Churn Prediction: Churn Analysis

  Churn Prediction: Commercial use of Data Science

Formula Ref for Precision & Recall
