Feature selection
What?
Feature selection, the choice of which refined input variables to use, is a very important step in finding the variables most predictive of a given outcome.
"Variable selection is a problem of selecting the subset of features such that accuracy of the induced classifier is maximal."
Why?
- To reduce the cost or risk associated with observing variables
- To increase predictive power
- To reduce the size of models, so they are easier to trust and understand
- To understand the domain
Sample Problem:
Let M be a metric that scores a model and a feature subset according to the predictions made and the features used.
Let A be the learning algorithm used to build the model.
FSS problem 1: Select a feature subset s that maximizes the score M gives to the model learned by A using the features in s.
FSS problem 2: Select a feature subset s and a learner A' that together maximize the score M gives to the model learned by A' using the features in s.
Example: M is accuracy plus a preference for smaller models, and A is an SVM.
The problem then becomes: find the minimal feature subset that maximizes the accuracy of an SVM.
Other possibilities for M: calibrated accuracy, AUC, or a trade-off between accuracy and the cost of features.
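FSS problem 1 can be sketched as an exhaustive search over feature subsets. This is only an illustration under stated assumptions: the learner A here is a tiny nearest-centroid classifier standing in for the SVM (so the example needs no external library), M is validation accuracy minus a hypothetical size_penalty per feature, and the toy data and all function names are made up for this sketch.

```python
from itertools import combinations

def nearest_centroid_fit(X, y):
    """Stand-in learner A (NOT an SVM): store one centroid per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}

def nearest_centroid_predict(model, row):
    """Predict the class whose centroid is closest to the row."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda c: dist2(model[c]))

def project(X, subset):
    """Keep only the columns listed in subset."""
    return [[row[j] for j in subset] for row in X]

def metric_M(model, subset, X_val, y_val, size_penalty=0.01):
    """M = validation accuracy minus a penalty per feature used (assumed form)."""
    preds = [nearest_centroid_predict(model, r) for r in project(X_val, subset)]
    accuracy = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
    return accuracy - size_penalty * len(subset)

def best_subset(X_train, y_train, X_val, y_val, size_penalty=0.01):
    """Try every non-empty subset s and keep the one with the highest M."""
    n_features = len(X_train[0])
    best_score, best = float("-inf"), ()
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            model = nearest_centroid_fit(project(X_train, subset), y_train)
            score = metric_M(model, subset, X_val, y_val, size_penalty)
            if score > best_score:
                best_score, best = score, subset
    return best, best_score

# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X_train = [[0.0, 5.0, 1.0], [0.2, 1.0, 9.0], [0.1, 9.0, 4.0],
           [1.0, 4.0, 2.0], [0.9, 8.0, 8.0], [1.1, 2.0, 5.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_val = [[0.15, 7.0, 3.0], [0.05, 2.0, 7.0], [0.95, 3.0, 6.0], [1.05, 9.0, 1.0]]
y_val = [0, 0, 1, 1]

subset, score = best_subset(X_train, y_train, X_val, y_val)
print(subset, score)
```

Exhaustive search tries 2^n - 1 subsets, so it is only practical for a handful of features. With more features, a greedy search would be the usual substitute.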
Ref : Find the importance of feature:
Methods
- Filters
- Wrappers
- Intrinsic
- Hybrid
Selection of features based on the types of input and output (target) data.
Data can be numerical (integer or float) or categorical (nominal, ordinal or dichotomous).
The combinations of input and output types, and the methods suited to each, are listed below.
- Numerical input and numerical output: Pearson's correlation coefficient (linear), Spearman's rank correlation (nonlinear)
- Numerical input and categorical output: ANOVA correlation coefficient (linear), Kendall's rank coefficient (nonlinear)
- Categorical input and categorical output: Chi-squared test (contingency tables), mutual information
- Categorical input and numerical output: the same methods as for numerical input and categorical output, applied in reverse
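The difference between the linear and nonlinear measures for numerical data can be seen in a small hand-rolled sketch. The functions pearson, ranks and spearman below are illustrative helpers written for this note, not library calls, and the data is made up. Spearman's coefficient is computed here as Pearson's coefficient on the ranks, which captures any monotonic relationship.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient: strength of a linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y_linear = [2 * v + 1 for v in x]   # exactly linear in x
y_cubic = [v ** 3 for v in x]       # monotonic but not linear

r_linear = pearson(x, y_linear)     # 1 up to rounding
r_cubic = pearson(x, y_cubic)       # below 1: the relationship is not linear
rho_cubic = spearman(x, y_cubic)    # 1 up to rounding: the ranks agree perfectly
print(r_linear, r_cubic, rho_cubic)
```

In practice a library routine would normally replace these helpers. The point is only that a rank-based measure still scores a curved but monotonic relationship as perfect, while Pearson's coefficient does not.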
To select the top variables, the scikit-learn library provides SelectKBest() (keeps the k highest-scoring features) and SelectPercentile() (keeps a given top percentage of features).
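A minimal sketch of both selectors is shown here, assuming scikit-learn is installed. The Iris dataset and the ANOVA F-test scorer f_classif are choices made for this illustration (numerical inputs, categorical output). Another scoring function such as chi2 or mutual_info_classif could be passed instead, depending on the data types.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Iris: 150 samples, 4 numerical features, 3 classes (categorical target).
X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-score.
k_best = SelectKBest(score_func=f_classif, k=2)
X_k = k_best.fit_transform(X, y)

# Keep the top 50% of features by the same score.
top_half = SelectPercentile(score_func=f_classif, percentile=50)
X_p = top_half.fit_transform(X, y)

# Column indices of the features each selector kept.
kept_k = k_best.get_support(indices=True)
kept_p = top_half.get_support(indices=True)

print(X_k.shape, X_p.shape)
print(kept_k, kept_p)
print(k_best.scores_)  # per-feature F-scores used for the ranking
```

Both selectors only rank features one at a time against the target, so they behave as filter methods. They do not account for interactions between features.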
Ref : https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/
Customer churn refers to the loss of existing customers, which incurs a heavy loss for any business.
References:
Feature selection: General Customer Churn Case Study
Churn Prediction: Churn Analysis
Churn Prediction: Commercial Use of Data Science
Formula Ref for Precision & Recall