Feature selection
What?
Feature selection means choosing a refined subset of the input variables. It is a very important step, because the aim is to keep the features that are most predictive of a given outcome.
"Variable selection is a problem of selecting the subset of features such that accuracy of the induced classifier is maximal."
Why?
- To reduce the cost or risk associated with observing variables
- To increase predictive power
- To reduce the size of models, so they are easier to trust and understand
- To understand the domain
Sample Problem:
Let M be a metric that scores a model and a feature subset according to the predictions made and the features used.
Let A be the learning algorithm used to build the model.
Feature subset selection (FSS) problem 1: Select a feature subset S that maximizes the score M gives to the model learned by A using the features in S.
FSS problem 2: Select a feature subset S and a learner A' that maximize the score M gives to the model learned by A' using the features in S.
Example: M is accuracy plus a preference for smaller models, and A is an SVM.
Task: find the minimal feature subset that maximizes the accuracy of an SVM.
Other possibilities for M: calibrated accuracy, AUC, or a trade-off between accuracy and the cost of the features.
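A minimal sketch of FSS problem 1 as a wrapper search, assuming scikit-learn is installed. Here M is taken to be 5-fold cross-validated accuracy minus a small per-feature penalty (the penalty of 0.01 is an arbitrary assumption standing in for "a preference for smaller models"), A is an RBF-kernel SVM (`SVC`), and the Iris dataset stands in for real data. `score_subset` and `exhaustive_fss` are hypothetical helper names. Exhaustive search tries all 2^n - 1 non-empty subsets, so it is only practical for a handful of features.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def score_subset(X, y, subset, size_penalty=0.01):
    """Hypothetical metric M: mean 5-fold CV accuracy of an SVM (learner A)
    trained only on the columns in `subset`, minus a penalty per feature."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    acc = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()
    return acc - size_penalty * len(subset)


def exhaustive_fss(X, y, size_penalty=0.01):
    """Wrapper search: score every non-empty feature subset and keep the best."""
    n_features = X.shape[1]
    best_subset, best_score = None, float("-inf")
    # Subsets are visited smallest size first
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            s = score_subset(X, y, subset, size_penalty)
            # Strict '>' keeps the first (and therefore smallest) subset on ties
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score


X, y = load_iris(return_X_y=True)
subset, score = exhaustive_fss(X, y)
print("best feature indices:", subset, "penalized score:", round(score, 3))
```

For larger feature sets, greedy wrappers such as scikit-learn's `SequentialFeatureSelector` or `RFE` are the usual substitutes for exhaustive search.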
Ref: Finding the importance of features:
Methods
- Filters
- Wrappers
- Intrinsic
- Hybrid
- Numerical input, numerical output: Pearson's correlation coefficient (linear), Spearman's rank correlation coefficient (non-linear)
- Numerical input, categorical output: ANOVA correlation coefficient (linear), Kendall's rank correlation coefficient (non-linear)
- Categorical input, categorical output: Chi-squared test (contingency tables), mutual information
- Categorical input, numerical output: the same tests as for numerical input and categorical output (ANOVA, Kendall's), applied in reverse
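The pairings above can be sketched with SciPy and scikit-learn, assuming both are installed. All variables are synthetic and generated only for illustration, and `filter_scores` is a hypothetical helper name; each statistic is computed for one input/output type pairing from the list.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import mutual_info_classif


def filter_scores(seed=0, n=200):
    """Compute one filter statistic per input/output type pairing on synthetic data."""
    rng = np.random.default_rng(seed)

    # Hypothetical synthetic variables, for illustration only
    x_num = rng.normal(size=n)                            # numerical input
    y_num = 2.0 * x_num + rng.normal(scale=0.5, size=n)  # numerical output, linear in x_num
    y_cat = (x_num > 0).astype(int)                       # categorical output driven by x_num
    x_cat = rng.integers(0, 3, size=n)                    # categorical input with 3 levels
    y_cat2 = (x_cat == 2).astype(int)                     # categorical output driven by x_cat
    y_num2 = 1.5 * x_cat + rng.normal(size=n)             # numerical output driven by x_cat

    # Numerical input, numerical output
    pearson_r, _ = stats.pearsonr(x_num, y_num)           # linear association
    spearman_rho, _ = stats.spearmanr(x_num, y_num)       # rank-based, monotonic association

    # Numerical input, categorical output: compare x_num across the output classes
    _, anova_p = stats.f_oneway(*[x_num[y_cat == c] for c in np.unique(y_cat)])
    kendall_tau, _ = stats.kendalltau(x_num, y_cat)

    # Categorical input, categorical output: build a contingency table of counts
    table = np.zeros((3, 2), dtype=int)
    np.add.at(table, (x_cat, y_cat2), 1)
    _, chi2_p, _, _ = stats.chi2_contingency(table)
    mutual_info = mutual_info_classif(
        x_cat.reshape(-1, 1), y_cat2, discrete_features=True, random_state=seed
    )[0]

    # Categorical input, numerical output: the ANOVA test in reverse
    _, anova_reverse_p = stats.f_oneway(*[y_num2[x_cat == c] for c in np.unique(x_cat)])

    return {
        "pearson_r": pearson_r,
        "spearman_rho": spearman_rho,
        "anova_p": anova_p,
        "kendall_tau": kendall_tau,
        "chi2_p": chi2_p,
        "mutual_info": mutual_info,
        "anova_reverse_p": anova_reverse_p,
    }


for name, value in filter_scores().items():
    print(f"{name}: {value:.4g}")
```

Correlation coefficients and mutual information are scores (larger means a stronger relationship), while the ANOVA and chi-squared results are p-values (smaller means stronger evidence of a relationship).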
To select the top variables, the scikit-learn library provides SelectKBest() and SelectPercentile().
Ref: https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/
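A short sketch of both selectors, using the Iris dataset as an assumed example (any numeric feature matrix works). `f_classif` computes the ANOVA F-statistic for numerical inputs and a categorical target; `chi2` computes the chi-squared statistic, which scikit-learn only accepts for non-negative features such as counts or frequencies (Iris measurements happen to be non-negative, so it runs here for illustration).

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, chi2, f_classif

X, y = load_iris(return_X_y=True)

# Keep the k=2 features with the highest ANOVA F-statistic
kbest = SelectKBest(score_func=f_classif, k=2)
X_k = kbest.fit_transform(X, y)

# Keep the top 50% of features ranked by chi-squared score
pct = SelectPercentile(score_func=chi2, percentile=50)
X_p = pct.fit_transform(X, y)

print("F-scores:", kbest.scores_.round(1))
print("SelectKBest kept columns:", kbest.get_support(indices=True))
print("SelectPercentile kept columns:", pct.get_support(indices=True))
```

The fitted selectors expose `scores_` and `get_support()`, so the same object can report which columns were kept and then be reused in a `Pipeline` ahead of a model.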
Customer churn refers to the loss of existing customers, which causes heavy losses for any business.
References:
Feature selection: General Customer Churn Case Study
Churn Prediction: Churn Analysis
Churn Prediction: Commercial Use of Data Science