Well, Python doesn't have stepAIC.
If you search for it, you can find questions like this on Stack Overflow.
The link given in the answers there (Forward Selection with statsmodels) only supports linear regression and uses the coefficient of determination rather than AIC as its criterion, so I decided to write a stepAIC myself.
step_aic
import numpy as np


def step_aic(model, exog, endog, **kwargs):
    """
    This selects the best exogenous variables with AIC.
    Both exog and endog values can be either str or list.
    (The endog list is for the Binomial family.)

    Note: this adopts "forward" selection only.

    Args:
        model: model class from statsmodels.formula.api
        exog (str or list): exogenous variables
        endog (str or list): endogenous variables
        kwargs: extra keyword arguments for the model (e.g., data, family)

    Returns:
        model: the fitted model that seems to have the smallest AIC
    """
    # Force exog and endog into list form
    exog = np.r_[[exog]].flatten()
    endog = np.r_[[endog]].flatten()
    remaining = set(exog)
    selected = []  # factors confirmed for adoption

    # AIC of the model with the constant term only
    formula_head = ' + '.join(endog) + ' ~ '
    formula = formula_head + '1'
    aic = model(formula=formula, **kwargs).fit().aic
    print('AIC: {}, formula: {}'.format(round(aic, 3), formula))

    current_score, best_new_score = np.ones(2) * aic

    # Stop when every factor has been adopted or when no remaining factor
    # lowers the AIC any further
    while remaining and current_score == best_new_score:
        scores_with_candidates = []
        for candidate in remaining:
            # AIC when each remaining factor is added in turn
            formula_tail = ' + '.join(selected + [candidate])
            formula = formula_head + formula_tail
            aic = model(formula=formula, **kwargs).fit().aic
            print('AIC: {}, formula: {}'.format(round(aic, 3), formula))
            scores_with_candidates.append((aic, candidate))

        # The factor with the smallest AIC becomes the best candidate
        scores_with_candidates.sort()
        scores_with_candidates.reverse()
        best_new_score, best_candidate = scores_with_candidates.pop()

        # If adding the best candidate lowers the AIC, adopt it
        if best_new_score < current_score:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            current_score = best_new_score

    formula = formula_head + ' + '.join(selected)
    print('The best formula: {}'.format(formula))
    return model(formula=formula, **kwargs).fit()
If the explanatory variables are x and f, it is called as shown below. (You can pass 'y' instead of ['y'].)
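For example, here is a minimal sketch of the call, assuming a data frame d with columns y, x and f; the file name and the Poisson family are placeholders for illustration, not part of the function. For the Binomial family, the response goes in as a two-column list, as noted in the docstring.
usage
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed data frame with columns y, x and f (file name is a placeholder)
d = pd.read_csv('data.csv')

# Forward selection over x and f; 'y' works the same as ['y']
# (the Poisson family is only an example here)
best = step_aic(smf.glm, ['x', 'f'], ['y'],
                data=d, family=sm.families.Poisson())

# For the Binomial family, pass the two-column response as a list, e.g.:
# best = step_aic(smf.glm, ['x', 'f'], ['y', 'I(N - y)'],
#                 data=d, family=sm.families.Binomial())

print(best.summary())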
Is this right...? Well, the results were exactly the same as in the binomial distribution and logistic regression chapters of Midoribon (Introduction to Statistical Modeling for Data Analysis), so it seems to be.
But if you spot a mistake, please let me know.