Gilles Hacheme
Anushka Chawla : anushka.chawla[at]univ-amu.fr
Kenza Elass : kenza.elass[at]univ-amu.fr
Carolina Ulloa Suarez : carolina.ulloa-suarez[at]univ-amu.fr
Machine Learning (ML) models, such as Random Forest and Boosting, have been shown to predict very well compared with standard econometric approaches such as linear regression. Indeed, ML models can approximate very complex relationships between a set of explanatory variables and a dependent variable; that is, they capture non-linearities and interactions efficiently and non-parametrically. Nonetheless, ML models are seen as black-box models, whereas standard econometric models are easier to interpret. In this paper, our goal is to make ML models more interpretable by opening the black box, so as to obtain a model that is both accurate and interpretable. We propose a method combining Generalized Additive Models (GAM) with a variable selection method (either LASSO or Autometrics). The GAM part captures non-linearities, and the selection method identifies the relevant interaction variables. Our simulations and applications show that this method achieves results very close to those of standard ML models while being much more interpretable.
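To make the idea concrete, the following is a minimal sketch of combining an additive fit with LASSO-based selection of interactions. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not specify the estimation steps, so the sketch assumes a simple two-step scheme (additive fit first, then LASSO on the residuals), uses a per-variable polynomial basis as a crude stand-in for GAM smooth terms, and uses a hand-written coordinate-descent LASSO with a fixed penalty. The function names, the simulated data, and the values of `degree` and `lam` are all hypothetical. Autometrics is not shown.

```python
import itertools
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(Z, r, lam, n_iter=200):
    """Coordinate descent for (1/2n)||r - Z b||^2 + lam * ||b||_1."""
    n, p = Z.shape
    beta = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    resid = r.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += Z[:, j] * beta[j]          # remove column j's contribution
            rho = Z[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Z[:, j] * beta[j]          # add the updated contribution back
    return beta

def additive_basis(X, degree):
    """Intercept plus powers 1..degree of each column (assumed stand-in for GAM smooths)."""
    cols = [np.ones(len(X))]
    for j in range(X.shape[1]):
        for d in range(1, degree + 1):
            cols.append(X[:, j] ** d)
    return np.column_stack(cols)

def select_interactions(X, y, degree=5, lam=0.1):
    """Step 1: additive least-squares fit. Step 2: LASSO on residuals over pairwise products."""
    B = additive_basis(X, degree)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    Z = np.column_stack([X[:, a] * X[:, b] for a, b in pairs])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)    # standardize candidate interactions
    beta = lasso_cd(Z, resid, lam)
    return [(pair, b) for pair, b in zip(pairs, beta) if abs(b) > 1e-8]

# Simulated data (hypothetical): two non-linear main effects and one interaction.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 4))
y = (np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2
     + 2 * X[:, 2] * X[:, 3] + rng.normal(0, 0.1, 500))

selected = select_interactions(X, y)
print(selected)  # retained (pair, standardized coefficient) entries
```

In this simulated design, the only true interaction is between the third and fourth variables (indices 2 and 3), so a well-chosen penalty should retain that pair and drop the others. The fitted model stays readable: each variable has its own additive curve, and the selected interactions appear as explicit, named terms. In practice the penalty would be chosen by cross-validation rather than fixed.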