If you search for “SHAP analysis,” you’ll find its origins in a 2017 paper by Lundberg and Lee titled “A Unified Approach to Interpreting Model Predictions.” This groundbreaking work introduced “SHapley Additive exPlanations,” commonly abbreviated as SHAP. The primary objective of SHAP is to explain a machine learning model’s prediction by quantifying the contribution of each feature to that prediction. Its technical foundation is the calculation of Shapley values, a concept from coalitional game theory named after Lloyd Shapley, who introduced it in 1951 and was awarded the Nobel Memorial Prize in Economic Sciences in 2012.

In simple terms, Shapley values provide a way to quantify how much each feature or measured variable influences the final output of a machine learning model. This is achieved by comparing the impact of each input against an average reference point, typically the model’s average prediction.
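As a concrete illustration, here is a minimal sketch of that idea using the shap Python package together with scikit-learn; the dataset and model are arbitrary choices for demonstration, not part of the discussion above. For each instance, the feature contributions are measured relative to a base value, and adding them to that base value recovers the model’s prediction.

```python
# Minimal sketch (assumes the `shap` and `scikit-learn` packages are installed;
# the diabetes dataset and gradient boosting model are placeholders).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

# Fit any model on tabular data.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor().fit(X, y)

# Explain predictions: each row of shap_values.values holds one feature's
# contribution, measured relative to an average reference point (the base value).
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# For any instance, base value + sum of feature contributions ≈ model prediction.
print(shap_values.base_values[0] + shap_values.values[0].sum())
print(model.predict(X.iloc[[0]]))
```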

Shapley Values: Insight Unveiled

Imagine this scenario: a coalition of players joins forces to achieve a specific collective benefit arising from their collaboration. Given that individual players may contribute differently to the coalition and possess varying degrees of influence or efficiency, how should the resulting profit be distributed among them? In essence, we want to understand how significant each participant’s role is within the collaborative endeavour and what compensation they should receive in return.

Shapley values offer a solution to this quandary. In machine learning, consider the feature values of a data instance as the players forming the coalition. The Shapley values then guide us in equitably distributing the “payout,” which in this setting is the prediction (more precisely, the gap between the prediction and the average prediction), among these features.
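To make the payout idea concrete, here is a small, self-contained sketch that computes classical Shapley values for a made-up three-player game; the players and payouts are purely illustrative. Each player’s Shapley value is their marginal contribution averaged over all coalitions of the other players.

```python
# Exact Shapley values for a toy three-player coalition game.
from itertools import combinations
from math import factorial

players = ["A", "B", "C"]

# v(S): the collective payout each coalition S can secure on its own (made-up numbers).
payout = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

def shapley_value(player):
    """Average the player's marginal contribution over all coalitions without them."""
    n = len(players)
    others = [p for p in players if p != player]
    value = 0.0
    for size in range(n):
        for coalition in combinations(others, size):
            s = frozenset(coalition)
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            value += weight * (payout[s | {player}] - payout[s])
    return value

for p in players:
    print(p, shapley_value(p))
# The three values (20, 30, 40) sum to the grand-coalition payout of 90,
# which is the "efficiency" property: the full payout is distributed among the players.
```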

A “player” in this context can be a single feature value, as commonly found in tabular data. Alternatively, a “player” might be defined as a group of feature values that are treated as one unit.

Furthermore, consider a trained random forest model, whose predictions are derived from an ensemble of decision trees. You can compute the Shapley values for each individual tree separately and then average them; the averaged Shapley values describe a feature’s contribution within the whole random forest. This works because of the linearity (additivity) property of Shapley values: the Shapley values of an average of models equal the average of the models’ Shapley values.
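The following sketch illustrates that averaging step with scikit-learn and the shap package; it assumes shap’s TreeExplainer can be applied both to the full forest and to its individual trees, and the dataset is again just a placeholder. Up to numerical precision, the per-tree averages should match the SHAP values computed for the forest as a whole.

```python
# Hedged sketch of per-tree averaging in a random forest (illustrative only).
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# SHAP values computed directly on the whole forest.
forest_shap = shap.TreeExplainer(forest).shap_values(X)

# SHAP values computed for each tree separately, then averaged.
per_tree = [shap.TreeExplainer(tree).shap_values(X) for tree in forest.estimators_]
averaged_shap = np.mean(per_tree, axis=0)

# By the linearity property, the two should agree (the forest prediction is
# the mean of the tree predictions), up to numerical precision.
print(np.allclose(forest_shap, averaged_shap, atol=1e-6))
```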