## Monte-Carlo Methods for Prediction Functions

I wrote a package Monte Carlo Methods for Prediction functions (mmpf) which does what the name suggests.

Usually we have a prediction function $\hat{f}$ which is a function of our training data ${\mathbf{y}, \mathbf{X}}$. In the most general case, $\hat{f}$ may depend on $\mathbf{X}$ in an arbitrary manner. So plotting $\hat{f}$ when $\mathbf{X}$ has greater than 2-3 column dimensions is impossible.

A solution to this problem is to marginalize $\hat{f}(\mathbf{X})$, so that it only depends on a few dimensions of $\mathbf{X}$.

With a full probability model for the joint distribution of $(X, Y)$, it would possible to marginalize $\hat{f}$ by either integrating these variables out or summing over them. In many cases it is desirable to avoid assuming a full probability model. In these cases this procedure doesn't work because we have no (assumed) joint distribution for $(X, Y)$.

Monte-Carlo integration can allow us to marginalize $\hat{f}$ nonetheless. The most basic way of doing this requires us to first treat $\mathbf{X}$ as having a uniform measure, that is, each value of $\mathbf{X}$ we observe is equally likely, which forms a discrete uniform distribution on $\mathbf{X}$. Since we want to marginalize $\hat{f}$ so that it only depends on a column-wise proper subset of $\mathbf{X}$, say $\mathbf{X}_{u}$, we also need to assume that $\mathbf{X}$ comes from a product distribution, that is, that the probability of $\mathbf{X}_{u}$ and $\mathbf{X}_{-u}$ are independent. These two assumptions allow us to perform a Monte-Carlo integration.

Say we have two independent uniform variables, $X_1$ and $X_2$, which we sample from to get $\mathbf{x}_1$ and $\mathbf{x}_2$. These variables are related to some other random variable $Y$, which we observe as $\mathbf{y}$. We could learn a function $\hat{f}$ which estimates the mapping between $(X_1, X_2)$ and $Y$ using their sample analogues, giving us $\hat{f}$. In this case because $X_1$ is independent of $X_2$, computing $\frac{1}{N} \sum_{i = 1}^N \hat{f} (\mathbf{x}_1, \mathbf{x}_2^{(i)})$ will give the expected prediction of $\hat{f}$ as a function of $X_1$.

Prediction functions estimated from data via nonparametric methods frequently too high dimensional to be directly interpretable. That is, they are not generally factorizable into components which themselves depend on less than two variables. However, they can be made more interpretable by using Monte-Carlo integration to marginalize these functions. Many functions of these marginals are commonly used: partial dependence, average marginal effects, and individual conditional expectations, to name a few.

I have written a package mmpf (Monte-Carlo Marginalization of Prediction Functions) which peforms this operation in a general and efficient manner. This functionality is now used in both mlr and edarf. This package performs Monte-Carlo integration over some variables that the prediction function depends on, and evaluates the marginalized function at points on a uniform or random grid on the other variables.

The mmph package has functions to create these uniform or random grids which are exposed to the user (makeGrid, variableGrid (a method), and cartesianExpand). The function which computes marginalized predictions is marginalPrediction. This function takes an object with an associated prediction function (by default the predict method for that object's class), arguments which control the resolution of the integration and the evaluation grid, the type of evaluation grid, which variables to integrate out, and what function of the marginalized function to evaluate.

The output from marginalPrediction is a list with the marginalized predictions on the grid, and the grid of evaluation points. The function can handle vector-valued prediction functions like multivariate regression and classification, multilabel classification, and multi-class classification, as well as regression.