Estimate a Multinomial logistic regression (MNL) for classification
To estimate a Multinomial logistic regression (MNL) we require a
categorical response variable with two or more levels and one or more
explanatory variables. We also need to specify the level of the response
variable to be used as the base for comparison. In the example
data file, ketchup, we could assign heinz28 as the base level by
selecting it from the Choose base level dropdown in the Summary tab.
To access the ketchup dataset go to Data > Manage, select examples
from the Load data of type dropdown, and press the Load button. Then
select the ketchup dataset.
In the Summary tab we can test if two or more variables
together improve the fit of a model by selecting them in the
Variables to test dropdown. This functionality can be very
useful to evaluate the overall influence of a variable of type
factor with three or more levels.
Additional output that requires re-estimation:
Additional output that does not require re-estimation:
As an example we will use a dataset on choice behavior for 300 individuals in a panel of households in Springfield, Missouri (USA). The data captures information on 2,798 purchase occasions over a period of approximately 2 years and includes the following variables:
The screenshot of the Data > Pivot tab shown below
indicates that heinz32 is the most popular choice option,
followed by heinz28. heinz41 and hunts32 are much less common
choices among the household panel members.
Suppose we want to investigate how prices of the different products
influence the choice of ketchup brand and package size. In the Model
> Multinomial logistic regression (MNL) > Summary tab select
choice as the response variable and heinz28 from the Choose base
level dropdown menu. Select price.heinz28 through price.hunts32 as the
explanatory variables. In the screenshot below we see that most, but not
all, of the coefficients have very small p.values and that the model has
some predictive power (p.value for the chi-squared statistic < .001).
The left-most output column shows which product a coefficient applies
to. For example, the 2nd row of coefficients and statistics captures the
effect of changes in price.heinz28 on the choice of heinz32
relative to the base product (i.e., heinz28). If consumers see
heinz28 and heinz32 as substitutes, which seems likely, we would
expect that an increase in price.heinz28 would lead to an increase
in the odds that a consumer chooses heinz32 rather than heinz28.
Unfortunately the coefficients from a multinomial logistic regression
model are difficult to interpret directly. The RRR column,
however, provides estimates of Relative-Risk-Ratios (or odds ratios)
that are easier to work with. The RRR values are the exponentiated
coefficients from the regression (i.e., $\exp(1.099) \approx 3.000$).
We see that the risk (or odds) of buying heinz32 rather than heinz28
is 3 times as high after a $1 increase in price.heinz28, keeping all
other variables in the model constant.
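The link between a coefficient and its RRR can be written out explicitly. The block below is the standard textbook form of the multinomial logit model, not output from Radiant; the symbols $j$ (a non-base response level), $x_m$ (an explanatory variable), and $\beta_{jm}$ (its coefficient for level $j$) are notation introduced here for illustration.

```latex
\ln\!\left(\frac{P(y = j)}{P(y = \text{base})}\right)
  = \beta_{j0} + \beta_{j1} x_1 + \dots + \beta_{jk} x_k
\qquad\Longrightarrow\qquad
\mathrm{RRR}_{jm} = \exp(\beta_{jm})
```

A one-unit increase in $x_m$ multiplies the odds of choosing level $j$ rather than the base level by $\mathrm{RRR}_{jm}$, holding the other variables constant. With $\beta_{jm} = 1.099$ this gives $\exp(1.099) \approx 3.00$.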
For each of the explanatory variables the following null and alternate hypotheses can be formulated:
A selected set of relative risk ratios from the multinomial logistic regression can be interpreted as follows:
                          RRR  coefficient  std.error  z.value  p.value
heinz32  price.heinz32  0.101       -2.296      0.135  -17.033   < .001  ***
hunts32  price.heinz28  3.602        1.282      0.126   10.200   < .001  ***
hunts32  price.hunts32  0.070       -2.655      0.208  -12.789   < .001  ***

- The RRR for price.heinz32 on the relative odds of purchasing heinz32
  rather than heinz28 is 0.101. If the price for heinz32 increased by
  $1, the odds of purchasing heinz32 rather than heinz28 would be
  multiplied by 0.101, a decrease of 89.9%, while holding all other
  variables in the model constant.
- The RRR for price.heinz28 on the relative odds of purchasing hunts32
  rather than heinz28 is 3.602. If the price for heinz28 increased by
  $1, the odds of purchasing hunts32 rather than heinz28 would be
  multiplied by 3.602, an increase of 260.2%, while holding all other
  variables in the model constant.
- The RRR for price.hunts32 on the relative odds of purchasing hunts32
  rather than heinz28 is 0.070. If the price for hunts32 increased by
  $1, the odds of purchasing hunts32 rather than heinz28 would be
  multiplied by 0.070, a decrease of 93%, while holding all other
  variables in the model constant.

The other RRRs estimated in the model can be interpreted similarly.
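The arithmetic behind these interpretations can be checked by hand. The sketch below is plain Python, not Radiant code. It exponentiates the three coefficients from the table and converts each reported RRR into a percentage change in the odds. The helper names `rrr` and `pct_change` are made up for this illustration. Because the printed coefficients are rounded, a recomputed RRR can differ from the reported value in the third decimal.

```python
import math

# (response level, variable, coefficient, reported RRR), copied from the table
rows = [
    ("heinz32", "price.heinz32", -2.296, 0.101),
    ("hunts32", "price.heinz28", 1.282, 3.602),
    ("hunts32", "price.hunts32", -2.655, 0.070),
]

def rrr(coefficient):
    """Relative risk ratio: the exponentiated coefficient."""
    return math.exp(coefficient)

def pct_change(ratio):
    """Percentage change in the odds implied by a ratio (e.g., 0.101 -> -89.9)."""
    return (ratio - 1) * 100

for level, variable, coef, reported in rows:
    print(
        f"{level} {variable}: exp({coef}) = {rrr(coef):.3f}, "
        f"reported RRR = {reported:.3f}, "
        f"odds change = {pct_change(reported):+.1f}%"
    )
```

The percentage changes match the 89.9% decrease, 260.2% increase, and 93% decrease quoted in the text.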
In addition to the numerical output provided in the Summary
tab we can also evaluate the link between choice and the
prices of each of the four products visually (see Plot tab). In
the screenshot below we see a coefficient (or rather an RRR) plot with
confidence intervals. We see the following patterns:
- When price.heinz28 increases by $1, the relative purchase odds for
  heinz32, heinz41, and hunts32 increase significantly.
- When price.heinz32 increases, the odds of purchase for heinz32
  compared to heinz28 decrease significantly. We see the same pattern
  for heinz41 and hunts32 when their prices increase.
- hunts32 is the only product to see a significant improvement in
  purchase odds relative to heinz28 from an increase in price.heinz32.
Probabilities are often more convenient for interpretation than
coefficients or RRRs from a multinomial logistic regression model. We
can use the Predict tab to predict probabilities for each of
the different response variable levels given specific values for the
selected explanatory variable(s). First, select the type of input for
prediction using the Prediction input type dropdown. Choose
either an existing dataset for prediction (“Data”) or specify a command
(“Command”) to generate the prediction inputs. If you choose to enter a
command, you must specify at least one variable and one value in the
Prediction command box to get a prediction. If you do
not specify a value for each of the variables in the model, either the
mean value or the most frequently observed level will be used. It is
only possible to predict probabilities based on variables used in the
model. For example, price.heinz32 must be one of the
selected explanatory variables to predict the probability of choosing to
buy heinz32 when priced at $3.80.
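To make the step from coefficients to predicted probabilities concrete, here is a minimal Python sketch of the standard multinomial logit formula. It is not Radiant's implementation. The function name `mnl_probabilities`, the level names (`base`, `alt1`, `alt2`), the variable names, and every coefficient value are hypothetical placeholders chosen for illustration; none of them are estimates from the ketchup model.

```python
import math

def mnl_probabilities(coefficients, inputs, base_level):
    """Predicted probability for every response level.

    coefficients: {level: {"(Intercept)": b0, variable: b, ...}} for each
                  non-base level; the base level's linear score is fixed at 0.
    inputs:       {variable: value} used for the prediction.
    """
    scores = {base_level: 0.0}
    for level, betas in coefficients.items():
        scores[level] = betas["(Intercept)"] + sum(
            beta * inputs[name]
            for name, beta in betas.items()
            if name != "(Intercept)"
        )
    denominator = sum(math.exp(score) for score in scores.values())
    return {level: math.exp(score) / denominator for level, score in scores.items()}

# Hypothetical coefficients and inputs, for illustration only
example_coefficients = {
    "alt1": {"(Intercept)": 0.5, "price.base": 1.0, "price.alt1": -2.0},
    "alt2": {"(Intercept)": -1.0, "price.base": 1.2, "price.alt1": 0.3},
}
example_inputs = {"price.base": 1.0, "price.alt1": 1.1}

probabilities = mnl_probabilities(example_coefficients, example_inputs, "base")
for level, probability in probabilities.items():
    print(f"{level}: {probability:.3f}")
```

Because every level shares the same denominator, the predicted probabilities always sum to one, which is why a price change for one product shifts the probabilities of all the others.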
- To predict choice probabilities when hunts32 is available in stores,
  type disp.hunts32 = "yes" as the command and press enter.
- To predict choice probabilities when heinz41 is (not) on display and
  (not) featured, type
  disp.heinz41 = c("yes", "no"), feat.heinz41 = c("yes", "no")
  and press enter.
- To predict choice probabilities as price.heinz28 increases, type
  price.heinz28 = seq(3.40, 5.20, 0.1) and press enter. See
  screenshot below.
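The commands above describe sets of input values rather than single values. The Python sketch below shows what those sets look like once expanded. It assumes that values for several variables are combined into every possible combination, as an expand-grid operation would do. The helpers `seq` and `expand_grid` are written here for illustration and only mimic the R functions of similar names.

```python
import itertools
import math

def seq(start, stop, step):
    """Evenly spaced values from start to stop (inclusive), similar to R's seq()."""
    n = int(math.floor((stop - start) / step + 1e-10)) + 1
    return [round(start + i * step, 10) for i in range(n)]

def expand_grid(values):
    """Every combination of the supplied values, one dict per prediction row."""
    names = list(values)
    return [
        dict(zip(names, combination))
        for combination in itertools.product(*values.values())
    ]

# disp.heinz41 = c("yes", "no"), feat.heinz41 = c("yes", "no")
display_feature = expand_grid(
    {"disp.heinz41": ["yes", "no"], "feat.heinz41": ["yes", "no"]}
)
for row in display_feature:
    print(row)

# price.heinz28 = seq(3.40, 5.20, 0.1)
prices = seq(3.40, 5.20, 0.1)
print(len(prices), "price points from", prices[0], "to", prices[-1])
```

The two yes/no variables yield four prediction rows, and the price sequence yields 19 price points, each of which receives its own set of predicted probabilities.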
The figure above shows that the probability of purchase drops sharply
for heinz28 as price.heinz28 increases. heinz32, the most popular
option in the data, is predicted to see a large increase in purchase
probability following an increase in price.heinz28. Although the
predicted increase in purchase probability for hunts32 does not look
as impressive in the graph compared to the effect on heinz32, the
relative predicted increase is larger (i.e., 3.2% to 8.4% for
hunts32 versus 39.3% to 72.8% for heinz32).
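The difference between the absolute and the relative increase can be verified with the four probabilities quoted above. This is a small Python sketch; the function names are made up for the illustration.

```python
def absolute_change(before, after):
    """Change in percentage points."""
    return after - before

def relative_change(before, after):
    """Change as a percentage of the starting value."""
    return (after - before) / before * 100

# Predicted purchase probabilities (in %) at the low and high end of price.heinz28
predictions = {"hunts32": (3.2, 8.4), "heinz32": (39.3, 72.8)}

for product, (before, after) in predictions.items():
    print(
        f"{product}: {absolute_change(before, after):+.1f} points, "
        f"{relative_change(before, after):+.1f}% relative"
    )
```

heinz32 gains far more percentage points (33.5 versus 5.2), but the hunts32 probability grows by roughly 162% of its starting value compared to roughly 85% for heinz32.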
For a more comprehensive assessment of the impact of price changes
for each of the four products on purchase probabilities we can generate
a full table of predictions by selecting Data from the
Prediction input type dropdown in the Predict tab
and selecting ketchup from the Predict data
dropdown. You can also create a dataset for input in Data >
Transform using Expand grid or in a spreadsheet and
then paste it into Radiant using the Data > Manage tab.
Once the desired predictions have been generated they can be saved to
a CSV file by clicking the download icon on the top right of the
prediction table. To add predictions to the dataset used for estimation,
click the Store button.
Note that MNL models generate as many columns of probabilities as
there are levels in the categorical response variable (i.e., four in the
ketchup data). If you want to add only the predictions for the first
level (i.e., heinz28) to the dataset used for estimation,
provide only one name in the Store predictions input. If
you want to store predictions for all ketchup products, enter four
variable names, separated by commas.
Note: We ignored endogeneity concerns in the above discussion. Suppose, for example, that price.heinz28 changes due to changes in the quality of heinz28. Changes in quality affect the price and, likely, also demand for the product. Unless we control in some way for these changes in quality, the estimated effects of price changes are likely to be incorrect (i.e., biased).
Add code to Report > Rmd to (re)create the analysis by clicking the
icon on the bottom left of your screen or by pressing ALT-enter on
your keyboard.
If a plot was created, it can be customized using ggplot2
commands or with gridExtra. See example below and
Data > Visualize for details.
For an overview of related R-functions used by Radiant to estimate a multinomial logistic regression model see Model > Multinomial logistic regression.
The key functions used in the mnl tool are multinom from the nnet
package and linearHypothesis from the car package.