by David Lillis, Ph.D.
Ordinary Least Squares regression provides linear models of continuous variables. However, much data of interest to statisticians and researchers are not continuous and so other methods must be used to create useful predictive models.
The glm() command fits generalized linear models (regressions) to binary outcome data, count data, probability data, proportion data and many other data types.
In this blog post, we explore the use of R’s glm() command on one such data type. Let’s take a look at a simple example where we model binary data.
In the mtcars data set, the variable vs indicates whether a car has a V engine or a straight engine (coded 0 = V-shaped, 1 = straight).
We want to create a model that helps us to predict the probability of a vehicle having a V engine or a straight engine given a weight of 2100 lbs and engine displacement of 180 cubic inches.
First we fit the model:
We use the glm() function, include the variables in the usual way, and specify a binomial error distribution, as follows:
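The fit described above might look like the following minimal sketch. The object name `model` is an assumption; `wt` and `disp` are the mtcars columns for weight and displacement.

```r
# Fit a logistic regression of engine shape (vs) on weight and displacement.
# `model` is an illustrative name; mtcars ships with base R.
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# The coefficient table is reported on the log-odds (logit) scale.
summary(model)
coef(model)
```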
We see from the estimates of the coefficients that weight influences vs positively, while displacement has a slightly negative effect.
The model output is somewhat different from that of an ordinary least squares model. I will explain the output in more detail in the next article, but for now, let’s continue with our calculations.
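Remember, our goal here is to calculate a predicted probability for specific values of the predictors: a weight of 2100 lbs and engine displacement of 180 cubic inches. Because glm() models the probability that vs equals 1, and mtcars codes vs as 0 = V-shaped and 1 = straight, the value we obtain is the predicted probability of a straight engine.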
To do that, we create a data frame called newdata, in which we include the desired values for our prediction.
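A sketch of that data frame follows. One detail worth flagging: mtcars records `wt` in units of 1000 lbs, so 2100 lbs is entered as 2.1, while `disp` is already in cubic inches.

```r
# Predictor values for the prediction: 2100 lbs and 180 cubic inches.
# mtcars stores wt in 1000-lb units, so 2100 lbs becomes 2.1.
newdata <- data.frame(wt = 2.1, disp = 180)
newdata
```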
Now we use the predict() function to calculate the predicted probability. We include the argument type = "response" so that the prediction is returned as a probability rather than on the log-odds scale.
The predicted probability is 0.24.
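Putting the steps together, here is a minimal sketch (the object names `model`, `newdata` and `p` are assumptions) that should give a value of about 0.24:

```r
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)
newdata <- data.frame(wt = 2.1, disp = 180)  # wt is in 1000-lb units

# type = "response" returns a probability; the default (type = "link")
# would return the log-odds instead.
p <- predict(model, newdata, type = "response")
round(p, 2)

# The same number by hand: apply the inverse logit to the linear predictor.
p_by_hand <- plogis(predict(model, newdata, type = "link"))
p_by_hand
```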
That wasn’t so hard! In our next article, I will explain more about the output we got from the glm() function.
About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.
I am having trouble interpreting the results of a logistic regression. My outcome variable is `Decision` and is binary (0 or 1, not take or take a product, respectively). My predictor variable is `Thoughts` and is continuous, can be positive or negative, and is rounded up to the 2nd decimal point. I want to know how the probability of taking the product changes as `Thoughts` changes.

The logistic regression equation is:
According to this model, `Thoughts` has a significant impact on the probability of `Decision` (b = .72, p = .02). To determine the odds ratio of `Decision` as a function of `Thoughts`:

Odds ratio = 2.07.
Questions:
- How do I interpret the odds ratio?
  - Does an odds ratio of 2.07 imply that a .01 increase (or decrease) in `Thoughts` affects the odds of taking (or not taking) the product by 0.07, OR
  - Does it imply that as `Thoughts` increases (decreases) by .01, the odds of taking (not taking) the product increase (decrease) by approximately 2 units?
- How do I convert the odds ratio of `Thoughts` to an estimated probability of `Decision`? Or can I only estimate the probability of `Decision` at a certain `Thoughts` score (i.e. calculate the estimated probability of taking the product when `Thoughts` is 1)?
Sudy Majd
2 Answers
The coefficient returned by a logistic regression in R is a logit, or the log of the odds. To convert a logit to an odds ratio, you can exponentiate it, as you've done above. To convert logits to probabilities, you can use the function `exp(logit)/(1+exp(logit))`. However, there are some things to note about this procedure.

First, I'll use some reproducible data to illustrate:
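A sketch of such a fit, assuming the `menarche` data set from the MASS package (the data the rest of this answer refers to). The object name `m` is an assumption.

```r
library(MASS)  # provides the menarche data set

# Each row describes one age group: Menarche is the number of girls who have
# reached menarche, Total is the number examined, Age is the group's mean age.
m <- glm(cbind(Menarche, Total - Menarche) ~ Age,
         family = binomial, data = menarche)

# Estimates are reported as logits (log-odds).
summary(m)
```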
This returns:
The coefficients displayed are for logits, just as in your example. If we plot these data and this model, we see the sigmoidal function that is characteristic of a logistic model fit to binomial data
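One way to draw such a plot is sketched below, again assuming the MASS `menarche` data and a model object named `m`. The observed proportions are plotted as points and the fitted probabilities are overlaid as a line.

```r
library(MASS)
m <- glm(cbind(Menarche, Total - Menarche) ~ Age,
         family = binomial, data = menarche)

# Observed proportion reaching menarche in each age group.
plot(Menarche / Total ~ Age, data = menarche,
     ylab = "Proportion reaching menarche")

# Fitted probabilities from the model; rows are already ordered by Age.
lines(menarche$Age, fitted(m), col = "red")
```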
Note that the change in probabilities is not constant - the curve rises slowly at first, then more quickly in the middle, then levels out at the end. The difference in probabilities between 10 and 12 is far less than the difference in probabilities between 12 and 14. This means that it's impossible to summarise the relationship of age and probabilities with one number without transforming probabilities.
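The non-constant change can be checked numerically. This sketch (same assumed `menarche` model, illustrative object names) compares the change in predicted probability over two equally wide age intervals:

```r
library(MASS)
m <- glm(cbind(Menarche, Total - Menarche) ~ Age,
         family = binomial, data = menarche)

# Predicted probabilities at ages 10, 12 and 14.
p <- predict(m, data.frame(Age = c(10, 12, 14)), type = "response")
p

# Change from 10 to 12, then from 12 to 14: same step in age,
# very different step in probability.
steps <- diff(p)
steps
```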
To answer your specific questions:
How do you interpret odds ratios?
Exponentiating the intercept gives the odds of a 'success' (in your data, this is the odds of taking the product) when x = 0 (i.e. zero thoughts); strictly speaking this is an odds, not an odds ratio. Exponentiating your coefficient gives the odds ratio: the factor by which those odds are multiplied when x increases by one whole unit (i.e. x=1; one thought). Using the menarche data:
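A sketch of that exponentiation, assuming the model object `m` fitted to the MASS `menarche` data:

```r
library(MASS)
m <- glm(cbind(Menarche, Total - Menarche) ~ Age,
         family = binomial, data = menarche)

# exp() of the intercept: odds of menarche at Age = 0.
# exp() of the Age coefficient: odds ratio per one-year increase in Age.
odds_scale <- exp(coef(m))
odds_scale
```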
We could interpret this as the odds of menarche occurring at age = 0 being about 0.0000000006 (roughly 6 × 10⁻¹⁰). Or, basically impossible. Exponentiating the age coefficient tells us the expected multiplicative increase in the odds of menarche for each unit of age. In this case, it's just over a quintupling. An odds ratio of 1 indicates no change, whereas an odds ratio of 2 indicates a doubling, etc.
Your odds ratio of 2.07 implies that a 1 unit increase in 'Thoughts' increases the odds of taking the product by a factor of 2.07.
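For an increment other than one unit, the coefficient is scaled before exponentiating. As a worked sketch using only the odds ratio reported in the question (variable names are illustrative), a .01 increase in `Thoughts` corresponds to:

```r
or_per_unit <- 2.07        # odds ratio reported in the question
b <- log(or_per_unit)      # back on the logit scale, about 0.73

# Odds ratio for a 0.01 increase in Thoughts: exp(0.01 * b).
or_per_hundredth <- exp(0.01 * b)
or_per_hundredth           # roughly 1.007, i.e. odds rise by about 0.7%
```

So neither reading in the question holds: the odds are multiplied by a factor, and for a .01 step that factor is very close to 1.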
How do you convert odds ratios of thoughts to an estimated probability of decision?
You need to do this for selected values of thoughts, because, as you can see in the plot above, the change is not constant across the range of x values. If you want the probability of some value for thoughts, get the answer as follows:
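The data from the question are not available, so this sketch uses the `menarche` model as a stand-in. The value `x <- 13` and all object names are hypothetical. It applies the inverse-logit formula by hand and checks it against `predict()`:

```r
library(MASS)
m <- glm(cbind(Menarche, Total - Menarche) ~ Age,
         family = binomial, data = menarche)

x <- 13  # hypothetical predictor value of interest
b <- coef(m)

# Linear predictor (logit) at x, then the inverse logit.
logit <- b[["(Intercept)"]] + b[["Age"]] * x
p_manual <- exp(logit) / (1 + exp(logit))

# predict() performs the same conversion when type = "response".
p_predict <- predict(m, data.frame(Age = x), type = "response")
c(manual = p_manual, predict = unname(p_predict))
```

For the question's model, the same steps would use its intercept and the `Thoughts` coefficient in place of the `menarche` values.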
triddle
Odds and probability are two different measures, both addressing the same aim of measuring the likelihood of an event occurring. They should not be compared to each other, only among themselves!
While odds of two predictor values (while holding others constant) are compared using 'odds ratio' (odds1 / odds2), the same procedure for probability is called 'risk ratio' (probability1 / probability2).
In general, odds are preferred over probability when it comes to ratios, since probability is limited to between 0 and 1 while odds range from 0 to +inf (and log odds from -inf to +inf).
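The difference between the two ratios can be sketched with base R. This example assumes the MASS `menarche` data and compares ages 12 and 13; the object names are illustrative.

```r
library(MASS)
m <- glm(cbind(Menarche, Total - Menarche) ~ Age,
         family = binomial, data = menarche)

# Predicted probabilities at two predictor values one unit apart.
p <- predict(m, data.frame(Age = c(12, 13)), type = "response")
odds <- p / (1 - p)

odds_ratio <- unname(odds[2] / odds[1])  # equals exp(coef(m)["Age"])
risk_ratio <- unname(p[2] / p[1])        # depends on which ages are compared
c(odds_ratio = odds_ratio, risk_ratio = risk_ratio)
```

The odds ratio is the same for any one-year step, whereas the risk ratio changes with the starting age.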
To easily calculate odds ratios including their confidence intervals, see the `oddsratio` package:

Here you can simply specify the increment of your continuous variables and see the resulting odds ratios. In this example, the odds of the response `admit` are 55 times higher when the predictor `gpa` is increased by `5`.

If you want to predict probabilities with your model, simply use `type = "response"` when predicting from your model. This will automatically convert log odds to probability. You can then calculate risk ratios from the calculated probabilities. See `?predict.glm` for more details.

pat-s