Kevan Oswald

# Multiple Regression Analysis

Multiple regression is used to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. For example, a university may want to know which factors contribute most to the successful graduation of students. Successful graduation would be the dependent variable, and things like the choice of major, the level of the student’s social involvement, marital status, financial situation, employment, age, GPA, and other factors would be the independent or predictor variables. The goal would be to learn which of these factors contribute to the likelihood of a student graduating.

In another example, we may want to understand how variations in sales (the dependent variable) are explained by variations in advertising expenditures, the item price, and changes in packaging (the independent variables).

Multiple regression allows us to answer the question “what are the best predictors of…?” While correlation analysis looks at the strength of the relationship between two variables, regression analysis looks at the combined effect of all the predictor variables on the dependent variable. It estimates the magnitude, relative importance, and statistical significance of each predictor’s contribution to the dependent variable. Correlation describes a relationship, while regression analysis predicts a value. Regression analysis can be used to forecast sales, profitability, market share, buying patterns, and the impact of marketing programs.

__Example__: Here we continue with the example from the correlation analysis about the impact of variables on demonstrator satisfaction. In the SPSS output, the first thing we see is a Model Summary. The R value in this table is called the multiple correlation coefficient because it looks at the association of all the variables together. It ranges from 0 to 1. The closer it is to 1, the stronger the combined influence of the independent (predictor) variables on the dependent variable. When expressed as R squared (also called the coefficient of determination), it gives the percent of the variance in the dependent variable that can be predicted by the combination of the independent variables. In this example, 49% of the variance of the dependent variable is explained by the independent variables. The remainder of the variation (51%) is unaccounted for. A small R squared value indicates that the model explains little of the variation and is not a good fit. Adjusted R squared takes into consideration the number of observations and the number of predictor variables. It is often preferred to R squared because R squared never decreases when a variable is added, whereas adjusted R squared penalizes the addition of irrelevant variables.

In this case, an R value of .697 (an R squared of about .49) indicates that these variables, taken together, do a reasonably good job of predicting overall satisfaction.
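As a rough check on the Model Summary figures, the sketch below squares the reported R of .697 and then applies the standard adjusted R squared formula. The sample size and the number of predictors are not given in the text, so the values used here (200 and 5) are purely hypothetical placeholders.

```python
def r_squared(r):
    """Coefficient of determination from the multiple correlation coefficient R."""
    return r ** 2


def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(0.697)  # R taken from the Model Summary discussed above

# ASSUMPTION: n and k are illustrative only; the source does not report them.
n_observations = 200
k_predictors = 5
adj_r2 = adjusted_r_squared(r2, n_observations, k_predictors)

print(f"R squared: {r2:.3f}")
print(f"Unexplained share: {1 - r2:.3f}")
print(f"Adjusted R squared: {adj_r2:.3f}")
```

With at least one predictor, the adjustment can only pull the value down, so adjusted R squared is never larger than R squared.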

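The ANOVA table tests the overall significance of the model. The F test is used to test the null hypothesis that there is no association between the predictors and the dependent variable. The F value should be large and the “Sig.” value should be below .05 for the model to be considered significant. In this case the “Sig.” value is reported as 0. SPSS rounds this value to three decimals, so it means p is less than .0005 rather than exactly zero. It indicates that the model as a whole is significantly related to the dependent variable.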
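The overall F statistic can also be written in terms of R squared, which shows why a large R squared with enough observations produces a large F. The sketch below is only an illustration: the sample size and predictor count are hypothetical, since the text does not report them.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    explained_per_predictor = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_predictor / unexplained_per_df


r2 = 0.697 ** 2  # from the reported R of .697

# ASSUMPTION: hypothetical sample size and predictor count, not from the source.
n_observations = 200
k_predictors = 5

f_value = f_statistic(r2, n_observations, k_predictors)
df_residual = n_observations - k_predictors - 1
print(f"F({k_predictors}, {df_residual}) = {f_value:.2f}")
```

Turning F into a “Sig.” value requires the F distribution with those two degrees of freedom, which statistical packages such as SPSS report directly.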

The coefficients measure how much each individual variable contributes to overall satisfaction. The B value indicates the change in the dependent variable for each one-unit increase in the predictor, with the other predictors held constant. However, B is an unstandardized coefficient, so its size depends on the units in which each predictor is measured. Beta is a standardized measure: the closer its absolute value is to 1, the stronger the predictor. This is the number that should be used to compare predictors. The t value tests whether each coefficient differs from zero and is related to “Sig.”. As stated earlier, “Sig.” should be less than .05 for 95% confidence, here applied to each variable’s contribution. The “constant” is the predicted value of the dependent variable when all the independent variables are zero.
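The link between B and Beta can be sketched as follows: Beta is B rescaled by the ratio of the predictor’s standard deviation to the dependent variable’s standard deviation. The ratings and the B value below are invented for illustration and are not taken from the SPSS output.

```python
from statistics import stdev


def standardized_beta(b, x_values, y_values):
    """Convert an unstandardized coefficient B into a standardized Beta."""
    return b * stdev(x_values) / stdev(y_values)


# ASSUMPTION: hypothetical ratings on a 1 to 7 scale, for illustration only.
product_rating = [3, 4, 5, 6, 7]
overall_satisfaction = [2.0, 3.5, 3.0, 5.0, 5.5]
b_product = 0.85  # slope of overall_satisfaction on product_rating for this toy data

beta_product = standardized_beta(b_product, product_rating, overall_satisfaction)
print(f"B = {b_product:.2f}, Beta = {beta_product:.3f}")
```

With a single predictor, as in this toy case, Beta equals the ordinary correlation between the two variables. With several predictors the same rescaling applies to each B in turn.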

Regression analysis is similar to correlation analysis in that it looks at the individual contribution of each variable. However, in regression the contribution of each variable is calculated only in combination with the other variables, that is, after their effects have been taken into account.

Examining the results, we see that the degree to which demonstrators feel that the company cares about them personally is the best predictor of overall satisfaction, followed by how satisfied demonstrators are with products. With regression analysis we are able to measure the degree to which each variable contributes to overall satisfaction, such as .063 for the call center and .079 for the website. This is the average change we can expect in the satisfaction score given a one-unit change in each independent variable, with the other variables held constant. So if Call Center satisfaction averaged 3 on our 1 to 7 scale and we improved that to an average of 4, we would expect the predicted overall satisfaction score to rise by .063. The constant (.235) itself does not change; measured from the constant alone, the increase corresponds to a move from .235 to .298.
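The one-unit interpretation can be checked with a little arithmetic. The sketch below uses only the constant and the two B coefficients quoted in the text, so it yields a partial score rather than the full model’s prediction; the website rating of 5 is a hypothetical value that is held fixed.

```python
# Values quoted in the text: the constant and two of the unstandardized B coefficients.
CONSTANT = 0.235
B_COEFFICIENTS = {"call_center": 0.063, "website": 0.079}


def partial_score(ratings):
    """Constant plus the contributions of the predictors whose B values are known.

    This is NOT the full model: the remaining predictors are omitted because
    their coefficients are not quoted in the text.
    """
    return CONSTANT + sum(B_COEFFICIENTS[name] * value for name, value in ratings.items())


# ASSUMPTION: the website rating (5) is hypothetical and held constant.
before = partial_score({"call_center": 3, "website": 5})
after = partial_score({"call_center": 4, "website": 5})
change = after - before

print(f"Expected change in overall satisfaction: {change:+.3f}")
```

The constant and the website term appear in both scores and cancel out, so the difference is just the Call Center coefficient times the one-point improvement.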

The real value of regression analysis would be found in a case where data was collected and measured in natural units rather than on a common rating scale (unlike our seven-point scale), such as measuring the effect that an employee’s education level (in years), beginning salary (in dollars), and months since hire have on the employee’s current salary (the dependent variable). Our coefficient output would show the actual dollar increase in salary, such as $1,000, for each additional year of education.

In the next example, the percent of students who scored at or above proficiency on a math assessment test (the dependent variable) is analyzed in relation to four predictor variables:

- If the student is classified as economically disadvantaged.
- If the student to teacher ratio in the class is acceptable.
- If the student’s parents indicated that they are heavily involved in the student’s education.
- If the student scored proficient or above on an art test.

The model is moderately strong, with a total of 56% of the variance in the percent of students scoring proficient or above on the assessment test being explained by the model.

The Parental Involvement Factor is the strongest predictor of the students’ test scores; a one-point increase in the mean rating on this factor is associated with a 13.5 percentage-point increase in the percent of students scoring at or above proficiency on the test.

The percent of students who indicated some sort of financial hardship negatively predicts student test performance, and a higher student to teacher ratio positively predicts student test performance.

**Stepwise Regression:** When there is a large number of independent (predictor) variables and the belief is that not all of them are significant, stepwise regression can be used. In stepwise regression, a smaller subset of variables that accounts for most of the variation in the dependent variable is selected. Independent variables are added to or removed from the equation one at a time, according to how much each contributes to the model.
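To make the idea concrete, here is a minimal sketch of the forward variant, in which predictors are added one at a time. It is a simplification: statistical packages typically decide entry and removal with significance tests, whereas this sketch stops when R squared improves by less than a hypothetical threshold (`min_gain`). The data and variable names are invented, and the small least-squares solver is included only to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """Fit least squares with an intercept on the given columns and return R squared."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(p)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def forward_stepwise(predictors, y, min_gain=0.01):
    """Add the best remaining predictor until R squared gains less than min_gain."""
    selected, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {
            name: r_squared([predictors[s] for s in selected + [name]], y)
            for name in remaining
        }
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best_r2 < min_gain:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_r2 = scores[candidate]
    return selected, best_r2


# ASSUMPTION: invented data. y depends on x1 and x2 plus small noise; x3 is irrelevant.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [5, 3, 8, 1, 7, 2, 6, 4],
    "x3": [3, 7, 1, 6, 2, 8, 4, 5],
}
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05]
y = [2 * a + 3 * b + e for a, b, e in zip(predictors["x1"], predictors["x2"], noise)]

selected, final_r2 = forward_stepwise(predictors, y)
print("Selected predictors:", selected)
print(f"R squared of the selected model: {final_r2:.4f}")
```

On this toy data the irrelevant predictor is left out because adding it barely changes R squared. A backward variant would start with every predictor in the model and drop the weakest one at each step.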