# P-values in regression analysis in Stata


Number of obs – This is the number of observations used in the regression analysis.

Prob > F – This is the p-value associated with the F statistic. It is used in testing the null hypothesis that all of the model coefficients (apart from the intercept) are 0.

R-squared – R-squared is the proportion of variance in the dependent variable, science, that can be explained by the independent variables math, female, socst and read.

### Linear regression analysis using Stata

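R-squared is an overall measure of the strength of association and does not reflect the extent to which any particular independent variable is associated with the dependent variable. Adj R-squared – This is an adjustment of the R-squared that penalizes the addition of extraneous predictors to the model.

The regression equation can be presented in many different ways, but each coefficient is interpreted in the same way: it is the predicted change in the dependent variable for every one-unit increase in that independent variable, holding the other independent variables constant. So for every one-unit increase in math, science is predicted to change by the value of the math coefficient, holding female, socst and read constant.

Fortunately, you can check assumptions 3, 4, 5, 6 and 7 using Stata.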
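As a rough illustration of how these two fit statistics relate to the sums of squares in Stata's output, the sketch below computes them in plain Python. It assumes the usual definitions, R-squared = SS_Model / SS_Total and adjusted R-squared = 1 − (1 − R-squared)(N − 1)/(N − k − 1), where k counts the predictors excluding the intercept. The function names are hypothetical and the numbers are made up.

```python
def r_squared(ss_model: float, ss_total: float) -> float:
    """Proportion of the total variance explained by the model."""
    return ss_model / ss_total


def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Penalize R-squared for the number of predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


if __name__ == "__main__":
    # Made-up sums of squares, not taken from any real Stata output.
    r2 = r_squared(ss_model=1.8, ss_total=5.0)
    print(round(r2, 4), round(adjusted_r_squared(r2, n_obs=4, n_predictors=1), 4))
```

Note how the adjustment shrinks R-squared sharply when the number of observations is small relative to the number of predictors.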

When moving on to assumptions 3, 4, 5, 6 and 7, we suggest testing them in this order, because it represents an order where, if a violation of an assumption is not correctable, you will no longer be able to use linear regression. In fact, do not be surprised if your data fails one or more of these assumptions, since this is fairly typical when working with real-world data rather than textbook examples, which often only show you how to carry out linear regression when everything goes well. Just remember that if you do not check that your data meets these assumptions, or you test for them incorrectly, the results you get when running linear regression might not be valid.

In practice, checking for assumptions 3, 4, 5, 6 and 7 will probably take up most of your time when carrying out linear regression.

However, it is not a difficult task, and Stata provides all the tools you need to do this. In the Procedure section, we illustrate the Stata procedure required to perform linear regression, assuming that no assumptions have been violated.

First, we set out the example we use to explain the linear regression procedure in Stata. Studies show that exercising can help prevent heart disease. Within reasonable limits, the more you exercise, the less risk you have of suffering from heart disease.

One way in which exercise reduces your risk of suffering from heart disease is by reducing the amount of a fat in your blood called cholesterol.


The more you exercise, the lower your cholesterol concentration. Furthermore, it has recently been shown that the amount of time you spend watching TV (an indicator of a sedentary lifestyle) might be a good predictor of heart disease.

Therefore, a researcher decided to determine whether cholesterol concentration was related to time spent watching TV in otherwise healthy 45- to 65-year-old men (an at-risk category of people).

For example, as people spent more time watching TV, did their cholesterol concentration also increase (a positive relationship), or did the opposite happen?

The researcher also wanted to know the proportion of cholesterol concentration that time spent watching TV could explain, as well as being able to predict cholesterol concentration. The researcher could then determine whether, for example, people who spent eight hours per day watching TV had dangerously high levels of cholesterol concentration compared to people watching just two hours of TV.
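To make the researcher's two questions concrete (how much of the variation in cholesterol TV time explains, and what cholesterol to predict for a given amount of TV), here is a minimal Python sketch of simple linear regression by ordinary least squares. The data values are invented for illustration and are not the guide's dataset; in Stata itself the fit would come from the regress command.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_residual / ss_total


# Hypothetical data: hours of TV per day and cholesterol concentration.
tv_hours = [1, 2, 3, 4, 5, 6, 7, 8]
cholesterol = [5.0, 5.3, 5.5, 5.9, 6.0, 6.4, 6.5, 6.9]

b0, b1 = fit_simple_ols(tv_hours, cholesterol)
print(f"slope = {b1:.3f}, R-squared = {r_squared(tv_hours, cholesterol, b0, b1):.3f}")
# Predicted difference in cholesterol between 8 hours and 2 hours of TV per day.
print(f"8h vs 2h difference: {(b0 + b1 * 8) - (b0 + b1 * 2):.3f}")
```

A positive slope corresponds to the positive relationship described above, and the 8-hour versus 2-hour difference is simply six times the slope.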

To carry out the analysis, the researcher recruited healthy male participants between the ages of 45 and 65 years old. The amount of time spent watching TV was the independent variable, and cholesterol concentration was the dependent variable. The example and data used for this guide are fictitious; we created them purely for the purposes of this guide. After creating these five variables, we entered the scores for each into the five columns of the Data Editor (Edit) spreadsheet.

In this section, we show you how to analyze your data using multiple regression in Stata when the eight assumptions in the previous section, Assumptions, have not been violated.

You can carry out multiple regression using code or Stata's graphical user interface (GUI). Once you have run the analysis, we show you how to interpret your results.

First, choose whether you want to use code or Stata's graphical user interface (GUI). If you use code, type the regress command into Stata's Command window, followed by the dependent variable and then the independent variables. Continuous independent variables are simply entered "as is", whilst categorical independent variables take the prefix "i." (e.g., i.gender), which is Stata's factor-variable notation.

If you use the GUI, go to Statistics > Linear models and related > Linear regression. The menu item is titled Linear regression, which is just the title that Stata gives, even when running a multiple regression procedure. You have not made a mistake: you are in the correct place to carry out the multiple regression procedure.

You will be presented with the regress - Linear regression dialogue box. Select the dependent variable, VO2max, from the Dependent variable: box. Select the categorical independent variable, gender, from the Independent variables: box. Leave Factor variable selected in the Type of variable area.
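Behind the scenes, a factor variable such as i.gender is expanded into indicator (dummy) variables, with one base level omitted; by default Stata uses the lowest level as the base. The Python sketch below imitates that coding on a plain list so you can see what actually enters the regression. It is an illustration under that assumption, not Stata's internal implementation, and the function name is hypothetical.

```python
def factor_indicators(values, base=None):
    """Mimic i.var coding: one 0/1 indicator per level except the base level.

    By default the base is the lowest level, as with Stata's default base.
    Returns a dict mapping each non-base level to its indicator column.
    """
    levels = sorted(set(values))
    if base is None:
        base = levels[0]
    if base not in levels:
        raise ValueError(f"base level {base!r} does not occur in the data")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != base
    }


# Hypothetical gender codes (0 and 1); only level 1 gets an indicator column.
print(factor_indicators([0, 1, 1, 0, 1]))
```

With two levels, as for gender, a single indicator enters the model, and its coefficient is the predicted difference between that level and the base level, holding the other predictors constant.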

Next, in the Add factor variable area, leave the default selected in the Specification: box. Now, select gender in the Variables box using the drop-down button, and then select "Default" in the Base box.

Finally, click on the button that adds the factor variable to the variable list. You will be presented with a dialogue box in which the categorical independent variable, i.gender, has been added. After you confirm this dialogue box, you will be returned to the regress - Linear regression dialogue box, but with the categorical independent variable, i.gender, now included among the independent variables. Click OK to generate the output. Stata will generate a single piece of output for a multiple regression analysis based on the selections made above, assuming that the assumptions required for multiple regression have been met.

The R-squared and adjusted R-squared can be used to determine how well a regression model fits the data.

Source – This is the source of variance: Model, Residual, and Total. The Total variance is partitioned into the variance that can be explained by the independent variables (Model) and the variance that is not explained by the independent variables (Residual, sometimes called Error).

SS – These are the sums of squares associated with the three sources of variance. They can be computed in many ways.

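Conceptually, these formulas can be expressed as follows. The total sum of squares is the variability around the mean, $SS_{Total} = \sum (Y - \bar{Y})^2$. The residual sum of squares is the sum of squared errors in prediction, $SS_{Residual} = \sum (Y - \hat{Y})^2$. The model sum of squares is the improvement in prediction from using the predicted value of Y rather than just the mean of Y. Hence, this would be the squared differences between the predicted value of Y and the mean of Y, $SS_{Model} = \sum (\hat{Y} - \bar{Y})^2$, and $SS_{Total} = SS_{Model} + SS_{Residual}$.

df – These are the degrees of freedom associated with the sources of variance. The total variance has N − 1 degrees of freedom. The model degrees of freedom corresponds to the number of estimated coefficients minus 1 (K − 1). You may think this would be 4 − 1 = 3, since there were 4 independent variables in the model: math, female, socst and read. But the intercept is automatically included in the model unless you explicitly omit it. Including the intercept, there are 5 coefficients, so the model has 5 − 1 = 4 degrees of freedom. The residual degrees of freedom is the total degrees of freedom minus the model degrees of freedom, (N − 1) − (K − 1) = N − K.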
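The decomposition above can be checked numerically. This Python sketch (function names hypothetical) takes observed values and the fitted values from a least-squares fit that includes an intercept, computes the three sums of squares and their degrees of freedom, and lets you confirm that the model and residual parts add up to the total. The identity relies on the fit being ordinary least squares with an intercept, and the example values are made up.

```python
def sums_of_squares(y, y_hat):
    """Return (ss_model, ss_residual, ss_total) for observed y and fitted y_hat."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_model = sum((fi - y_bar) ** 2 for fi in y_hat)
    return ss_model, ss_residual, ss_total


def degrees_of_freedom(n_obs, n_predictors):
    """Return (df_model, df_residual, df_total); the intercept is not a predictor."""
    df_model = n_predictors            # K - 1, where K includes the intercept
    df_total = n_obs - 1               # N - 1
    df_residual = df_total - df_model  # N - K
    return df_model, df_residual, df_total


# Made-up data: y and the OLS fitted values from regressing y on x = 1, 2, 3, 4.
y = [2.0, 1.0, 4.0, 3.0]
y_hat = [1.6, 2.2, 2.8, 3.4]
ss_model, ss_residual, ss_total = sums_of_squares(y, y_hat)
print(ss_model, ss_residual, ss_total)
print(degrees_of_freedom(n_obs=4, n_predictors=1))
```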

## Interpreting Regression Output

MS – These are the mean squares: the sums of squares divided by their respective degrees of freedom. They are computed so you can compute the F ratio, dividing the Mean Square Model by the Mean Square Residual to test the significance of the predictors in the model.
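A minimal Python sketch of the two steps just described, using made-up sums of squares; the function name is hypothetical.

```python
def mean_squares_and_f(ss_model, df_model, ss_residual, df_residual):
    """Return (ms_model, ms_residual, f_ratio) as laid out in an ANOVA table."""
    ms_model = ss_model / df_model
    ms_residual = ss_residual / df_residual
    return ms_model, ms_residual, ms_model / ms_residual


# Made-up sums of squares with 4 model and 195 residual degrees of freedom.
ms_m, ms_r, f_ratio = mean_squares_and_f(40.0, 4, 195.0, 195)
print(ms_m, ms_r, f_ratio)
```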

The p-value associated with this F value (reported as Prob > F) is very small.

These values are used to answer the question "Do the independent variables reliably predict the dependent variable?" The p-value is compared to your alpha level (typically 0.05) and, if it is smaller, you can conclude that the independent variables reliably predict the dependent variable. You could say that the group of variables math, female, socst and read can be used to reliably predict science (the dependent variable). If the p-value were greater than your alpha level, you would say that the group of independent variables does not reliably predict the dependent variable. Note that this is an overall significance test assessing whether the group of independent variables, when used together, reliably predicts the dependent variable; it does not address the ability of any of the particular independent variables to predict the dependent variable.
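To show where the p-value for the overall F test comes from, the sketch below computes the upper-tail probability of an F distribution in pure Python. It uses the standard identity P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1·f), where I is the regularized incomplete beta function, evaluated here with the continued-fraction approach described in Numerical Recipes. It then compares the result with an alpha of 0.05. This is an illustrative re-implementation, not Stata's code; Stata reports the value directly as Prob > F, and in Python you would normally call `scipy.stats.f.sf` instead. Function names are hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Lentz evaluation of the continued fraction for the incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_pvalue(f_ratio, df_model, df_residual):
    """Upper-tail probability P(F > f_ratio) for an F(df_model, df_residual) distribution."""
    if f_ratio <= 0.0:
        return 1.0
    x = df_residual / (df_residual + df_model * f_ratio)
    return regularized_incomplete_beta(df_residual / 2.0, df_model / 2.0, x)


def overall_f_test(f_ratio, df_model, df_residual, alpha=0.05):
    """Return (p_value, reject_null) for the overall F test at level alpha."""
    p_value = f_pvalue(f_ratio, df_model, df_residual)
    return p_value, p_value < alpha


# Made-up F ratio with 4 model and 195 residual degrees of freedom.
p, reject = overall_f_test(10.0, 4, 195)
print(f"p = {p:.2e}, reject H0 at alpha = 0.05: {reject}")
```

Rejecting the null here only says the predictors are jointly useful; it says nothing about any single predictor.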

The ability of each individual independent variable to predict the dependent variable is addressed in the coefficient table, where each of the individual variables is listed.
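For reference, each row of that coefficient table tests one coefficient on its own. For coefficient $j$, Stata reports the estimate, its standard error, the t statistic and the two-sided p-value (labelled P>|t|). With N observations and K coefficients including the intercept, these are related as follows, where $T_{N-K}$ is a t-distributed variable with the residual degrees of freedom:

$$
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}, \qquad
p_j = 2\,\Pr\left(T_{N-K} \ge |t_j|\right)
$$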