Introduction

Regression analysis is a statistical tool used to estimate approximate linear relationships among variables. When building the model, it is necessary to distinguish between dependent and independent variables. Multiple regression analysis examines the relationship between one dependent variable and several explanatory variables. This paper carries out a multiple regression analysis of the average free-flow speed (kph) on several explanatory variables: the proportion of heavy vehicles, the bendiness measure (degrees turned through per km), visibility, carriageway width (m), hard strip width (m), verge width (m), the number of junctions per km, and the hilliness measure (metres of rise or fall per km).

Scatter diagram

A scatter diagram is a graph that plots two related variables on a Cartesian plane. The independent variable is plotted on the x-axis and the dependent variable on the y-axis. In this case, the average free-flow speed (kph) is plotted on the y-axis and each explanatory variable in turn on the x-axis. A scatter diagram helps establish whether a linear relationship exists between the two variables, which can be judged from the trend of the plotted points.

Average free-flow speed (kph) and the proportion of heavy vehicles

The correlation coefficient = 0.070015.

Average free-flow speed (kph) and bendiness measure (degrees turned through per km)

The correlation coefficient = -0.77625.

Average free-flow speed (kph) and visibility

The correlation coefficient = 0.59998.

Average free-flow speed (kph) and carriageway width (m)

The correlation coefficient = 0.504263.

Average free-flow speed (kph) and hard strip width (m)

The correlation coefficient = 0.45776.

Average free-flow speed (kph) and verge width (m)

The correlation coefficient = 0.310631.

Average free-flow speed (kph) and number of junctions per km

The correlation coefficient = -0.05523.

Average free-flow speed (kph) and hilliness measure (meters of rising or fall per km)

The correlation coefficient = -0.26919.

The points on the various scatter diagrams slope in different directions. The table below summarizes the correlation coefficients for the explanatory variables.

Variable | Correlation coefficient
Proportion of heavy vehicles | 0.070015
Measure of bendiness | -0.77625
Visibility | 0.59998
Carriageway width | 0.504263
Hard strip width | 0.45776
Verge width | 0.310631
Number of junctions per km | -0.05523
Measure of hilliness per km | -0.26919

From the summary above, visibility has the highest positive correlation coefficient (0.59998). This suggests that, among the explanatory variables, higher visibility is the one most strongly associated with higher speed. On the other hand, bendiness has the strongest negative correlation coefficient (-0.77625), which is also the largest in absolute value.
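The correlation coefficients summarized above are Pearson coefficients. A minimal sketch of the calculation follows; the function name `pearson_r` and the five data points are hypothetical, chosen only for illustration, and are not taken from the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only; not the study's observations.
bendiness = [10.0, 40.0, 80.0, 120.0, 160.0]
speed = [84.0, 79.0, 76.0, 70.0, 66.0]
r = pearson_r(bendiness, speed)  # strongly negative for these made-up points
```

A value near -1 or +1 indicates points lying close to a downward or upward straight line, while a value near 0 (as for the proportion of heavy vehicles) indicates little linear association.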

Simple regression analysis of speed and bendiness

The dependent variable is the mean free-flow speed, while the independent variable is the bendiness.

The regression line will take the form Y = b0 + b1X

Y = Mean free flow speed

X = Bendiness (degrees turned through per km)

The theoretical expectations are b0 can take any value and b1 < 0 (negative).

Regression Results

Variable | Coefficient
b0 Y-intercept | 84.45057
X Bendiness | -0.11647

From the above table, the regression equation can be written as Y = 84.45057 - 0.11647X. The intercept of 84.45057 is the predicted average free-flow speed (kph) when bendiness is zero. The coefficient of -0.11647 implies that when bendiness increases by one degree per km, the average free-flow speed decreases by 0.11647 kph. When the regression equation is compared with the scatter diagram, the two are consistent. The graph of average free-flow speed (kph) and bendiness shows a downward trend with a correlation coefficient of -0.77625, and the regression equation also has a negative slope. Thus, the regression equation is sensible.
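As an illustration of how such a line is obtained and used, the sketch below fits a simple least-squares line with the closed-form formulas and then evaluates the reported equation at one bendiness value. The function names and the three check points are hypothetical; only the two coefficients come from the table above.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimising the squared residuals of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1


def predict_speed(bendiness, b0=84.45057, b1=-0.11647):
    """Predicted mean free-flow speed (kph) using the reported coefficients."""
    return b0 + b1 * bendiness


# Hypothetical points lying exactly on y = 80 - 0.1x, used to check the fit.
b0_hat, b1_hat = fit_simple_ols([0.0, 50.0, 100.0], [80.0, 75.0, 70.0])

# Prediction at an illustrative bendiness of 100 degrees per km:
# 84.45057 - 0.11647 * 100 = 72.80357 kph.
speed_at_100 = predict_speed(100.0)
```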

Evaluation of regression model

Evaluation of the regression model can be done by testing the statistical significance of the variables. Testing statistical significance shows whether the explanatory variable is a significant determinant of average free-flow speed. A two-tailed t-test is carried out at a 95% level of confidence.

Null hypothesis: H0: bi = 0

Alternative hypothesis: H1: bi ≠ 0

Variable | Computed t | Critical t (α = 0.05) | Decision
b0 Intercept | 63.30132 | 1.9432 | Reject
X1 Bendiness | -7.78768 | 1.9432 | Reject

The null hypothesis states that the variable is not a significant determinant of speed. The alternative hypothesis states that the variable is a significant determinant of speed. From the table above, the absolute values of the computed t exceed the tabulated t. Therefore, the null hypothesis is rejected, which implies that bendiness is a significant determinant of speed. Thus, it is statistically significant at the 5% level of significance (95% level of confidence). The significance of the intercept is not relevant when testing the significance of the regression variables. Since the explanatory variable is statistically significant, the regression line can be used for prediction.
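The decision rule applied in the table can be written out as a short sketch. The function name `t_test_decision` is hypothetical; the t values and the critical value are the ones reported above.

```python
def t_test_decision(t_computed, t_critical):
    """Two-tailed rule: reject H0 (coefficient = 0) when |t| exceeds the critical value."""
    return "Reject" if abs(t_computed) > t_critical else "Do not reject"


T_CRITICAL = 1.9432  # critical value used in the tables of this paper

decisions = {
    "Intercept": t_test_decision(63.30132, T_CRITICAL),
    "Bendiness": t_test_decision(-7.78768, T_CRITICAL),
}
```

Because the test is two-tailed, the sign of the computed t does not matter; only its absolute value is compared with the critical value.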

R-square value

The value of R² is 60.26%, which means that bendiness explains 60.26% of the variation in free-flow speed. This indicates a strong explanatory variable. The adjusted R² is slightly lower at 59.26%. The value of R² can be improved by adding more variables to the regression model.

Analysis of variance

Item | Value | Proportion
Total sum of squares (TSS) | 3,647.898364 | 100.00%
Residual sum of squares (RSS) | 1,449.765473 | 39.74%
Explained sum of squares (ESS) | 2,198.132891 | 60.26%

From the table, it is clear that the explained share of the total sum of squares (60.26%) is equal to the value of R² discussed above (60.26%).
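The link between the sums of squares and the R-square values can be checked with a short sketch. The sample size of 42 is an assumption: it is the highest observation number listed in this paper and it reproduces the reported adjusted value, but the source does not state it. The function names are hypothetical.

```python
def r_squared(tss, rss):
    """Share of the total variation explained by the model."""
    return 1.0 - rss / tss


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-square penalised for the number of explanatory variables."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


TSS = 3647.898364   # total sum of squares from the table
RSS = 1449.765473   # residual sum of squares from the table
N_OBS = 42          # assumption: sample size is not stated in the source

r2 = r_squared(TSS, RSS)                 # rounds to 0.6026
adj_r2 = adjusted_r_squared(r2, N_OBS, 1)  # rounds to 0.5926 under the assumed n
```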

Unusual observations

Some of the unusual observations are summarized in the table below.

Observation | Predicted mean free-flow speed (kph) | Residual
2 | 82.81996 | -19.29995872
32 | 82.70349 | 8.266513564
41 | 71.17273 | -6.292730438
42 | 77.11282 | -10.82281686

There are four outliers in the regression. Removing these points would be expected to improve the fit of the regression line.

Simple regression analysis of speed and visibility

The dependent variable is the mean free-flow speed while the independent variable is the visibility.

The regression line will take a linear form Y = b0 + b1X

Y = Mean free flow speed

X = Visibility

The theoretical expectations are b0 can take any value and b1 > 0 (positive).

Regression Results

Variable | Coefficient
b0 Y-intercept | 64.42415
X Visibility | 0.067293

From the above table, the regression equation can be written as Y = 64.42415 + 0.067293X. The coefficient of 0.067293 implies that if visibility increases by one unit, the average free-flow speed increases by 0.067293 kph. The positive value of the coefficient implies a positive relationship between the variables. When the regression equation is compared with the scatter diagram, the two are consistent. The graph of average free-flow speed (kph) and visibility shows a positive trend with a correlation coefficient of 0.59998, and the regression equation also has a positive slope. Thus, the regression equation is sensible.

Evaluation of regression model

A two-tailed t-test is carried out at a 95% level of confidence to test the significance of the variables.

Null hypothesis: H0: bi = 0

Alternative hypothesis: H1: bi ≠ 0

Variable | Computed t | Critical t (α = 0.05) | Decision
b0 Intercept | 22.21234 | 1.9432 | Reject
X1 Visibility | 4.743191 | 1.9432 | Reject

From the table above, the values of the computed t are greater than the tabulated t. Therefore, the null hypothesis is rejected, which implies that visibility is a significant determinant of the dependent variable (average free-flow speed). Thus, visibility is statistically significant at the 5% level of significance. The significance of the intercept is not relevant when testing the significance of the regression variables.

R-square value

The value of R² is 36.00%. This implies that visibility explains only 36% of the variation in free-flow speed. It is an indication of a weak explanatory variable. Also, the value of adjusted R² is low at 34.39%. The value of R² can be improved by adding more variables to the regression model.

Analysis of variance

Item | Value | Proportion
Total sum of squares (TSS) | 3,647.898 | 100.00%
Residual sum of squares (RSS) | 2,334.735 | 64.00%
Explained sum of squares (ESS) | 1,313.164 | 36.00%

From the table, it is clear that the explained share of the total sum of squares (36.00%) is equal to the value of R² discussed above (36.00%).

Unusual observations

Observation | Predicted mean free-flow speed (kph) | Residual
1 | 70.41325 | -11.8333
2 | 76.9407 | -13.4207
3 | 71.82641 | 8.76359
22 | 77.47904 | 18.35096
27 | 83.80461 | 8.275389
29 | 72.29746 | 8.362537
32 | 77.00799 | 13.96201
34 | 81.65123 | -9.40123

Visibility is commonly regarded as a significant determinant of average free-flow speed. The result above runs contrary to that expectation, as indicated by the weak regression line. The regression has several outliers, which contribute to the weak model. Removing the outliers would be expected to strengthen the regression equation.

Simple regression analysis of speed and hilliness

The dependent variable is the mean free-flow speed, while the independent variable is the hilliness.

The regression line will take the form Y = b0 + b1X

Y = Mean free-flow speed

X = Hilliness

The theoretical expectations are b0 can take any value and b1 < 0 (negative).

Regression Results

Variable | Coefficient
b0 Y-intercept | 80.1933
X Hilliness | -0.20343

From the above table, the regression equation can be written as Y = 80.1933 - 0.20343X. The coefficient of -0.20343 implies that if hilliness increases by one unit, the average free-flow speed decreases by 0.20343 kph. The negative value of the coefficient implies a negative relationship between the variables. When the regression equation is compared with the scatter diagram above, the two are consistent. The graph of average free-flow speed (kph) and hilliness shows a negative trend with a correlation coefficient of -0.26919, and the regression equation also has a negative slope. Thus, the regression equation is sensible.

Evaluation of regression model

A two-tailed t-test is carried out at a 95% level of confidence to test the significance of the variables.

Null hypothesis: H0: bi = 0

Alternative hypothesis: H1: bi ≠ 0

The table below summarizes the results of the t-tests.

Variable | Computed t | Critical t (α = 0.05) | Decision
b0 Intercept | 34.86548 | 1.9432 | Reject
X1 Hilliness | -1.76774 | 1.9432 | Do not reject

From the table above, the absolute value of the computed t for hilliness is less than the tabulated t. Therefore, the null hypothesis is not rejected, which implies that hilliness is not a significant determinant of the dependent variable (average free-flow speed). Thus, hilliness is not statistically significant at the 5% level of significance. The regression model shows that the relationship is weak and cannot explain the variation in speed.
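In a simple regression with one explanatory variable, the t value of the slope can be recovered from the correlation coefficient alone, using t = r√(n - 2) / √(1 - r²). The sketch below applies this identity to the three correlation coefficients reported earlier. The sample size of 42 is an assumption (the highest observation number listed in this paper); the source does not state it. The function name is hypothetical.

```python
import math


def slope_t_from_r(r, n_obs):
    """t statistic of the slope in a one-variable regression, from Pearson's r."""
    return r * math.sqrt(n_obs - 2) / math.sqrt(1.0 - r * r)


N_OBS = 42  # assumption: sample size is not stated in the source

t_bendiness = slope_t_from_r(-0.77625, N_OBS)   # close to the reported -7.78768
t_visibility = slope_t_from_r(0.59998, N_OBS)   # close to the reported 4.743191
t_hilliness = slope_t_from_r(-0.26919, N_OBS)   # close to the reported -1.76774
```

Under the assumed sample size, the three results agree with the computed t values in the tables to about three decimal places, which supports the consistency of the reported correlations and t-tests.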

R-square value

The value of R² is 7.25%. This implies that hilliness explains only 7.25% of the variation in free-flow speed. It is an indication of a weak explanatory variable. Also, the value of adjusted R² is low at 4.92%. The value of R² can be improved by adding more variables to the regression model.

Analysis of variance

The table below summarizes the analysis of variance.

Item | Value | Proportion
Total sum of squares (TSS) | 3,647.898 | 100.00%
Residual sum of squares (RSS) | 3,383.565 | 92.75%
Explained sum of squares (ESS) | 264.3336 | 7.25%

The RSS is greater than the ESS by a large margin. From the table, the explained share of the total sum of squares (7.25%) is equal to the value of R² discussed above (7.25%). This shows that the model is of little use in explaining the variation in speed. In real life, technology has led to high-powered cars, such that hilliness need not cause a reduction in speed.

Unusual observations

Over 90% of the observations are outliers. Thus, the removal of all these points would amount to eliminating the variable from the regression model.

Multiple regression results

The regression line will take the form Y = a0 + a1X1 + a2X2 + a3X3 + a4X4 + a5X5 + a6X6 + a7X7 + a8X8. This section will summarize the results of various iterations of multiple regression analysis.
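A minimal sketch of how coefficients of this form can be estimated by ordinary least squares is given below, solving the normal equations with plain Python. The function names and the five data points are hypothetical; the data are generated exactly from a known two-variable equation so the fit can be checked, and they are not the study's observations.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_multiple_ols(rows, ys):
    """Return [a0, a1, ..., ak] for y = a0 + a1*x1 + ... + ak*xk."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 80 - 0.1*x1 + 0.02*x2.
predictors = [[0, 100], [50, 200], [100, 150], [150, 300], [200, 250]]
speeds = [80 - 0.1 * x1 + 0.02 * x2 for x1, x2 in predictors]
coefficients = fit_multiple_ols(predictors, speeds)  # approximately [80, -0.1, 0.02]
```

In practice a statistics package would normally be used, since it also reports the standard errors needed for the t-tests that follow.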

First regression: speed and proportion of heavy vehicles

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | 9.200719643 | 0.443903 | 1.9432 | Do not reject H0
R² = 0.49%

The variable is not statistically significant at the 95% level of confidence.

Second regression: speed and proportion of heavy vehicles and bendiness

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | 13.78559 | 1.052812 | 1.9432 | Do not reject H0
X2 Measure of bendiness | -0.11718 | -7.83748 | 1.9432 | Reject H0
R² = 61.36%

The additional variable is statistically significant, and it improves the value of R² to 61.36%.

Third regression: speed and proportion of heavy vehicles, bendiness and visibility

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | 5.886797 | 0.442987 | 1.9432 | Do not reject H0
X2 Bendiness | -0.09756 | -5.53575 | 1.9432 | Reject H0
X3 Visibility | 0.026284 | 1.942884 | 1.9432 | Indifferent
R² = 64.85%

The additional variable improves the value of R² to 64.85%.

Fourth regression: speed and proportion of heavy vehicles, bendiness, visibility and carriageway width

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | 1.377758 | 0.100896 | 1.9432 | Do not reject H0
X2 Bendiness | -0.09385 | -5.29391 | 1.9432 | Reject H0
X3 Visibility | 0.018324 | 1.236613 | 1.9432 | Do not reject H0
X4 Carriageway width | 1.230603 | 1.267833 | 1.9432 | Do not reject H0
R² = 66.85%

The additional variable reduces the computed t values of the other variables, but it increases R². It is not statistically significant.

Fifth regression: speed and proportion of heavy vehicles, bendiness, visibility, carriageway width and hard strip width

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | -5.4002 | -0.38987 | 1.9432 | Do not reject H0
X2 Bendiness | -0.08736 | -4.94808 | 1.9432 | Reject H0
X3 Visibility | 0.016234 | 1.121482 | 1.9432 | Do not reject H0
X4 Carriageway width | 1.067432 | 1.124098 | 1.9432 | Do not reject H0
X5 Hard strip width | 4.755019 | 1.74287 | 1.9432 | Do not reject H0
R² = 68.93%

The additional variable reduces the computed t values, but it increases R². It is not statistically significant.

Sixth regression: speed and proportion of heavy vehicles, bendiness, visibility, carriageway width, hard strip width and verge width

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | -3.55509 | -0.248 | 1.9432 | Do not reject H0
X2 Bendiness | -0.09101 | -4.81811 | 1.9432 | Reject H0
X3 Visibility | 0.01441 | 0.964382 | 1.9432 | Do not reject H0
X4 Carriageway width | 1.135895 | 1.176334 | 1.9432 | Do not reject H0
X5 Hard strip width | 5.263397 | 1.82196 | 1.9432 | Do not reject H0
X6 Verge width | -0.4176 | -0.58207 | 1.9432 | Do not reject H0
R² = 69.23%

The additional variable increases the computed t values for carriageway width and hard strip width, and it also increases R². It is not statistically significant.

Seventh regression: speed and proportion of heavy vehicles, bendiness, visibility, carriageway width, hard strip width, verge width and number of junctions

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | -2.66146 | -0.18201 | 1.9432 | Do not reject H0
X2 Bendiness | -0.09256 | -4.77427 | 1.9432 | Reject H0
X3 Visibility | 0.013669 | 0.899622 | 1.9432 | Do not reject H0
X4 Carriageway width | 1.078633 | 1.095975 | 1.9432 | Do not reject H0
X5 Hard strip width | 5.278448 | 1.806581 | 1.9432 | Do not reject H0
X6 Verge width | -0.45175 | -0.6195 | 1.9432 | Do not reject H0
X7 Number of junctions per km | -0.59045 | -0.46883 | 1.9432 | Do not reject H0
R² = 69.42%

The additional variable reduces the computed t values for most of the other variables, and it increases R². It is not statistically significant.

Eighth regression: speed and proportion of heavy vehicles, bendiness, visibility, carriageway width, hard strip width, verge width, number of junctions and hilliness

Variable | Coefficient | Computed t | Critical t (α = 0.05) | Decision
X1 Proportion of heavy vehicles | -2.6943 | -0.18173 | 1.9432 | Do not reject H0
X2 Bendiness | -0.09256 | -4.70873 | 1.9432 | Reject H0
X3 Visibility | 0.01217 | 0.743959 | 1.9432 | Do not reject H0
X4 Carriageway width | 1.16098 | 1.11353 | 1.9432 | Do not reject H0
X5 Hard strip width | 4.930325 | 1.528246 | 1.9432 | Do not reject H0
X6 Verge width | -0.43526 | -0.58674 | 1.9432 | Do not reject H0
X7 Number of junctions per km | -0.6492 | -0.50131 | 1.9432 | Do not reject H0
X8 Measure of hilliness per km | -0.02419 | -0.27249 | 1.9432 | Do not reject H0
R² = 69.49%

The additional variable reduces the computed t values for most of the other variables, and it also increases R². It is not statistically significant.

From the regression analysis above, the variables that contribute most to the model are bendiness, visibility, and hard strip width. Bendiness is statistically significant in every iteration, visibility is borderline when it is first added, and hard strip width has the largest computed t value among the remaining variables. These variables also account for the largest increases in R². All the other variables should be dropped from the regression model.
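The contribution of each variable to R² can be tabulated from the cumulative values reported in the eight regressions. The sketch below is only a bookkeeping aid with hypothetical names; the gains depend on the order in which the variables were entered, so they are not a measure of each variable's importance in isolation.

```python
# Cumulative R-square (%) reported after each variable was added, in order.
r2_by_step = [
    ("Proportion of heavy vehicles", 0.49),
    ("Bendiness", 61.36),
    ("Visibility", 64.85),
    ("Carriageway width", 66.85),
    ("Hard strip width", 68.93),
    ("Verge width", 69.23),
    ("Number of junctions per km", 69.42),
    ("Hilliness", 69.49),
]


def r2_gains(steps):
    """Percentage-point gain in R-square attributable to each added variable."""
    gains = []
    previous = 0.0
    for name, r2 in steps:
        gains.append((name, round(r2 - previous, 2)))
        previous = r2
    return gains


gains = r2_gains(r2_by_step)
# Variables ranked from largest to smallest gain in R-square.
ranked = sorted(gains, key=lambda item: item[1], reverse=True)
```

With the reported figures, bendiness accounts for by far the largest gain, followed by visibility and hard strip width, while verge width, the number of junctions and hilliness each add less than half a percentage point.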

Alternative models

There are several modelling techniques that can be used apart from the linear regression model. Some of them are polynomial models, logit and probit, among others. An example of a polynomial regression is shown below.

Y = a0 + a1X1 + a2X2 + a3X3^2 + a4X4 + a5X5 + a6X6 + a7X7
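A polynomial model of this kind is still linear in its coefficients, so it can be estimated in the same way as the multiple regression once the chosen variable has been squared. The sketch below evaluates such a model, assuming the third variable (X3) is the one that enters squared. The function name and the coefficient values in the check are hypothetical.

```python
def polynomial_prediction(coefficients, x):
    """Evaluate Y = a0 + a1*X1 + a2*X2 + a3*X3**2 + a4*X4 + ... + a7*X7.

    coefficients: [a0, a1, ..., a7]; x: [X1, ..., X7].
    """
    if len(coefficients) != 8 or len(x) != 7:
        raise ValueError("expected 8 coefficients and 7 explanatory values")
    terms = list(x)
    terms[2] = x[2] ** 2  # assumption: X3 is the variable that enters squared
    return coefficients[0] + sum(a * t for a, t in zip(coefficients[1:], terms))


# Hypothetical coefficients: only a0 = 1 and a3 = 2 are non-zero, with X3 = 3,
# so the prediction is 1 + 2 * 3**2 = 19.
example = polynomial_prediction([1, 0, 0, 2, 0, 0, 0, 0], [0, 0, 3, 0, 0, 0, 0])
```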
