
Introduction

Regression analysis is a statistical tool used to estimate approximate linear relationships among variables. When specifying the model, it is necessary to distinguish between the dependent variable and the independent (explanatory) variables. Multiple regression analysis relates one dependent variable to several explanatory variables. This paper carries out a multiple regression analysis of the average free-flow speed (kph) on several explanatory variables: the proportion of heavy vehicles, bendiness (degrees turned through per km), visibility, carriageway width (m), hard strip width (m), verge width (m), number of junctions per km, and hilliness (meters of rise or fall per km).

Scatter diagram

A scatter diagram is a graph that plots two related variables on a Cartesian plane. The independent variable is plotted on the x-axis and the dependent variable on the y-axis. In this case, the average free-flow speed (kph) is plotted on the y-axis and each explanatory variable in turn on the x-axis. A scatter diagram helps establish whether a linear relationship exists between the two variables, which can be judged from the trend of the plotted points. The correlation coefficient reported with each diagram measures the strength and direction of that linear relationship.
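The correlation coefficients quoted with each diagram can be computed with a short function. The sketch below is a minimal pure-Python version of the Pearson correlation coefficient; the function name `pearson_r` and the five data points are hypothetical illustrations, not the paper's data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration only (NOT the paper's data):
# speed falls as bendiness rises, so r should be close to -1.
bendiness = [10, 40, 80, 120, 160]
speed = [84, 80, 74, 71, 66]
r = pearson_r(bendiness, speed)
print(f"r = {r:.4f}")
```

A value near +1 or -1 indicates points lying close to a straight line, while a value near 0 indicates little linear association.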

*Average free-flow speed (kph) and the proportion of heavy vehicles*

The correlation coefficient = 0.070015.

*Average free-flow speed (kph) and bendiness measure (degrees turned through per km)*

The correlation coefficient = -0.77625.

*Average free-flow speed (kph) and visibility*

The correlation coefficient = 0.59998.

*Average free-flow speed (kph) and carriageway width (m)*

The correlation coefficient = 0.504263.

*Average free-flow speed (kph) and hard strip width (m)*

The correlation coefficient = 0.45776.

*Average free-flow speed (kph) and verge width (m)*

The correlation coefficient = 0.310631.

*Average free-flow speed (kph) and number of junctions per km*

The correlation coefficient = -0.05523.

*Average free-flow speed (kph) and hilliness measure (meters of rising or fall per km)*

The correlation coefficient = -0.26919.

The points on the scatter diagrams slope in different directions, depending on the explanatory variable. The table below summarizes the correlation coefficients for the various explanatory variables.

Variable	Correlation coefficient
The proportion of heavy vehicles	0.070015
Measure of bendiness	-0.77625
Visibility	0.59998
Carriageway width	0.504263
Hard strip width	0.45776
Verge width	0.310631
Number of junctions per km	-0.05523
The measure of hilliness per km	-0.26919

From the summary above, visibility has the highest positive correlation coefficient (0.59998). This suggests that, among the explanatory variables, better visibility is most strongly associated with higher speed. On the other hand, bendiness has the strongest negative correlation coefficient (-0.77625), which is also the largest in absolute value.

Simple regression analysis of speed and bendiness

The dependent variable is the mean free-flow speed, while the independent variable is the bendiness.

The regression line will take the form Y = b₀ + b₁X

Y = Mean free-flow speed

X = Bendiness (degrees turned through per km)

The theoretical expectations are b₀ can take any value and b₁ < 0 (negative).

Regression Results

Variable		Coefficients of the variable
b₀	Y-intercept	84.45057
X	Bendiness	-0.11647

From the above table, the regression equation can be written as Y = 84.45057 - 0.11647X. The intercept of 84.45057 is the predicted average free-flow speed (kph) when bendiness is zero. The coefficient of -0.11647 implies that as bendiness increases by one unit (one degree turned through per km), the average free-flow speed decreases by 0.11647 kph. The regression equation is consistent with the scatter diagram: the graph of average free-flow speed (kph) against bendiness shows a downward trend with a correlation coefficient of -0.77625, and the regression equation also has a negative slope. Thus, the regression equation is sensible.
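As an illustration of how such a line is obtained and used, the sketch below fits a simple least-squares line and then evaluates the equation with the coefficients reported in the table (84.45057 and -0.11647). The function names `fit_simple_ols` and `predict_speed` are hypothetical, and the three-point data set only checks the arithmetic; it is not the paper's data.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) of the least-squares line Y = b0 + b1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


def predict_speed(bendiness, b0=84.45057, b1=-0.11647):
    """Predicted mean free-flow speed (kph) using the reported coefficients."""
    return b0 + b1 * bendiness


# Sanity check on made-up points that lie exactly on Y = 1 + 2X
b0, b1 = fit_simple_ols([0, 1, 2], [1, 3, 5])
print(b0, b1)

# Prediction for a hypothetical road with 100 degrees of turning per km
print(round(predict_speed(100), 5))
```

With the reported coefficients, each additional 100 degrees of turning per km lowers the predicted speed by about 11.6 kph.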

Evaluation of regression model

Evaluation of the regression model can be done by testing the statistical significance of the variables. Testing statistical significance shows whether the explanatory variable is a significant determinant of average free-flow speed. A two-tailed t-test is carried out at a 95% level of confidence.

Null hypothesis: H₀: bᵢ = 0

Alternative hypothesis: H₁: bᵢ ≠ 0

	Variable	t values computed	Critical t (α = 0.05)	Decision
b₀	Intercept	63.30132	1.9432	Reject
X₁	Bendiness	-7.78768	1.9432	Reject

The null hypothesis states that a variable is not a significant determinant of average free-flow speed. The alternative hypothesis states that the variable is a significant determinant. From the table above, the absolute values of the computed t statistics are greater than the tabulated t value. Therefore, the null hypothesis is rejected, which implies that bendiness is a significant determinant of speed; it is statistically significant at the 5% level of significance (95% level of confidence). The value of the intercept is not relevant when testing the significance of the regression variables. Since the explanatory variable is statistically significant, the regression line can be used for prediction.
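The decision rule used in the table can be written as a small helper: reject the null hypothesis when the absolute value of the computed t exceeds the tabulated value. The sketch below applies it to the t values reported above; the function name `t_test_decision` is hypothetical, and the critical value 1.9432 is taken as given from the table rather than recomputed.

```python
def t_test_decision(t_computed, t_critical):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    if abs(t_computed) > t_critical:
        return "Reject H0"
    return "Do not reject H0"


T_CRITICAL = 1.9432  # tabulated value as quoted in the text

# Computed t values reported for the speed-bendiness regression
reported_t = {"Intercept": 63.30132, "Bendiness": -7.78768}
decisions = {name: t_test_decision(t, T_CRITICAL) for name, t in reported_t.items()}

for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Taking the absolute value matters for negative slopes such as bendiness, where the computed t is far below zero yet still strongly significant.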

R-square value

The value of R² is 60.26%, meaning that bendiness explains 60.26% of the variation in free-flow speed. This indicates a strong explanatory variable. The adjusted R² is slightly lower, at 59.26%. The value of R² can be improved by adding more variables to the regression model.

Analysis of variance

Item	Value	Proportion
The total sum of squares (TSS)	3,647.898364	100.00%
Residual sum of squares (RSS)	1,449.765473	39.74%
The explained sum of squares (ESS)	2,198.132891	60.26%

From the table, it is clear that the explained sum of squares (60.26%) is equal to the value of R² discussed above (60.26%).
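The link between the sums of squares and R² can be checked directly from the figures in the table. The sketch below recomputes R² as ESS/TSS and also the adjusted R². The sample size is not stated in the text; `N_OBS = 42` is an assumption based on the highest observation number listed among the unusual observations, and the function names are hypothetical.

```python
def r_squared(ess, tss):
    """Share of total variation explained by the regression."""
    return ess / tss


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalised for the number of explanatory variables."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Sums of squares reported for the speed-bendiness regression
TSS = 3647.898364
RSS = 1449.765473
ESS = 2198.132891

N_OBS = 42  # ASSUMPTION: sample size is not stated in the source

r2 = r_squared(ESS, TSS)
adj_r2 = adjusted_r_squared(r2, N_OBS, n_predictors=1)

print(f"ESS + RSS = {ESS + RSS:.6f} (TSS = {TSS:.6f})")
print(f"R-squared = {100 * r2:.2f}%")
print(f"Adjusted R-squared = {100 * adj_r2:.2f}%")
```

Under that assumption the recomputed values agree with the 60.26% and 59.26% quoted in the text.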

Unusual observations

Some of the unusual observations are summarized in the table below.

Observation	Predicted Mean free-flow speed (kph)	Residuals
2	82.81996	-19.29995872
32	82.70349	8.266513564
41	71.17273	-6.292730438
42	77.11282	-10.82281686

Four observations stand out as outliers from the regression line. Removing these points may improve the fit of the regression line.

Simple regression analysis of speed and visibility

The dependent variable is the mean free-flow speed while the independent variable is the visibility.

The regression line will take a linear form Y = b₀ + b₁X

Y = Mean free-flow speed

X = Visibility

The theoretical expectations are b₀ can take any value and b₁ > 0 (positive).

Regression Results

Variable		Coefficients of the variable
b₀	Y-intercept	64.42415
X	Visibility	0.067293

From the above table, the regression equation can be written as Y = 64.42415 + 0.067293X. The coefficient of 0.067293 implies that if visibility increases by one unit, the average free-flow speed increases by 0.067293 kph. The positive value of the coefficient implies a positive relationship between the variables. The regression equation is consistent with the scatter diagram: the graph of average free-flow speed (kph) against visibility shows a positive trend with a correlation coefficient of 0.59998, and the regression equation also has a positive slope. Thus, the regression equation is sensible.

Evaluation of regression model

A two-tailed t-test is carried out at a 95% level of confidence to test the significance of the variables.

Null hypothesis: H₀: bᵢ = 0

Alternative hypothesis: H₁: bᵢ ≠ 0

	Variable	t values computed	Critical t (α = 0.05)	Decision
b₀	Intercept	22.21234	1.9432	Reject
X₁	Visibility	4.743191	1.9432	Reject

From the table above, the computed t values are greater than the tabulated t value. Therefore, the null hypothesis is rejected, which implies that visibility is a significant determinant of the dependent variable (average free-flow speed). Thus, visibility is statistically significant at the 5% level of significance (95% level of confidence). The value of the intercept is not relevant when testing the significance of the regression variables.

R-square value

The value of R² is 36.00%. This implies that visibility explains only 36% of the variation in free-flow speed. It is an indication of a weak explanatory variable. The adjusted R² is also low, at 34.39%. The value of R² can be improved by adding more variables to the regression model.

Analysis of variance

Item	Value	Proportion
The total sum of squares (TSS)	3,647.898	100.00%
Residual sum of squares (RSS)	2,334.735	64.00%
The explained sum of squares (ESS)	1,313.164	36.00%

From the table, it is clear that the explained sum of squares (36.00%) is equal to the value of R² discussed above (36.00%).
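In a regression with a single explanatory variable, R² and the slope's t statistic are tied together by the identity R² = t² / (t² + n - 2), which gives a quick cross-check between the t-test table and the R² value. The sketch below applies it to the reported t values. The sample size is not stated in the text, so `N_OBS = 42` is an assumption (the highest observation number appearing in the tables), and the function name is hypothetical.

```python
def r_squared_from_t(t_slope, n_obs):
    """R-squared implied by the slope's t statistic in a one-variable regression."""
    degrees_of_freedom = n_obs - 2
    return t_slope ** 2 / (t_slope ** 2 + degrees_of_freedom)


N_OBS = 42  # ASSUMPTION: sample size is not stated in the source

# Slope t values reported in the simple-regression tables
reported_t = {"Bendiness": -7.78768, "Visibility": 4.743191, "Hilliness": -1.76774}

implied_r2 = {
    name: round(100 * r_squared_from_t(t, N_OBS), 2)
    for name, t in reported_t.items()
}
for name, value in implied_r2.items():
    print(f"{name}: implied R-squared = {value}%")
```

Under that assumption, the implied values line up with the R² figures reported for each simple regression, which suggests the tables are internally consistent.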

Unusual observations

Observation	Predicted Mean free-flow speed (kph)	Residuals
1	70.41325	-11.8333
2	76.9407	-13.4207
3	71.82641	8.76359
22	77.47904	18.35096
27	83.80461	8.275389
29	72.29746	8.362537
32	77.00799	13.96201
34	81.65123	-9.40123

Visibility is commonly held to be a significant determinant of average free-flow speed. The weak regression line above runs contrary to that expectation. The regression line has several outliers, which contribute to the weak model. Removing the outliers may strengthen the regression equation.

Simple regression analysis of speed and hilliness

The dependent variable is the mean free-flow speed while the independent variable is the hilliness.

The regression line will take the form Y = b₀ + b₁X

Y = Mean free-flow speed

X = Hilliness

The theoretical expectations are b₀ can take any value and b₁ < 0 (negative).

Regression Results

Variable		Coefficients of the variable
b₀	Y-intercept	80.1933
X	Hilliness	-0.20343

From the above table, the regression equation can be written as Y = 80.1933 - 0.20343X. The coefficient of -0.20343 implies that if hilliness increases by one unit, the average free-flow speed decreases by 0.20343 kph. The negative value of the coefficient implies a negative relationship between the variables. The regression equation is consistent with the scatter diagram above: the graph of average free-flow speed (kph) against hilliness shows a negative trend with a correlation coefficient of -0.26919, and the regression equation also has a negative slope. Thus, the regression equation is sensible.

Evaluation of regression model

A two-tailed t-test is carried out at a 95% level of confidence to test the significance of the variables.

Null hypothesis: H₀: bᵢ = 0

Alternative hypothesis: H₁: bᵢ ≠ 0

The table below summarizes the results of the t-tests.

	Variable	t values computed	Critical t (α = 0.05)	Decision
b₀	Intercept	34.86548	1.9432	Reject
X₁	Hilliness	-1.76774	1.9432	Do not reject

From the table above, the absolute value of the computed t for hilliness is less than the tabulated t value. Therefore, the null hypothesis is not rejected, which implies that hilliness is not a significant determinant of the dependent variable (average free-flow speed). Thus, hilliness is not statistically significant at the 5% level of significance (95% level of confidence). The regression model shows that the relationship is weak and explains little of the variation in speed.

R-square value

The value of R² is 7.25%. This implies that hilliness explains only 7.25% of the variation in free-flow speed. It is an indication of a weak explanatory variable. Also, the value of adjusted R² is low at 4.92%. The value of R² can be improved on by adding more variables in the regression model.

Analysis of variance

The table below summarizes the analysis of variance.

Item	Value	Proportion
The total sum of squares (TSS)	3,647.898	100.00%
Residual sum of squares (RSS)	3,383.565	92.75%
The explained sum of squares (ESS)	264.3336	7.25%

The RSS is greater than the ESS by a large margin. From the table, the explained sum of squares (7.25%) is equal to the value of R² discussed above (7.25%). It shows that the model explains very little of the variation in speed. One possible explanation is that modern vehicles are powerful enough that hilliness no longer causes a marked reduction in speed.

Unusual observations

Over 90% of the observations are outliers. Thus, the removal of all these points would amount to eliminating the variable from the regression model.

Multiple regression results

The regression line will take the form Y = a₀ + a₁X₁ + a₂X₂ + a₃X₃ + a₄X₄ + a₅X₅ + a₆X₆ + a₇X₇ + a₈X₈. This section summarizes the results of successive iterations of the multiple regression analysis, adding one explanatory variable at a time.
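A model of this form can be estimated by ordinary least squares, solving the normal equations (XᵀX)a = Xᵀy for the coefficients. The sketch below is a minimal pure-Python version using Gaussian elimination. It is an illustration of the general technique rather than the software used for the paper, the function names are hypothetical, and the five observations are made up so that the expected answer is known.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix; copying keeps the caller's lists unchanged
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: explanatory variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_ols(rows, ys):
    """Return [a0, a1, ..., ak] minimising the residual sum of squares."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (NOT the paper's data)
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
coefficients = fit_multiple_ols(rows, ys)
print([round(c, 6) for c in coefficients])
```

For real data with eight explanatory variables on very different scales, a numerically more robust routine from a statistics library would normally be preferred over hand-rolled elimination.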

First regression: speed on the proportion of heavy vehicles

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	9.200719643	0.443903	1.9432	Do not reject H₀
R²	0.49%

The variable is not statistically significant at the 95% level of confidence.

Second regression: speed on the proportion of heavy vehicles and bendiness

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	13.78559	1.052812	1.9432	Do not reject H₀
X₂	Measure of bendiness	-0.11718	-7.83748	1.9432	Reject H₀
R²	61.36%

The additional variable is statistically significant, and it improves the value of R² to 61.36%.

Third regression: speed on the proportion of heavy vehicles, bendiness and visibility

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	5.886797	0.442987	1.9432	Do not reject H₀
X₂	Bendiness	-0.09756	-5.53575	1.9432	Reject H₀
X₃	Visibility	0.026284	1.942884	1.9432	Do not reject H₀ (marginal)
R²	64.85%

The computed t for the additional variable falls just short of the critical value, but the variable improves the value of R² to 64.85%.

Fourth regression: speed on the proportion of heavy vehicles, bendiness, visibility and carriageway width

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	1.377758	0.100896	1.9432	Do not reject H₀
X₂	Bendiness	-0.09385	-5.29391	1.9432	Reject H₀
X₃	Visibility	0.018324	1.236613	1.9432	Do not reject H₀
X₄	Carriageway width	1.230603	1.267833	1.9432	Do not reject H₀
R²	66.85%

The additional variable reduces the computed t values of the other variables, but it increases R². It is not statistically significant.

Fifth regression: speed on the proportion of heavy vehicles, bendiness, visibility, carriageway width and hard strip width

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	-5.4002	-0.38987	1.9432	Do not reject H₀
X₂	Bendiness	-0.08736	-4.94808	1.9432	Reject H₀
X₃	Visibility	0.016234	1.121482	1.9432	Do not reject H₀
X₄	Carriageway width	1.067432	1.124098	1.9432	Do not reject H₀
X₅	Hard strip width	4.755019	1.74287	1.9432	Do not reject H₀
R²	68.93%

The additional variable reduces the computed t values of the other variables, but it increases R². It is not statistically significant.

Sixth regression: speed on the proportion of heavy vehicles, bendiness, visibility, carriageway width, hard strip width and verge width

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	-3.55509	-0.248	1.9432	Do not reject H₀
X₂	Bendiness	-0.09101	-4.81811	1.9432	Reject H₀
X₃	Visibility	0.01441	0.964382	1.9432	Do not reject H₀
X₄	Carriageway width	1.135895	1.176334	1.9432	Do not reject H₀
X₅	Hard strip width	5.263397	1.82196	1.9432	Do not reject H₀
X₆	Verge width	-0.4176	-0.58207	1.9432	Do not reject H₀
R²	69.23%

The additional variable increases the computed t values for several of the other variables, and it also increases R². It is not statistically significant.

Seventh regression: speed on the proportion of heavy vehicles, bendiness, visibility, carriageway width, hard strip width, verge width and number of junctions

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	-2.66146	-0.18201	1.9432	Do not reject H₀
X₂	Bendiness	-0.09256	-4.77427	1.9432	Reject H₀
X₃	Visibility	0.013669	0.899622	1.9432	Do not reject H₀
X₄	Carriageway width	1.078633	1.095975	1.9432	Do not reject H₀
X₅	Hard strip width	5.278448	1.806581	1.9432	Do not reject H₀
X₆	Verge width	-0.45175	-0.6195	1.9432	Do not reject H₀
X₇	Number of junctions per km	-0.59045	-0.46883	1.9432	Do not reject H₀
R²	69.42%

The additional variable reduces the computed t values for the other variables, and it increases R². It is not statistically significant.

Eighth regression: speed on the proportion of heavy vehicles, bendiness, visibility, carriageway width, hard strip width, verge width, number of junctions and hilliness

	Variable	Coefficient	t values computed	Critical t (α = 0.05)	Decision
X₁	The proportion of heavy vehicles	-2.6943	-0.18173	1.9432	Do not reject H₀
X₂	Bendiness	-0.09256	-4.70873	1.9432	Reject H₀
X₃	Visibility	0.01217	0.743959	1.9432	Do not reject H₀
X₄	Carriageway width	1.16098	1.11353	1.9432	Do not reject H₀
X₅	Hard strip width	4.930325	1.528246	1.9432	Do not reject H₀
X₆	Verge width	-0.43526	-0.58674	1.9432	Do not reject H₀
X₇	Number of junctions per km	-0.6492	-0.50131	1.9432	Do not reject H₀
X₈	The measure of hilliness per km	-0.02419	-0.27249	1.9432	Do not reject H₀
R²	69.49%

The additional variable reduces the computed t values for the other variables, and it also increases R². It is not statistically significant.

From the regression analysis above, the variables that contribute most to the model are bendiness, visibility, and hard strip width. Bendiness is statistically significant in every iteration in which it appears. Visibility falls just short of the critical value when it first enters the model, and hard strip width has the largest computed t among the remaining variables in the later iterations. These variables also account for most of the increase in R². All the other variables should be dropped from the regression model.
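One way to see how much each variable adds is to difference the R² values reported for the successive regressions. The sketch below does this with the eight R² figures from the tables above; the function name `incremental_r2` is hypothetical. The gains depend on the order in which the variables were entered, so they should be read as a rough guide, not as a definitive ranking.

```python
def incremental_r2(steps):
    """Gain in R-squared (percentage points) as each variable enters the model."""
    gains = {}
    previous = 0.0
    for name, r2_percent in steps:
        gains[name] = round(r2_percent - previous, 2)
        previous = r2_percent
    return gains


# R-squared (%) reported after each variable is added, in order of entry
r2_by_step = [
    ("Proportion of heavy vehicles", 0.49),
    ("Bendiness", 61.36),
    ("Visibility", 64.85),
    ("Carriageway width", 66.85),
    ("Hard strip width", 68.93),
    ("Verge width", 69.23),
    ("Number of junctions per km", 69.42),
    ("Hilliness", 69.49),
]

gains = incremental_r2(r2_by_step)
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: +{gain:.2f} percentage points")
```

Because R² never decreases when a variable is added, a small gain (as for verge width, junctions, and hilliness) is a sign that the variable contributes little beyond those already in the model.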

Alternative models

Several modelling techniques can be used apart from the linear regression model, including polynomial models, logit, and probit, among others. An example of a polynomial regression is shown below.

Y = a₀ + a₁X₁ + a₂X₂ + a₃X₃² + a₄X₄ + a₅X₅ + a₆X₆ + a₇X₇
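A polynomial model of this kind is still linear in its coefficients, so it can be fitted by ordinary least squares once the squared column has been built. The sketch below shows only that preparation step, replacing X₃ with X₃² in each observation; the function name `square_column` and the two rows of values are hypothetical, not the paper's data.

```python
def square_column(rows, index):
    """Return copies of the rows with the value at `index` replaced by its square."""
    return [
        tuple(value ** 2 if i == index else value for i, value in enumerate(row))
        for row in rows
    ]


# Hypothetical observations (X1, X2, X3); index 2 is X3
observations = [(0.1, 50, 3), (0.2, 120, 4)]
transformed = square_column(observations, index=2)
print(transformed)
```

The transformed rows can then be passed to any least-squares routine in place of the original explanatory variables.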
