2 Simple linear regression
The most elementary regression model is the simple regression model, which we use when only one x variable has an influence on the y variable. Analysing data with this model is called simple regression analysis. Usually, the first step in simple regression analysis is to construct a graph that shows the relationship between the x and y variables. The simplest graph relating the variable y to a single independent variable x is a straight line, so we call the model a simple linear model or a simple straight-line model. The analysis of the data is then called simple linear regression analysis.
2.1 Scatter plots
The most effective way to display the relationship between two variables is to plot or graph the data. As we have mentioned before, this is usually the first step in simple linear regression analysis, because the graph lets us see at a glance the nature of the relationship and get a “feel” for what we are dealing with. The kind of graph we draw when we have two variables is called a scatter plot.
The scatter plot consists of a horizontal x-axis, a vertical y-axis and a dot for each pair of observations. Let’s have a look at an example that illustrates how to obtain a scatter plot.
Example 2.1 It is often said that a person’s income is reflected in his or her savings: the more a person earns, the more he or she is likely to save. In this illustration the monthly income (x) will be the independent variable and savings (y) the dependent variable. In other words, we want to predict savings by using monthly income as the predictor. The data for this example were collected for six randomly selected individuals and are given in the table below.
| Individual | Income (in R1000) | Savings (in R100) |
|---|---|---|
| A | 24 | 12.0 |
| B | 26 | 14.0 |
| C | 12 | 1.5 |
| D | 22 | 9.0 |
| E | 20 | 6.0 |
| F | 18 | 2.0 |
Each data point or observation has two values, its x-value and its y-value, and we call the two values a pair. Our sample of data in the table consists of 6 pairs of observations, so the sample size, n, is equal to 6.
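The idea of pairing observations can be made concrete with a short Python sketch. The values are those in the table of Example 2.1; the variable names (`individuals`, `income`, `savings`, `pairs`) are illustrative choices and do not come from the text.

```python
# Data from Example 2.1 (variable names are illustrative assumptions).
individuals = ["A", "B", "C", "D", "E", "F"]
income = [24, 26, 12, 22, 20, 18]            # x: monthly income, in R1000
savings = [12.0, 14.0, 1.5, 9.0, 6.0, 2.0]   # y: savings, in R100

# Each observation is one (x, y) pair.
pairs = list(zip(income, savings))

# The sample size n is the number of pairs.
n = len(pairs)

for name, (x, y) in zip(individuals, pairs):
    print(f"{name}: x = {x}, y = {y}")
print(f"n = {n}")
```

Keeping x and y in two lists of equal length, in the same order, preserves the pairing: the i-th income always belongs with the i-th savings value.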
Figure 2.1: Scatter plot for income-savings data
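A plot like Figure 2.1 could be produced along the following lines. This is a minimal sketch that assumes Python with the matplotlib library is available; the function name `plot_income_savings` and the output file name are hypothetical, and the original figure may well have been drawn with different software.

```python
# Data from Example 2.1.
income = [24, 26, 12, 22, 20, 18]            # x: monthly income, in R1000
savings = [12.0, 14.0, 1.5, 9.0, 6.0, 2.0]   # y: savings, in R100


def plot_income_savings(filename="income_savings_scatter.png"):
    """Draw one dot per (income, savings) pair and save the figure.

    The function name and default file name are hypothetical.
    """
    # Imported here so the data above can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend: write to file only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(income, savings)               # a dot for each pair
    ax.set_xlabel("Income (in R1000)")        # horizontal x-axis
    ax.set_ylabel("Savings (in R100)")        # vertical y-axis
    ax.set_title("Scatter plot for income-savings data")
    fig.savefig(filename)
    plt.close(fig)
    return filename
```

Calling `plot_income_savings()` writes the plot to an image file. The independent variable (income) goes on the horizontal axis and the dependent variable (savings) on the vertical axis, matching the description of the scatter plot given earlier.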
