Evaluating Linear Relationships

Scatterplots

Scatterplots are used to visually assess the relationship between two numeric variables. Typically, the explanatory (independent) variable is placed on the X axis and the response (dependent) variable is placed on the Y axis.

Image_by_Author.png

Both linear relationships pictured below are positive: as X increases, Y also increases. However, the relationship in the scatterplot on the right is far stronger than the one on the left.

Image_by_Emily.png

Often, the relationship between two continuous variables isn’t linear at all. One such non-linear relationship is pictured below: as X increases, Y follows a parabolic shape. There appears to be a strong and important relationship between these variables, but it would not be captured by techniques designed to assess linear relationships (e.g., Pearson correlation and linear regression). This possibility underscores the importance of producing a scatterplot before running analyses, because an analysis that skips data visualization could miss such a relationship completely.

Image_by_Emily_2.png
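The point about the parabola can be checked numerically. Here is a minimal sketch, assuming made-up data (the vectors `x` and `y` are hypothetical and are not the data behind the figure): a perfectly deterministic parabolic relationship produces a Pearson correlation of essentially zero.

```r
# Hypothetical data: a perfect parabola, symmetric around x = 0
x <- seq(-10, 10, by = 0.5)
y <- x^2

# Pearson's r only measures linear association, so it is essentially
# zero here even though y is completely determined by x
r <- cor(x, y)
print(round(r, 3))
```

Calling `plot(x, y)` on the same vectors would show the curve immediately, which is exactly why the scatterplot should come first.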

Correlation Coefficient 

Once you’ve confirmed a roughly linear relationship on your scatterplot, you can calculate a correlation coefficient to quantify the strength of the association. Correlation coefficients range from -1 to 1. The sign indicates the direction of the relationship (negative or positive), values near either end of the range indicate strong relationships, and 0 indicates that there is no linear relationship between the variables.
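To make the number less of a black box, here is a rough sketch of what Pearson's coefficient computes: the covariance of the two variables scaled by the product of their standard deviations. The data are made up for illustration and are not the data behind the article's figures.

```r
# Made-up illustrative data (an assumption, not the article's data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 4.2, 4.8, 6.5, 6.9, 8.3, 9.1)

# Pearson's r by hand: co-variation of x and y, scaled by the spread of each
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent
r_builtin <- cor(x, y, method = "pearson")

# Flipping the direction of y flips the sign of r but not its magnitude
r_flipped <- cor(x, -y)

print(c(manual = r_manual, builtin = r_builtin, flipped = r_flipped))
```

The hand calculation and `cor()` should agree, and the flipped version shows how the sign encodes direction while the magnitude encodes strength.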

Monotonic_non-linear_relationship.png

 Monotonic, non-linear relationship. Image from https://thenounproject.com/term/graph-curve-up/2064827/

Image_by_Emily_3.png

One line of R code is all it takes to produce both the Pearson correlation coefficient and the associated t-test output for the “weak” positive correlation pictured on the left:

cor.test.png
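The call itself appears only as a screenshot, so here is a hedged sketch of what such a one-liner can look like using base R's `cor.test()`. The vectors `x` and `y`, the seed, and the noise level are assumptions chosen for illustration, so the numbers will not match the output shown in this article.

```r
set.seed(42)  # arbitrary seed so the simulated data are reproducible

# Simulated stand-in for a noisy positive relationship
x <- rnorm(100)
y <- x + rnorm(100, sd = 1)

# One call returns the estimate, t statistic, p-value and confidence interval
result <- cor.test(x, y, method = "pearson")
print(result)

# Individual pieces can be pulled out of the returned object
result$estimate   # Pearson's r
result$p.value    # p-value of the associated t-test
```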

As can be seen in the output below, the Pearson correlation coefficient (0.78) is quite large even in this “weak” relationship. The p-value associated with the t statistic is well below 0.05, indicating a statistically significant relationship:

Pearsons_Product.png

cor.test_data.png

Now the correlation coefficient is almost equal to 1, which confirms the very strong relationship that we observed in the scatterplot above. Again, the relationship is statistically significant:

Pearson_Product_Moment_Correlation.png

Linear Regression*

*Note: There are many kinds of regression analyses, and lots of complexity that one can dive into in learning about regression. For the purposes of this article, I am keeping it simple and am focused entirely on linear regression and its relationship with scatterplots and correlation coefficients.

Linear regression rests on a few key assumptions:

  1. Linearity: The relationship between X and Y is linear
  2. Homoscedasticity: The variance of the residuals is constant across values of X
  3. Normality: The residuals are normally distributed around the regression line
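These assumptions can be checked roughly from the residuals of a fitted model. The sketch below is one possible approach, not a definitive procedure: the data are simulated to satisfy the assumptions by construction, and the specific checks (a quadratic-term comparison, a split-sample spread comparison, and a Shapiro-Wilk test) are my choices for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Simulated data (an assumption): linear trend plus constant-variance normal noise
x <- runif(200, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(200, sd = 1.5)

fit <- lm(y ~ x)
res <- resid(fit)

# Linearity: does adding a curved (quadratic) term improve the fit?
# A large p-value gives no evidence of curvature.
fit_quad <- lm(y ~ x + I(x^2))
p_curvature <- anova(fit, fit_quad)[["Pr(>F)"]][2]

# Homoscedasticity: residual spread should be similar for low and high x,
# so this ratio should be close to 1
sd_low  <- sd(res[x <= median(x)])
sd_high <- sd(res[x >  median(x)])
spread_ratio <- sd_high / sd_low

# Normality: Shapiro-Wilk test on the residuals
# A large p-value gives no evidence against normality.
p_normality <- shapiro.test(res)$p.value

print(c(curvature = p_curvature, spread = spread_ratio, normality = p_normality))
```

In practice, the diagnostic plots from `plot(fit)` are often more informative than any single test, since they show how an assumption fails rather than only whether it does.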

summaryweak.png

Call_Formula.png

Output for “weak” relationship linear model
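Since the model code is only available as a screenshot, here is a hedged sketch of fitting and summarizing a simple linear model with `lm()`. The data are simulated stand-ins (the names `x`, `y`, and `model` and the chosen coefficients are assumptions), so the numbers will differ from the output above. The sketch also shows how regression ties back to correlation: with a single predictor, R-squared is the square of Pearson's r.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Simulated stand-in data; not the data behind the article's output
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)

# Fit a simple linear regression of y on x and print the usual summary table
model <- lm(y ~ x)
print(summary(model))

# With a single predictor, R-squared equals the squared Pearson correlation
r_squared <- summary(model)$r.squared
r <- cor(x, y)
all.equal(r_squared, r^2)

# The sign of the slope matches the sign of the correlation
slope <- coef(model)[["x"]]
```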

Let’s run the model now for our “strong” relationship:

strong_relationship.png

Output_for_strong_relationship_linear_model.png

 Output for “strong” relationship linear model
