What Happens When You Remove Outliers in Regression?

TL;DR
In the video's example, removing an outlier from the scatterplot improves the fit of the least-squares regression line: the coefficient of determination (r squared) increases and the slope gets steeper. The refitted model better represents the pattern in the remaining data.
Transcript
- [Instructor] The scatterplot below displays a set of bivariate data along with its least-squares regression line. Consider removing the outlier 95 comma one. So 95 comma one, we're talking about that outlier right over there. And calculating a new least-squares regression line. What effects would removing the outlier have? Choose all answers that...
Key Insights
- 🫥 Outliers in a scatterplot can have a significant impact on the fit of the regression line.
- ❎ Removing an outlier that lies far from the trend can increase the coefficient of determination (r squared), so the line accounts for more of the variation in the remaining data.
- 💪 Removing such an outlier can also strengthen the correlation. For a positive trend, the correlation coefficient (r) moves closer to 1.
- 🫥 The slope of the least-squares regression line can increase or decrease depending on the nature of the outliers.
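For reference, these are the standard least-squares formulas behind the insights above. The fitted line is written as ŷ = b + m x, and s_x and s_y are the sample standard deviations. This notation is chosen here for illustration and is not taken from the video. Every data point enters the means and the sums, so removing a single point can change r, r squared, the slope, and the intercept.

```latex
\begin{aligned}
m &= r\,\frac{s_y}{s_x}, \qquad b = \bar{y} - m\,\bar{x},\\
r &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \, \sum_i (y_i-\bar{y})^2}},\\
r^2 &= 1 - \frac{\sum_i (y_i-\hat{y}_i)^2}{\sum_i (y_i-\bar{y})^2}, \qquad \hat{y}_i = b + m\,x_i .
\end{aligned}
```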
Questions & Answers
Q: What happens to the coefficient of determination (r squared) when an outlier is removed from a scatterplot?
Removing this outlier increases the coefficient of determination (r squared) because the remaining data points fit the regression line more closely. The outlier sat far from the trend, so it contributed a large residual and lowered the share of the variation in y that the line could explain.
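To see the r squared effect numerically, here is a minimal Python sketch. The nine points with a positive trend are made up for illustration and are not the video's data. Only the outlier (95, 1) comes from the problem. The sketch fits the least-squares line by hand and computes r squared as 1 - SSE/SST, once with the outlier and once without it.

```python
# Hypothetical data (not from the video): nine points with a positive trend.
xs = [10, 20, 30, 40, 50, 60, 70, 80, 90]
ys = [12, 19, 31, 42, 48, 61, 69, 82, 88]
OUTLIER = (95, 1)  # the outlier the problem asks us to remove


def least_squares(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def r_squared(x, y):
    """Coefficient of determination: 1 - SSE / SST."""
    slope, intercept = least_squares(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


r2_with = r_squared(xs + [OUTLIER[0]], ys + [OUTLIER[1]])
r2_without = r_squared(xs, ys)
print(f"r^2 with outlier:    {r2_with:.3f}")
print(f"r^2 without outlier: {r2_without:.3f}")
```

With these made-up numbers, r squared rises from about 0.24 to about 0.996 when the outlier is dropped.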
Q: Does removing an outlier affect the correlation coefficient (r)?
Yes. Removing the outlier increases the correlation coefficient (r). The outlier weakened the positive linear association, so the remaining data points show a stronger positive correlation once it is removed.
Q: How does removing an outlier impact the slope of the least-squares regression line?
Removing the outlier increases the slope of the least-squares regression line. The point (95, 1) has a large x-value but a very small y-value, so it pulled the right end of the line down. Without it, the line fits the remaining data points with a steeper slope.
Q: What happens to the y-intercept of the least-squares regression line when an outlier is removed?
Removing an outlier generally does change the y-intercept. The least-squares line always passes through the point of means (x̄, ȳ), so its intercept is b = ȳ - m·x̄, where m is the slope. Removing a data point changes both means and the slope, and therefore the intercept.
Summary & Key Takeaways
- The presence of an outlier in a scatterplot can weaken the fit of the regression line and decrease the coefficient of determination (r squared).
- When the outlier is removed, the regression line is refit to the remaining data points. In this example, both r squared and the slope of the line increase.
- Removing an outlier that does not follow the overall trend generally improves the fit of the regression and gives a model that better represents the rest of the data.
Explore More Summaries from Khan Academy 📚


