3.3.5 The Framingham Heart Study - Video 3: A Logistical Regression Model | Summary and Q&A

December 13, 2018, by MIT OpenCourseWare

TL;DR

Logistic regression is used to predict the 10-year risk of coronary heart disease (CHD) based on various risk factors collected at the first examination of patients.

Key Insights

  • The logistic regression model rarely predicts a 10-year CHD risk above 50%.
  • At a 50% threshold, the model's accuracy is only comparable to a baseline method that always predicts no CHD.
  • The model differentiates well between low-risk and high-risk patients, with an out-of-sample AUC of 0.74.
  • Risk factors such as smoking, higher cholesterol, higher systolic blood pressure, and higher glucose levels are associated with an increased risk of CHD.
  • The significant variables identified by the model suggest possible interventions to prevent CHD.
  • The dataset contains demographic, behavioral, medical history, and physical exam risk factors.
  • Splitting the data into training and testing sets makes it possible to evaluate the model's predictive power on new data.
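
One way to read the AUC figure: it is the probability that a randomly chosen patient who developed CHD receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes that pairwise definition directly. The scores are invented for illustration and are not the Framingham predictions; the function name `auc` is also just a label chosen here.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted 10-year risks, made up for this example only.
chd = [0.40, 0.22, 0.10]            # patients who developed CHD
no_chd = [0.30, 0.12, 0.08, 0.05]   # patients who did not

print(auc(chd, no_chd))
```

Every score in this toy example is below 0.5, so a 0.5 threshold would predict "no CHD" for everyone, yet the ranking still separates the two groups fairly well. That is how a model can sit close to the baseline on accuracy and still be useful for telling low-risk from high-risk patients.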

Questions & Answers

Q: How is logistic regression used to predict the 10-year risk of coronary heart disease?

A logistic regression model is fit with the 10-year CHD outcome as the dependent variable and all other variables in the dataset as independent variables. The model is built with R's glm function, with the family argument set to "binomial".
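
To make the prediction step concrete, here is a minimal sketch of how a fitted logistic regression turns risk factors into a probability: a linear combination of the inputs is passed through the logistic function. The intercept, coefficients, and patient values below are hypothetical placeholders, not the values estimated in the video, and `chd_probability` is a name chosen for this example.

```python
import math


def chd_probability(intercept, coefficients, patient):
    """Logistic response: P(y = 1) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(
        coefficients[name] * patient[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and patient record, NOT the fitted Framingham model.
coefficients = {"age": 0.06, "male": 0.5}
patient = {"age": 50, "male": 1}

p = chd_probability(-5.0, coefficients, patient)
print(round(p, 3))
```

A positive coefficient raises the linear term and therefore the predicted probability, which is why the sign of each coefficient indicates the direction of a risk factor's effect.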

Q: What are some significant variables in the logistic regression model?

The significant variables in the model include male, age, prevalent stroke, total cholesterol, systolic blood pressure, and glucose levels. These variables have positive coefficients, indicating that higher values contribute to a higher probability of 10-year CHD.

Q: What is the accuracy of the logistic regression model?

The accuracy of the model is approximately 84.8%, calculated by dividing the number of correct predictions (1069 true negatives + 11 true positives) by the total number of observations in the test set.

Q: How does the model compare to a baseline method in terms of accuracy?

The baseline method, which always predicts 0 (no CHD), would have an accuracy of approximately 84.4%. The logistic regression model therefore only slightly outperforms the baseline in terms of accuracy.
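
The two accuracy figures can be reproduced from a confusion matrix. Since always predicting no CHD is right about 84.4% of the time, the large count of 1069 has to be patients without CHD who were correctly classified, and 11 the patients with CHD who were correctly flagged. The two error counts are not given in this summary; the values below are assumptions chosen only because they are consistent with the quoted 84.8% and 84.4%.

```python
# Test-set confusion matrix at a 0.5 threshold.
true_negatives = 1069   # quoted in the summary
true_positives = 11     # quoted in the summary
false_positives = 6     # ASSUMED: not stated in the summary
false_negatives = 187   # ASSUMED: not stated in the summary

total = true_negatives + true_positives + false_positives + false_negatives

# Model accuracy: correct predictions over all test observations.
model_accuracy = (true_negatives + true_positives) / total

# Baseline always predicts "no CHD", so it is right on every actual negative.
baseline_accuracy = (true_negatives + false_positives) / total

print(round(model_accuracy, 3), round(baseline_accuracy, 3))
```

Under these assumed counts the model gains only a handful of correct predictions over the baseline, which is why accuracy alone says little about its usefulness here.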

Summary & Key Takeaways

  • The video walks through using logistic regression to predict the 10-year risk of CHD from risk factors collected at each patient's first examination.

  • The data set used for analysis contains information on demographic, behavioral, medical history, and physical exam risk factors, as well as the outcome variable of whether or not the patient developed CHD in the next 10 years.

  • The training and testing sets are created using sample.split, and a logistic regression model is built using the training set.

  • The significant variables in the model include male, age, prevalent stroke, total cholesterol, systolic blood pressure, and glucose levels.
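
The split step can be illustrated with a rough analogue. sample.split (from R's caTools package) returns a logical vector marking training rows while keeping the proportion of each outcome class similar in both sets. The Python sketch below mimics that idea and is not the same algorithm; the function name, the 0.6 ratio, the seed, and the outcome column are all arbitrary choices for illustration.

```python
import random


def stratified_split(labels, train_ratio, seed):
    """Return booleans (True = training row), splitting each class separately
    so both sets keep roughly the same share of each outcome."""
    rng = random.Random(seed)
    in_train = [False] * len(labels)
    for value in set(labels):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        n_train = round(train_ratio * len(idx))
        for i in idx[:n_train]:
            in_train[i] = True
    return in_train


# Hypothetical outcome column: 15 of 100 patients develop CHD.
labels = [1] * 15 + [0] * 85
mask = stratified_split(labels, train_ratio=0.6, seed=1)

train_labels = [y for y, m in zip(labels, mask) if m]
test_labels = [y for y, m in zip(labels, mask) if not m]
print(len(train_labels), sum(train_labels), len(test_labels), sum(test_labels))
```

Keeping the outcome rate equal in both sets matters when the positive class is rare, as it is for CHD: a purely random split could leave the test set with too few cases to evaluate the model meaningfully.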
