Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #8

TL;DR
Explains how to calculate email activity percentiles using SQL.
Transcript
Hey guys, this is SQL Sunday number eight. Let's jump into it. All right, so today's question is from Google, called Activity Percentile: find the email activity percentile for each user. Email activity percentile is defined by the total number of emails sent. The user with the highest number of emails sent will have percentile 1, and so on. I'll put the us...
Key Insights
- The video focuses on solving a SQL interview question from Google that involves calculating email activity percentiles.
- The task is to determine the email activity percentile for each user based on the number of emails sent.
- The solution involves using SQL window functions, specifically 'row_number' to rank users by their email activity.
- The speaker clarifies the difference between 'row_number' and 'rank' functions, emphasizing the use of 'row_number' for unique ranking.
- The speaker compares 'row_number' with 'ntile', which reach similar results through different approaches.
- The speaker suggests that the 'row_number' approach, which avoids a subquery, might be slightly more efficient, though both methods are valid.
- The video encourages viewers to try modifying the function to calculate actual percentiles, inviting engagement in the comments.
- The series aims to prepare viewers for data science interviews by simulating real interview scenarios.
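The grouping, counting, and ranking steps described above can be sketched as follows. This is a minimal illustration, not the speaker's exact query: the table name `emails`, its columns `from_user` and `to_user`, and the sample rows are all assumptions, and the SQL is run through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical schema: one row per email sent. All names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (from_user TEXT, to_user TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [
        ("ann", "bob"), ("ann", "cy"), ("ann", "dee"),  # ann sends 3
        ("bob", "ann"), ("bob", "cy"),                  # bob sends 2
        ("cy", "ann"),                                  # cy sends 1
    ],
)

# Group by sender, count emails, and rank by that count in a single query
# (no subquery). from_user is an assumed tiebreaker for deterministic output.
ranking_sql = """
SELECT from_user,
       COUNT(*) AS total_emails,
       ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user) AS activity_rank
FROM emails
GROUP BY from_user
ORDER BY activity_rank
"""
ranked = conn.execute(ranking_sql).fetchall()
for row in ranked:
    print(row)
```

The most active sender receives rank 1, matching the question's definition that the user with the most emails sent gets percentile 1.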
Questions & Answers
Q: What is the main problem addressed in the video?
The main problem addressed is calculating email activity percentiles for each user based on the total number of emails sent. The task involves ranking users by their email activity and determining their percentile rank using SQL.
Q: What SQL functions are used to solve the problem?
The solution uses SQL window functions, specifically 'row_number', to rank users by the total number of emails sent. The video also discusses 'ntile' as an alternative approach that achieves similar results.
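As a rough sketch of the 'ntile' alternative, the query below first counts emails per user in a subquery and then buckets the users. The `emails` table, its columns, and the sample data are hypothetical, and the bucket counts (100 and 2) are chosen only for illustration. With fewer users than buckets, NTILE(100) gives each user its own bucket, which is why it can produce the same numbers as 'row_number'.

```python
import sqlite3

# Hypothetical schema and data: u1 sends 4 emails, u2 sends 3, u3 sends 2, u4 sends 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (from_user TEXT, to_user TEXT)")
rows = [(f"u{i}", "x") for i in range(1, 5) for _ in range(5 - i)]
conn.executemany("INSERT INTO emails VALUES (?, ?)", rows)

# Count per user in a subquery, then split the ordered users into buckets.
ntile_sql = """
SELECT from_user,
       total_emails,
       NTILE(100) OVER (ORDER BY total_emails DESC, from_user) AS ntile_100,
       NTILE(2)   OVER (ORDER BY total_emails DESC, from_user) AS ntile_2
FROM (
    SELECT from_user, COUNT(*) AS total_emails
    FROM emails
    GROUP BY from_user
) AS per_user
ORDER BY ntile_100
"""
buckets = conn.execute(ntile_sql).fetchall()
for row in buckets:
    print(row)
```

With only four users, `ntile_100` runs 1 through 4 like a row number, while `ntile_2` splits the users into a top half and a bottom half.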
Q: Why is 'row_number' preferred over 'rank' in this solution?
The 'row_number' function is preferred because it assigns every user a distinct rank even when several users have sent the same number of emails, whereas 'rank' gives tied users the same value. The solution relies on that distinct ranking to give each user their own email activity percentile.
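The difference only shows up when users are tied. The sketch below uses a hypothetical pre-aggregated table `email_counts` (the table, column names, and data are assumptions) in which two users have sent the same number of emails. Note that 'row_number' breaks ties arbitrarily unless the ORDER BY includes a tiebreaker, so `user_id` is added here to keep the output deterministic.

```python
import sqlite3

# Hypothetical per-user totals; ann and bob are tied on 5 emails.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_counts (user_id TEXT, total_emails INTEGER)")
conn.executemany(
    "INSERT INTO email_counts VALUES (?, ?)",
    [("ann", 5), ("bob", 5), ("cy", 2)],
)

# ROW_NUMBER gives tied users different numbers; RANK gives them the same
# number and then skips ahead.
tie_sql = """
SELECT user_id,
       total_emails,
       ROW_NUMBER() OVER (ORDER BY total_emails DESC, user_id) AS row_num,
       RANK()       OVER (ORDER BY total_emails DESC)          AS rnk
FROM email_counts
ORDER BY row_num
"""
tie_rows = conn.execute(tie_sql).fetchall()
for row in tie_rows:
    print(row)
```

Here ann and bob both get `rnk` 1 and cy jumps to 3, while `row_num` runs 1, 2, 3 with no repeats.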
Q: How does the video suggest improving or modifying the solution?
The video suggests modifying the solution to calculate actual percentiles for each user, rather than just ranking them. This involves adjusting the SQL query to determine the percentile each user falls into based on their total email count.
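The summary does not say how the video defines an "actual percentile", so the sketch below shows only one possible reading: the standard PERCENT_RANK window function, which computes (rank - 1) / (rows - 1), scaled to 0 through 100. The `email_counts` table, its columns, and the data are hypothetical.

```python
import sqlite3

# Hypothetical per-user totals with no ties.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_counts (user_id TEXT, total_emails INTEGER)")
conn.executemany(
    "INSERT INTO email_counts VALUES (?, ?)",
    [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50)],
)

# Ordering ascending means the score reflects the share of other users who
# sent fewer emails, so the most active user lands at 100.
percentile_sql = """
SELECT user_id,
       total_emails,
       100.0 * PERCENT_RANK() OVER (ORDER BY total_emails) AS activity_pct
FROM email_counts
ORDER BY total_emails DESC
"""
percentiles = conn.execute(percentile_sql).fetchall()
for row in percentiles:
    print(row)
```

Other definitions are equally defensible, for example CUME_DIST, or ordering descending so that the most active user scores lowest; which one applies depends on how the interviewer frames the question.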
Q: What is the purpose of the SQL Sundays series?
The SQL Sundays series aims to prepare viewers for data science interviews by providing a walkthrough of SQL interview questions. It simulates real interview scenarios, offering practical examples and explanations to help viewers improve their SQL skills and interview performance.
Q: What does the speaker think about the efficiency of their approach?
The speaker believes that their approach, which avoids using a subquery, might be slightly more efficient. However, they acknowledge that both their method and the alternative using 'ntile' achieve similar results and are valid solutions.
Q: What engagement does the speaker encourage from viewers?
The speaker encourages viewers to share their own approaches and solutions in the comments, especially if they have used different methods to solve the problem. This engagement aims to foster a community of learning and exchange of ideas.
Q: What resources does the speaker offer for further learning?
The speaker offers resources such as a SQL for data science interviews course, the 365 Data Science platform, and StrataScratch for interview preparation. These resources provide additional learning opportunities for viewers interested in data science and SQL.
Summary & Key Takeaways
- The video is part of the SQL Sundays series, focusing on a data science interview question involving email activity percentiles. The task is to rank users by the number of emails sent, using SQL window functions to achieve this.
- The solution involves grouping data by user, counting emails, and applying a 'row_number' window function to rank users. The speaker explains the choice of 'row_number' over 'rank' for unique ranking.
- The video encourages viewers to explore alternative solutions, such as using 'ntile', and invites them to share their approaches in the comments. It aims to prepare viewers for data science interviews through practical examples.
Explore More Summaries from Tina Huang 📚





