Kaggle's 30 Days Of ML (Competition Part-7): What are public and private leaderboard? | Summary and Q&A
TL;DR
The video explains how public and private leaderboards work in Kaggle competitions and offers tips on selecting the best submissions.
Key Insights
- 😒 Kaggle uses public and private leaderboards to evaluate participants' performance in competitions.
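- 💯 The public leaderboard shows scores on a subset of the test data during the competition, while the private leaderboard scores submissions on the remaining, held-out part of the test data and is revealed only when the competition ends.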
- 😵 Overfitting to the public leaderboard can be reduced by paying attention to cross-validation scores and selecting submissions that generalize well.
- 💯 Participants should choose two submissions based on the best cross-validation score and the best public leaderboard score.
- 💯 During the competition, the public leaderboard determines the visible ranking; the final ranking is based on the private leaderboard score of the selected submissions.
- 🖐️ Luck can play a role in leaderboard rankings because public and private scores are computed on randomly sampled subsets of the test data.
- 😫 There are no strict rules for selecting submissions; participants should choose the submissions they believe will perform best on the private data set.
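To make the public/private distinction concrete, here is a minimal sketch that scores one submission twice: once on the rows assumed to be public and once on the remaining private rows. It assumes plain accuracy as the metric (each competition defines its own), and the labels, predictions, and 40% public split are made-up values; `accuracy` and `leaderboard_scores` are hypothetical helpers, not a Kaggle API.

```python
from typing import Sequence, Set, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int], idx: Sequence[int]) -> float:
    """Fraction of the rows in idx where the prediction matches the label."""
    return sum(y_true[i] == y_pred[i] for i in idx) / len(idx)


def leaderboard_scores(y_true: Sequence[int], y_pred: Sequence[int],
                       public_idx: Set[int]) -> Tuple[float, float]:
    """Score one submission on the public rows and on all remaining (private) rows."""
    private_idx = [i for i in range(len(y_true)) if i not in public_idx]
    public = accuracy(y_true, y_pred, sorted(public_idx))
    private = accuracy(y_true, y_pred, private_idx)
    return public, private


# Made-up test labels and predictions for a 10-row test set.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
public_idx = {0, 1, 2, 3}  # hypothetical 40% public split

print(leaderboard_scores(y_true, y_pred, public_idx))  # (1.0, 0.5)
```

The same predictions score 1.0 on the public rows and 0.5 on the private rows, an exaggerated version of the gap that sampling luck can produce on a real leaderboard.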
Questions & Answers
Q: How does Kaggle select the samples for the public leaderboard?
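Kaggle either randomly selects a percentage of the test data or, depending on the competition, uses a more deliberate method to choose the samples for the public leaderboard. The remaining samples are used for the private leaderboard.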
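The random variant of that selection can be sketched as a seeded shuffle of test-row indices, with the first share going to the public leaderboard and the rest to the private one. `split_public_private` and the 30% fraction are hypothetical; hosts choose their own fraction, and a competition may use a non-random split instead.

```python
import random
from typing import List, Tuple


def split_public_private(n_test: int, public_fraction: float,
                         seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign test-row indices to disjoint public and private subsets."""
    if not 0.0 < public_fraction < 1.0:
        raise ValueError("public_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_test))
    rng.shuffle(indices)
    n_public = round(n_test * public_fraction)
    public = sorted(indices[:n_public])
    private = sorted(indices[n_public:])
    return public, private


public, private = split_public_private(n_test=1000, public_fraction=0.3, seed=42)
print(len(public), len(private))  # 300 700
```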
Q: Should submissions be based solely on the public leaderboard score?
No, relying only on the public leaderboard score may lead to overfitting. It is better to also consider the cross-validation score and select submissions that generalize well.
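The overfitting risk shows up in a small simulation, under the assumption that every submission has the same true accuracy (80%) and each test row is independently right or wrong. Picking the submission with the best public score then mostly rewards sampling noise: its public score tends to sit above the true accuracy, while its private score stays close to it. All parameters and the `simulate_submissions` helper are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple


def simulate_submissions(
    n_submissions: int = 50,
    n_test: int = 2000,
    public_fraction: float = 0.2,
    true_acc: float = 0.8,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Return (public_accuracy, private_accuracy) for equally good submissions."""
    rng = random.Random(seed)
    n_public = int(n_test * public_fraction)
    results = []
    for _ in range(n_submissions):
        # Each row is correct with probability true_acc, independently of the others,
        # so the first n_public rows behave like a random public subset.
        correct = [rng.random() < true_acc for _ in range(n_test)]
        public = sum(correct[:n_public]) / n_public
        private = sum(correct[n_public:]) / (n_test - n_public)
        results.append((public, private))
    return results


results = simulate_submissions()
best_public, its_private = max(results, key=lambda r: r[0])
avg_public = mean(r[0] for r in results)
print(f"best public: {best_public:.3f}  its private: {its_private:.3f}  "
      f"average public: {avg_public:.3f}")
```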
Q: What happens if submissions are not selected?
If no submissions are chosen, Kaggle automatically selects the submission with the best public score as the first choice and the one with the second-best score as the second choice.
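The automatic fallback amounts to sorting by public score and keeping the top two. This sketch assumes a higher-is-better metric by default (pass `higher_is_better=False` for error metrics such as RMSE); `auto_select` and the scores are hypothetical, and exact ties simply keep their input order here, which may not match how Kaggle breaks them.

```python
from typing import Dict, List


def auto_select(public_scores: Dict[str, float], higher_is_better: bool = True,
                k: int = 2) -> List[str]:
    """Return the k submission IDs with the best public score."""
    # sorted() is stable, so submissions with equal scores keep their input order.
    ranked = sorted(public_scores, key=lambda s: public_scores[s],
                    reverse=higher_is_better)
    return ranked[:k]


# Made-up public scores for four hypothetical submissions.
public_scores = {"sub_a": 0.781, "sub_b": 0.795, "sub_c": 0.788, "sub_d": 0.770}
print(auto_select(public_scores))  # ['sub_b', 'sub_c']
```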
Q: Are there specific rules for selecting submissions?
There are no strict rules. It is recommended to select submissions based on the best cross-validation score and the best public leaderboard score, but the choice ultimately depends on the participant.
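The recommended strategy, one pick by cross-validation and one by public leaderboard, can be sketched as follows, together with how the final standing is typically computed in Kaggle competitions: from the better private score of the two selected submissions. The helper names, the fallback to the next-best public submission when both picks coincide, and all scores are assumptions for illustration.

```python
from typing import Dict, List


def pick_final_two(cv: Dict[str, float], public: Dict[str, float],
                   higher_is_better: bool = True) -> List[str]:
    """Pick one submission by best CV score and one by best public score.

    Assumes both dicts cover the same submissions and there are at least two.
    """
    sign = 1 if higher_is_better else -1
    best_cv = max(cv, key=lambda s: sign * cv[s])
    by_public = sorted(public, key=lambda s: sign * public[s], reverse=True)
    # If the best-CV submission is also the best public one, fall back to the
    # next-best public submission so that two different submissions are chosen.
    best_public_other = next(s for s in by_public if s != best_cv)
    return [best_cv, best_public_other]


def final_score(selected: List[str], private: Dict[str, float],
                higher_is_better: bool = True) -> float:
    """Final standing uses the better private score of the selected submissions."""
    scores = [private[s] for s in selected]
    return max(scores) if higher_is_better else min(scores)


# Made-up scores for three hypothetical submissions.
cv = {"sub_a": 0.802, "sub_b": 0.790, "sub_c": 0.796}
public = {"sub_a": 0.781, "sub_b": 0.795, "sub_c": 0.788}
private = {"sub_a": 0.799, "sub_b": 0.784, "sub_c": 0.792}

selected = pick_final_two(cv, public)
print(selected, final_score(selected, private))  # ['sub_a', 'sub_b'] 0.799
```

With these made-up numbers the CV pick ends up with the better private score; with other numbers the public pick could win, which is why selecting one of each hedges the bet.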
Summary & Key Takeaways
- The video discusses the purpose of public and private leaderboards in Kaggle competitions and how they relate to cross-validation.
- The public leaderboard shows scores on one subset of the test data, while the private leaderboard reveals scores on a different subset.
- Select two final submissions based on their cross-validation and public leaderboard performance, since the final ranking depends on how they score on the private leaderboard.