C5W3L04 Refining Beam Search: Summary and Q&A
TL;DR
Length normalization is a modification to the beam search algorithm that improves its performance by reducing the penalty for longer translations.
Key Insights
 😁 Logarithmic transformation improves numerical stability in beam search algorithms.
 😁 Length normalization helps overcome the bias towards shorter translations in beam search.
 😁 The choice of beam width (B) affects the tradeoff between accuracy and computational cost in beam search.
 😁 For research purposes, very large beam widths may be used, but there are diminishing returns in terms of performance improvement.
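The beam-width tradeoff in the insights above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `TOY_MODEL` is a hypothetical two-step table of next-word probabilities, and `beam_search` is a simplified version of the algorithm, not the implementation used in the video.

```python
import math

# Hypothetical toy model: next-word probabilities for each prefix.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 0.9, "y": 0.1},
}

def beam_search(beam_width, steps=2):
    """Return the best (sequence, log probability) found with the given beam width."""
    beams = [((), 0.0)]  # each entry: (prefix, sum of log probabilities)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in TOY_MODEL[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the beam_width highest-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

greedy_seq, greedy_score = beam_search(beam_width=1)
wider_seq, wider_score = beam_search(beam_width=2)
```

With a beam width of 1 the search commits to "a" after the first step and cannot recover, while a beam width of 2 also keeps "b" alive and finds the higher-probability sequence, at the cost of scoring twice as many candidates per step.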
Transcript
In the last video, you saw the basic beam search algorithm. In this video, you'll learn some little changes that make it work even better. Length normalization is a small change to the beam search algorithm that can help you get much better results. Here's what it is. We talked about beam search as maximizing this probability, and this product here is just e...
Questions & Answers
Q: What is the problem with multiplying probabilities in beam search?
Multiplying many probabilities, each of which is less than 1, produces a very small number that can cause numerical underflow because floating-point representations have limited precision. To avoid this, log probabilities are used instead to maintain numerical stability.
Q: How does taking logs of probabilities help in beam search?
Taking logs of probabilities converts the product of probabilities into a sum of logarithms, which is more numerically stable and less prone to rounding errors or underflow. Because the logarithm is a strictly increasing function, maximizing the log probability selects the same translation as maximizing the probability itself.
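A small sketch of the underflow problem and the log-sum fix, assuming a hypothetical sentence of 200 words with a probability of 0.01 each (these numbers are illustrative and not from the video):

```python
import math

# Hypothetical per-word probabilities for a long output sequence.
probs = [0.01] * 200

# Multiplying directly: the true value is 1e-400, which is below the
# smallest positive double, so the product underflows to zero.
product = 1.0
for p in probs:
    product *= p

# Summing logs instead keeps the score in a comfortable numeric range.
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_sum)   # about -921.03
```

Once the product has collapsed to zero, all sufficiently long hypotheses become indistinguishable, whereas their log-sum scores can still be compared.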
Q: Why does the original objective function tend to favor shorter translations?
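The original objective function in beam search tends to favor shorter translations because every word's probability is less than 1, so each additional word multiplies the product by a factor below 1 and makes it smaller. In log space, each additional word adds a negative term to the sum. A translation with fewer words therefore tends to score higher, which biases the algorithm towards shorter outputs.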
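The bias can be seen by tracking the running log probability as a hypothesis grows. This is a minimal sketch with made-up word probabilities; the name `running_scores` is purely illustrative:

```python
import math

# Hypothetical probabilities of successive words in one translation.
word_probs = [0.5, 0.4, 0.6, 0.5]

running_scores = []
total = 0.0
for p in word_probs:
    total += math.log(p)  # log(p) < 0 whenever p < 1
    running_scores.append(total)

# Every extra word lowers the unnormalized score.
for length, score in enumerate(running_scores, start=1):
    print(length, round(score, 3))
```

The score falls at every step, so under the unnormalized objective a longer translation has to overcome a growing penalty purely because of its length.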
Q: How does length normalization address the issue of favoring shorter translations?
Length normalization divides the sum of log probabilities by the number of words in the translation, which reduces the penalty for longer translations. The divisor can also be raised to a power alpha: alpha = 1 gives full normalization by length, alpha = 0 gives no normalization, and values in between strike a balance between the two.
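A minimal sketch of the normalized objective, assuming the score is the sum of log probabilities divided by the length raised to the power alpha. The function name `normalized_score` and the two example hypotheses are hypothetical:

```python
import math

def normalized_score(word_probs, alpha=1.0):
    """Sum of log probabilities divided by (number of words) ** alpha."""
    log_sum = sum(math.log(p) for p in word_probs)
    return log_sum / (len(word_probs) ** alpha)

# Two hypothetical candidate translations.
short_hyp = [0.4, 0.4]            # 2 words, lower per-word probability
long_hyp = [0.5, 0.5, 0.5, 0.5]   # 4 words, higher per-word probability

# alpha = 0: no normalization, so the shorter hypothesis scores higher.
print(normalized_score(short_hyp, alpha=0.0), normalized_score(long_hyp, alpha=0.0))

# alpha = 1: full normalization (average log probability per word),
# so the longer hypothesis with better per-word probability scores higher.
print(normalized_score(short_hyp, alpha=1.0), normalized_score(long_hyp, alpha=1.0))
```

In this example the ranking of the two hypotheses flips once normalization is applied, which is the effect length normalization is meant to have.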
Summary & Key Takeaways

In this video, the speaker explains how the basic beam search algorithm can be improved by using length normalization.

Beam search involves maximizing the probability of an output sentence given an input, where that probability is a product of per-word probabilities.

Length normalization addresses the bias towards shorter translations by scoring each translation by its average log probability per word.