Predicting Watch Time like YouTube via Weighted Logistic Regression
In reading the YouTube Paper, I came across one technical section that was quite subtle, and it was not readily apparent to me why it was true: the watch time estimation using weighted logistic regression. It’s easy to gloss over these details, but as it turns out, many people before me were curious about this section as well. The previous links have already explored one view of the why behind this formula, but I would like to formalize the process and explain it end to end.