A Field Guide to Learning with Noisy Labels

Annotation is expensive. And when it’s expensive, it gets noisy. Labels can become unreliable for a number of reasons: non-expert annotators who disagree on ambiguous cases, labels derived from weak signals like search queries or hashtags, or predictions from an earlier, less robust model used to bootstrap a larger dataset. In real-world settings (fraud detection, medical imaging, content moderation), you rarely have the luxury of a perfectly clean training set. You train on what you have, and what you have is messy. ...

December 1, 2025 · 17 min · 3475 words

When Authors Disagree...

Underline not mine. The book is ‘A Course in Game Theory’ by Martin J. Osborne and Ariel Rubinstein. I’m fascinated by the conviction expressed by the author, especially in these lines (brackets added by me for additional context): “If such usage diverts some readers’ attentions from the subjects discussed in this book and leads them to contemplate sexism in the use of language, which is surely an issue at least as significant as the minutiae of sequential equilibrium (a topic within game theory), then an increase in social welfare (an objective within some game theoretic problems) will have been achieved.” – Reader, I could feel a mic drop. ...

January 17, 2022 · 2 min · 392 words