Channel: Data Science, Analytics and Big Data discussions - Topics tagged data_science

New paper on Automatically Detecting Label Errors in Entity Recognition Data


Think your entity recognition data is perfectly labeled? Newly published research investigates automated methods for finding sentences that contain mislabeled words in such datasets. Mislabeling is especially common in ML tasks like token classification, where a label must be chosen for every individual word. It is exhausting to get every single word labeled right!

This paper benchmarks a range of candidate algorithms on real-world data (containing actual label errors, rather than the synthetic errors often considered in academic studies) and identifies a straightforward approach that finds mislabeled words with better precision/recall than the alternatives.

This algorithm is now available to run on your own text data with one line of open-source code. Running it on the well-known CoNLL-2003 entity recognition dataset revealed hundreds of label errors.
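To give a feel for how this kind of detection can work, here is a minimal sketch of one simple scoring scheme. It assumes you already have out-of-sample predicted class probabilities for every token from some trained model. Each token is scored by the probability the model assigns to its given label (its "self-confidence"), and each sentence is scored by its worst token, so sentences with a very low-scoring word surface first. This is an illustrative assumption, not necessarily the exact method the paper recommends nor the open-source library's implementation; the function names and the toy tag set (0 = O, 1 = B-PER, 2 = I-PER) are hypothetical.

```python
from typing import List, Sequence, Tuple


def token_self_confidence(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[float]:
    """Probability the model assigns to each token's given label (low = suspicious)."""
    return [probs[label] for label, probs in zip(labels, pred_probs)]


def rank_sentences(
    labels: Sequence[Sequence[int]],
    pred_probs: Sequence[Sequence[Sequence[float]]],
) -> List[Tuple[int, float, int]]:
    """Rank sentences from most to least suspicious.

    Returns (sentence_index, sentence_score, worst_token_index) tuples, where the
    sentence score is the minimum self-confidence over its tokens ("worst token").
    """
    results = []
    for i, (sent_labels, sent_probs) in enumerate(zip(labels, pred_probs)):
        scores = token_self_confidence(sent_labels, sent_probs)
        # Index of the token whose given label the model believes least.
        worst = min(range(len(scores)), key=scores.__getitem__)
        results.append((i, scores[worst], worst))
    # Lowest sentence score first: these are the best candidates for review.
    results.sort(key=lambda r: r[1])
    return results


if __name__ == "__main__":
    # Hypothetical toy data: "John lives here" and "Mary met Bob",
    # where "Bob" was (wrongly) annotated as O instead of B-PER.
    labels = [[1, 0, 0], [1, 0, 0]]
    pred_probs = [
        [[0.05, 0.90, 0.05], [0.90, 0.05, 0.05], [0.95, 0.03, 0.02]],
        [[0.10, 0.85, 0.05], [0.92, 0.04, 0.04], [0.10, 0.80, 0.10]],
    ]
    for sent_idx, score, tok_idx in rank_sentences(labels, pred_probs):
        print(f"sentence {sent_idx}: score={score:.2f}, worst token={tok_idx}")
```

In this toy run the second sentence ranks first, with "Bob" (token 2) flagged as its worst token. Scoring by the worst token rather than an average keeps a single badly labeled word from being diluted by many correctly labeled ones in a long sentence.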

Blog post: https://cleanlab.ai/blog/entity-recognition/
