
I used to spend hours making my own training data for a simple image classifier

About six months ago, I was building a model to sort pictures of my dog from pictures of my cat, and I manually labeled over 800 photos. It took me a whole weekend. Then I tried a new AI tool called Cleanlab that automatically finds and fixes bad labels in datasets. I ran my photos through it, and it flagged about 50 that were mislabeled or blurry. It cut my prep time down to just a few hours. Has anyone found other tools that help clean up messy data this fast?
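For anyone curious what this kind of check looks like under the hood, here's a rough sketch in plain Python. To be clear, this is not Cleanlab's actual API. It's a simplified version of the per-class threshold idea behind confident learning, and it assumes you already have predicted probabilities from some trained model. The function name `flag_suspect_labels` and the toy numbers are made up for illustration.

```python
from collections import defaultdict


def flag_suspect_labels(labels, pred_probs):
    """Return indices of examples whose given label looks doubtful.

    labels: list of int class ids, as assigned by the human labeler.
    pred_probs: list of per-class probability lists from an already
        trained model (ideally out-of-sample predictions).
    Hypothetical helper, a simplification and not a library function.
    """
    # Per-class threshold: the model's average confidence in class k
    # over the examples that were labeled k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    suspects = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability clears that class's threshold.
        confident = [k for k in thresholds if probs[k] >= thresholds[k]]
        if not confident:
            continue  # model is not confident about anything here
        best = max(confident, key=lambda k: probs[k])
        if best != y:
            # Model is confidently pointing at a different class.
            suspects.append(i)
    return suspects


# Toy data (invented): 0 = cat, 1 = dog.
# Index 3 is labeled "cat" but the model strongly says "dog".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.85, 0.15],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.15, 0.85],
    [0.3, 0.7],
]
suspects = flag_suspect_labels(labels, pred_probs)
print(suspects)
```

The flagged indices are only candidates for a second look. A blurry photo can land in the list too, since the model tends to be unsure about those, which might be why both kinds showed up in my flagged set.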
2 comments

mia592
10d ago
Wow, that's wild. I always just assumed the blurry pics were the real problem, not the labels.
7
jessica_dixon
Right, the "blurry pics" thing is what everyone focuses on. I read an article about how the labels often matter more for what the model actually learns. The model needs a correct label to know what it's even looking at; otherwise it's just guessing at a fuzzy shape. A blurry picture with a correct label can still teach it something, but a super clear picture with a wrong or messy label teaches it the wrong thing. It's like if you kept calling a cat a dog while showing someone pictures: they'd get totally mixed up. So the labels are the real foundation.
8