Scientists have created a computer model that can "solve" CAPTCHA, the widespread text-based system used by websites to check whether a user is human, with very little training data. Much like the human brain, their model can learn and generalize from relatively few examples, especially compared to current deep learning models, the authors say; it is roughly 300 times more data-efficient. CAPTCHAs are designed to be uncrackable by computer algorithms: they cluster many different letter combinations together in a million different styles. While humans can naturally recognize an object even amid layers of overlap or varying styles, computers have difficulty picking out and classifying each letter in the jumble. Previous algorithms for solving CAPTCHA are data-intensive, requiring either training on millions of labeled CAPTCHA images or coded rules on how to crack each type of image. Here, Dileep George and colleagues have built a more efficient model, dubbed the Recursive Cortical Network (RCN), that incorporates insights from neuroscience to "train" the computer to generalize beyond what it is taught. The key to RCN's success, the authors say, is that it is encoded with strong assumptions that it then uses to recognize inputs it never encountered in training. With this, RCN could solve CAPTCHA text, identify handwritten digits, delineate complexly layered objects and recognize text in photos of real-world scenarios. Compared to state-of-the-art deep learning approaches for reading text, RCN has comparable or higher accuracy while using around 5,000 times fewer training images. Critically, the findings of this study suggest the need for more robust spam-thwarting and human-checking techniques that go beyond what is encoded into today's CAPTCHA system.
Science Press Package