
Typos Can Easily Defeat Google’s Anti-Trolling AI: Research


Using artificial intelligence (AI) to curb abuse and trolling on social networks and news websites has so far proved largely ineffective; the abusive comments keep coming. One source of hope was Jigsaw, an Alphabet unit spun off from Google, which launched Perspective to detect abuse. It now appears, however, that even this tool needs significant improvement.

It is easy to deceive Jigsaw

Jigsaw’s Perspective, an API that uses machine learning to detect harassment and abuse online, is still in its development stage. Even so, recent research suggests it is easy to fool. The findings appear in a paper entitled “Deceiving Google’s Perspective API Built for Detecting Toxic Comments,” which is freely available online but has not yet been peer-reviewed.

In the paper, the authors show that an adversary can easily “modify a highly toxic phrase in a way that the system assigns significantly lower toxicity score to it.” Words such as “stupid” and “idiot” are no longer flagged as toxic by Jigsaw’s system when they are misspelled as “st.upid” or “idiiot,” according to the BBC.
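The modification itself is trivial to automate. Below is a minimal, illustrative Python sketch of the kind of character-level tricks the paper describes, inserting a punctuation mark inside a flagged word or doubling one of its letters; the word list and rewrite rules here are hypothetical examples, not the researchers’ exact method.

```python
# Illustrative sketch of the character-level tricks the paper describes:
# inserting a dot inside a word ("st.upid") or doubling a letter ("idiiot").
# The word list and rewrite rules are hypothetical, not the researchers' exact method.

TOXIC_WORDS = {"stupid", "idiot", "moron"}

def dot_split(word: str) -> str:
    """Insert a period after the second character, e.g. 'stupid' -> 'st.upid'."""
    return word[:2] + "." + word[2:] if len(word) > 2 else word

def double_letter(word: str) -> str:
    """Repeat a middle letter, e.g. 'idiot' -> 'idiiot'."""
    mid = len(word) // 2
    return word[:mid] + word[mid] + word[mid:]

def perturb_comment(comment: str) -> str:
    """Rewrite only the words a toxicity model is likely to key on."""
    rewritten = []
    for token in comment.split():
        bare = token.strip(".,!?").lower()
        rewritten.append(dot_split(token) if bare in TOXIC_WORDS else token)
    return " ".join(rewritten)

print(double_letter("idiot"))                                    # idiiot
print(perturb_comment("Anyone who voted for Trump is a moron"))  # ... is a mo.ron
```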

The paper further notes that the comment “Anyone who voted for Trump is a moron” received a toxicity score of 80%, while “Anyone who voted for Trump is a mo.ron” scored just 13%. Similarly, a comment using the word “idiot” received a 90% toxicity score, whereas the same comment with “id.iot” received only 12%.
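Comparisons like these come from scoring the original and perturbed comments against Perspective and reading back the toxicity values. The sketch below shows how such a comparison might look; the comments:analyze endpoint, request shape, response fields, and the PERSPECTIVE_API_KEY placeholder are assumptions to be checked against Perspective’s current documentation, not details taken from the paper.

```python
# Hedged sketch: compare toxicity scores for an original and a perturbed comment.
# The endpoint path, request/response shape, and API key below are assumptions;
# consult the Perspective API documentation before relying on them.
import requests

API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "PERSPECTIVE_API_KEY"  # placeholder; request a real key from Google

def toxicity_score(text: str) -> float:
    """Return the summary TOXICITY score (0.0 to 1.0) for a comment."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(API_URL, params={"key": API_KEY}, json=body)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

for comment in ("Anyone who voted for Trump is a moron",
                "Anyone who voted for Trump is a mo.ron"):
    print(f"{toxicity_score(comment):.0%}  {comment}")
```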

Such research could help improve AI

In a statement sent to Ars Technica, Jigsaw product manager CJ Adams said the company welcomes academic researchers to join its research efforts on GitHub, explore how they can spot flaws in the current models, and find ways to overcome or eliminate them.

“Perspective is still a very early-stage technology, and as these researchers rightly point out, it will only detect patterns that are similar to examples of toxicity it has seen before,” Adams said.

The company, formerly called “Google Ideas,” says on its website that it will release more machine learning models in 2017. Jigsaw said its “first model identifies whether a comment could be perceived as ‘toxic’ to a discussion.” Jigsaw is currently partnering with The New York Times, Wikipedia and several other major sites to deploy the Perspective API to help detect abusive comments and moderate reader-contributed content, Ars Technica notes.

The Perspective API clearly needs more training and refinement, but if its flaws can be addressed, the internet may become a somewhat safer place.
