Researchers have created a new system that can flag spoilers in online reviews of books and TV shows. The system is based on artificial intelligence (AI) technology.
"Spoilers are everywhere on the internet and are very common on social media. As internet users, we understand the pain of spoilers, and how they can ruin one's experience," said Ndapa Nakashole, a professor at the University of California San Diego in the US.
Some movie review websites like IMDb allow users to manually flag their posts with tags that serve as 'spoiler ahead' warning signs. However, this does not always happen. That's why scientists decided to develop an AI tool powered by neural networks to detect spoilers automatically. They named the tool SpoilerNet.
On the research side, the team wants to better understand how people write spoilers and which linguistic patterns and common knowledge mark a sentence as a spoiler. The AI-based tool could be used to develop a browser extension to shield people from spoilers, the researchers said.
To train the AI algorithm, researchers built their own dataset by collecting more than 1.3 million book reviews annotated with spoiler tags by book reviewers from Goodreads, a social networking site that allows people to track what they read, and share thoughts and reviews with other readers.
The tags encompass sentences that include spoilers and hide them behind a "view spoiler" link in the text.
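To illustrate how tagged reviews could yield sentence-level labels, here is a minimal Python sketch. It assumes a hypothetical `<spoiler>...</spoiler>` markup and a naive punctuation-based sentence splitter; the article does not describe the markup or the preprocessing the researchers actually used.

```python
import re

# Assumption: spoilers are wrapped in <spoiler>...</spoiler>; the real
# markup in the researchers' data may differ.
SPOILER_TAG = re.compile(r"<spoiler>(.*?)</spoiler>", re.DOTALL)
# Naive splitter: break on whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into non-empty, stripped sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def label_sentences(review):
    """Return (sentence, is_spoiler) pairs in the order they appear."""
    labeled = []
    pos = 0
    for match in SPOILER_TAG.finditer(review):
        # Text before the tag is not marked as a spoiler.
        for sentence in split_sentences(review[pos:match.start()]):
            labeled.append((sentence, False))
        # Text inside the tag is marked as a spoiler.
        for sentence in split_sentences(match.group(1)):
            labeled.append((sentence, True))
        pos = match.end()
    # Remaining text after the last tag.
    for sentence in split_sentences(review[pos:]):
        labeled.append((sentence, False))
    return labeled


review = (
    "I loved this book. <spoiler>The hero dies. It was sad.</spoiler> "
    "Would recommend!"
)
for sentence, is_spoiler in label_sentences(review):
    print(is_spoiler, sentence)
```

Each sentence inherits its label from the tag around it, which is what makes the annotations fine-grained: a single review can contain both spoiler and non-spoiler sentences.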
"To our knowledge, this is the first dataset with spoiler annotations at this scale and at such a fine-grained granularity," said Mengting Wan, a PhD student in computer science at UC San Diego.
Researchers found that spoiler sentences tend to clump together in the latter part of reviews. However, they also found that different users had different standards for tagging spoilers, and the neural networks needed to be carefully calibrated to take this into account.
Additionally, the same word may have different semantic meanings in different contexts. For example, 'green' is just a colour in one book review, but it can be the name of an important character and a signal for spoilers in another book. Identifying and understanding these differences is challenging, Wan said.
Researchers trained the SpoilerNet system on 80 per cent of the Goodreads reviews. As part of the training, the system runs the text through several layers of neural networks. The system could detect spoilers with 89 to 92 per cent accuracy.
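The 80 per cent training share and the accuracy figure can be illustrated with a short Python sketch. The function names, the random shuffle and the fixed seed are assumptions made for illustration; the article does not say how the researchers split the data or exactly how they computed the score.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical example: ten review IDs, 80 per cent used for training.
train_ids, test_ids = train_test_split(range(10))
print(len(train_ids), len(test_ids))

# Hypothetical predictions for four held-out sentences (True = spoiler).
print(accuracy([True, False, True, True], [True, False, False, True]))
```

Holding back part of the data matters because a score measured on the training reviews themselves would overstate how well the system handles reviews it has never seen.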
Scientists also ran SpoilerNet on a dataset of more than 16,000 single-sentence reviews of more than 100 TV shows. The tool detected TV show spoilers with 74 to 80 per cent accuracy. Most of the errors came from the system getting distracted by words that are usually loaded and revelatory, such as 'murder' or 'killed'.
(With PTI inputs)