Bad news for fake news: rice r
Image: Anshumali Shrivastava is Assistant Professor of Computer Science at Rice University. (Photo by Jeff Fitlow / Rice University)
HOUSTON – (Dec. 10, 2020) – Rice University researchers have discovered a more efficient way for social media companies to prevent the spread of misinformation online using probabilistic filters trained with artificial intelligence.
The new approach to social media scanning is outlined in a study presented today at the online-only 2020 Conference on Neural Information Processing Systems (NeurIPS 2020) by Rice computer scientist Anshumali Shrivastava and statistics graduate student Zhenwei Dai. Their method applies machine learning in smarter ways to improve the performance of Bloom filters, a widely used technique developed half a century ago.
Using test databases of fake news and computer viruses, Shrivastava and Dai showed that their Adaptive Learned Bloom Filter (Ada-BF) required 50% less memory to achieve the same performance as learned Bloom filters.
To explain their filtering approach, Shrivastava and Dai cited data from Twitter. The social media giant recently announced that its users were posting around 500 million tweets a day, and that tweets typically appeared online about one second after users clicked Send.
“Around election time, they were getting about 10,000 tweets per second, and with a one-second latency that’s about six tweets per millisecond,” said Shrivastava. “If you want to apply a filter that reads every tweet and flags the ones containing information that’s known to be fake, your flagging mechanism can’t be slower than six milliseconds or you’ll fall behind and never catch up.”
When flagged tweets are sent for additional manual review, it’s also crucial to have a low false-positive rate. In other words, you need to minimize how many genuine tweets are flagged by mistake.
“If your false-positive rate is only 0.1%, you are mistakenly flagging 10 tweets per second, or more than 800,000 per day, for manual review,” he said. “This is exactly why most traditional, purely AI-based approaches are prohibitively expensive for controlling misinformation.”
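As a quick check of the arithmetic in this quote, the Python sketch below multiplies a load of 10,000 tweets per second by a 0.1% false-positive rate. It assumes, as the quote implicitly does, that nearly every tweet is harmless, so each false positive is one tweet needlessly sent to manual review. The helper name `false_flags` is illustrative only.

```python
from fractions import Fraction

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def false_flags(tweets_per_second: int, false_positive_rate: Fraction) -> tuple[Fraction, Fraction]:
    """Return (wrongly flagged tweets per second, wrongly flagged tweets per day).

    Assumes essentially all incoming tweets are harmless, so every false
    positive is a harmless tweet routed to manual review.
    """
    per_second = tweets_per_second * false_positive_rate
    return per_second, per_second * SECONDS_PER_DAY


# 0.1% expressed exactly as 1/1000 to avoid floating-point rounding.
per_sec, per_day = false_flags(10_000, Fraction(1, 1000))
print(per_sec, per_day)  # 10 864000
```

Using `Fraction` keeps the result exact: 10 wrongly flagged tweets per second works out to 864,000 per day, consistent with the “more than 800,000” in the quote.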
Shrivastava said Twitter doesn’t disclose its methods for filtering tweets, but it is believed to use a Bloom filter, a low-memory technique invented in 1970 for checking whether a particular data element, such as a piece of computer code, is part of a known set of elements, such as a database of known computer viruses. A Bloom filter is guaranteed to find every code that matches the database, but it also flags some items that are not in it. These are false positives.
“Suppose you’ve identified misinformation and want to make sure it doesn’t spread in tweets,” Shrivastava said. “A Bloom filter lets you check tweets very quickly, in a millionth of a second or less. If it says a tweet is clean, that it doesn’t match anything in your misinformation database, that’s 100% guaranteed. So there is no chance of OK’ing a tweet with known misinformation. But the Bloom filter will flag harmless tweets a fraction of the time.”
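To make this guarantee concrete, here is a minimal Python sketch of a textbook Bloom filter. It is not Twitter’s system, whose details are not public. For n items and a target false-positive rate p, it sizes the bit array with the standard formulas m = −n ln p / (ln 2)² bits and k = (m/n) ln 2 hash functions, and derives its k bit positions from a single SHA-256 digest by double hashing. The class name, the sample strings and the 1% target rate are illustrative assumptions.

```python
import hashlib
import math
from collections.abc import Iterator


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n, p = max(1, expected_items), false_positive_rate
        self.m = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # m bits packed into bytes

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: k positions h1 + i*h2 (mod m) from two 64-bit
        # slices of one SHA-256 digest; "| 1" keeps h2 nonzero.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True = "possibly in the set"; False = "definitely not in the set".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Usage: load hypothetical known-fake claims, then probe with harmless tweets.
known_fakes = [f"fake-claim-{i}" for i in range(1_000)]
bf = BloomFilter(expected_items=len(known_fakes), false_positive_rate=0.01)
for claim in known_fakes:
    bf.add(claim)

harmless = [f"harmless-tweet-{i}" for i in range(10_000)]
fp_rate = sum(tweet in bf for tweet in harmless) / len(harmless)
print(f"{bf.m} bits, {bf.k} hashes, observed false-positive rate {fp_rate:.3f}")
```

Adding an item only ever sets bits, so a later lookup of that item always finds them set, which is the no-false-negative guarantee. A harmless tweet is flagged only when all k of its bits happen to have been set by other items.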
Over the past three years, researchers have proposed various schemes for using machine learning to augment Bloom filters and improve their efficiency. Language recognition software can be trained to recognize and approve most tweets, reducing the volume that needs to be processed by the Bloom filter. Machine learning classifiers can reduce the computational effort involved in filtering data, so that organizations can process more information in less time with the same resources.
“When people use machine learning models today, they are wasting a lot of useful information that comes from the machine learning model,” said Dai.
The typical approach is to set a confidence threshold and send everything that falls below it to the Bloom filter. If the threshold is 85%, information that the classifier believes is safe with 80% confidence is checked in exactly the same way as information it is only 10% certain about.
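The single-threshold scheme can be sketched in Python as follows. This illustrates the general learned-Bloom-filter idea under stated assumptions and is not the code of any published system. `toy_score` is a hypothetical keyword rule standing in for a trained classifier, and its output is read as the estimated probability that a tweet belongs to the known-misinformation set. Items scoring at or above the threshold are flagged directly. Every other item, whether it scores 0.80 or 0.10, gets the identical check against a backup Bloom filter that holds only the known items the classifier misses.

```python
import hashlib
import math
from collections.abc import Callable, Iterable


class BloomFilter:
    """Compact Bloom filter sized for n items at false-positive rate p."""

    def __init__(self, n: int, p: float) -> None:
        n = max(1, n)
        self.m = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = [False] * self.m

    def _positions(self, item: str) -> list[int]:
        d = hashlib.sha256(item.encode("utf-8")).digest()
        h1, h2 = int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class LearnedBloomFilter:
    """Classifier first; a backup Bloom filter catches known items it misses."""

    def __init__(self, keys: Iterable[str], score: Callable[[str], float],
                 threshold: float, backup_fpr: float) -> None:
        self.score, self.threshold = score, threshold
        # Only known items the classifier would NOT flag go into the backup filter.
        missed = [k for k in keys if score(k) < threshold]
        self.backup = BloomFilter(len(missed), backup_fpr)
        for k in missed:
            self.backup.add(k)

    def __contains__(self, item: str) -> bool:
        # One cut-off: everything below the threshold gets the same backup check,
        # no matter how close to the threshold its score is.
        return self.score(item) >= self.threshold or item in self.backup


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier (fraction of flagged words)."""
    suspicious = {"miracle", "hoax", "rigged"}
    words = text.lower().split()
    return sum(w in suspicious for w in words) / max(1, len(words))


known_fakes = ["miracle hoax", "rigged hoax", "miracle cure for everything", "moon landing staged"]
lbf = LearnedBloomFilter(known_fakes, toy_score, threshold=0.85, backup_fpr=0.01)
print([fake in lbf for fake in known_fakes])
```

Because every known item either clears the threshold or sits in the backup filter, this design keeps the Bloom filter’s no-false-negative guarantee. The classifier’s score, however, is used only as a yes/no signal.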
“While we cannot rely entirely on the machine learning classifier, it still gives us valuable information that can reduce the amount of Bloom filter resources needed,” said Dai. “We have used these resources probabilistically. We give more resources when the classifier is only 10% confident, versus slightly less when it is 20% confident, and so on. We take the whole spectrum of the classifier and match it with the whole spectrum of resources that can be allocated from the Bloom filter.”
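Dai’s description can be illustrated with a minimal Python sketch modeled on the idea behind Ada-BF. The classifier’s score range is split into groups, and all groups share one bit array but use different numbers of hash functions. Low-score items, which the classifier considers likely harmless, get more hash functions and therefore a lower false-positive rate. The thresholds, hash counts, bit-array size and the `toy_score` classifier are hypothetical values chosen for illustration, not the tuned parameters from the study.

```python
import bisect
import hashlib
from collections.abc import Callable, Iterable, Sequence


class AdaBloomFilter:
    """Sketch of an adaptive learned Bloom filter with one shared bit array.

    Scores are bucketed by `thresholds`; group j uses hashes_per_group[j]
    hash functions. Fewer hashes in high-score groups means the classifier's
    confidence buys memory savings there; 0 hashes means "trust the classifier".
    """

    def __init__(self, keys: Iterable[str], score: Callable[[str], float],
                 thresholds: Sequence[float], hashes_per_group: Sequence[int],
                 num_bits: int) -> None:
        if len(hashes_per_group) != len(thresholds) + 1:
            raise ValueError("need exactly one hash count per score group")
        self.score = score
        self.thresholds = list(thresholds)
        self.hashes_per_group = list(hashes_per_group)
        self.m = num_bits
        self.bits = [False] * num_bits
        for key in keys:
            for pos in self._positions(key):
                self.bits[pos] = True

    def _group(self, item: str) -> int:
        # Index of the score interval the item falls into (0 = lowest scores).
        return bisect.bisect_right(self.thresholds, self.score(item))

    def _positions(self, item: str) -> list[int]:
        # The number of hashes depends on the item's score group.
        k = self.hashes_per_group[self._group(item)]
        d = hashlib.sha256(item.encode("utf-8")).digest()
        h1, h2 = int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(k)]

    def __contains__(self, item: str) -> bool:
        # With k = 0 the list is empty and all([]) is True: flagged outright.
        return all(self.bits[pos] for pos in self._positions(item))


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier (fraction of flagged words)."""
    suspicious = {"miracle", "hoax", "rigged"}
    words = text.lower().split()
    return sum(w in suspicious for w in words) / max(1, len(words))


known_fakes = ["miracle hoax", "rigged hoax", "miracle cure for everything", "moon landing staged"]
adabf = AdaBloomFilter(known_fakes, toy_score,
                       thresholds=[0.3, 0.6, 0.9],      # 4 score groups
                       hashes_per_group=[6, 4, 2, 0],   # more hashes for low scores
                       num_bits=64)
print([fake in adabf for fake in known_fakes])
```

A known item and a later query for that same item land in the same score group and use the same hash functions, so known items are never missed. The top group uses zero hashes, so items the classifier is most confident about are flagged without touching the bit array.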
Shrivastava said that Ada-BF’s reduced storage requirements translate directly into additional capacity for real-time filter systems.
“We need half the space,” he said. “So we can essentially process twice as much information with the same resource.”
The research was supported by the National Science Foundation (1652131, 1838177), the Air Force Office of Scientific Research (FA9550-18-1-0152), and the Office of Naval Research.
Links and resources:
The study entitled “Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web” is available at: https://bit.ly/2JPFses
High-resolution images are available for download at:
Caption: Anshumali Shrivastava is an assistant professor of computer science at Rice University. (Photo by Jeff Fitlow / Rice University)
Caption: Zhenwei Dai is a Ph.D. student in statistics at Rice University. (Courtesy photo by Peng Yang)
This press release is available online at news.rice.edu.
Follow Rice News and Media Relations on Twitter @RiceUNews.
Located on a 300-acre wooded campus in Houston, Rice University is consistently ranked among the nation’s top 20 universities by U.S. News & World Report. Rice has highly respected schools of architecture, business, continuing education, engineering, humanities, music, natural sciences and social sciences and is home to the Baker Institute for Public Policy. With 3,978 undergraduates and 3,192 graduate students, Rice’s student-to-faculty ratio is just under 6-to-1. Its residential college system builds close-knit communities and lifelong friendships, just one reason why Rice is ranked No. 1 for lots of race/class interaction and No. 1 for quality of life by the Princeton Review. Rice is also rated as a best value among private universities by Kiplinger’s Personal Finance.