Fake news generated by artificial intelligence can be convincing enough to deceive experts
If you use social media sites like Facebook and Twitter, you may have come across posts marked with misinformation warnings. So far, most of the misinformation – reported and unreported – has been aimed at the general public. Imagine the possibility of misinformation – information that is false or misleading – in scientific and technical areas such as cybersecurity, public safety, and medicine.
There is growing concern about the spread of misinformation in these critical areas as a result of common biases and practices in the publication of scientific literature, even in peer-reviewed research. As a graduate student and a faculty member doing cybersecurity research, we explored a new avenue for misinformation in the scientific community. We found that it is possible for artificial intelligence systems to generate false information in critical fields like medicine and defense that is convincing enough to fool experts.
General misinformation often aims to damage the reputation of companies or public figures. Misinformation within expert communities can have more frightening consequences, such as delivering false medical advice to doctors and patients. This could endanger lives.
To test this threat, we studied the impact of misinformation spreading in the cybersecurity and medical fields. We used artificial intelligence models known as transformers to generate false cybersecurity news and false medical studies on Covid-19, and presented the cybersecurity misinformation to cybersecurity experts for testing. We found that transformer-generated misinformation could mislead cybersecurity experts.
Much of the technology used to identify and manage misinformation is powered by artificial intelligence. AI lets computer scientists fact-check large amounts of misinformation quickly, because there is too much of it for humans to review without technological help. Ironically, although AI helps people spot misinformation, in recent years it has also been used to generate misinformation.
Transformers like Google's BERT and OpenAI's GPT use natural language processing to understand text and produce translations, summaries, and interpretations. They have been used for tasks such as storytelling and answering questions, pushing machines closer to human-like abilities in generating text.
Transformers have helped Google and other tech companies by improving their search engines and helping the general public tackle common problems like combating writer’s block.
Transformers can also be used for malicious purposes. Cross-platform social networks such as Facebook and Twitter have already faced the challenges of AI-generated fake news.
Our research shows that transformers pose a misinformation threat in medicine and cybersecurity as well. To illustrate how serious this is, we fine-tuned the GPT-2 transformer model on open online sources that discuss cybersecurity vulnerabilities and attack intelligence. A cybersecurity vulnerability is a weakness of a computer system, and a cybersecurity attack is an act that exploits a vulnerability. For example, if the vulnerability is a weak Facebook password, an attack exploiting it would be a hacker figuring out your password and breaking into your account.
We then seeded the model with a sentence or phrase from an actual cyber threat intelligence sample and had it generate the rest of the threat description. We presented the generated description to cyber threat hunters, who sift through large amounts of information about cybersecurity threats. These experts read threat descriptions to identify potential attacks and adjust their systems' defenses.
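The seed-and-generate workflow described here can be illustrated with a deliberately tiny stand-in. The sketch below uses a word-level Markov chain built with the Python standard library rather than GPT-2 itself, so it only demonstrates the mechanics of continuing a seeded prompt; the corpus and prompt are invented for illustration.

```python
import random

# Toy word-level Markov chain: a drastically simplified stand-in for GPT-2,
# illustrating only the "seed a prompt, generate the rest" workflow.
# The training corpus and the prompt below are invented for illustration.
def train(corpus: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    model = {}
    for current, nxt in zip(words, words[1:]):
        model.setdefault(current, []).append(nxt)
    return model

def generate(model: dict, seed: str, length: int = 10, rng=None) -> str:
    """Continue the seed phrase by repeatedly sampling successor words."""
    rng = rng or random.Random(0)
    out = seed.split()
    for _ in range(length):
        successors = model.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = ("the attacker exploits a vulnerability in the server and "
          "the attacker escalates privileges on the server")
model = train(corpus)
print(generate(model, "the attacker"))
```

A real transformer samples from a learned probability distribution over its entire vocabulary instead of a lookup table, which is what lets it produce fluent, expert-sounding continuations rather than corpus fragments.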
We were surprised by the results. The cybersecurity misinformation examples we generated were able to fool cyber threat hunters who are familiar with all kinds of cybersecurity attacks and vulnerabilities. Consider the following piece of cyber threat intelligence, which involves the aviation industry and which we generated in our study.
An example of AI-generated cybersecurity misinformation. Credit: The Conversation, CC BY-ND
This misleading information contains false details about cyber attacks on airlines involving sensitive real-time flight data. The false information could keep cyber analysts from addressing legitimate vulnerabilities in their systems by shifting their attention to bogus software bugs. If a cyber analyst acted on the fake information in a real-world scenario, the airline in question could have faced a serious attack exploiting a real, unaddressed vulnerability.
A similar transformer-based model can generate false information in the medical field and potentially mislead medical experts. During the Covid-19 pandemic, preprints of research that has not yet undergone rigorous review are constantly uploaded to sites such as medRxiv.
They are not only described in the press but also used to make public health decisions. Consider the following example, which is not real but was generated by our model after minimal fine-tuning of the standard GPT-2 on some Covid-19-related papers.
An example of AI-generated misinformation in healthcare. Credit: The Conversation, CC BY-ND
The model was able to generate complete sentences and form a summary that purportedly describes the side effects of Covid-19 vaccinations and the experiments that were carried out. This is worrying both for medical researchers, who rely on accurate information to make informed decisions, and for the general public, who often rely on public news for vital health information. If accepted as accurate, this kind of misinformation could endanger lives by misdirecting the efforts of scientists conducting biomedical research.
Arms race with misinformation
While examples like those in our study can be fact-checked, transformer-generated misinformation hinders industries such as healthcare and cybersecurity from adopting AI to help with information overload. For example, automated systems are being developed to extract data from cyber threat intelligence, which is then used to inform and train automated systems to recognize possible attacks. If these automated systems process such bogus cybersecurity text, they become less effective at detecting real threats.
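The extraction pipelines described here can be sketched in miniature. The snippet below pulls CVE-style identifiers out of free-text threat reports with a regular expression; the report text and CVE numbers are invented for illustration, and a real pipeline would be far more elaborate. The point it shows is that a fabricated report injects a bogus indicator that the downstream pipeline cannot distinguish from a real one.

```python
import re

# Minimal sketch of an automated threat-intelligence extractor: it pulls
# CVE-style identifiers out of free-text reports. The report text and the
# CVE numbers below are invented for illustration.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}")

def extract_indicators(reports):
    """Collect the set of CVE identifiers mentioned across all reports."""
    found = set()
    for text in reports:
        found.update(CVE_PATTERN.findall(text))
    return found

genuine = "Patch CVE-2021-0001 on the flight-data gateway immediately."
fabricated = "Critical flaw CVE-2021-9999 affects all airline booking systems."

# The fabricated report contributes its bogus indicator to the same set as
# the genuine one; nothing in the pipeline tells them apart.
print(extract_indicators([genuine, fabricated]))
```

Any system trained or configured from this output inherits the fake indicator, which is how poisoned text degrades automated detection downstream.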
We believe the result could be an arms race, as people who spread misinformation develop better ways to create false information in response to effective methods of detecting it.
Cybersecurity researchers continuously investigate ways to detect misinformation across domains. Understanding how misinformation is generated automatically helps in understanding how to spot it. For example, automatically generated text often contains subtle grammatical errors that detection systems can pick up. Systems can also cross-correlate information from multiple sources and flag claims that lack substantial support from other sources.
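The cross-source check mentioned above can be sketched as a toy corroboration filter: a claim backed by fewer than a minimum number of sources is flagged as unverified. Here claims are exact strings, whereas a real system would have to match paraphrases; the function name and sample claims are hypothetical.

```python
from collections import Counter

# Toy version of cross-source corroboration: a claim supported by only one
# source is flagged as unverified. Claims are plain strings here; a real
# system would match paraphrases, not exact text. All names are invented.
def unsupported_claims(sources, min_support=2):
    """Return claims that fewer than `min_support` sources contain."""
    counts = Counter()
    for claims in sources:
        for claim in set(claims):  # count each source at most once per claim
            counts[claim] += 1
    return {claim for claim, n in counts.items() if n < min_support}

source_a = ["airline systems patched", "new malware strain observed"]
source_b = ["airline systems patched"]

# Only one source reports the malware claim, so it gets flagged.
print(unsupported_claims([source_a, source_b]))
```

Raising `min_support` makes the filter stricter, trading more false flags on genuinely new information for better resistance to single-source fabrications.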