Updating internet governance is as urgent as regulating social media
Facebook had a week from hell last week, even without whistleblower Frances Haugen’s testimony to Congress. Two separate technical failures disrupted its entire suite of services – Facebook, Messenger, WhatsApp, Oculus and Instagram. The incidents underscore the fragility that comes with the massive consolidation of resources on the global information and communication network, caused by the emergence of supernodes like Facebook and its big tech rivals. They also show the imbalance in public debate between social media platforms on the one hand and the basic protocols of the internet on the other. (For the sake of transparency, Facebook is a customer of my company, Oxford Information Labs, but the information in this article comes only from public sources.)
The second Facebook outage, on Friday, October 8th, was bad: a configuration error cut some users off from some services. But the previous outage, on Monday, October 4th, was horrific, a catastrophic spiral of events that stemmed from a single configuration error and left some of the world’s most popular digital services offline for six hours.
Initially, there was speculation that Monday’s events were the result of a hack that cleverly coincided with Haugen’s damning evidence before Congress. It was not. Instead, it was caused by a combination of a routine update that went wrong and the perverse effects of security measures designed to prevent unauthorized updates and optimize user experiences and load times.
A software command designed to assess the available capacity of Facebook’s global network contained a bug that accidentally disconnected Facebook’s global data centers. The audit system, which was supposed to catch mistakes in commands affecting the network, itself had a bug and failed to stop the faulty command.
In addition to the failure of all of Facebook’s customer-facing systems, there were reports that much of Facebook’s internal work was also affected. A New York Times reporter claimed that some employees could not even get into the building. A since-deleted post from someone claiming to be part of the recovery team seemed to confirm this.
The impact of the design flaws was compounded by the interplay of two protocols that underpin the global Internet – the Domain Name System (DNS) and the Border Gateway Protocol (BGP).
The domain name system is a bit like the internet address book, providing people with memorable names for resources on the network – like facebook.com – which can then be translated into the numbers machines use to identify resources, known as IP addresses. The border gateway protocol performs several functions related to the routing of messages within the distributed network of networks that is the Internet. These individual networks or autonomous systems have a dual role. They are both the endpoints at which communication begins or ends and nodes in the network that can forward data packets between endpoints. BGP acts in part like a post office system, announcing the presence of the individual networks or autonomous systems so that messages can reach their final destination. BGP also acts as a kind of map by publishing routing tables that allow other servers to determine the most efficient route for traffic between the source and destination.
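The division of labor between the two protocols can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a real resolver or router: the name, addresses and autonomous system numbers are hypothetical values taken from ranges reserved for documentation, and the functions `resolve`, `route_for` and `reach` are invented for illustration.

```python
import ipaddress

# Hypothetical DNS data: memorable name -> IP address (documentation range).
DNS_RECORDS = {"example.com": "192.0.2.10"}

# Hypothetical BGP-style table: announced prefix -> autonomous system number.
ROUTES = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}


def resolve(name):
    """DNS step: translate a name into an IP address, or None if unknown."""
    return DNS_RECORDS.get(name)


def route_for(address, routes):
    """BGP step: pick the most specific announced prefix covering the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in routes if ip in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


def reach(name, routes=ROUTES):
    """Combine both steps: look the name up, then check a route exists."""
    address = resolve(name)
    if address is None:
        return "name not found"
    asn = route_for(address, routes)
    if asn is None:
        return "no route"
    return f"{name} -> {address} via AS{asn}"


print(reach("example.com"))
```

Both steps have to succeed: a name that resolves to an address is still unreachable if no network is announcing a route to that address.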
The Facebook outage shows the systemic dependence of the Internet on protocols that allowed the network to grow exponentially, but which hardly anyone understands.
In cybersecurity circles, the saying goes that if something goes wrong, it is always DNS. And so it proved, as Santosh Janardhan, the Facebook vice president in charge of infrastructure, explained in a blog post. A feature designed to optimize the user experience and speed up loading times led Facebook’s DNS servers to disable their BGP advertisements. Since these DNS servers controlled access to all of Facebook’s offerings as well as to many internal systems, Facebook simply disappeared from the Internet.
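The failure mode described here can be sketched in a few lines of Python. This is a simplified illustration, not Facebook's actual implementation: the `DnsServer` class, its `health_check` policy and the prefixes are hypothetical, and the only behavior modeled is that a DNS server stops advertising itself when it cannot reach the data centers behind it.

```python
class DnsServer:
    """Toy DNS server that advertises its own address prefix to the network."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.advertising = True

    def health_check(self, backbone_up):
        # Assumed policy: keep advertising only while the data centers
        # behind this server are reachable over the internal backbone.
        self.advertising = backbone_up


def announced_prefixes(servers):
    """Prefixes the rest of the internet can currently see."""
    return {s.prefix for s in servers if s.advertising}


def can_resolve(servers):
    """Names can be looked up only if at least one DNS server is reachable."""
    return bool(announced_prefixes(servers))


# Hypothetical prefixes from ranges reserved for documentation.
servers = [DnsServer("192.0.2.0/24"), DnsServer("198.51.100.0/24")]
print(can_resolve(servers))

# A backbone failure makes every server fail its health check at once.
for server in servers:
    server.health_check(backbone_up=False)
print(can_resolve(servers))
```

A rule that sensibly removes one unhealthy server from service becomes a total blackout when every server fails the same check at the same time.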
Not only do the failures show that even the best-resourced tech companies can be prone to human error, but they also show how quickly internet markets have concentrated into the hands of some powerful players, including Facebook.
The direct impact was evident to the billions of Facebook users, including a growing number of small businesses that use the company’s platforms as their primary online presence. Facebook’s share price initially fell, wiping an estimated $50 billion off the company’s market value.
Less obvious – and more worrying at the systemic level – were the indirect effects and where they were felt. According to Cloudflare, the outage had its greatest impact in developing countries and regions, with Turkey, Grenada, Congo and Lesotho topping the list. For users of Facebook’s “Free Basics” – a kind of Internet-Lite that is made available in some developing countries via a Facebook portal – the entire Internet would have gone dark for the duration of the failure. This supports the view that developing countries are consumers rather than creators of technology platforms. Cloudflare also reports a “massive increase” in server failure responses during the outage, which even slowed the loading of websites that embed Facebook scripts on their pages to give their users access to “Like” buttons or comments from the platform.
Last week’s incidents also highlight the global network’s systemic dependence on protocols dating back to the 1980s, such as DNS and BGP. These simple, lightweight, interoperable protocols allowed the network to grow exponentially. But hardly anyone understands them. As a result, many policymakers currently drawing up plans to regulate big tech and social media platforms seem to mistake Facebook, Google and a small handful of applications for the internet itself, resulting in poor regulatory decisions in some cases.
Another aspect of this knowledge gap relates to how the architecture of the Internet is managed and developed. For example, the public engagement and academic grants related to ICANN – which coordinates the system of unique identifiers like domain names on the internet – are meager compared to the engagement related to social media platforms or the regulation of big tech. A director of a leading research institute told me last week that none of their researchers are currently working on internet governance.
The lack of engagement and public scrutiny has both positive and negative effects. The upside is that the relative obscurity of traditional internet governance sometimes creates a collegial environment where technicians can work together across political divides to solve technical and political problems. The downside is that low engagement can make it easier to “capture” political debates, leading to unbalanced policy outcomes. And without the urgency that broader engagement would bring, basic protocol security issues remain unsolved or undervalued, and the transition to newer technologies is taking much longer than it should.
Facebook’s week from hell will give its system administrators a lot to think about and will undoubtedly lead to operational security changes. It also provided a textbook example of the potential dangers of supernodes like Facebook on the global internet: if they fail, the network as a whole will feel the effects. Most importantly, no matter how powerful a big tech company is, if it uses the Internet at all, it relies on a number of protocols and standards that hardly anyone understands and that rarely make headlines. The governance of the Internet’s fundamental protocols requires urgent attention if the global, open and interoperable Internet is to be saved.
Emily Taylor is the CEO of Oxford Information Labs and an Associate Fellow of the International Security Program at Chatham House. She is also editor of the Journal of Cyber Policy, research fellow at the Oxford Internet Institute and lecturer at the Dirpolis Institute of the Sant’Anna School of Advanced Studies in Pisa. She has written for The Guardian, Wired, Ars Technica, The New Statesman, and Slate. Her weekly WPR column appears every Tuesday. Follow her on Twitter at @etaylaw.