Internet Survivability and Resilience
Introduction
In the world of commerce in developed countries, the Internet is now an essential feature. In the public sector, the Internet is starting to pay off in terms of procurement efficiencies and outreach to the citizen. For the consumer too, the Internet has become as much a fact of life in the home as the telephone or the refrigerator. However, recent events, not least the advent of mass terrorist attacks on Western infrastructures, and also more obscure but arguably equally influential developments in the world of telecommunications and computing, have called into question the survivability and resiliency of this system.
The Internet is undeniably complex, and responsibility for its survivability rests in a number of different areas - a classic 'not my problem' situation, particularly as there is no overarching governing body (at once called the worst and best facet of the Internet) tasked with Internet governance. As an entity it is a complex interaction between a number of different elements: technology, hardware, software, processes and humans. Its architectural resiliency is really a story of the resiliency of the telecommunications infrastructure. However, difficulties arise in assessing its survivability because upon this robust but aging infrastructure there now exists an extremely complex multi-actor system in a state of constant flux. This socio-technical system is now regarded as a critical service for business applications, government, society and even defence. Protecting it demands a new outlook on complexity and a deep awareness of how the Internet functions normally, let alone under attack. Addressing the resiliency of the underlying architecture is one thing. The real issue lies at a much more abstract level, where governments, business and the citizen interact in much more complex ways.
Background
Paul Baran's seminal 1964 paper, written in the early days of computing at RAND, paved the way for a distributed communications network that could be dynamically re-configured and would find the quickest route for message transmission. Such a network would be built upon the existing telecommunications infrastructure, or Public Switched Telephone Network (PSTN). The PSTN is a unique example of a man-made, highly resilient distributed system on a vast scale. In one study, the US PSTN was estimated to average an availability rate in excess of 99.999 per cent.2 Due to its distributed nature, the PSTN is organized very loosely, and generations of designers have played to the strengths of distributed systems by building upon this trait. Failures, although common at a local level, are bypassed easily by switches containing millions upon millions of lines of code, to achieve the simple task of getting a message from point A to point B.3 However, the PSTN must keep up-to-date, consistent distributed databases of information regarding the state of the network, to ensure effective routing.
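To put that availability figure in perspective, the following back-of-the-envelope calculation (a sketch added here for illustration, not taken from the study cited above) shows how little downtime per year a 99.999 per cent availability rate actually permits - roughly five minutes.

    # Rough illustration of what "five nines" availability implies:
    # the maximum downtime permitted per year at a given availability rate.
    MINUTES_PER_YEAR = 365.25 * 24 * 60

    for availability in (0.999, 0.9999, 0.99999):
        downtime = MINUTES_PER_YEAR * (1 - availability)
        print(f"{availability:.3%} available -> about {downtime:.1f} minutes of downtime per year")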
The Internet, co-opted by the US Advanced Research Projects Agency (ARPA) for national communication in times of nuclear war, and built upon the pre-existing and highly robust PSTN, is essentially a network of networks, and bears out the principle of distribution and resiliency in systems through loose coupling and flat hierarchies. A combination of logical and physical network design and its packet-switched nature (each message is separated into packets, each with its own idea of how to get to its destination) means high redundancy and 'reachability'.
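As a purely illustrative sketch of the packet-switched principle just described (the functions and message below are hypothetical and belong to no real protocol), a message can be broken into numbered packets which may travel by different routes and arrive out of order, yet still be reassembled correctly at the destination.

    # Minimal sketch of packet switching: a message is split into numbered
    # packets, each of which may travel by a different route and arrive out
    # of order; the receiver reassembles them by sequence number.
    import random

    def to_packets(message: str, size: int = 8):
        return [(seq, message[i:i + size])
                for seq, i in enumerate(range(0, len(message), size))]

    def reassemble(packets):
        return "".join(payload for _, payload in sorted(packets))

    original = "Getting a message from point A to point B"
    packets = to_packets(original)
    random.shuffle(packets)          # simulate packets taking different routes
    assert reassemble(packets) == original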
At the logical level below the average human user's experience, the operation of this network (known as the Information Infrastructure) must also rely upon consistent distributed databases of up-to-date and readily available routing information. Interaction at this level is a mix of technology, competitiveness, complex commercial agreements and dense, Byzantine regulation. Internet Service Providers (ISPs), companies operating data and telecommunication backbones and long-haul networks, and the telecommunication behemoths must exchange routing and 'reachability' information with each other on a constant, real-time basis. This is done via complex peering and routing protocols that allow network problems, whether physical (such as a non-functioning switch) or logical (such as traffic latency), to be avoided in less than the blink of an eye. Routers and switches talk to each other and to hardware from other companies at large telecom hotels and co-location facilities.
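The sketch below is a heavily simplified, hypothetical model of that exchange of reachability information (real peering protocols such as BGP are vastly more involved, and the function and network numbers here are invented for illustration): each network advertises the destinations it can reach and the path used, and a neighbour keeps the shortest loop-free path it hears, which is how failed links and congested paths come to be routed around.

    # Highly simplified path-vector exchange between two networks ("autonomous
    # systems"): each advertises the destinations it can reach together with
    # the path taken, and the receiver keeps the shortest loop-free path.
    def receive_advertisement(my_as, routing_table, neighbour_as, advertised):
        for destination, path in advertised.items():
            if my_as in path:                    # ignore routes that loop back through us
                continue
            candidate = [neighbour_as] + path
            current = routing_table.get(destination)
            if current is None or len(candidate) < len(current):
                routing_table[destination] = candidate

    # Hypothetical example: network 100 learns routes from its peer, network 200.
    table_as100 = {"192.0.2.0/24": []}           # a prefix network 100 originates itself
    advert_from_as200 = {"198.51.100.0/24": [], "203.0.113.0/24": [300]}
    receive_advertisement(100, table_as100, 200, advert_from_as200)
    print(table_as100)
    # {'192.0.2.0/24': [], '198.51.100.0/24': [200], '203.0.113.0/24': [200, 300]}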
This process is so complex that researchers have tried and failed, many times, to measure or record its normal functioning. It is only now possible to measure isolated areas that give an indication, but not a definite or absolute measurement, of how well the system is doing at any given time. Various means of measuring 'reachability', for example, have been designed as an ideal 'measurement' of the Internet, a way to indicate its health. But these remain mere snapshots or simplified indications due to the fluid nature of the network and its arcane complexity. Getting an idea of what would happen in the event of a large-scale failure, such as might be perpetrated by terrorists, competent criminals or nation states, is thus almost impossible.
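A crude example of the kind of partial 'snapshot' that is possible is simply to probe a sample of hosts and record the fraction that answer; the sketch below (the host list and port are placeholders) gives exactly that sort of indication rather than any absolute measurement of the Internet's health.

    # Crude reachability snapshot: attempt a TCP connection to a sample of
    # hosts and report the fraction that responded.  A partial indication only.
    import socket

    SAMPLE_HOSTS = ["www.example.com", "www.example.org", "www.example.net"]  # placeholders

    def is_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    results = {host: is_reachable(host) for host in SAMPLE_HOSTS}
    reachable = sum(results.values())
    print(f"{reachable}/{len(results)} sample hosts reachable", results)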
One step further up the chain, where users begin to interact with the system in more abstract ways, other routing systems take over. One such example is the Domain Name System (DNS). For those that consume Internet services (whether they be commercial, government or citizen), this is of critical importance. The DNS is the 'thing' that makes the Internet work for humans. It translates machine-readable Internet Protocol (IP) addresses, held in 32-bit form, into slightly more 'human friendly' domain names - such as www.bbc.co.uk. Master data about which IP address maps to which domain name is kept in the DNS system and is located on thirteen servers situated across the globe. There have been numerous debates about their security and these continue in the halls of the Internet Corporation for Assigned Names and Numbers (ICANN), where this has been a favourite topic of conversation of late.4 But few realize that if the DNS failed or was attacked, the Internet would still continue to function. It would simply be that for those who use it - most of the citizenry, government and business in the developed world - it would be denied. From the perspective of the end user, this is still the same thing, but this little-quoted fact serves to bring a dose of reality to those predicting the 'downfall of the Internet' due to DNS vulnerabilities. Indeed, for such an attack to be successful, it must affect every one of these servers simultaneously - no mean feat considering their geographic and logical dispersion.
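The translation the DNS performs can be seen from the end-user side with a couple of standard library calls; in the sketch below the resolved address is only an example and will vary, but it shows a human-friendly name being turned into a dotted-quad IPv4 address, which is itself just a 32-bit number.

    # The DNS translation seen from the end-user side: a human-friendly name
    # is resolved to a dotted-quad IPv4 address, which is just a 32-bit number.
    import socket
    import struct

    name = "www.bbc.co.uk"
    ip_text = socket.gethostbyname(name)             # the address returned will vary
    ip_32bit = struct.unpack("!I", socket.inet_aton(ip_text))[0]

    print(f"{name} -> {ip_text} -> {ip_32bit:#010x} (32-bit form)")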
From the above two simplified examples of the logical network architecture and of the DNS, we can see that the Internet is a complex distributed socio-technical system, constantly evolving, and almost impossible to measure. At this level it is fair to say that following the principles of distributed network design, it is inherently resilient and survivable. In areas such as e-commerce its resiliency determines whether our business fails or succeeds, and in other domains, such as remote telemedicine, it determines whether we live or die.
The Internet, although similar in design to the PSTN, has markedly different resiliency and survivability requirements. There are difficulties in understanding a system (the basic architecture of which is little understood in the first place) that is greater than the sum of many different parts, each owned by different actors with different motives. Each of these players has different responsibilities, but their relationships and dependencies create complexities that are difficult to assess from the standpoint of resiliency or survivability.
The Private Sector
Commerce, specifically former state-owned telecommunication companies like Worldcom and AT&T, owns the infrastructure. Market interactions between players in a number of other areas mean that, on top of the architectural complexity already outlined, business drivers and dependencies, motives, profit margins and market shares are added; all interacting in complex ways which affect how the Internet functions.
Software manufacturers design products that fulfil some need, but they frequently do not have security or resiliency at the top of their agenda. They prefer to build technology that has the most functionality for the user base. Hardware manufacturers are similarly motivated. Small companies in niche markets have their part to play in the puzzle. Smartcard manufacturers and cryptography designers are two such examples at the end of long and complex value chains. Trying to map these interactions and dynamics, let alone placing them into the context of the architecture already outlined, seems like a Sisyphean task.
Governments
Governments have to regulate a number of different areas, and look after the interests of business, national security, justice and democratic freedom. Civil rights concerns over the US federal government's desire to implement processes and systems to assist in the fight against terrorism since 2001 have focused on the erosion of civil liberties in the name of national security. In the digital age, where identity is defined by information held almost exclusively in digital form, the government must walk a difficult line between doing too much and not enough if it is to address these issues effectively. There have been moves to address the poor state of legislation designed to deal with computer crime, but the speed at which legislation is revised is, in the eyes of many, not nearly fast enough.
Users and society
For the average man on the street, the Internet is in fact something that exists on copper-wire infrastructure first put in place at the start of the last century. Furthermore, common perceptions of the Internet are not the whole story - to the average user the Internet is simply the World Wide Web, which only came into commercial existence in the mid-1990s, and e-mail.
Understanding what the Internet means to the average user is important, because in the context of national resiliency we need to be more clearly aware of the sorts of events that have the potential to affect the majority of its users. This informs activities ranging from the provision of end-user early warning and intelligence about threats, to self-protection measures and guidance, to ensuring better quality in software and hardware and legislation for deterrence and punishment.
However, the individual user also has a responsibility for security. Many of the Denial of Service (DoS) attacks favoured in the last two years take advantage of poorly secured home computers. Home users have a responsibility to keep anti-virus products up-to-date and to be aware of threats to their own systems, because they are part of a much more complex network. They arguably also have a responsibility to participate in a dialogue with the private and public sector, informing them of incidents and adding to the early warning process.
Conclusions
This is not to say that the problem has been ignored. Governments are devoting time and effort to their Critical Infrastructure Protection (CIP) programmes, such as at the National Infrastructure Security Co-ordination Centre (NISCC) in the United Kingdom and the Critical Infrastructure Assurance Office (CIAO) and Department of Homeland Security in the United States. They have begun to become aware of the importance of the Internet in running critical services, such as banking, regional government and vital human services. National strategies towards securing the 'Information Infrastructure' are emerging and some regional efforts are appearing, most notably with the recently announced European Network and Information Security Agency (ENISA).5 Businesses, too, are realizing that in these times they cannot simply stand idly by until their bottom line is affected. More realistically, they are beginning to catch on that in some cases security and resiliency may be their bottom line. In the case of Microsoft, its Trustworthy Computing strategy is no surprise, given increasing concern (especially from the US federal government, which is a major customer) that its software is insecure.
Consideration of the survivability and resiliency of the Internet is no mean feat. To begin with, the basic architecture of the Information Infrastructure is not well understood. It is senseless to try to draw even the most limited of conclusions about the resiliency of the Internet, given the lack of knowledge about its normal operation. On top of this complex, distributed system, a number of different actors interact in ways that affect not only their own markets and constituencies, but also the operation of the underlying infrastructure. Traffic moves around the Internet as the result of a complex relationship between regulation, competition and technology. To understand and defend it requires an understanding that the Internet is an entity beyond the sum of its parts. Indeed, although its basic architecture is robust, if one of the additional elements that makes it so complex (such as the dependencies between commercial value chains or government regulation) is mismanaged or breaks down, the consequences for the underlying architecture are unknown.
Neil Robinson is the Research Co-ordinator for the Information Assurance Advisory Council and an Associate Analyst with RAND Europe.
NOTES
1 Survivability can be defined as how well a system can operate under attack. One of the qualities of survivability is security and another is dependability. Resiliency is related to failure recovery and fault tolerance.
2 Kuhn, Richard D., Sources of Failure in the Public Telephone Network, IEEE Computer, Vol. 30, No. 4 (April 1997). Available at: http://hissa.nist.gov/kuhn/pstn.html (visited 10/02/2003).
3 It is a well-known fact that a large proportion of the code in even the most simplistic of switches is responsible for error checking and fault correction.
4 Information on the ICANN DNS Root Server System Advisory Committee is available at: http://www.icann.org/committees/dns-root/ (visited 10/02/2003).
5 RAPID Press Release, European Commission proposes creation of Network Security Agency to boost Cyber Security in Europe, 10/03/2003, Brussels. Available at: http://europa.eu.int/rapid/start/cgi/guesten.ksh?p_action.gettxt=gt&doc=IP/03/208|0|RAPID&lg=EN&display= (visited 10/02/2003).