# Routing

Elliot Schwartz <elliot@mit.edu>

## The Problem

How do we get packets from A to B?

## Assumptions

• Hosts
• Each host has an interface on a subnet.
• Subnets
• Each host on a subnet can communicate with any other host on the subnet.
• Routers
• Hosts can have multiple interfaces on different subnets.
• The machine can relay packets from hosts on one of its subnets to hosts on the other subnet.
So, we can view the network as an arbitrary topology of subnets connected by routers.

## High Level Solution

• The first clue is given by the problem statement itself: "How to we get packets from A to B?" First we need a way to identify or name A and B. Since computers like numbers, we'll assign them each a number. For now we'll assume the number has no special significance i.e. the number can be assigned randomly, but doesn't have to be.
• Prefixes

• We also need some way to identify a subsection of the network. By a subsection we could mean just a subnet, or even a group of subnets. To do this, we'll assign hosts that are "close" addresses with the same "prefix". We'll also assign these subsections netmasks or prefix lengths, to indicate how many of the bits all the hosts in the subsection have in common.
• Routing Tables

• Once we've got all the hosts addresses, there's a simple and obvious way to think about how routing works: We assume that each node has a list of all of the addresses and where to send a packet for that address. This list is called a routing table, and each entry in the routing table has an address and a destination for it. The destination is lower-level address of either a host that is on a subnet the router is connected to (if it's the final destination), or the next router that the packet needs to be forwarded to.
• Router Behavior

• Routers have a simple job. First, the receive a packet on one of their incoming interfaces. They examine the destination address of the packet, and look this up in their routing table. Then they forward it where it should go. This gets repeated in every router on the path until the packet reaches its final destination.
Note: This approach is called Next Hop Routing, since each router only knows and uses the next hop that the packet is going to take, and does a lookup based on the destination address. An alternative approach might be to carry a multi-hop route in the packet, which could be placed by either the source node or an intermediate router.

Now that you have an idea of the basic solution, we'll explain how this is actually done under IPv4.

• What Gets Addressed?

• Each destination on the Internet gets an address. Most people think of destinations as hosts, but in fact hosts can be attached to the network in multiple places, and each interface of a host is a destination. So, we assign an address to every interface on the Internet.

• IP addresses are 32 bits (4 bytes) long. This allows for 2^32 or 4294967296 (4.3 billion) different combinations. Internet addresses are most commonly seen in something referred to as "dotted quad notation". In this format, the 32-bit number is written as 4 8-bit decimal numbers, separated by periods. An example of this is "18.70.0.58". Internally, most computers will represent it as a 32-bit binary number - this is accomplished by converting each of the parts of the quad to binary and concatenating them. Very rarely one will also see the entire address written out as one decimal number, perhaps has part of a debugging print out.
• Expressing Prefixes

• Prefixes (sections of the network that are topologically close) are expressed by giving an address, and the number of bits of the address that all hosts in the section have in common. For example, all hosts on MITnet have addresses starting with 18 (the addresses are of the form 18.*.*.*.), so a prefix expressing this would be 18.0.0.0/8 (to imply the 8 left bits are in common), or 18.0.0.0 255.0.0.0 (where the length is expressed as a left-aligned bit-mask), or simply 18/8. All the hosts in Building E52 have addresses starting with 18.42., so 18.42/16 or 18.42.0.0 255.255.0.0 are ways the prefix might be described.

## Example Network

We'll use this example network for illustrating points. On the left side, we've got hosts a, b, and Z on a subnet numbered 10.0.1.0/24. This means that all the hosts on this subnet have addresses start with 10.0.1. A network like this might be an Ethernet. This network is connected to another subnet, 172.20/16, by the router Z. Similarly, on the right hand side, there's another subnet numbered 192.168.1/24, which has one host d and a router Y that connects it to 172.20/16. On 172.20/16 is a host c and a router X that routes to a point-to-point link with host e, numbered 192.168.2.16/30.

## Filling the Table

The next question is how do the hosts get routes in their routing tables. All hosts (including routers) get them from the following methods:

• Every host has a loopback address, which is used for testing. Packets with the loopback address as a destination address will never leave the machine i.e. they're delivered back to the machine internally. In most cases the loopback address of a machine is 127.0.0.1, and it uses a special interface called the loopback interface which takes the packet off of the outgoing queue, and puts it on the incoming queue rather than transmitting it across the network. So, on most machines there will be a routing table entry that looks something like:
Prefix 127.0.0.1/32 is delivered via lo0 (the loopback interface).

• Every interface on a host has an address and netmask associated with it. The netmask can be used to determine what hosts are on the subnet accessible through that interface. You'll notice that the notation for expressing the netmask is the same as that used for expressing IP prefixes - surprise! - netmasks are also routing information. So, for example, the host c can add the following routing table entry because of its knowledge of its interface:
Prefix 172.20.0.0/16 is delivered via en0 (the ethernet interface).
• Default Route

• Most hosts and routers have a "default route" that is used as the last-resort destination of a packet to be forwarded. (i.e. if nothing else more specific matches) This route is typically represented by 0.0.0.0/0, a zero-bit prefix (which means that no part of the prefix is significant / needs to match). In fact, on the Internet only a small subset of the routers in the very core of the network are "default-free"; these routers keep a route for every globally-visible prefix, so they generally have a lot of memory (~64 Mb). On machines that do have default routes, the route is either manually configured or found by a Router Discovery protocol.
• Manual

• For manual routes, the user of a machine just enters a "default gateway" when they configure the machine.
• Router Discovery [RFC1256]

• The Router Discovery protocol is used if there's more than one router on a subnet that would make a good default route, and some redundancy is desired, or if one wishes to avoid having to configure default routes into each machine on a subnet individually. In this protocol, routers announce their existance via periodic multicast ICMP messages that contain a preference number. Hosts listen to these router announcement messages and select the router with the best preference number. Router Discovery is not widely used on the Internet today.
• ICMP Redirects [RFC792][RFC950]

• If there are multiple routers on a subnet, the default route a machine has may not actually be the best route for a particular destination. If the packet is sent to the default router, the router will still forward the packet to the right destination (one of the other routers on the subnet), but it may also send the machine an ICMP Redirect, which tells the machine the router that it should send the packet directly to. The machine can then add a route for this to its routing table, and save itself a hop. How does the router know that it should send an ICMP Redirect? It can easily tell that one should be sent by looking for packets that are sent out the same interface they came in. ICMP Redirects are really only used between end hosts and their first-hop routers.

As an example, consider host c which may have a default route pointing to router Y. The first time c sends a packet to host a it will send it to Y. Y will forward it to the correct router (Z), which then passes it on to a. Y may also send an ICMP Redirect back to c which would say "Next time you need to send a packet to a, send it to Z instead of to me." The following entry would be added to c's routing table:
Prefix 10.0.1.1/32 (a) is delivered via 172.20.0.1 (Z).

• Static Routes

• ICMP Redirects help in the above situation, but as you can see, if c then went and sent a packet to b, it would repeat the whole ICMP Redirect procedure again. Not only does this add to network bandwidth, it also creates extra very-specific entries in c's routing table. A better way to fix this might be to add a static route to c's routing table, so that it knows that all the hosts on a and b's subnet are reachable through Z. Static routes are manually configured by a user. The static route in question would look like:
Prefix 10.0.1.0/24 (a and b's subnet) is delivered via 172.20.0.1 (Z).
One can see how a packet being fowraded to either node a or b will match this prefix, since they both have addresses starting with 10.0.1.

## Routing Protocols

The routers on the Internet need a way to communicate toplogical information (how various subnets are connected) and reachability information (what links/subnets are up or down) among themselves. To do this, they use a bunch of routing protocols. Some hosts also participate in routing protocols, mostly by eavesdropping on them to figure out where to send a particular packet, but this is considered a bad thing. The routing protocols provide dynamically updated routing, adjustment of routing tables during failures, "optimal" paths, policy routing, and reduced administration.

There are two main kinds of routing protocols on the Internet: Internal and External. Internal routing protocols are used within an Autonomous System or AS, which is a set of routers under a common administrative control (for example, MITnet). External routing protocols are used to communicate between autonomous systems. Internal routing is largely concerned with optimizing the path length to reduce delay, since within an organization one generally wants to achieve the best connectivity one can, regardless of where the packets go. Common Internal routing protocols are RIP [RFC1058], RIPv2 [RFC1723], OSPF [RFC2178], and ISIS. External routing is largely concerned with enforcing policy. For example, only an ISP's customers should be able to use its infrastructure for through-traffic. The External routing protocol currently used on the Internet is BGP4 [RFC1771]. The following diagram illustrates the role of Internal and External routing protocols:

• Static Routing

• All the automatic transmission and updating of routing information that routing protocols do is very handy, but the original information needs to come from somewhere. Sometimes all of it gets picked up from describing the interfaces and netmasks that connect to the routers, but sometimes you need to tell some routes more information about how you want routing to work. This information is configured manually as static routes on routers.
• Internal

• Internal routing protocols are used to communicate between the routers within an autonomous system. They share routing information with each other in order to find the "optimal" route to a destination. What "optimal" means, is optimal with respect to some metric. Values of the metric are distributed either implicitly or explicitly with the routing information. There are two main styles of routing protocols used for Internal routing:

• Distance Vector (RIP, RIPv2)

• Distance Vector routing protocols used the Bellman-Ford shortest paths algorithm. In this algorithm, the metric is the number of hops (i.e. router traversals) between two machines or prefixes. The general approach is that each router tells its neighbours about all the network it knows about and how many hops away they are. The neighbors then add their distance to the number, and use the route if they don't have a route to that network, or if the route is lower cost than the one they have. This results in rougly O(N^3) computation, where N is the number of subnets.

One problem with Distance Vector routing is the Count to Infinity problem. When there are multiple paths between two routers, and a link on the one being used breaks, sometimes a routing update will already be in transit around the long way, and indicate there's still a route. This gets advertised back and forth among the routers, with the metric growing by one each time, until it gets to some maximum defined by the protocol. The first solution to this was something called Split Horizon, which simply says "don't announce routes over the link that the route routes over." A later solution which works even faster is called Poisor Reverse, where the route is announced over that link, but with an infinite metric.

However, even with these fixes, Distance Vector routing protocols are still fairly slow, so long transient loops can occur. (There is a proprietary routing protocol designed by Cisco Systems called EIGRP which uses a Distance Vector routing protocol but doesn't have these problems.)

• Link State (OSPF, ISIS)

• Link State routing protocols are based on Dijkstra's shorest paths algorithm. However, in order to run Dijkstra's algorithm, each router needs to have a complete map of the network topology / availability. To do this, each router multicasts a list of all the links (subnets) it's connected to and their status (a metric) to all the other routers in the AS. Once the routers have received this information, they can independantly run the algorithm on their own nearly-identical link-state databases to create their routing tables. This requires rougly O(N^2) computation, where N is the number of subnets.

• External

• The idea of External routing is to choose and advertise routes according to an administrative policy. Policies can be economic ("only send routes to people who pay me"), political ("only send routes to universities"), or technical ("send this group of routes to the particular AS, since we're closest to them"). In reality, policies are usually written in much lower level terms by using lists of ASes, tags, and expression matching, but they are based on those high-level ideas.

The way one measures the effectiveness of an addressing and routing system is not by looking at the design, but rather by seeing what impact the system has on the Internet and how it makes it perform. There are a number of different dynamic characteristics we can look at to consider performance:
• Routing Protocol Traffic

• All of this exchanging of information that the routing protocols do takes place on the same wires as all the other Internet traffic. Therefore, we'd like to keep the amount of routing protocol traffic low so that the bandwidth is available for actual applications to use.
• Stability

• Fluctuations in routing can cause rapidly varying return trip times, misordered packets, or even short outages and loops. These all are bad for end-to-end throughput.
• Table Size

• Routers need to have enough memory to store the routing table. The size of the table is strongly affected by address allocation policy.
• Shortest Path Computation

• The shortest path algorithms aren't cheap - appropriate address allocation also reduces to computation time.

• There's only a finite address space, so inefficient allocation will lead to premature exhaustion.
These all argue for minimizing the number of routes!

To minimize the number of addresses, we use an approach called hierarchicial addressing.
• Aggregates

• Aggregates are a way to assosicate topologically-close groups of nodes. Within a group, node know the detailed location of hosts in the group. However, outside of a group nodes only know the way to get to the group as a whole. The basic unit of aggregation is the subnet, since all hosts on a subnet already know about each other (by virtue of their lower-level connectivity). Higher units of aggregation can be any cluster of subnets.
• Implementation

• Aggregates are implemented using prefixes. That is, related addresses start with the same prefix. Broad routes that cover large groups of addresses will have short prefixes, while more specific routes that cover smaller groups have longer prefixes. Now, instead of just doing a pure table lookup we make a "longest match" i.e. pick the most specific route that our address fits into. Most routers do this by storing the routing table in a radix tree.
• Requirements

• In order for hierarchical addressing to work, the topology needs to be hierarchical too. The Internet is mostly hierarchical: there are a few national ISPs which have POPs in various regions, or sell to regional ISPs. These ISPs have a collection of customers, and each customer's network is usually further divided into a bunch of subnets.
• Results
1. The nodes are really simple. The need to know who they are and what subnet they're on, but everything else is just out there, past the default router on their subnet.
2. Routers don't have to deal with entries for every single hosts, just subnets, or groups of subnets. In fact, groups of subnets can be as big as "the network of company X", or "North America", or "the rest of the Internet".