Elliot Schwartz <firstname.lastname@example.org>
How do we get packets from A to B?
So, we can view the network as an arbitrary topology of subnets connected
Each host has an interface on a subnet.
Each host on a subnet can communicate with any other host on the subnet.
Hosts can have multiple interfaces on different subnets.
The machine can relay packets from hosts on one of its subnets to hosts
on the other subnet.
High Level Solution
Note: This approach is called Next Hop Routing, since each router
only knows and uses the next hop that the packet is going to take, and
does a lookup based on the destination address. An alternative approach
might be to carry a multi-hop route in the packet, which could be placed
by either the source node or an intermediate router.
The first clue is given by the problem statement itself: "How to we
get packets from A to B?" First we need a way to identify
or name A and B. Since computers like numbers, we'll assign them each a
number. For now we'll assume the number has no special significance i.e.
the number can be assigned randomly, but doesn't have to be.
We also need some way to identify a subsection of the network.
By a subsection we could mean just a subnet, or even a group of subnets.
To do this, we'll assign hosts that are "close" addresses with the same
"prefix". We'll also assign these subsections netmasks or prefix lengths,
to indicate how many of the bits all the hosts in the subsection have in
Once we've got all the hosts addresses, there's a simple and obvious
way to think about how routing works: We assume that each node has a list
of all of the addresses and where to send a packet for that address.
This list is called a routing table, and each entry in the routing table
has an address and a destination for it. The destination is lower-level
address of either a host that is on a subnet the router is connected to
(if it's the final destination), or the next router that the packet needs
to be forwarded to.
Routers have a simple job. First, the receive a packet on one of their
incoming interfaces. They examine the destination address of the packet,
and look this up in their routing table. Then they forward it where it
should go. This gets repeated in every router on the path until the packet
reaches its final destination.
Now that you have an idea of the basic solution, we'll explain how
this is actually done under IPv4.
What Gets Addressed?
Each destination on the Internet gets an address. Most people
think of destinations as hosts, but in fact hosts can be attached to the
network in multiple places, and each interface of a host is a destination.
So, we assign an address to every interface on the Internet.
IP addresses are 32 bits (4 bytes) long. This allows for 2^32 or 4294967296
(4.3 billion) different combinations. Internet addresses are most commonly
seen in something referred to as "dotted quad notation". In this format,
the 32-bit number is written as 4 8-bit decimal numbers, separated by periods.
An example of this is "126.96.36.199". Internally, most computers will represent
it as a 32-bit binary number - this is accomplished by converting each
of the parts of the quad to binary and concatenating them. Very rarely
one will also see the entire address written out as one decimal number,
perhaps has part of a debugging print out.
Prefixes (sections of the network that are topologically close) are
expressed by giving an address, and the number of bits of the address that
all hosts in the section have in common. For example, all hosts on MITnet
have addresses starting with 18 (the addresses are of the form 18.*.*.*.),
so a prefix expressing this would be 188.8.131.52/8 (to imply the 8 left bits
are in common), or 184.108.40.206 255.0.0.0 (where the length is expressed as
a left-aligned bit-mask), or simply 18/8. All the hosts in Building E52
have addresses starting with 18.42., so 18.42/16 or 220.127.116.11 255.255.0.0
are ways the prefix might be described.
We'll use this example network for illustrating points. On the left side,
we've got hosts a, b, and Z on a subnet numbered 10.0.1.0/24.
This means that all the hosts on this subnet have addresses start with
10.0.1. A network like this might be an Ethernet. This network is connected
to another subnet, 172.20/16, by the router Z. Similarly, on the
right hand side, there's another subnet numbered 192.168.1/24, which has
one host d and a router Y that connects it to 172.20/16.
On 172.20/16 is a host c and a router X that routes to a
point-to-point link with host e, numbered 192.168.2.16/30.
Filling the Table
The next question is how do the hosts get routes in their routing tables.
All hosts (including routers) get them from the following methods:
Every host has a loopback address, which is used for testing. Packets
with the loopback address as a destination address will never leave the
machine i.e. they're delivered back to the machine internally. In most
cases the loopback address of a machine is 127.0.0.1, and it uses a special
interface called the loopback interface which takes the packet off of the
outgoing queue, and puts it on the incoming queue rather than transmitting
it across the network. So, on most machines there will be a routing table
entry that looks something like:
Prefix 127.0.0.1/32 is delivered via lo0 (the loopback interface).
Interface Address & Netmask
Every interface on a host has an address and netmask associated with
it. The netmask can be used to determine what hosts are on the subnet accessible
through that interface. You'll notice that the notation for expressing
the netmask is the same as that used for expressing IP prefixes - surprise!
- netmasks are also routing information. So, for example, the host c
can add the following routing table entry because of its knowledge of its
Prefix 172.20.0.0/16 is delivered via en0 (the ethernet interface).
Most hosts and routers have a "default route" that is used as the last-resort
destination of a packet to be forwarded. (i.e. if nothing else more specific
matches) This route is typically represented by 0.0.0.0/0, a zero-bit prefix
(which means that no part of the prefix is significant / needs to
match). In fact, on the Internet only a small subset of the routers in
the very core of the network are "default-free"; these routers keep a route
for every globally-visible prefix, so they generally have a lot of memory
(~64 Mb). On machines that do have default routes, the route is either
manually configured or found by a Router Discovery protocol.
For manual routes, the user of a machine just enters a "default gateway"
when they configure the machine.
Router Discovery [RFC1256]
The Router Discovery protocol is used if there's more than one router
on a subnet that would make a good default route, and some redundancy is
desired, or if one wishes to avoid having to configure default routes into
each machine on a subnet individually. In this protocol, routers announce
their existance via periodic multicast ICMP messages that contain a preference
number. Hosts listen to these router announcement messages and select the
router with the best preference number. Router Discovery is not widely
used on the Internet today.
ICMP Redirects [RFC792][RFC950]
If there are multiple routers on a subnet, the default route a machine
has may not actually be the best route for a particular destination. If
the packet is sent to the default router, the router will still forward
the packet to the right destination (one of the other routers on the subnet),
but it may also send the machine an ICMP Redirect, which tells the
machine the router that it should send the packet directly to. The machine
can then add a route for this to its routing table, and save itself a hop.
How does the router know that it should send an ICMP Redirect? It
can easily tell that one should be sent by looking for packets that are
sent out the same interface they came in. ICMP Redirects are really
only used between end hosts and their first-hop routers.
As an example, consider host c which may have a default route
pointing to router Y. The first time c sends a packet to
host a it will send it to Y. Y will forward it to
the correct router (Z), which then passes it on to a. Y
may also send an ICMP Redirect back to c which would say "Next
time you need to send a packet to a, send it to Z instead
of to me." The following entry would be added to c's routing table:
Prefix 10.0.1.1/32 (a) is delivered via 172.20.0.1 (Z).
ICMP Redirects help in the above situation, but as you can see, if
c then went and sent a packet to b, it would repeat the whole
ICMP Redirect procedure again. Not only does this add to network bandwidth,
it also creates extra very-specific entries in c's routing table.
A better way to fix this might be to add a static route to c's routing
table, so that it knows that all the hosts on a and b's subnet
are reachable through Z. Static routes are manually configured by
a user. The static route in question would look like:
Prefix 10.0.1.0/24 (a and b's subnet) is delivered via 172.20.0.1
One can see how a packet being fowraded to either node a or
b will match this prefix, since they both have addresses starting
The routers on the Internet need a way to communicate toplogical information
(how various subnets are connected) and reachability information (what
links/subnets are up or down) among themselves. To do this, they use a
bunch of routing protocols. Some hosts also participate in routing protocols,
mostly by eavesdropping on them to figure out where to send a particular
packet, but this is considered a bad thing. The routing protocols provide
dynamically updated routing, adjustment of routing tables during failures,
"optimal" paths, policy routing, and reduced administration.
There are two main kinds of routing protocols on the Internet: Internal
and External. Internal routing protocols are used within an Autonomous
System or AS, which is a set of routers under a common administrative control
(for example, MITnet). External routing protocols are used to communicate
between autonomous systems. Internal routing is largely concerned with
optimizing the path length to reduce delay, since within an organization
one generally wants to achieve the best connectivity one can, regardless
of where the packets go. Common Internal routing protocols are RIP [RFC1058],
RIPv2 [RFC1723], OSPF [RFC2178],
and ISIS. External routing is largely concerned with enforcing policy.
For example, only an ISP's customers should be able to use its infrastructure
for through-traffic. The External routing protocol currently used on the
Internet is BGP4 [RFC1771]. The following diagram
illustrates the role of Internal and External routing protocols:
All the automatic transmission and updating of routing information
that routing protocols do is very handy, but the original information needs
to come from somewhere. Sometimes all of it gets picked up from
describing the interfaces and netmasks that connect to the routers, but
sometimes you need to tell some routes more information about how you want
routing to work. This information is configured manually as static routes
Internal routing protocols are used to communicate between the routers
within an autonomous system. They share routing information with each other
in order to find the "optimal" route to a destination. What "optimal" means,
is optimal with respect to some metric. Values of the metric are distributed
either implicitly or explicitly with the routing information. There are
two main styles of routing protocols used for Internal routing:
Distance Vector (RIP, RIPv2)
Distance Vector routing protocols used the Bellman-Ford shortest paths
algorithm. In this algorithm, the metric is the number of hops (i.e. router
traversals) between two machines or prefixes. The general approach is that
each router tells its neighbours about all the network it knows about and
how many hops away they are. The neighbors then add their distance to the
number, and use the route if they don't have a route to that network, or
if the route is lower cost than the one they have. This results in rougly
O(N^3) computation, where N is the number of subnets.
One problem with Distance Vector routing is the Count to Infinity problem.
When there are multiple paths between two routers, and a link on the one
being used breaks, sometimes a routing update will already be in transit
around the long way, and indicate there's still a route. This gets advertised
back and forth among the routers, with the metric growing by one each time,
until it gets to some maximum defined by the protocol. The first solution
to this was something called Split Horizon, which simply says "don't announce
routes over the link that the route routes over." A later solution which
works even faster is called Poisor Reverse, where the route is announced
over that link, but with an infinite metric.
However, even with these fixes, Distance Vector routing protocols are
still fairly slow, so long transient loops can occur. (There is a proprietary
routing protocol designed by Cisco Systems called EIGRP which uses a Distance
Vector routing protocol but doesn't have these problems.)
Link State (OSPF, ISIS)
Link State routing protocols are based on Dijkstra's shorest paths
algorithm. However, in order to run Dijkstra's algorithm, each router needs
to have a complete map of the network topology / availability. To do this,
each router multicasts a list of all the links (subnets) it's connected
to and their status (a metric) to all the other routers in the AS. Once
the routers have received this information, they can independantly run
the algorithm on their own nearly-identical link-state databases to create
their routing tables. This requires rougly O(N^2) computation, where N
is the number of subnets.
The idea of External routing is to choose and advertise routes according
to an administrative policy. Policies can be economic ("only send routes
to people who pay me"), political ("only send routes to universities"),
or technical ("send this group of routes to the particular AS, since we're
closest to them"). In reality, policies are usually written in much lower
level terms by using lists of ASes, tags, and expression matching, but
they are based on those high-level ideas.
The way one measures the effectiveness of an addressing and routing system
is not by looking at the design, but rather by seeing what impact the system
has on the Internet and how it makes it perform. There are a number of
different dynamic characteristics we can look at to consider performance:
These all argue for minimizing the number of routes!
Routing Protocol Traffic
All of this exchanging of information that the routing protocols do
takes place on the same wires as all the other Internet traffic. Therefore,
we'd like to keep the amount of routing protocol traffic low so that the
bandwidth is available for actual applications to use.
Fluctuations in routing can cause rapidly varying return trip times,
misordered packets, or even short outages and loops. These all are bad
for end-to-end throughput.
Routers need to have enough memory to store the routing table. The
size of the table is strongly affected by address allocation policy.
Shortest Path Computation
The shortest path algorithms aren't cheap - appropriate address allocation
also reduces to computation time.
There's only a finite address space, so inefficient allocation will
lead to premature exhaustion.
To minimize the number of addresses, we use an approach called hierarchicial
Aggregates are a way to assosicate topologically-close groups of nodes.
Within a group, node know the detailed location of hosts in the group.
However, outside of a group nodes only know the way to get to the group
as a whole. The basic unit of aggregation is the subnet, since all hosts
on a subnet already know about each other (by virtue of their lower-level
connectivity). Higher units of aggregation can be any cluster of subnets.
Aggregates are implemented using prefixes. That is, related addresses
start with the same prefix. Broad routes that cover large groups of addresses
will have short prefixes, while more specific routes that cover smaller
groups have longer prefixes. Now, instead of just doing a pure table lookup
we make a "longest match" i.e. pick the most specific route that our address
fits into. Most routers do this by storing the routing table in a radix
In order for hierarchical addressing to work, the topology needs to
be hierarchical too. The Internet is mostly hierarchical: there are a few
national ISPs which have POPs in various regions, or sell to regional ISPs.
These ISPs have a collection of customers, and each customer's network
is usually further divided into a bunch of subnets.
The nodes are really simple. The need to know who they are and what
subnet they're on, but everything else is just out there, past the default
router on their subnet.
Routers don't have to deal with entries for every single hosts, just subnets,
or groups of subnets. In fact, groups of subnets can be as big as "the
network of company X", or "North America", or "the rest of the Internet".
Some of you may have heard of things like "Class A" addresses. These
are part of a historical system called Classfull Addressing where the netmask/prefix
information was encoded in the address. The address space was divided up
into a bunch of different sections called classes, and all addresses in
a class had the same fixed and implicit prefix length. Since there were
only a few prefix lengths (8, 16, and 24 bits) this created the "three
bears" problem: 16-bit prefixes were "just right" for most organizations,
and became scarce quickly. It also caused problems when people wanted to
expand, since they had to renumber all their hosts into a higher class,
or inject more routes onto the Internet. These problems led to an inefficient
allocation of the address space and a higher routing load on the Internet.
Thankfully most network devices now support Classless Addressing, though
a lot of the damage caused by classfull addressing is still around.
ICMP Router Discovery Messages. S. Deering. Sep-01-1991.
Internet Control Message Protocol. J. Postel. Sep-01-1981.
Internet Standard Subnetting Procedure. J.C. Mogul, J. Postel. Aug-01-1985.
Routing Information Protocol. C.L. Hedrick. Jun-01-1988.
RIP Version 2 - Carrying Additional Information. G. Malkin. November 1994.
OSPF Version 2. J. Moy. July 1997.
A Border Gateway Protocol 4 (BGP-4). Y. Rekhter & T. Li. March 1995.