Building VPNs on OpenBSD - IPsec overview

2. IPsec overview

IPsec configuration on OpenBSD is a pretty easy and straightforward process, especially compared to most other implementations; nevertheless, IPsec is a rather complicated beast and a good working knowledge of its protocols and internals is essential to configure it and get it to work properly. Therefore, before beginning the configuration, let's take a brief tour of the IPsec protocols and features.

IPsec (IP security) is a suite of standard protocols designed to provide interoperable, high quality, cryptographically-based security [RFC4301] for protecting communications over IPv4 and IPv6 networks. The main security services offered by IPsec are:

Confidentiality: traffic is encrypted to ensure that only the legitimate receiver is able to access the data transmitted.
Connectionless integrity: the receiver is able to detect any modification made to a datagram while in transit across the network.
Data origin authentication: the receiver is able to verify that data actually originates from the claimed source.
Detection and rejection of replays: duplicate IP datagrams are detected and discarded, so that each datagram is processed only once.

These security services are provided at the IP layer (layer 3 of the OSI model), thus protecting all protocols that may be carried over IP, including IP itself.
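
To make the integrity and data origin authentication services a little more concrete, here is a minimal Python sketch of a keyed integrity check. It is only an illustration, not what an IPsec stack literally does: the function names (compute_icv, verify_icv), the key and the payload are made up for this example, and HMAC-SHA-256 truncated to 128 bits is just one of the algorithms two peers may agree on.

```python
import hashlib
import hmac


def compute_icv(key: bytes, data: bytes) -> bytes:
    """Return an integrity check value: HMAC-SHA-256 truncated to 128 bits."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


def verify_icv(key: bytes, data: bytes, icv: bytes) -> bool:
    """Recompute the ICV and compare it in constant time."""
    return hmac.compare_digest(compute_icv(key, data), icv)


# Hypothetical shared key and payload, for illustration only.
key = b"shared-secret-key-for-illustration"
payload = b"hello from Host1"
icv = compute_icv(key, payload)

print(verify_icv(key, payload, icv))                 # unmodified data
print(verify_icv(key, payload + b"!", icv))          # data altered in transit
print(verify_icv(b"some-other-key", payload, icv))   # sender without the key
```

Only a party holding the shared key can produce a valid check value, which is why the same mechanism provides both integrity and data origin authentication.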

2.1 IPsec protocols

Most IPsec security services are provided by two traffic security protocols:

AH (Authentication Header): defined in [RFC4302], AH is used to provide connectionless integrity, data origin authentication and optional (at the discretion of the receiver) anti-replay protection for IP datagrams.
ESP (Encapsulating Security Payload): defined in [RFC4303], ESP offers the same set of services as AH (data origin authentication, connectionless integrity and anti-replay), plus confidentiality.

ESP is by far the more popular of the two protocols, since it provides confidentiality by encrypting network traffic, thus protecting transmitted data from passive attacks such as eavesdropping. On the other hand, AH provides stronger authentication than ESP, as it also protects the immutable fields of the outer IP header in addition to the next level protocol data, while ESP's integrity check does not cover the outer IP header (in tunnel mode, it covers only the inner, encapsulated IP header); however, this feature, in addition to not being of great use in most cases, also violates the modularization of the protocol stack (see [SCHNEIER], where the AH protocol is proposed for complete elimination).

AH and ESP may also be applied in combination with each other to exploit the strengths of both protocols but, in most real-world scenarios, ESP alone is enough.

Both ESP and AH support two modes of operation:

transport mode: IPsec protects only the payload of the IP packet (usually the transport layer data, hence its name), leaving the IP header, and thus routing, unchanged; transport mode can be used only for host-to-host communication;
tunnel mode: the entire IP packet is encrypted and/or authenticated and then encapsulated into a new IP packet; tunnel mode is typically used to connect either two remote networks or a host and a network; it is more flexible than transport mode, but imposes more bandwidth overhead.

The flexibility of tunnel mode allows it to fully supersede the functionality of transport mode, at the reasonable expense of a slightly higher bandwidth overhead. As a consequence, transport mode is rarely used in real-world VPNs and [SCHNEIER] suggests that, just like AH, it be eliminated altogether, with the advantage of significantly reducing IPsec complexity.
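
The difference between the two modes is easiest to see in the order of the headers on the wire. The following Python sketch simply lists the parts of an ESP-protected packet in each mode; the function names are hypothetical and the 20-byte figure assumes an IPv4 outer header without options.

```python
from typing import List

IPV4_HEADER_LEN = 20  # bytes; assumes an IPv4 header without options


def esp_layout(mode: str) -> List[str]:
    """Return the parts of an ESP-protected packet, in wire order."""
    if mode == "transport":
        # The original IP header is kept; only the payload is protected.
        return ["IP header", "ESP header", "payload",
                "ESP trailer", "ESP ICV"]
    if mode == "tunnel":
        # The whole original packet (inner header included) is wrapped
        # inside a new packet with its own outer IP header.
        return ["outer IP header", "ESP header", "inner IP header",
                "payload", "ESP trailer", "ESP ICV"]
    raise ValueError("mode must be 'transport' or 'tunnel'")


def extra_header_bytes(mode: str) -> int:
    """Bytes that tunnel mode adds on top of transport mode."""
    esp_layout(mode)  # validates the mode name
    return IPV4_HEADER_LEN if mode == "tunnel" else 0


for mode in ("transport", "tunnel"):
    print(mode, esp_layout(mode), extra_header_bytes(mode))
```

The extra outer header is the "slightly higher bandwidth overhead" mentioned above: one more IP header per packet.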

In a nutshell, while ESP and tunnel mode are by far the most prevalent choice, AH and transport mode can be considered the black sheep of the IPsec protocol family!

2.2 SA, SPI, SPD and other acronyms

To actually establish the VPN, the IPsec protocols require that some state data be shared between the VPN endpoints, such as the cryptographic algorithms for encryption and authentication, the keys used as input to the cryptographic algorithms, the current sequence number, the anti-replay window and so on.

These data are held in a data structure called a Security Association (SA); SAs are usually created by a specific protocol, IKE (Internet Key Exchange; version 2, IKEv2, is defined in [RFC4306]), which also has the responsibility of mutually authenticating the two communicating parties and setting up the encrypted channel for secure information exchange (these steps make up the so-called phase 1 in IKEv1 terminology, or the IKE SA in IKEv2), and of negotiating the IPsec SAs together with the keying material from which their cryptographic keys are derived (phase 2 in IKEv1, Child SAs in IKEv2).

A Security Association applies to a single protocol (AH or ESP) and to a single direction of traffic flow; therefore, to secure typical, bi-directional communication between two IPsec-enabled systems, a pair of SAs (one in each direction) is required. IKE explicitly creates SA pairs in recognition of this common usage requirement [RFC4301].

SAs are collected in a Security Association Database (SAD), where they are uniquely identified by the combination of protocol (AH or ESP), destination address and an arbitrary 32-bit value called the Security Parameter Index (SPI). The SPI has the specific task of helping the receiver to identify the SA under which an incoming packet should be processed.
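
As a rough illustration of the lookup just described, the following Python sketch models the SAD as a dictionary keyed by (protocol, destination address, SPI). The class and field names, the addresses, the SPI values and the algorithm names are all hypothetical, and a real SA holds much more state (keys, lifetimes, anti-replay window and so on).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SecurityAssociation:
    protocol: str      # "esp" or "ah"
    dst: str           # destination address of the SA
    spi: int           # 32-bit Security Parameter Index
    enc_alg: str       # encryption algorithm (illustrative field)
    auth_alg: str      # authentication algorithm (illustrative field)
    seq: int = 0       # current sequence number


class SAD:
    """Toy Security Association Database."""

    def __init__(self) -> None:
        self._sas: Dict[Tuple[str, str, int], SecurityAssociation] = {}

    def add(self, sa: SecurityAssociation) -> None:
        self._sas[(sa.protocol, sa.dst, sa.spi)] = sa

    def lookup(self, protocol: str, dst: str,
               spi: int) -> Optional[SecurityAssociation]:
        return self._sas.get((protocol, dst, spi))


# One SA per direction: a bi-directional tunnel needs a pair.
sad = SAD()
sad.add(SecurityAssociation("esp", "203.0.113.1", 0x1001, "aes", "hmac-sha2-256"))
sad.add(SecurityAssociation("esp", "198.51.100.1", 0x2002, "aes", "hmac-sha2-256"))

print(sad.lookup("esp", "203.0.113.1", 0x1001))
print(sad.lookup("esp", "203.0.113.1", 0x9999))  # unknown SPI: no SA
```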

But how does IPsec decide which datagrams to send through the VPN and which not? For instance, in a typical site-to-site VPN scenario, the IPsec gateway will usually tunnel and/or protect only traffic between the remote LANs, leaving all other traffic unaffected. Well, IPsec makes such decisions based on policies, i.e. user-defined rules stating which packets should be protected using IPsec security services, which should be allowed to bypass IPsec protection and which should be discarded. IPsec policies are applied based on some specific fields in the datagram headers, called selectors, which include: source and destination addresses, Next Layer Protocol, source and destination ports (if used by the next layer protocol).

As with Security Associations, IPsec policies are held in a database, called the Security Policy Database (SPD), which must be consulted during the processing of all traffic (inbound and outbound), including traffic not protected by IPsec, that traverses the IPsec boundary.
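
A first-match policy lookup over selectors can be sketched in a few lines of Python. This is a simplified model, not OpenBSD's implementation: the Policy class, the spd_lookup function and the example networks are hypothetical, only addresses and the next layer protocol are used as selectors (ports are left out), and dropping packets that match no policy is an assumption of this sketch.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Policy:
    src: str                     # source network, CIDR notation
    dst: str                     # destination network, CIDR notation
    action: str                  # "PROTECT", "BYPASS" or "DISCARD"
    proto: Optional[str] = None  # None matches any next layer protocol

    def matches(self, src_ip: str, dst_ip: str, proto: str) -> bool:
        return (
            ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.src)
            and ipaddress.ip_address(dst_ip) in ipaddress.ip_network(self.dst)
            and (self.proto is None or self.proto == proto)
        )


def spd_lookup(spd: List[Policy], src_ip: str, dst_ip: str, proto: str) -> str:
    """Return the action of the first matching policy."""
    for policy in spd:
        if policy.matches(src_ip, dst_ip, proto):
            return policy.action
    return "DISCARD"  # assumption: unmatched traffic is dropped


# Site-to-site example: protect LAN-to-LAN traffic, let the rest through.
spd = [
    Policy("192.168.1.0/24", "192.168.2.0/24", "PROTECT"),
    Policy("0.0.0.0/0", "0.0.0.0/0", "BYPASS"),
]

print(spd_lookup(spd, "192.168.1.10", "192.168.2.20", "tcp"))
print(spd_lookup(spd, "192.168.1.10", "8.8.8.8", "udp"))
```

Because the first match wins, the order of the entries matters: the catch-all rule must come after the more specific one.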

2.3 The life of an IPsec packet

To recap, let's have a look at what the (brief) life of an IPsec packet looks like; we will consider the most common case: an ESP tunnel-mode VPN between two remote networks (see picture above). The story begins when the first gateway (GW1) receives an outbound packet from a host (Host1) within its internal network and destined for a host (Host2) on the remote network:

the gateway first compares the datagram's selector fields against the SPD to find the first matching policy;
the policy may specify one of three possible processing choices:
- DISCARD, the packet is not allowed to traverse the IPsec boundary and is dropped;
- BYPASS, the packet is allowed to cross the IPsec boundary without IPsec protection and will be routed normally;
- PROTECT, the packet must be afforded IPsec protection and the policy will point to zero or more SAs in the SAD;
in the present case, the gateway has a policy specifying that the datagram must be encapsulated with tunnel-mode ESP and sent to GW2;
if no SA exists for this policy, IKE will be invoked to negotiate the SAs with the appropriate peer;
the first matching SA(s) will be applied, providing the requested security services to the datagram;
the IP datagram will be encapsulated in ESP and the outer IP header will have the addresses of GW1 and GW2 as source and destination addresses respectively.
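
The outbound steps listed above can be sketched as a single Python function. Everything here is a deliberately simplified assumption: packets, policies and SAs are plain dictionaries, the SAD is keyed by peer address only, fake_ike is a stand-in for a real IKE negotiation, the addresses are placeholders, and no encryption is actually performed.

```python
from typing import Callable, Dict, Optional


def process_outbound(packet: dict, policy: dict, sad: Dict[str, dict],
                     negotiate_sa: Callable[[str], dict]) -> Optional[dict]:
    """Apply an already-matched SPD policy to an outbound packet.

    Returns the packet to transmit, or None if it is dropped.
    """
    action = policy["action"]
    if action == "DISCARD":
        return None
    if action == "BYPASS":
        return packet

    # PROTECT: find the SA for the peer, negotiating one if none exists.
    peer = policy["peer"]
    sa = sad.get(peer)
    if sa is None:
        sa = negotiate_sa(peer)
        sad[peer] = sa

    sa["seq"] += 1
    return {
        "src": policy["local"],   # GW1
        "dst": peer,              # GW2
        "proto": "esp",
        "spi": sa["spi"],
        "seq": sa["seq"],
        "inner": packet,          # a real gateway would encrypt this
    }


def fake_ike(peer: str) -> dict:
    """Hypothetical stand-in for an IKE negotiation with the peer."""
    return {"spi": 0x1000, "seq": 0}


sad: Dict[str, dict] = {}
policy = {"action": "PROTECT", "local": "198.51.100.1", "peer": "203.0.113.1"}
inner = {"src": "192.168.1.10", "dst": "192.168.2.20", "proto": "tcp"}
outer = process_outbound(inner, policy, sad, fake_ike)
print(outer["src"], "->", outer["dst"], "spi", hex(outer["spi"]))
```

Note how the inner packet keeps the addresses of Host1 and Host2, while the outer header only ever shows the two gateways.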

After a brief walk around the Internet, the encapsulated packet hits the second gateway (GW2):

the datagram is checked to see whether it contains an IPsec header; if not, it is matched against the SPD like any other unprotected traffic and forwarded only if a policy allows it to bypass IPsec;
using the destination address, the SPI and the type of IPsec header of the incoming datagram, the gateway determines which SA to use; if no matching SA is found, the packet is dropped;
if anti-replay is enabled for the SA, the sequence number is checked against the anti-replay window;
the packet is decrypted and/or authenticated as specified by the SA;
the gateway locates the SPD entry that applies to the datagram based on its selectors and verifies that the SA(s) applied in the previous steps match the SA(s) required by the policy;
the packet is decapsulated and forwarded to the next hop or to the appropriate transport protocol.
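
Of the inbound steps above, the anti-replay check is the one that most benefits from an example. The following Python sketch implements a sliding window over sequence numbers with a bitmap; the class name and the window size of 64 are choices made for this illustration. For brevity the check and the window update are combined in one method, whereas a real receiver would advance the window only after the packet's integrity has been verified.

```python
class ReplayWindow:
    """Sliding anti-replay window over ESP/AH sequence numbers."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.top = 0      # highest sequence number accepted so far
        self.bitmap = 0   # bit i set: sequence number (top - i) was seen

    def check_and_update(self, seq: int) -> bool:
        """Return True if seq is acceptable, recording it as seen."""
        if seq == 0:
            return False  # the first packet of an SA carries number 1
        if seq > self.top:
            # New highest number: slide the window forward.
            shift = seq - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False  # too old: fell off the left edge of the window
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow()
for seq in (1, 3, 2, 3, 100, 36, 37):
    print(seq, window.check_and_update(seq))
```

Packets may arrive out of order (2 after 3 in the example) and still be accepted, as long as they fall inside the window and have not been seen before.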