Saturday, December 31, 2011

Locator and Identifier Separation Protocol (LISP) - One more tunnel protocol


In 2012, I think the network infrastructure market will focus on two technologies - SDN and LISP (Locator and Identifier Separation Protocol). Work on LISP has been going on for a few years, and it has been talked about quite often in the recent past.

Why LISP?

The reasons for LISP are detailed very well in RFC 4984. Some points of RFC 4984 are worth noting, and I mention them here.

Multihoming

Internet presence is now part of the business model of many organizations. Hence high availability of Internet connectivity is becoming very important to them. High availability is achieved by having multiple links to an ISP and also links to different ISPs. Multiple links are used for load balancing the traffic as well as for redundancy.

Customers (companies) get an IP address block (subnet) from each ISP, and the organization uses this block to assign IP addresses to the machines that need to be reachable from external networks. Since each ISP assigns a different block, critical machines are given multiple IP addresses - one from each ISP-assigned block. Operating systems and routing protocols running in the machines and routers ensure that the IP addresses of the active links are used. Each machine's operating system needs this intelligence so that connections from applications running on its TCP/IP stack are assigned active IP addresses. Since multiple IP addresses are assigned to a machine, it is termed a multihomed machine. This concept is called multihoming.

Even though the above scheme works in general, existing active connections get terminated if the link associated with their IP addresses goes down. This could result in lost voice calls or the termination of very important TCP/IP connections. This is one problem with provider assigned (PA) IP addresses (also called Provider Aggregatable addresses). There is no issue for new connections, though, as routing protocols propagate the link state. Note that service providers don't accept packets from their customer networks whose source IP address is outside the address block they assigned. Due to this, packets belonging to active connections can't be sent onto the links of other service providers. This is one of the challenges organizations have with multihoming.
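
The source-address check described above can be sketched in a few lines of Python. This is only an illustration: the address blocks are documentation prefixes, the function name is made up for this example, and real providers apply this kind of filtering in their routers rather than in application code.

```python
import ipaddress

# Hypothetical provider-assigned blocks (documentation prefixes, not real allocations).
ISP_A_BLOCK = ipaddress.ip_network("192.0.2.0/24")
ISP_B_BLOCK = ipaddress.ip_network("198.51.100.0/24")

def provider_accepts(source_ip: str, assigned_block) -> bool:
    """Return True if a provider would forward a packet with this source address."""
    return ipaddress.ip_address(source_ip) in assigned_block

# A connection opened with an ISP A address cannot simply move to the ISP B link.
conn_source = "192.0.2.10"
print(provider_accepts(conn_source, ISP_A_BLOCK))  # True
print(provider_accepts(conn_source, ISP_B_BLOCK))  # False
```

The second check is the reason an active connection is lost when its link fails: the only remaining link belongs to a provider that drops packets with that source address.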

The second challenge with multihoming is the propagation of active routes and links to each machine. All machines that are reachable from external networks need to run routing protocols. As you all know, end nodes typically don't have routing protocols enabled, to avoid increasing the maintenance headache for the IT department.

The third challenge is multihoming for inbound connections to the organization. When a link is down, remote systems should somehow stop using the addresses associated with it. Typically, this is achieved by giving each internal server machine an FQDN (fully qualified domain name) and updating the DNS server with the IP addresses of the active links. That is, the DNS server can't just use static information; it should be informed of changes as soon as possible. Even though this can be done at the local DNS server level, many DNS resolvers in the Internet might have cached the old information, and remote systems continue to use the down IP addresses for some time. This can be avoided by sending DNS responses with a TTL of 0, but that would increase the load on the local DNS servers.

Finally, organizations would like certain types of traffic (both inbound and outbound connections) to use some links over others for reasons such as the cost of the link, time of day, etc.

Basically, traffic engineering for outbound and inbound connections has a good number of challenges, and the techniques to overcome them have limitations as described above.

Provider Independent Addresses:

To address the issues of multihoming and traffic engineering, the RIRs (Regional Internet Registries) introduced policy documents allowing organizations to request provider independent addresses (PI addresses). Provider independent addresses are expected to be routed by all service providers. That is, service providers are expected to honor packets coming from their customers with these IP addresses as the source IP address.

The benefits of PI addresses are obvious given the above background, but the consolidated reasons are given below:
  • No need for multihoming support in end nodes, hence no need to enable routing protocols in the end nodes.
  • Traffic engineering is simple - no need to dynamically update DNS servers.
  • Simple for organizations to move to new service providers. No renumbering of machines every time the service provider is changed.
  • With acquisitions and mergers, consolidation of networks is simple.

With PA (Provider Assigned) addresses, addresses are aggregatable, hence the number of routing entries in the routers used to be small. With provider independent addresses, the routes can't be aggregated, and hence the routing table size increases dramatically. Routing table sizes of DFZ (Default Free Zone) routers are going up dramatically due to PI addresses. According to the BGP Routing Analysis Report, the number of routes in the DFZ routers went from 5000 in year 2000 to 400,000 in year 2011. With the growing popularity of IPv6 and more liberal assignment of PI addresses in IPv6, it will not be a surprise if the number of routes in DFZ routers goes into the millions in the next few years. Since DFZ and service provider routers consult the routing table for every incoming packet, more routes in the table reduce the performance of the router and hence the performance of the overall Internet.
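
The difference between aggregatable and non-aggregatable addresses can be illustrated with Python's standard ipaddress module. The prefixes below are assumptions picked for the example (documentation ranges), not real allocations: four customer blocks carved out of one provider's space collapse into a single route, while four scattered PI-style blocks stay as four routes.

```python
import ipaddress

# Hypothetical customer blocks carved out of one provider's 203.0.113.0/24 (PA space).
pa_customers = [ipaddress.ip_network(f"203.0.113.{i * 64}/26") for i in range(4)]

# Hypothetical PI-style blocks: not adjacent to each other, so nothing can be merged.
pi_customers = [
    ipaddress.ip_network("192.0.2.0/26"),
    ipaddress.ip_network("192.0.2.128/26"),
    ipaddress.ip_network("198.51.100.64/26"),
    ipaddress.ip_network("198.51.100.192/26"),
]

# collapse_addresses merges adjacent/overlapping networks into the fewest prefixes.
pa_routes = list(ipaddress.collapse_addresses(pa_customers))
pi_routes = list(ipaddress.collapse_addresses(pi_customers))

print(len(pa_routes), pa_routes)  # one aggregate route
print(len(pi_routes))             # still four separate routes
```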

LISP was born mainly to address the issue of scaling in DFZ routers.


Basic concept of LISP:

LISP tries to keep the advantages of provider independent addresses for organizations while keeping the routing table at a reasonable size by using aggregatable addresses. To do this, LISP proposes two address spaces - the Identifier address space and the Locator address space. The Identifier address space is similar to provider independent address space: organizations are expected to assign addresses from their allocated space to individual network elements. Locator space is also assigned to the organization, but it is provider aggregatable address space. Hence, it should not be used to assign addresses to all network elements; it should be assigned to the tunnel routers (LISP tunnel routers) only. When the organization changes its service provider, it only needs to worry about IP address assignment to the LISP routers, and no other change is expected.

LISP standards call the identifier address an EID (Endpoint ID) and the locator IP address an RLOC (Routing Locator).

A LISP router contains two functions - Ingress Tunnel Router (ITR) and Egress Tunnel Router (ETR). The terms ingress and egress are with respect to the tunnel: the ITR is where traffic enters it and the ETR is where traffic leaves it. Since the endpoint identifier space is not expected to be visible to the core network routers, the ITR encapsulates the traffic coming from the endpoint network in a tunnel with LISP, UDP and IP headers and sends it out onto the Internet. The ETR is expected to decapsulate the traffic coming from the Internet and pass the inner packet to the endpoint network. Typically, ITR and ETR are implemented in customer edge routers. Initially, enterprises might expect service providers to provide LISP service, and eventually enterprise routers will have this functionality.

The IP addresses used in the outer IP header of the tunnel are from the RLOC space. Since RLOC space is provider aggregatable, the routing table size will not increase dramatically. Please see the LISP draft for more information on the tunnel header formats.
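
To make the encapsulation step a little more concrete, here is a simplified Python sketch of what an ITR prepends to an inner packet. It assumes the 8-byte LISP data header layout and UDP destination port 4341 from the LISP specification, sets only the nonce-present flag, and leaves the UDP checksum at zero. The function names are made up for this example, and the outer IP header (with RLOC source and destination addresses) is left out; the LISP draft remains the authority on the exact formats.

```python
import os
import struct

LISP_DATA_PORT = 4341  # UDP destination port for LISP-encapsulated data packets

def lisp_header(nonce: int) -> bytes:
    """Build a simplified 8-byte LISP data header with only the N (nonce) bit set."""
    flags = 0x80  # N bit; every other flag bit is left clear in this sketch
    first_word = (flags << 24) | (nonce & 0xFFFFFF)  # flags + 24-bit nonce
    second_word = 0  # instance ID / locator-status-bits, unused here
    return struct.pack("!II", first_word, second_word)

def encapsulate(inner_packet: bytes, src_port: int) -> bytes:
    """Return UDP header + LISP header + inner packet (outer IP header omitted)."""
    lisp = lisp_header(int.from_bytes(os.urandom(3), "big"))
    udp_length = 8 + len(lisp) + len(inner_packet)
    udp = struct.pack("!HHHH", src_port, LISP_DATA_PORT, udp_length, 0)
    return udp + lisp + inner_packet

inner = bytes(20)  # stand-in for an inner IP packet addressed with EIDs
outer_payload = encapsulate(inner, src_port=50000)
print(len(outer_payload))  # 8 (UDP) + 8 (LISP) + 20 (inner) = 36
```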


How does it solve the issues/challenges discussed above?

Multihoming is no longer required in end nodes, but it is still required on LISP routers - that is, there is still a requirement for multiple links from different providers for redundancy and traffic engineering. Active connections do not suffer if traffic is redirected to other links, as end nodes always work with the EID space and those IP addresses continue to work, similar to provider independent addresses. The outer IP header addresses of the LISP tunnel change when links go down and come back up. That should be okay, as these addresses are only used to get to the LISP ETR.
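
A small Python sketch of this behaviour: the host keeps one EID while the LISP router chooses among its RLOCs depending on which links are up. The class, the field names and the "lower priority value wins" rule are assumptions made for this illustration, and the addresses are documentation prefixes.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Locator:
    rloc: str        # outer (tunnel) address on one provider link
    priority: int    # lower value is preferred in this sketch
    up: bool = True  # link state as seen by the LISP router

def pick_rloc(locators: List[Locator]) -> Optional[str]:
    """Return the preferred RLOC among the links that are currently up."""
    usable = [loc for loc in locators if loc.up]
    if not usable:
        return None
    return min(usable, key=lambda loc: loc.priority).rloc

# Hypothetical site: one EID for the host, two RLOCs on the LISP router.
host_eid = "10.1.1.5"
site_locators = [Locator("192.0.2.1", priority=1), Locator("198.51.100.1", priority=2)]

before = pick_rloc(site_locators)
site_locators[0].up = False  # the link to the first provider fails
after = pick_rloc(site_locators)
print(host_eid, before, after)  # the EID is unchanged; only the RLOC differs
```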

EID to RLOC mapping:

The ITR needs to know the source IP and destination IP to be used for the tunnel header. It uses the destination IP (EID) of the packets coming in from the local network to determine the RLOC IP address of the remote ETR, with the help of the mapping database. Each ITR is expected to maintain an EID to RLOC cache. If it does not find a matching entry in the cache, it talks to the mapping resolvers. Mapping resolvers use the mapping database to figure out the destination ETR and let the destination ETR send the actual EID to RLOC mapping to the requesting ITR. Basically, mapping resolvers and mapping databases are only used to find the ETR; the ETR is the one which gives the EID to RLOC mapping to the ITRs.
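
The cache-then-resolve behaviour of the ITR might look roughly like the following Python sketch. The MapCache class and the resolver callback are hypothetical; the callback stands in for the whole resolver/ETR exchange and simply returns an (EID prefix, RLOC) pair. The cache picks the most specific matching prefix, which is an assumption of this sketch.

```python
import ipaddress

class MapCache:
    """EID-to-RLOC cache with longest-prefix match and a resolver fallback."""

    def __init__(self, resolver):
        self._entries = {}         # ip_network (EID prefix) -> RLOC string
        self._resolver = resolver  # hypothetical callable: eid -> (prefix, rloc)

    def add(self, eid_prefix: str, rloc: str) -> None:
        self._entries[ipaddress.ip_network(eid_prefix)] = rloc

    def lookup(self, dest_eid: str) -> str:
        addr = ipaddress.ip_address(dest_eid)
        matches = [prefix for prefix in self._entries if addr in prefix]
        if matches:
            # Cache hit: use the most specific (longest) matching prefix.
            best = max(matches, key=lambda prefix: prefix.prefixlen)
            return self._entries[best]
        # Cache miss: ask the mapping system and remember the answer.
        eid_prefix, rloc = self._resolver(dest_eid)
        self.add(eid_prefix, rloc)
        return rloc

requests_sent = []

def demo_resolver(eid):
    requests_sent.append(eid)  # record each request that leaves the ITR
    return ("10.2.0.0/16", "203.0.113.7")

cache = MapCache(demo_resolver)
print(cache.lookup("10.2.3.4"))  # miss: goes to the resolver
print(cache.lookup("10.2.9.9"))  # hit: answered from the cache
print(len(requests_sent))        # only one request was sent
```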

ETRs are expected to register their RLOCs with the mapping database for the EID prefixes they control. This is done using the Map-Register message. ITRs send a Map-Request message to the mapping resolvers to get the EID to RLOC mapping. Mapping resolvers use the map register database to find the RLOC of the ETR and rewrite the destination IP address of the Map-Request with that RLOC, forwarding the Map-Request packet to the ETR. The ETR then replies with a Map-Reply message carrying the actual RLOCs to be used by the ITR. One might ask why the mapping resolvers can't send the Map-Reply to the ITR themselves. The ETR is given this opportunity so that it can do inbound traffic engineering: it can give different RLOC IP addresses for different types of traffic, use different links at different times, etc.
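
The register, request and reply exchange can be mimicked with plain Python objects. This is a toy model, not the wire protocol: the class and method names are invented for the example, and the traffic_class argument is a made-up stand-in for whatever policy an ETR might apply when choosing which RLOC to return.

```python
import ipaddress

class ETR:
    def __init__(self, eid_prefix, rlocs_by_traffic):
        self.eid_prefix = ipaddress.ip_network(eid_prefix)
        self.rlocs_by_traffic = rlocs_by_traffic  # e.g. {"default": ..., "voice": ...}

    def map_reply(self, traffic_class):
        """The ETR, not the mapping system, decides which RLOC the ITR should use."""
        rloc = self.rlocs_by_traffic.get(traffic_class, self.rlocs_by_traffic["default"])
        return {"eid_prefix": str(self.eid_prefix), "rloc": rloc}

class MappingSystem:
    """Stands in for the resolvers and database: it only indexes ETRs by EID prefix."""

    def __init__(self):
        self._registered = []

    def map_register(self, etr):
        self._registered.append(etr)

    def map_request(self, dest_eid, traffic_class="default"):
        addr = ipaddress.ip_address(dest_eid)
        for etr in self._registered:
            if addr in etr.eid_prefix:
                return etr.map_reply(traffic_class)  # forwarded to the ETR
        return None  # no ETR registered for this EID

mapping = MappingSystem()
mapping.map_register(ETR("10.2.0.0/16", {"default": "203.0.113.7", "voice": "198.51.100.7"}))

print(mapping.map_request("10.2.3.4"))
print(mapping.map_request("10.2.3.4", traffic_class="voice"))
```

Because the reply comes from the ETR, the two requests above can return different RLOCs for the same EID prefix, which is the inbound traffic engineering hook described in the text.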

The mapping database, mapping resolver servers and associated message formats are described in the IETF draft "LISP Map Server Interface".

In summary, map resolvers and map database servers are used to index ETRs, and the ETRs are the ones which actually provide the EID to RLOC mapping.

The challenge really is how the index database is implemented. Note that this database can become big, as all EID prefixes are maintained in it. Searches need to be very fast, and since the database is updated by multiple ETRs, update operations also need to be fast. To take care of scalability issues, the database would need to be divided across multiple servers. One proposal I have seen uses a DHT (Distributed Hash Table).
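
As a rough illustration of dividing the database across servers, the sketch below assigns each EID prefix to a server by hashing the prefix string. This is plain hash partitioning, which is much simpler than a real DHT (no node joins or leaves, no routing between nodes), and it ignores the question of how to find the covering prefix for an arbitrary EID. The server names are made up.

```python
import hashlib

SERVERS = ["map-db-0", "map-db-1", "map-db-2"]  # hypothetical server names

def server_for(eid_prefix: str, servers=SERVERS) -> str:
    """Pick the server responsible for an EID prefix by hashing the prefix."""
    digest = hashlib.sha256(eid_prefix.encode("ascii")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

# Every ETR and resolver that uses the same function agrees on the owner of a prefix.
for prefix in ["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16"]:
    print(prefix, "->", server_for(prefix))
```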

Please see the following links:

Alternate network:  http://tools.ietf.org/html/draft-ietf-lisp-alt-1
There is a DHT alternative to this.

Summary:

The year 2012 should see LISP-based routers. The initial set of routers will have software implementations of the LISP routing functionality. Once the standards achieve a certain level of maturity, one would see vendors of Ethernet controllers (standalone or multicore based) adopting this technology in hardware.

1 comment:

Anonymous said...

Good summary about LISP. It seems to me LISP is just another protocol to allow virtualization of the network. When I read the article, I got a feeling that the LISP mechanism is very similar to Mobile IP. The Router LOCator function is so much like the Home Agent in the Mobile IP context. One difference is that the Mobile IP architecture doesn't require or define the mapping database. To the organization, they treat the MIP no different from EIP. I guess LISP will work over two ISPs (i.e., multi-homing) while MIP won't. That may be the motivation behind LISP.