Fabrics have been the talk of the industry for the last few years, but they are getting overshadowed by the buzz around software-defined networking (SDN) and OpenFlow. It is easy to see why based on what I saw at last week's Open Networking Summit (ONS). There were dozens of vendors showcasing OpenFlow-based solutions, including six (Arista, Dell, Extreme, HP, IBM and vArmour) that were showing solutions based on the controller technology from Big Switch Networks.
One of the highlights of the show was the talk by Urs Hölzle from Google, where he announced that they are 100% OpenFlow across their inter-datacenter network. This is in stark contrast to the detractors who claim there are no OpenFlow networks in production today. Working on OpenFlow and SDN, I certainly have information to the contrary, but it's difficult to announce it publicly when you work at a startup, especially when deployments are under NDA.
Now that there is a large-scale, publicly known OpenFlow network, I think it's time to start putting this technology in the context of new and interesting architectures. You may take what I say with a grain of salt, or downright dismiss it as network heresy, and that's OK. What I propose goes against everything we think we know about networks, which is exactly why I find it so compelling.
Building a (bigger) Better Bridge
Long ago we settled a debate in the network world that Layer 3 networks were more stable, easier to manage, and infinitely less complex to troubleshoot than Layer 2. Layer 3 was being pushed all the way down to the access layer, and in some cases right into the host. Then virtualization came along and Layer 2 became (begrudgingly) required again. The industry responded with the concept of a fabric, which is essentially a scalable Layer 2 architecture. Unfortunately, the ideal Layer 2 architecture from a server administrator's point of view is one that spans the entire datacenter or even stretches between data centers. This is well beyond the implementations of the most popular fabric offerings such as Cisco's FabricPath or Juniper's QFabric, especially for web-scale data centers. With the advent of OpenFlow, the concept of building a massive bridged network, without the limitations of today's Layer 2 designs, is not only possible but achievable with the technology on the table today.
The inner workings of OpenFlow are fairly simple. OpenFlow centers around the ability to program a flow table in a switch by matching patterns in a packet and performing actions on that packet. The most basic example: for any packet with a source MAC of aa:aa:aa:aa:aa:aa and a destination MAC of bb:bb:bb:bb:bb:bb, forward it to port 20 on the switch.
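To make the match/action idea concrete, here is a minimal model of an OpenFlow-style flow table in Python. This is purely illustrative (the class and field names are my own, not from any real OpenFlow library): each entry pairs a set of header fields to match with an action to take.

```python
class FlowTable:
    """Toy model of a switch flow table: ordered (match, action) entries."""

    def __init__(self):
        self.entries = []  # list of (match_dict, action) in priority order

    def add_flow(self, match, action):
        self.entries.append((match, action))

    def lookup(self, packet):
        """Return the action of the first entry whose fields all match.
        A miss returns None; in real OpenFlow, a miss is typically
        punted to the controller for a decision."""
        for match, action in self.entries:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return None


# The example from the text: src aa:... and dst bb:... -> out port 20.
table = FlowTable()
table.add_flow({"src_mac": "aa:aa:aa:aa:aa:aa",
                "dst_mac": "bb:bb:bb:bb:bb:bb"},
               ("forward", 20))

pkt = {"src_mac": "aa:aa:aa:aa:aa:aa", "dst_mac": "bb:bb:bb:bb:bb:bb"}
print(table.lookup(pkt))  # -> ('forward', 20)
```

The key point is that forwarding is driven entirely by table entries the controller installs, rather than by addresses the switch learns on its own.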
As a point of clarity, I have simplified the flow tables in all of the diagrams by referring to only two octets of each MAC address.
The single-switch example explains the basic concept, but interesting possibilities open up when you start building a network topology. An SDN controller (a control plane decoupled from the switches) can take a network-wide view and program all of the switches along the path required to get packets from Host-A to Host-B.
Think about that for a few minutes, because it has some serious implications. If an SDN controller can program a path from end to end, there is no more need for explicitly managing the physical topology. All links in the network become potential bandwidth, and everything from STP and TRILL to EIGRP and OSPF can be eliminated by an intelligent SDN controller. Even with a physically looped topology like the example below, source-destination paths are loop free, rendering physical loops irrelevant. Traffic (unicast, multicast and broadcast) can always take the path desired; whether that is the shortest path or a traffic-engineered path becomes a question of network policy.
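A sketch of how a controller could do this, under assumptions of my own (the topology, switch names, and flow format are invented for illustration): model the network as a graph, pick one loop-free path with a shortest-path search, and emit one flow entry per switch along the way. Note the physical wiring below is a triangle (a loop), yet the programmed path never loops.

```python
from collections import deque

# Hypothetical topology: switch -> {neighbor switch: local output port}.
# S1-S2-S3-S1 is physically looped.
topology = {
    "S1": {"S2": 1, "S3": 2},
    "S2": {"S1": 1, "S3": 2},
    "S3": {"S2": 1, "S1": 2},
}

def shortest_path(topo, src, dst):
    """Breadth-first search: returns the switch sequence from src to dst."""
    seen, queue = {src}, deque([[src]])
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nbr in topo[path[-1]]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

def program_path(topo, path, dst_mac):
    """One flow entry per switch on the path: match the destination MAC,
    forward out the port that leads to the next hop."""
    return [(sw, {"dst_mac": dst_mac}, ("forward", topo[sw][nxt]))
            for sw, nxt in zip(path, path[1:])]

path = shortest_path(topology, "S1", "S3")
print(path)                                  # -> ['S1', 'S3']
print(program_path(topology, path, "bb:bb"))
```

A traffic-engineered path would simply be a different `path` handed to `program_path`; the switches neither know nor care how it was chosen, which is exactly where the policy question moves into the controller.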
Your network begins to look like one big bridged network without the traditional limitations of large-scale Layer 2. Crazy, I know, but it is very possible, and networks across the globe are already running it in production.
Anyone with the slightest insight into switch internals will immediately say this is not scalable, because you will overrun the MAC address tables in even the most robust switches very quickly. Considering that a rack of virtualized servers can contain thousands or even tens of thousands of MACs, the concern appears well founded on the surface. In reality, two features of the OpenFlow architecture limit the concern fairly quickly, at least at the access layer of your network.
First, flows expire in a relatively short amount of time (usually 5-30 seconds) if they are not in use, meaning that if Host-A moves or no longer needs to communicate with Host-B, the flows will expire from the switches automatically. The net result is that the current number of flows in a switch represents an almost realtime picture of what's connected to the switch. The second benefit is that wildcard flows can be employed for the purposes of matching. For example, if multiple hosts on Switch-1 wanted to talk to Host-B on Switch-3, you could summarize them in a single flow such as: if src.MAC=* and dst.MAC=bb:bb:bb:bb:bb:bb then forward to port 15.
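Both mechanisms can be sketched together. The class below (again illustrative; the names and the 10-second timeout are assumptions of mine, not a real OpenFlow API) shows an idle-timeout flow where `*` wildcards a field, so one entry covers every source host talking to Host-B:

```python
import time

class Flow:
    """A flow entry with wildcard matching and an idle timeout."""

    def __init__(self, match, action, idle_timeout=10):
        self.match, self.action = match, action
        self.idle_timeout = idle_timeout
        self.last_used = time.monotonic()

    def matches(self, packet):
        """'*' wildcards a field; a hit refreshes the idle timer."""
        hit = all(v == "*" or packet.get(k) == v
                  for k, v in self.match.items())
        if hit:
            self.last_used = time.monotonic()
        return hit

    def expired(self, now):
        """True once the flow has sat unused past its idle timeout,
        at which point the switch would evict it."""
        return now - self.last_used > self.idle_timeout


# The wildcard example from the text: any source talking to Host-B.
flow = Flow({"src_mac": "*", "dst_mac": "bb:bb:bb:bb:bb:bb"},
            ("forward", 15), idle_timeout=10)

print(flow.matches({"src_mac": "aa:aa:aa:aa:aa:aa",
                    "dst_mac": "bb:bb:bb:bb:bb:bb"}))  # -> True
print(flow.expired(time.monotonic() + 30))             # -> True (idle > 10s)
```

The eviction behavior is what keeps the table an "almost realtime picture": entries for hosts that moved or went quiet simply age out rather than lingering like a classic MAC table entry.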
Access layer (or virtual access layer) switches can handle the number of flows needed at any given moment, but clearly aggregating those up to a distribution layer creates a challenge — one that is actually simple to overcome.
MAC Based Routing
One of the key requirements of any fabric solution is that you present the source and destination with accurate L2 and L3 information about each other. The easiest way to do this today is to carry the L2/L3 headers all the way across the network using some form of tunneling. A number of solutions use this method, including LISP, VXLAN, and NVGRE, to architect an 'overlay network' on top of your physical network. With OpenFlow, L2 information can instead be 'recreated' at the penultimate hop, thereby sparing the rest of the network from learning every MAC address and eliminating the need for tunneling within a data center.
In the example below, each switch is assigned a MAC address by the SDN controller. Because the controller knows the location of the source, the destination, and every switch in between, simple rules can be programmed to get the packet from Host-A to Host-B using MAC rewrite (which can be done in hardware at line rate), without the aggregation switch (Switch-2 in this example) ever learning any host MAC addresses.
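The hop-by-hop rewrite can be sketched as a pipeline. Everything here is hypothetical (the controller-assigned switch MAC, port names, and per-switch functions are mine): the ingress edge swaps the host destination MAC for the destination switch's MAC, the aggregation layer forwards on switch MACs only, and the egress edge restores Host-B's real MAC so both hosts still see accurate L2 information.

```python
SWITCH3_MAC = "02:00:00:00:00:03"   # controller-assigned, hypothetical
HOST_B_MAC = "bb:bb:bb:bb:bb:bb"

def switch1(pkt):
    """Ingress edge: rewrite dst MAC to the destination switch's MAC."""
    return dict(pkt, dst_mac=SWITCH3_MAC), ("forward", "uplink")

def switch2(pkt):
    """Aggregation: switches only on switch MACs; its table size scales
    with the number of switches, not the number of hosts."""
    assert pkt["dst_mac"] == SWITCH3_MAC
    return pkt, ("forward", "port-to-S3")

def switch3(pkt):
    """Egress edge: restore Host-B's real MAC before delivery."""
    return dict(pkt, dst_mac=HOST_B_MAC), ("forward", "host-port")

pkt = {"src_mac": "aa:aa:aa:aa:aa:aa", "dst_mac": HOST_B_MAC}
for hop in (switch1, switch2, switch3):
    pkt, action = hop(pkt)
print(pkt["dst_mac"])  # -> bb:bb:bb:bb:bb:bb (restored at the last hop)
```

The design choice worth noticing is in `switch2`: the middle of the network carries state proportional to the number of switches rather than the number of hosts, which is what makes this approach a substitute for tunneling.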
Larger networks with tens of thousands of possible switch destinations still risk overrunning the aggregation or core switch MAC address tables. Building a multi-tier rewrite approach can solve the problem with ease, requiring little more than a smarter SDN controller. From here it doesn't take too much imagination to express a possible switch destination with multiple MAC addresses that represent different QoS requirements, or any other network policy.
Taking this concept to a logical conclusion, we can use both the source and destination MAC fields as one big playground for policy and routing decisions. Claiming some of the bit space for a network ID (or tenant ID) would be a simple way to eliminate VLANs, QinQ, VXLAN, VRFs and MPLS. Both MAC fields can easily be rewritten at the edge, and devices in the middle of the network can use a combination of IP address and the network ID portion of the MAC addresses to provide isolation, route traffic, and apply other policy.
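One way to picture claiming bit space from the 48-bit MAC field — the split below (top 16 bits for a tenant ID, low 32 bits for a host ID) is an arbitrary assumption of mine, chosen only to make the arithmetic concrete:

```python
TENANT_BITS = 16   # assumed: high 16 bits carry the network/tenant ID
HOST_BITS = 32     # assumed: low 32 bits identify the host within it

def encode_mac(tenant_id, host_id):
    """Pack a tenant ID and host ID into a synthetic 48-bit MAC,
    as an edge switch might during rewrite."""
    value = (tenant_id << HOST_BITS) | host_id
    return ":".join(f"{(value >> s) & 0xff:02x}" for s in range(40, -8, -8))

def tenant_of(mac):
    """Recover the tenant ID — all a core device needs for isolation."""
    return int(mac.replace(":", ""), 16) >> HOST_BITS

mac = encode_mac(tenant_id=0x00a5, host_id=0x00000042)
print(mac)             # -> 00:a5:00:00:00:42
print(tenant_of(mac))  # -> 165
```

A core device applying policy would mask off only the tenant bits, giving VLAN-like isolation without consuming a separate tag field in the packet.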
Networks have refused to evolve at the same pace as server virtualization technologies. SDN and OpenFlow have the potential to revolutionize networking and enable it to meet the challenges of today's business requirements. I can't wait to see what the rest of the industry cooks up over the next few years.