A New Open Data Center Network: Disaggregating the Network Operating System from Switch/Router Gear

June 25, 2013

Introduction:

SDN proponents cite the proprietary, closed, and tightly integrated hardware/software architectures of switch/routers as an impediment to achieving network agility, more efficient bandwidth utilization, and lower costs. That’s because those boxes are complex, aren’t generally interoperable with other vendors’ gear, and are almost impossible to control using external software.

With more intelligence built into silicon from companies like Broadcom, Intel, Marvell, and others, there are fewer advantages to the proprietary, vertically integrated switch/routers, which enjoy very high profit margins (for vendors like Cisco, Juniper, Brocade, and even Arista Networks). And those L2/L3 network fabrics can’t be easily controlled by external software to meet the needs of the applications that use them for connectivity.

James Hamilton, Distinguished Engineer at Amazon Web Services, wrote in a blog post last week:

“Because networking gear is complex and, despite them all implementing the same RFCs, equipment from different vendors (and sometimes the same vendor) still interoperates poorly. It’s very hard to deliver reliable networks at controllable administration costs from multiple vendors freely mixing and matching. The customer is locked in, the vendors know it, and the network equipment prices reflect that realization.

Not only is networking gear expensive absolutely but the relative expense of networking is actually increasing over time. Tracking the cost of networking gear as a ratio of all the IT equipment (servers, storage, and networking) in a data center, a terrible reality emerges. For a given spend on servers and storage, the required network cost has been going up each year I have been tracking it. Without a fundamental change in the existing networking equipment business model, there is no reason to expect this trend will change.

What is missing is high quality control software, management systems, and networking protocol stacks that can run across a broad range of competing, commodity networking hardware. It’s still very hard to take merchant silicon ASICs packaged in ODM produced routers and deploy production networks. Very big datacenter operators actually do it, but it’s sufficiently hard that this gear is largely unavailable to the vast majority of networking customers.”

Enter Cumulus Networks:

A completely different approach was announced last week by Cumulus Networks, a start-up emerging from stealth mode that has built a Linux-based operating system for “bare metal” or commodity switches.

Cumulus Networks, founded by ex-Cisco engineers JR Rivers and Nolan Leake, announced a Linux-based real-time operating system to control the commodity switches (that use merchant silicon) being built by Quanta, Accton, Foxconn, and other ODMs. The company has raised money from investors that include Andreessen Horowitz, Battery Ventures, and VMware founders Diane Greene and Mendel Rosenblum.

Cumulus will provide a simple, open, and stable Linux-based Network OS that enables ODMs and merchant silicon manufacturers to provide multiple, common commodity hardware platforms for switching and routing.  The company calls that “Disaggregating the Network” as shown in the before and after illustrations below.

The rise of merchant switch silicon and “bare metal” switches enables a paradigm shift to an open network ecosystem.

Cumulus Networks images showing the disaggregating of network gear from the Network Operating Systems.

Networking Gear and Networking OS locked onto one vendor. Image courtesy of Cumulus Networks.

After

Image depicting an unlocked Network OS with commodity hardware.

Disaggregate the Networking Gear from the Network OS. Image courtesy of Cumulus Networks.

The company claims that:

“Cumulus Linux is the first true Linux OS.  It enables users to take full advantage of the latest industry standard networking hardware while enabling the latest Linux applications and automation tools, delivering new levels of agility, scalability and flexibility to the enterprise data center. Now you can choose from a number of native Linux applications, and third party applications can be integrated as add-on Linux packages to optimize your business. Cumulus Linux enables a consistent experience between the network and the compute, and it brings about the missing piece to fuel the next wave of scale, collaboration and innovation in networking, realizing the full extent of a Software Defined Data Center.”

The advantages of this new approach to data center networking are claimed to be:

  • Delivers unprecedented price-performance
  • Enables the next wave of scale, collaboration and innovation
  • Simplifies orchestration, automation and monitoring of networks
  • Consistent Linux-based toolsets for network and compute
  • Lower OPEX/CAPEX
  • Break free from vendor lock-in
  • Enables large ecosystem of native Linux apps

Cumulus plans to generate revenue through licensing of the Linux OS, maintenance, support, and yet-to-be-announced feature sets layered on top of the OS. That is the same business model that Red Hat has around Linux for compute servers.

“Linux revolutionized the compute-side of the datacenter over the past 15 years. Having a common OS broke vendor lock-in, drove down server hardware cost, allowed scale-out architectures, and provided a common platform for innovations like virtualization. Meanwhile networking remained stagnant,” said JR Rivers, co-founder and CEO of Cumulus Networks. “Innovation is finally coming to the network, and we are bringing that same transformational impact that Linux has had on datacenter economics and innovation to the networking side of the house.”

Piston Cloud Computing, Inc., focused on enterprise OpenStack,  announced a technology partnership with Cumulus Networks last Wednesday, June 19th.

Whither SDN?

SDN is another way to open up the data center network and create a more open ecosystem. However, Gigaom reported that SDN was not a hot topic at its annual Structure conference last week in San Francisco, CA. “But SDN was also deemed not relevant for a variety of use cases, and it was also roundly declared a loser, and something that hasn’t really changed in the years since it has hit the network scene.”

Yet SDN continues to make progress. More and more companies are joining the Open Networking Foundation and the OpenDaylight consortium. Dan Pitt, executive director of the Open Networking Foundation, was quoted elsewhere as saying:

“The broad turnout for implementing OpenFlow 1.3, specifically using merchant silicon, at the PlugFest demonstrated growth in deployment of OpenFlow–based SDN.  The successful interoperability testing of OpenFlow 1.3 on hardware switches and controllers is a significant step towards offering our end users interoperable products for accelerated adoption of the OpenFlow protocol and SDN.”

Note:  Dan emailed me a comment which he was good enough to revise and post in the Comment box below this article.  Hopefully, someone with more knowledge of this subject than me will post a reply comment.

Industry Comments on Cumulus Networks:

Scott Thompson of FBR Research:

“Cumulus Networks opens the floodgates for price compression in network hardware. Cumulus Networks emerged from stealth-mode operation last Wednesday. We view Cumulus as a key strategic piece of the puzzle to enable hyperscale, service provider, and large enterprises to transition to bare metal switching and routing platforms. We view Cumulus as the first truly credible and scalable organization with the expertise necessary to unlock the value of truly commoditized hardware. We view the competitive and strategic threat enabled by Cumulus to have a much greater impact than the direct threat from the start-up itself.”

James Hamilton of Amazon:

“One of my favorite startups, Cumulus Networks, has gone after exactly the problem of making ODM produced commodity networking gear available broadly with high quality software support. Cumulus supports a broad range of ODM produced routing platforms built upon Broadcom networking ASICs. They provide everything it takes above the bare metal router to turn an ODM platform into a production quality router.  Included is support for both layer 2 switching and layer 3 routing protocols including OSPF (v2 and V3) and BGP.  Because the Cumulus system includes and is hosted on a Linux distribution (Debian), many of the standard tools, management, and monitoring systems just work. For example, they support Puppet, Chef, collectd, SNMP, Nagios, bash, python, perl, and ruby.

Rather than implement a proprietary device with proprietary management as the big networking players typically do, or make it look like a Cisco router as many of the smaller players often do, Cumulus makes the switch look like a Linux server with high-performance routing optimizations. Essentially it’s just a routing optimized Linux server.

Cumulus supported platforms include Accton AS4600-54T (48x1G & 4x10G), Accton AS5600-52x (48x10G & 4x40G), Agema (DNI brand) AG-6448CU (48x1G & 4x10G), Agema AG-7448CU (48x10G & 4x40G), Quanta QCT T1048-LB9 (48x1G & 4x10G), and Quanta QCT T-3048-LY2 (48x10G & 4x40G).”

Andreessen Horowitz:

“Andreessen Horowitz is betting heavily on the transformation of the datacenter from something that was traditionally hardware-centric to a new world where the intelligence lives in software. Nicira was an investment that addressed a key part of this, and now Cumulus Networks is filling another critical piece on the networking side,” said Peter Levine, Partner, Andreessen Horowitz. “The recent announcement from Facebook’s Open Compute Project underscored this need for a Linux OS for networking. Clearly the need is massive. And the opportunities for enterprises and service providers to drive massive new efficiencies in the datacenter is massive as well.”

VMware:

“There is an emerging network architecture being adopted by enterprises and service providers consisting of intelligent edge software, decoupled from the underlying physical network, running over general purpose network hardware. There are many benefits to this architecture, such as a software operational model and software innovation speeds, but the biggest benefit is customer choice,” said Hatem Naguib, vice president of networking and security, VMware. “Cumulus Linux provides customers more flexibility in choosing the underlying infrastructure used to deploy network virtualization from VMware.”

Conclusions:

The Cumulus approach is a clear alternative to SDN for open networking. It is compatible with VMware’s network virtualization and OpenStack’s cloud orchestration software, which appear as “applications” that access the Linux OS to control the physical network hardware (e.g. “bare metal” commodity switches made by ODMs).

The revolutionary disruptive force here is that the “bare metal” switch/routers must use the Cumulus Networks Linux-based Network OS to control their operations and schedule real-time tasks. That’s never been tried before in any networking equipment I’m aware of, ever! It remains to be seen how many ODM “bare metal” switches will support this new paradigm rather than use their own proprietary real-time Network OS.

The “bare metal” switch/routers from ODMs are ultra low cost and use merchant silicon for switching/routing functions. Those ODMs (with or without a disaggregated Network OS) will threaten the high profit margin L2/L3 switch makers, especially Cisco and Juniper!
It will be very interesting to see how Cisco, Juniper, Brocade, Dell (Force10), and Arista Networks respond to this competitive threat.

17 Responses to A New Open Data Center Network: Disaggregating the Network Operating System from Switch/Router Gear

  1. Anonymous
    July 3, 2013 at 6:01 pm

    Great article, but where’s the follow-up? One pundit wrote that he wants to love Cumulus Networks, but there are good precursors to their model that have been around for a while.
    “Is it too bold? Too soon? I wish I knew.”

    http://www.forwardingplane.net/2013/07/i-want-to-love-cumulus-networks/

  2. Anonymous
    June 28, 2013 at 4:12 pm

    The key to making the Cumulus Networks scheme work is the Open Network Install Environment (ONIE), an installer environment for bare metal switches. It is a small Linux operating system (OS) that comes pre-installed on bare metal switches to enable complete separation between the networking gear and the network operating system. ONIE is network OS agnostic and allows manual and out-of-the-box installs of a networking OS. This installer environment is available at github.com/onie/onie and is covered under the GNU General Public License v2 (GPLv2).
    http://cumulusnetworks.com/product/open-innovations/

    • June 28, 2013 at 7:49 pm

      Thanks for that info! I was wondering how the bare metal switches could totally separate their operations from the Network OS. I wonder how Cumulus will make money, as they say their software is open source (i.e. free): “Cumulus Networks is committed to delivering open source innovation, embracing the full potential of an open ecosystem”

  3. Anonymous
    June 28, 2013 at 10:40 am

    Research and Markets Adds Report: Global Software-defined Networking Market 2012-2016. Analysts forecast the Global Software-defined Networking market to grow at a CAGR of 151.12 percent over the period 2012-2016. One of the key factors contributing to this market growth is the increasing virtualization in network environments. The Software-defined Networking market has also been witnessing the technological awareness through the Open Networking Foundation. However, the uncertainty in the market due to it being the early adoption stages of SDN could pose a challenge to the growth of this market.
    http://www.researchandmarkets.com/reports/2365989/global_softwaredefined_networking_market
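
To put the quoted growth rate in perspective, here is a minimal arithmetic sketch of what a 151.12 percent CAGR implies. It assumes four compounding periods between 2012 and 2016, which the report summary above does not state explicitly; the function name is illustrative only.

```python
def growth_multiple(cagr_percent, periods):
    """Total growth factor implied by a compound annual growth rate."""
    return (1 + cagr_percent / 100) ** periods


# Assumption: 2012 -> 2016 is treated as four annual compounding periods.
multiple = growth_multiple(151.12, 4)
print(f"Implied market size multiple: {multiple:.1f}x")
```

Under that assumption, the forecast amounts to a market roughly 40 times its 2012 size by 2016.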

  4. anonymous
    June 27, 2013 at 2:21 pm

    Infonetics Research: Data center equipment takes hit in 1Q13!

    “Following 2 strong years of investment, growth in the data center equipment market is starting to slow,” notes Matthias Machowinski, directing analyst for enterprise networks and video at Infonetics Research. “While the near term outlook remains positive, ultimately we think the market is headed for a peak, as data center operators improve infrastructure utilization, and adoption of cloud services moves hardware consumption from enterprises to large-scale data center operators.” http://www.infonetics.com/
    …………………………………………………………………
    So all this hype about SDN, Network Virtualization, Disaggregated Network OS, bare metal/commodity switches, etc for the “new Data Center” will likely be a smaller total market than the new stakeholders thought! That means many start-ups will go belly-up, similar to what happened after the Dot Com/fiber optic boom went bust!

  5. Quang Dai Tam
    June 27, 2013 at 9:44 am

    Thanks for a great article with terrific comments! Here’s an interesting video chat featuring Rackspace Exec and Cumulus Networks’ co-Founder & CEO, JR Rivers

  6. June 25, 2013 at 10:40 pm

    Arista Networks provides a highly scalable, proven network OS stack, designed to integrate quickly with whichever silicon vendor is chosen. The fact that it is sold integrated with hardware from Arista does not preclude this model from changing in the future. Creating a strong network OS distro makes sense right now. It is in the early stages, and it will be interesting to see this evolve…

  7. June 25, 2013 at 6:18 pm

    Alan, I do not see the Cumulus approach as an alternative to SDN. I like their use of bare metal server-switches, but if all you do is lower the price point of complex L3 routers without changing the practice of distributed control and reliance on 6000 RFCs you haven’t solved the Opex problem, which in my view is bigger than the Capex problem. The infrastructure Cumulus advocates actually makes it very easy to add SDN and OpenFlow capability, through the same software download (of an OpenFlow client and management) that they use now for the routing code. I think the main novelty of Cumulus is that they rent the routing code (per router per month) and only bundle the hardware (at cost) if you insist.

    • June 26, 2013 at 8:40 pm

      Thanks Dan. I wonder if anyone will use the SDN OpenFlow protocol in the Cumulus Networks Linux OS model. If they do, won’t that produce a duplicate control plane: one in the “SDN-OpenFlow” client and another in the routing protocol (OSPF or BGP) invoked in the Cumulus Linux OS, which resides in the “Bare Metal” switches? Cumulus does show OpenFlow as one of several clients or “Linux Apps” in their 3rd illustration at: http://cumulusnetworks.com/rethink_network/

    • Al Koenig
      June 29, 2013 at 1:57 pm

      The key issue is distributed control planes like we have today in all routers. What advantage does SDN have by centralizing the control plane in a SDN Controller?

  8. June 25, 2013 at 5:52 pm

    Alan – I think I can answer your questions, although I am not affiliated with Cumulus in any way:

    1. The OS is not moved out of the bare metal switch. Modern switches have an x86-based control processor which is where the network OS runs – in essence they look like servers with lots of ports and a dedicated ASIC for packet processing. Most of the newer switches, like Arista, and my company’s (Plexxi) switches, also run Linux as the control OS. The difference in the Cumulus model is that they *sell* the SW separately from the HW. Most vendors either custom build the HW or buy off-the-shelf or custom spec’d ODM hardware, ensure the HW and SW work well together and sell the combination as an “appliance”. Cumulus takes off-the-shelf ODM hardware and ensures their software runs well on it, and leverages channel partners to provide combined fulfillment of HW and SW (I believe).

    2. The control plane (which runs in the control OS, which runs on the bare metal switch control processor) runs the routing protocols and determines how to populate the forwarding tables in the Broadcom (or other vendor) silicon. So in essence the control planes that you reference are one and the same. Broadcom silicon does not do route computation and path selection by itself; it needs to be told what to do with packets and can forward them at high speeds based on table lookups that are populated by control plane protocols.
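
The split described in point 2 can be illustrated with a small sketch: a control-plane routine installs routes into a table, and the forwarding step only performs a lookup. This is a toy model in Python, not how Cumulus Linux or Broadcom silicon is actually programmed; the class, function, prefixes, and port names are all hypothetical.

```python
import ipaddress


class ForwardingTable:
    """Toy stand-in for the lookup table held in switch silicon."""

    def __init__(self):
        self._routes = {}  # ip_network -> egress port name

    def install(self, prefix, port):
        """Called by the control plane once it has chosen a route."""
        self._routes[ipaddress.ip_network(prefix)] = port

    def lookup(self, address):
        """Data-plane step: longest-prefix match, no route computation."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self._routes[best]


def control_plane_populate(table):
    # Hypothetical routes, standing in for what OSPF or BGP would compute.
    table.install("0.0.0.0/0", "uplink")
    table.install("10.0.0.0/8", "port1")
    table.install("10.1.0.0/16", "port2")


fib = ForwardingTable()
control_plane_populate(fib)
print(fib.lookup("10.1.2.3"))  # port2: the /16 is the most specific match
```

The point of the sketch is that `lookup` never decides where traffic should go; it only applies what the control plane installed, which mirrors the relationship between the control OS and the forwarding ASIC.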

    • June 25, 2013 at 6:14 pm

      Mat, Many thanks for your explanation, which raises 2 new ?s:
      1. How do Applications and/or VMware’s Network Virtualization SW, that run on the Compute Server, gain access to the Linux Network OS if it is in the bare metal switches? How do those Higher Layer functions control or interact with the Physical Network of Bare Metal switches?
      2. (From a reader that emailed me): This scheme doesn’t end “the practice of distributed control and reliance on 6000 RFCs. Hence, it doesn’t solve the OPEX problem, which is bigger than the CAPEX problem.”

      • Mat Mathews
        June 26, 2013 at 8:01 am

        Depends on which applications you are referring to… The actual compute applications (e.g. Oracle, Exchange, etc) would not interact with the Linux Network OS in any way, at least in the current form as they have described in their literature. Currently their model is to run traditional routing/switching protocols on the Network OS – these protocols determine path by exchanging reachability information between devices – but they don’t accept external input from applications in any way.

        So essentially, the Cumulus model is very much traditional networking. Their innovation is in the pricing models and the fact that the Linux network tools (not sure if you are referring to these as apps) that normally configure the Linux SW network stack on a server now configure the Broadcom silicon.

        Newer “Software-defined Networking” models look to change the nature of allowing distributed protocols to determine path and open up APIs for applications to tell the network what to do (via some sort of network controller SW) that is translated to network hardware specifics (table entries, etc).
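
As a rough sketch of that controller model, the Python below has a central function compute a path over a known topology and translate it into per-switch table entries. It is not OpenFlow or any real controller API; the topology, switch names, and entry format are hypothetical, and breadth-first search stands in for whatever path computation a real controller would use.

```python
from collections import deque


def shortest_path(topology, src, dst):
    """Breadth-first search over an adjacency dict; returns a list of switches."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # dst is unreachable from src


def flow_entries(path, dst_host):
    """Translate a path into one (match, next hop) table entry per switch."""
    entries = {}
    for here, nxt in zip(path, path[1:]):
        entries[here] = {"match_dst": dst_host, "next_hop": nxt}
    entries[path[-1]] = {"match_dst": dst_host, "next_hop": "local"}
    return entries


# Hypothetical four-switch topology known to the central controller.
topology = {
    "s1": ["s2", "s3"],
    "s2": ["s1", "s4"],
    "s3": ["s1", "s4"],
    "s4": ["s2", "s3"],
}
path = shortest_path(topology, "s1", "s4")
for switch, entry in flow_entries(path, "host-b").items():
    print(switch, entry)
```

Here the switches hold only the entries they are handed, while the path decision is made in one place. That is the contrast with the distributed routing protocols described earlier in this thread.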

  9. Ganesan
    June 25, 2013 at 9:53 am

    Cumulus Linux is quite interesting but making the switch look like a Linux server is not a new idea. Arista EOS essentially does the same thing. Quoting from “https://eos.aristanetworks.com/2011/05/the-joy-of-an-open-switch-operating-system/”:

    “A switch can be a plain old server with only one material difference: an extra PCI device, the forwarding ASIC, connected to a bunch of ports. But at its core, an Arista switch is really just a server.”

    EOS provides a Cisco like CLI wrapper, but you do get direct access to the bash shell and can install Fedora Linux rpms (Arista EOS is based on Fedora Linux).

  10. June 25, 2013 at 8:53 am

    Thanks Alan for writing this article. As you say Cumulus Networks’ solution seems like it could be very disruptive to the legacy vendors. It seems like this would make it a candidate to be acquired by one of those very same vendors.

    • June 25, 2013 at 3:09 pm

      Thanks Ken. Here are my comments about Cumulus Networks’ open network model, which disaggregates the Network OS from the L2/L3 hardware:
      1. The key unknown is how the “Bare Metal” commodity switches interface to the Linux Network OS to do various functions, e.g. task scheduling, routing/switching path selection, failure detection & recovery, etc. All other networking gear EVER produced uses its own (mostly proprietary) embedded OS to control all real-time software operations. A good example is Cisco’s IOS.

      2. While the cost of the L2/L3 networking gear is drastically reduced, there is still the problem of distributed control planes in the interconnected “Bare Metal” switches. This is in sharp contrast to SDN, which separates the control and data planes. The former is centralized (not distributed) while the latter is resident in a data forwarding engine that does NOT do path set-up or re-route the path on restoration after failure recovery.
