Glen Turner (gdt) wrote 2013-07-07 08:27 am

SDN: Week 2: Control plane and data plane separation

Introduction

Control plane: the logic that controls the forwarding plane
eg: routing protocols, firewall configuration

Data plane: forwards traffic according to configuration by control plane
eg: IP forwarding, ethernet switching

Why separate?

  • Independent evolution and development. Especially software control of network.

  • Control from a single high-level software program. Easier to reason about and debug.

Why does it help?

  • Routing

  • Enterprise networks. Security

  • Research networks. Coexistence with production networks.

  • Data centres. VM migration. eg: Yahoo have 20,000 hosts, 400,000 VMs. Want sub-second VM migration. Program switches from a central server, so that forwarding follows migration.

  • eg: AT&T filtering DoS attacks. IRSCP (commercialised RCP) will insert a null route to filter DoS at network edge.

Challenges for separation

  • Scalability. Control element responsible for thousands of forwarding elements

  • Reliability and security. What if a controller fails or is compromised?

Opportunities for control and data separation

Two examples

  • New routing services in the wide area. Maintenance, egress selection, security.

  • Data centres. Cost, management.

Example 1. Wide area.
Today interdomain routing policy can only be set in a few constrained ways, all through BGP.
Limited knobs, no external knowledge (time of day, reputation of route, etc).
Instead of BGP, a route controller updates the forwarding table.

Example 1. Maintenance dry-out
Planned maintenance of an edge router.
Tell ingress routers to avoid the router with pending maintenance.
Too difficult to do in existing networks, eg, by tuning OSPF.

Example 2. Customer controlled egress router
Customer selects data centre they want to use
Difficult in existing networks, as routing uses destination prefix.

Example 3. Better BGP security.
Offline we can determine reputation of a route. But this can't be incorporated into BGP route selection.
Off-line anomaly detection. Prefer "familiar" routes over unfamiliar routes. RCP tells routers to avoid odd routes.
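
A minimal sketch of the "prefer familiar routes" idea, in Python. Everything here is an assumption made for illustration: the RouteHistory class, the min_sightings threshold, and the use of the origin AS as the measure of familiarity are not taken from the lecture or from RCP itself.

```python
from collections import defaultdict


class RouteHistory:
    """Hypothetical offline history: how often each origin AS announced a prefix."""

    def __init__(self, min_sightings=3):
        # Assumed threshold: a route is "familiar" after this many sightings.
        self.min_sightings = min_sightings
        self.seen = defaultdict(dict)  # prefix -> {origin AS: count}

    def observe(self, prefix, origin_as):
        counts = self.seen[prefix]
        counts[origin_as] = counts.get(origin_as, 0) + 1

    def is_familiar(self, prefix, origin_as):
        return self.seen.get(prefix, {}).get(origin_as, 0) >= self.min_sightings


def choose_route(history, prefix, candidates):
    """Pick from (origin_as, next_hop) candidates, preferring familiar origins.

    If no candidate is familiar, fall back to the first one so the prefix
    stays reachable.
    """
    familiar = [c for c in candidates if history.is_familiar(prefix, c[0])]
    pool = familiar or candidates
    return pool[0]


history = RouteHistory()
for _ in range(3):
    history.observe("192.0.2.0/24", 64500)

candidates = [(64666, "r2"), (64500, "r1")]
chosen = choose_route(history, "192.0.2.0/24", candidates)
print(chosen)
```

The route controller would then tell routers to use the chosen next hop, which is the part BGP's own route selection cannot express.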

Example 4. Data Centres, costs
Reduce cost. 200,000 servers. A 20x fanout gives 10,000 switches. Huge saving between $1000 per switch ($10m) and $5000 per switch ($50m): $40m per data centre. Across ten such data centres that's $400m for Google, Facebook, Yahoo, etc.
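
The switch-cost arithmetic can be checked with a few lines of Python. This is a sketch using only the figures in these notes; it assumes a single layer of switches (one per 20 servers) and ignores aggregation and core switches.

```python
servers = 200_000
fanout = 20                      # servers per switch, from the notes
switches = servers // fanout     # one layer of switches only (assumption)


def fleet_cost(price_per_switch):
    """Total switch cost for one data centre at the given unit price."""
    return switches * price_per_switch


commodity = fleet_cost(1_000)    # merchant-silicon switch
vendor = fleet_cost(5_000)       # vendor switch
saving = vendor - commodity      # per data centre

print(switches, commodity, vendor, saving)
```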
So these networks run a separate control plane on merchant silicon. Tailor network for services. Quick innovation.

Example 5. Data Centres, addressing
Layer 2 addressing: less configuration, bad scaling
Layer 3 addressing: use existing routing protocols, good scaling, but high administration overhead.
Use layer 2 addressing, but make the addresses topology-specific rather than topology-independent.
MAC addresses depend on where they are in the topology.
Hosts don't know they have had their MAC address re-assigned, so how is ARP done? Destination host won't respond.
A "fabric manager" intercepts ARPs and replies with the topology-dependent Pseudo-MAC (pMAC).
Switches re-write MAC addresses at network edge to hosts.
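
A rough Python sketch of the pMAC mechanism follows. It assumes a PortLand-style pMAC layout of pod (16 bits), position (8 bits), port (8 bits) and VM id (16 bits); the notes do not give the layout, and the FabricManager and EdgeSwitch classes and their method names are hypothetical.

```python
def make_pmac(pod, position, port, vmid):
    """Pack topology coordinates into a 48-bit pMAC (assumed layout)."""
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(value >> shift) & 0xff:02x}" for shift in range(40, -8, -8))


class FabricManager:
    """Central directory that answers ARP requests with pMACs."""

    def __init__(self):
        self.ip_to_pmac = {}

    def register(self, ip, pmac):
        self.ip_to_pmac[ip] = pmac

    def handle_arp_request(self, target_ip):
        # None means the mapping is unknown; a real system would need a fallback.
        return self.ip_to_pmac.get(target_ip)


class EdgeSwitch:
    """Rewrites pMAC back to the host's actual MAC on the last hop."""

    def __init__(self):
        self.pmac_to_amac = {}

    def learn(self, pmac, actual_mac):
        self.pmac_to_amac[pmac] = actual_mac

    def rewrite_to_host(self, dst_mac):
        return self.pmac_to_amac.get(dst_mac, dst_mac)


pmac = make_pmac(pod=1, position=2, port=3, vmid=1)
manager = FabricManager()
manager.register("10.0.0.2", pmac)

edge = EdgeSwitch()
edge.learn(pmac, "aa:bb:cc:dd:ee:ff")

arp_reply = manager.handle_arp_request("10.0.0.2")
delivered_to = edge.rewrite_to_host(arp_reply)
print(arp_reply, delivered_to)
```

The sending host only ever sees the pMAC, so switches in the core can forward on the topology prefix, while the destination host still receives frames addressed to its real MAC.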

Example N. Others
Dynamic access control
Mobility and migration
Server load balancing
Network virtualisation
Multiple wireless access points
Energy-efficient networking
Adaptive traffic monitoring
DoS detection.

Challenges of separating control and data planes

Scalability, reliability, consistency. Approaches in RCP and ONIX.

Scalability in RCP.
RCP must store routes and compute routing decisions for all routers in the AS. That's a lot to do at a single node. Strategies to reduce this are:
Eliminate redundancy: store a single copy of each route to avoid redundant computation.
Accelerate lookups: maintain indexes to identify affected routers. Then RCP computes routes only for routers affected by a change.
Punt: RCP only performs inter-domain (BGP) routing and leaves intra-domain routing to the IGP.
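
The first two strategies can be sketched in Python as below. This is an illustration only, not RCP's actual data structures: the RouteStore class, its method names and the (prefix, egress) key are all assumptions.

```python
from collections import defaultdict


class RouteStore:
    """Hypothetical store: one copy of each route plus an index of its users."""

    def __init__(self):
        self.routes = {}                # (prefix, egress) -> attributes, stored once
        self.index = defaultdict(set)   # (prefix, egress) -> routers using that route

    def add_route(self, prefix, egress, attributes):
        self.routes[(prefix, egress)] = attributes

    def select(self, router, prefix, egress):
        # A router uses one route per prefix, so drop it from the others first.
        for (p, _), routers in self.index.items():
            if p == prefix:
                routers.discard(router)
        self.index[(prefix, egress)].add(router)

    def withdraw(self, prefix, egress):
        """Remove a route and return only the routers that must be recomputed."""
        self.routes.pop((prefix, egress), None)
        return self.index.pop((prefix, egress), set())


store = RouteStore()
store.add_route("203.0.113.0/24", "e1", {"as_path_len": 2})
store.add_route("203.0.113.0/24", "e2", {"as_path_len": 3})
store.select("r1", "203.0.113.0/24", "e1")
store.select("r2", "203.0.113.0/24", "e1")
store.select("r3", "203.0.113.0/24", "e2")

affected = store.withdraw("203.0.113.0/24", "e1")
print(sorted(affected))
```

When the route via e1 is withdrawn, only r1 and r2 need new decisions; r3 is untouched.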

Scalability in ONIX.
Partitioning. Keep track of subsets of the network state. Then apply consistency measures to ensure consistency between the partitions. Choice of strong and weak consistency models to select correctness versus computation tradeoff.
Aggregation. A hierarchy of controllers. ONIX controllers for departments or buildings, then a super-controller for the domain.

Reliability in RCP.
Replicate. RCP has a hot spare. Each replica has its own feed of routes, receiving exactly the same inputs and running exactly the same algorithms, so the outputs should be the same. So there is no need for a consistency protocol.
Consistency. But if different RCPs see different routes then they will produce different outputs. If the two replicas are inconsistent then they can install a routing loop. Need to guarantee consistent inputs: for RCP that's easy as the IGP passes the full link-state to RCP. So RCP should compute next-hops only for routers it is connected to.
For example, one RCP in a partitioned network. Only use candidate routes from partition 1 to set next-hops in partition 1.
For example, two RCPs in a partitioned network. Since the two RCPs have the same data for each partition from the IGP, they give the same output for each partition.
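
A small Python sketch of the partition rule: each router is only given an egress that lies in its own partition of the IGP topology. The function names are hypothetical, and choosing the alphabetically first usable egress is a placeholder for RCP's real selection; the point is only that the choice is deterministic.

```python
def reachable_from(links, start):
    """Routers in the same IGP partition as start (links: router -> neighbours)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def assign_next_hops(links, candidate_egresses, routers):
    """Give each router an egress from its own partition, or None if there is none."""
    assignments = {}
    for router in routers:
        partition = reachable_from(links, router)
        usable = sorted(e for e in candidate_egresses if e in partition)
        assignments[router] = usable[0] if usable else None
    return assignments


# Two partitions: {a, b} and {c, d}. Egress routers are b and d.
links = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
result = assign_next_hops(links, {"b", "d"}, ["a", "b", "c", "d"])
print(result)
```

Because the function depends only on the link-state and the candidate routes, two replicas fed the same IGP data compute the same assignments without talking to each other.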

Reliability in ONIX.
Network failures. ONIX leaves it to applications to detect and recover.
Reachability to ONIX. Solve using typical network practices, such as multipath.
ONIX failure. Replication and distributed coordination protocol.

Summary.
Three issues:

  • Scalability. Making decisions for many network elements.

  • Reliability. Correct operation under failure of the network or controller.

  • Consistency. Ensure consistency between controller replicas, especially in partitioned networks.

Solutions:

  • Hierarchy

  • Aggregation

  • State management and distribution

Each controller uses a set of tactics from those available.

Tutorial

Download OpenFlow tutorial (URL fixed as per class e-mail).
You need a "host only adapter", vboxnet0. That will allow SSH to the VM from the localhost.
You get an Ubuntu login. Username and password are mininet/mininet.
Configure the host only adapter. Configure eth1 by running DHCP (eg, dhclient eth1). Run ifconfig to determine the IP address, then connect using "ssh -X mininet@....".

Start mininet
sudo mn --topo single,3 --mac --switch ovsk --controller remote

"nodes" shows hosts, switches and controllers
To run a command on a host say {host} {command}, eg: h1 ifconfig
You should be able to start an XTerm with "xterm h1 h2 h3" if you logged in with X11 forwarding.

Follow the tutorial and have a play around.