These are some highlights

1. Northbound API and applications
- wide area networking and interdomain routing
- programming and debugging

2. Control plane
- security, data leak prevention
- combining big data and network mgnt
- orchestration - coordination of increasingly diverse network devices

3. Data plane
- moving beyond (match, action)

New applications and services
- SDN is a tool, not a killer application

Wide area networking
- brittle
- indirect mechanisms
- dst prefix
- influence only neighbour ASs
- use of SDN in IX might be disruptive
- business relationships are coarse-grained; SDN allows more subtlety.

Programming and debugging
- getting easier
- but still difficult
- composition between controllers is hard
- making debugging and testing easier

- currently no accountability
- security properties are difficult to verify and enforce
- data leaks common

Big data
- no existing technology takes advantage of huge amount of data about networks
- aim is intelligent network controllers

Orchestration and beyond (match, action)
- SDN is any controller controlling multiple data plane units
- still need a unifying control framework for coordination of increasingly diverse network devices.

Summary: open problems
- northbound API: programming, new applications
- control plane: orchestration
- data plane: beyond (match, action)
SDN paradigm is switches performing (match, action), both match and action are limited
- no payload examination
- no advanced actions

Expanding this requires a practical approach
- extending OpenFlow
- implementing a richer data plane in the controller: flexible, but at a cost in performance
- send traffic through custom devices: middleboxes, custom hardware

Middlebox orchestration -- "Slick"
- installs code on middleboxes
- if the middleboxes raise events then Slick handles them
- installs OpenFlow rules on switches to divert desired traffic through the middleboxes

Slick elements
- arbitrary code. Functions implement Slick API. Raises trigger at controller.
- self-describing manifest: hardware requirements (so only placed on a middlebox which can run the element), triggers, network requirements (eg, seeing both directions of traffic, or all traffic to a host, etc)

Slick application
- implements network policies: which elements to run on which traffic, how to react to changes in network conditions
- controller abstracts policy away from the application (where to place elements, how traffic should be routed)

Slick controller
- manages and configures a network of middleboxes. Resource discovery. Deploys elements. Ensures element availability in the face of failure.
- implements policies for the application. Where to place elements. How to steer traffic to elements.

Eg: Dynamic redirection
- inspect all DNS traffic with a DPI device
- if suspicious lookup, send to traffic scrubber

Custom data plane orchestration
- put Custom Packet Processors in network
- use a device abstraction layer to allow programming of the Processors (individually, and as a fabric)

- big data
- encryption, transcoding, classification
- selective DPI

- SDN is broader than OpenFlow and (match, action)
- SDN is about separating data plane and control plane
- That allows orchestration
Two challenges
1. Visibility of usage and problems
2. Controls to better manage resources, usage caps, etc

Home network configuration is complex and low level. Too much complexity is exposed to the user, yet there is still no visibility or control: "how much data is left on my quota", "which application is using my bandwidth", "who is using the bandwidth".

Use SDN to centralise control
- controller outside the home, monitoring the home router
- build GUIs on top of the controller

- based on resonance
- tracks usage caps by device, shift hungry devices from normal to capped state

Eg 1: Home network management
- outsource home mgnt
- key problem is usage control
- users don't understand how different devices use the cap
- UCAP -- users can see how different devices contribute. Users can then control and cap devices' use. All implemented by resonance.
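
The cap tracking described above can be sketched as a two-state machine per device. This is only an illustration of the idea, not the UCAP or Resonance code; the class name `UsageTracker`, the `cap_bytes` parameter and the method names are all hypothetical.

```python
from collections import defaultdict

NORMAL, CAPPED = "normal", "capped"


class UsageTracker:
    """Tracks bytes used per device and moves heavy devices to a capped state."""

    def __init__(self, cap_bytes):
        self.cap_bytes = cap_bytes            # hypothetical per-device cap
        self.usage = defaultdict(int)         # device -> bytes used so far
        self.state = defaultdict(lambda: NORMAL)

    def record(self, device, nbytes):
        """Account for traffic and return the device's (possibly new) state."""
        self.usage[device] += nbytes
        if self.usage[device] > self.cap_bytes:
            self.state[device] = CAPPED
        return self.state[device]

    def report(self):
        """Per-device usage, largest first, as a user-facing view might show it."""
        return sorted(self.usage.items(), key=lambda kv: kv[1], reverse=True)
```

A real controller would react to the state change by installing different flow rules for the capped device; here the state is simply returned.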

Eg 2: Slicing home network
- allow SPs (phone, video, etc) direct access to home infrastructure
- use flowspace concept to allocate network resources.
Google has two backbones: user and inter-data centre.

Google has many WAN-intensive applications. The cost per bit doesn't decrease as network size increases.
- complexity of pairwise interactions
- manual management
- non-standard vendor APIs

Solution: "WAN fabrics"
- manage it as a fabric, not as individual boxes
- many boxes and protocols are box-centric
- distributed routing protocols aren't aware of congestion caused by each other
- want centralised traffic engineering: better network utilisation, faster convergence, more control, can specify intent, mirror production event streams for testing, modern server equipment for controller (about 50x faster than a router's CPU)

SDN also helps testing
- decentralised networks require full scale testbed replica of network to test new TE features
- centralised control can use the real production network as input, use a replica controller with production binaries, but virtualised switches. This is exactly what is done with mininet.

- choose hardware based on features
- choose software based on protocols
- logical centralised TE: easier mgnt, easier testing
- separates monitoring, mgnt and ops from individual boxes.
Limitations of BGP
- routing only by dst prefix (no customisation by application, sender, etc)
- influence only over immediate neighbours, not the end-to-end path
- only indirect expression of policy (med, prepend, etc)

Evolve the inter-domain routing at an IX
- lots of people connect, to lots of benefit
- IXs looking for differentiation
- new applications (eg, streaming) create need for richer peering

Opportunities for SDN at IXs:
- freedom from routing constraints: matching different packet header fields, control messages from remote networks, direct control over data plane
- challenges: no existing SDN control framework, scaling issues as thousands of customers at IXs

What IXs can't do today:
- application-specific peering (eg, for video)
- redirect subsets of traffic to middleboxes
- traffic offloading (say two ISPs connecting to same transit at IX, automatically peer those ISPs rather than trombone traffic)
- prevent free-riding (dropping ingress traffic from non-peers)
- wide area load balancing (currently done through DNS, an indirect mechanism)

SDX initial design
- controller runs switches
- it takes routes and other attributes
- route selection function per AS
- load FIB entries into switch
- rules to rewrite packet headers

1. Controller receives routes
2. Each participant submits a function to controller to select routes, rewrite headers, etc.
3. Controller pushes those rules to IX switches
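
The three steps above can be sketched in plain Python. This is a toy model under stated assumptions, not the SDX implementation: the `Route` fields, the `ExchangeController` class, the selector functions and the AS names are all hypothetical, and header rewriting is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str        # destination prefix (illustrative value in the usage)
    next_hop_as: str   # AS that announced the route
    as_path_len: int   # length of the AS path


class ExchangeController:
    """Toy exchange controller: collects routes, applies per-AS selection."""

    def __init__(self):
        self.routes = []      # step 1: routes received from participants
        self.selectors = {}   # step 2: AS name -> route selection function

    def receive_route(self, route):
        self.routes.append(route)

    def submit_selector(self, asn, selector):
        self.selectors[asn] = selector

    def compute_rules(self):
        """Step 3: build (participant, prefix) -> next hop AS entries to push."""
        rules = {}
        prefixes = {r.prefix for r in self.routes}
        for asn, select in self.selectors.items():
            for prefix in sorted(prefixes):
                # A participant never selects a route it announced itself.
                candidates = [r for r in self.routes
                              if r.prefix == prefix and r.next_hop_as != asn]
                if candidates:
                    rules[(asn, prefix)] = select(candidates).next_hop_as
        return rules


def shortest_path(candidates):
    """Default selector: prefer the shortest AS path."""
    return min(candidates, key=lambda r: r.as_path_len)


def prefer(asn):
    """Selector factory: prefer routes via a given AS, else shortest path."""
    def select(candidates):
        preferred = [r for r in candidates if r.next_hop_as == asn]
        return shortest_path(preferred or candidates)
    return select
```

Letting each participant supply its own function is what gives per-AS route selection: two members can choose different next hops for the same prefix.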

- Pyretic
- SDX runtime
- App for each IX member, with the apps seeing only the topology for that IX member
- runtime uses composition to resolve conflicts

Virtual software IX abstraction
- ISPs with no IX relationship don't see each other
- enforced by symbolic execution (ie, tagging packets on ingress, then using a state machine to validate or determine the egress port), so packets can't take non-compliant paths
- SDX runtime composes policies in order of AS traversal (egress AS, then ingress AS)

- interdomain routing plagued by security, manageability
- SDN-based exchange is a promising approach
- research, not production
Data centres are used for cloud computing, where elastic infrastructure can be allocated as demand changes.
- software as a service
- platform as a service
- infrastructure as a service

Key enabling technology is virtualisation.
- applications run unmodified
- VMs can migrate, without knowledge of customer, depending on demand
- easier if a large layer 2 network

- easy migration of VMs
- minimal configuration of switches
- efficient communication along forwarding paths
- loopfree
- fast and effective failure detection

Pre-SDN datacentres
- core to internet via L3 router
- access switches to servers
- aggregation switches in the middle

- core routers single point of failure
- links higher in topology may be oversubscribed
- multirooted fat tree

Need a large L2 topology
- minimal configuration
- scaling problems: state, avoiding flooding (ARP, etc), fast updates to addressing on a VM move

PortLand: SDN for data centre networks
- "fabric manager"
- each host has a pseudo MAC addr with hierarchical structure
- thus efficient forwarding
- need to intercept and re-write MAC addresses
- need to rewrite ARPs
- done by Fabric Manager

MAC learning
- new source MAC addresses are forwarded to Fabric Manager
- FM constructs mapping to pseudo MAC

- edge switch sends ARP to FM
- FM proxy ARPs, returning PMAC
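
A minimal sketch of the MAC learning and proxy ARP steps above. The notes only say the pseudo MAC is hierarchical; the pod(16b).position(8b).port(8b).vmid(16b) layout used here is an assumption, and the class and method names are hypothetical.

```python
class FabricManager:
    """Toy PortLand-style fabric manager: assigns pseudo MACs and proxies ARP."""

    def __init__(self):
        self.amac_to_pmac = {}   # actual MAC -> pseudo MAC
        self.ip_to_pmac = {}     # host IP -> pseudo MAC
        self.next_vmid = {}      # (pod, position, port) -> next free VM id

    @staticmethod
    def encode_pmac(pod, position, port, vmid):
        """Pack pod(16b).position(8b).port(8b).vmid(16b) into a MAC string.

        The field widths are an assumed layout, chosen for illustration.
        """
        value = (pod << 32) | (position << 24) | (port << 16) | vmid
        return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

    def learn(self, amac, ip, pod, position, port):
        """Called when an edge switch reports a new source MAC."""
        if amac not in self.amac_to_pmac:
            key = (pod, position, port)
            vmid = self.next_vmid.get(key, 1)
            self.next_vmid[key] = vmid + 1
            self.amac_to_pmac[amac] = self.encode_pmac(pod, position, port, vmid)
        pmac = self.amac_to_pmac[amac]
        self.ip_to_pmac[ip] = pmac
        return pmac

    def proxy_arp(self, target_ip):
        """Answer an ARP request with the target's PMAC, or None if unknown."""
        return self.ip_to_pmac.get(target_ip)
```

Because the location is encoded in the address, a switch can forward on a prefix of the PMAC rather than holding one entry per host.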

Strong similarities to SDN.

- data centre networks have unique requirements for scaling and flexibility
- 10,000s servers
- minimal configuration and state
- quickly migrate VMs
- PortLand Fabric Manager is an early SDN controller for data centres
- using structured pseudo MACs and intercept-and-proxy ARP
Network configuration must respond to changing network conditions
- eg: peak/offpeak, shifts in traffic load, security events
- eg: ratelimit bittorrent traffic in business hours, if host infected then send to captive portal

Dynamic-driven control domains:
- time (peak, dates)
- history (data usage, traffic rate, delay or loss)
- user (identity or policy group)
- plus the usual packet headers

Resonance: a finite state machine
- dynamic event handler listens to network events then updates state
- the state change may update the flowtable entries on switches

Example: access control for campus network
- guest portals and infection scanners want to change user VLANs, but changing a device's IP address needs a reboot
- doing this in OpenFlow can express what the host can reach in each state; the host doesn't need to readdress as the VLAN doesn't change

Can run separate state machines and then sequentially compose them.
- eg: authentication (with states Authenticated, Unauthenticated) and intrusion detection (with states Quarantined, Clean)
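
The composed state machines can be sketched as below. This is an illustration of the idea rather than Resonance itself: the `StateMachine` class, the `sequential` helper and the event names (`login_ok`, `infected`, and so on) are hypothetical; only the four state names come from the example above.

```python
class StateMachine:
    """Per-host finite state machine driven by named network events."""

    def __init__(self, initial, transitions, allowed_states):
        self.initial = initial
        self.transitions = transitions        # (state, event) -> next state
        self.allowed_states = allowed_states  # states in which traffic may pass
        self.host_state = {}

    def handle_event(self, host, event):
        current = self.host_state.get(host, self.initial)
        # Unknown (state, event) pairs leave the state unchanged.
        self.host_state[host] = self.transitions.get((current, event), current)

    def allows(self, host):
        return self.host_state.get(host, self.initial) in self.allowed_states


def sequential(*machines):
    """Sequential composition: a host's traffic must pass every machine in turn."""
    def allows(host):
        return all(m.allows(host) for m in machines)
    return allows


auth = StateMachine(
    initial="Unauthenticated",
    transitions={("Unauthenticated", "login_ok"): "Authenticated",
                 ("Authenticated", "logout"): "Unauthenticated"},
    allowed_states={"Authenticated"},
)
ids = StateMachine(
    initial="Clean",
    transitions={("Clean", "infected"): "Quarantined",
                 ("Quarantined", "cleaned"): "Clean"},
    allowed_states={"Clean"},
)
policy = sequential(auth, ids)
```

Each machine has two states; writing the same behaviour as a single machine would need the four-state product, which is why composition keeps the FSMs small.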

- network configuration often has to respond to events
- state machines can determine which rules to install
- composition can help keep the FSM simple
- SDN language: express high level policies
- runtime: "compiling" those policies to OpenFlow rules

Packets have location as an attribute.

Features of Pyretic:
- Write network policy as a function. Input a packet. Return packets at differing locations.
- Boolean predicates. AND, OR, NOT. Rather than OpenFlow exceptions.
- Virtual packet header fields. Such as locations, operator-applied tags.
- Parallel and sequential composition operators.

Network policies:
- OpenFlow (match, action) bit patterns are tough to reason about.
- Pyretic policies are functions which map packets to other packets.
identity returns original packet
none returns empty set
match(f=v) returns identity if field f matches value v, otherwise none
mod(f=v) returns packet with field f set to value v
fwd(a) returns mod(outport=a)
flood() returns one packet for each port on the network spanning tree

Boolean predicates
- OpenFlow packets either match or "fall through" to next rule. Simple OR, NOT are tough
- Pyretic match() outputs the packet or nothing depending on the predicate
eg: match(dstip= | match(dstip=

Virtual packet headers
- unified way of representing packet meta-data (ingress port, etc)
- packet is a dictionary which maps a field name to a value
- match(inport=a), match(switch=T), match(dstmac=b)
- use mod() to change or add meta-data

Policy composition
- sequential. eg: match(dstip= >> fwd(1)
- parallel. eg: (match(dstip= >> fwd(1)) + (match(dstip= >> fwd(2))
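
To make the composition operators concrete, here is a toy model of policies as functions from a packet to a list of packets. It does not use the Pyretic library and is not its implementation; the `Policy` class is a hypothetical stand-in, and the addresses in the usage note are made up for illustration.

```python
class Policy:
    """A policy maps one packet (a dict) to a list of zero or more packets."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, packet):
        return self.fn(packet)

    def __rshift__(self, other):
        # Sequential composition: feed every output of self into other.
        return Policy(lambda p: [q for mid in self(p) for q in other(mid)])

    def __add__(self, other):
        # Parallel composition: apply both to the same packet, join the results.
        return Policy(lambda p: self(p) + other(p))

    def __or__(self, other):
        # Boolean OR of two predicates: pass the packet if either passes it.
        return Policy(lambda p: [p] if (self(p) or other(p)) else [])


identity = Policy(lambda p: [p])   # returns the original packet
none = Policy(lambda p: [])        # returns the empty set


def match(**fields):
    """Pass the packet through if every named field has the given value."""
    return Policy(
        lambda p: [p] if all(p.get(f) == v for f, v in fields.items()) else [])


def mod(**fields):
    """Return a copy of the packet with the named fields set."""
    return Policy(lambda p: [{**p, **fields}])


def fwd(port):
    return mod(outport=port)
```

With these definitions, `(match(dstip="10.0.0.1") >> fwd(1)) + (match(dstip="10.0.0.2") >> fwd(2))` is itself a policy. The parentheses matter in this sketch because Python's `+` binds tighter than `>>`.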

Traffic monitoring
- create a query to see packet streams
eg: return first packet seen on a switch of a previously-unseen MAC address
self.query = packets(1, ['srcmac', 'switch'])

- callbacks invoked for each match to query

Dynamic policies
- policies whose forwarding behaviour changes
- represented as a timeseries of static policies
- current value is self.policy
- common idiom: set a default policy, register query callbacks to update policy
eg: learning switch pyretic.examples.simple_learner

eg: firewall pyretic.examples.simple_firewall

- Pyretic makes writing complex policies easy:
- network policy as function
- predicates on packets
- metadata as packet headers
- policy composition
- composition makes it easy for one module to build upon another
Networks perform many tasks
- could use a monolithic application (count, route, firewall, load balance, etc). Difficult to program, test, debug, reuse, port.
- solution: modularise control. monitor program, routing program, firewall program, etc.
- getting modules to play nice
- controller has to arbitrate between modules. These aren't "tenants" with parts of the network to themselves. Different modules affect the same traffic.
- solution: composition
- parallel composition. Multiple operations simultaneously. eg: counting and forwarding.
- sequential composition. One operation, then another. eg: firewall, then switch.

Example application of parallel composition: monitor arriving traffic, route traffic by dst prefix
- parallel composition: do both simultaneously
srcip ==, dstip == ---> fwd(1), count
srcip ==, dstip == ---> fwd(2), count

- note that the rules can be installed in any order

Example application of sequential composition: server load balancer
Sequence is:
- split traffic based on client's ip src addr
- rewrite server dst ip addr
- route traffic to replica

Can use a predicate to control which modules see which traffic
eg: if port 80 then load balance, then forward; else count and forward

A module should not specify everything:
- leave flexibility to allow other modules to exist
- avoid tying a module to a specific setting
- eg: a load balancer spreads the traffic across the replicas, but leaves determining network paths to the routing.

A module should have an abstract view of the topology
- "separation of concerns" by "information hiding", as per programming languages, for the same reasons
- so present a simple network rather than the complex reality
eg: a load balancer doesn't see any routing changes

- SDN control programs perform many tasks on the same traffic
- this requires
1. compositional operators. How to compose the policies
2. logical switch abstraction. Hiding irrelevant details.
Three steps of SDN programming
1. read and monitor network state
2. compute policy
3. write policy

Issues with reading state:
Conflicting rules
- traffic counters: a rule counts bytes and packets, controller polls counters
- multiple rules can exist, and these can conflict. Solution: predicates. eg: (srcip != && (srcport == 80)
- run time system translates predicates into OpenFlow match patterns

Limited rules in switches
- limited number of rules which can be installed on switch: can't push all rules to the switch.
eg: counts of traffic by IP address. We can't preload the switch with all possible IP addresses. Solution: dynamic unfolding where program says GroupBy(srcip) and runtime system dynamically adds match patterns.

Unexpected packets to controller
- unexpected packet punted to controller, controller sends rule to switch for subsequent packets
- but say another packet gets punted before rule is installed
- suppress extra events, using a clause like Limit(1)

- SQL-like query language
- get what you asked for, nothing more, nothing less
- returns a stream of packets
- controller overhead minimised: filters using high-level patterns, limits the number of values returned, aggregates by number and size of packets.
- eg: Traffic Monitoring. Select(bytes) Where(in:2 & srcport:80) GroupBy([dstmac]) Every(60)
- eg: Learning Host Location/Port. Select(packets) GroupBy([srcmac]) SplitWhen([inport]) Limit(1)
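
The two example queries can be emulated in plain Python to show what the runtime has to deliver to the controller. This is a sketch of the semantics as the notes describe them, not the Frenetic runtime; the class and function names and the packet field `size` are hypothetical.

```python
class FirstPacketQuery:
    """Emulates Select(packets) GroupBy([srcmac]) SplitWhen([inport]) Limit(1)."""

    def __init__(self, callback):
        self.callback = callback
        self.last_inport = {}   # srcmac -> inport of the current group

    def observe(self, packet):
        """Feed one packet; invoke the callback only for group-opening packets."""
        mac, port = packet["srcmac"], packet["inport"]
        if self.last_inport.get(mac) != port:
            # New MAC, or the MAC moved port: start a new group, emit 1 packet.
            self.last_inport[mac] = port
            self.callback(packet)
        # Otherwise Limit(1) is already reached for this group: suppress.


def byte_counts(packets, where, group_by):
    """Emulates Select(bytes) Where(...) GroupBy([...]) over one interval."""
    totals = {}
    for p in packets:
        if where(p):
            key = p[group_by]
            totals[key] = totals.get(key, 0) + p["size"]
    return totals
```

The suppression in `observe` is the Limit(1) behaviour: repeated packets from a known host on a known port never reach the callback.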

Coming up next: Computing policy
- many modules can affect same traffic
- they might conflict: eg: routing says output to port, but firewall says to block

- Looked at SDN programs to read network state
- Frenetic: SQL-like query language to control the traffic seen at the controller
- Coming up: other challenges: composing policy, responding to events, compilation
OpenFlow programming is not easy
- difficult to perform multiple independent tasks
- low level of abstraction
- controller only sees packets for events it does not know how to handle
- race conditions, avoid incorrect installation of rules

Solution: northbound API
- programming interface which allows applications and orchestration systems to program the network (not just an individual switch)
- uses: path computation, loop avoidance, routing, security
- users: sophisticated network operators, service providers offering value-added services, vendors, researchers. In short: people wanting to offer services over OpenFlow
- benefits: vendor independence, ability to quickly modify or customise control
- eg: present network as one large virtual switch, security, TE, middlebox integration
- goals: orchestration of high-level services

- OpenFlow is a southbound API which controls a switch
- it makes it possible to program a network, but doesn't make it easy
- northbound API offers: sophisticated events, composition of policies, error handling

Control plane basics

OpenFlow controller communicates with switch over secure channel. Message format is OpenFlow Protocol. Logic executed at controller. Flow table updated on switch.

Switch components
Flow table. All packets examined for a match. Then action.
Secure channel. Communication to external controller.

Parse header fields.
Check for match in table 0, then apply actions, then exit.
Check for match in table 1, then apply actions, then exit.
Check for match in table n, then apply actions, then exit.
If no match, then send packet to controller over secure channel.
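
The lookup sequence above can be sketched as follows. This is a simplified model of the description in these notes, not of a real switch: rule priorities, counters and timeouts are omitted, and the table representation is an assumption made for illustration.

```python
CONTROLLER = "CONTROLLER"


def matches(rule_match, packet):
    """A rule matches when every field it names equals the packet's value."""
    return all(packet.get(f) == v for f, v in rule_match.items())


def process(tables, packet):
    """Walk the flow tables in order; return the actions of the first match.

    `tables` is a list of tables, each a list of (match_dict, actions) entries.
    Fields absent from match_dict act as wildcards.
    """
    for table in tables:
        for rule_match, actions in table:
            if matches(rule_match, packet):
                return actions          # apply actions, then exit
    return [CONTROLLER]                 # no match: send to the controller
```

Every packet that falls through to `CONTROLLER` costs a round trip over the secure channel, which is the scalability concern raised next.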

For scalability, match as many packets in the switch as possible.

Match fields in OpenFlow 1.0: ingress port, eth src, eth dst, eth type, eth vlan, eth pri, ip src, ip dst, ip tos, ip proto, tcp src port, tcp dst port.

Actions in flow tables
Forward (mandatory to implement): ALL (out all bar ingress interface), CONTROLLER (encapsulate and send to controller), LOCAL (send to switch control plane), TABLE (run flow table's actions), IN PORT (out ingress port). Optional to implement: normal forwarding, spanning tree.
Drop (mandatory to implement). If there is no action then drop the packet.
Modify (optional). Alter packet headers.
Enqueue (optional). Send to an output port queue, for QoS.

Can talk to switch using OpenFlow protocol with dpctl. Can inspect and modify flow table entries.
"dpctl show tcp:".
"dpctl dump-flows tcp:". Display flow table.
"dpctl add-flow". Alter flow table, For example, "dpctl add-flow tcp: in_port=1,actions=output:2"

OpenFlow 1.3 enhancements.
Action sets. Multiple actions on each packet.
Group. List of action sets. Can execute all the action sets in a group, a nice way to do multicast. Indirect group: execute the same thing on all packets (decrementing TTL, MPLS tagging, QoS)
Each table can add to header fields, and alter headers.
The white papers are good.

Other SDN control architectures than OpenFlow
- Juniper Contrail. XMPP control plane. Can implement L2 and L3 virtual networks. "OpenDaylight" is an open source implementation of an SDN controller.
- Cisco Open Network Environment. Centralised controller, programmable data plane, can implement virtual overlay networks.

OpenFlow protocol is evolving. Open vSwitch doesn't include newer protocol options.

The controller

Lots of SDN controllers ("almost as many SDN controllers as SDNs :-)")

A quick list:
Frenetic (OCAML)
RouteFlow (for SDNs interested in inter-domain routing)

Programming language. Comfort and performance.
Learning curve.
Active community. Community serves as base of support.
Areas of focus. Southbound or northbound ("policy layer") API. Support for OpenStack. Education, Research or Production.

NOX: the first OpenFlow controller. Open source, stable, widely used.
"NOX-classic": C++ and Python, no longer supported.
"New NOX": C++, fast, well written, maintained and supported.
Users write in C++.
Programming model is that programmers write event handlers.
Supports OpenFlow 1.0 ("CPqD" fork supports 1.1, 1.2, 1.3)
When to use NOX: you know C++, willing to use low-level facilities (eg, meddle with southbound API), you need performance.

POX: NOX in Python.
Supports OpenFlow 1.0 only
Widely used, maintained, supported.
Relatively easy to read and write code.
Poorer performance.
When to use POX: rapid prototyping, and thus research, demos, learning SDN concepts.

OpenFlow 1.0, 1.2, 1.3, Nicira
Works with OpenStack. A cloud operating system.
Poorer performance.
Moderate learning curve.

Floodlight: supports OpenFlow 1.0
Fork of "Beacon" maintained by Big Switch Networks.
Good documentation. REST API. Production performance. OpenStack integration.
Disadvantages: steep learning curve.
When to use Floodlight: production performance, you know Java, REST API

Criteria for choice: OpenFlow versions, Language, Performance, OpenStack integration, Learning curve.

Control plane - switching

Use mininet to set up topology
sudo mn --topo single,3 --mac --switch ovsk --controller remote

Note that pings fail, as there is no controller running.

Can use dpctl to add flow entries

Aside: hub: ingress packet is flooded out all ports bar ingress port.
Have a look at forwarding.hub. The method launch() adds a listener for new OpenFlow switches.
You can see OpenVSwitch connect to POX. You can type "net" at mininet and see "c0".

Upon a connection up event our controller sends a message to flood packets out all ports:
msg.actions.append(of.ofp_action_output(port = of.OFPP_FLOOD))

Can also implement a learning switch.
- use srcmac and ingress port to update (dstmac, egress port) table
- drop BPDUs and LLDP
- if multicast (dstmac) then FLOOD
- if dstmac not in table then FLOOD
- if egress port same as ingress port then DROP (avoid loops)
- install flow table modification entry to OpenFlow switch, to send (ingress, dstmac) to egress.
Have a look at forwarding.l2_learning.
launch() method registers for callback.
for each connecting OpenFlow switch, instantiate a learning switch object and pass it the connection
the learning switch sets up a table, installs a packet listener, and then implements the learning switch algorithm and sends flow table modifications.
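
The algorithm above can be sketched without POX so that it runs standalone. This is not the forwarding.l2_learning source: the class, the return convention and the flow-entry dictionary are hypothetical, and a real controller would send a flow table modification message instead of returning a dict.

```python
FLOOD = "FLOOD"
DROP = "DROP"
LLDP_TYPE = 0x88CC
BRIDGE_GROUP = "01:80:c2:00:00:00"   # destination MAC used by spanning-tree BPDUs


def is_multicast(mac):
    """A MAC is multicast when the low bit of its first octet is set."""
    return int(mac.split(":")[0], 16) & 1 == 1


class LearningSwitch:
    def __init__(self):
        self.mac_to_port = {}   # learned MAC -> port it was last seen on

    def packet_in(self, src, dst, in_port, eth_type=0x0800):
        """Return (decision, flow_entry); flow_entry is None unless installed."""
        self.mac_to_port[src] = in_port            # learn from the source
        if eth_type == LLDP_TYPE or dst == BRIDGE_GROUP:
            return DROP, None                      # never forward LLDP or BPDUs
        if is_multicast(dst):
            return FLOOD, None
        if dst not in self.mac_to_port:
            return FLOOD, None                     # unknown destination
        out_port = self.mac_to_port[dst]
        if out_port == in_port:
            return DROP, None                      # would go back out the ingress
        # Entry a controller would install so later packets stay in the switch.
        flow_entry = {"in_port": in_port, "dl_dst": dst, "output": out_port}
        return out_port, flow_entry
```

Only the final branch installs a flow entry, so floods and drops keep coming to the controller until the destination has been learned.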

Control plane - firewalls

Extend the L2 learning switch to add firewall rules. Illustrating how easy it is to make significant functional changes with small amounts of code.

L2 switching just uses dst mac addr.  OpenFlow switching has many more header fields available. In our example, the src mac addr.

Augment the controller with an extra step that checks the src mac addr. Add a hash table keyed on (switch, mac) which returns True to pass a packet; False or a missing entry drops the packet.

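Add this code:
# Inside the packet-in handler, before the normal learning-switch logic.
if not self.CheckRule(dpidstr, packet.src):
    drop()  # assumes the drop() helper from l2_learning's packet handler
    return

What is CheckRule()? It looks up a hash table which AddRule() fills in.
self.firewall = {}  # (dpidstr, src) -> True to pass the packet
def AddRule(self, dpidstr, src=0, value=True):
    self.firewall[(dpidstr, src)] = value

CheckRule sees if there is an entry in the hash table.
def CheckRule(self, dpidstr, src=0):
    try:
        return self.firewall[(dpidstr, src)]
    except KeyError:
        return False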

To allow a MAC address to communicate:
self.AddRule('00-00-00-00-00-01', EthAddr('00:00:00:00:00:01'))
self.AddRule('00-00-00-00-00-01', EthAddr('00:00:00:00:00:02'))

Performance: cache decisions at switch using the flowtable in the switch
- limits traffic to controller, which has high latency
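
The effect of caching decisions at the switch can be sketched with a counter on the controller. This is a standalone illustration, not POX code: the two classes and the `queries` counter are hypothetical, and MAC addresses are plain strings rather than EthAddr objects.

```python
class FirewallController:
    """Default-deny firewall keyed on (switch DPID string, source MAC)."""

    def __init__(self):
        self.rules = {}
        self.queries = 0   # how many packets reached the controller

    def add_rule(self, dpidstr, src, allow=True):
        self.rules[(dpidstr, src)] = allow

    def decide(self, dpidstr, src):
        self.queries += 1
        # dict.get gives default-deny without needing try/except KeyError.
        return self.rules.get((dpidstr, src), False)


class CachingSwitch:
    """Asks the controller once per source MAC, then reuses the cached decision."""

    def __init__(self, dpidstr, controller):
        self.dpidstr = dpidstr
        self.controller = controller
        self.flow_table = {}   # src MAC -> True (forward) / False (drop)

    def receive(self, src):
        if src not in self.flow_table:              # table miss: slow path
            self.flow_table[src] = self.controller.decide(self.dpidstr, src)
        return self.flow_table[src]                 # table hit: fast path
```

After the first packet from each source, `queries` stops growing: later packets are handled from the switch's own table.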

- customising control is easy
- performance benefit of caching decisions at switch

Programmable data plane (module 5)

Data plane operation
- ingress packet
- get dst addr
- look up dst addr in forwarding table, returning egress interface
- modify header, lowering TTL, and altering checksum
- queue and egress packet

The lookup for IP forwarding is simple: longest prefix match. Compare that with OpenFlow's match:action list.
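
For comparison, longest prefix match fits in a few lines using Python's standard `ipaddress` module. This is a linear-scan sketch for clarity, whereas real routers use tries or TCAMs; the function name and the table format are hypothetical.

```python
import ipaddress


def longest_prefix_match(table, dst):
    """Return the egress interface for dst, or None if no prefix covers it.

    `table` maps prefix strings to interface names.
    """
    addr = ipaddress.ip_address(dst)
    best_net, best_iface = None, None
    for prefix, iface in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    return best_iface
```

The only input is the destination address, which is what makes the lookup simple and also what limits it compared with matching on many header fields.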

Richer actions in data plane.
- data plane is streaming algorithms that match on packets
- wide range of functions: forwarding, ACLs, mapping headers, traffic monitoring, buffering, marking, shaping, scheduling, DPI.

Motivation for software data plane
- network devices are diverse
- difficult to alter
- can we extend current data plane: flexible, extensible, clean interfaces

Click modular router, a customisable data plane
- elements are the building block: switching, lookup, classification, dropping, etc
- these elements are clicked together into a pipeline
<pre>FromDevice(eth0) -> Print(ok) -> Discard;</pre>
<pre>click -f config.cfg</pre>
Can build up NATs, tunnels, etc. Can build an entire IP router.
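
The element-and-pipeline idea can be imitated in Python to show how simple blocks chain together. This is an analogy only and has nothing to do with Click's C++ implementation; `FromList` is a hypothetical stand-in for FromDevice, and `>>` plays the role of Click's `->`.

```python
class Element:
    """A processing stage; `push` handles a packet and may pass it downstream."""

    def __init__(self):
        self.next = None

    def __rshift__(self, other):
        # a >> b connects a's output to b's input and returns b for chaining.
        self.next = other
        return other

    def push(self, packet):
        if self.next is not None:
            self.next.push(packet)


class Print(Element):
    def __init__(self, label, log):
        super().__init__()
        self.label, self.log = label, log

    def push(self, packet):
        self.log.append(f"{self.label}: {packet}")
        super().push(packet)


class Discard(Element):
    def __init__(self):
        super().__init__()
        self.count = 0

    def push(self, packet):
        self.count += 1      # terminal element: the packet goes no further


class FromList(Element):
    """Stand-in for FromDevice: replays packets from a Python list."""

    def run(self, packets):
        for packet in packets:
            self.push(packet)
```

The pipeline from the configuration above would then read `source >> Print("ok", log) >> sink`, with `source = FromList()` and `sink = Discard()`.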

- data plane needs to be programmable
- Click. open, extensible, configurable
- complex data plane functions can be built from simple building blocks
- performance is acceptable (0.9 of native speed on Linux)

Scaling programmable data planes - making software faster

Want the flexibility of software but speed of hardware, so:
- make software perform better
- make hardware more programmable

Need a platform for developing new protocols
- high speed
- multiple data planes in parallel (for various experiments)

1. Custom software. Flexible, easy to program. Low forwarding speeds.
2. Modules in custom hardware. Long development cycles, rigid.
3. Programmable hardware. Flexible and fast, but programming can be difficult.

Typical routers, N linecards, running at R, switch fabric running at NxR.

- line cards on servers
- each server must pass c x R bps
- interconnect must pass at NxR. We want internal link rates < R (if we needed a speedup then we couldn't use commodity hardware)
- we'd like the fanout to be constant, not a rising number of interior interfaces (and thus falling number of exterior interfaces) as switch becomes larger.

- limited internal link rates
- limited per-node processing rate
- limited fanout

Strawman approach: connect everything to everything. Doesn't scale.

Valiant load balancing: intermediate servers. This reduces interconnect speeds to R/N, but servers must process traffic at 3R (2R if load is equally balanced).

Servers must also be fast.
Problem 1. Processing packets one at a time has huge bookkeeping overhead. These can be batched, at the cost of latency and jitter.
Problem 2. Map ports to cores. Approach #1: 1 core per queue (avoids locking). Approach #2: 1 core per packet (faster).

Other speed tricks:
- large socket buffers (PacketShader)
- batch processing (PacketShader)
- ethernet GRE (tunnelling, thus avoiding lookups) (Trellis)
- avoiding lookups when bridging from VMs to physical interfaces (Trellis)

- software can be fast
- general purpose processors and hardware can be fast, but the details do matter
- Intel Data Plane Development Kit (DPDK) is a commercial effort

Making hardware more programmable

What do we want from SDN? Does current hardware provide this?
- protocol independent processing (ie, independent of STP, OSPF, etc)
- control and repurpose in the field
- fast low power chips

Current hardware constrains what can be done
- protocol dependent because of capabilities of existing chips
- OpenFlow has mapped its features onto what is available on existing chips
- quick adoption, but constrains what is in OpenFlow

There are a few primitives we want a network device to perform
- parsing and rewriting headers
- bit shifting, etc
- traffic shaping
- forwarding
- etc

Build a flexible data plane by hardware modules and ways to interconnect them.

Two examples:
- an OpenFlow chip
- a programmable, modular data plane on an FPGA

OpenFlow chip
- 64 x GE OpenFlow-optimised 28nm ASIC
- 32 stages of (match, action)
- large tables, 1M x 40b TCAM, 370Mb SRAM
- uses VLIW Action processing, very RISC-like

Match tables use TCAM and SRAM. Matches are "logical", allowing complex matches to span multiple processor stages.

Actions hit a VLIW "action processor", these are small (1mm^2) and so you can run multiple in parallel.

Can't do more complex operations (transcoding, tunnels, encryption, etc)

Modular hardware building blocks to perform a range of data-plane functions.

Enable or disable and connect these building blocks using software.

Step 1, Select Virtual Data Plane
Operations can be in parallel, including multiple data planes ("virtual data plane"). SwitchBlade puts ingress packets into a VDP, each VDP has its own processing pipeline, lookup tables, forwarding modules.

A table maps src mac addr to VDP. This attaches a 64b "platform header" to control functions in the pipeline.

Step 2, shaping

Step 3, preprocessing

Steps: processing selector | custom processor | hasher

Processing functions are from a library of reusable modules (Path Slicing, IPv6, OpenFlow).

Hash is created from user-selected bits in header. That hash then allows customised forwarding based on arbitrary bits in header.

For example, a limited OpenFlow forwarder: parse packet, extract data for tuples, this 240b "bitstream" passed to hasher, hasher outputs 32b value for controlling processing and forwarding.
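
The hashing step can be sketched in software: select bit ranges from the header, concatenate them, and reduce the result to 32 bits. The notes do not say which hash function the hardware uses, so CRC-32 from the standard `zlib` module is an assumption here, and the function names and the (start_bit, length) range format are hypothetical.

```python
import zlib


def extract_bits(header: bytes, bit_ranges):
    """Concatenate the selected (start_bit, length) ranges into one integer.

    Bit 0 is the most significant bit of the first header byte.
    Returns (value, number_of_bits).
    """
    as_int = int.from_bytes(header, "big")
    total_bits = len(header) * 8
    out, out_len = 0, 0
    for start, length in bit_ranges:
        shift = total_bits - start - length
        field = (as_int >> shift) & ((1 << length) - 1)
        out = (out << length) | field
        out_len += length
    return out, out_len


def hash_header(header: bytes, bit_ranges) -> int:
    """Reduce the user-selected header bits to a 32-bit value (CRC-32 assumed)."""
    bits, nbits = extract_bits(header, bit_ranges)
    raw = bits.to_bytes((nbits + 7) // 8, "big")
    return zlib.crc32(raw) & 0xFFFFFFFF
```

Two packets that agree on the selected bits hash to the same value whatever the rest of the header holds, so forwarding can key on the hash alone.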

New modules require Verilog programming.

Step 4, forwarding

Steps: egress port lookup based on hash | postprocessor wrapper modules | custom postprocessor

Wrapper modules allow matching on custom packet header bits

Custom postprocessors allow other functions to be selected on the fly

Software exceptions
Throw a packet to software. The passed-up packet has VDP and Platform Header, allowing Virtual Data Plane to continue in software.

Custom postprocessing
eg: Decrement TTL for IPv6

Make hardware more programmable
There are a few primitives; allowing these to be composed gives simple hardware yet flexible software
Two examples: OpenFlow Chip, SwitchBlade.


Network virtualisation is just like server virtualisation. Rather than virtualise CPUs, we virtualise networks -- multiple networks on one physical infrastructure. "Hypervisor" equivalent implements isolation and resource sharing. Nodes are VMs. Links are tunnels.

Motivation. "Ossification". Too difficult to change underlying IP infrastructure. Need a way of allowing technologies to evolve. Originally used overlay networks.

The promise. Rapid innovation (delivering services at software speed). New forms of network control. Vendor choice, as logical network decoupled from underlying physical infrastructure (all the magic happens in the virtual network). Simpler programming and operations, as details of physical network are hidden.

Distinguish SDN and Virtual networks. SDN does not abstract the details of the physical network. SDN separates control plane and forwarding plane. Virtual networks instantiate multiple networks on one physical infrastructure.

Virtual private networks. A different thing, they connect distributed sites. VPNs don't allow multiple custom architectures to run.

Design goals of virtual networks. Flexibility, in topology, routing and forwarding architectures. Manageability, as separate control and data planes and distinct policy and implementation. Scalability, many multiple logical networks. Isolation of one network and its resources from another for robustness and security. Programmability, for test beds, etc. Heterogeneity, to support many different topologies and techniques.

Building a virtual network

Virtual nodes. Xen. User-mode Linux and network name spaces. KVM, VMWare, VirtualBox.

Example VM environment: Xen. Multiple guest OSs. Domain0 runs the control software to arbitrate access to resources.

Virtual links, based on tunnel technologies. GRE (ethernet frames in IP packets). These may traverse multiple IP hops. Others: VXLAN.

Tunnels use interior and exterior interfaces. A "short-bridge" extended Linux bridging.  OpenVSwitch reimplemented bridging, with OpenFlow and JSON control access.

Summary. Motivation is flexible and agile deployment, giving innovation, vendor independence, scale. Technologies required are virtual nodes, links and switches. Distinction between SDN and virtual networks: SDN separates control and data plane, virtual networks separate logical and physical networks. SDN is a useful tool for implementing virtual networks, but they remain distinct concepts.

Applications of virtual networks

Allows experimentation on production networks, virtual experimental infrastructure. Rapid development and deployment of new network services. Allows dynamic scaling of resources.

Experimentation on production networks

Historically new protocols and architectures were emulated and simulated, but deployment hit a Catch-22 roadblock: to show it will work in production it needs to be seen to work in production. So a VN allows a research network in parallel to a production network. eg: "FlowVisor", where a subset of a user's flows (a "flowspace") can be sent into differing logical networks with different controllers.
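A rough sketch of the flowspace idea: each slice claims flows by matching a few header fields and names the controller that handles them. This is not FlowVisor's API. The class names, fields and controller names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    tcp_dst: int


@dataclass(frozen=True)
class Slice:
    name: str
    controller: str
    tcp_dst: Optional[int] = None  # None acts as a wildcard
    src_ip: Optional[str] = None   # None acts as a wildcard

    def matches(self, flow: Flow) -> bool:
        return ((self.tcp_dst is None or self.tcp_dst == flow.tcp_dst)
                and (self.src_ip is None or self.src_ip == flow.src_ip))


def controller_for(flow: Flow, slices: List[Slice], default: str) -> str:
    """Return the controller of the first slice whose flowspace contains the flow."""
    for s in slices:
        if s.matches(flow):
            return s.controller
    return default
```

For example, a slice matching one user's web traffic could point at a research controller while everything else falls through to the production controller.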

Rapid deployment of new services.

Nicira. Hosts see a virtual network. Provisioning done by a distributed controller. Examples: give each tenant their own VN; virtualisation allows the resources applied to be dynamically right-sized.

Dynamic security. Central management of access to virtual network.

Dynamic scaling of resources. Can logically knit together clouds to allow dynamic scaling. A Virtual private cloud allows "seemingly direct" connection of cloud servers to customer networks (eg, by Amazon). Useful for outsourcing management of servers.

Wide area virtual networks. Parallel experimentation: VINI, GENI. Value added services: CABO. Multiple control structures: Tempest.

"Virtual network in a box". Networking for VMs on a single server.

Network functions virtualisation. Unification of middleboxes: firewalls, load balancers, DPI. Replace them with a distributed compute pool, run those functions as software, and attach them to the network using a VN.

Summary of applications: experimentation, isolation on shared resources, reuse of resources, dynamic scaling, easier management.


Mininet features: fast, custom topologies, real programs, programmable OpenFlow switches, easy to use, open source.

Alternatives. Real system: pain to configure. Networked VMs: scalability. Simulator: no path to deployment.

How Mininet works. mn is the controlling script. Uses namespaces, with a shell script and ethernet interfaces in each. The namespace interfaces (h2-eth0, etc) are veth-ed to switch interfaces (s2-eth0, etc). The switch is ofdatapath which is minded by ofprotocol. This switch is programmed by the controller.
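To make the namespace and veth wiring concrete, here is a sketch that only builds the list of iproute2 commands you might run by hand to get a similar result. It does not execute anything and it is not how Mininet itself is implemented. The function name and default port numbers are assumptions.

```python
from typing import List


def wiring_commands(host: str, switch: str,
                    host_port: int = 0, switch_port: int = 0) -> List[str]:
    """Return shell commands that put one end of a veth pair in a host namespace."""
    host_if = f"{host}-eth{host_port}"
    switch_if = f"{switch}-eth{switch_port}"
    return [
        # Create the network namespace that plays the role of the host.
        f"ip netns add {host}",
        # Create a veth pair: one end for the host, one for the switch.
        f"ip link add {host_if} type veth peer name {switch_if}",
        # Move the host end into the namespace; the switch end stays outside.
        f"ip link set {host_if} netns {host}",
        f"ip netns exec {host} ip link set {host_if} up",
        f"ip link set {switch_if} up",
    ]
```

Printing wiring_commands("h2", "s2") gives the h2-eth0 / s2-eth0 pairing mentioned in the notes. The switch end would then be attached to the software switch.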

Mininet itself runs in a VM, for ease of distribution.

Mininet examples

Single switch, three hosts: sudo mn --test pingall --topo single,3. Uses default controller and default switch.

Starting VM: User mininet, password mininet. dhclient ..., ifconfig ..., ssh -X mininet@....

Mininet options
--topo topology
--switch switch, uses OVSK by default
--controller uses a hub by default


One hub, two hosts: sudo mn --topo minimal

Four hosts, four switches, linear topology: sudo mn --topo linear,4

Three hosts, one switch: sudo mn --topo single,3

Tree with depth and fanout: sudo mn --topo tree,depth=2,fanout=2
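As a quick check on what these topologies contain, the sketch below counts hosts, switches and links for the linear, single and tree shapes. The function names are made up. The counts assume one host per switch for linear and hosts only at the leaves for tree.

```python
def linear_size(k: int) -> dict:
    """k switches in a chain, one host hanging off each switch."""
    return {"hosts": k, "switches": k, "links": k + (k - 1)}


def single_size(k: int) -> dict:
    """One switch with k hosts attached."""
    return {"hosts": k, "switches": 1, "links": k}


def tree_size(depth: int, fanout: int) -> dict:
    """Switches at every level above the leaves, hosts as the leaves."""
    switches = sum(fanout ** level for level in range(depth))
    hosts = fanout ** depth
    # A tree with n nodes has n - 1 edges.
    links = switches + hosts - 1
    return {"hosts": hosts, "switches": switches, "links": links}
```

So tree,depth=2,fanout=2 should come up with 4 hosts and 3 switches, and linear,4 with 4 hosts and 4 switches.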

Under the Mininet hood

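mn is a launch script which runs Python. Consider mn --topo linear,4:

from mininet.net import Mininet
from mininet.topo import LinearTopo

# Four switches in a chain, one host per switch
Linear = LinearTopo(k=4)
net = Mininet(topo=Linear)

# Bring the network up, test connectivity, tear it down
net.start()
net.pingAll()
net.stop()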

Can also create your own code, such as these two hosts and one switch:

from mininet.net import Mininet

net = Mininet()

# Create nodes in the network
c0 = net.addController()
h0 = net.addHost('h0')
h1 = net.addHost('h1')
s0 = net.addSwitch('s0')

# Create links between nodes
net.addLink(h0, s0)
net.addLink(h1, s0)

# Configure IP addresses (the addresses themselves are missing from these notes)
h0.setIP('', 24)
h1.setIP('', 24)

# Run
net.start()
net.pingAll()
net.stop()

If you want to debug then before net.stop() say mininet.cli.CLI(net)

addLink() has some parameters: bw, bandwidth in Mbps; delay; max_queue_size; loss, in percentage.
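A sketch of those parameters in use. It assumes the traffic-shaped link class TCLink is what applies them, and the particular values are arbitrary. The Mininet imports sit inside the function so that the file can be loaded on a machine without Mininet. Actually running it needs Mininet and root.

```python
# Link options: bandwidth in Mbit/s, delay as a string, queue size in packets, loss in percent.
LINK_OPTS = dict(bw=10, delay="5ms", max_queue_size=1000, loss=1)


def run_shaped_network():
    """Build h0 - s0 - h1 with shaped links, ping, and shut down."""
    from mininet.net import Mininet
    from mininet.link import TCLink

    net = Mininet(link=TCLink)
    net.addController("c0")
    h0 = net.addHost("h0")
    h1 = net.addHost("h1")
    s0 = net.addSwitch("s0")

    # The same options are applied to both host links.
    net.addLink(h0, s0, **LINK_OPTS)
    net.addLink(h1, s0, **LINK_OPTS)

    net.start()
    net.pingAll()
    net.stop()
```

Call run_shaped_network() from a script started with sudo. With 1% loss on each link the occasional ping may be dropped.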

Things not covered:

- accessing files. They are shared between namespaces, use the usual Python methods.
- link speeds and properties
- custom controllers and switches
- host configuration (not just IP addresses)
- performance measurements.


Control plane: logic that controls the forwarding plane
eg: routing protocols, firewall configuration

Data plane: forwards traffic according to configuration by control plane
eg: IP forwarding, ethernet switching
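A toy illustration of the split, assuming IP forwarding by longest-prefix match. build_fib() stands in for the control plane (it decides what the table contains) and forward() for the data plane (it only looks things up). Both names and the route format are made up for this sketch.

```python
import ipaddress


def build_fib(routes):
    """Control plane: turn (prefix, next_hop) pairs into a table, longest prefix first."""
    table = [(ipaddress.ip_network(prefix), next_hop) for prefix, next_hop in routes]
    return sorted(table, key=lambda entry: entry[0].prefixlen, reverse=True)


def forward(fib, dst):
    """Data plane: return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    for network, next_hop in fib:
        if addr in network:
            return next_hop
    return None
```

The data plane never needs to know how the routes were chosen, which is what lets the control logic move elsewhere.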

Why separate?

  • Independent evolution and development. Especially software control of network.

  • Control from a single high-level software program. Easier to reason and debug.

Why does it help?

  • Routing

  • Enterprise networks. Security

  • Research networks. Coexistence with production networks.

  • Data centres. VM migration. eg: Yahoo have 20,000 hosts, 400,000 VMs. Want sub-second VM migration. Program switches from a central server, so that forwarding follows migration.

  • eg: AT&T filtering DoS attacks. IRSCP (commercialised RCP) will insert a null route to filter DoS at network edge.

Challenges for separation

  • Scalability. Control element responsible for thousands of forwarding elements

  • Reliability and security. What if a controller fails or is compromised.

Opportunities for control and data separation

Two examples

  • New routing services in the wide area. Maintenance, egress selection, security.

  • Data centres. Cost, management.

Example 1. Wide area.
There are a few constrained ways to set interdomain routing policy: BGP.
Limited knobs, no external knowledge (time of day, reputation of route, etc)
Instead of BGP, a route controller updates the forwarding table.

Example 1. Maintenance dry-out
Planned maintenance of an edge router
Tell ingress routers to avoid the router with pending maintenance.
Too difficult to do in existing networks, eg, by tuning OSPF.
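One way to picture the dry-out, as a sketch only: the controller picks an egress for an ingress router from the candidates, skipping any router marked for maintenance. The function name and the (router, cost) format are assumptions, not anything from RCP itself.

```python
def choose_egress(candidates, draining):
    """Pick the cheapest egress router that is not being drained.

    candidates: list of (egress_router, igp_cost) pairs.
    draining:   set of routers with pending maintenance.
    """
    usable = [(cost, router) for router, cost in candidates if router not in draining]
    if not usable:
        return None
    return min(usable)[1]
```

Adding a router to the draining set moves traffic away from it without touching link weights.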

Example 2. Customer controlled egress router
Customer selects data centre they want to use
Difficult in existing networks, as routing uses destination prefix.

Example 3. Better BGP security.
Offline we can determine reputation of a route. But this can't be incorporated into BGP route selection.
Off-line anomaly detection. Prefer "familiar" routes over unfamiliar routes. RCP tells routers to avoid odd routes.
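A sketch of "prefer familiar routes", assuming familiarity means the (prefix, origin AS) pair has been seen before and that ties are broken on AS path length. The tuple layout and function name are hypothetical.

```python
def prefer_familiar(candidates, history):
    """Choose a route, preferring ones whose (prefix, origin AS) was seen before.

    candidates: list of (prefix, origin_as, as_path_length) tuples.
    history:    set of (prefix, origin_as) pairs observed in the past.
    """
    # False sorts before True, so familiar routes win; path length breaks ties.
    return min(candidates, key=lambda r: ((r[0], r[1]) not in history, r[2]))
```

With an empty history this falls back to plain shortest-path selection.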

Example 4. Data Centres, costs
Reduce cost. 200,000 servers. A 20x fanout gives 10,000 switches. There is a huge difference between $1,000 per switch ($10m) and $5,000 per switch ($50m): $40m at this scale. That matters to Google, Facebook, Yahoo, etc.
So these networks run a separate control plane on merchant silicon. Tailor network for services. Quick innovation.
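The arithmetic, worked through with the server count, fanout and per-switch prices from the notes (variable names are mine):

```python
servers = 200_000
fanout = 20                       # servers per switch
switches = servers // fanout

cheap_cost = switches * 1_000     # merchant-silicon switch at $1,000
vendor_cost = switches * 5_000    # vendor switch at $5,000
difference = vendor_cost - cheap_cost

print(f"{switches} switches: ${cheap_cost:,} vs ${vendor_cost:,}, difference ${difference:,}")
```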

Example 5. Data Centres, addressing
Layer 2 addressing: less configuration, bad scaling
Layer 3 addressing: use existing routing protocols, good scaling, but high administration overhead.
Use layer 2 addressing, but make the addresses topology-specific rather than topology-independent.
MAC addresses depend on where the host is in the topology.
Hosts don't know their MAC address has been reassigned, so how is ARP done? The destination host won't respond.
A "fabric manager" will intercept ARPs, it then replies with the topology-dependent Pseudo-MAC (pMAC).
Switches re-write MAC addresses at network edge to hosts.
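A sketch of the fabric manager's part. The notes do not say how a pMAC is laid out, so the pod/position/port/vmid split (16/8/8/16 bits) used here is an assumption, and the class and method names are invented.

```python
def make_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    """Pack topology coordinates into a 48-bit pseudo-MAC (assumed layout)."""
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


class FabricManager:
    """Answers ARP requests with the pMAC instead of the host's real MAC."""

    def __init__(self):
        self.ip_to_pmac = {}

    def register(self, ip: str, pmac: str) -> None:
        # An edge switch would report this mapping when it first sees a host.
        self.ip_to_pmac[ip] = pmac

    def handle_arp_request(self, target_ip: str):
        # Returns None if the host is unknown; a real system needs a fallback.
        return self.ip_to_pmac.get(target_ip)
```

The edge switch then keeps the pMAC to real MAC mapping so it can rewrite the destination address just before delivery.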

Example N. Others
Dynamic access control
Mobility and migration
Server load balancing
Network virtualisation
Multiple wireless access points
Energy-efficient networking
Adaptive traffic monitoring
DoS detection.

Challenges of separating control and data planes

Scalability, reliability, consistency. Approaches in RCP and ONIX.

Scalability in RCP.
RCP must store routes and compute routing decisions for all routers in the AS. That's a lot to do at a single node. Strategies to reduce this are
Eliminate redundancy: store a single copy of each route to avoid redundant computation.
Accelerate lookups: maintain indexes to identify affected routers. Then RCP computes routes only for routers affected by a change.
Punt: Only performs inter-domain (BGP) routing.
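The "accelerate lookups" tactic could look something like this: an index from each route to the routers currently using it, so a change to one route only triggers recomputation for those routers. This is a sketch under that reading. The class and its methods are hypothetical and not RCP's data structures.

```python
from collections import defaultdict


class RouteIndex:
    def __init__(self):
        self.selected = {}              # (router, prefix) -> route id in use
        self.users = defaultdict(set)   # route id -> {(router, prefix), ...}

    def select(self, router, prefix, route_id):
        """Record that router now uses route_id for prefix."""
        old = self.selected.get((router, prefix))
        if old is not None:
            self.users[old].discard((router, prefix))
        self.selected[(router, prefix)] = route_id
        self.users[route_id].add((router, prefix))

    def affected_routers(self, route_id):
        """Routers that must be recomputed if route_id changes or is withdrawn."""
        return {router for router, _prefix in self.users.get(route_id, ())}
```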

Scalability in ONIX.
Partitioning. Keep track of subsets of the network state. Then apply consistency measures to ensure consistency between the partitions. Choice of strong and weak consistency models to select correctness versus computation tradeoff.
Aggregation. A hierarchy of controllers. ONIX controllers for departments or buildings, then a super-controller for the domain.

Reliability in RCP.
Replicate. RCP has a hot spare. Each replica has its own feed of routes, receiving exactly the same inputs and running exactly the same algorithms, so the output should be the same. So no need for a consistency protocol.
Consistency. But if different RCPs see different routes then they will have different outputs. If the two replicas are inconsistent then they can install a routing loop. Need to guarantee consistent inputs: for RCP that's easy as the IGP passes the full link-state to RCP. So RCP should compute next-hops only for routers it is connected to.
For example, one RCP in partitioned network. Only use candidate routes from partition 1 to set next-hops in partition 1.
For example, two RCPs in partitioned network. Since the two RCPs have the same data from each partition from the IGP then they give the same output for each partition.
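The partition rule can be sketched as: for each partition, consider only candidate routes whose egress router lies in that partition. Any replica that sees the same partitions and candidates computes the same answer. Names and data shapes here are assumptions.

```python
def next_hops_per_partition(partitions, candidates):
    """Assign each router the cheapest egress inside its own partition.

    partitions: list of sets of router names (from the IGP's view).
    candidates: list of (egress_router, cost) pairs.
    """
    result = {}
    for part in partitions:
        local = [(cost, egress) for egress, cost in candidates if egress in part]
        best = min(local)[1] if local else None
        for router in part:
            result[router] = best
    return result
```

A cheaper egress in another partition is ignored, since it cannot be reached.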

Reliability in ONIX.
Network failures. ONIX leaves it to applications to detect and recover.
Reachability to ONIX. Solve using typical network practices, such as multipath.
ONIX failure. Replication and distributed coordination protocol.

Three issues:

  • Scalability. Making decisions for many network elements.

  • Reliability. Correct operation under failure of the network or controller.

  • Consistency. Ensure consistency between controller replicas, especially in partitioned networks.


  • Hierarchy

  • Aggregation

  • State management and distribution

Each controller uses a set of tactics from those available.


Download OpenFlow tutorial (URL fixed as per class e-mail).
You need a "host only adapter", vboxnet0. That will allow SSH to the VM from the localhost.
You get an Ubuntu login. User mininet, password mininet.
Configure host only adapter. Configure eth1 by running dhcp (eg, dhclient eth1). Do ifconfig to determine IP address, then connect using "ssh -X mininet@....".

Start mininet
sudo mn --topo single,3 --mac --switch ovsk --controller remote

"nodes" shows hosts, switches and controllers
To run a command on a host say {host} {command}, eg: h1 ifconfig
You should be able to start an xterm with "xterm h1 h2 h3" if logged in with X11 forwarding.

Follow the tutorial and have a play around.

Why separate control plane?

  • rapid innovation, not tied to hardware vendor

  • network-wide view

  • flexibility to introduce new services

Control plane: ForCES

  • Control Elements set the forwarding table for Forwarding Elements using the ForCES interface

  • Minus: Requires standardisation, deployment, etc. But these are the exact problems.

Control plane: Routing Control Platform

  • Hijack BGP as the device to modify the forwarding plane

  • Each AS runs an RCP and communicates routes to the forwarding plane using BGP.

  • Plus: Easy transition, doesn't require a new set of protocols

  • Minus: constrained by what an existing protocol can do

Custom hardware: Ethane

  • "Domain controller" computes forwarding tables for each switch

  • Needed custom switches. Ethane implemented these for OpenWRT, NetFPGA, Linux.

  • Minus: Requires custom switches which support Ethane protocol

What we want:

  • Existing protocols

  • Not custom hardware

Answer: OpenFlow

  • Take capabilities of existing hardware

  • Open those so a standard control protocol can set their behaviour

  • "Controller" communicates with switches to set forwarding table entries.

  • Most switches already implement flow tables, so only thing which is necessary for deployment is to persuade vendors to provide interface to the flow tables.
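A toy model of the controller and flow table relationship. It is not the OpenFlow wire protocol or any real controller's API: the classes, the dict-based match and the action strings are all made up. It only shows the pattern of a table miss going to the controller, which installs an entry so later packets stay in the switch.

```python
class Controller:
    def __init__(self, port_for_mac):
        self.port_for_mac = port_for_mac  # what the controller knows: MAC -> port

    def packet_in(self, switch, packet):
        """Called on a table miss; decide an action and maybe install a rule."""
        port = self.port_for_mac.get(packet["eth_dst"])
        if port is None:
            return "flood"
        action = f"output:{port}"
        switch.install({"eth_dst": packet["eth_dst"]}, action)
        return action


class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = []  # list of (match, action)

    def install(self, match, action):
        self.flow_table.append((match, action))

    def handle(self, packet):
        """Data plane: first matching entry wins, otherwise ask the controller."""
        for match, action in self.flow_table:
            if all(packet.get(field) == value for field, value in match.items()):
                return action
        return self.controller.packet_in(self, packet)
```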

What have we learned:

  • Control and data planes should be decoupled, as coupling makes it too difficult to introduce new protocols.

  • Control plane protocols which require alterations to the hardware can't get traction.

  • Existing protocols can't express the full range of behaviour desired by network operators.

  • Open hardware allows decoupling of control.
