Design and integrate a TCP Offload Engine for use in high-speed video transmission devices.
Our client, a major communications equipment manufacturer in China, contracted CréVinn Teoranta and Emutex to design and integrate a TCP Offload Engine (TOE) into a proprietary system-on-chip solution for use in high-speed video transmission devices. CréVinn engineered the TOE's RTL block and Emutex integrated its capability into the Linux IP stack.
The solution allows Linux applications to use the TOE's TCP/IP fast path transparently through standard IP socket calls. The TOE can handle up to 2,000 TCP/IP connections and achieve wire-speed TCP/IP transmission on Gigabit Ethernet.
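To put "wire-speed" into numbers, the following sketch computes the maximum TCP payload rate (goodput) that a Gigabit Ethernet link can carry. It assumes untagged frames, a 1500-byte MTU and standard 802.3 framing overheads (preamble and SFD, header, FCS, inter-frame gap); the figures are generic Ethernet arithmetic, not measurements of this TOE, and the function name is illustrative.

```c
#include <stdio.h>

/* Per-frame Ethernet overhead on the wire, in bytes (untagged 802.3). */
#define ETH_PREAMBLE_SFD 8
#define ETH_HEADER       14
#define ETH_FCS          4
#define ETH_IFG          12

/* Maximum TCP payload rate in Mbit/s for a given link rate, IP MTU and
 * combined IP + TCP header size (40 bytes for IPv4/TCP without options). */
double tcp_goodput_mbps(double link_mbps, int mtu, int ip_tcp_hdr)
{
    int payload = mtu - ip_tcp_hdr;
    int on_wire = ETH_PREAMBLE_SFD + ETH_HEADER + mtu + ETH_FCS + ETH_IFG;
    return link_mbps * (double)payload / (double)on_wire;
}
```

With `tcp_goodput_mbps(1000.0, 1500, 40)` the result is about 949 Mbit/s (1460 payload bytes per 1538 bytes on the wire); enabling TCP timestamps (52-byte headers) lowers it to about 941 Mbit/s. An offload engine sustaining a rate near these values is running at wire speed.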
In general, the Linux open source community is opposed to supporting TCP offload hardware in the mainline kernel, and it was therefore assumed that a TOE solution would not be accepted into the standard kernel. Because Linux provides no standard hook points for TOE acceleration, some modifications to the Linux IP stack were unavoidable. For both reasons, it was necessary to keep the TOE driver as separate as possible from the IP stack modifications. The IP stack changes invoke a set of callback functions, provided by the TOE driver, to open, close, transmit and receive, and they have no effect if the TOE driver is not loaded. In total, only a dozen or so lines of code were changed within the kernel itself, which allows Emutex or the client to upgrade to and merge with future Linux kernels easily.
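The callback scheme can be sketched as a table of function pointers that the driver registers and the stack checks at each hook. This is a user-space illustration under assumed names (`struct toe_ops`, `toe_register_ops`, `stack_xmit` and the demo driver are all hypothetical, not Linux kernel APIs); it shows only the transmit hook and the "no effect when no driver is loaded" property.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback table that a TOE driver registers with the
 * modified IP stack. These names are illustrative, not kernel APIs. */
struct toe_ops {
    ssize_t (*xmit)(int conn_id, const void *buf, size_t len);
    ssize_t (*recv)(int conn_id, void *buf, size_t len);
    int     (*open)(int conn_id);
    void    (*close)(int conn_id);
};

/* NULL until a driver loads; every hook in the stack tests this first. */
static const struct toe_ops *toe_hooks;

int toe_register_ops(const struct toe_ops *ops)
{
    if (toe_hooks != NULL)
        return -1;              /* only one TOE driver at a time */
    toe_hooks = ops;
    return 0;
}

void toe_unregister_ops(void)
{
    toe_hooks = NULL;
}

/* Stand-in for the unmodified software TCP transmit path. */
static ssize_t stack_sw_xmit(int conn_id, const void *buf, size_t len)
{
    (void)conn_id;
    (void)buf;
    return (ssize_t)len;        /* pretend the software stack queued it all */
}

/* The kind of small change made inside the stack: use the TOE if a
 * driver is loaded, otherwise behave exactly as before. */
ssize_t stack_xmit(int conn_id, const void *buf, size_t len)
{
    if (toe_hooks != NULL && toe_hooks->xmit != NULL)
        return toe_hooks->xmit(conn_id, buf, len);
    return stack_sw_xmit(conn_id, buf, len);
}

/* A toy "driver" used to exercise the hook. */
unsigned long demo_toe_tx_calls;

static ssize_t demo_toe_xmit(int conn_id, const void *buf, size_t len)
{
    (void)conn_id;
    (void)buf;
    demo_toe_tx_calls++;
    return (ssize_t)len;
}

const struct toe_ops demo_toe_ops = { .xmit = demo_toe_xmit };
```

Inside a real kernel the shared `toe_hooks` pointer would need RCU or locking, and the driver module would have to stay pinned while a call is in progress; the sketch leaves both out to keep the hook pattern visible.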
The intercept points in the IP stack were carefully chosen to be as early as possible in the transmit and receive paths, so as to bypass as much of the traditional IP and Ethernet packet processing as possible. It was possible to invoke the TOE driver almost immediately after a user-space call to read() or write(). This means that if, for example, an application writes a 16 KB buffer to a TCP socket that transmits over an Ethernet interface, the stack can hand this buffer straight to the TOE driver. This fully bypasses segmentation, sequence numbering, ACK handling, retransmission, re-ordering, out-of-order queuing and all of the other complex and expensive processing that TCP normally requires. The TOE driver simply builds a request containing a pointer to the data and hands it to the TOE silicon engine, which performs DMA transmission and retransmission as needed. The receive path similarly fills as large a buffer as possible before handing it back to the socket user. In addition to user-mode acceleration, the kernel socket functions splice_read() and splice_write() were also accelerated, allowing in-kernel users of TCP such as NFS to benefit fully from TCP offload.
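The "request with a pointer to the data" can be pictured as a descriptor placed on a ring shared with the engine. The sketch below is a hypothetical layout (`struct toe_tx_req`, `toe_post_tx`, the 256-entry ring are assumptions, not the actual hardware interface); it contrasts one descriptor for a whole 16 KB write with the number of segments a software stack would build, assuming a 1460-byte MSS (IPv4, 1500-byte MTU, no TCP options).

```c
#include <stddef.h>
#include <stdint.h>

#define TOE_RING_SIZE 256          /* hypothetical; must be a power of two */

/* Hypothetical transmit request: a pointer and a length. Segmentation,
 * ACK processing and retransmission are left to the engine. */
struct toe_tx_req {
    uint16_t    conn_id;           /* index into the TOE connection table */
    const void *data;              /* buffer the engine will DMA from */
    uint32_t    len;               /* the whole user buffer, e.g. 16 KB */
};

struct toe_tx_ring {
    struct toe_tx_req slot[TOE_RING_SIZE];
    uint32_t head;                 /* next slot the driver fills */
    uint32_t tail;                 /* next slot the engine consumes */
};

/* Queue one request covering the whole buffer. Returns 0, or -1 if the
 * ring is full (the caller would then block or retry). */
int toe_post_tx(struct toe_tx_ring *r, uint16_t conn_id,
                const void *data, uint32_t len)
{
    if (r->head - r->tail == TOE_RING_SIZE)   /* unsigned wrap is safe */
        return -1;
    struct toe_tx_req *req = &r->slot[r->head & (TOE_RING_SIZE - 1)];
    req->conn_id = conn_id;
    req->data = data;
    req->len = len;
    /* A real driver would issue a write barrier and ring a doorbell here. */
    r->head++;
    return 0;
}

/* For comparison: segments a software stack builds for the same write. */
uint32_t sw_segments_needed(uint32_t len, uint32_t mss)
{
    return (len + mss - 1) / mss;
}
```

Under these assumptions a 16 KB write costs the driver one descriptor, whereas the software path would build, checksum and track `sw_segments_needed(16384, 1460)`, that is 12 segments, plus their ACKs.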
One of the many complexities in TCP offload is connection establishment and teardown. Since these operations were deemed relatively infrequent compared with the volume of data transferred, the Linux IP stack was left fully in charge of all connection management. Incoming SYN and SYN-ACK packets, as well as outgoing FIN, FIN-ACK and RST packets, are routed through the IP stack for normal processing. A connection is handed to the TOE only after the three-way handshake has completed and the amount of data passed has exceeded a pre-configured threshold. This scheme has many advantages. Firstly, it greatly reduces the complexity (and therefore increases the reliability) of both the TOE driver and the silicon. It allows Linux to continue to make full use of the algorithms built into the stack over the years, such as slow start. It allows full use of netfilter, iptables and similar tools to implement border controls. Finally, it allows failover to the IP stack when the TOE's connection table is full. While a system is unlikely to need more than 2,000 connections, it may be desirable to build a TOE that supports fewer connections to reduce memory and footprint; connections beyond that limit are then simply handled by the software stack.
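The promotion rule above can be expressed as a small decision function. This is a sketch under assumed names (`struct conn`, `toe_table`, `toe_maybe_offload` are hypothetical) and an arbitrary threshold supplied by the caller; it shows the three conditions described: handshake complete, data threshold exceeded, and a free slot in the connection table, with the software stack keeping the connection otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOE_MAX_CONNS 2000   /* table size quoted for this TOE */

/* Minimal view of a connection as the stack sees it (hypothetical). */
struct conn {
    bool     established;   /* three-way handshake complete */
    uint64_t bytes;         /* payload bytes passed so far */
    int      toe_slot;      /* -1 while the Linux stack owns it */
};

struct toe_table {
    bool used[TOE_MAX_CONNS];
    int  in_use;
};

/* Claim a free hardware slot, or return -1 if the table is full.
 * A linear scan keeps the sketch short; a free list would be faster. */
static int toe_alloc_slot(struct toe_table *t)
{
    if (t->in_use >= TOE_MAX_CONNS)
        return -1;
    for (int i = 0; i < TOE_MAX_CONNS; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            t->in_use++;
            return i;
        }
    }
    return -1;
}

/* Called as data flows: promote the connection to the TOE only once it
 * is established and has carried more than `threshold` bytes. If no slot
 * is free, the connection simply stays in the software stack. */
bool toe_maybe_offload(struct toe_table *t, struct conn *c, uint64_t threshold)
{
    if (c->toe_slot >= 0)
        return true;                    /* already offloaded */
    if (!c->established || c->bytes <= threshold)
        return false;
    c->toe_slot = toe_alloc_slot(t);
    return c->toe_slot >= 0;
}
```

Because a failed allocation leaves `toe_slot` at -1, a smaller hardware table degrades gracefully: the extra connections keep working through the normal Linux path, just without acceleration.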
The Emutex TOE driver exposes several tunables through the /sys interface, as well as a comprehensive set of /proc files for monitoring and analysing TCP activity. Thresholds can be set for when connections are accelerated or de-accelerated. For example, connections whose data rate falls below the minimum acceleration rate are handed back to the Linux IP stack.
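A rate-based de-acceleration check might look like the following. The tunable names, units and fixed sampling window are assumptions made for illustration; the actual /sys attributes of the Emutex driver are not described here. The function measures throughput over one window and reports whether the connection has fallen below the minimum rate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables of the kind a driver could expose under /sys.
 * Names and units are illustrative only. */
struct toe_tunables {
    uint64_t min_rate_bps;     /* below this, hand the connection back */
    uint32_t sample_ms;        /* length of one rate-sampling window */
};

/* Per-connection bookkeeping for rate sampling. */
struct conn_rate {
    uint64_t bytes_at_last_sample;
    uint64_t last_sample_ms;
};

/* Evaluate one sampling window. Returns true if the connection should be
 * de-accelerated, i.e. handed back to the Linux IP stack. */
bool toe_should_deaccelerate(const struct toe_tunables *tun,
                             struct conn_rate *r,
                             uint64_t total_bytes, uint64_t now_ms)
{
    uint64_t elapsed = now_ms - r->last_sample_ms;
    if (elapsed == 0 || elapsed < tun->sample_ms)
        return false;                       /* window not finished yet */

    uint64_t delta = total_bytes - r->bytes_at_last_sample;
    uint64_t rate_bps = delta * 8 * 1000 / elapsed;   /* bits per second */

    r->bytes_at_last_sample = total_bytes;
    r->last_sample_ms = now_ms;
    return rate_bps < tun->min_rate_bps;
}
```

Keeping the de-acceleration rate somewhat below the rate that triggers promotion would give the policy hysteresis, so that a connection hovering near one threshold is not repeatedly moved between the TOE and the software stack.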
Fully integrating the TOE driver with the Linux stack was a challenging task. Handing connections back to the stack, whether at completion or because of an error, is particularly complex: as well as synchronising sequence and acknowledgement numbers, it requires careful management of several timers and of up to four queues. The work required extensive testing, but the result is an ASIC with a fully functional, fully integrated Linux IP stack with IPv4 and IPv6 acceleration.
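The sequence-number part of a hand-back can be sketched as copying the engine's state into the software stack's state with wraparound-safe comparisons. The structures and `toe_handback_sync` are hypothetical; only `seq_before` follows the well-known idiom of the kernel's `before()` helper. Timers and queue migration, which the text identifies as the harder part, are deliberately not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit TCP sequence comparison that tolerates wraparound, in the same
 * style as the Linux kernel's before() helper. */
static inline bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical snapshot of the engine's per-connection state, read back
 * when a connection is returned to the software stack. */
struct toe_conn_state {
    uint32_t snd_una;   /* oldest unacknowledged byte */
    uint32_t snd_nxt;   /* next byte the engine would send */
    uint32_t rcv_nxt;   /* next byte expected from the peer */
};

/* Subset of the fields the software stack resumes from (names echo the
 * kernel's TCP socket fields, but this is not kernel code). */
struct sw_tcp_state {
    uint32_t snd_una;
    uint32_t snd_nxt;
    uint32_t rcv_nxt;
};

/* Copy the engine's sequence state into the stack. Returns the number of
 * bytes still in flight, which the stack must keep queued for possible
 * retransmission, or -1 if the snapshot is inconsistent. */
int64_t toe_handback_sync(const struct toe_conn_state *hw,
                          struct sw_tcp_state *sw)
{
    if (seq_before(hw->snd_nxt, hw->snd_una))
        return -1;                      /* snd_una must not pass snd_nxt */
    sw->snd_una = hw->snd_una;
    sw->snd_nxt = hw->snd_nxt;
    sw->rcv_nxt = hw->rcv_nxt;
    return (int64_t)(uint32_t)(hw->snd_nxt - hw->snd_una);
}
```

The modular subtraction matters because a long-lived, high-rate connection will wrap its 32-bit sequence space; a naive `snd_nxt - snd_una` on signed 64-bit values would report a huge negative in-flight count right after the wrap.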