\section{Targeting Production Readiness}

\subsection{Debugging Strategies}

Like almost every non-trivial program, the ESP kernel module is prone
to programming errors. A list of the bugs that were found and fixed
with the help of the debugging techniques presented here can be found
in the appendix of this document on page~\pageref{fixed-bugs}.

\subsubsection{Debugging with KGDB}

KGDB \cite{kgdb-intro} is a source-level debugger for the Linux
kernel. It makes it possible to use the GNU debugger (GDB)\footnote{In
  order to be able to analyze dynamically loaded kernel modules, a
  modified version of GDB has to be used.} to debug the Linux kernel
as if it were a regular program. This includes setting breakpoints,
stepping through the kernel code, watching the contents of variables,
and support for multithreading. Because suspending kernel code
execution for analysis halts all user-space applications as well, two
machines are required in order to use KGDB: a testing machine, on
which the kernel to be debugged is running, and a development machine,
on which the kernel code execution is monitored by means of the GDB
program.

KGDB is distributed as a kernel patch, which must be applied to the
kernel source tree before the kernel is compiled. The patch adds the
following components, which are necessary for the debugging process:

\begin{itemize}
\item The GDB stub, which is the heart of the debugger. This is the
  part that handles requests coming from GDB on the development
  machine. It has control over all processors in the testing machine
  when the kernel running on it is inside the debugger.
\item Modifications to the kernel fault handlers -- instead of
  causing a kernel panic as outlined in section~\ref{kernel-panics},
  the modified fault handlers hand control of the machine over to the
  GDB stub, which allows kernel developers to analyze unexpected
  faults.
\item Communication -- there are two versions of this component. Both
  have the purpose of establishing a connection between the
  development and the testing machine. One version uses a serial line
  to connect the two machines, the other works over Ethernet and
  exchanges messages in UDP/IP packets. This functionality has to be
  implemented separately from the one the Linux kernel already offers
  in order to keep the side effects of debugging as small as possible.
  This component is also responsible for handling control break
  requests sent by GDB on the development machine.
\end{itemize}

In this work, KGDB was used to track down several bugs that involved
connection establishment and shutdown. While having a full-fledged
debugger at hand to analyze kernel execution is very valuable in
itself, its applicability for debugging a networking protocol
implementation is limited. These limitations arise from the fact that
it is not possible to synchronize the debugging of two or more
machines. While one machine is suspended for debugging, the other
machine experiences several timeouts and eventually assumes that the
connection is dead. Additionally, some problems did not show up at all
when using the kernel versions for which KGDB is available (new
versions of KGDB are released for selected kernel versions only, and
with considerable delay). Therefore, another way to debug the ESP
kernel module had to be used in addition to KGDB, as described in the
next section.

\subsubsection{Analyzing Kernel Oops Messages}
\label{kernel-panics}

\begin{figure}
\lstset{numbers=left, stepnumber=3, 
  breaklines, breakatwhitespace, frame=single}
\centering
\begin{lstlisting}
Unable to handle kernel NULL pointer dereference at virtual
   address 0000008
*pde = 00000000
Oops: 0000
CPU:     0
EIP:     0010:[<c026cb16>]
EFLAGS:  00210213
eax: 00000000    ebx: c6155c6c  ecx: 00000038   edx: 00000000
esi: c672f000    edi: c672f07c  ebp: 00000004   esp: c6155b0c
ds:  0018         es: 0018       ss: 0018
Process netgauge (pid: 2293, stackpage=c6155000)
Stack: c672f000 c672f07c 00000000 00000038 00000060 00000000
       c6d7d2a0 c6c79018 00000001 c6155c6c 00000000 c6d7d2a0
       c017eb4f c6155c6c 00000000 00000098 c017fc44 c672f000
       00000084 00001020 00001000 c7129028 00000038 00000069
Call Trace: [<c017eb4f>] [<c017fc44>] [<c0180115>]
            [<c018a1c8>] [<c017bb3a>] [<c018738f>]
            [<c0177a13>] [<d0871044>] [<c0178274>]
            [<c0142e36>] [<c013c75f>] [<c013c7f8>]
            [<c0108f77>] [<c010002b>]
Code: 8b 40 14 ff d0 89 c2 8b 06 83
      c4 10 01 c2 89 16 8b 83 8c 01
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
\end{lstlisting}
\caption[Linux kernel Oops message example]{An example of a Linux
  kernel Oops message which caused a kernel panic. This information is
  printed out to the console in the event of a detected error
  condition inside the kernel.}
\label{fig:kernel-oops}
\end{figure}

Kernel Oops messages are a mechanism of the Linux kernel for printing
vital diagnostic information whenever the kernel encounters an error
condition. This information may be used by a developer to track down
and fix the problem. An example of such a message is shown in
figure~\ref{fig:kernel-oops}. Whenever an Oops occurs, the kernel
component that caused it is killed instantly, together with any
user-space processes currently executing system calls into this
component. This is done without releasing any locks or cleaning up
half-modified data structures, so a machine with an Oopsed kernel
should be rebooted as soon as possible to avoid the further problems
that are to be expected. Additionally, if killing the offending
component implies killing a vital part of the kernel, such as an
interrupt handler, the system is halted completely. The information
contained in an Oops message is as follows:

\begin{itemize}
\item Line 1: An error message briefly describing what happened. In
  the example, the kernel attempted to dereference a \NULL\/ pointer.
  The low but non-zero address indicates an attempt to read a member
  variable of a \fname{struct} through a pointer to this
  \fname{struct} that was \NULL.
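\item Line 4: The error code of the fault, printed in hexadecimal. For
  a page fault on the x86 architecture, the value 0000 means that a
  read access from kernel mode hit a page that was not present. If
  several Oops messages are printed in succession, it is important to
  observe that only the first of these messages contains reliable
  information.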
\item Line 6: The code segment (0010) and the value of the extended
  instruction pointer (EIP). This unambiguously identifies the faulty
  instruction.
\item Lines 7--10: The values of the program status and control
  register (EFLAGS), the general purpose registers, and the remaining
  segment registers.
\item Lines 12--15: The last values stored on the stack. These include
  parameters of function calls that have not yet completed, as well as
  return addresses.
\item Lines 16--20: The call trace. These are addresses found on the
  stack that point into kernel code, typically return addresses; they
  identify the functions that were active when the error condition
  occurred.\footnote{The in-kernel symbol resolver (``kksymoops'') has
    the ability to translate these addresses into function names and
    offsets within these functions. Unfortunately, it was not found to
    always give reliable results for the ESP module.}
\end{itemize}
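
The reasoning behind the first item can be reproduced in user space.
The following sketch assumes a hypothetical structure
\fname{struct conn}, which is not taken from the ESP sources, whose
third 32-bit member lies at offset 8. Reading this member through a
\NULL\/ base pointer would access address 8, which is the kind of low,
non-zero fault address printed in the first line of an Oops message.

\begin{lstlisting}
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, for illustration only. */
struct conn {
    uint32_t state;     /* offset 0 */
    uint32_t flags;     /* offset 4 */
    uint32_t window;    /* offset 8 */
};

/* Address that is accessed when the member `window` is read
 * through a struct pointer whose numerical value is `base`. */
static uintptr_t window_access_address(uintptr_t base)
{
    return base + offsetof(struct conn, window);
}
\end{lstlisting}

Calling \fname{window\_access\_address(0)} yields 8. The fault address
is therefore a hint at which member of which structure was accessed,
provided that the layout of the candidate structures is known.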

While this information on its own is not very useful, additional
user-space applications exist which can be used to track down the
cause of the problem. Because the value of the EIP is the sum of the
base address of a function and an instruction offset, it can be used
to identify the kernel function in which the crash occurred. The
``System.map'' file belonging to the kernel makes it possible to look
up the name of the corresponding symbol (here, a function) along with
its base address. Subtracting this base address from the EIP yields
the offset of the faulty instruction within the function. Now it is
possible to inspect the true cause of the problem by examining the
specific instructions of the failed function. Fortunately, the process
described above can be performed by the ``ksymoops'' user-space
program, which takes a kernel Oops message as input and automatically
extracts all usable information.
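
The manual lookup described above can be sketched in a few lines of C.
The sketch is not how ``ksymoops'' is implemented; it only illustrates
the principle of finding the symbol with the largest base address that
does not exceed the EIP. The symbol table is assumed to be sorted by
ascending address, and the names and addresses in
\fname{example\_map} are invented for this illustration.

\begin{lstlisting}
#include <stddef.h>
#include <stdint.h>

/* One entry of a symbol table: base address and name. */
struct ksym {
    uint32_t    addr;
    const char *name;
};

/* Return the symbol with the largest base address that is not
 * greater than eip, or NULL if eip lies below all symbols.
 * The table must be sorted by ascending address. On success,
 * *offset receives eip minus the base address of the symbol. */
static const struct ksym *resolve_eip(const struct ksym *tab,
                                      size_t n, uint32_t eip,
                                      uint32_t *offset)
{
    const struct ksym *best = NULL;
    size_t lo = 0, hi = n;

    while (lo < hi) {                 /* binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= eip) {
            best = &tab[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (best != NULL && offset != NULL)
        *offset = eip - best->addr;
    return best;
}

/* Invented example table; NOT taken from a real System.map. */
static const struct ksym example_map[] = {
    { 0xc026c000u, "example_func_a" },
    { 0xc026cae0u, "example_func_b" },
    { 0xc026cc00u, "example_func_c" },
};
\end{lstlisting}

With this invented table, the EIP value \fname{0xc026cb16} from
figure~\ref{fig:kernel-oops} resolves to \fname{example\_func\_b} with
an offset of \fname{0x36}.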

\subsection{Creating an Interface for User Settings}

The ESP protocol has a few parameters which may be tuned for optimal
performance. While default values for each of these parameters are
defined by means of compile-time constants, it is desirable to give
the system administrator a way to modify these parameters without
having to unload the module, recompile it and load it again every
time.

Setting kernel parameters always involves passing some data from
user-space to kernel-space. The most generic way to do this would be
to create a new system call, write a user-space library exposing the
capabilities of this call and finally create an application which
uses this library to retrieve or set the parameters needed. A
comparable division into a user-space part (iptables) and a part
running in kernel-space (netfilter) is described
in~\cite{iptables-netfilter}; note, however, that iptables exchanges
its data with the kernel through socket options rather than through a
dedicated system call.

Fortunately, this is not necessary if the preferences to set have a
structure as simple as just a few integer values, which is the case
for the ESP protocol. To have a consistent interface for accessing
such parameters, the ``sysctl'' interface was introduced with the
4.4BSD version of Unix and ported to Linux as of kernel version
1.3.57.

\subsubsection{The Sysctl Interface}
\label{sysctl-interface}

The sysctl interface consists of a single function\footnote{Under BSD,
  two more sysctl-related functions exist. These are
  \fname{sysctlbyname} and \fname{sysctlnametomib}, which allow
  accessing the sysctl interface via human-readable names instead of
  an array of integers.} implemented in the standard C library, libc.
This function transports the parameter to set from user- to
kernel-space and vice versa. The prototype of this function is:

\begin{lstlisting}
int sysctl(int *name, u_int namelen,
           void *oldp, size_t *oldlenp,
           void *newp, size_t newlen);
\end{lstlisting}

The first two parameters tell the \fname{sysctl} function which kernel
parameter is to be accessed, by means of an array of integer values
and the length of this array. All sysctl parameters are organized in a
tree whose nodes and leaves are identified by integer numbers. The
array \fname{name} gives a path through this tree; the root node is
implicitly given. For the \fname{name} parameter to be unambiguous,
the children of any node in the tree must have unique numbers. While
this requirement is easy to meet for the interior nodes of a sub-tree
added to this hierarchy, special care has to be taken with the root of
this sub-tree, as outlined in section~\ref{adding-sysctl}.

The next two parameters are for retrieving the old value of the
parameter, which is stored in the memory pointed to by \fname{oldp};
\fname{oldlenp} points to the size of this buffer. Finally, the last
two parameters are for setting a new value. If one of these operations
is not desired, its corresponding parameters may be set to zero. The
sysctl interface itself does not make any assumptions about the
structure of the data passed between user- and kernel-space, as the
data is given by a \fname{void}-typed pointer. The convention about
the data transferred is implemented only in the specific kernel module
and the calling user-space application.
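
The semantics of the six parameters can be illustrated with a toy
model that runs entirely in user space. The following sketch is not
the kernel's implementation; the type \fname{struct toy\_node}, the
function \fname{toy\_sysctl()} and the node numbers 42, 1 and 2 are
invented for this illustration. It walks the path given by
\fname{name} through a small tree, copies the old value out if
\fname{oldp} is given, and stores a new value if \fname{newp} is
given.

\begin{lstlisting}
#include <stddef.h>
#include <string.h>

/* Toy model of one node in the tree (not the kernel's type). */
struct toy_node {
    int              id;    /* unique among siblings; 0 ends a table */
    void            *data;  /* leaf: storage of the value */
    size_t           len;   /* leaf: size of the value in bytes */
    struct toy_node *child; /* interior node: table of children */
};

/* Walk name[0..namelen-1] starting at the children of the implicit
 * root, then optionally read the old and store a new value.
 * Returns 0 on success and -1 on any error. */
static int toy_sysctl(struct toy_node *root, const int *name,
                      unsigned namelen, void *oldp, size_t *oldlenp,
                      const void *newp, size_t newlen)
{
    struct toy_node *tab = root, *n = NULL;
    unsigned i;

    for (i = 0; i < namelen; i++) {
        if (tab == NULL)
            return -1;              /* path continues below a leaf */
        for (n = tab; n->id != 0 && n->id != name[i]; n++)
            ;
        if (n->id == 0)
            return -1;              /* no child with this number */
        tab = n->child;
    }
    if (n == NULL || n->data == NULL)
        return -1;                  /* empty path or interior node */

    if (oldp != NULL && oldlenp != NULL) {  /* read the old value */
        if (*oldlenp < n->len)
            return -1;
        memcpy(oldp, n->data, n->len);
        *oldlenp = n->len;
    }
    if (newp != NULL) {                     /* store the new value */
        if (newlen != n->len)
            return -1;
        memcpy(n->data, newp, n->len);
    }
    return 0;
}

/* Invented example tree: root -> 42 -> { 1, 2 }. */
static int toy_burst = 8, toy_rtt = 1000;
static struct toy_node toy_leaves[] = {
    { 1, &toy_burst, sizeof toy_burst, NULL },
    { 2, &toy_rtt,   sizeof toy_rtt,   NULL },
    { 0, NULL, 0, NULL }
};
static struct toy_node toy_root[] = {
    { 42, NULL, 0, toy_leaves },
    { 0, NULL, 0, NULL }
};
\end{lstlisting}

In this model, the path \fname{\{42, 2\}} addresses the second leaf. A
single call can return the old value of 1000 and store a new one,
which mirrors the combined get and set operation of the real
interface.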

\subsubsection{Adding a Sysctl Interface to ESP}
\label{adding-sysctl}

In order to expose any internal settings via the sysctl interface, an
array of \fname{struct ctl\_table} entries has to be filled in. Each
entry stands either for an interior node, which allows parameters to
be grouped logically (these groups can be used for access permissions
as well), or for a leaf node representing an actual value which may be
read or set. The most interesting fields to be initialized in the
\fname{struct ctl\_table} are:

\begin{itemize} 
\item \fname{ctl\_name}, which is the mentioned unique identification
  number,
\item \fname{mode}, the access permissions in classical Unix notation,
\item \fname{data}, a pointer to the destination of the supplied data
  in kernel-space,
\item \fname{procname}, a human-readable name of the parameter (why
  this is supported under Linux despite the absence of the
  \fname{sysctlbyname()} and \fname{sysctlnametomib()} functions is
  explained in section~\ref{using-sysctl}), as well as
\item \fname{proc\_handler} and \fname{strategy}, which are both
  function pointers.
\end{itemize}

The \fname{proc\_handler()} function implements a Linux-specific
extension of the sysctl interface. Under Linux, the complete tree of
sysctl parameters known to the system is mirrored in the directory
\fname{sys/} of the procfs \cite{procfs-guide} virtual filesystem.
There, every interior node of the sysctl tree is represented by a
directory entry and the leaf nodes are represented by files, which may
be read from or written to like any other file in the filesystem.
This requires converting the parameters to be accessed from/to a
string representation and some handshaking to allow serial access to
these strings, as required by the file access API. These tasks are
carried out by the \fname{proc\_handler()} function.

The \fname{strategy()} function pointer may be set to zero, which
causes the kernel to call a default implementation of this function.
This default implementation just performs some minimal validity checks
based on the size parameters of the \fname{sysctl()} function and
copies the data from user-space to where \fname{data} points in
kernel-space. This behavior is fine for all parameters of
the ESP protocol, with the only exception being the round-trip time.

The round-trip time is special because this value determines a
timeout. In the ESP protocol implementation, a timeout is realized by
scheduling a timer. This is accomplished by making a call to the
\fname{\_\_mod\_timer()} or a similar kernel function. All these
functions expect the timeout to be given in ``jiffies''. Jiffies is a
counter variable inside the Linux kernel which is incremented at a
fixed rate by a periodic hardware timer interrupt. One increment of
this counter, a ``jiffy'', is the basic unit of time in the Linux
kernel, and the rate of the hardware interrupt is given by the
kernel's compile-time constant \fname{HZ}. Its value depends on the
architecture, and on some architectures it can even be modified during
kernel configuration.

It is desirable that the person who sets up the ESP protocol on a
machine does not have to know about these details. Therefore, the unit
chosen for the round-trip time is $\mu$s, and the value is converted
automatically upon getting and setting of this parameter by
special-cased implementations of the \fname{proc\_handler()} and
\fname{strategy()} functions. These functions take the desired timeout
value in microseconds and set ESP's internal variable to the next full
jiffy, thus rounding up the given value. Rounding up was chosen
because having the RRQ timer kick in a little too late has only a
minor effect on overall performance, as shown
in~\ref{handling-packet-loss}. On the other hand, if an RRQ is sent
too early, it causes a number of packets to be retransmitted to no
avail, which would degrade performance badly.
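
The rounding rule can be written down as a small conversion routine.
The following user-space sketch is not the code of the ESP handlers;
the function names are hypothetical, and the tick rate is passed as
the parameter \fname{hz} instead of using the kernel constant
\fname{HZ}. It is assumed that the product of the two arguments fits
into 64 bits.

\begin{lstlisting}
#include <assert.h>

/* Convert a timeout in microseconds to timer ticks (jiffies),
 * rounding up to the next full tick. hz is the tick rate in
 * ticks per second. Assumes usecs * hz fits into 64 bits. */
static unsigned long usecs_to_ticks_round_up(unsigned long usecs,
                                             unsigned long hz)
{
    unsigned long long scaled;

    assert(hz > 0);
    scaled = (unsigned long long)usecs * hz;
    return (unsigned long)((scaled + 999999ULL) / 1000000ULL);
}

/* Opposite direction, as needed when the value is read back.
 * The result is the stored (rounded) value, not the value that
 * was originally written. */
static unsigned long ticks_to_usecs(unsigned long ticks,
                                    unsigned long hz)
{
    assert(hz > 0);
    return (unsigned long)((unsigned long long)ticks
                           * 1000000ULL / hz);
}
\end{lstlisting}

With a tick rate of 100, one jiffy lasts 10000 microseconds. A
requested round-trip time of 10000 microseconds is therefore stored as
one jiffy, while 10001 microseconds are rounded up to two jiffies and
read back as 20000 microseconds.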

The preferred place for the ESP options in the sysctl tree would be
\fname{CTL\_NET/CTL\_ESP}. With the current implementation of the
sysctl interface in Linux, however, it is not possible to attach new
children to the interior nodes of the sysctl tree without patching the
kernel source, which does not seem to be worth the effort. Therefore,
all ESP options are grouped under the \fname{CTL\_ESP} node, which is
a direct child of the sysctl root node. The numerical value of the
\fname{CTL\_ESP} constant is defined in the \fname{af\_enet.h} header
file and was set to a value that is currently unused by the rest of
the Linux kernel. The values reserved for Linux core components can be
found in the file \fname{linux/sysctl.h} of the kernel source code.

\subsubsection{Using the Sysctl Interface}
\label{using-sysctl}

Under Linux, there are three ways to read and set the parameters
exposed through the ESP sysctl interface:

\begin{enumerate}
\item Using the \fname{sysctl()} function call. To use this function
  call, it is necessary to know the \fname{ctl\_name} constants which
  were used by the kernel module upon registration. These are defined
  in the \fname{af\_enet.h} header file which comes with the ESP
  kernel module and has to be included by every application which
  wants to use this protocol.
\item Accessing the files in the \fname{/proc/sys/esp/} directory of
  the procfs virtual file system. This allows for quick testing of
  parameters using just console commands like \fname{cat} and
  \fname{echo}. The need to give these directories and files
  meaningful names is the reason why the \fname{procname} entry in the
  registration struct is needed.
\item Using the \fname{/sbin/sysctl} program. This program is also
  capable of reading the parameters to set from a file, which is used
  by most Linux distributions to set the parameters specified in the
  \fname{/etc/sysctl.conf} file at boot time.
\end{enumerate}
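
The second way can also be used from within a program, because the
entries below \fname{/proc/sys/} behave like ordinary text files. The
following sketch shows two hypothetical helper functions that read and
write a single integer; they are not part of the ESP distribution. A
path such as \fname{/proc/sys/esp/round\_trip\_time} only exists while
the ESP module is loaded, and writing to it usually requires
administrator privileges.

\begin{lstlisting}
#include <stdio.h>

/* Read a single integer from a text file such as
 * /proc/sys/esp/round_trip_time.
 * Returns 0 on success and -1 on error. */
static int read_int_param(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write a single integer, followed by a newline, to the file.
 * Returns 0 on success and -1 on error. */
static int write_int_param(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fprintf(f, "%ld\n", value) > 0);
    if (fclose(f) != 0)     /* write errors may show up on close */
        ok = 0;
    return ok ? 0 : -1;
}
\end{lstlisting}

Checking the result of \fname{fclose()} matters here, because an error
raised while the buffered data is written out would otherwise go
unnoticed.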

A full list of all parameters ESP offers through the sysctl interface
is shown in table~\ref{sysctl-overview}, along with a short
description of the meaning of the individual settings.

\begin{sidewaystable}[p]
\begin{tabular}{|l|l|p{8cm}|}
  \hline
  Constant & procfs Entry Name & Description\\
  \hline\hline
  \fname{CTL\_ESP} & \fname{esp/} & The root node of the sysctl
  subtree for ESP.\\
  \hline
  \fname{CTL\_BURST\_LENGTH} & \fname{burst\_length} & The
  window size $w$ used during bulk transfers. This is the
  number of packets the protocol may have in flight when it
  knows the TXS was received successfully.\\
  \hline
  \fname{CTL\_INITIAL\_ACK\_BURST\_LENGTH} & 
  \fname{initial\_ack\_burst\_length} & The window size used while
  waiting for the first ACK, which acknowledges that the TXS frame
  has been received. This resembles the initial window size of the TCP
  protocol.\\
  \hline
  \fname{CTL\_PACKETS\_TO\_ACK} & \fname{packets\_to\_ack} & The
  number of data frames needed to trigger the sending of an ACK. 
  The detection of packet loss and the receipt of a TXS cause the
  immediate sending of an ACK, independent of this setting.\\
  \hline
  \fname{CTL\_SEND\_BUFF} & \fname{send\_buff\_size} & The
  size of the send buffer to be used. Only affects sockets
  allocated after setting a new value.\\
  \hline
  \fname{CTL\_RECV\_BUFF} & \fname{recv\_buff\_size} & The
  size of the receive buffer to be used. Only affects sockets
  allocated after setting a new value.\\
  \hline
  \fname{CTL\_ROUND\_TRIP\_TIME} & \fname{round\_trip\_time}
  & The round trip time assumed for the connection. This value
  is to be given in $\mu$s and is automatically rounded up to
  the next full jiffy\footnote{See section~\ref{adding-sysctl} for a
    detailed explanation.}.\\
  \hline
\end{tabular}

\caption{Parameters of the ESP protocol exposed through the sysctl
  interface.}
\label{sysctl-overview}
\end{sidewaystable}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% IspellDict: "english"
%%% End: 

