<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
     "http://docbook.org/xml/4.2/docbookx.dtd" [

<!ENTITY ecrypt
'<ulink url="http://www.ecrypt.eu.org">ECRYPT</ulink>'>
<!ENTITY estream
'<ulink url="http://www.ecrypt.eu.org/stream/">eSTREAM</ulink>'>
<!ENTITY ist
'<ulink url="http://www.cordis.lu/ist/">IST</ulink>'>
<!ENTITY nessie
'<ulink url="http://www.cryptonessie.org">NESSIE</ulink>'>

<!ENTITY svn
'http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk'>

<!ENTITY perf
'http://www.ecrypt.eu.org/stream/perf'>

<!ENTITY ecrypt-sync.h 
'<filename class="headerfile"
  ><ulink url="&svn;/api/ecrypt-sync.h?view=auto"
    >ecrypt-sync.h</ulink></filename>'>

<!ENTITY ecrypt-sync-ae.h 
'<filename class="headerfile"
  ><ulink url="&svn;/api/ecrypt-sync-ae.h?view=auto"
    >ecrypt-sync-ae.h</ulink></filename>'>

<!ENTITY ecrypt-test.c 
'<filename
  ><ulink url="&svn;/test/ecrypt-test.c?view=auto"
    >ecrypt-test.c</ulink></filename>'>

<!ENTITY ecrypt-portable.h 
'<filename class="headerfile"
  ><ulink url="&svn;/include/ecrypt-portable.h?view=auto"
    >ecrypt-portable.h</ulink></filename>'>

]>

<article>

  <!-- ==================================================================== -->

  <articleinfo>
    <title>eSTREAM Optimized Code HOWTO</title>
    <author>
      <affiliation>
	<orgname>ECRYPT NoE</orgname>
	<address><email>estreamtesting@ecrypt.eu.org</email></address>
      </affiliation>
    </author>
    <pubdate>2005-11-01</pubdate>
    <revhistory>
      <revision>
	<revnumber>1.0</revnumber>
	<date>2005-11-01</date>
	<authorinitials>cdecanni</authorinitials>
	<revremark>first public version</revremark>
      </revision>
      <revision>
	<revnumber>0.9</revnumber>
	<date>2005-09-26</date>
	<authorinitials>cdecanni</authorinitials>
	<revremark>first draft</revremark>
      </revision>
    </revhistory>

    <abstract>
      <para>This document describes the &estream; testing framework
      and provides guidelines on how to write and submit optimized
      code.</para>
    </abstract>
  </articleinfo>

  <!-- ==================================================================== -->

  <section id="intro">
    <title>Introduction</title>
    
    <para>One of the requirements imposed on all eSTREAM stream cipher
    submissions was that they should be <quote>demonstrably superior
    to the AES in at least one significant aspect</quote>.  An aspect
    which is particularly significant for Profile I candidates is
    software performance.</para>

    <para>Software performance can be measured in many different ways,
    and in order to make comparisons as fair as possible, eSTREAM has
    decided to develop a testing framework. The framework has two
    objectives:</para>

    <orderedlist>
      <listitem>
	<para>assuring that all stream cipher proposals are submitted
	to the same tests under the same circumstances</para>
      </listitem>
      <listitem>
	<para>automating the test procedure as much as possible such
	that new optimized implementations can be included and tested
	with as little effort as possible.</para>
      </listitem>
    </orderedlist>

    <para>This second goal requires some cooperation from the
    submitters of optimized code, and the purpose of this document is
    to provide guidelines on how to write code that can easily be
    integrated in the testing framework.</para>

    <section id="disclaimer">
      <title>Disclaimer</title>

      <para>&ecrypt; is a Network of Excellence within the Information
      Society Technologies (&ist;) Programme of the European
      Commission. The information in this note is provided as is, and
      no guarantee or warranty is given or implied that the
      information is fit for any particular purpose. The user thereof
      uses the information at his or her sole risk and
      liability.</para>
    </section>

    <section id="feedback">
      <title>Feedback</title>

      <para>Feedback on this document is welcome. Send your additions,
      comments, and criticisms to the following email address:
      <email>estreamtesting@ecrypt.eu.org</email>.</para>
    </section>

  </section>

  <!-- ==================================================================== -->

  <section id="overview">
    <title>The Testing Framework: An Overview</title>

    <para>The testing framework consists of a collection of scripts
    and C code which test three aspects of the submitted code:
    <emphasis>API compliance</emphasis>,
    <emphasis>correctness</emphasis>, and
    <emphasis>performance</emphasis>. Many of these tests have been
    borrowed from the &nessie; Test Suite.</para>

    <section id="compliance">
      <title>API Compliance</title>

      <para>The eSTREAM API is specified in the files &ecrypt-sync.h;
      and &ecrypt-sync-ae.h;. The framework verifies whether the code
      complies with this API by performing the following tests:</para>

      <orderedlist>
	<listitem>
	  <para>It checks that the code provides the necessary
	  interfaces, i.e., that it compiles and links correctly with
	  the test code (&ecrypt-test.c;).</para>
	</listitem>
	<listitem>
	  <para>It checks that the <code>ECRYPT_KEYSIZE(i)</code> and
	  <code>ECRYPT_MAXKEYSIZE</code> macros allow key sizes to be
	  enumerated as specified by the API. The same applies to
	  the IV and MAC sizes.</para>
	</listitem>
	<listitem>
	  <para>It checks that calls to the same functions with the
	  same parameters produce the same results, no matter how they
	  are interleaved. When this test fails, this is often an
	  indication that the code stores data in static variables, or
	  that it uses uninitialized variables.</para>
	</listitem>
	<listitem>
	  <para>It checks that the incremental encryption functions
	  <function>ECRYPT_encrypt_blocks</function> and
	  <function>ECRYPT_encrypt_bytes</function> produce the same
	  ciphertext as <function>ECRYPT_encrypt_packet</function>
	  when fed with the same plaintext. It also verifies that this
	  ciphertext decrypts to the original plaintext.
	  </para>
	</listitem>
      </orderedlist>
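
      <para>The key size enumeration checked in the second test can be
      sketched as follows. This is only an illustration, not code
      taken from the framework: it assumes that sizes are enumerated
      for <code>i = 0, 1, ...</code> until
      <code>ECRYPT_KEYSIZE(i)</code> exceeds
      <code>ECRYPT_MAXKEYSIZE</code>, the macro values describe a
      hypothetical cipher with 128, 192, and 256-bit keys, and
      <function>count_key_sizes</function> is a helper introduced
      here for the example.</para>

      <programlisting><![CDATA[#include <stdio.h>

/* Example values for a hypothetical cipher; a real submission
   defines these macros in its own API header. */
#define ECRYPT_MAXKEYSIZE 256
#define ECRYPT_KEYSIZE(i) (128 + (i) * 64)

/* Hypothetical helper: prints every supported key size (in bits)
   and returns how many sizes were found. */
int count_key_sizes(void)
{
  int i, n = 0;

  for (i = 0; ECRYPT_KEYSIZE(i) <= ECRYPT_MAXKEYSIZE; ++i)
    {
      printf("key size %d: %d bits\n", i, ECRYPT_KEYSIZE(i));
      ++n;
    }
  return n;
}]]></programlisting>

      <para>The same loop shape would apply to the IV and MAC size
      macros, assuming they follow the same naming pattern.</para>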

    </section>

    <section id="correctness">
      <title>Correctness</title>

      <para>The correctness of the code on different platforms is
      verified by generating and comparing test vectors. For
      convenience, eSTREAM has chosen to use the same format as the
      <ulink url="http://www.cryptonessie.org/testvectors/">NESSIE
      test vectors</ulink>.</para>

      <caution>
	<para>The test vectors currently included in the testing
	framework were generated by eSTREAM and still need to be
	verified by the designers.</para>
      </caution>

    </section>

    <section id="performance">
      <title>Performance</title>

      <para>Stream ciphers can be deployed in various situations, each
      imposing specific requirements on the efficiency of the
      primitive. Hence, defining a small set of performance criteria
      which reflects all relevant implementation properties of a
      stream cipher is not an easy task. In the current version of the
      framework, eSTREAM has limited itself to four performance
      measures. More detailed tests might be added in the future,
      though.</para>

      <orderedlist>
	<listitem>
	  <formalpara>
	    <title>Encryption rate for long streams</title>
	
	    <para>This is where stream ciphers have the biggest
	    potential advantage over block ciphers, and hence this
	    figure is likely to be the most important criterion in
	    many applications. The testing framework measures the
	    encryption rate by encrypting a long stream in chunks of
	    about 4KB using the
	    <function>ECRYPT_encrypt_blocks</function> function. The
	    encryption speed, in cycles/byte, is calculated by
	    measuring the number of bytes encrypted in 250
	    &micro;sec. Note that the time to set up the key or the IV
	    is not considered in this test.</para>
	  </formalpara>
	</listitem>
	<listitem>
	  <formalpara>
	    <title>Packet encryption rate</title>
	
	    <para>While a block cipher is likely to be a better choice
	    when encrypting very short packets, it is still
	    interesting to determine at which length a stream cipher
	    starts to take the lead. Moreover, stream ciphers whose
	    encryption speeds do not deteriorate too much for small
	    packets could have a distinct advantage in applications
	    which use a wide range of packet sizes. The packet
	    encryption rate is measured by applying the
	    <function>ECRYPT_encrypt_packet</function> function to
	    packets of different lengths. Each call to
	    <function>ECRYPT_encrypt_packet</function> includes a
	    separate IV setup and, if authenticated encryption is
	    supported, a MAC finalization step. The packet lengths
	    (40, 576, and 1500 bytes) were chosen to be representative
	    of the traffic seen on the Internet <xref
	    linkend="JTC-003"/>.
	    </para>
	  </formalpara>
	</listitem>
	<listitem>
	  <formalpara>
	    <title>Agility</title>
	
	    <para>When an application needs to encrypt many streams in
	    parallel on a single processor, its performance will not
	    only depend on the encryption speed of the cipher, but
	    also on the time spent switching from one session to
	    another. This overhead is typically determined by the
	    number of bytes of <code>ECRYPT_ctx</code> that need to be
	    stored or restored during each context switch. In order to
	    build a picture of the agility of the different
	    submissions, the testing framework performs the following
	    test: it first initiates a large number of sessions
	    (filling 16MB of RAM with <code>ECRYPT_ctx</code>
	    structures), and then encrypts streams of plaintext in
	    short blocks of around 256 bytes using
	    <function>ECRYPT_encrypt_blocks</function>, each time
	    jumping from one session to another.</para>
	  </formalpara>
	</listitem>
	<listitem>
	  <formalpara>
	    <title>Key and IV setup (+ MAC generation)</title>
	
	    <para>The last test in the testing framework separately
	    measures the efficiency of the key setup
	    (<function>ECRYPT_keysetup</function>) and the IV setup
	    (<function>ECRYPT_ivsetup</function>). Given that each
	    call to <function>ECRYPT_AE_ivsetup</function> comes
	    together with a call to
	    <function>ECRYPT_AE_finalize</function>, both functions
	    are benchmarked together in case of authenticated stream
	    ciphers. This is probably the least critical of the four
	    tests, considering that the efficiency of the IV setup is
	    already reflected in the packet encryption rate, and that
	    the time for the key setup will typically be negligible
	    compared to the work needed to generate and exchange the
	    key.</para>
	  </formalpara>
	</listitem>
      </orderedlist>
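
      <para>The structure of the agility test described above can be
      sketched as follows. This is a rough illustration under stated
      assumptions, not the framework's own code:
      <code>toy_ctx</code> is a hypothetical stand-in for
      <code>ECRYPT_ctx</code>, <function>toy_encrypt</function> is a
      placeholder that merely touches the context, and a simple
      round-robin order is assumed for jumping between
      sessions.</para>

      <programlisting><![CDATA[#include <stdlib.h>

/* Hypothetical stand-in for ECRYPT_ctx; 108 bytes is used here only
   as an example context size. */
typedef struct { unsigned char state[108]; } toy_ctx;

/* Number of contexts that fit in a given memory budget. */
size_t sessions_in_budget(size_t budget, size_t ctx_size)
{
  return budget / ctx_size;
}

/* Placeholder "cipher": XORs the data with the state and updates it.
   It only serves to read and write the context; it is NOT a real
   stream cipher. */
static void toy_encrypt(toy_ctx *ctx, unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < len; ++i)
    buf[i] ^= ctx->state[i % sizeof ctx->state]++;
}

/* Processes `chunks` chunks of `chunk_len` bytes, switching to the
   next session after every chunk. Returns 0 on success and -1 if the
   memory could not be allocated. */
int agility_pass(size_t budget, size_t chunk_len, size_t chunks)
{
  size_t n = sessions_in_budget(budget, sizeof(toy_ctx));
  toy_ctx *ctx;
  unsigned char *buf;
  size_t i;

  if (n == 0)
    return -1;

  ctx = calloc(n, sizeof *ctx);
  buf = calloc(chunk_len, 1);
  if (ctx == NULL || buf == NULL)
    {
      free(ctx);
      free(buf);
      return -1;
    }

  for (i = 0; i < chunks; ++i)
    toy_encrypt(&ctx[i % n], buf, chunk_len);

  free(buf);
  free(ctx);
  return 0;
}]]></programlisting>

      <para>Because every chunk uses a different context, the cost of
      loading and storing the context is paid for each 256-byte
      chunk, which is why the size of the context matters in this
      test.</para>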

      <para>The different tests are illustrated below with an example
      for SNOW 2.0. The latest results for all submissions, measured
      by eSTREAM on various platforms, can be found in <xref
      linkend="results"/>.</para>

      <example>
	<title>Output of performance tests</title>

	<screen><computeroutput>Primitive Name: SNOW-2.0
========================
Profile: SW
Key size: 128 bits
IV size: 128 bits

CPU speed: 1694.8 MHz
Cycles are measured using RDTSC instruction

Testing memory requirements:

Size of ECRYPT_ctx: 108 bytes

Testing stream encryption speed:

Encrypted 22 blocks of 4096 bytes (under 1 keys, 22 blocks/key)
Total time: 415015 clock ticks (244.87 usec)
Encryption speed (cycles/byte): 4.61
Encryption speed (Mbps): 2943.95

Testing packet encryption speed:

Encrypted 350 packets of 40 bytes (under 10 keys, 35 packets/key)
Total time: 411499 clock ticks (242.80 usec)
Encryption speed (cycles/packet): 1175.71
Encryption speed (cycles/byte): 29.39
Encryption speed (Mbps): 461.29
Overhead: 538.2%

Encrypted 120 packets of 576 bytes (under 10 keys, 12 packets/key)
Total time: 416341 clock ticks (245.66 usec)
Encryption speed (cycles/packet): 3469.51
Encryption speed (cycles/byte): 6.02
Encryption speed (Mbps): 2250.95
Overhead: 30.8%

Encrypted 50 packets of 1500 bytes (under 1 keys, 50 packets/key)
Total time: 395528 clock ticks (233.38 usec)
Encryption speed (cycles/packet): 7910.56
Encryption speed (cycles/byte): 5.27
Encryption speed (Mbps): 2570.96
Overhead: 14.5%

Weighted average (Simple Imix):
Encryption speed (cycles/byte): 7.35
Encryption speed (Mbps): 1844.62
Overhead: 59.6%

Testing key setup speed:

Did 7000 key setups (under 10 keys, 700 setups/key)
Total time: 446655 clock ticks (263.54 usec)
Key setup speed (cycles/setup): 63.81
Key setup speed (setups/second): 26561211.67

Testing IV setup speed:

Did 500 IV setups (under 10 keys, 50 setups/key)
Total time: 397912 clock ticks (234.78 usec)
IV setup speed (cycles/setup): 795.82
IV setup speed (setups/second): 2129634.19

Testing key agility:

Encrypted 270 blocks of 256 bytes (each time switching contexts)
Total time: 412653 clock ticks (243.48 usec)
Encryption speed (cycles/byte): 5.97
Encryption speed (Mbps): 2271.07
Overhead: 29.6%


End of performance measurements</computeroutput></screen>

      </example>
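
      <para>The derived figures in the sample output can be
      reproduced from the raw tick counts. The sketch below shows the
      arithmetic. The relations are inferred from the numbers in the
      example rather than taken from the framework's source, the
      function names are hypothetical, and the 7:4:1 weighting of 40,
      576, and 1500-byte packets is an assumption that happens to
      match the reported <quote>Simple Imix</quote> average.</para>

      <programlisting><![CDATA[/* Cycles per byte for `count` units of `size` bytes processed in
   `ticks` clock cycles. */
double cycles_per_byte(double ticks, double count, double size)
{
  return ticks / (count * size);
}

/* Overhead (in percent) of a measurement relative to the long-stream
   encryption speed, both given in cycles/byte. */
double overhead_percent(double cpb, double stream_cpb)
{
  return (cpb / stream_cpb - 1.0) * 100.0;
}

/* Weighted average in cycles/byte, assuming a 7:4:1 mix of 40-, 576-,
   and 1500-byte packets; the arguments are in cycles/packet. */
double imix_cycles_per_byte(double c40, double c576, double c1500)
{
  double cycles = 7.0 * c40 + 4.0 * c576 + 1.0 * c1500;
  double bytes = 7.0 * 40.0 + 4.0 * 576.0 + 1.0 * 1500.0;

  return cycles / bytes;
}]]></programlisting>

      <para>With the SNOW 2.0 figures above,
      <code>cycles_per_byte(415015, 22, 4096)</code> gives
      approximately 4.61, and <code>imix_cycles_per_byte(1175.71,
      3469.51, 7910.56)</code> gives approximately 7.35.</para>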

    </section>

  </section>

  <!-- ==================================================================== -->

  <section id="installing">
    <title>Installing the Testing Framework</title>

    <para>A tarball of the latest version of the testing framework can
    always be downloaded from ECRYPT's <ulink url="&svn;/">SVN
    repository</ulink>. This repository contains the most recent
    implementations of all stream cipher candidates, together with
    test vectors, test scripts, and a few benchmark ciphers (AES in
    CTR mode, RC4, SNOW 2.0).</para>

    <para>The scripts expect a shell compatible with <ulink
    url="http://www.gnu.org/software/bash/">GNU Bash</ulink>, system
    utilities compatible with <ulink
    url="http://www.gnu.org/software/coreutils/">GNU
    Coreutils</ulink>, and one (or more) ANSI C compiler(s). The
    following sections discuss how these requirements can be fulfilled
    on various platforms.</para>

    <section id="livecd">
      <title>x86 Live CD</title>

      <para>The easiest way to run the testing framework on an x86
      platform is to download eSTREAM's bootable Live CD. The CD
      allows the framework to run without any installation and without
      affecting the existing configuration on the host machine in any
      way. The Live CD is based on <ulink
      url="http://www.ubuntulinux.org">Ubuntu</ulink> and
      includes:</para>

      <itemizedlist>
	<listitem>
	  <para>a stripped-down version of <ulink
	  url="http://www.ubuntu.com/504Released">Ubuntu
	  5.04</ulink>.</para>
	</listitem>
        <listitem>
	  <para>a working copy of the <ulink url="&svn;/">testing
	  framework</ulink>.</para>
	</listitem>
	<listitem>
	  <para>different versions of <ulink
          url="http://gcc.gnu.org">GCC</ulink> (2.95, 3.3, 3.4, and
          4.0).</para>
	</listitem>
        <listitem>
	  <para><ulink
	  url="http://www.intel.com/cd/software/products/asmo-na/eng/compilers/clin/">Intel
	  C++ Compiler 8.1 for Linux</ulink>.</para>
	</listitem>
	<listitem>
	  <para><ulink
	  url="http://msdn.microsoft.com/visualc/vctoolkit2003/">Microsoft
	  Visual C++ Toolkit 2003</ulink>.</para>
	</listitem>
      </itemizedlist>

      <para>To run the Live CD, complete the following steps:</para>

      <orderedlist>
	<listitem>
	  <para>Download the <ulink
	  url="http://www.ecrypt.eu.org/stream/perf/ecrypt-live-i386.iso">ISO-file</ulink>
	  (about 400MB) and burn it on a CD.</para>
	</listitem>
	<listitem>
	  <para>If you want to use the Intel C Compiler, you will need
	  a license file. You can obtain a free non-commercial license
	  by subscribing <ulink
	  url="http://www.intel.com/cd/software/products/asmo-na/eng/compilers/clin/">here</ulink>.
	  If you store the license file on a memory stick, the Live CD
	  will recognize it automatically.</para>
	</listitem>
	<listitem>
	  <para>Boot the CD and choose your language, keyboard layout,
	  etc.</para>
	</listitem>
	<listitem>
	  <para>When the screen shown in <xref linkend="screenshot"/>
	  comes up, double-click on the <quote>ECRYPT test
	  suite</quote> icon. This will install the testing framework
	  in <filename
	  class='directory'>~/ecrypt-test-suite</filename>, fetch
	  updates from the eSTREAM server if requested, and
	  immediately launch the testing process (see <xref
	  linkend="running"/>). The tests can be aborted at any time by
	  pressing <keycombo><keycap>Ctrl</keycap>
	  <keycap>C</keycap></keycombo>.</para>

	  <caution>
	    <para>The Live CD does not store anything on the hard
	    disk. Any changes you make will be lost after reboot,
	    unless you copy them manually (e.g., on a memory
	    stick).</para>
	  </caution>
	</listitem>
      </orderedlist>

      <figure id="screenshot">
	<title>Screenshot of Live CD</title>
	<screenshot>
	  <screeninfo>800x600</screeninfo>
	  <graphic fileref="livecd.png"></graphic>
	</screenshot>
      </figure>

    </section>

    <section id="linux">
      <title>GNU/Linux</title>

      <para>The testing framework should install and run without
      problems on any recent Linux distribution. The installation
      instructions are as follows:</para>

      <orderedlist>
	<listitem>
	  <para>Download and untar the <ulink
	  url="&svn;.tar.gz">tarball</ulink> of the testing framework:</para>

          <screen><prompt>$ </prompt><userinput><![CDATA[wget http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk.tar.gz]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[tar -xzf trunk.tar.gz]]></userinput></screen>
	  
	</listitem>
	<listitem>
	  <para>Correct the permissions of the scripts (this fix is
	  necessary because the current version of <ulink
	  url="http://viewcvs.sourceforge.net">ViewCVS</ulink> does
	  not understand the <code><ulink
	  url="http://svnbook.red-bean.com/en/1.1/ch07s02.html#svn-ch-7-sect-2.3">svn:executable</ulink></code>
	  property).</para>

          <screen><prompt>$ </prompt><userinput><![CDATA[cd trunk/]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[chmod +x start scripts/{cleanup,collect,configure,run,tabulate}]]></userinput></screen>
	  
	</listitem>
	<listitem>
	  <para>Make sure you have a compiler (GCC) installed and
	  configure the framework as explained in <xref
	  linkend="configuring"/>.</para>
	</listitem>
      </orderedlist>

    </section>

    <section id="windows">
      <title>Microsoft Windows</title>

      <para>The shell and system utilities required by the testing
      framework are not present on a standard Windows
      platform. Fortunately, there exist several freely available
      software packages which provide this functionality. The
      instructions below explain how to install the framework using
      the <ulink
      url="http://www.mingw.org/msys.shtml">MinGW/MSYS</ulink>
      packages.</para>

      <orderedlist>
	<listitem>
	  <para>First install an ANSI C compiler. The testing
	  framework currently detects two compilers under
	  Windows:</para>
	  <itemizedlist>
	    <listitem>
	      <para>Microsoft C/C++ Optimizing Compiler, which is
	      included in <ulink
	      url="http://msdn.microsoft.com/vstudio/">Microsoft
	      Visual Studio</ulink> and can be downloaded separately
	      from <ulink
	      url="http://msdn.microsoft.com/visualc/vctoolkit2003/">MSDN</ulink>.</para>
	    </listitem>
	    <listitem>
	      <para><ulink url="http://gcc.gnu.org">GCC</ulink>, which
              has been ported to Windows by the <ulink
              url="http://www.mingw.org">MinGW</ulink> project
              (amongst others). The installation program is available
              <ulink
              url="http://prdownloads.sourceforge.net/mingw/MinGW-4.1.0.exe">here</ulink>.</para>
	    </listitem>
	  </itemizedlist>
	</listitem>
	<listitem>
	  <para>Install the <ulink
          url="http://www.mingw.org/msys.shtml">MSYS</ulink>
          shell. The current version can be downloaded <ulink
          url="http://prdownloads.sourceforge.net/mingw/MSYS-1.0.10.exe">here</ulink>.</para>

	  <caution>
	    <para>If you plan to use the MinGW compiler, make sure to
	    install it before installing MSYS.</para>
	  </caution>
	</listitem>
	<listitem>
	  <para>Download the <ulink url="&svn;.tar.gz">tarball</ulink>
	  of the testing framework and store it in your MSYS home
	  directory. In a default installation, this directory is
	  located in <filename
	  class='directory'>C:\msys\1.0\home\%USERNAME%</filename>. Open
	  an MSYS terminal and extract the tarball:</para>

<screen><prompt>$ </prompt><userinput><![CDATA[tar -xzf trunk.tar.gz]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[cd trunk/]]></userinput>
</screen>
	</listitem>
	<listitem>
	  <para>Configure the framework as explained in <xref
	  linkend="configuring"/>.</para>

	  <tip>
	    <para>If the configuration script does not detect the
	    Microsoft Compiler, this probably indicates that the
	    environment variables <envar>PATH</envar>,
	    <envar>INCLUDE</envar>, and <envar>LIB</envar> are not set
	    correctly. The correct values can be found in a file named
	    <filename>vcvars32.bat</filename>. You can either add
	    these variables to your system in
	    <menuchoice><guimenu>Control Panel</guimenu>
	    <guimenuitem>System</guimenuitem>
	    <guimenuitem>Advanced</guimenuitem>
	    <guimenuitem>Environment Variables</guimenuitem>
	    </menuchoice>, or (for a default installation) add the
	    following line in
	    <filename>C:\msys\1.0\msys.bat</filename>:</para>

	    <programlisting>call "C:\Program Files\Microsoft Visual C++ Toolkit 2003\vcvars32.bat"</programlisting>
	  </tip>
	</listitem>
      </orderedlist>

    </section>

    <section id="unix">
      <title>UNIX Platforms</title>

      <para>The installation instructions for UNIX platforms are the
      same as the ones given in <xref linkend="linux"/>. However,
      depending on your operating system, some of the scripts might
      fail to run correctly because of small compatibility
      problems. The easiest way to avoid this is to replace some of
      the UNIX utilities by their GNU equivalents:</para>

      <itemizedlist>
	<listitem>
	  <para><ulink url="http://www.gnu.org/software/bash/">GNU
	  Bash</ulink></para>
	</listitem>
	<listitem>
	  <para><ulink url="http://www.gnu.org/software/make/">GNU
	  Make</ulink></para>
	</listitem>
	<listitem>
	  <para><ulink
	  url="http://www.gnu.org/software/coreutils/">GNU
	  Coreutils</ulink></para>
	</listitem>
      </itemizedlist>

      <para>With these tools installed, the framework is known to run
      correctly on the following platforms:</para>

      <itemizedlist>
	<listitem>
	  <para>HP-UX 11.00 (PA-RISC version) with HP C/HP-UX Version
	  B.11.11.02 and/or GCC.</para>
	</listitem>
	<listitem>
	  <para>Solaris 8 (SPARC edition) with SUN Forte Developer 7 C
          5.4 and/or GCC.</para>
	</listitem>
	<listitem>
	  <para>Tru64 UNIX V5.1B (Alpha) with Compaq C V6.5-011 and/or
	  GCC.</para>
	</listitem>
      </itemizedlist>

    </section>

    <section id="other">
      <title>Other Systems</title>

      <para>It is probably not too hard to make the framework run on
      other systems (e.g., Mac OS X on PowerPC). As soon as we have a
      chance to test it, we will update this document.</para>

    </section>

  </section>

  <!-- ==================================================================== -->

  <section id="running">
    <title>Running Tests</title>

    <para>This section explains how to use the scripts included in the
    testing framework. The three most important scripts are called
    <command>configure</command>, <command>run</command>, and
    <command>collect</command> and are located in the directory
    <filename class='directory'>./scripts</filename>. A fourth script,
    called <command>start</command>, runs the three previous commands
    one after another.</para>

    <tip>
      <para>In order to avoid having to prefix all commands with
      <userinput>./scripts/</userinput>, add the scripts directory to
      the <envar>PATH</envar> variable:</para>

      <screen><prompt>$ </prompt><userinput><![CDATA[export PATH=$PWD/scripts:$PATH]]></userinput></screen>
    </tip>

    <section id="configuring">
      <title>Configuring</title>

      <para>The <command>configure</command> script searches the path
      for compilers, tests which compiler options are supported, and
      collects information about the CPU. All information is stored in
      the directory <filename
      class='directory'>./reports-$HOSTNAME</filename>. The first part
      of the script's output is reproduced below:</para>

      <screen><prompt>$ </prompt><userinput><![CDATA[./scripts/configure]]></userinput>
<computeroutput><![CDATA[
 * searching for compilers ... done

The following executables look like compilers:
  - gcc
  - gcc-3.3
  - i386-linux-gcc
  - i386-linux-gcc-3.3
  - i486-linux-gcc-3.3
  - icc
  - cl

I will now execute them for further testing.
Is this safe? [Y/n] ]]></computeroutput><userinput>y</userinput>
<computeroutput><![CDATA[
 * checking compilers and discarding duplicates ... done
 * checking compiler options ... done]]></computeroutput></screen>

      <para>If the list of executables above contains programs which
      you definitely do not want the script to run and test, simply
      press <userinput>n</userinput>, edit the file
      <filename>./reports-$HOSTNAME/candidates</filename>, and start
      the script again.</para>

      <para>The final list of supported compilers and options is
      stored in
      <filename>./reports-$HOSTNAME/compilers</filename>. Note that
      this first stage is normally executed only when the script is
      launched for the very first time. The script performs a new
      search only if the file
      <filename>./reports-$HOSTNAME/compilers</filename> has been
      deleted.</para>

      <para>The second part of the script provides the possibility to
      enable or disable any compiler previously detected. The set of
      active compilers can be modified at any time by running
      <command>configure</command> again.</para>

      <screen><computeroutput><![CDATA[The following compilers are supported:
  1. gcc
  2. icc
  3. cl

Enter a comma-separated list of numbers to select compilers
or press <return> to select everything: ]]></computeroutput><userinput>1,2,3</userinput>
<computeroutput><![CDATA[
 * creating config files ... done]]></computeroutput>
</screen>

      <para>The result of the script is a list of configuration files
      in <filename
      class='directory'>./reports-$HOSTNAME/configs</filename>, each
      of which defines a compiler and a combination of flags. Which
      configurations will eventually be used during the benchmarking,
      and in which order, is determined by the file
      <filename>./reports-$HOSTNAME/shortlist</filename>, constructed
      in the next step of the script: </para>

      <screen><computeroutput><![CDATA[The following shortlists contain compiler options which are
likely to produce fast code on particular platforms:
  1. shortlist.alpha (11 options)
  2. shortlist.amd64 (17 options)
  3. shortlist.hppa (10 options)
  4. shortlist.pentium-4 (32 options)
  5. shortlist.pentium-m (29 options)
  6. shortlist.sparc (19 options)

Enter a number to select a list: ]]></computeroutput><userinput>5</userinput>
<computeroutput><![CDATA[
 * copying shortlist.pentium-m ... done

After having tested the options in the shortlist, should
the script start testing other options? [Y/n] ]]></computeroutput><userinput>y</userinput>
</screen>

      <para>If you want the script to finish immediately after all
      configurations in the shortlist have been tested, answer
      <userinput>n</userinput>. The framework's default behavior is to
      start testing compiler options which are not on the list, until
      the script is either interrupted by the user (pressing
      <keycombo><keycap>Ctrl</keycap> <keycap>C</keycap></keycombo>),
      or all configuration files in <filename
      class='directory'>./reports-$HOSTNAME/configs</filename> have
      been tested.</para>

      <para>The last task of the configuration script is to determine
      the CPU's clock frequency. The correct frequency should
      automatically be detected on most platforms:</para>

      <screen><computeroutput><![CDATA[ * collecting CPU information ... done

The processor seems to run at 1694.829 MHZ.
Press <return> if this is correct, or enter the clock speed:
]]></computeroutput>
</screen>

    </section>

    <section id="tests">
      <title>Launching Tests</title>

      <para>The actual tests are launched with the command
      <command>run</command>. When invoked without arguments, the
      script will run through all implementations in the test suite,
      compile each of them using the current compiler settings, and
      perform the tests described in <xref linkend="overview"/>. This
      is repeated for all compiler configurations in <filename
      class='directory'>./reports-$HOSTNAME/configs</filename> (or at
      least those in the shortlist), and the results are stored in the
      current working directory. The following example shows how
      <command>run</command> is invoked in the
      <command>start</command> script (all test reports are stored in
      <filename class='directory'>./reports-$HOSTNAME</filename> in
      this case):</para>

      <screen><prompt>$ </prompt><userinput><![CDATA[cd reports-$HOSTNAME/]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[../scripts/run]]></userinput>
</screen>

      <para>An optional argument can be used to specify the directory
      which contains the implementations to be tested. If the
      specified path does not point to an existing directory, the
      script will check whether it matches a directory in <filename
      class='directory'>submissions</filename> or <filename
      class='directory'>benchmarks</filename>. The command below, for
      example, will only test the implementation of SNOW 2.0 (assuming
      that the <filename class='directory'>scripts</filename>
      directory is in the <envar>PATH</envar>):</para>

      <screen><prompt>$ </prompt><userinput><![CDATA[mkdir snow-results]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[cd snow-results/]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[run snow-2.0]]></userinput>
</screen>

      <note>
	<para>On some platforms, the testing framework might use the
	standard <function>clock()</function> function to measure
	timings. Unfortunately, this function has a rather low
	resolution on most platforms. As a consequence, tests need to
	be run for several seconds in order for the timing results to
	be accurate. Depending on the number of primitives and
	compiler options, the benchmarking can therefore take a very
	long time. The tests can be aborted at any time by pressing
	<keycombo><keycap>Ctrl</keycap> <keycap>C</keycap></keycombo>,
	though.</para>
      </note>
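
      <para>The effect of a low-resolution timer can be illustrated
      with the following sketch, which repeats a workload until
      enough processor time has elapsed for the measurement to be
      meaningful. It is not the framework's timing code:
      <function>seconds_per_call</function> and
      <function>dummy_work</function> are hypothetical names, and
      only the standard <function>clock()</function> function and
      <code>CLOCKS_PER_SEC</code> are assumed.</para>

      <programlisting><![CDATA[#include <time.h>

static volatile unsigned long sink;

/* Hypothetical workload standing in for a call to the cipher. */
void dummy_work(void)
{
  unsigned int i;

  for (i = 0; i < 1000; ++i)
    sink += i;
}

/* Hypothetical helper: calls `work` repeatedly until at least
   `min_seconds` of processor time have elapsed according to clock(),
   and returns the average time per call in seconds. Returns -1.0 if
   clock() is not available. Calling clock() after every iteration
   adds some overhead; this is acceptable for a sketch. */
double seconds_per_call(void (*work)(void), double min_seconds)
{
  clock_t start = clock();
  clock_t elapsed;
  unsigned long calls = 0;

  if (start == (clock_t)-1)
    return -1.0;

  do
    {
      work();
      ++calls;
      elapsed = clock() - start;
    }
  while ((double)elapsed < min_seconds * CLOCKS_PER_SEC);

  return ((double)elapsed / CLOCKS_PER_SEC) / (double)calls;
}]]></programlisting>

      <para>The coarser the timer, the larger
      <code>min_seconds</code> has to be before the average per call
      becomes stable, which is why benchmarks based on
      <function>clock()</function> take so much longer.</para>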

    </section>

    <section id="collecting">
      <title>Collecting the Results</title>

      <para>Once the tests are finished (or have been aborted by the
      user), the results can be collected with the
      <command>collect</command> command. This script traverses the
      current working directory tree and creates an HTML report (named
      <filename>index.html</filename>) summarizing the benchmark
      results for each subdirectory it encounters. For examples of the
      HTML output, see <xref linkend="results"/>.</para>

    </section>

  </section>

  <!-- ==================================================================== -->

  <section id="submitting">
    <title>Submitting Optimized Code</title>

    <para>This section is a step-by-step guide to writing code which
    can easily be integrated into the testing framework. Please make
    sure to follow the steps described below before submitting
    optimized code to the eSTREAM project.</para>

    <section id="step-1">
      <title>Step 1: Install the Testing Framework</title>

      <para>First, download, install, and configure the testing
      framework as explained in detail in <xref linkend="installing"/>
      and <xref linkend="configuring"/>.</para>

    </section>

    <section id="step-2">
      <title>Step 2: Edit Source Files</title>

      <para>The easiest way to create a new API-compliant
      implementation of a stream cipher is to copy and edit the
      reference implementation included in the testing
      framework:</para>

      <screen><prompt>$ </prompt><userinput><![CDATA[cd ./submissions/xyz/]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[mv `echo *; mkdir old` old/]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[cp -r old/ new]]></userinput>
</screen>

      <para>There are a number of issues that should be taken into
      account when editing the source code:</para>

      <itemizedlist>
	<listitem>
	  <formalpara>
	    <title>Portability</title>
	    
	    <para>The optimized code should compile under any ANSI C
	    compiler and run on any platform. This does not mean that
	    compiler- or platform-specific extensions cannot be
	    used. However, if non-standard constructs are used, the
	    code should first check whether the compiler supports
	    them and, if not, provide (possibly non-optimized)
	    alternatives. Here is an example:</para>

	  <programlisting><![CDATA[#if defined(_MSC_VER) && (_MSC_VER >= 1300)

/* optimized code using Microsoft Visual C++ .NET extensions */

#elif defined(__x86_64__)

/* code optimized for the AMD64 architecture */

#else

/* standard C code */

#endif]]></programlisting>
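
	  <para>As a concrete instance of this pattern, the sketch
	  below selects between the Microsoft
	  <function>_rotl</function> intrinsic and a portable fallback
	  for a 32-bit rotation. The macro name
	  <code>ROTL32_SKETCH</code> is hypothetical and chosen only
	  for this example; it assumes that the value fits in 32 bits
	  and that the rotation amount lies between 1 and 31.</para>

	  <programlisting><![CDATA[#if defined(_MSC_VER) && (_MSC_VER >= 1300)

/* Microsoft compilers: use the rotate intrinsic declared in stdlib.h */
#include <stdlib.h>
#define ROTL32_SKETCH(v, n) _rotl((v), (n))

#else

/* standard C fallback; the mask keeps the result within 32 bits even
   when the operand type is wider than 32 bits */
#define ROTL32_SKETCH(v, n) \
  ((((v) << (n)) | ((v) >> (32 - (n)))) & 0xFFFFFFFFUL)

#endif]]></programlisting>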

	  <para>The code should also anticipate possible endianness
	  and data alignment problems when running on non-x86
	  platforms. This involves the following measures:</para>

	  <itemizedlist>
	    <listitem>
	      <para>Use the <function>UXTOY_LITTLE</function> or
	      <function>UXTOY_BIG</function> macros defined in
	      &ecrypt-portable.h; anywhere words are stored into bytes
	      or the other way around. These macros will change the
	      order of the bytes on all platforms where it is
	      required.</para>
	    </listitem>
	    <listitem>
	      <para>Align all memory accesses. Several UNIX machines
	      will generate bus errors when words loaded from or
	      stored into the memory are not aligned on multiples of
	      the word size. Note that the code can safely assume that
	      all byte strings passed to the API are aligned on
	      multiples of the largest word size supported on the
	      machine (e.g., 128 bit on modern Pentium 4
	      processors).</para>
	    </listitem>
	  </itemizedlist>

	  </formalpara>
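
	  <para>Both measures can be combined by assembling words
	  from individual bytes, which makes the result independent of
	  the byte order of the host and of the alignment of the
	  pointer. The sketch below illustrates the idea in plain C.
	  The helper names <function>load32_le</function> and
	  <function>store32_le</function> are hypothetical; within the
	  framework, the corresponding macros from &ecrypt-portable.h;
	  would normally be used instead.</para>

	  <programlisting><![CDATA[/* Hypothetical helpers, not part of the eSTREAM API. */

/* Read a 32-bit little-endian word from p one byte at a time. */
unsigned long load32_le(const unsigned char *p)
{
  return ((unsigned long)p[0])
       | ((unsigned long)p[1] << 8)
       | ((unsigned long)p[2] << 16)
       | ((unsigned long)p[3] << 24);
}

/* Write the low 32 bits of v to p in little-endian byte order. */
void store32_le(unsigned char *p, unsigned long v)
{
  p[0] = (unsigned char)(v & 0xFF);
  p[1] = (unsigned char)((v >> 8) & 0xFF);
  p[2] = (unsigned char)((v >> 16) & 0xFF);
  p[3] = (unsigned char)((v >> 24) & 0xFF);
}]]></programlisting>

	  <para>Because only byte accesses are used, these helpers
	  also work on pointers that are not word aligned, at the cost
	  of some speed on platforms where a direct word access would
	  have been safe.</para>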
	</listitem>

	<listitem>
	  <formalpara>
	    <title>Readability</title>
	    
	    <para>Avoid writing unnecessarily complex code, i.e.:</para>
	    
	  <itemizedlist>
	    <listitem>
	      <para>Only implement the functions strictly required by
	      the API. Code providing additional debugging
	      functionality, interactive user interfaces, etc. should
	      be disabled (using <code>#ifndef ECRYPT_API</code>), or
	      better yet, completely removed.</para>
	    </listitem>
	    <listitem>
	      <para>Remove unused variables and 'dead' code.</para>
	    </listitem>
	    <listitem>
	      <para>Manually unroll loops only if this makes the code
	      significantly faster.</para>
	    </listitem>
	  </itemizedlist>
	</formalpara>
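
	  <para>The first guideline could be applied as sketched
	  below. Both function names are hypothetical, and the sketch
	  assumes, as the guideline implies, that
	  <code>ECRYPT_API</code> is defined whenever the code is built
	  within the testing framework, so that the debugging helper
	  is compiled only in stand-alone builds.</para>

	  <programlisting><![CDATA[#include <stdio.h>

/* Hypothetical routine standing in for code required by the API. */
void xor_block(unsigned char *out, const unsigned char *in,
               const unsigned char *keystream, unsigned int len)
{
  unsigned int i;

  for (i = 0; i < len; i++)
    out[i] = (unsigned char)(in[i] ^ keystream[i]);
}

#ifndef ECRYPT_API

/* Debugging helper: left out when ECRYPT_API is defined. */
void dump_bytes(const char *label, const unsigned char *p,
                unsigned int len)
{
  unsigned int i;

  printf("%s:", label);
  for (i = 0; i < len; i++)
    printf(" %02x", p[i]);
  printf("\n");
}

#endif]]></programlisting>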
	</listitem>

	<listitem>
	  <formalpara>
	    <title>Assembly</title>
	    
	    <para>An efficient C implementation, compiled with a
	    modern compiler and with the proper optimization flags, is
	    often pretty difficult to beat using hand-coded
	    assembly. However, if you feel that your implementation
	    could significantly benefit from assembly, then here are
	    some guidelines:</para>
	    
	  <itemizedlist>
	    <listitem>
	      <para>Always analyze the machine code generated by your
	      C compiler before writing your own assembly.</para>
	    </listitem>
	    <listitem>
	      <para>If the only purpose of the assembly is to take
	      advantage of SIMD instructions, consider replacing it
	      with C intrinsics (the MMX intrinsics defined in <filename
	      class="headerfile">mmintrin.h</filename>, for example,
	      are supported by the GNU, Intel, and Microsoft
	      compilers).</para>
	    </listitem>
	    <listitem>
	      <para>Avoid the use of assembly in non-critical parts of
	      your implementation. The preferred approach is to use
	      only a few short blocks of inline assembly in the inner
	      loops of your algorithm. As before, make sure that the
	      code checks whether the compiler supports the assembly
	      syntax, for example:</para>

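	      <programlisting><![CDATA[#if defined(__GNUC__) && defined(__i386__)
  /* a, b, and c are assumed to be 32-bit unsigned words (u32) */
  {
    u32 t = b; /* scratch copy, so that the input b is left unmodified */

    __asm__
      ("\n	sall	$5, %[a]"
       "\n	shrl	$27, %[t]"
       "\n	orl	%[t], %[a]"
       "\n	addl	%[c], %[a]"
       : [a] "+&r" (a), [t] "+&r" (t) /* both are written before c is read */
       : [c] "g" (c)
       : "cc");                        /* the instructions change the flags */
  }
#else
  a = ((a << 5) | (b >> 27)) + c;
#endif]]></programlisting>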
	    </listitem>
	    <listitem>
	      <para>While eSTREAM does not encourage this, plain
	      assembly implementations can also be submitted, provided
	      that they can be processed by GCC (i.e.,
	      <filename>.s</filename> or <filename>.S</filename>
	      files). Here is an example of a <filename>.S</filename>
	      file:</para>

<programlisting><![CDATA[	.text
	
	.globl	ECRYPT_init
	.type	ECRYPT_init, @function
ECRYPT_init:
	ret
	.size	ECRYPT_init, .-ECRYPT_init

	.globl	ECRYPT_keysetup
	.type	ECRYPT_keysetup, @function
ECRYPT_keysetup:

#if defined(__pentium4__)
	
	/* assembly optimized for Pentium 4 */

#elif defined(__x86_64__)

	/* assembly optimized for AMD64 */

#else
#error architecture is not supported
#endif

	ret
	.size	ECRYPT_keysetup, .-ECRYPT_keysetup

	...]]></programlisting>

	    </listitem>
	  </itemizedlist>
	  </formalpara>
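
	  <para>The suggestion to prefer intrinsics over assembly can
	  be sketched as follows. The helper name
	  <function>xor8</function> is hypothetical, and the sketch
	  assumes that the compiler defines <code>__MMX__</code> when
	  MMX code generation is enabled (as GCC does); compilers
	  which do not define this macro simply use the standard C
	  fallback.</para>

	  <programlisting><![CDATA[#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: XORs 8 keystream bytes into 8 input bytes. */
void xor8(unsigned char *out, const unsigned char *in,
          const unsigned char *keystream)
{
#if defined(__MMX__)
  __m64 a, b;

  memcpy(&a, in, 8);        /* memcpy avoids alignment assumptions */
  memcpy(&b, keystream, 8);
  a = _mm_xor_si64(a, b);   /* 64-bit XOR in a single operation */
  memcpy(out, &a, 8);
  _mm_empty();              /* leave MMX state before any FPU code */
#else
  int i;

  /* standard C code */
  for (i = 0; i < 8; i++)
    out[i] = (unsigned char)(in[i] ^ keystream[i]);
#endif
}]]></programlisting>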
	</listitem>

      </itemizedlist>

    </section>

    <section id="step-3">
      <title>Step 3: Run Tests</title>

      <para>Before submitting an optimized implementation, you should
      make sure that (1) it interacts correctly with the testing
      framework, and (2) it is indeed more efficient than the existing
      code. In order to verify these conditions, create a test
      directory and run the tests as described in <xref
      linkend="tests"/>:</para>

      <screen><prompt>$ </prompt><userinput><![CDATA[mkdir test-results]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[cd test-results/]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[run xyz/new]]></userinput>
<prompt>$ </prompt><userinput><![CDATA[collect]]></userinput>
</screen>

      <para>The scripts should not return any error. If they do, then
      the code is probably not API compliant. The sections below
      describe a number of common problems and explain how to resolve
      them.</para>

      <section>
	<title>The compilation fails</title>

	<para>If the compilation fails, check the error messages in
	<filename>errors_*</filename>. Typical compilation errors
	are:</para>

	<variablelist>
	  <varlistentry>
	    <term><quote><errortext>undefined reference to
            `ECRYPT_init'</errortext></quote></term>
	    <listitem>
	      <para>This message indicates that your code does not
              define the <function>ECRYPT_init</function>
              function. This function should always be defined, even
              if it is left empty.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term><quote><errortext>syntax error before '/'
            token</errortext></quote></term>
	    <listitem>
	      <para>Replace <quote><code>//</code></quote> comments,
	      which are not part of ISO C89, with
	      <quote><code>/* ... */</code></quote>.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term><quote><errortext>syntax error before
	    "u64"</errortext></quote></term>
	    <listitem>
	      <para>On 32-bit platforms, <code>u64</code> variables
	       are defined as <code>unsigned long long</code>. This
	       type exists in the ISO C99 standard, but not in ISO
	       C89. Therefore, if your code uses <code>u64</code>
	       variables, replace the third line of the
	       <filename>Makefile</filename> by:</para>

	      <programlisting><![CDATA[std = -std=c99]]></programlisting>
	    </listitem>
	  </varlistentry>
	</variablelist>
      </section>

      <section>
	<title>The execution fails when trying to generate test
	vectors</title>

	<para>If the script refuses to generate test vectors, this
        typically indicates that the code did not pass the API
        compliance tests described in <xref
        linkend="compliance"/>. The file <filename>errors_*</filename>
        reports which tests failed. In order to correct these
        problems, make sure that:</para>

	<itemizedlist>
	  <listitem>
	    <para>all variables that need to be transferred from one
            function call to another are stored in the
            <code>ECRYPT_ctx</code> structure, and not in static
            variables or dynamically allocated memory;</para>
	  </listitem>
	  <listitem>
	    <para>all variables are initialized;</para>
	  </listitem>
	  <listitem>
	    <para>the function <function>ECRYPT_ivsetup</function>
	    reinitializes all variables whose values were changed
	    during the encryption with previous IVs.</para>
	  </listitem>
	</itemizedlist>
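
	<para>The three requirements above are illustrated by the toy
	generator sketched below. It is not a real cipher and offers
	no security; all names are hypothetical stand-ins for
	<code>ECRYPT_ctx</code>, <function>ECRYPT_keysetup</function>,
	and <function>ECRYPT_ivsetup</function>, and the simplified
	signatures do not match those of the actual API.</para>

	<programlisting><![CDATA[/* Toy generator, NOT a real cipher. All state lives in the context
   structure; nothing is kept in static variables. */
typedef struct {
  unsigned char key;      /* set once by toy_keysetup */
  unsigned char iv;       /* set by toy_ivsetup */
  unsigned long counter;  /* changes while keystream is produced */
} toy_ctx;

void toy_keysetup(toy_ctx *ctx, unsigned char key)
{
  /* initialize every field, so that no variable is left undefined */
  ctx->key = key;
  ctx->iv = 0;
  ctx->counter = 0;
}

void toy_ivsetup(toy_ctx *ctx, unsigned char iv)
{
  ctx->iv = iv;
  ctx->counter = 0; /* reset everything a previous IV may have changed */
}

unsigned char toy_keystream_byte(toy_ctx *ctx)
{
  unsigned char b = (unsigned char)((ctx->key ^ ctx->iv) + ctx->counter);

  ctx->counter++;
  return b;
}]]></programlisting>

	<para>With this structure, calling
	<function>toy_ivsetup</function> again with the same IV
	restarts the same keystream, and two independent contexts set
	up with the same key and IV produce identical output.</para>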

	<para>Bus errors caused by misaligned data are another
	common reason for the execution to fail on 64-bit machines.
	When loading words from or storing words into memory, always
	make sure that they are aligned on multiples of the word
	size (see also <xref linkend="step-2"/>).</para>

      </section>

      <section>
	<title>The test vectors do not match</title>

	<para>If the test vectors generated by the scripts differ from
        the ones included in the testing framework, this might have
        two causes:</para>

	<itemizedlist>
	  <listitem>
	    <para>There is a bug in the optimized
            implementation.</para>

	    <para>Endianness issues are a common problem on UNIX
	    platforms. As explained in <xref linkend="step-2"/>, always
	    use the <function>UXTOY_LITTLE</function> or
	    <function>UXTOY_BIG</function> macros when translating
	    bytes into words or vice versa.</para>
	  </listitem>
	  <listitem>
	    <para>There was a bug in the original reference code. In
	    this case new test vectors need to be generated. This can
	    be done by issuing the following command in the source
	    directory:</para>

	    <screen><prompt>$ </prompt><userinput><![CDATA[make vectors]]></userinput></screen>

	    <para>If you do not have GCC installed, you will also need
	    to specify which configuration file (see <xref
	    linkend="configuring"/>) to use, for example:
	    <code><![CDATA[make vectors conf=cl_default_default]]></code>.</para>

	    <para>The command above will generate a file called
	    <filename>unverified.test-vectors</filename>, which can be
	    renamed to <filename>verified.test-vectors</filename> once
	    the correctness of the test vectors has been
	    verified:</para>

	    <screen><prompt>$ </prompt><userinput><![CDATA[mv unverified.test-vectors verified.test-vectors]]></userinput></screen>
	  </listitem>

	</itemizedlist>
      </section>

    </section>

    <section id="step-4">
      <title>Step 4: Submit Code</title>

      <para>Optimized implementations which pass all tests described
      above should be mailed to
      <email>estreamtesting@ecrypt.eu.org</email>. The mail should
      contain the following information:</para>

      <orderedlist>
	<listitem>
	  <para>A <filename>.tar.gz</filename> or
	  <filename>.zip</filename> attachment containing the following
	  files (and only these files):</para>

	  <orderedlist>
	    <listitem>
	      <para>an API compliant header file (i.e., the <filename
	      class="headerfile">ecrypt-sync.h</filename> or <filename
	      class="headerfile">ecrypt-sync-ae.h</filename> file);</para>
	    </listitem>
	    <listitem>
	      <para>the <filename>.c</filename> file (and
	      <filename>.h</filename> files, if any) implementing the
	      primitive;</para>
	    </listitem>
	    <listitem>
	      <para>a <filename>Makefile</filename> (this file should
	      normally not have been modified);</para>
	    </listitem>
	    <listitem>
	      <para>a file called
	      <filename>verified.test-vectors</filename> containing
	      the correct test vectors (only if the existing test
	      vectors happened to be incorrect).</para>
	    </listitem>
	  </orderedlist>

	</listitem>
	<listitem>
	  <para>A confirmation that the test vectors included in the
	  testing framework have been verified.</para>
	</listitem>
	<listitem>
	  <para>A note stating whether or not the implementation
	  should replace the existing reference code.</para>
	</listitem>
      </orderedlist>

    </section>

  </section>

  <!--
  ====================================================================
  -->

  <section id="results">
    <title>Latest Performance Figures</title>

    <para>The table below links to the most recent reports produced by
    the eSTREAM testing framework. These reports will regularly be
    updated as new implementations are submitted. It is important to
    emphasize that the current results are very preliminary: the
    implementations currently included in the framework only serve as
    reference code, and are not necessarily optimized.</para>

    <table>
      <title>Preliminary performance figures</title>
      <tgroup cols="4">
	<thead>
	  <row>
	    <entry>CPU</entry>	
	    <entry>Clock</entry>
	    <entry>Model</entry>
	    <entry>Latest reports</entry>
	  </row>
	</thead>
	<tbody>
	  <row>
	    <entry>Intel Pentium M</entry> <entry
	    align="center">1.70GHz</entry> <entry
	    align="center">6/9/5</entry> <entry align="center">[<ulink
	    url="&perf;/pentium-m/">revision 115</ulink>]</entry>
	  </row>
	  <row>
	    <entry>Intel Pentium 4</entry> <entry
	    align="center">2.40GHz</entry> <entry
	    align="center">15/2/9</entry> <entry
	    align="center">[<ulink url="&perf;/pentium-4/">revision
	    115</ulink>]</entry>
	  </row>
	  <row>
	    <entry>AMD Athlon 64 3000+</entry> <entry
	    align="center">1.80GHz</entry> <entry
	    align="center">15/47/0</entry> <entry
	    align="center">[<ulink url="&perf;/amd64/">revision
	    115</ulink>]</entry>
	  </row>
	  <row>
	    <entry>Alpha EV5.6</entry> <entry
	    align="center">400MHz</entry> <entry
	    align="center">21164A</entry> <entry
	    align="center">[<ulink url="&perf;/alpha/">revision
	    115</ulink>]</entry>
	  </row>
	  <row>
	    <entry>HP 9000/785</entry> <entry
	    align="center">875MHz</entry> <entry
	    align="center">J6750</entry> <entry align="center">[<ulink
	    url="&perf;/hppa/">revision 115</ulink>]</entry>
	  </row>
	  <row>
	    <entry>UltraSPARC-III</entry> <entry
	    align="center">750MHz</entry> <entry
	    align="center">V9</entry> <entry align="center">[<ulink
	    url="&perf;/sparc/">revision 115</ulink>]</entry>
	  </row>
	</tbody>
      </tgroup>
    </table>

  </section>

  <!-- ==================================================================== -->

  <section id="faq">
    <title>Frequently Asked Questions</title>

    <para>Send your questions to
    <email>estreamtesting@ecrypt.eu.org</email>. Frequently asked
    questions will be added to this section.</para>

    <!--
    <qandaset>
      <qandaentry>
	<question>
	  <para>Question One</para>
	</question>
	<answer>
	  <para>Answer One</para>
	</answer>
      </qandaentry>
    </qandaset>
    -->
  </section>

  <!-- ==================================================================== -->

  <section id="moreinfo">
    <title>Further Information</title>
    
    <para>For further information about the eSTREAM project, please
    visit the eSTREAM <ulink
    url="http://www.ecrypt.eu.org/stream/">webpage</ulink> and the
    <ulink url="http://www.ecrypt.eu.org/stream/phorum/">discussion
    forum</ulink>.</para>
  </section>

  <!-- ==================================================================== -->

  <bibliography>
    <title>References</title>

    <biblioentry id="JTC-003">
      <corpauthor>Agilent Technologies</corpauthor>
      <title>JTC 003 Mixed Packet Size Throughput</title>
      <bibliosource><ulink url="http://advanced.comms.agilent.com/n2x/docs/journal/JTC_003.html"/></bibliosource>
    </biblioentry>

  </bibliography>

  <!-- ==================================================================== -->

</article>

