-*- Mode: Text -*-
;;;>
;;;> *****************************************************************************************
;;;> ** (c) Copyright 1993-1986 Symbolics, Inc.  All rights reserved.
;;;> ** Portions of font library Copyright (c) 1984 Bitstream, Inc.  All Rights Reserved.
;;;>
;;;>    The software, data, and information contained herein are proprietary 
;;;> to, and comprise valuable trade secrets of, Symbolics, Inc., which intends 
;;;> to keep such software, data, and information confidential and to preserve 
;;;> them as trade secrets.  They are given in confidence by Symbolics pursuant 
;;;> to a written license agreement, and may be used, copied, transmitted, and 
;;;> stored only in accordance with the terms of such license.
;;;> 
;;;> Symbolics, Symbolics 3600, Symbolics 3670 (R), Symbolics 3675 (R), Symbolics
;;;> 3630, Symbolics 3640, Symbolics 3645 (R), Symbolics 3650 (R), Symbolics 3653,
;;;> Symbolics 3620 (R), Symbolics 3610 (R), Symbolics Common Lisp (R),
;;;> Symbolics-Lisp (R), Zetalisp (R), Genera (R), Wheels (R), Dynamic Windows (R),
;;;> SmartStore (R), Semanticue (R), Frame-Up (R), Firewall (R), Document Examiner (R),
;;;> Delivery Document Examiner, "Your Next Step in Computing" (R), Ivory, MacIvory,
;;;> MacIvory model 1, MacIvory model 2, MacIvory model 3, XL400, XL1200, XL1201,
;;;> Symbolics UX400S, Symbolics UX1200S, NXP1000, Symbolics C, Symbolics Pascal (R),
;;;> Symbolics Prolog, Symbolics Fortran (R), CLOE (R), CLOE Application Generator,
;;;> CLOE Developer, CLOE Runtime, Common Lisp Developer, Symbolics Concordia, Joshua,
;;;> Statice (R), and Minima are trademarks of Symbolics, Inc.
;;;> 
;;;> RESTRICTED RIGHTS LEGEND
;;;>    Use, duplication, and disclosure by the Government are subject to restrictions 
;;;> as set forth in subdivision (c)(1)(ii) of the Rights in Technical Data and Computer 
;;;> Software Clause at DFAR 52.227-7013.
;;;> 
;;;>      Symbolics, Inc.
;;;>      6 Concord Farms
;;;>      555 Virginia Road
;;;>      Concord, Massachusetts 01742-2727
;;;>      United States of America
;;;>      508-287-1000
;;;>
;;;> *****************************************************************************************
;;;>


This file is an informal description of the internals of the Lisp
Machine TCP implementation.  Where the description in this file and the
code differ, the code takes precedence.

.What

As of January 8, 1984 (and probably for some time to come) the Lisp
Machine implementation of TCP implements everything in the specification
except (** for complete non-implementation, * for items that should not
affect functionality):
 ** Security/precedence/compartment.  These are never set on output and
    are ignored on input.  (Actually, the interface to IP doesn't have
    provision for them at this time.)
 *  There is no way for the user to set the urgent pointer on output, so
    it is effectively unimplemented.  All the code except the user
    interface exists.
 *  There is no way for the user to read the input urgent pointer. Also,
    no special action is taken when the urgent flag is set on an
    incoming segment.  The code exists to maintain the urgent pointer.
 *  No special action is taken when an incoming segment contains a push
    flag.  Timely acknowledgements are always sent, so this should never
    be a problem.  There are probably some paths where the push flag may
    get dropped from a segment, in case a user interface ever were
    written.
 *  Adaptive retransmission is not implemented.  The major reason is
    that the algorithm described in the specification is meta-stable.
    If the timer is reset every time packets are retransmitted
    then it is possible for the retransmit interval to shrink to a
    minimum and never grow, causing highly excessive retransmissions.
    If the timer is not reset when packets are retransmitted then it is
    possible to have the retransmit interval grow to a maximum and cause
    unnecessary delays.  I don't expect many people to understand this;
    it took me several days to realize what was happening.

(Somewhat) optional things that are implemented, in case you were
wondering:
 *  Push flag does get set when a user does a :FORCE-OUTPUT to the
    stream.
 *  Full Silly Window avoidance, both on the input and output sides.
 *  Probing the zero window.
 *  Variable MSS (foreign host's lower limit takes precedence, of
    course).


If TCP finds an internal inconsistency, an error is signalled with a
brief description of the inconsistency.  If the problem is severe, TCP
disables itself and calls ERROR.  The rest of the network system
(including IP) is unaffected.  The user will have to decide whether the
circumstances are severe enough to reset the entire network system.  If
the problem is not severe and TCP can do something reasonable and
moderately harmless, FSIGNAL is called allowing the user to continue.
Please report any protocol implementation bugs, both those that TCP
finds and those that you find yourself.


.File organization

There are currently 5 files which contain code to implement TCP.  (IP is
a separate system which has its own files.)  These files are:

 * TCP-DEFS -- This file contains the definitions of data
	structures, accessors for the structures, functions for doing
	trivial computation on the structures, constants associated with
	the structures, helping functions (e.g., sequence number
	helpers), global variables for keeping track of things or for
	tuning things, debugging support, metering support, interaction
	with IP (which handles most of the interaction with the service
	mechanism).

 * TCP-ERROR -- This file contains the error conditions that TCP can
	signal and the necessary support code.

 * TCP -- This is the actual "NCP."  It interfaces with IP, handles
	packets, TCBs, retransmission, etc.  It does not interact with
	users; that is what TCP-USER does.

 * TCP-USER -- This file contains the user interface to TCP. The
	user/application always deals with a stream; never a TCB.  Most
	messages to streams are handled, as well as most of the messages
	expected of network streams.  In addition, there are a few
	others to support some specific applications (such as TCP-FTP).

 * TCP-DEBUG -- This file contains debugging support.  This includes
	interacting with IP's debugging, interacting with PEEK, packet
	header recording and displaying, and full dumping of TCP state
	(dump-tcp-guts).


.Segments

Segments are art-8b packets (see the document Interfacing to the Network
System) and all slots start with "SEG-".  The definition of the header
of a segment is not completely straightforward.  This is because TCP is
a big-ender protocol (most significant portion of a word is transmitted
first) and the Lisp Machine is a little-ender machine (least significant
portion of a word is transmitted first).  We get around this in the
following way.

Each byte of the header is given a name.  If a TCP value spans more than
one byte (e.g., a port or sequence number) then the name is suffixed
with something appropriate (see the code).  Then, macros are generated
(with define-structure-substs) which allow accessing and setting the
multi-byte TCP values.
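
As a rough illustration of the byte-at-a-time technique, here is a
sketch in Python.  It is not the code that define-structure-substs
generates.  The helper names read_field and write_field and the offset
constants are invented for this example; the offsets themselves follow
the standard TCP header layout.

```python
# Hypothetical names; offsets follow the standard TCP header layout.
SOURCE_PORT_OFFSET = 0   # source port occupies header bytes 0..1
SEQ_NUM_OFFSET = 4       # sequence number occupies header bytes 4..7


def read_field(header, offset, nbytes):
    """Combine nbytes bytes, most significant first, into one integer."""
    value = 0
    for byte in header[offset:offset + nbytes]:
        value = (value << 8) | byte
    return value


def write_field(header, offset, nbytes, value):
    """Store value into nbytes bytes, most significant byte first."""
    for i in range(nbytes):
        shift = 8 * (nbytes - 1 - i)
        header[offset + i] = (value >> shift) & 0xFF


# An art-8b style packet: one array element per byte.
header = bytearray(20)
write_field(header, SOURCE_PORT_OFFSET, 2, 23)
write_field(header, SEQ_NUM_OFFSET, 4, 0x01020304)
sequence_number = read_field(header, SEQ_NUM_OFFSET, 4)
```

Because each byte is addressed individually, the result does not depend
on the byte order of the machine running the code.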

In addition to the TCP segment header, other information about the
segment is kept in the array's leader.  Some of this information is for
interaction with IP, some is computed, and some is bookkeeping.  Here is
a slightly more detailed description of the slots than exists in the
code.

    SEG-LINK -- If this segment is on some list/queue/chain
	(e.g., an input or output queue), then this is either NIL
	(meaning the end of the list) or the next segment of the list.
	(This helps avoid consing to keep track of lists of segments.
	This is OK, since a packet can be on at most one queue in the
	Lisp Machine network system.)  If it is not on a list, this slot
	is T.  

	This slot is carefully maintained (perhaps even in a paranoid
	manner).  When a segment is output, this slot determines whether
	the transmitter is to free the packet to the free pool or must
	not free it (there are no other options).  If this slot is T,
	the packet is not on any list and must be freed on transmit.  If
	it is not T, the packet is on a transmit queue and may need to
	be retransmitted, and therefore may not be freed by the network
	system (it will later be freed when the data bytes are ACKed or
	the TCB is reset).

    SEG-SOURCE-ADDRESS, SEG-DESTINATION-ADDRESS -- These are the
        Internet addresses (32 bit fixnums on the 3600, arrays on the
        LM-2) of the sender and intended destination of the TCP
        segment, respectively.  These are given to TCP when IP sends
        TCP an input packet, and are given to IP when TCP transmits a
        packet.  Actually, the destination of an input packet is
        always one of the machine's Internet addresses since IP is the
        routing layer.

    SEG-START -- This is the byte offset to the start of the data in the
        segment.  This is the byte offset, not the number of 32 bit
        quantities (that is kept in SEG-DATA-OFFSET in the TCP header).
        Each is computed from the other.  This exists to save a few
        cycles.

    SEG-LENGTH -- This is what the spec calls SEG.LEN.  It is the number
        of data bytes in the segment, which does not count the header
        but does count the SYN and FIN control bits.  This is the number
        of bytes occupying sequence space.

    SEG-BOUND -- This is the highest array index (including
        header) that is allowed to be used (exclusive, not inclusive).
        For a received packet, this is the index of the end of the user
        data portion (without control) of the segment.  On output, this
        is the limit to the amount of data that can be filled into this
        segment (and is controlled by MSS).

    SEG-TIME-TRANSMITTED, SEG-TIMES-TRANSMITTED -- These should be
        rather obvious.  seg-time-transmitted is updated for
        retransmission as well.  The time is the value returned by the
        function TIME, which is a fixnum that ticks 60 times per
        second.  Currently these are not used very much, but may be used
        more if/when adaptive retransmission is implemented.

    SEG-ALLOCATED -- This is largely for debugging.  When a segment
        enters the realm of TCP, this slot is set to T.  When it is
        known to be leaving, it is set to NIL.  This is an aid to catch
        bugs that are not returning segments and thereby stagnating the
        entire network system.  See the discussion of the packet buffer
        pool in the document Interfacing to the Network System.

    SEG-OUTPUT-STATE -- This is a good trick.  When the user
	requests an output buffer, TCP first checks the last segment on
	the TCB's output queue.  If there is still more room (as
	indicated by SEG-LENGTH and SEG-BOUND), then the segment is kept
	from being freed when an ACK is received.  The user fills in
	more bytes to the segment, and then tries to send the segment.
	At this time, SEG-LENGTH is updated and the segment is
	transmitted (if possible; if not, retransmission will take care
	of it).  User TELNET (SUPDUP, etc.) probably makes the most use
	of this.  Instead of allocating many small packets while the
	user types continuous text, it is often the case that the same
	packet is getting new data appended to it.  This takes full
	advantage of the overlapping sequence number feature of TCP.

	To implement this, this slot can take on four values:
	    NIL -- not for output.
	    :TCP-ONLY -- the segment is only on the transmit queue of
		the TCB.  When an ACK comes in that is above the last
		sequence number of the segment, the segment may be
		removed from the output queue and returned to the free
		pool.
	    :USER-ONLY -- The user's stream is the sole possessor of the
		packet.
	    :TCP-AND-USER -- This is the fun state.  Here, the segment
		has been sent for transmission at least once, but the
		user has asked for (and received) this part of the
		segment as the output buffer before an ACK has come in
		from the foreign host.  When the user sends the output
		buffer, the segment fields are updated (usually just the
		length) and the state is changed to :TCP-ONLY.  When an
		ACK comes in for the previously sent sequence numbers
		before the user sends the segment, instead of returning
		the segment to the free pool the state is simply changed
		to :USER-ONLY.

    SEG-SPARE-1 -- This is a spare slot in case we need to implement
	something else but don't want to (or can't) recompile all of
	TCP.
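
The four values of SEG-OUTPUT-STATE described above amount to a small
state machine.  The following Python sketch restates the transitions;
the function names are invented here.  Two details are assumptions not
stated in the text: that sending a :USER-ONLY segment moves it to
:TCP-ONLY, and that a freed segment goes back to NIL.

```python
NOT_FOR_OUTPUT = None            # NIL
TCP_ONLY = "tcp-only"
USER_ONLY = "user-only"
TCP_AND_USER = "tcp-and-user"


def user_takes_buffer(state):
    """The user is handed the unused tail of an already queued segment."""
    if state == TCP_ONLY:
        return TCP_AND_USER
    raise ValueError("segment cannot be shared from state %r" % (state,))


def user_sends_buffer(state):
    """The user finishes filling the buffer and sends it."""
    # The USER_ONLY case is an assumption; the text only describes
    # the transition out of TCP_AND_USER.
    if state in (USER_ONLY, TCP_AND_USER):
        return TCP_ONLY
    raise ValueError("user does not hold a segment in state %r" % (state,))


def ack_covers_segment(state):
    """An ACK arrives above every sequence number sent in the segment.

    Returns (new_state, return_to_free_pool).
    """
    if state == TCP_ONLY:
        # Assumption: a freed segment is marked as not for output.
        return NOT_FOR_OUTPUT, True
    if state == TCP_AND_USER:
        # The user still holds the buffer, so the segment is kept.
        return USER_ONLY, False
    raise ValueError("no ACK is expected in state %r" % (state,))
```

The point of the :TCP-AND-USER state is visible in ack_covers_segment:
the same event either frees the segment or merely hands sole ownership
to the user.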

Trick: many places in the code use the fact that seg-syn and seg-fin
return numbers (0 or 1) which just so happen to be the number of bytes
occupied in sequence space by the control flags.  E.g., note the subst
    (defsubst seg-length-data-only (seg)
      (- (seg-length seg) (seg-syn seg) (seg-fin seg)))
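
The same arithmetic in a Python sketch, with the flags passed as the
integers 0 or 1 (the function signatures are invented for this
example):

```python
def seg_length(data_bytes, syn, fin):
    """Sequence space occupied: data bytes plus one each for SYN and FIN."""
    return data_bytes + syn + fin


def seg_length_data_only(length, syn, fin):
    """Counterpart of the subst above: remove the control flags again."""
    return length - syn - fin


# A bare SYN carries no data but still occupies one sequence number.
syn_only = seg_length(0, syn=1, fin=0)
# 100 data bytes followed by a FIN occupy 101 sequence numbers.
data_and_fin = seg_length(100, syn=0, fin=1)
```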


.Sequence numbers trick (quick note (Lisp sales pitch?))

One of the great features of Lisp is the macro facility, which can often
make code clearer.  Comparing TCP sequence numbers can serve as an
example.  TCP sequence numbers are compared using modulo 2**32
arithmetic.  The Lisp Machine deals with numbers generically and
(currently) has no instructions for adding, subtracting or comparing
numbers mod 2**32.  (Indeed, 32 bit numbers on the LM-2 are bignums!)
Therefore, macros exist for adding, subtracting, and comparing TCP
sequence numbers.  Most simple macro facilities can handle this, which
is usually just simple substitution.  But what about something like
	SND.UNA < SEG.ACK =< SND.NXT
??  In the TCP implementation, this reads as
	(SEQ-NUM-COMPARE SND.UNA < SEG.ACK =< SND.NXT)
which expands into
	(AND (SEQ-NUM-< SND.UNA SEG.ACK)
	     (SEQ-NUM-=< SEG.ACK SND.NXT))
which would further expand into machine dependent comparison functions.
The SEQ-NUM-COMPARE macro actually looks at how many forms there are
(some odd number; the odd numbered items are sequence numbers and the
even numbered items are relational operators) and expands into the
appropriate code to model it.
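
For readers without a Lisp Machine at hand, the following Python sketch
mimics the idea at run time (the real macro does the work at compile
time).  The names seq_lt, seq_le and seq_num_compare are invented here,
and the half-the-sequence-space rule used to decide which number comes
first is a common convention assumed for this example, not something
taken from the TCP code.

```python
MOD = 1 << 32    # sequence numbers are taken modulo 2**32
HALF = 1 << 31   # assumed convention: "ahead" means less than half way around


def seq_lt(a, b):
    """True if a comes strictly before b in sequence space."""
    return 0 < (b - a) % MOD < HALF


def seq_le(a, b):
    """True if a comes before b, or equals it, in sequence space."""
    return (b - a) % MOD < HALF


RELATIONS = {"<": seq_lt, "=<": seq_le}


def seq_num_compare(*forms):
    """Evaluate a chain such as (x, "<", y, "=<", z) pair by pair."""
    if len(forms) < 3 or len(forms) % 2 == 0:
        raise ValueError("expected an odd number of forms, at least three")
    for i in range(1, len(forms), 2):
        left, relation, right = forms[i - 1], forms[i], forms[i + 1]
        if not RELATIONS[relation](left, right):
            return False
    return True


# The chain still works when the numbers wrap past 2**32.
wrapped_ok = seq_num_compare(0xFFFFFFF0, "<", 5, "=<", 5)
out_of_order = seq_num_compare(5, "<", 0xFFFFFFF0, "=<", 10)
```

Here wrapped_ok is true because 5 lies a short distance ahead of
0xFFFFFFF0 once the numbers wrap, while out_of_order is false.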

Manipulating the multi-byte values in the TCP header could be another
example.


.TCBs

A Lisp Machine TCP connection is in two pieces.  One is the so called
"Transmission Control Block" (TCB).  This is the object that the
internal network control code looks at.  The other piece is a stream.
The user always sees the stream.  The two are separate to impose a
modularity boundary between the user (stream) and the system (tcb).

TCBs are implemented as arrays and are declared with DEFSTRUCT.  They
are reusable so some care must be taken to make sure a TCB is not used
by two people.  One example is that errors must copy out information
from a TCB since the TCB may be reused by the time the error is
reported.  Each slot of a TCB begins with "TCB-".

There are some interesting divergences from the specification on
maintaining a TCB, often for simplicity and clarity.  (Postel may adopt
some of these ideas; he seemed to express some interest at one time.)
One departure is not using computations such as RCV.NXT+RCV.WND but
instead maintaining what would be called RCV.LIM.  RCV.LIM is always an
exclusive number and other computations from the specification are
converted to exclusive arithmetic.  For example, the segment-in-window
computation in the specification reads as
	   RCV.NXT =< SEG.SEQ            < RCV.NXT+RCV.WND
	or RCV.NXT =< SEG.SEQ+SEG.LEN-1  < RCV.NXT+RCV.WND
but the Lisp Machine implements this as
	   RCV.NXT =< SEG.SEQ  < RCV.LIM
	or RCV.NXT =< SEG.LIM =< RCV.LIM

Another advantage of this is that RCV.WND does not need to be
laboriously maintained.  Instead, when a segment is sent, SEG.WND is
simply RCV.LIM-RCV.NXT.  Using limits also removes the SND.WL1 and
SND.WL2 tcb variables mentioned in the specification.  Silly window
avoidance is also made much simpler with limits.
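
A Python sketch of the limit-based window test and of deriving SEG.WND
from the limit follows.  The function names are invented, SEG.LIM is
taken to mean SEG.SEQ+SEG.LEN, and the modulo 2**32 comparison helpers
use an assumed half-the-sequence-space convention.  The sketch follows
the two tests exactly as written above; a literal translation of the
specification's second line would use a strict < between RCV.NXT and
SEG.LIM.

```python
MOD = 1 << 32
HALF = 1 << 31


def seq_lt(a, b):
    """a strictly before b, modulo 2**32 (assumed half-space convention)."""
    return 0 < (b - a) % MOD < HALF


def seq_le(a, b):
    """a before or equal to b, modulo 2**32."""
    return (b - a) % MOD < HALF


def segment_in_window(rcv_nxt, rcv_lim, seg_seq, seg_lim):
    """Limit-based acceptance test; both limits are exclusive."""
    starts_inside = seq_le(rcv_nxt, seg_seq) and seq_lt(seg_seq, rcv_lim)
    ends_inside = seq_le(rcv_nxt, seg_lim) and seq_le(seg_lim, rcv_lim)
    return starts_inside or ends_inside


def window_to_advertise(rcv_nxt, rcv_lim):
    """SEG.WND for an outgoing segment is simply RCV.LIM - RCV.NXT."""
    return (rcv_lim - rcv_nxt) % MOD


inside = segment_in_window(1000, 2000, seg_seq=1500, seg_lim=1600)
overlaps_start = segment_in_window(1000, 2000, seg_seq=900, seg_lim=1100)
beyond = segment_in_window(1000, 2000, seg_seq=2000, seg_lim=2100)
window = window_to_advertise(0xFFFFFF00, 0x00000100)
```

No RCV.WND variable appears anywhere; the window is computed only at
the moment a segment is sent.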

Details of the slots of a TCB:

    TCB-STATE, TCB-SUBSTATE -- TCB-STATE is a keyword which
	describes the major state of the connection and corresponds to
	the states in the specification, namely one of :CLOSED, :LISTEN,
	:SYN-SENT, :SYN-RECEIVED, :ESTABLISHED, :CLOSE-WAIT,
	:LAST-ACKED, :FIN-WAIT-1, :FIN-WAIT-2, :CLOSING and :TIME-WAIT.
	Sometimes the major state is not quite descriptive enough for
	the Lisp Machine, so a substate is needed.  The currently
	implemented substates are
	    For :CLOSED --
		    :INACTIVE
		    :RESET
		    :ABORTED
		    :TIMEOUT
		    :IMPLEMENTATION-ERROR
		    :SYN-OUT-OF-WINDOW
		    :DATA-WHILE-CLOSING
	    For :LISTEN --
		    :PASSIVE
		    :DONT-AUTO-SYN
	    For :ESTABLISHED, :SYN-SENT, :SYN-RECEIVED, :CLOSE-WAIT --
		    :PASSIVE
		    :ACTIVE
		    :CLOSING

	TCB-PROTOCOL -- This is the TCP-PROTOCOL instance that
	    this TCB uses to communicate with IP.
	TCB-NETWORK -- This is the IP-PROTOCOL instance that is
	    the transport layer for packets.  This is NOT the
	    INTERNET-NETWORK instance used for addresses.
	TCB-ROUTE -- This is a structure used by the IP-PROTOCOLs
	    for efficient routing of packets.

	TCB-SERVER-DESCRIPTION -- If this TCB is the result of
	    server creation and the service is declared to be
	    noteworthy, then this is set to a structure returned
	    by NETI:NOTE-SERVER-ESTABLISHED (see Interfacing to
	    the Network System).

	TCB-2MSL-TIMER -- TCP goes to great lengths to make
	    sure connections close synchronously and to make it
	    nearly impossible to reuse an old connection.  Part
	    of this is to have a :TIME-WAIT state which is a
	    placeholder for the connection for 2 Maximum Segment
	    Lifetimes (2MSL).  When a TCB enters the :TIME-WAIT
	    state this slot is set to the current time and the
	    background removes it after 2MSLs.

	TCB-LOCAL-ADDRESS -- is the 32 bit Internet address of
	    the local side of the connection.  On the 3600 this
	    is kept in a fixnum; on the LM-2 an array.
	TCB-LOCAL-PORT -- is the port number of the local side of
	    the connection.
	TCB-LOCAL-WINDOW-SIZE -- is the number of bytes of
	    buffering for receiving segments.  This is not always
	    RCV.WND.  When the receive limit (RCV.LIM) is updated
	    (with an eye out to avoid silly window) RCV.LIM is
	    set to RCV.NXT + LOCAL-WINDOW-SIZE.

	TCB-FOREIGN-HOST -- is the host object of the machine
	    owning the foreign end of the connection.  This may
	    be NIL if it has not been computed yet, since it is
	    seldom needed for the operation of TCP.
	TCB-FOREIGN-ADDRESS -- is the 32 bit Internet address of
	    the machine owning the foreign end of the connection.
	TCB-FOREIGN-PORT -- is the port number of the foreign
	    side of the connection.
	TCB-MAX-SEG-SIZE -- is the maximum number of bytes the
	    Lisp Machine may send in one packet to the other end
	    of the connection.  This is actually the MIN of what
	    the other end declared and what the Lisp Machine is
	    comfortable sending.

	TCB-READ-SEGS, TCB-READ-SEGS-LAST, TCB-READ-SEGS-LENGTH -- These
	    implement the in-order input queue.  TCB-READ-SEGS is either
	    NIL or a segment containing the next expected byte(s);
	    successive segments are found by chaining through SEG-LINK.
	    TCB-READ-SEGS-LAST is the last segment on the chain and is used
	    for quick insertion of a new segment to the end of the chain.
	    TCB-READ-SEGS-LENGTH is simply the length of the chain.
	TCB-RECEIVED-SEGS -- is the chain of out of order segments.  This
	    chain is kept sorted to make sequence number testing, removal,
	    splicing and end-insertion easier.
	TCB-INITIAL-RECEIVE-SEQ -- (called IRS in the specification) is the
	    sequence number of the foreign host's SYN.
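
The in-order queue and the SEG-LINK convention (T when a segment is on
no list, NIL at the end of a chain) can be sketched in Python as
follows.  The class and function names are invented for this example,
and Python's True and None stand in for T and NIL.

```python
class Segment:
    def __init__(self, name):
        self.name = name
        self.link = True            # T: not on any list


class Tcb:
    def __init__(self):
        self.read_segs = None       # head of the chain, or NIL
        self.read_segs_last = None  # tail, for quick insertion at the end
        self.read_segs_length = 0


def enqueue_read_seg(tcb, seg):
    """Append seg to the in-order queue without allocating list cells."""
    if seg.link is not True:
        raise ValueError("segment is already on a list")
    seg.link = None                 # NIL: seg is now the end of the chain
    if tcb.read_segs is None:
        tcb.read_segs = seg
    else:
        tcb.read_segs_last.link = seg
    tcb.read_segs_last = seg
    tcb.read_segs_length += 1


def dequeue_read_seg(tcb):
    """Remove and return the first segment, or None if the queue is empty."""
    seg = tcb.read_segs
    if seg is None:
        return None
    tcb.read_segs = seg.link
    if tcb.read_segs is None:
        tcb.read_segs_last = None
    seg.link = True                 # back to T: on no list
    tcb.read_segs_length -= 1
    return seg
```

Keeping the link inside the segment is what lets the check in
enqueue_read_seg catch a segment that is about to be put on two queues
at once.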

	TCB-SEQ-NUM-READ -- is the sequence number that has already been
	    delivered to the user.
	TCB-SEQ-NUM-RECEIVED -- This is what the specification calls
	    RCV.NXT.  It is the next sequence number the connection is
	    expecting.
	TCB-SEQ-NUM-ACKED -- is the highest sequence number for which
	    the Lisp Machine has sent an ACK.  This usually has the same
	    value as TCB-SEQ-NUM-RECEIVED; if it doesn't, an ACK should be
	    sent in a timely manner.
	TCB-SEQ-NUM-LIMIT -- is the highest sequence number the other side
	    of the connection is allowed to send.  RCV.WND is
	    TCB-SEQ-NUM-LIMIT - TCB-SEQ-NUM-RECEIVED, but keeping it as a
	    limit has several advantages.
	TCB-ADVANCE-WINDOW-THRESHOLD -- This implements the receive part of
	    silly window avoidance.  TCB-SEQ-NUM-LIMIT is not updated when
	    TCB-SEQ-NUM-RECEIVED is updated, as that would imply infinite
	    buffer space since the user has not necessarily read out any of
	    the data.  Nor is TCB-SEQ-NUM-LIMIT updated every time
	    TCB-SEQ-NUM-READ is updated, since that could lead to silly
	    window.  Instead, when TCB-SEQ-NUM-READ advances beyond
	    TCB-ADVANCE-WINDOW-THRESHOLD, then enough bytes have been read
	    out that it is reasonable to open the window.
	    TCB-SEQ-NUM-LIMIT is set to TCB-SEQ-NUM-READ +
	    TCB-LOCAL-WINDOW-SIZE and TCB-ADVANCE-WINDOW-THRESHOLD is set
	    to TCB-SEQ-NUM-READ + (3/8 * TCB-LOCAL-WINDOW-SIZE).  Thus,
	    when the window is opened it is always opened by at least 3/8
	    of the local window size.
	TCB-READ-URGENT-POINTER -- is either NIL or the sequence number of
	    the highest urgent pointer that has been received but not yet
	    read by the user.
	TCB-TIME-LAST-RECEIVED -- is used to timeout connections.  If there
	    are outstanding send segments, then the data in them should be
	    acknowledged in a timely manner.  If they aren't, then the
	    connection is declared to be inoperative.  It is not sufficient
	    to receive a valid ack; the ack must be useful.  There is an
	    exception to this: it is OK not to acknowledge data only when
	    the data is the single byte probing a zero send window.  In
	    this case, the acknowledgement thus received is useful in
	    declaring the connection still alive.
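
The receive-side silly window avoidance described under
TCB-ADVANCE-WINDOW-THRESHOLD can be sketched in Python as follows.  The
class and method names are invented, the initial values chosen in the
constructor are an assumption, and the sketch does not check that the
user reads only data that has actually been received.

```python
MOD = 1 << 32
HALF = 1 << 31


def seq_lt(a, b):
    """a strictly before b, modulo 2**32 (assumed half-space convention)."""
    return 0 < (b - a) % MOD < HALF


class ReceiveWindow:
    def __init__(self, first_seq, local_window_size):
        self.local_window_size = local_window_size
        self.seq_num_read = first_seq
        # Assumed starting point: a fully open window.
        self.seq_num_limit = (first_seq + local_window_size) % MOD
        self.advance_threshold = (first_seq + 3 * local_window_size // 8) % MOD

    def user_read(self, nbytes):
        """Record nbytes read by the user; return True if the window opened."""
        self.seq_num_read = (self.seq_num_read + nbytes) % MOD
        if not seq_lt(self.advance_threshold, self.seq_num_read):
            return False            # not enough read yet; keep the old limit
        size = self.local_window_size
        self.seq_num_limit = (self.seq_num_read + size) % MOD
        self.advance_threshold = (self.seq_num_read + 3 * size // 8) % MOD
        return True


window = ReceiveWindow(first_seq=1000, local_window_size=800)
first = window.user_read(200)    # read pointer 1200, threshold is 1300
second = window.user_read(200)   # read pointer 1400, past the threshold
```

In this run the first read leaves the limit at 1800, and the second
moves it to 2200, an advance of 400 bytes, which is more than 3/8 of
the 800 byte window.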

	TCB-SEND-SEGS, TCB-SEND-SEGS-LAST, TCB-SEND-SEGS-LENGTH --
	    implement the output (and retransmission) segment chain.  These
	    are similar to TCB-READ-SEGS et al.
	TCB-INITIAL-SEND-SEQ -- is the sequence number of the SYN that was
	    sent for this connection.  The spec calls this ISS.