hjp: doc: RFC 2066: TELNET CHARSET Option

Rfc	2066
Title	TELNET CHARSET Option
Author	R. Gellens
Date	January 1997
Format:	TXT, HTML
Status:	EXPERIMENTAL

Network Working Group                                         R. Gellens
Request for Comments: 2066                                        Unisys
Category: Experimental                                      January 1997

                         TELNET CHARSET Option

Status of this Memo

   This memo defines an Experimental Protocol for the Internet
   community.  This memo does not specify an Internet standard of any
   kind.  Discussion and suggestions for improvement are requested.
   Distribution of this memo is unlimited.

Abstract

   This document specifies a mechanism for passing character set and
   translation information between a TELNET client and server.  Use of
   this mechanism enables an application used by a TELNET user to send
   and receive data in the correct character set.

   Either side can (subject to option negotiation) at any time request
   that a (new) character set be used.

1.   Command Names and Codes

   CHARSET.......................42

      REQUEST ....................01
      ACCEPTED ...................02
      REJECTED ...................03
      TTABLE-IS ..................04
      TTABLE-REJECTED ............05
      TTABLE-ACK .................06
      TTABLE-NAK .................07

   As a convenience, standard TELNET text and codes for commands used in
   this document are reproduced here (excerpted from [1]):

      All TELNET commands consist of at least a two byte sequence:  the
      "Interpret as Command" (IAC) escape character followed by the code
      for the command.  The commands dealing with option negotiation are
      three byte sequences, the third byte being the code for the option
      referenced. ... [O]nly the IAC need be doubled to be sent as data,
      and the other 255 codes may be passed transparently.  The
      following are [some of] the defined TELNET commands.  Note that
      these codes and code sequences have the indicated meaning only
      when immediately preceded by an IAC.

Gellens                       Experimental                      [Page 1]

RFC 2066                 TELNET CHARSET Option              January 1997

      NAME          CODE  MEANING

      SE            240   End of subnegotiation parameters.

      SB            250   Indicates that what follows is
                          subnegotiation of the indicated
                          option.

      WILL          251   Indicates the desire to begin
                          performing, or confirmation that
                          you are now performing, the
                          indicated option.

      WON'T         252   Indicates the refusal to perform,
                          or continue performing, the
                          indicated option.

      DO            253   Indicates the request that the
                          other party perform, or
                          confirmation that you are expecting
                          the other party to perform, the
                          indicated option.

      DON'T         254   Indicates the demand that the other
                          party stop performing, or
                          confirmation that you are no longer
                          expecting the other party to
                          perform, the indicated option.

      IAC          255    Data Byte 255.

2.   Command Meanings

   A very simple meta-syntax is used, where most tokens represent
   previously defined items (such as IAC); angle-brackets ("<>") are
   used for items to be further defined; curly-braces ("{}") are used
   around optional items; ellipses represent repeated sequences of
   items; and quotes are used for literal strings.

   IAC WILL CHARSET
     The sender REQUESTS permission to, or AGREES to, use
     CHARSET option subnegotiation to choose a character set.

   IAC WON'T CHARSET
      The sender REFUSES to use CHARSET option subnegotiation
      to choose a character set.

Gellens                       Experimental                      [Page 2]

RFC 2066                 TELNET CHARSET Option              January 1997

    IAC DO CHARSET
      The sender REQUESTS that, or AGREES to have, the other
      side use CHARSET option subnegotiation to choose a
      character set.

   IAC DON'T CHARSET
      The sender DEMANDS that the other side not use the
      CHARSET option subnegotiation.

   IAC SB CHARSET REQUEST { "[TTABLE ]" <Version> } <char set
   list> IAC SE

      Char set list:

      <sep> <character set> { ... <sep> <character set> }

   This message initiates a new CHARSET subnegotiation.  It can only be
   sent by a side that has received a DO CHARSET message and sent a WILL
   CHARSET message (in either order).

   The sender requests that all text sent to and by it be encoded in one
   of the specified character sets.

   If the string [TTABLE] appears, the sender is willing to accept a
   mapping (translation table) between any character set listed in <char
   set list> and any character set desired by the receiver.

   <Version>  is an octet whose binary value is the highest version
   level of the TTABLE-IS message which can be sent in response.  This
   field must not be zero.  See the TTABLE-IS message for the permitted
   version values.

   <Char set list>  is a sequence of 7-BIT ASCII printable characters.
   The first octet defines the separator character (which must not
   appear within any character set).  It is terminated by the IAC SE
   sequence.  Case is not significant.  It consists of one or more
   character sets.  The character sets should appear in order of
   preference (most preferred first).

   <Sep>  is a separator octet, the value of which is chosen by the
   sender.  Examples include a space or a semicolon.  Any value other
   than IAC is allowed.  The obvious choice is a space or any other
   punctuation symbol which does not appear in any of the character set
   names.

Gellens                       Experimental                      [Page 3]

RFC 2066                 TELNET CHARSET Option              January 1997

   <Character set>  is a sequence of 7-BIT ASCII printable characters.
   Case is not significant.

   If a requested character set name does not start with "X-" or "x-",
   it MUST be registered with the Internet Assigned Number Authority
   (IANA) [2].

   The receiver responds in one of four ways:

   If the receiver is already sending text to and expecting text from
   the sender to be encoded in one of the specified character sets, it
   sends a positive acknowledgment (CHARSET ACCEPTED); it MUST NOT
   ignore the message.  (Although ignoring the message is perhaps
   suggested by some interpretations of the relevant RFCs ([1], [3]), in
   the interests of determinacy it is not permitted.  This ensures that
   the issuer does not need to time out and infer a response, while
   avoiding (because there is no response to a positive acknowledgment)
   the non-terminating subnegotiation which is the rationale in the RFCs
   for the non-response behavior.)

   If the receiver is capable of handling at least one of the specified
   character sets, it can respond with a positive acknowledgment for one
   of the requested character sets.  Normally, it should pick the first
   set it is capable of handling but may choose one based on its own
   preferences.  After doing so, each side MUST encode subsequent text
   in the specified character set.

   If the string [TTABLE] is present, and the receiver prefers to use a
   character set not included in <char set list>, and is capable of
   doing so, it can send a translate table (TTABLE-IS) response.

   If the receiver is not capable of handling any of the specified
   character sets, it sends a negative acknowledgment (CHARSET
   REJECTED).

   Because it is not valid to reply to a CHARSET REQUEST message with
   another CHARSET REQUEST message, if a CHARSET REQUEST message is
   received after sending one, it means that both sides have sent them
   simultaneously.  In this case, the server side MUST issue a negative
   acknowledgment.  The client side MUST respond to the one from the
   server.

   IAC SB CHARSET ACCEPTED <Charset> IAC SE
      This is a positive acknowledgment response to a CHARSET REQUEST
      message; the receiver of the CHARSET REQUEST message acknowledges
      its receipt and accepts the indicated character set.

Gellens                       Experimental                      [Page 4]

RFC 2066                 TELNET CHARSET Option              January 1997

      <Charset> is a character sequence identical to one of the
      character sets in the CHARSET REQUEST message.  It is terminated
      by the IAC SE sequence.

      Text messages which follow this response must now be coded in the
      indicated character set.  This message terminates the current
      CHARSET subnegotiation.

   IAC SB CHARSET REJECTED IAC SE
      This is a negative acknowledgment response to a CHARSET REQUEST
      message; the receiver of the CHARSET REQUEST message acknowledges
      its receipt but refuses to use any of the requested character
      sets.  Messages can not be sent in any of the indicated character
      sets.  This message can also be sent by the sender of a TTABLE-IS
      message, if multiple TTABLE-NAK messages were sent in response.
      This message terminates the current CHARSET subnegotiation.

   IAC SB CHARSET TTABLE-IS <version> <syntax for version> IAC SE
      In response to a CHARSET REQUEST message in which [TTABLE] was
      specified, the receiver of the CHARSET REQUEST message
      acknowledges its receipt and is transmitting a pair of tables
      which define the mapping between specified character sets.

      <Version> is an octet whose binary value is the version level of
      this TTABLE-IS message.  Different versions have different syntax.
      The lowest version level is one (zero is not valid).  The current
      highest version level is also one.  This field is provided so that
      future versions of the TTABLE-SEND message can be specified, for
      example, to handle character sets for which there is no simple
      one-to-one character-for-character translation.  This might
      include some forms of multi-octet character sets for which
      translation algorithms or subsets need to be sent.

   Syntax for Version 1:

      <sep> <char set name 1> <sep> < char size 1> < char count 1> <char
      set name 2> <sep> <char size 2> <char count 2> <map 1> <map 2>

      <Sep>  is a separator octet, the value of which is chosen by the
      sender.  Examples include a space or a semicolon.  Any value other
      than IAC is allowed.  The obvious choice is a space or any other
      punctuation symbol which does not appear in either of the
      character set names.

      <Char set name 1> and <Char set name 2>  are sequences of 7-BIT
      ASCII printable characters which identify the two character sets
      for which a mapping is being specified.  Each is terminated by
      <sep>.  Case is not significant.  If a character set name does not

Gellens                       Experimental                      [Page 5]

RFC 2066                 TELNET CHARSET Option              January 1997

      start with "X-" or "x-", it MUST be registered with IANA.  <Char
      set name 1> MUST be chosen from the <char set list> in the CHARSET
      REQUEST message.  <Char set name 2> can be arbitrarily chosen.
      Text on the wire MUST be encoded using <char set name 2>.

      <Char size 1>  and <char size 2>  are single octets each.  The
      binary value of the  octet is the number of bits nominally
      required for each character in the corresponding table.  It SHOULD
      be a multiple of eight.

      <Char count 1> and <char count 2>  are each three-octet binary
      fields in Network Byte Order [6].  Each specifies how many
      characters (of the maximum 2**<char size>) are being transmitted
      in the corresponding map.

      <Map1> and <Map 2>  each consist of the corresponding <char count>
      number of characters.  These characters form a mapping from all or
      part of the characters in one of the specified character sets to
      the correct characters in the other character set.  If the
      indicated <char count> is less than  2**<char size>, the first
      <char count> characters are being mapped, and the remaining
      characters are assumed to not be changed (and thus map to
      themselves).  That is, each map contains characters 0 through
      <char count> -1.  <Map 1> maps from <char set name 1> to <char set
      name 2>.  <Map 2> maps from <char set name 2> to <char set name
      1>.  Translation between the character sets is thus an obvious
      process of using the binary value of a character as an index into
      the appropriate map.  The character at that index replaces the
      original character.  If the index exceeds the <char count> for the
      map, no translation is performed for the character.

      [Note to implementers: since TELNET works in octets, it is
      possible for octets of value 255 to appear "spontaneously" when
      using multi-octet or non-8-bit characters.  All octets of value
      255 (other than IAC) MUST be quoted to conform with TELNET
      requirements.  This applies even to octets within a table, or text
      in a multi-octet character set.]

   IAC SB CHARSET TTABLE-ACK IAC SE
      The sender acknowledges the successful receipt of the translate
      table.  Text messages which follow this response must now be coded
      in the character set specified as <char set name 2> of the
      TTABLE-IS message.  This message terminates the current CHARSET
      subnegotiation.

Gellens                       Experimental                      [Page 6]

RFC 2066                 TELNET CHARSET Option              January 1997

   IAC SB CHARSET TTABLE-NAK IAC SE
      The sender reports the unsuccessful receipt of the translate table
      and requests that it be resent.  If subsequent transmission
      attempts also fail, a TTABLE-REJECTED or CHARSET REJECTED message
      (depending on which side sends it) should be sent instead of
      additional futile TTABLE-IS and TTABLE-NAK messages.

   IAC SB CHARSET TTABLE-REJECTED IAC SE
      In response to a TTABLE-IS message, the receiver of the TTABLE-IS
      message acknowledges its receipt and indicates it is unable to
      handle it.  This message terminates the current CHARSET
      subnegotiation.

      Any system which supports the CHARSET option MUST fully support
      the CHARSET REQUEST, ACCEPTED, REJECTED, and TTABLE-REJECTED
      subnegotiation messages.  It MAY optionally fully support the
      TTABLE-IS, TTABLE-ACK, and TTABLE-NAK messages.  If it does fully
      support the TTABLE-IS message, it MUST also fully support the
      TTABLE-ACK and TTABLE-NAK messages.

3.   Default

   WON'T CHARSET

   DON'T CHARSET

4.   Motivation for the Option

   Many TELNET sessions need to transmit data which is not in 7-bit
   ASCII.  This is usually done by negotiating BINARY, and using local
   conventions (or terminal type kluges) to determine the character set
   of the data.  However, such methods tend not to interoperate well,
   and have difficulties when multiple character sets need to be
   supported by different sessions.

   Many computer systems now utilize a variety of character sets.
   Increasingly, a server computer needs to document character sets or
   translate transmissions and receptions using different pairs of
   character sets on a per-application or per-connection basis.  This is
   becoming more common as client and server computers become more
   geographically disperse.  (And as servers are consolidated into
   ever-larger hubs, serving ever-wider areas.)  In order for files,
   databases, etc. to contain correct data, the server must determine
   the character set in which the user is sending, and the character set
   in which the application expects to receive.

Gellens                       Experimental                      [Page 7]

RFC 2066                 TELNET CHARSET Option              January 1997

   In some cases, it is sufficient to determine the character set of the
   end user (because every application on the server expects to use the
   same character set, or because applications can handle the user's
   character set), but in other cases different server applications
   expect to use different character sets.  In the former case, an
   initial CHARSET subnegotiation suffices.  In the latter case, the
   server may need to initiate additional CHARSET subnegotiations as the
   user switches between applications.

   At a minimum, the option described in this memo allows both sides to
   be clear as to which character set is being used.  A minimal
   implementation would have the server send DO CHARSET, and the client
   send WILL CHARSET and CHARSET REQUEST.  The server could then
   communicate the client's character set to applications using whatever
   means are appropriate.  Such a server might refuse subsequent CHARSET
   REQUEST messages from the client (if it lacked the ability to
   communicate changed character set information to applications, for
   example).  Another system might have a method whereby various
   applications could communicate to the TELNET server their character
   set needs and abilities, which the server would handle by initiating
   new CHARSET REQUEST negotiations as appropriate.

   In some cases, servers may have a large set of clients which tend to
   connect often (such as daily) and over a long period of time (such as
   years).  The server administrators may strongly prefer that the
   servers not do character set translation (to save CPU cycles when
   serving very large numbers of users).  To avoid manually configuring
   each copy of the user TELNET software, the administrators might
   prefer that the software supports translate tables.  (If the client
   software received a translate table from the server and stored it,
   the table would only need to be sent once.)

5.   Description of the Option

   When the client TELNET program is able to determine the user's
   character set it should offer to specify the character set by sending
   IAC WILL CHARSET.

   If the server system is able to make use of this information, it
   replies with IAC DO CHARSET.  The client TELNET is then free to
   request a character set in a subnegotiation at any time.

   Likewise, when the server is able to determine the expected character
   set(s) of the user's application(s), it should send  IAC DO CHARSET
   to request that the client system specify the character set it is
   using.  Or the server could send IAC WILL CHARSET to offer to specify
   the character sets.

Gellens                       Experimental                      [Page 8]

RFC 2066                 TELNET CHARSET Option              January 1997

   Once a character set has been determined, the server can either
   perform the translation between the user and application character
   sets itself, or request by additional CHARSET subnegotiations that
   the client system do so.

   Once it has been established that both sides are capable of character
   set negotiation (that is, each side has received either a WILL
   CHARSET or a DO CHARSET message, and has also sent either a DO
   CHARSET or a WILL CHARSET message), subnegotiations can be requested
   at any time by whichever side has sent a WILL CHARSET message and
   also received a DO CHARSET message (this may be either or both
   sides).  Once a CHARSET subnegotiation has started, it must be
   completed before additional CHARSET subnegotiations can be started
   (there must never be more than one CHARSET subnegotiation active at
   any given time).  When a subnegotiation has completed, additional
   subnegotiations can be started at any time.

   If either side violates this rule and attempts to start a CHARSET
   subnegotiation while one is already active, the other side MUST
   reject the new subnegotiation by sending a CHARSET REJECTED message.

   Receipt of a CHARSET REJECTED or TTABLE-REJECTED message terminates
   the subnegotiation, leaving the character set unchanged.  Receipt of
   a CHARSET ACCEPTED or TTABLE-ACK message terminates the
   subnegotiation, with the new character set in force.

   In some cases, both the server and the client systems are able to
   perform translations and to send and receive in the character set(s)
   expected by the other side.  In such cases, either side can request
   that the other use the character set it prefers.  When both sides
   simultaneously make such a request (send CHARSET REQUEST messages),
   the server MUST reject the client's request by sending a CHARSET
   REJECTED message.  The client system MUST respond to the server's
   request.  (See the CHARSET REQUEST description, above.)

   When the client system makes the request first, and the server is
   able to handle the requested character set(s), but prefers that the
   client system instead use the server's (user application) character
   set, it may reject the request, and issue a CHARSET REQUEST of its
   own.  If the client system is unable to comply with the server's
   preference and issues a CHARSET REJECTED message, the server can
   issue a new CHARSET REQUEST message for one of the previous character
   sets (one of those which the client system originally requested).
   The client system would obviously accept this character set.

   While a CHARSET subnegotiation is in progress, data SHOULD be queued.
   Once the CHARSET subnegotiation has terminated, the data can be sent
   (in the correct character set).

Gellens                       Experimental                      [Page 9]

RFC 2066                 TELNET CHARSET Option              January 1997

   Note that regardless of CHARSET negotiation, translation only applies
   to text (not commands), and only occurs when in BINARY mode [4].  If
   not in BINARY mode, all data is assumed to be in NVT ASCII [1].

   Also note that the CHARSET option should be used with the END OF
   RECORD option [5] for block-mode terminals in order to be clear on
   what character represents the end of each record.

   As an example of character set negotiation, consider a user on a
   workstation using TELNET to communicate with a server.  In this
   example, the workstation normally uses the Cyrillic (ASCII) character
   set [2] but is capable of using EBCDIC-Cyrillic [2], and the server
   normally uses EBCDIC-Cyrillic.  The server could handle the (ASCII)
   Cyrillic character set, but prefers that instead the client system
   uses the EBCDIC-Cyrillic character set.  (This and the following
   examples do not show the full syntax of the subnegotiation messages.)

                 CLIENT                        SERVER

             WILL CHARSET                   WILL CHARSET

             DO CHARSET                     DO CHARSET

             CHARSET REQUEST Cyrillic
                 EBCDIC-Cyrillic

                                            CHARSET ACCEPTED EBCDIC-
                                               Cyrillic

Gellens                       Experimental                     [Page 10]

RFC 2066                 TELNET CHARSET Option              January 1997

   Now consider a case where the workstation can't handle EBCDIC-
   Cyrillic, but can accept a translate table:

                 CLIENT                        SERVER

              WILL CHARSET                   WILL CHARSET

              DO CHARSET                     DO CHARSET

              CHARSET REQUEST [TTABLE] 1
                 Cyrillic

                                             CHARSET TTABLE-IS 1 Cyrillic
                                               EBCDIC-Cyrillic

              CHARSET TTABLE-ACK

   For another example, consider a case similar to the previous case,
   but now the user switches server applications in the middle of the
   session (denoted by ellipses), and the new application requires a
   different character set:

                CLIENT                        SERVER

              WILL CHARSET                   WILL CHARSET

              DO CHARSET                     DO CHARSET

              CHARSET REQUEST [TTABLE] 1
                 Cyrillic EBCDIC-INT

                                             CHARSET TTABLE-IS 1 Cyrillic
                                               EBCDIC-Cyrillic

              CHARSET TTABLE-ACK

              . . .                          . . .

                                             CHARSET REQUEST EBCDIC-INT

              CHARSET ACCEPTED EBCDIC-INT

Gellens                       Experimental                     [Page 11]

RFC 2066                 TELNET CHARSET Option              January 1997

6.   Security Considerations

   Security issues are not discussed in this memo.

7.   References

   [1] Postel, J. and J. Reynolds, "Telnet Protocol
       Specification", STD 8, RFC 854, ISI, May 1983.

   [2] Reynolds, J., and J. Postel, "Assigned Numbers",
       STD 2, RFC 1700, ISI, October 1994.

   [3] Postel, J. and J. Reynolds, "Telnet Option
       Specifications", STD 8, RFC 855, ISI, May 1983.

   [4] Postel, J. and J. Reynolds, "Telnet Binary
       Transmission", STD 27, RFC 856, ISI, May 1983.

   [5] Postel, J., "Telnet End-Of-Record Option", RFC 885,
       ISI, December 1983.

   [6] Postel, J., "Internet Official Protocol Standards",
       STD 1, RFC 1920, IAB, March 1996.

8.   Author's Address

   Randall Gellens
   Unisys Corporation
   25725 Jeronimo Road
   Mail Stop 237
   Mission Viejo, CA  92691
   USA

   Phone:  +1.714.380.6350
   Fax:    +1.714.380.5912
   EMail:  Randy@MV.Unisys.Com

Gellens                       Experimental                     [Page 12]