Control characters

In computing, a control character, or non-printing character, is a code point (a number) in a character set that does not in itself represent a written symbol. All entries in the ASCII table below 32 are of this kind, including BEL (intended to cause an audible signal at the receiving terminal), SYN (a synchronization signal), and ENQ (a signal intended to trigger a response from the receiving end, to check that it is still present). The Unicode standard has added many new non-printing characters, for example the Zero-Width Non-Joiner.

In ASCII

The control characters in ASCII still in common use include

Occasionally one might encounter modern uses of other codes, such as code 4 (End of Transmission), which is used to end a Unix shell session or a PostScript printer transmission.

Code 27 (Escape) is a case worth elaborating. Even though many of the control characters are rarely used, the concept of sending device-control information intermixed with printable characters is so useful that device makers found a way to send hundreds of device instructions: a series of multiple characters called a "control sequence" or "escape sequence". Typically code 27 is sent first to alert the device that the following characters are to be interpreted as a control sequence rather than as plain characters. One or more characters then follow, specifying some detailed action, after which the device goes back to interpreting characters normally. For example, code 27 followed by the printable characters "[2;10H" would cause a Digital VT-102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. Some standards exist for these sequences, notably ANSI X3.64 (1979), whose sequences were popularized by the VT-100 series terminals. But the number of non-standard variations in use is large, especially among printers, where technology has advanced faster than standards bodies can follow.
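
The cursor-positioning example above can be sketched in a few lines of Python. This is only an illustration: the function name `cursor_position` is hypothetical, and whether a given terminal honors the sequence depends on the terminal.

```python
ESC = "\x1b"  # code 27 (Escape)

def cursor_position(row: int, column: int) -> str:
    """Build the sequence: code 27, then the printable characters "[row;columnH"."""
    return f"{ESC}[{row};{column}H"

sequence = cursor_position(2, 10)
print([ord(ch) for ch in sequence])  # [27, 91, 50, 59, 49, 48, 72]
```

Writing `sequence` to a terminal that understands such sequences would move the cursor instead of displaying the characters "[2;10H".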

How control characters map to keyboards

ASCII-based keyboards have a key labelled "Control" or "Ctrl" (sometimes referred to as "Cntl"), which is used much like a shift key: it is pressed in combination with another letter or symbol key to make the keyboard generate one of the 32 control codes. The keyboard produces the code 64 places below the code for the uppercase letter pressed (in effect, it clears the bit with value 64, which is bit 6 when bits are counted from zero). Pressing "Control" and the letter "G" (code 71), for example, produces code 7 (Bell). Keyboards also have single keys that produce codes in this range. For example, the key labelled "Backspace" typically produces code 8, "Tab" produces code 9, and "Enter" or "Return" produces code 13 (though some keyboards might produce code 10 for "Enter").
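
The "64 places below" rule can be checked with a short Python sketch. The helper name `control_code` is hypothetical, and the range check assumes that only the keys with codes 64 through 95 ("@" through "_") take part in the mapping.

```python
def control_code(key: str) -> int:
    """Return the code generated by Control plus the given key."""
    code = ord(key.upper())  # letters are treated as uppercase
    if not 64 <= code <= 95:
        raise ValueError("only codes 64 through 95 map onto control codes")
    return code - 64  # same as clearing the bit with value 64 (mask 0x40)

print(control_code("G"))  # 7  (Bell)
print(control_code("["))  # 27 (Escape)
```

The same rule explains why Control-H produces Backspace (code 8) and Control-M produces Carriage Return (code 13).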

Modern keyboards have many keys that do not correspond to ASCII characters or control characters, for example cursor control arrows and word processing functions. These keyboards communicate these keys to the attached computer by one of three methods: appropriating some otherwise unused control character for the new use, using some encoding other than ASCII, or using multi-character control sequences. Keyboards attached to stand-alone personal computers typically use one (or both) of the first two methods. "Dumb" computer terminals typically use control sequences.

The design purpose

The control characters were designed to fall into a few groups: printing control, data structuring, transmission control, and miscellaneous.

Printing control

Printing control characters tell the device where to put the next character. Carriage return says to put the character at the edge of the paper at which writing begins (it may or may not also move to the next line). Line feed says to put the next character on the next line in the direction in which new lines occur (and may or may not also move to the beginning of the line). Vertical and horizontal tab ask the printer to move the print head to the next tab stop in the direction of reading. Form feed starts a new sheet of paper. Shift in and shift out were meant to select alternate character sets, fonts, underlining or other printing modes. Backspace moves the next position one character backwards, so the printer can overprint characters to make special characters.
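
As a rough illustration, the effect of these codes on the print position can be simulated in Python. The sketch makes several assumptions that the text leaves open: carriage return and line feed each do only one job, tab stops sit every 8 columns, form feed resets the line but not the column, and vertical tab and all other control codes are ignored. The name `head_position` is hypothetical.

```python
BS, HT, LF, FF, CR = 8, 9, 10, 12, 13
TAB_WIDTH = 8  # assumption: a tab stop every 8 columns

def head_position(text: str) -> tuple:
    """Return (page, line, column) of the print head after printing text."""
    page = line = column = 0
    for ch in text:
        code = ord(ch)
        if code == CR:
            column = 0                       # back to the starting edge
        elif code == LF:
            line += 1                        # next line, same column
        elif code == BS:
            column = max(0, column - 1)      # allows overprinting
        elif code == HT:
            column = (column // TAB_WIDTH + 1) * TAB_WIDTH
        elif code == FF:
            page, line = page + 1, 0         # new sheet of paper
        elif code < 32 or code == 127:
            pass                             # other control codes: ignored here
        else:
            column += 1                      # printable character
    return page, line, column

print(head_position("AB\rC\n"))  # (0, 1, 1)
```

In this model "a\bb" leaves the head at column 1 with both letters printed in the same cell, which is the overprinting trick mentioned above.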

Data structuring

The separators (file, group, record and unit) were made to structure data, usually on a tape, in order to simulate punch cards. End of medium warns that the tape (or other medium) is ending.
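
A minimal sketch of how the separators can structure data, assuming the common reading in which Unit Separator (code 31) divides fields and Record Separator (code 30) divides records. The function names `pack` and `unpack` are hypothetical, and the sketch assumes the fields themselves contain neither separator.

```python
RS = "\x1e"  # Record Separator, code 30
US = "\x1f"  # Unit Separator, code 31

def pack(records):
    """Join fields with US and records with RS into one string."""
    return RS.join(US.join(fields) for fields in records)

def unpack(data):
    """Split a packed string back into a list of records of fields."""
    return [record.split(US) for record in data.split(RS)]

packed = pack([["Ada", "1815"], ["Alan", "1912"]])
print(unpack(packed))  # [['Ada', '1815'], ['Alan', '1912']]
```

Because the separators are not printable characters, fields may contain commas, tabs or newlines without any quoting.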

Transmission control

The transmission control characters were intended to structure a data packet and control when to retransmit it if it has an error.

Start of heading marked the non-data section of a data packet, that is, the part of a message with addresses and other housekeeping data. Start of text marked the end of the header and the start of the text. End of text marked the end of the data of a message. A standard convention is to make the two characters preceding the end of text the checksum or CRC of the message.
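
The framing described above might look like the following Python sketch. The checksum scheme here (sum of the header and text bytes modulo 256, written as two hexadecimal digits) is an assumption chosen for illustration, not a scheme named in the text, and `build_frame` and `parse_frame` are hypothetical names. The sketch also assumes that the header contains no STX byte.

```python
SOH, STX, ETX = b"\x01", b"\x02", b"\x03"

def checksum(payload: bytes) -> bytes:
    """Assumed scheme: byte sum modulo 256 as two hex digits."""
    return b"%02X" % (sum(payload) % 256)

def build_frame(header: bytes, text: bytes) -> bytes:
    """SOH header STX text checksum ETX."""
    return SOH + header + STX + text + checksum(header + text) + ETX

def parse_frame(frame: bytes):
    """Return (header, text), raising ValueError on a malformed frame."""
    if not (frame.startswith(SOH) and frame.endswith(ETX)):
        raise ValueError("frame must start with SOH and end with ETX")
    header, separator, rest = frame[1:-1].partition(STX)
    if not separator or len(rest) < 2:
        raise ValueError("missing STX or checksum")
    text, received = rest[:-2], rest[-2:]  # last two characters before ETX
    if received != checksum(header + text):
        raise ValueError("checksum mismatch")
    return header, text

print(parse_frame(build_frame(b"TO:A", b"hello")))  # (b'TO:A', b'hello')
```

A receiver that gets a `ValueError` here is in the position of one that would answer with Negative Acknowledge to request a retransmission.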

Escape was supposed to preface a binary value in a message that might otherwise be interpreted as a control character. For example, the value for binary 27 would be Escape Escape.
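
This use of Escape can be sketched as a simple byte-stuffing scheme in Python. Which values need the prefix is an assumption here (codes 0 through 31 and 127), and the names `stuff` and `unstuff` are hypothetical.

```python
ESC = 0x1B  # code 27

def needs_escape(value: int) -> bool:
    """Assumption: every control code (0-31 and 127) gets an ESC prefix."""
    return value < 32 or value == 127

def stuff(data: bytes) -> bytes:
    """Prefix each control-valued byte with ESC so it is read as data."""
    out = bytearray()
    for value in data:
        if needs_escape(value):
            out.append(ESC)
        out.append(value)
    return bytes(out)

def unstuff(data: bytes) -> bytes:
    """Reverse stuff(): the byte after an ESC is taken literally."""
    out = bytearray()
    values = iter(data)
    for value in values:
        if value == ESC:
            value = next(values, None)
            if value is None:
                raise ValueError("ESC at end of data")
        out.append(value)
    return bytes(out)

print(stuff(b"\x1b"))  # b'\x1b\x1b'
```

The last line reproduces the case given above: the binary value 27 is sent as Escape Escape.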

Substitute was intended to request a translation of the next character from a printable character to a binary value, usually by setting bit 5 to zero. This is handy because some transmission media (such as sheets of paper produced by typewriters) only transmit printable characters.

Cancel would stop a transmission of a packet. Negative acknowledge requests a retransmission of a packet. Acknowledge indicates that a transmission was received correctly.

When a transmission medium is half duplex (that is, it can only transmit in one direction at a time), there is usually a master station that can transmit at any time, and one or more slave stations that transmit when they have permission. Enquiry is used by a master station to ask a slave station to send its next message. A slave station indicates that it has completed its transmission by sending end of transmission.

The device control codes were originally generic, to be defined differently for each device. However, a universal need in data transmission is to ask the sender to stop transmitting when the receiver cannot take more data. Digital Equipment Corporation invented a convention that uses code 19 (Device Control 3, also known as Control-S or "X-OFF") to "S"top transmission, and code 17 (Device Control 1, also known as Control-Q or "X-ON") to start it again. This lets manufacturers control the transmission without "transmission control" wires in the data cable, which saves money and makes operation more reliable by reducing the number of connections in a cable.
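
The receiving side of this convention can be modelled roughly as follows. The class name `Receiver` and the high-water and low-water thresholds are assumptions made for the sketch. Real devices decide for themselves when to send the two codes.

```python
XON = 0x11   # Device Control 1, Control-Q, code 17
XOFF = 0x13  # Device Control 3, Control-S, code 19

class Receiver:
    """Buffers incoming bytes and says when to send XOFF or XON."""

    def __init__(self, high_water=4, low_water=1):
        self.buffer = []
        self.high_water = high_water  # pause the sender at this fill level
        self.low_water = low_water    # resume the sender at this fill level
        self.paused = False

    def receive(self, byte):
        """Store one byte; return XOFF if the sender should pause, else None."""
        self.buffer.append(byte)
        if not self.paused and len(self.buffer) >= self.high_water:
            self.paused = True
            return XOFF
        return None

    def drain(self, count):
        """Consume up to count bytes; return XON if the sender may resume."""
        del self.buffer[:count]
        if self.paused and len(self.buffer) <= self.low_water:
            self.paused = False
            return XON
        return None
```

The codes travel back to the sender over the ordinary data path, which is why no extra wires are needed.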

Data link escape tells the other end of the data link to end a session.

Miscellaneous

Many of the ASCII control characters were designed for devices of the time that are not often seen today. For example, code 22, "Synchronous idle", was originally sent by synchronous modems (which have to send data constantly) when there was no actual data to send. (Modern systems typically use a start bit to announce the beginning of a transmitted word.)

Code 0, null, is a special case. On paper tape, it corresponds to a position with no holes punched. It is convenient to treat this as a non-existent character.

Code 127 is likewise a special case. Its code is all-bits-on in binary, which made it easy to erase a section of paper tape, a common storage medium of the day, by punching all the holes. Paper tape became obsolete quickly, so this feature was almost never used.

But because its code is in the range occupied by other printable characters, many computers used it as an additional printable character (often an all-black "box" character useful for erasing text by overprinting).
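
The paper-tape behavior of codes 0 and 127 follows from simple bit arithmetic, sketched below in Python. The model assumes a 7-bit tape on which a punched hole stands for a 1 bit and punching can only add holes. The names `punch_over` and `rub_out` are hypothetical.

```python
NUL = 0x00  # no holes punched
DEL = 0x7F  # all seven holes punched

def punch_over(existing: int, new: int) -> int:
    """Punching adds holes and never removes them: a bitwise OR."""
    return existing | new

def rub_out(tape):
    """Erase a section of tape by punching every hole in each position."""
    return [punch_over(code, DEL) for code in tape]

print(rub_out([ord(ch) for ch in "ABC"]))  # [127, 127, 127]
```

Punching any code over a blank (code 0) position simply yields that code, which is why blank tape can be treated as containing no characters.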

Tables

Seven-bit ASCII defines 33 codes, 0 through 31 and 127, as control characters.

Control Characters in US-ASCII
Dec  Hex   Abbr  Character name
00   0x00  NUL   Null
01   0x01  SOH   Start of Heading
02   0x02  STX   Start of Text
03   0x03  ETX   End of Text
04   0x04  EOT   End of Transmission
05   0x05  ENQ   Enquiry
06   0x06  ACK   Acknowledge
07   0x07  BEL   Bell
08   0x08  BS    Backspace
09   0x09  HT    Horizontal Tab
10   0x0A  LF    Line Feed
11   0x0B  VT    Vertical Tab
12   0x0C  FF    Form Feed
13   0x0D  CR    Carriage Return
14   0x0E  SO    Shift Out
15   0x0F  SI    Shift In
16   0x10  DLE   Data Link Escape
17   0x11  DC1   Device Control 1
18   0x12  DC2   Device Control 2
19   0x13  DC3   Device Control 3
20   0x14  DC4   Device Control 4
21   0x15  NAK   Negative Acknowledge
22   0x16  SYN   Synchronous Idle
23   0x17  ETB   End of Transmission Block
24   0x18  CAN   Cancel
25   0x19  EM    End of Medium
26   0x1A  SUB   Substitute
27   0x1B  ESC   Escape
28   0x1C  FS    File Separator
29   0x1D  GS    Group Separator
30   0x1E  RS    Record Separator
31   0x1F  US    Unit Separator
127  0x7F  DEL   Rubout/Delete


The compatible 8-bit character set ISO-8859-1 additionally maps the 32 codes at positions 128 through 159, which the ISO/IEC 8859-1 standard itself leaves unassigned, to control characters.

Control Characters in ISO-8859-*
Dec  Hex   Abbr  Character name
128  0x80  PAD   Padding Character
129  0x81  HOP   High Octet Preset
130  0x82  BPH   Break Permitted Here
131  0x83  NBH   No Break Here
132  0x84  IND   Index
133  0x85  NEL   Next Line
134  0x86  SSA   Start of Selected Area
135  0x87  ESA   End of Selected Area
136  0x88  HTS   Horizontal Tab Set
137  0x89  HTJ   Horizontal Tab Justified
138  0x8A  VTS   Vertical Tab Set
139  0x8B  PLD   Partial Line Forward
140  0x8C  PLU   Partial Line Backward
141  0x8D  RI    Reverse Line Feed
142  0x8E  SS2   Single-Shift 2
143  0x8F  SS3   Single-Shift 3
144  0x90  DCS   Device Control String
145  0x91  PU1   Private Use 1
146  0x92  PU2   Private Use 2
147  0x93  STS   Set Transmit State
148  0x94  CCH   Cancel Character
149  0x95  MW    Message Waiting
150  0x96  SPA   Start of Protected Area
151  0x97  EPA   End of Protected Area
152  0x98  SOS   Start of String
153  0x99  SGCI  Single Graphic Char Intro
154  0x9A  SCI   Single Char Intro
155  0x9B  CSI   Control Sequence Intro
156  0x9C  ST    String Terminator
157  0x9D  OSC   OS Command
158  0x9E  PM    Private Message
159  0x9F  APC   App Program Command
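
The code ranges covered by the two tables can be expressed as a small Python predicate. The function name `is_control` is hypothetical. The cross-check against the standard `unicodedata` module relies on Unicode giving these same code points the general category "Cc".

```python
import unicodedata

def is_control(code: int) -> bool:
    """True for codes 0-31, 127, and 128-159."""
    return 0 <= code <= 31 or code == 127 or 128 <= code <= 159

ascii_controls = [code for code in range(128) if is_control(code)]
high_controls = [code for code in range(128, 256) if is_control(code)]
print(len(ascii_controls), len(high_controls))  # 33 32

# Cross-check: Unicode assigns category "Cc" to exactly these code points.
matches = all(
    (unicodedata.category(chr(code)) == "Cc") == is_control(code)
    for code in range(256)
)
print(matches)  # True
```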


This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.