Aztec Barcode Symbology Specification Rev 3.0
(a Russian translation is available)
http://dcd.welchallyn.com/techover/dcdwhite.htm
©1999 HandHeld Products Data Collection Inc. All rights reserved.
I. Introduction
II. Symbol Structure
II. A. Graphical Symbol Structure
II. B. Mathematical Symbol Structure
II. C. Message Encodation
III. SMALL AZTEC CODE
III. A. GRAPHICAL SMALL AZTEC SYMBOL STRUCTURE
III. B. MATHEMATICAL SMALL AZTEC SYMBOL STRUCTURE
III. C. SMALL AZTEC MESSAGE ENCODATION
IV. Ordered Append Protocol
I. Introduction
Aztec Code is a 2D matrix barcode symbology designed to combine the best characteristics of several 1st generation symbologies, with special attention paid to ease of printing, ease of finding in any orientation, allowance for field distortion, high data security with user-selected redundancy, and efficient storage over the range from small to large data messages. Aztec symbols are always a square array of square cells with a square «bullseye» at the center. Figure 1 shows an Aztec Code symbol encoding 109 characters of text with 50% error correction.
32 sizes of Aztec symbol are defined. With a cell spacing of «x», the symbols range from as small as 19x square, which can encode up to 12 characters of data, to as large as 151x square, capable of encoding over 3000 characters. Reed-Solomon error control encoding provides a user-selected level of data security and robustness.
II. Symbol Structure
The structure of an Aztec Code symbol is best understood by viewing it on three different levels: the graphical level, the mathematical level, and finally the message level. The graphical level involves the actual physical arrangement of dark and light cells which make up the «printed» symbol and their defined relationship to an encoded data stream. The mathematical level entails error control encoding which is applied to that data stream both for data security and for recovering from symbol damage or scanning errors. The message level is a set of rules that unambiguously translates between the data stream and the ASCII or 8-bit message string that the symbol is intended to represent. This data stream takes on distinct and well-defined forms in passing from level to level.
When encoding, one generally starts with a desired message and steps from message encoding to mathematical/error encoding to graphical encoding. When decoding the process is reversed, starting with a stored image and stepping from graphical decoding to mathematical/error decoding to message decoding.

II. A. Graphical Symbol Structure
An Aztec Code symbol is constructed on a grid of nominally square cells and is always square in overall extent. The graphical structure consists of several fixed patterns and two variable (data) fields. Figure 2 shows just the fixed patterns which comprise three basic symbol components: the finder, orientation bits, and a reference grid.
At the symbol's center, serving as the «finder», is a square 7-level «bullseye» pattern. It is easily found in a 2D image by scanning for topological connectivity, and is then useful both for pinpointing the exact center and for determining the main axes and local x-dimensions. On the four corners of the finder are 3-bit clusters of orientation bits, some black and some white, which allow the symbol's orientation (and even possible mirror imaging) to be quickly determined. Finally, spreading out from the finder is a reference grid, with horizontal and vertical components at every 16x spacing that extend to (but never run along) the edges of the symbol. These ladder-like structures are useful for detecting and backing out image distortion in larger symbols.
In Figure 3, the two variable fields are also shown. First, the cells in the layer immediately adjoining the finder (other than the orientation bits) comprise a 40-bit string, starting upper left and circling clockwise, with 16 Mode Bits and 24 Reed-Solomon check bits using GF(16). With light cells representing binary «0»s and dark cells representing binary «1»s, the first 5 mode bits encode the symbol's overall size (the # of data «layers» minus 1) and the remaining 11 mode bits encode the number of data words in the encoded message (again minus 1, leaving the remaining data words in the symbol as Reed-Solomon checkwords). The data themselves are then encoded in 2 cell high layers, again starting upper left (at the «L» in «Layer 1» in Figure 3) and spiraling clockwise out to the edge of the symbol. The smallest possible symbol has just one data layer and the largest has 32.
The encoded data and their associated check values are represented in symbol character structures that are N x 2 «bricks», as shown along the right margin of Figure 3, again with light cells representing binary «0»s and dark cells representing binary «1»s. At the graphical level though, it is useful to think of these symbol characters as being composed, in turn, of a sequence of N related «dominos», each 1X wide and 2X tall, with their more significant bit always further from the central finder. It is then a sequence of dominos, representing a sequence of codewords, that spirals from Layer 1 outward to the symbol's edge, skipping over positions occupied by the reference grid pattern. Figure 4 illustrates the sequence and orientation of the dominos when turning the corners in any layer and when transitioning between layers.
The symbol character structures themselves are often irregular in shape at the four corners or non-contiguous when skipping across the reference grid, but they nonetheless represent integral binary codeword values. In Layers #12 and #27, even the dominos themselves are bisected by a reference grid cell, but still each domino's more significant bit is positioned directly «above» its less significant counterpart (that is, directly outward in line from the finder) following the orientations in Figure 4. In the outermost layer of each symbol, at the end of the spiral, there may be up to N-1 dominos left over and they are left blank.
The innermost Data Layer #1 contains 128 data bits and each succeeding layer holds 32 bits more, so the total data bit capacity Cb of an L-layer symbol is given by the formula:
- Cb = (112 + 16*L) * L
The way in which error correction is applied over these data bits requires that they be broken up into 6, 8, 10, or 12-bit symbol characters depending on symbol size:
- 1-2 layer symbols => 6-bit codewords
- 3-8 layer symbols => 8-bit codewords
- 9-22 layer symbols => 10-bit codewords
- 23-32 layer symbols => 12-bit codewords.
Thus a symbol's capacity in K-bit codewords becomes
- Cw = (Cb div K) where div is an integer divide function.
Table 1 presents the physical size and the data bit capacity (# codewords x # bits/codeword) of each of the 32 possible sizes of Aztec Code symbol. Typically 25% or more of a symbol's databits are reserved for error control encoding, and over the remaining message region 5 data bits on average will encode a text character and 4 bits on average will encode a numeric character. Thus a 4-layer (31x31) Aztec Code symbol holds 88 x 8 = 704 data bits, of which about 0.75 x 704 = 528 are available for the message, which can in turn be about 528 / 5 = 105 text or 528 / 4 = 132 numeric characters.
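The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated rules (128 bits in Layer 1, 32 more per further layer, codeword width by layer count); the function names and the `message_fraction` parameter are hypothetical and not part of the specification.

```python
def codeword_bits(layers):
    """Codeword width in bits for a standard symbol with this many data layers."""
    if not 1 <= layers <= 32:
        raise ValueError("standard Aztec symbols have 1 to 32 data layers")
    if layers <= 2:
        return 6
    if layers <= 8:
        return 8
    if layers <= 22:
        return 10
    return 12


def bit_capacity(layers):
    """Cb: 128 bits in layer 1 plus 32 more in each succeeding layer."""
    return sum(128 + 32 * (i - 1) for i in range(1, layers + 1))


def codeword_capacity(layers):
    """Cw = Cb div K, with K the codeword width for this symbol size."""
    return bit_capacity(layers) // codeword_bits(layers)


def estimate_characters(layers, message_fraction=0.75):
    """Rough (text, numeric) character counts; message_fraction is an assumption."""
    data_bits = codeword_capacity(layers) * codeword_bits(layers)
    message_bits = int(data_bits * message_fraction)
    return message_bits // 5, message_bits // 4


print(codeword_capacity(4), estimate_characters(4))
```

For a 4-layer symbol this reproduces the worked figures in the text: 88 codewords of 8 bits, and roughly 105 text or 132 numeric characters at 25% error correction.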
The systematic domino-based layout of Aztec Code simplifies both encoding and decoding at the graphical level. In trying to achieve optimum error correction performance though, two unnatural twists in the domino assignments have been employed. First, as introduced in Figure 3, the symbol character size varies with symbol size. Second, in order to locate detectable erasures towards the perimeter of the symbol, the outward spiral of codewords takes them in reverse order: the «first» word in Layer #1 is the final Reed-Solomon checkword, and so on through to the «last» word in Layer #L which is the start of the encoded message. Null left-over dominos, if needed, still occupy the upper left corner of the symbol, adjacent to the first message codeword.
The codeword-forming rules (see Section II.C) are designed to never create a message codeword of all 0's or all 1's, but the error encoding (Section II.B) tacks on additional codewords of any value. Thus during decoding, an occurrence of those illegal values within the message region (but not within the check region!) can be regarded as a correctable erasure. Decoding errors and symbol damage are expected to occur more towards the edges of the symbol, and by reversing the codeword sequence the message codewords occupy those edge regions.

II. B. Mathematical Symbol Structure
1. The Mode Message
The mode message encodes 16 significant bits. The first five encode the size of the symbol, actually the number of data layers L less one, in binary. Thus the smallest symbol with just one data layer is signified by «00000» and the largest symbol with 32 data layers is signified by «11111». The next 11 bits encode the length of the message, actually the number of message codewords D less one, with the symbol's remaining codewords used for error control. D should always be at least 5 less than the symbol's capacity Cw and is typically about 75% of the capacity.
As a special case, «menuing» Aztec Code symbols may be desired whose sole purpose is reader initialization and configuration and whose encoded data are never transmitted. Symbols up to 24 layers in size can be flagged as menu symbols by artificially setting the MSB of the message length field, an otherwise improper setting because the symbol's total capacity is below 1024 codewords.
These 16 significant bits are parsed into four 4-bit words, then 6 additional check words are computed by systematic Reed-Solomon encoding over the Galois field GF(16) based on a prime modulus polynomial of x^4 + x + 1 (= 19 decimal). The generator polynomial of (x - 2^1)...(x - 2^6) is
- x^6 + 7x^5 + 9x^4 + 3x^3 + 12x^2 + 10x + 12.
All ten words are then laid in order, from the most significant symbol size bit through to the LSB of the final check word, clockwise around the finder starting in the upper left corner.
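As a minimal sketch of the mode message computation, assuming the generator coefficients quoted above and a conventional shift-register form of systematic Reed-Solomon encoding, the 40-bit string could be produced as follows. The names `gf_mul`, `rs_check_words` and `mode_message_bits` are illustrative only, and the placement of the bits around the finder is not modelled.

```python
# GF(16) antilog/log tables for the modulus x^4 + x + 1 (19 decimal), alpha = 2.
EXP = [1] * 30
for i in range(1, 30):
    v = EXP[i - 1] << 1
    EXP[i] = v ^ 19 if v & 16 else v
LOG = {EXP[i]: i for i in range(15)}

# Generator polynomial from the text, highest power first.
GEN = [1, 7, 9, 3, 12, 10, 12]


def gf_mul(a, b):
    """Multiply two GF(16) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def rs_check_words(data, gen):
    """Remainder of data(x) * x^n divided by the monic generator gen(x)."""
    n = len(gen) - 1
    rem = [0] * n
    for d in data:
        factor = d ^ rem[0]
        rem = rem[1:] + [0]
        for i in range(n):
            rem[i] ^= gf_mul(gen[i + 1], factor)
    return rem


def mode_message_bits(layers, data_words):
    """40-bit mode message: 5 bits of L-1, 11 bits of D-1, then 24 check bits."""
    if not 1 <= layers <= 32 or not 1 <= data_words <= 2048:
        raise ValueError("layers or data word count does not fit the mode fields")
    value = ((layers - 1) << 11) | (data_words - 1)
    words = [(value >> shift) & 0xF for shift in (12, 8, 4, 0)]
    words += rs_check_words(words, GEN)
    return "".join(format(w, "04b") for w in words)


print(mode_message_bits(4, 66))
```

The range check only verifies that the values fit their bit fields; the further constraint that D stay at least 5 below Cw is left to the caller.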
2. The Data Message
Reed-Solomon error encoding over the encoded message is provided in different sized Galois fields depending on the symbol size, as specified in Table 2. This novel approach optimizes the «graininess» of the error correction over the large range of symbol sizes possible.
The D codewords which encode the data message are augmented by K = Cw - D checkwords using systematic Reed-Solomon encoding over the designated Galois Field. No pad words are ever inserted to fill out a symbol; instead the generator polynomial is simply expanded out to (x - 2^1)...(x - 2^K) as needed at the time of printing, providing excess error correction. There are several positive aspects of this unique error coding, the most interesting being that in applications well served by a fixed size of symbol (e.g., automated sortation), less-than-maximal data messages will be accompanied by exceptionally high levels of error correction.
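Expanding the generator polynomial for an arbitrary number of checkwords is a short loop. The sketch below is hedged: Table 2 is not reproduced in this text, so the field defaults to GF(16) with modulus 19 as used for the mode message, and the `bits` and `modulus` parameters (like the function names) are assumptions that a real encoder would set from Table 2.

```python
def make_field(bits, modulus):
    """Build an antilog table and a multiply function for GF(2^bits), alpha = 2."""
    size = 1 << bits
    exp = [1] * (2 * size)
    for i in range(1, 2 * size):
        v = exp[i - 1] << 1
        exp[i] = v ^ modulus if v & size else v
    log = {exp[i]: i for i in range(size - 1)}

    def mul(a, b):
        return 0 if a == 0 or b == 0 else exp[log[a] + log[b]]

    return exp, mul


def generator_poly(num_checkwords, bits=4, modulus=19):
    """Coefficients of (x - 2^1)...(x - 2^K), highest power first."""
    exp, mul = make_field(bits, modulus)
    gen = [1]
    for i in range(1, num_checkwords + 1):
        root = exp[i]
        # Multiply gen(x) by (x - root); subtraction is XOR in GF(2^m).
        nxt = gen + [0]
        for j, coeff in enumerate(gen):
            nxt[j + 1] ^= mul(coeff, root)
        gen = nxt
    return gen


print(generator_poly(6))
```

With six roots in GF(16) this yields the coefficients 1, 7, 9, 3, 12, 10, 12, matching the mode message generator.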
The resulting sequence of D message codewords followed by K check codewords, parsed anew into a sequence of 2-bit dominos, is then taken in reverse order(!) and graphically laid spiralling clockwise and outward through the symbol.

II. C. Message Encodation
The high-level message encoding occurs in two steps. First, following a model introduced by SuperCode, the desired string of ASCII or 8-bit characters is translated into a simple ordered stream of bits. Then, second, that bit stream is laid into codewords of the appropriate size for error correction.
The first step involves taking the message characters in order and, using Table 3 with appropriate shifts and latches, translating them into 5 or 4, or occasionally 8, bit sequences, which taken MSB first form a data bit stream. Message encoding starts in Upper mode, and can be latched (permanently, via UL, LL, ML, PL, or DL) or shifted (for one character only, via US or PS) into the other modes as needed. In Upper, Lower, Mixed, and Punct each character, latch, or shift adds 5 bits to the string of message bits, while in Digit each character, latch, or shift adds just 4 bits to the string.
Binary Shift (BS) is a special case, shifting into a run-length-controlled string of literal 8-bit bytes. Following a BS is a 5-bit binary value: if non-zero, it encodes the number of bytes that follow, but if zero then the next 11 bits encode the number of bytes less 31. Binary Shift can thus encode either isolated extended ASCII or control characters or long strings of byte data, possibly filling the whole symbol. At the end of the byte string, encoding returns to the mode from which BS was invoked.
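The length field that follows a Binary Shift can be illustrated as below. This sketch covers only the bits after the BS character itself (whose code comes from Table 3, not reproduced here); both function names are hypothetical.

```python
def binary_shift_payload(data):
    """Bits following BS: length field, then the bytes MSB first."""
    n = len(data)
    if n == 0:
        raise ValueError("a Binary Shift run needs at least one byte")
    if n <= 31:
        header = format(n, "05b")                      # short form: 5-bit count
    elif n - 31 < 2048:
        header = "00000" + format(n - 31, "011b")      # long form: zero, then 11 bits
    else:
        raise ValueError("run too long for the 11-bit length field")
    return header + "".join(format(b, "08b") for b in data)


def read_binary_shift(bits):
    """Inverse of the above: returns (bytes, remaining bits)."""
    n = int(bits[:5], 2)
    pos = 5
    if n == 0:
        n = int(bits[5:16], 2) + 31
        pos = 16
    end = pos + 8 * n
    data = bytes(int(bits[i:i + 8], 2) for i in range(pos, end, 8))
    return data, bits[end:]


print(binary_shift_payload(b"\xff"))
```

A single byte therefore costs 13 bits after the BS character, while a run of 32 or more bytes pays 16 bits of length overhead.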
The character designated «FLG(n)» in the Punct column of Table 3 is a special in-place flag used to represent a variety of non-data characters supported by many standard symbologies. In the bit stream, the FLG(n) value is followed by 3 extra bits encoding its argument «n» in binary; thus n is valued between 0 and 7.
FLG(0) represents «FNC1», a non-data flag deriving from Code 128. When FNC1 occurs in the first data position, it signals the use of EAN/UPC data formatting rules using Application Identifiers and causes bit 0 in the symbology identifier modifier value to be set. When FNC1 occurs in the second data position, or in the third position following two digits, then it signals the use of some other industry-specific format, identified by the preceding data, and causes bit 1 of the modifier value to be set. When FNC1 occurs at any later location in the data, it serves as a field separator and causes an ASCII 29 (GS) to be inserted in its place in the output data string.
FLG(1) through FLG(6) are assigned to represent the Extended Channel Escape character ECE, which in the output data string is represented by «\nnnnnn», a backslash followed by a 6-digit number. The presence of an ECE anywhere in a symbol causes all backslashes throughout the encoded data to be doubled in the output string and also causes bit 2 of the symbology identifier modifier value to be set. The argument n in this case indicates how many of the 6 digits are explicitly encoded, using Digit mode, in the symbol, leading zeros being assumed for the rest. ECE #000123, for example, is encoded FLG(3) 1 2 3, after which encoding reverts to the mode FLG(n) was invoked from.
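The ECE rules can be sketched symbolically, without committing to the Table 3 bit codes. In this illustration the token strings, the function names, and the convention of passing ECE numbers as integers among text segments are all assumptions made for the example.

```python
def ece_tokens(channel):
    """Symbolic encoding of an ECE: FLG(n) followed by the n explicit digits."""
    if not 0 <= channel <= 999999:
        raise ValueError("ECE numbers have at most 6 digits")
    digits = str(channel)            # leading zeros are implied, not encoded
    return ["FLG(%d)" % len(digits)] + list(digits)


def render_output(segments):
    """Build the output string from text segments (str) and ECE numbers (int)."""
    has_ece = any(isinstance(s, int) for s in segments)
    out = []
    for s in segments:
        if isinstance(s, int):
            out.append("\\%06d" % s)                 # backslash plus 6 digits
        elif has_ece:
            out.append(s.replace("\\", "\\\\"))      # double data backslashes
        else:
            out.append(s)
    return "".join(out)


print(ece_tokens(123), render_output([123, "a\\b"]))
```

Backslashes in the data are doubled only when at least one ECE is present, which lets a receiver tell an escape sequence from a literal backslash.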
FLG(7) is presently unassigned.
In the second step of message encoding, the resulting stream of message bits is laid into the sequence of B-bit (B = 6, 8, 10, or 12) message codewords in a generally direct fashion, starting at the most significant bit of the first codeword, with two key exceptions: whenever the first B-1 bits placed in a codeword are all «0»s, then a dummy «1» is inserted into that codeword's LSB and the following message bit starts off the next codeword. Similarly a message codeword that starts with B-1 «1»s has a dummy «0» inserted into its LSB. In this manner, message codewords of all «0»s or all «1»s are illegal and can be deemed erasures during the decoding process.
In the end, the character and byte boundaries in the original message have no necessary relationship with the codeword boundaries. Up to B-1 bits may remain unfilled in the final message codeword, and they are to be padded out with «1»s (and possibly a final dummy «0» if necessary) to eliminate any ambiguity.
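The two stuffing exceptions and the final padding rule can be sketched as follows. The functions `stuff_bits` and `unstuff_bits` are illustrative names, and the bit stream is held as a string of «0»/«1» characters purely for readability.

```python
def stuff_bits(message_bits, b):
    """Lay a bit string into b-bit codewords, avoiding all-0 and all-1 words."""
    codewords = []
    current = ""
    for bit in message_bits:
        current += bit
        if len(current) == b - 1:
            if current == "0" * (b - 1):
                current += "1"       # dummy 1 in the LSB
            elif current == "1" * (b - 1):
                current += "0"       # dummy 0 in the LSB
        if len(current) == b:
            codewords.append(int(current, 2))
            current = ""
    if current:
        current = current.ljust(b, "1")          # pad the final word with 1s
        if current == "1" * b:
            current = current[:-1] + "0"         # final dummy 0 if needed
        codewords.append(int(current, 2))
    return codewords


def unstuff_bits(codewords, b):
    """Recover the bit string (including any trailing pad bits)."""
    out = []
    for cw in codewords:
        bits = format(cw, "0%db" % b)
        head = bits[:-1]
        if head in ("0" * (b - 1), "1" * (b - 1)):
            out.append(head)         # the LSB was a dummy bit
        else:
            out.append(bits)
    return "".join(out)


print(stuff_bits("000000", 6))
```

Because pad bits are not distinguishable from data at this level, the decoder sketch returns the message followed by whatever padding was added; the message-level rules decide where the data ends.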

III. SMALL AZTEC CODE
Small Aztec Code is a special space-saving version of Aztec Code for encoding shorter messages (up to 95 characters). Space is saved by removing one set of rings from the finder pattern, eliminating the reference grid, and using a shorter mode message which limits the symbols to four data layers; otherwise, the encoding rules are generally the same as for standard Aztec Code. Small Aztec symbols are completely decoder compatible with standard Aztec symbols so the two types can be intermixed in applications.
Figure 5 shows two representative Small Aztec symbols, one on the left encoding 12 digits of data with 47% error correction in a 15x15 square module area and one on the right encoding 55 characters of data with 24% error correction in a 23x23 square module area. As with standard Aztec symbols, no quiet zone is required.
Table 4 lists the four possible sizes of Small Aztec symbol and their respective bit capacities. As with standard Aztec Code, the one and two layer symbols employ 6-bit codewords for error correction while the three and four layer symbols employ 8-bit codewords. Although the 4-layer symbol holds 76 codewords, at most 64 of those can be data words, thus Small Aztec symbols are limited to encoding 512 bits of data (typically about 95 characters or 120 digits).
Table 4 - The Size and Bit Capacity of Small Aztec Symbols
The following subsections provide details of those aspects where Small Aztec symbols differ from standard Aztec symbols.

III. A. GRAPHICAL SMALL AZTEC SYMBOL STRUCTURE
Of the fixed pattern structures (refer to Figure 2), Small Aztec's finder is just a 5-level bullseye pattern lacking the outermost light and dark rings, the orientation bits are unchanged except that they're moved inward to touch the smaller finder, and there is no reference grid.
Of the variable structures (refer to Figure 3), Small Aztec's mode message is reduced to the 28 bits in the layer of modules touching the smaller finder and skipping over the orientation bits, again wrapping in a clockwise direction starting in the upper left corner. This bit string contains 8 mode bits and 20 Reed-Solomon check bits. Then, up to four layers of N x 2 data words wrap around the symbol, spiralling outward in exactly the same fashion as in standard Aztec symbols (refer to Figure 4) but not having to skip any locations for a reference grid.

III. B. MATHEMATICAL SMALL AZTEC SYMBOL STRUCTURE
1. The Small Aztec Mode Message
Small Aztec's mode message encodes 8 significant bits. The first two encode the size of the symbol, actually the number of data layers L less one, in binary. The next 6 bits encode the length of the message, actually the number of message codewords D less one. D cannot exceed 64; though a 4-layer Small Aztec symbol holds 76 codewords, 12 at least must be checkwords.
Special 1-layer Small Aztec symbols for menuing purposes can be encoded by artificially setting the MSB of the message length, an otherwise improper setting because that symbol's total capacity is below 32 codewords.
The 8 mode bits, taken as two 4-bit words, are augmented by five 4-bit check words (20 check bits) using Reed-Solomon encoding over GF(16). The generator polynomial of (x - 2^1)...(x - 2^5) is x^5 + 11x^4 + 4x^3 + 6x^2 + 2x + 1. These bits wrap clockwise around the finder in four 7-bit segments starting in the upper left corner.
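On the decoding side, a reader can check a 28-bit Small Aztec mode message by evaluating it at the generator's roots before trusting the layer and length fields. The sketch below assumes five check words with roots 2^1 through 2^5 in GF(16) (consistent with 20 check bits), performs detection only (no correction), and ignores the menu-symbol flag; all names are hypothetical.

```python
# GF(16) antilog/log tables for the modulus x^4 + x + 1 (19 decimal), alpha = 2.
EXP = [1] * 32
for i in range(1, 32):
    v = EXP[i - 1] << 1
    EXP[i] = v ^ 19 if v & 16 else v
LOG = {EXP[i]: i for i in range(15)}


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def syndromes(words, num_check):
    """Evaluate the codeword polynomial at 2^1 .. 2^num_check (Horner's rule)."""
    result = []
    for r in range(1, num_check + 1):
        acc = 0
        for w in words:
            acc = gf_mul(acc, EXP[r]) ^ w
        result.append(acc)
    return result


def parse_small_mode_message(bits):
    """Return (layers, data_words) from a clean 28-bit mode message string."""
    if len(bits) != 28 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 28 binary digits")
    words = [int(bits[i:i + 4], 2) for i in range(0, 28, 4)]
    if any(syndromes(words, 5)):
        raise ValueError("mode message is damaged; correction is not shown here")
    mode = (words[0] << 4) | words[1]
    return (mode >> 6) + 1, (mode & 0x3F) + 1


print(parse_small_mode_message("0000000110110100011000100001"))
```

A full decoder would go on to correct up to two damaged words rather than rejecting the message, since five check words allow that much correction.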
2. The Small Aztec Data Message
The D codewords encoding the message are augmented by Cw-D Reed-Solomon checkwords in exactly the same manner as with standard Aztec Code, where Cw is given in Table 4 and the Galois Field is defined in Table 2. These codewords, again taken in reverse order, form an outward clockwise spiral in the data fields.
III. C. SMALL AZTEC MESSAGE ENCODATION
High-level message encodation for Small Aztec symbols follows exactly the rules for standard Aztec Code; see Section II.C. Binary Shift, FNC1, and Extended Channel Escape sequences are all supportable, but the total encoded message can never exceed 512 bits.

IV. Ordered Append Protocol
It is occasionally desirable to distribute a data message across several Aztec Code symbols, either to fit a non-square area or to handle larger messages than are practical in a single symbol. While this can be achieved entirely by a user-defined header structure, the following is a standard method that shall be supported by Aztec Code readers.
All symbols which are part of an ordered append sequence shall start message encoding with the sequence:
- ML UL [space I...D space] I N ...
which consists of the ordered append flags ML/UL, an optional space-delimited message ID field, and a 2-character I-of-N sequencing field. This header is then followed directly by the data segment to be encoded.
An Aztec Code reader, upon detecting an UpperLatch ahead of any stored data, shall either (1) signal that this is an ordered append symbol in the Symbology Identifier modifier value, then transmit the header and message segment normally, or (2) interpret the header and store the message segment for later transmission as part of the complete assembled message.
The optional message ID field is any number of data characters starting and ending with a space character, and shall be the same for all symbols which comprise the same message. Though the I...D string may be made up of any characters (except additional spaces!), the most efficient encoding will result if it is a string of uppercase letters.
The sequencing field is two uppercase letters, the first encoding which symbol this is in a sequence and the second encoding the total number of symbols in the sequence, where «A» = 1, «B» = 2, and so forth. For example, a four symbol sequence would have sequencing fields of «AD», «BD», «CD», and «DD». Up to 26 symbols can be linked together using this ordered append protocol.
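A reader or encoder might handle the header characters that follow the ML/UL flags roughly as sketched below. The function names are illustrative, and the sketch works on already-decoded characters rather than on the underlying bit stream.

```python
import string


def sequencing_field(index, total):
    """Two uppercase letters: position in the sequence, then sequence length."""
    if not 1 <= index <= total <= 26:
        raise ValueError("need 1 <= index <= total <= 26")
    letters = string.ascii_uppercase
    return letters[index - 1] + letters[total - 1]


def append_header_text(index, total, message_id=""):
    """Header characters after ML UL: optional ' ID ' field, then I and N."""
    if " " in message_id:
        raise ValueError("the message ID may not contain spaces")
    id_field = " %s " % message_id if message_id else ""
    return id_field + sequencing_field(index, total)


def parse_header_text(text):
    """Split decoded text into (message_id, index, total, data segment)."""
    message_id = ""
    if text.startswith(" "):
        end = text.index(" ", 1)         # closing space of the ID field
        message_id = text[1:end]
        text = text[end + 1:]
    index = ord(text[0]) - ord("A") + 1
    total = ord(text[1]) - ord("A") + 1
    return message_id, index, total, text[2:]


print([sequencing_field(i, 4) for i in range(1, 5)])
```

Collecting segments by message ID and index, then concatenating once all N have arrived, corresponds to option (2) of the reader behaviour described above.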
