Uuencode
Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, for encoding binary data for transmission over the uucp mail system.

The name "uuencoding" is derived from "Unix-to-Unix encoding". Since uucp converted characters between various computers' character sets, uuencode was used to convert the data to fairly common characters that were unlikely to be "translated" and thereby destroy the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode and uudecode became popular for sending binary files by e-mail and for posting to Usenet newsgroups.

It has now been largely replaced by MIME and yEnc. With MIME, files that might have been uuencoded are transferred with base64 encoding.

Encoded format

A uuencoded file starts with a header line of the form:
begin <mode> <file><newline>

<mode> is the file's Unix read/write/execute file permissions as three octal digits. This is typically only significant to Unix and Linux based operating systems. For reference, a typical value is 0644 (the leading 0 marks the number as octal, 6 => user can read and write, 4 => group can read, 4 => others can read) or 0744 (the same, except 7 => user can read, write and execute).

<file> is the file name to be used when recreating the binary data.

<newline> signifies a newline character.
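
The header fields described above can be pulled apart with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name parse_header is hypothetical, and it assumes the header has the usual "begin", mode, file name layout with single spaces between the fields.

```python
import stat

def parse_header(line: str) -> tuple[int, str]:
    """Split a uuencode header line into (mode, file name).

    Assumes the form "begin <mode> <file>"; the file name may contain spaces.
    """
    keyword, mode_text, filename = line.rstrip("\r\n").split(" ", 2)
    if keyword != "begin":
        raise ValueError("not a uuencode header line")
    # The mode is written in octal, so parse it with base 8.
    return int(mode_text, 8), filename

mode, name = parse_header("begin 644 cat.txt\n")
print(oct(mode), name)                      # 0o644 cat.txt
# Render the permission bits the way "ls -l" would for a regular file.
print(stat.filemode(stat.S_IFREG | mode))   # -rw-r--r--
```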

Each data line uses the format:
<length character><formatted characters><newline>

<length character> is a character indicating the number of data bytes encoded on that line. It is an ASCII character determined by adding 32 to the actual byte count, with the sole exception of a grave accent "`" (ASCII code 96) signifying zero bytes. All data lines except possibly the last (when the data length is not divisible by 45) carry 45 bytes of data, encoded as 60 characters. Therefore, the vast majority of length characters are "M" (32 + 45 = ASCII code 77).

<formatted characters> are the encoded characters. See Formatting Mechanism for more details on the actual implementation.

<newline> ends the line.
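
The length character rule can be written down directly. The following is a small sketch with hypothetical helper names (length_char, byte_count); it assumes the 45-byte line limit described above.

```python
def length_char(count: int) -> str:
    """Return the length character for a line carrying `count` data bytes."""
    if not 0 <= count <= 45:
        raise ValueError("a uuencoded line carries 0 to 45 bytes")
    # Zero bytes is written as a grave accent instead of a space.
    return "`" if count == 0 else chr(32 + count)

def byte_count(char: str) -> int:
    """Inverse of length_char: recover the byte count from the character."""
    # Masking with 0x3F maps the grave accent (96 - 32 = 64) back to 0.
    return (ord(char) - 32) & 0x3F

print(length_char(45))   # M
print(length_char(3))    # #
print(byte_count("`"))   # 0
```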

The file ends with two lines:
`
end

The second-to-last line is itself a length character: the grave accent, signifying a line of zero bytes.

As a complete file, the uuencoded output for a plain text file named cat.txt containing only the characters "Cat" would be:
begin 644 cat.txt
#0V%T
`
end

The begin line is a standard uuencode header; the "#" (32 + 3 = ASCII code 35) indicates that its line encodes three bytes; the last two lines appear at the end of all uuencoded files.
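
The complete file above can be reproduced with Python's standard binascii module, whose b2a_uu function encodes one line of at most 45 bytes. This is a sketch under stated assumptions: the wrapper uuencode_file is a hypothetical name, and the default mode of 644 is chosen only to match the example.

```python
import binascii

def uuencode_file(data: bytes, filename: str, mode: int = 0o644) -> str:
    """Build a complete uuencoded file: header, data lines, and trailer."""
    lines = [f"begin {mode:o} {filename}"]
    # b2a_uu handles one line (up to 45 bytes) at a time and appends "\n".
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        lines.append(binascii.b2a_uu(chunk).decode("ascii").rstrip("\n"))
    lines.append("`")    # zero-length line
    lines.append("end")
    return "\n".join(lines) + "\n"

print(uuencode_file(b"Cat", "cat.txt"), end="")
# begin 644 cat.txt
# #0V%T
# `
# end
```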

Formatting Mechanism

The mechanism of uuencoding repeats the following for every 3 bytes:
  1. Start with 3 bytes from the source.
  2. Convert to 24 bits.
  3. Split into four 6-bit groupings: bits (00-05), (06-11), (12-17), (18-23).
  4. Evaluate the decimal equivalent of each of the four 6-bit groupings. 6 bits allow a range of 0 to 63.
  5. Add 32 to each of the four values. The possible results therefore lie between 32 (" ", space) and 95 ("_", underscore). 96 ("`", grave accent) as the "special character" is a logical extension of this range.
  6. Output the ASCII equivalent of these numbers.
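
The six steps can be sketched in Python as follows. The function name encode_group is hypothetical, and the sketch deliberately handles exactly one 3-byte group so that each step stays visible.

```python
def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 uuencoded characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Steps 1-2: view the 3 bytes as one 24-bit number.
    bits = int.from_bytes(three_bytes, "big")
    # Steps 3-4: split into four 6-bit values, most significant first.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Steps 5-6: add 32 and output the corresponding ASCII characters.
    return "".join(chr(32 + value) for value in sextets)

print(encode_group(b"Cat"))   # 0V%T
```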


Note that if the source length is not divisible by 3, the last 4-character section will contain padding bytes to make it cleanly divisible. These padding bytes are not counted in the line's <length character>, so that the decoder does not append unwanted null characters to the file.
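
A short sketch of how padding and the length character interact is given below. The name encode_line is hypothetical, and the sketch assumes zero bytes are used for padding and that a zero value is written as a space (some encoders write a grave accent instead).

```python
def encode_line(chunk: bytes) -> str:
    """Encode 1 to 45 bytes as one uuencoded line (without the newline)."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a data line carries 1 to 45 bytes")
    # Pad with zero bytes up to a multiple of 3 (assumed padding value).
    padded = chunk + b"\0" * (-len(chunk) % 3)
    body = []
    for i in range(0, len(padded), 3):
        bits = int.from_bytes(padded[i:i + 3], "big")
        body.extend(chr(32 + ((bits >> s) & 0x3F)) for s in (18, 12, 6, 0))
    # The length character counts only the real bytes, not the padding.
    return chr(32 + len(chunk)) + "".join(body)

print(repr(encode_line(b"Ca")))    # '"0V$ '
print(repr(encode_line(b"Cat")))   # '#0V%T'
```

In the first line the length character is a double quote (32 + 2 = ASCII code 34), which tells the decoder to keep only two of the three decoded bytes.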

Uudecoding is the reverse of the above: subtract 32 from each character's ASCII code, combine the four resulting 6-bit values into 24 bits, then output 3 bytes.
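
The decoding direction can be sketched the same way. The function name decode_line is hypothetical; as an added tolerance that is an assumption of this sketch, short lines are padded with spaces in case trailing spaces were stripped in transit.

```python
def decode_line(line: str) -> bytes:
    """Decode one uuencoded data line back into its original bytes."""
    line = line.rstrip("\r\n")
    if not line:
        return b""
    # The first character gives the number of real data bytes on the line.
    count = (ord(line[0]) - 32) & 0x3F
    body = line[1:]
    out = bytearray()
    for i in range(0, len(body), 4):
        group = body[i:i + 4].ljust(4)   # assumption: restore stripped spaces
        bits = 0
        for char in group:
            # Subtract 32; the mask also maps a grave accent back to 0.
            bits = (bits << 6) | ((ord(char) - 32) & 0x3F)
        out += bits.to_bytes(3, "big")
    # Drop the padding bytes that were added by the encoder.
    return bytes(out[:count])

print(decode_line("#0V%T"))   # b'Cat'
```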

The encoding process is demonstrated by this table, which shows the derivation of the above encoding for "Cat".
Original characters       C         a         t
Original ASCII, decimal   67        97        116
ASCII, binary             01000011  01100001  01110100
Regrouped into 6 bits     010000  110110  000101  110100
New decimal values        16      54      5       52
+32                       48      86      37      84
Uuencoded characters      0       V       %       T

Uuencode table

The following table maps the decimal value of each 6-bit field obtained during the conversion process to its corresponding ASCII output code and character.

Note that 96 ("`", grave accent) is a character that is seen in uuencoded files but is typically only used to signify a 0-length line, usually at the end of a file. It will never naturally occur in the actual converted data, since it is outside the range of 32 to 95. The sole exception is that some uuencoding programs use the grave accent instead of a space to signify padding bytes. The character used for the padding byte is not standardized, so either is a possibility.

Each column group lists: six-bit value (decimal), ASCII code, ASCII character.
00 32 SP   10 42 *   20 52 4   30 62 >   40 72 H   50 82 R   60 92 \
01 33 !   11 43 +   21 53 5   31 63 ?   41 73 I   51 83 S   61 93 ]
02 34 "   12 44 ,   22 54 6   32 64 @   42 74 J   52 84 T   62 94 ^
03 35 #   13 45 -   23 55 7   33 65 A   43 75 K   53 85 U   63 95 _
04 36 $   14 46 .   24 56 8   34 66 B   44 76 L   54 86 V
05 37 %   15 47 /   25 57 9   35 67 C   45 77 M   55 87 W
06 38 &   16 48 0   26 58 :   36 68 D   46 78 N   56 88 X
07 39 '   17 49 1   27 59 ;   37 69 E   47 79 O   57 89 Y
08 40 (   18 50 2   28 60 <   38 70 F   48 80 P   58 90 Z
09 41 )   19 51 3   29 61 =   39 71 G   49 81 Q   59 91 [
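
Because every entry is simply the six-bit value plus 32, the table can be regenerated rather than looked up. The sketch below uses a hypothetical function name, uuencode_table_rows, and reproduces the same row layout (values 0-9 in the first column group, 10-19 in the second, and so on).

```python
def uuencode_table_rows() -> list[str]:
    """Return the rows of the uuencode table as text lines."""
    rows = []
    for unit in range(10):
        cells = []
        # Each row holds the values unit, unit + 10, unit + 20, ... below 64.
        for value in range(unit, 64, 10):
            code = value + 32
            char = "SP" if code == 32 else chr(code)
            cells.append(f"{value:02d} {code} {char}")
        rows.append("   ".join(cells))
    return rows

for row in uuencode_table_rows():
    print(row)
```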


Forks (File, Resource)

Unix traditionally has a single fork where file data is stored. However, some file systems support multiple forks associated with a single file. For example, classic Mac OS HFS supported a data fork and a resource fork. Mac OS HFS+ supports multiple forks, as does Microsoft Windows NTFS with its alternate data streams. Most uuencoding tools handle only the data in the primary data fork, which can result in a loss of information when encoding and decoding (for example, Windows NTFS file comments are kept in a different fork). Some tools (like the classic Mac OS application UUTool) solved the problem by concatenating the different forks into one file and differentiating them by file name.

Relation to Xxencode and Base64

Despite its limited range of characters, uuencoded data is sometimes mangled on passage through certain computers using non-ASCII character sets such as EBCDIC. One attempt to fix the problem was the Xxencode format, which used only alphanumeric characters and the plus and minus symbols. More common today is the Base64 format, which is based on the same concept of a mostly alphanumeric alphabet, as opposed to ASCII 32-95. All three formats use 6 bits (64 different characters) to represent their input data.

Base64 can also be generated by the uuencode program and is similar in format, with the exception of the actual character translation:

The header is changed to
begin-base64 <mode> <file>
the trailer becomes
====
and lines between are encoded with characters chosen from
ABCDEFGHIJKLMNOP
QRSTUVWXYZabcdef
ghijklmnopqrstuv
wxyz0123456789+/
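
Since both encodings carry the same 6-bit values and differ only in the output alphabet, a uuencoded body can be turned into Base64 by character translation. The sketch below illustrates this with hypothetical names (UU_ALPHABET, B64_ALPHABET, uu_body_to_base64). It assumes the input covers whole 3-byte groups, because the two formats mark a short final group differently (a length character versus "=" padding).

```python
import base64

# The two 64-character alphabets, indexed by 6-bit value.
UU_ALPHABET = "".join(chr(32 + value) for value in range(64))
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
UU_TO_B64 = str.maketrans(UU_ALPHABET, B64_ALPHABET)

def uu_body_to_base64(body: str) -> str:
    """Translate uuencoded characters (no length character) to Base64."""
    return body.translate(UU_TO_B64)

converted = uu_body_to_base64("0V%T")
print(converted)                     # Q2F0
print(base64.b64decode(converted))   # b'Cat'
```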

Disadvantages

  • Uuencoding turns every 3 input bytes into 4 output characters and also adds the begin and end lines, the file name, and a length character and newline on every line. This adds approximately 33-40% data overhead compared to the source alone.
  • Newer alternatives exist, such as yEnc and MIME.
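
The overhead figure can be checked with a little arithmetic. The sketch below uses a hypothetical function, uuencoded_size, and assumes single-character newlines, a three-digit mode, and 45-byte data lines.

```python
import math

def uuencoded_size(n_bytes: int, filename: str = "file.bin") -> int:
    """Estimate the size in characters of a uuencoded file."""
    size = len(f"begin 644 {filename}\n")
    full_lines, rest = divmod(n_bytes, 45)
    # A full line is 1 length character + 60 data characters + newline.
    size += full_lines * (1 + 60 + 1)
    if rest:
        # A short last line still rounds up to whole 4-character groups.
        size += 1 + 4 * math.ceil(rest / 3) + 1
    # Trailer: the zero-length line and the "end" line.
    size += len("`\n") + len("end\n")
    return size

n = 45_000
overhead = uuencoded_size(n) / n - 1
print(f"{overhead:.1%}")   # 37.8%
```

For large files the fixed header and trailer become negligible and the overhead approaches 62/45 - 1, or roughly 37.8%; the 4-for-3 expansion alone accounts for 33.3% of it.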

Support in Perl

The Perl language supports uuencoding natively using the pack and unpack operators with the format string "u", for example:
perl -e 'print pack("u","Cat")'
#0V%T
Decoding base64 with unpack can likewise be accomplished by translating the characters:
perl -e ' $a="Q2F0"; $a=~tr#A-Za-z0-9+/\.\_##cd; # remove non-base64 chars
$a=~tr#A-Za-z0-9+/# -_#; # translate sets
print unpack("u",pack("C",32+int(length($1)*6/8)) . $1) while($a=~s/(.{60}|.+)//); '
Cat

External links

  • UUDeview - open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
  • UUENCODE-UUDECODE - open-source program to encode/decode created by Clem "Grandad" Dye
  • StUU - Open Source fast UUDecoder for Macintosh by Stuart Cheshire
  • UUENCODE-UUDECODE - Free on-line UUEncoder and UUDecoder
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 