Standard Compression Scheme for Unicode

The Standard Compression Scheme for Unicode (SCSU) is a Unicode Technical Standard for reducing the number of bytes needed to represent text, especially text that draws most of its characters from a small number of Unicode blocks. It does so by dynamically mapping the byte values 128-255 to a window of 128 consecutive code points. Since most alphabets fit within 128 contiguous Unicode code points, this allows many text files to be encoded at 1 byte per character (plus overhead). SCSU can also switch to UTF-16 internally to handle text in non-alphabetic scripts.
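The window mapping described above can be illustrated with a minimal sketch. This is a toy, not a conforming SCSU encoder: it assumes a single window that is already selected, the function names `encode_with_window` and `decode_with_window` are hypothetical, and it ignores the tag bytes a real encoder emits to select or redefine windows (the "overhead" mentioned above).

```python
def encode_with_window(text: str, window_start: int) -> bytes:
    """Toy single-window encoder (illustration only, not real SCSU)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII is passed through as one byte. (Assumption: real SCSU
            # reserves some low control values as tags; ignored here.)
            out.append(cp)
        elif window_start <= cp < window_start + 128:
            # Code points inside the window map onto byte values 128-255.
            out.append(0x80 + (cp - window_start))
        else:
            raise ValueError(f"U+{cp:04X} is outside the active window")
    return bytes(out)


def decode_with_window(data: bytes, window_start: int) -> str:
    """Inverse of encode_with_window for the same window."""
    return "".join(
        chr(b) if b < 0x80 else chr(window_start + (b - 0x80)) for b in data
    )


if __name__ == "__main__":
    sample = "Привет, мир"
    cyrillic = 0x0400  # start of the Cyrillic block
    packed = encode_with_window(sample, cyrillic)
    print(len(packed), len(sample.encode("utf-8")), len(sample.encode("utf-16-be")))
    print(decode_with_window(packed, cyrillic) == sample)
```

With the window placed at the start of the Cyrillic block, each character of the sample takes one byte, compared with two bytes per Cyrillic letter in UTF-8 or UTF-16.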

SCSU has not been a resounding success. Few applications handle enough Unicode text to justify using a poorly supported compression scheme. Treated purely as a compression format, it is inferior to most commonly used compression programs for texts longer than a few kilobytes. It can be used as a text encoding, but it is very hard to handle internally, and its size advantage over UTF-16 or UTF-8 shrinks once the text is passed through an external compressor, dramatically so in the case of bzip2 and other modern compression schemes. SCSU does have one advantage: it can compress texts that are only a few characters long, whereas most full-scale compressors need a few kilobytes of data to overcome their own overhead.
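The point about short texts can be checked with a small sketch using Python's standard `zlib` and `bz2` modules. The helper name `compressed_sizes` is hypothetical, and the sample string is only an illustration; the sketch measures general-purpose compressors on UTF-8 input and does not implement SCSU itself.

```python
import bz2
import zlib


def compressed_sizes(text: str) -> dict:
    """Return the byte sizes of a text as UTF-8 and after general-purpose compression."""
    raw = text.encode("utf-8")
    return {
        "utf-8": len(raw),
        "zlib": len(zlib.compress(raw)),
        "bz2": len(bz2.compress(raw)),
    }


if __name__ == "__main__":
    # For a text this short, the fixed headers and checksums of each format
    # outweigh any savings, so the "compressed" forms come out larger.
    for name, size in compressed_sizes("Привет, мир").items():
        print(f"{name}: {size} bytes")
```

On an input of only a few characters, both compressors produce output larger than the plain UTF-8 bytes, which is the overhead the paragraph above refers to.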

Reuters, the organization that floated the first draft of SCSU, is believed to use SCSU internally.

Last updated: 01-04-2007 01:18:57
The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.