Saturday, 08 May 2010 10:45
Introduction: What are Character Sets and Character Encoding?
As humans we perceive words as collections of letters, or as glyphs that represent whole words (e.g., Chinese). Computers store everything, including letters, as binary. Every character is represented by a numeric value that is constant within a given character set. The most familiar simple character set is ASCII, which covers the unaccented English letters, the digits, and common punctuation. Many other letters and symbols exist, of course, including letters with diacritic marks and accents (e.g. β, ©, Ü, ŏ). When text is saved to memory or storage, it is usually saved using one specific character encoding. Because the saved bytes are just binary data, text saved with one encoding can be read back and rendered using a different one. This is usually unintended, tends to happen by accident, and results in corrupted display of the text. However, because the byte values of some character encodings overlap, the problem is not always discernible.
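As a minimal sketch of the mismatch described above, the following Python snippet encodes text with one encoding and decodes it with another. The choice of UTF-8 and Windows-1252 (`cp1252`) is an assumption made for illustration; any pair of incompatible encodings behaves similarly.

```python
# Encode a character as UTF-8, then (wrongly) decode the bytes as Windows-1252.
original = "Ü"
raw_bytes = original.encode("utf-8")    # two bytes: 0xC3 0x9C

# Each byte is interpreted as a separate Windows-1252 character.
garbled = raw_bytes.decode("cp1252")

# Plain ASCII text uses the same byte values in both encodings,
# so the mistake goes unnoticed for it.
plain_roundtrip = "PHP".encode("utf-8").decode("cp1252")

print(repr(garbled))          # no longer the single character we started with
print(repr(plain_roundtrip))  # unchanged
```

The ASCII-only string survives because both encodings agree on those byte values, which is why this kind of corruption often shows up only once accented or non-Latin characters appear.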
ASCII is insufficient to cover all of the world's alphabets and glyphs. It is a 7-bit code whose characters are normally stored one per 8-bit byte, so every character has the same length, but it defines only 128 values, far too few for the characters needed by global languages. Unicode is a standard that defines a single character set spanning many languages, together with several encodings that allow multi-byte character representations. Of these encodings, the most commonly used is UTF-8, which is the most frequent Unicode character encoding on UNIX systems and the Internet.
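To illustrate the multi-byte representation, here is a small Python sketch that counts how many bytes UTF-8 uses for a few characters. The sample characters are chosen for illustration only.

```python
# UTF-8 is variable-length: ASCII characters take one byte,
# other characters take two to four bytes.
samples = ["A", "Ü", "β", "©", "中"]

byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, length in byte_lengths.items():
    # ord() gives the character's numeric value (code point) in Unicode.
    print(f"{ch!r}: code point {ord(ch)}, {length} byte(s) in UTF-8")
```

Because the ASCII characters keep their single-byte values in UTF-8, existing ASCII text is already valid UTF-8.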