Essential Programming Fundamentals | Principles of Computer Organization (06): Character Sets and Encodings

Computer fundamentals have long been a sore point for university students who did not major in computer science. Many students who did receive a technical education also notice gaps in their knowledge once they start working and want to go back and shore up the basics. The material is sprawling, and both textbooks and university courses tend to be somewhat detached from day-to-day work. Because there is so much of it, learning the fundamentals from scratch, or even reviewing them, takes a great deal of time.

With that in mind, this series of articles aims to help you fill in the essential programming fundamentals more quickly. It covers three core areas: principles of computer organization, operating systems, and computer networks, which are the most important topics in a university computer science curriculum. The articles distill and summarize this material and leave out what a working programmer does not need.

The goals are to:

    Help you build a structured picture of how computers work

    Help you understand the underlying principles of computers

    Help you learn from the excellent designs found in practical work

This article, part of the principles of computer organization series, covers character sets and encodings.


History of character encoding

ASCII code

Most of us have come across ASCII at some point in our studies or work.

ASCII stands for American Standard Code for Information Interchange, and it is the encoding we encounter most often. How did it come about?

Inside a computer, all data is stored and processed as binary numbers, because the hardware represents 1 and 0 as high and low voltage levels. The 52 letters (a, b, c, d, and so on, counting both lowercase and uppercase), the digits 0, 1, and so on, and common symbols such as *, #, and @ must therefore also be stored as binary numbers. Which binary number stands for which symbol is a matter of convention, and anyone could define their own; such a convention is called an encoding. If Bill Gates wanted to represent the character a as 0001 while Jobs wanted to use 0010, the two could not understand each other when they exchanged data. For computers to communicate without confusion, everyone must use the same encoding rules. The relevant US standards body therefore introduced ASCII, which fixes the binary number used for each of the common symbols above.
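The idea that an encoding is only a shared convention can be sketched in a few lines of Python. The two tables below are hypothetical: they reuse the made-up 4-bit codes for a from the example above, and the codes for b are an extra assumption added for illustration. Neither is a real standard.

```python
# Two hypothetical private conventions (illustrative only, not real standards).
gates_table = {"a": "0001", "b": "0010"}
jobs_table = {"a": "0010", "b": "0011"}


def encode(text, table):
    """Turn each character into its bit pattern under the given table."""
    return [table[ch] for ch in text]


def decode(bits, table):
    """Look each bit pattern up in the reverse of the given table."""
    reverse = {code: ch for ch, code in table.items()}
    # Unknown bit patterns become "?" instead of raising an error.
    return "".join(reverse.get(code, "?") for code in bits)


message = encode("ab", gates_table)

same_convention = decode(message, gates_table)   # sender and receiver agree
other_convention = decode(message, jobs_table)   # receiver uses a different table

print(same_convention)
print(other_convention)
```

Decoding with the sender's own table recovers the text, while decoding with the other table does not, which is the problem a common standard removes.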

Standard ASCII, also called basic ASCII, uses 7 bits to represent all the uppercase and lowercase letters, the digits 0 through 9, punctuation marks, and the special control characters used in American English. It defines 128 (2 to the power of 7) characters in total.
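The arithmetic behind the 128 characters can be checked directly. The following is a small sketch; the split into control and printable codes follows the usual convention that codes 0 to 31 and 127 are control characters.

```python
# 7 bits give 2**7 distinct bit patterns, numbered 0 through 127.
total = 2 ** 7
codes = range(total)

# Non-printing control codes: 0-31 plus 127 (DEL).
control = [c for c in codes if c < 32 or c == 127]
# Printable codes: space, digits, letters, punctuation.
printable = [c for c in codes if 32 <= c < 127]

letters = [chr(c) for c in printable if chr(c).isalpha()]
digits = [chr(c) for c in printable if chr(c).isdigit()]

print(total, len(control), len(printable), len(letters), len(digits))
```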

Let's get a feel for it by looking at part of the ASCII table.

The table lists the ASCII codes of common characters. For example, the letter a is represented as 01100001, and the character 1 (the character, not the number) is represented as 00110001, and so on.
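Rows like these can be reproduced with Python's built-in `ord` and `format`. This is a minimal sketch; `ascii_row` is a helper name made up for this example, and the binary strings are padded to 8 digits to match the way the table shows them.

```python
def ascii_row(ch):
    """Return (character, decimal code, 8-digit binary string) for an ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return ch, code, format(code, "08b")


# Print a few rows: letters, a digit character, and some symbols.
for ch in "aA1*#@":
    print(ascii_row(ch))
```

Note that the character 1 has code 49, which is why it must not be confused with the number 1.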

In the early days, ASCII was enough for the specific group of people who used computers. As computers developed and became widespread, however, ASCII could no longer meet people's needs: some mathematical symbols, and the symbols used in some countries, could not be represented. ASCII was therefore extended to use 8 bits per character. This is extended ASCII, which can represent 256 characters.

Extended ASCII code

The figure below shows the extended ASCII table.

It includes common mathematical operators, accented letters and other symbols used in European languages, and table-drawing symbols. Extended ASCII greatly supplemented the original code table, so the content a computer could express became increasingly varied.
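As a rough illustration, the sketch below assumes code page 437 (Python's `cp437` codec), one widely used 8-bit extension of ASCII. The table in the figure may be a different extension, so the specific code page is an assumption made for this example.

```python
# An accented letter, two math symbols, and a box-drawing line:
# U+00E9 (e with acute), U+00B1 (plus-minus), U+221A (square root), U+2500 (horizontal line).
samples = "\u00e9\u00b1\u221a\u2500"

# Standard 7-bit ASCII has no code for any of these.
not_in_ascii = []
for ch in samples:
    try:
        ch.encode("ascii")
    except UnicodeEncodeError:
        not_in_ascii.append(ch)

# In code page 437 (assumed here), each one fits in a single byte
# whose highest bit is set, i.e. a value from 128 to 255.
extended = samples.encode("cp437")

print(len(not_in_ascii))
print([format(b, "08b") for b in extended])
```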

Internationalization of character encoding

As computers developed further, more and more countries began using them, and the demands on character sets kept growing. The languages of Europe, Central Asia, East Asia, and Latin America are rich and varied, their writing systems differ from one another, and they are not limited to combinations of a few letters. Chinese, Korean, and Japanese are among the most complex. The ASCII table cannot express these languages, so new character sets were urgently needed: the internationalized character sets.

Chinese code sets


GB2312 is the national standard for Chinese character encoding, drawn up in 1980. It was the earliest reasonably comprehensive Chinese character set. It encodes 7,445 characters in total, 6,763 Chinese characters and 682 other characters, and each character occupies two bytes.
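The two-byte size is easy to observe with Python's built-in `gb2312` codec. This is only a sketch; the sample characters are arbitrary choices made for this example.

```python
hanzi = "\u4e2d"         # the Chinese character for "middle" (zhong)
punctuation = "\u3002"   # the Chinese full stop, one of the non-Chinese symbols

hanzi_bytes = hanzi.encode("gb2312")
punct_bytes = punctuation.encode("gb2312")

# Each GB2312 character becomes exactly two bytes.
print(hanzi_bytes.hex(), len(hanzi_bytes))
print(punct_bytes.hex(), len(punct_bytes))

# With Python's codec, plain ASCII characters still take a single byte,
# so text that mixes ASCII and Chinese stays compatible with ASCII.
ascii_bytes = "a".encode("gb2312")
print(len(ascii_bytes))
```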


Because GB2312 did not meet international standards, Chinese scientists introduced a second, more complete character set, GBK, in 1995. GBK is backward compatible with GB2312 and supports the international ISO standard. It contains 21,003 Chinese characters and supports all CJK (Chinese, Japanese, and Korean) characters.
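Both properties, backward compatibility and wider coverage, can be seen with Python's `gb2312` and `gbk` codecs. The sketch below uses the simplified and traditional forms of the character for "dragon" as examples chosen for this illustration.

```python
simplified = "\u9f99"    # simplified "dragon", present in GB2312
traditional = "\u9f8d"   # traditional "dragon", absent from GB2312

# Backward compatibility: a GB2312 character keeps the same bytes in GBK.
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")

# Wider coverage: the traditional form cannot be encoded in GB2312 ...
try:
    traditional.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

# ... but GBK has a two-byte code for it.
gbk_bytes = traditional.encode("gbk")

print(same_bytes, in_gb2312, len(gbk_bytes))
```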

GB2312 and GBK are both fairly complete character sets, but they are local encodings. They work fine inside China and cause problems across borders. Suppose a Chinese developer builds a website and a friend abroad visits it. If the GB2312 or GBK character set is not installed on the visitor's machine, the page shows up as garbled text. We therefore need a single, global encoding standard.
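Garbled text of this kind can be reproduced by decoding bytes with the wrong encoding. In the sketch below, `latin-1` stands in for whatever encoding the visitor's software falls back to; that choice is an assumption made for illustration.

```python
# The site sends the two characters for "Chinese (language)" encoded as GBK.
original = "\u4e2d\u6587"
page_bytes = original.encode("gbk")

# The visitor's software assumes a different encoding (latin-1 here)
# and turns every byte into an unrelated Western European letter.
garbled = page_bytes.decode("latin-1")

# With the right encoding the same bytes read back correctly.
correct = page_bytes.decode("gbk")

print(page_bytes.hex())
print(garbled)
print(correct)
```

The bytes never change; only the table used to interpret them does.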


Unicode came into being to cover all characters. It is a globally compatible character set that defines one worldwide set of symbols, so it can express all the text and characters in the world. Unicode unifies all languages into a single character set, which eliminates the garbled-text problem.
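In Unicode, every character has one number, its code point, regardless of language. The following minimal sketch uses Python's built-in `ord` and the standard `unicodedata` module; `code_point` is a helper name made up for this example.

```python
import unicodedata


def code_point(ch):
    """Format a character's Unicode code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# A Latin letter, a digit character, an accented letter, and a Chinese character.
for ch in ["a", "1", "\u00e9", "\u4e2d"]:
    print(ch, code_point(ch), unicodedata.name(ch))
```

The code points of a and 1 are the same numbers as their ASCII codes, so ASCII text keeps its meaning under Unicode.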

UTF-8, which we use most of the time, is one of the encoding rules for Unicode: it encodes Unicode characters as sequences of bytes. UTF-8 is the generally recommended encoding when writing code. Chinese-language Windows uses GBK by default, so the IDE used for programming is typically configured to use UTF-8.
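The difference between UTF-8 and GBK bytes, and the habit of naming the encoding explicitly instead of relying on the system default, can be sketched as follows. The file name `note.txt` is a placeholder chosen for this example.

```python
import os
import tempfile

text = "a\u4e2d"   # an ASCII letter followed by a Chinese character

utf8_bytes = text.encode("utf-8")   # ASCII stays 1 byte, the Chinese character takes 3
gbk_bytes = text.encode("gbk")      # ASCII stays 1 byte, the Chinese character takes 2

print(utf8_bytes.hex(), len(utf8_bytes))
print(gbk_bytes.hex(), len(gbk_bytes))

# Passing encoding="utf-8" explicitly makes the result independent of the
# operating system's default encoding.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")   # placeholder file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()

print(read_back == text)
```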

