
Encoding Decoded

     Although encoding wasn't the main topic of last month's issue, I received a lot of feedback about it, so this issue answers the most common questions.

Encoding and Search Rankings

     The most frequently asked question was: if you use UTF-8 (the encoding used by the major search engines Google, Yahoo! and MSN), will you get better search rankings? No. Sharing an encoding with a search engine has nothing to do with relevancy, so it won't improve your rankings. Declaring the correct "charset" for your web pages, however, can matter. Each character is stored as a sequence of bits, and the same sequence of bits can represent different characters in different encodings. If you don't tell search engines which encoding you are using, they may not be able to match search queries to your content properly, and your pages can show up for irrelevant searches or be missed by relevant ones. Declaring the charset doesn't earn you a bonus; it simply ensures that you and the search engine are reading the same text.
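
     To make the "same bits, different letters" point concrete, here is a minimal Python sketch. The byte values and the names RAW and decode_as are chosen only for illustration; they do not come from any particular web page or search engine.

```python
# Two bytes, as they might sit in a page that declares no charset.
RAW = bytes([0xC3, 0xA9])


def decode_as(raw: bytes, charset: str) -> str:
    """Interpret the same bytes under the given character encoding."""
    return raw.decode(charset)


if __name__ == "__main__":
    # The bytes never change; only the assumed encoding does.
    for charset in ("utf-8", "latin-1", "shift_jis"):
        print(charset, "->", decode_as(RAW, charset))
```

     Read as UTF-8, the two bytes form a single accented letter; read as Latin-1, they become two unrelated characters. A reader, human or machine, that guesses the wrong charset ends up seeing different text from what you wrote.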


     The next most common comment was that very few computers support UTF-8. In fact, it is hard to find one that doesn't these days. A computer that fails to display a UTF-8 encoded web page almost certainly still supports UTF-8; most likely it lacks a font containing the required characters. Once you install a suitable font (not an encoding), the text displays properly.

Character Size

     Some technically oriented people claim that Unicode is a double-byte character set, so using it will double file size. That is true of UTF-16 (which Windows labels simply "Unicode") for text that is mostly ASCII, but not of UTF-8. UTF-8 is a variable-width encoding that uses extra bytes only when needed: basic English characters (ASCII) take 1 byte each, and characters beyond ASCII take 2 to 4 bytes. For example, in the Japanese version of my web pages, all HTML tags are in ASCII, so each of those letters is 1 byte. Each Japanese character in my content typically takes 3 bytes in UTF-8, compared with 2 bytes in a Japanese encoding such as iso-2022-jp. So the Japanese text itself grows by about half, but the markup doesn't grow at all, and the file as a whole does not double in size.
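
     The byte counts are easy to check for yourself. The following Python sketch encodes a short ASCII tag and a short Japanese string and counts the resulting bytes. The helper name encoded_size and the sample strings are made up for this illustration, and shift_jis stands in for the legacy Japanese encodings because iso-2022-jp adds escape sequences around each run of Japanese text, which would blur the per-character count.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


# Illustrative samples: a bit of markup and a bit of Japanese content.
SAMPLES = {"ASCII markup": "<p>", "Japanese text": "日本語"}
ENCODINGS = ("utf-8", "utf-16-le", "shift_jis")

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        for enc in ENCODINGS:
            print(f"{label} in {enc}: {encoded_size(text, enc)} bytes")
```

     The three-character tag takes 3 bytes in UTF-8 but 6 in UTF-16, while the three Japanese characters take 9 bytes in UTF-8 and 6 in both UTF-16 and Shift_JIS. How much a real page grows therefore depends on its mix of markup and non-ASCII content.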


     To save a file in UTF-8 with Windows XP Notepad, choose File > Save As and select "UTF-8" from the "Encoding" drop-down menu in the "Save As" dialog.
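
     If you generate pages with a script rather than an editor, the same choice is made in code. Here is a minimal Python sketch, assuming a hypothetical helper save_utf8 and a temporary file used only to show that the text survives a round trip; neither is part of any tool mentioned above.

```python
import os
import tempfile


def save_utf8(path: str, text: str) -> None:
    """Write `text` to `path`, explicitly encoded as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


def roundtrip(text: str) -> str:
    """Save `text` to a temporary file as UTF-8 and read it back."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        save_utf8(path, text)
        with open(path, "r", encoding="utf-8", newline="") as f:
            return f.read()
    finally:
        os.remove(path)  # clean up the temporary file


if __name__ == "__main__":
    print(roundtrip("<p>日本語</p>"))
```

     Naming the encoding explicitly when opening the file matters because the default otherwise depends on the system the script runs on.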

© January, 2007