
Character Encoding

Problem using a PHP script to pull data from a MySQL database to generate XML data in Firefox (Nattakorn's Blog): You may try comparing the two PHP scripts. They are identical except that data_utf8.php was saved as UTF-8 in Notepad, while data.php was saved as ASCII in Notepad.

As readers may be aware, characters are encoded using bytes.  In other words, the letters you see on the screen are represented as strings of 0's and 1's in the computer's internal memory.  Character encodings such as UTF-8 specify a mapping between characters and the strings of 0's and 1's used to represent them.  There are multiple competing character encodings in use around the world for different character sets, and the same character can map to different bytes under different encodings.
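To make the mapping concrete, here is a minimal Python sketch. It is only an illustration: the sample character and the helper name to_bits are my own choices and do not come from Nattakorn's scripts. It encodes one character under two encodings, prints the resulting 0's and 1's, and shows what happens when bytes written in one encoding are read back in another.

```python
# One non-ASCII character, chosen purely for illustration: e with an acute accent.
text = "\u00e9"

# The same character maps to different bytes under different encodings.
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # one byte:  0xE9


def to_bits(data: bytes) -> str:
    """Render each byte as eight 0/1 digits, separated by spaces."""
    return " ".join(f"{b:08b}" for b in data)


print(to_bits(utf8_bytes))    # 11000011 10101001
print(to_bits(latin1_bytes))  # 11101001

# Reading UTF-8 bytes as if they were Latin-1 yields two wrong characters
# (A-tilde followed by the copyright sign) instead of the original one.
misread = utf8_bytes.decode("latin-1")
print(misread)
```

The last step is the general shape of most encoding bugs: the bytes are fine, but the reader and the writer disagree about which mapping applies.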

In many cases, converting between character sets can be seamless.  However, Nattakorn has uncovered a case where it is not.  I checked three browsers.  The problem Nattakorn mentioned appears in Firefox and Opera but not in Safari on the Mac.  Nattakorn has also discovered that the problem does not occur in a beta version of IE on Windows.

I suspect his issue arose when saving the file in Notepad.  When saving as UTF-8, Notepad inserted characters at the start of the file to indicate the encoding, and these are being ignored by some browsers but picked up as junk by others.  The solution just seems to be to save as ASCII, one of the earliest character encodings, in Notepad.  I'll point out that ASCII is a subset of UTF-8, so a file saved as ASCII is also valid UTF-8 and should not present a problem.
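Here is a small Python sketch of what I think is going on. It assumes the inserted characters are the UTF-8 byte order mark (the three bytes EF BB BF), which I have not verified against Nattakorn's actual files. The sample XML string and the helper strip_utf8_bom are hypothetical and exist only for this illustration.

```python
import codecs

# Hypothetical XML output, standing in for what the PHP script generates.
xml_body = '<?xml version="1.0" encoding="UTF-8"?><data/>'

# Plain UTF-8, with no marker at the front.
plain = xml_body.encode("utf-8")

# Python's "utf-8-sig" codec prepends the byte order mark when encoding,
# which mimics a file saved with a leading BOM.
with_bom = xml_body.encode("utf-8-sig")

print(with_bom[:3] == codecs.BOM_UTF8)  # True: the extra bytes are EF BB BF
print(with_bom[3:] == plain)            # True: the rest is identical

# Pure-ASCII text produces the same bytes under ASCII and UTF-8,
# which is why saving as ASCII sidesteps the problem.
print(xml_body.encode("ascii") == plain)  # True


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(strip_utf8_bom(with_bom) == plain)  # True
```

If the assumption holds, the three extra bytes sit in front of the XML declaration in the script's output, and a strict parser can treat them as junk before the start of the document while a lenient one skips them.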

Update:  I have found this W3C page that fully explains and reproduces the problem Nattakorn describes.  Their recommendation and solution are equivalent to the ones I have given here.
