gabriel rosenkoetter on 31 Mar 2004 23:44:02 -0000
On Wed, Mar 31, 2004 at 05:09:07PM -0500, Paul wrote:
> OK. If we limit the question about UTF-8 to graphical Web browsers and
> e-mail clients, then is the use of UTF-8 for character coding of Web
> pages and e-mail a good thang?

I only view email in a terminal, and I'm far from alone in that, so I'd still say that UTF-8 is contraindicated there. There are methods for stating clearly what character set you're using, though, and smart MUAs (mutt and pine included) do the best thing possible given that information, so it's less offensive than forcing a TERM setting on me (as Red Hat does).

It *completely* makes sense for web pages (I'd rather see Japanese characters than garbaged ASCII, even if I can't read either; I at least know what language they're trying to speak, since the various Asian language fonts are visually distinguishable).

> And, if a Web page or e-mail does not
> specify its encoding, is UTF-8 a reasonable default?

No. 7-bit ASCII is the only reasonable default for viewing in that case, in order to maintain backwards compatibility. That said, I'm pretty sure (though I'm not bothering to check) that most Latin-character locales will do that for you.

> I guess if you're running a text-based browser or e-mail client in a
> terminal, you might not like the use of UTF-8, right?

Well, I'm basically resigned to being treated like a second-class citizen if I'm web browsing with a text-based browser.

You (or, rather, your MUA) shouldn't *send* me UTF-8 without saying that it is UTF-8. It may or may not be safe to assume that unspecified character sets will work as UTF-8 (so you may get garbage when viewing 7-bit ASCII email if your MUA is stupid, or if you force it to be stupid), but that'd be your problem. :^>

> In my case, I'm concerned about "internationality" (I think I just
> created that word.) and my ability to send and view Japanese text in
> addition to English text. There are Japanese encodings such as
> ISO-2022-JP, EUC-JP, and Shift_JIS.
> But, if UTF-8 can handle "all"
> languages, isn't it better to not worry about regional encodings and use
> the theoretically universal encoding?

Unicode keeps getting touted as the way to fix all sorts of character set differences, but it's a false win. It only gets you text, not various other behaviors (think Japanese keyboards; they've got three modes, none of which is "Kanji" exactly). Locales are still the right way to do this; Unicode is just extra fluff.

The problem with assuming that UTF-8 will magically fix this for you is that if someone sends you ISO-2022-JP text (which travels as 7-bit ASCII bytes), UTF-8 still won't display it properly.

-- 
gabriel rosenkoetter
gr@eclipsed.net
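On the charset point above: in MIME mail, "stating clearly what character set you're using" means the `charset` parameter of the `Content-Type` header, and RFC 2045 makes `us-ascii` the default when that parameter is absent. What follows is a minimal sketch using Python's standard-library `email` package; the sample text and the `body_charset` helper are hypothetical illustrations, not what mutt or pine actually run.

```python
from email import message_from_string
from email.mime.text import MIMEText

# A sender that says what it is sending: MIMEText writes the charset into
# the Content-Type header, so the receiving MUA does not have to guess.
labeled = MIMEText("こんにちは", "plain", "utf-8")
print(labeled["Content-Type"])

# A bare message with no MIME headers: nothing declares the encoding.
unlabeled = message_from_string("Subject: hi\n\nplain old text\n")


def body_charset(msg):
    """Return the declared charset, falling back to us-ascii.

    Hypothetical helper. RFC 2045 makes charset=us-ascii the default for
    text/plain when none is given, i.e. the 7-bit ASCII default argued for
    above.
    """
    return msg.get_content_charset("us-ascii")


print(body_charset(labeled))    # utf-8
print(body_charset(unlabeled))  # us-ascii
```

The fallback is applied only when the sender said nothing; a declared charset always wins.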
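The ISO-2022-JP point can be checked with Python's built-in `iso2022_jp` codec. ISO-2022-JP (the encoding RFC 1468 defines for Japanese mail) switches character sets with escape sequences and keeps every byte below 0x80. A UTF-8 decoder therefore accepts the bytes without complaint and simply shows escape codes and ASCII letters instead of Japanese. This is a small sketch; the sample string is arbitrary.

```python
# Encode a short Japanese string as ISO-2022-JP.
text = "日本語"
raw = text.encode("iso2022_jp")

# Every byte is 7-bit: the encoder shifts into JIS X 0208 with ESC $ B
# and back to ASCII with ESC ( B at the end.
print(all(b < 0x80 for b in raw))
print(raw.startswith(b"\x1b$B"), raw.endswith(b"\x1b(B"))

# Decoding as UTF-8 raises no error, because 7-bit bytes are valid UTF-8,
# but the result is escape codes and stray ASCII letters, not Japanese.
as_utf8 = raw.decode("utf-8")
print(repr(as_utf8))
print(as_utf8 == text)

# Only decoding with the charset the sender declared recovers the text.
print(raw.decode("iso2022_jp") == text)
```

In other words, a "UTF-8 everywhere" reader still needs the declared charset to get this message right.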