more on blog names with funny characters
part of the blog names problem on blo.gs is that the HTML::Entities module for perl does not handle decoding entities into utf-8 in perl 5.6.
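for comparison, here is a minimal sketch of the behavior i'm after: decode the entities, then store the result as utf-8 bytes. it's in python rather than perl, and `decode_blog_name` is a made-up helper name, not anything that exists on blo.gs.

```python
import html


def decode_blog_name(raw: str) -> bytes:
    """Decode HTML entities in a blog name and return UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    # html.unescape handles named (&eacute;) and numeric (&#233;) entities.
    text = html.unescape(raw)
    # Encode the resulting Unicode string as UTF-8 for storage.
    return text.encode("utf-8")
```

names that contain no entities pass through unchanged, apart from the utf-8 encoding step.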
perhaps the right thing to do would be to not assume that blog names are entity-encoded in the lists from blogger and weblogs.com. (done. we'll see what blows up.)
then there's the matter of people pinging blo.gs directly. some people are pinging the xml-rpc interface with iso-8859-1 characters, and i doubt the encoding is being set correctly in the xml-rpc request. (that's an assumption: i haven't actually double-checked. it should at least be handled correctly on the blo.gs side now. it was defaulting to iso-8859-1 before, contrary to the xml specification, which says a document with no encoding declaration is utf-8 unless a byte-order mark marks it as utf-16.)
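a rough sketch of that defaulting rule, in python: look at the byte-order mark and the xml declaration of the request body, and fall back to utf-8 when nothing is declared. `declared_encoding` is a hypothetical name, and this ignores transport-level hints like the http Content-Type charset, so treat it as an illustration rather than what blo.gs actually runs.

```python
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(body: bytes) -> str:
    """Return the encoding an XML request body claims, defaulting to utf-8."""
    # A UTF-16 byte-order mark settles the question immediately.
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # Skip a UTF-8 byte-order mark before looking for the declaration.
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # No declaration: utf-8 is the default, not iso-8859-1.
    return "utf-8"
```

a client that sends iso-8859-1 bytes without declaring them will still be misread under this rule, which is the client's bug rather than the server's.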
i haven't quite worked out how to determine the encoding of incoming GET-based pings. isn't there some sort of heuristic that makes it easy to figure out if a string is encoded in utf-8? does php support %u####-style encoding?
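one common heuristic, sketched in python: try a strict utf-8 decode first, since iso-8859-1 text with accented characters almost never happens to form valid utf-8 sequences, and fall back to iso-8859-1 if the decode fails. the same sketch also unpacks the non-standard %uXXXX escapes that javascript's `escape()` produces. `decode_ping_name` is a hypothetical name, the iso-8859-1 fallback is my assumption about what the non-utf-8 pings contain, and surrogate pairs in %u escapes are not handled.

```python
import re
from urllib.parse import unquote_to_bytes

# Non-standard %uXXXX escapes, as emitted by JavaScript's escape().
_PCT_U = re.compile(r"%u([0-9a-fA-F]{4})")


def decode_ping_name(raw: str) -> str:
    """Decode a still-percent-encoded query value into a Unicode string."""
    # Form encoding uses '+' for spaces; a literal plus arrives as %2B.
    raw = raw.replace("+", " ")
    # Turn %XX escapes into raw bytes; %uXXXX is left alone at this stage
    # because "uX" is not a valid hex pair.
    data = unquote_to_bytes(raw)
    try:
        # Strict decode: succeeds only if the bytes are well-formed UTF-8.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: every byte is a valid iso-8859-1 character.
        text = data.decode("iso-8859-1")
    # Finally expand any %uXXXX escapes into the characters they name.
    return _PCT_U.sub(lambda m: chr(int(m.group(1), 16)), text)
```

the heuristic can be fooled by short iso-8859-1 strings that happen to be valid utf-8, but for blog names with a stray accented letter it should guess right nearly every time.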
(the goal is to move towards storing all the names in utf-8. not quite there yet.)