July 19, 2003 archives
blog names and blo.gs
one of the current problems with blo.gs is that it does not handle blogs whose names aren't iso-8859-1 very well. this would be fairly easy to fix, except for the feeds from weblogs.com and blogger: they aren't very consistent in what shows up in the name attribute. sometimes it is actually iso-8859-1 text, sometimes numeric entities are used correctly, and sometimes html named entities (like &acirc;) and numeric entities (like &#8364;) are double-encoded.
my preference would be for names to not be double-encoded. there's no reason to do that, as far as i can determine. (i don't think allowing html markup in blog names is necessary.)
but i assume the blogger and weblogs.com feeds will never get fixed, so i'm going to have to code up some heuristics to handle the names picked up from those sites.
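one way such a heuristic might look, as a rough python sketch (not what blo.gs actually runs): keep decoding html entities until the name stops changing, with a small cap on the number of rounds, so a double-encoded `&amp;acirc;` becomes `&acirc;` and then `â`. the function name `decode_blog_name` and the cap of three rounds are made up for illustration.

```python
import html


def decode_blog_name(name: str, max_rounds: int = 3) -> str:
    """Undo possibly repeated HTML entity encoding in a blog name.

    Hypothetical heuristic: unescape repeatedly until the string stops
    changing (or max_rounds is reached), so "&amp;acirc;" -> "&acirc;" -> "â".
    Caveat: a name that really is meant to display entity text
    (e.g. a blog literally called "&amp;") will be over-decoded.
    """
    for _ in range(max_rounds):
        decoded = html.unescape(name)
        if decoded == name:
            break  # nothing left to decode
        name = decoded
    return name
```

the trade-off is that names which legitimately show entity text get mangled, but that seems much rarer than names that were double-encoded by accident.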
the damage
- the doors, the doors
- l.a. woman, the doors
- redemption's son, joseph arthur
- groove soundtrack, mixed by wishfm
- beautifulgarbage, garbage
- zooropa, u2
- the joshua tree, u2
- curse of the hidden mirror, blue öyster cult
- stark raving mad, john digweed
- vertigo, groove armada
- don't be afraid of love, lo fidelity allstars
- intensify, way out west
- expeditions, northern exposure: sasha + john digweed
- love box, groove armada
- (the white album), the beatles
- global underground: 001 sydney, john digweed
- communicate, sasha and john digweed
(why all the links? i wanted to look them up to grab the album covers for itunes, anyway.)
nearly every time i buy music, i end up with a few albums that are plucked from the shelf largely on a whim. the lo fidelity allstars album was only $3, and i remember them showing up in various "other people bought..." lists when using amazon. the way out west album was a near-total shot in the dark (it has a digweed connection).
all of the albums were used or cut-outs. (some of them with the "the cd surface does not look perfect" tag, but amoeba music has a 7-day return policy for defective used discs. no problem with the two i've ripped so far.)
one album i wanted to pick up, but that they didn't have used (or that i overlooked), was pet sounds, by the beach boys. i guess i'll just have to force myself to go back to amoeba some day.
one small step
today is the 34th anniversary of the apollo 11 moon landing. (or is it?)
by happy coincidence, i watched the dish last night: a fun movie about the radio telescope in parkes, australia, that was the primary receiving station for the moon landing. it's a small-town-meets-big-event movie that doesn't look down on the inhabitants or portray the outsiders as mustache-twirling villains or aloof sophisticates. everyone is very human: good-intentioned and a little awkward.
more on blog names with funny characters
part of the blog name problem on blo.gs is that perl's HTML::Entities module does not handle decoding entities into utf-8 in perl 5.6.
perhaps the right thing to do would be to not assume that blog names are encoded in the lists from blogger and weblogs.com. (done. we'll see what blows up.)
then there's the matter of people pinging blo.gs directly. some people are pinging the xml-rpc interface with iso-8859-1 characters, and i doubt the encoding is being set correctly in the xml-rpc request. (i assume: i haven't actually double-checked. it should at least be handled correctly on the blo.gs side now - it was defaulting to iso-8859-1 before, contrary to the xml specification.)
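for the xml-rpc side, the main thing is to hand the parser the raw request bytes and let it honor the xml declaration, rather than decoding the body yourself with an assumed charset. a minimal python sketch, assuming a `weblogUpdates.ping`-style call whose parameters are plain strings (the function name `ping_params` is hypothetical):

```python
import xml.etree.ElementTree as ET


def ping_params(payload: bytes) -> list[str]:
    """Return the string parameters of an XML-RPC request body.

    The body is passed to the parser as raw bytes, so expat picks the
    encoding from the XML declaration and uses UTF-8 when there is none.
    """
    root = ET.fromstring(payload)  # raises ET.ParseError on bad bytes
    params = []
    for value in root.iterfind("params/param/value"):
        # XML-RPC allows <value>text</value> or <value><string>text</string></value>.
        inner = value.find("string")
        node = inner if inner is not None else value
        params.append(node.text or "")
    return params
```

given a body that declares `encoding="iso-8859-1"`, a `\xe9` byte comes back as `é`; the same byte with no declaration is not valid utf-8, so the parse fails. whether to reject that or retry with a guessed encoding is a separate decision.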
i haven't quite worked out how to determine the encoding of incoming GET-based pings. isn't there some sort of heuristic that makes it easy to tell whether a string is encoded in utf-8? does php support %u####-style encoding?
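the usual utf-8 heuristic is simply to try a strict utf-8 decode: iso-8859-1 text with accented characters is rarely valid utf-8 by accident, so if the decode succeeds, utf-8 is a good bet. here is a rough python sketch (it says nothing about what php does) that falls back to iso-8859-1 and then expands `%uXXXX` escapes, assuming they come from javascript's non-standard `escape()`, which writes characters below 256 as `%XX` and everything above as `%uXXXX`. the name `decode_ping_name` is hypothetical.

```python
import re
from urllib.parse import unquote_to_bytes

# Non-standard %uXXXX escapes (as produced by JavaScript's escape()).
_PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_ping_name(raw: str) -> str:
    """Guess the text of a percent-encoded GET parameter (heuristic).

    1. Treat '+' as a space and turn %XX escapes into raw bytes;
       unquote_to_bytes leaves %uXXXX sequences untouched.
    2. Use UTF-8 if the bytes decode strictly; otherwise assume ISO-8859-1.
    3. Expand any %uXXXX escapes into the corresponding characters.
    Not handled: surrogate pairs (%uD83D%uDE00) and a literal "%25u...",
    which step 3 would wrongly expand.
    """
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode("iso-8859-1")
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)
```

expanding `%uXXXX` last matters for `escape()` output such as `caf%E9%20%u20AC`: the `%E9` byte forces the iso-8859-1 fallback first, and only then is the euro sign filled in.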
(the goal is to move towards storing all the names in utf-8. not quite there yet.)