september 9, 2004 archives

mailing list wishlist

justin mason has a mailing list wishlist. the ezmlm-based system for the mysql mailing lists does the archive-permalink thing. it is added to the message as the List-Archive header. (maybe that is an abuse of that header, but it seems more relevant than just putting the link to the main archive in the header.)
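for the curious, here is a minimal sketch of the idea in python. it is not the code the mysql lists run: the archive base url, the list name, and the message number are all made up for illustration. the only real thing is the List-Archive header itself, whose value is a url in angle brackets.

```python
from email.message import EmailMessage

# hypothetical archive location; a real list would use its own url scheme
ARCHIVE_BASE = "https://lists.example.com/archive"


def add_archive_permalink(msg, list_name, msg_number):
    """point List-Archive at this one message instead of the whole archive."""
    url = f"{ARCHIVE_BASE}/{list_name}/{msg_number}"
    # deleting a header that is not present is a no-op, so this is safe
    del msg["List-Archive"]
    msg["List-Archive"] = f"<{url}>"
    return msg


msg = EmailMessage()
msg["Subject"] = "hello"
msg.set_content("body")
add_archive_permalink(msg, "internals", 1234)
```

a mail client that knows about the header can then offer a "link to this message" without guessing at the archive layout.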

there are a number of things i don’t like about ezmlm, but the biggest advantage is that it is decomposed into enough distinct little bits that it is not difficult to rip out and replace specific bits. for example, you can replace the subscription confirmation (and make it web-based, and not vulnerable to stupid autoresponders subscribing themselves) by just adding a program into the manager that handles them before they get to ezmlm-request and ezmlm-manage.

i haven’t spent a lot of time futzing with mailman, but i’ve never really cared for it as a mailing list user.

but i’m not sure it really matters. all the kids are crazy about web-based forums these days. people who recognize the superiority of mailing lists are dinosaurs.

decentralized web(site|log) update notifications and content distribution

this is something that has been on my mind lately, and that i hope to talk about with smart people this weekend. (“the first rule is...”)

in a bit of interesting timing, this little software company in redmond recently hit the wall in dealing with feeding rss to zillions of clients on one of their sites.

in preparation, i’ve been digging into info on some of the p2p frameworks out there. the most promising thing i’ve come across is scribe. the disappointing thing (for me) is that it is built with java, which limits my ability to play with it.

while it would be tempting to think merely about update notifications, that just doesn’t go far enough. even if you eliminated all of the polling that rss and atom aggregation clients did, you would have just traded it for a thundering-herd problem when a notification was sent out. (this is the problem that shrook’s distributed checking has, aside from the distribution of notifications not being distributed.)

the atom-syntax list has a long thread on the issue of bandwidth consumption of rss/atom feeds, and bob wyman is clearly exploring some of the same edges of the space as me.

maybe it’s useful to sketch out a scenario of how i envision this working: i decide to track a site like boing boing, so i subscribe to it using my aggregation client. when it subscribes, it gets a public key (probably something i fetch from their server, perhaps embedded in the rss/atom feed). my client then hooks into the notification-and-content-distribution-network-in-the-sky, and says “hey, give me updates about boingboing”. later, the fine folks at boing boing (or xeni) post something, and because they’re using fancy new software that supports this mythical decentralized distribution system, it pushes the entry into the cloud. the update circulates through the cloud, reaching me in a nice ln(n) sort of way. my client then checks that the signature actually matches the public key i got earlier, and goes ahead and displays the content to me, fresh from the oven.
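to put a rough number on the "ln(n) sort of way", here is a back-of-the-envelope sketch in python. it assumes an idealized forwarding tree where every peer that receives an entry passes it to a fixed number of peers that have not seen it yet. the function name and the fanout parameter are mine, and this is not how scribe (or anything else) is actually implemented.

```python
def rounds_to_reach(n_subscribers, fanout=2):
    """count forwarding rounds until every subscriber has the entry."""
    if n_subscribers < 1 or fanout < 1:
        raise ValueError("need at least one subscriber and a fanout of at least 1")
    reached = 1   # the first peer the publisher hands the entry to
    frontier = 1  # peers that received the entry in the previous round
    rounds = 0
    while reached < n_subscribers:
        # each peer on the frontier forwards to `fanout` peers that lack the entry
        frontier = min(frontier * fanout, n_subscribers - reached)
        reached += frontier
        rounds += 1
    return rounds
```

with a fanout of 2, a thousand subscribers are covered in 9 rounds and a million in 19, and the publisher only ever talks to one peer.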

another scenario: i subscribe to jeremy zawodny’s blog. he has been slow to update his weblog software (in my hypothetical scenario) because he’s too busy learning how to fly airplanes, so i don’t get updates pushed to me whenever he publishes. but there are enough other readers running this cloud-enabled aggregation software that when they decide they haven’t seen an update recently, they go ahead and poll his site. when they notice an update, they inject it into the cloud. or they even notify the cloud that there hasn’t been an update.

obviously that second situation is much less ideal: there’s no signature, so some bozo could start injecting “postgresql is great!” entries into the jeremy zawodny feed space. or someone could just feed “nothing changed” messages, resulting in updates not getting noticed. the latter is fairly easy to deal with (add a bit of fuzzy logic there, where clients sometimes decide to check for themselves even when they’ve been told nothing is new), but i’m not so sure about the forgery problem in the absence of some sort of signing mechanism.
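the "fuzzy logic" for the nothing-changed case could be as dumb as a coin flip. a sketch in python, where the function name and the skepticism knob are hypothetical and the 10% default is pulled out of thin air:

```python
import random


def should_check_for_myself(cloud_says_unchanged, skepticism=0.1, rng=random):
    """decide whether to poll the origin site despite what the cloud says."""
    if not cloud_says_unchanged:
        # nobody has vouched for the feed being unchanged, so go look
        return True
    # told nothing is new: believe it most of the time, but not always
    return rng.random() < skepticism
```

if each of n clients distrusts the cloud 10% of the time, a forged "nothing changed" only hides an update until one of the skeptics polls, at a cost of roughly a tenth of the old polling traffic.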

in addition to notification, a nice feature for this cloud to have would be caching. that way when i wake up my machine in the morning, the updates i’ve missed can stream in from the network of peers who have been awake, and i don’t have to bother the original sites.
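a sketch of what the catch-up side of that cache might look like on a single peer, in python. it assumes every entry carries a per-feed sequence number, which is purely my invention for the example (a timestamp would serve the same purpose), and the class and method names are hypothetical.

```python
from collections import defaultdict


class PeerCache:
    """keeps the most recent entries per feed so sleepy peers can catch up."""

    def __init__(self, max_per_feed=50):
        if max_per_feed < 1:
            raise ValueError("max_per_feed must be at least 1")
        self.max_per_feed = max_per_feed
        self._entries = defaultdict(list)  # feed url -> [(seq, entry), ...]

    def store(self, feed, seq, entry):
        entries = self._entries[feed]
        if any(s == seq for s, _ in entries):
            return  # already have this one
        entries.append((seq, entry))
        entries.sort(key=lambda pair: pair[0])
        # drop the oldest entries beyond the per-feed limit
        del entries[:-self.max_per_feed]

    def since(self, feed, last_seen_seq):
        """entries a peer missed, oldest first."""
        return [e for s, e in self._entries.get(feed, []) if s > last_seen_seq]
```

when my machine wakes up, it would ask a neighbor for since(feed, the_last_seq_i_saw) for each subscription, and only fall back to the original site if nobody nearby has anything.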

i don’t think there is going to be a quick and easy solution to this, but i hope to aid in the bootstrapping. if nothing else, can certainly gateway what it knows about blog updates into whatever system materializes. (it certainly can’t scale any worse than the existing cloud interface, which is pretty inefficient given the rate that pings are coming in.)

a footnote on the signing mechanism: there’s the xml-signature syntax and processing specification that covers this. i haven’t really looked at it in enough detail to know what parts of the problem it solves or does not solve.

(anybody who suggests bittorrent as a key component of the solution will have to work much harder to get a passing grade.)

the cloud interface

there are three services that are currently hooked up to the cloud interface. there are 96 different hosts that polled one of the changes files yesterday.
