decentralized web(site|log) update notifications and content distribution

this is something that has been on my mind lately, and that i hope to talk about with smart people this weekend. (“the first rule is...”)

in a bit of interesting timing, this little software company in redmond recently hit the wall in dealing with feeding rss to zillions of clients on one of their sites.

in preparation, i’ve been digging into info on some of the p2p frameworks out there. the most promising thing i’ve come across is scribe. the disappointing thing (for me) is that it is built with java, which limits my ability to play with it.

while it would be tempting to think merely about update notifications, that just doesn’t go far enough. even if you eliminated all of the polling that rss and atom aggregation clients did, you would have just traded it for a thundering-herd problem when a notification was sent out. (this is the problem that shrook’s distributed checking has, aside from the distribution of notifications not being distributed.)

the atom-syntax list has a long thread on the issue of bandwidth consumption of rss/atom feeds, and bob wyman is clearly exploring some of the same edges of the space as me.

maybe it’s useful to sketch out a scenario of how i envision this working: i decide to track a site like boing boing, so i subscribe to it using my aggregation client. when it subscribes, it gets a public key (probably something i fetch from their server, perhaps embedded in the rss/atom feed). my client then hooks into the notification-and-content-distribution-network-in-the-sky, and says “hey, give me updates about boingboing”. later, the fine folks at boing boing (or xeni) post something, and because they’re using fancy new software that supports this mythical decentralized distribution system, it pushes the entry into the cloud. the update circulates through the cloud, reaching me in a nice ln(n) sort of way. my client then checks that the signature actually matches the public key i got earlier, and goes ahead and displays the content to me, fresh from the oven.
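
to put a rough number on that “ln(n) sort of way”, here is a minimal sketch. it assumes an idealized overlay where, in each round, every node that already holds the update forwards it to a fixed number of nodes that don’t have it yet. the function name and the `fanout` parameter are made up for illustration, and real p2p overlays will be messier than this.

```python
def rounds_to_reach(nodes: int, fanout: int = 2) -> int:
    """Count forwarding rounds until all `nodes` hold the update.

    Idealized assumption: in every round, each node that already has the
    update hands it to `fanout` nodes that do not have it yet.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    informed = 1  # the node where the publisher injected the entry
    rounds = 0
    while informed < nodes:
        informed += informed * fanout  # every informed node forwards
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, "nodes:", rounds_to_reach(n, fanout=1), "rounds")
```

the point is just that the round count grows with the logarithm of the audience, so a thousandfold increase in subscribers adds only a handful of hops.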

another scenario: i subscribe to jeremy zawodny’s blog. he has been slow to update his weblog software (in my hypothetical scenario) because he’s too busy learning how to fly airplanes, so i don’t get updates pushed to me whenever he publishes. but there are enough other readers running this cloud-enabled aggregation software that when they decide they haven’t seen an update recently, they go ahead and poll his site. when they notice an update, they inject it into the cloud. they could even notify the cloud that there hasn’t been an update.

obviously that second situation is much less ideal: there’s no signature, so some bozo could start injecting “postgresql is great!” entries into the jeremy zawodny feed space. or someone could just feed “nothing changed” messages, resulting in updates not getting noticed. the latter is fairly easy to deal with (add a bit of fuzzy logic there, where clients sometimes decide to check for themselves even when they’ve been told nothing is new), but i’m not so sure about the forgery problem in the absence of some sort of signing mechanism.
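
the “fuzzy logic” could be as simple as a coin flip. this is a minimal sketch, not a worked-out protocol: the function name and the `distrust` probability are hypothetical, and the right value would need tuning against how many peers are watching a given feed.

```python
import random


def should_poll_anyway(told_unchanged: bool, distrust: float = 0.1,
                       rng=random) -> bool:
    """Decide whether to poll the origin site ourselves.

    `distrust` is an assumed tuning knob: the probability of double-checking
    even though a peer claimed that nothing is new.
    """
    if not told_unchanged:
        # nobody has vouched for the feed being unchanged, so check it
        return True
    # occasionally verify the "nothing changed" claim ourselves
    return rng.random() < distrust


if __name__ == "__main__":
    rng = random.Random(2004)
    checks = sum(should_poll_anyway(True, 0.1, rng) for _ in range(1000))
    print("polled anyway on", checks, "of 1000 'nothing changed' reports")
```

with enough clients each checking a small fraction of the time, a stream of bogus “nothing changed” messages should get caught fairly quickly, while the origin site still sees far fewer polls than it does today.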

in addition to notification, a nice feature for this cloud to have would be caching. that way when i wake up my machine in the morning, the updates i’ve missed can stream in from the network of peers who have been awake, and i don’t have to bother the original sites.
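
a rough sketch of what a peer’s cache might look like. everything here is an assumption for illustration: the `PeerCache` name, the per-feed limit, and especially the idea that entries carry a per-feed sequence number so a waking client can ask for “everything after the last one i saw”.

```python
from collections import defaultdict


class PeerCache:
    """Keeps the most recent entries per feed so sleepy peers can catch up."""

    def __init__(self, max_per_feed: int = 50):
        self.max_per_feed = max_per_feed
        # feed url -> list of (sequence number, entry body), oldest first
        self._entries = defaultdict(list)

    def store(self, feed: str, seq: int, body: str) -> None:
        entries = self._entries[feed]
        if any(s == seq for s, _ in entries):
            return  # already cached, ignore the duplicate
        entries.append((seq, body))
        entries.sort(key=lambda e: e[0])
        # drop the oldest entries beyond the per-feed limit
        del entries[:max(0, len(entries) - self.max_per_feed)]

    def entries_since(self, feed: str, last_seen_seq: int):
        """Return cached entries newer than the caller's last seen one."""
        return [(s, b) for s, b in self._entries.get(feed, [])
                if s > last_seen_seq]


if __name__ == "__main__":
    cache = PeerCache()
    cache.store("http://boingboing.net/", 41, "old post")
    cache.store("http://boingboing.net/", 42, "fresh from the oven")
    print(cache.entries_since("http://boingboing.net/", 41))
```

the cached entries would still need their signatures checked by whoever receives them, since a peer’s cache is no more trustworthy than the peer.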

i don’t think there is going to be a quick and easy solution to this, but i hope to aid in the bootstrapping. if nothing else, blo.gs can certainly gateway what it knows about blog updates into whatever system materializes. (it certainly can’t scale any worse than the existing cloud interface, which is pretty inefficient given the rate at which pings are coming in.)

a footnote on the signing mechanism: there’s the xml-signature syntax and processing specification that covers this. i haven’t really looked at it in enough detail to know what parts of the problem it solves or does not solve.

(anybody who suggests bittorrent as a key component of the solution will have to work much harder to get a passing grade.)


comments

Manual track back :-)

http://enthusiasm.cozy.org/archives/2004/09/collaborative-model-synchronization/

» Ben Hyde (link) » september 10, 2004 8:32am
