decentralized web(site|log) update notifications and content distribution

this is something that has been on my mind lately, and i hope to talk about it with smart people this weekend. (“the first rule is...”)

in a bit of interesting timing, this little software company in redmond recently hit the wall in dealing with feeding rss to zillions of clients on one of their sites.

in preparation, i’ve been digging into info on some of the p2p frameworks out there. the most promising thing i’ve come across is scribe. the disappointing thing (for me) is that it is built with java, which limits my ability to play with it.

while it would be tempting to think merely about update notifications, that just doesn’t go far enough. even if you eliminated all of the polling that rss and atom aggregation clients do, you would have just traded it for a thundering-herd problem when a notification was sent out, as every notified client rushes to fetch the feed at once. (this is the problem that shrook’s distributed checking has, aside from the fact that its notifications themselves aren’t actually distributed.)
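to put rough numbers on that trade-off, here is a back-of-the-envelope sketch in python. it assumes an idealized relay tree where the origin hands each new entry to `fanout` peers and every peer forwards it to `fanout` more. the function names are hypothetical, and nothing here models real network behavior like churn or lost messages.

```python
from typing import Optional


def burst_at_origin(subscribers: int, fanout: Optional[int] = None) -> int:
    """Requests the origin must serve when it publishes one entry.

    fanout=None models notification-only: every subscriber is pinged and then
    fetches the feed itself. An integer models pushing the entry into a relay
    tree, where the origin only talks to its first `fanout` peers.
    """
    if fanout is None:
        return subscribers
    return min(fanout, subscribers)


def relay_hops(subscribers: int, fanout: int) -> int:
    """Smallest depth d such that fanout**d >= subscribers (roughly log n)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    depth, reach = 0, 1
    while reach < subscribers:
        reach *= fanout
        depth += 1
    return depth


if __name__ == "__main__":
    n = 1_000_000
    print(burst_at_origin(n))             # 1000000 fetches in one burst
    print(burst_at_origin(n, fanout=16))  # 16
    print(relay_hops(n, fanout=16))       # 5
```

the first line is the thundering herd: a notification-only system turns one post into a million near-simultaneous fetches, while pushing the content itself keeps the origin's share at the fanout and the delivery depth logarithmic.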

the atom-syntax list has a long thread on the issue of bandwidth consumption of rss/atom feeds, and bob wyman is clearly exploring some of the same edges of the space as me.

maybe it’s useful to sketch out a scenario of how i envision this working: i decide to track a site like boing boing, so i subscribe to it using my aggregation client. when it subscribes, the client also gets a public key (probably fetched from their server, perhaps embedded in the rss/atom feed). my client then hooks into the notification-and-content-distribution-network-in-the-sky and says “hey, give me updates about boingboing”. later, the fine folks at boing boing (or xeni) post something, and because they’re using fancy new software that supports this mythical decentralized distribution system, it pushes the entry into the cloud. the update circulates through the cloud, reaching me in a nice ln(n) sort of way. my client then checks that the signature actually matches the public key it got earlier, and goes ahead and displays the content to me, fresh from the oven.
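here is a minimal sketch of the sign-then-verify step in that scenario, using only the python standard library (3.8+ for the modular inverse via `pow`). it swaps in toy textbook rsa (hash, then modular exponentiation, no padding) with hard-coded mersenne primes so it runs anywhere. that is deliberately insecure; a real system would use a vetted signature library or a standard like xml-signature. `publisher_sign`, `client_verify`, and `PublicKey` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

# Toy textbook-RSA key built from two Mersenne primes (2**61-1 and 2**89-1).
# INSECURE: far too small and unpadded. For illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the publisher


@dataclass(frozen=True)
class PublicKey:
    n: int
    e: int


def _digest(entry: bytes, n: int) -> int:
    # Hash the serialized entry and reduce it into the modulus range.
    return int.from_bytes(hashlib.sha256(entry).digest(), "big") % n


def publisher_sign(entry: bytes) -> int:
    """What the publishing software does before pushing an entry into the cloud."""
    return pow(_digest(entry, N), D, N)


def client_verify(entry: bytes, signature: int, key: PublicKey) -> bool:
    """What the aggregator does with an entry that arrived from some peer."""
    return pow(signature, key.e, key.n) == _digest(entry, key.n)


# The key the client fetched once, at subscription time (e.g. from the feed).
subscribed_key = PublicKey(n=N, e=E)

if __name__ == "__main__":
    entry = b"<entry><title>fresh from the oven</title></entry>"
    sig = publisher_sign(entry)
    print(client_verify(entry, sig, subscribed_key))                     # True
    print(client_verify(b"<entry>forged</entry>", sig, subscribed_key))  # False
```

the useful property is that the peers relaying the entry never need to be trusted: any of them can forward it, but only the holder of the private key can produce a signature that checks out against the key the client picked up when it subscribed.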

another scenario: now suppose i subscribe to the weblog of jeremy zawodny, who (in my hypothetical scenario) has been slow to update his weblog software because he’s too busy learning how to fly airplanes. i don’t get updates pushed to me whenever he publishes. but there are enough other readers running this cloud-enabled aggregation software that when they decide they haven’t seen an update recently, they go ahead and poll his site. when they notice an update, they inject it into the cloud. they could even notify the cloud that there hasn’t been an update.
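a minimal sketch of that reader-side fallback, in python. the staleness threshold `STALE_AFTER`, the message tuples, and passing `fetch` in as a plain callable are all assumptions for illustration; a real client would do an http get (ideally a conditional one) and hand the result to whatever injection mechanism the cloud ends up offering.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

STALE_AFTER = 3600.0  # hypothetical: seconds of cloud silence before polling the origin


@dataclass
class FeedState:
    url: str
    last_heard: float = 0.0          # last time we learned anything about this feed
    last_hash: Optional[str] = None  # sha256 of the feed body we last saw
    # Elsewhere: set last_heard = now whenever the cloud delivers news for this feed.


def maybe_poll(state: FeedState, fetch: Callable[[str], bytes], now: float) -> Optional[Tuple]:
    """Poll the origin only if the cloud has been quiet for too long.

    Returns a message to inject into the cloud, or None if polling isn't due.
    These messages are unsigned, so nothing stops a dishonest peer from
    injecting forged ones.
    """
    if now - state.last_heard < STALE_AFTER:
        return None
    body = fetch(state.url)
    digest = hashlib.sha256(body).hexdigest()
    state.last_heard = now
    if digest != state.last_hash:
        state.last_hash = digest
        return ("update", state.url, body)
    return ("no-change", state.url)
```

since each reader decides on its own when the feed looks stale, the polling load on the origin spreads across however many cloud-enabled readers happen to be subscribed instead of coming from every one of them on a fixed schedule.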

obviously that second situation is much less ideal: there’s no signature, so some bozo could start injecting “postgresql is great!” entries into the jeremy zawodny feed space. or someone could just feed “nothing changed” messages, so that real updates don’t get noticed. the latter is fairly easy to deal with (add a bit of fuzzy logic, where clients sometimes decide to check for themselves even when they’ve been told nothing is new), but i’m not so sure about the forgery problem in the absence of some sort of signing mechanism.
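the fuzzy-logic defense can be as simple as a coin flip. this sketch assumes each client independently re-checks the origin with a small probability `p` (a hypothetical tuning knob) even after being told nothing changed. the second function shows why that is enough: a bogus “nothing changed” goes unnoticed only if every one of n readers skips the check, which happens with probability (1 - p)**n.

```python
import random

VERIFY_PROBABILITY = 0.05  # hypothetical: fraction of "no-change" messages double-checked


def should_poll_anyway(rng: random.Random, p: float = VERIFY_PROBABILITY) -> bool:
    """Decide whether to ignore a "no-change" message and poll the origin anyway."""
    return rng.random() < p


def bogus_no_change_survives(readers: int, p: float = VERIFY_PROBABILITY) -> float:
    """Probability that no reader double-checks, so a false "no-change" goes unnoticed."""
    return (1.0 - p) ** readers


if __name__ == "__main__":
    print(round(bogus_no_change_survives(100), 4))  # 0.0059
```

with 100 readers at p = 0.05, the origin sees about five polls per round instead of a hundred, and a forged “nothing changed” survives a round only about 0.6% of the time. none of this helps with forged entries, which still need signatures.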

in addition to notification, a nice feature for this cloud to have would be caching. that way when i wake up my machine in the morning, the updates i’ve missed can stream in from the network of peers who have been awake, and i don’t have to bother the original sites.
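a minimal sketch of the peer-side cache, in python. it assumes each feed’s entries carry a sequence number that goes up by one per post; rss and atom don’t define one, so that part (like the `PeerCache` name) is hypothetical. the cache is bounded, so it also reports whether it still covers everything the waking client missed.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Entry = Tuple[int, bytes]  # (sequence number, entry bytes, ideally signed)


class PeerCache:
    """Hypothetical cache a peer keeps so sleeping clients can catch up later."""

    def __init__(self, per_feed_limit: int = 100) -> None:
        self.per_feed_limit = per_feed_limit
        self.feeds: Dict[str, Deque[Entry]] = {}

    def store(self, feed: str, seq: int, entry: bytes) -> None:
        # Oldest entries fall off the left end once the limit is reached.
        q = self.feeds.setdefault(feed, deque(maxlen=self.per_feed_limit))
        q.append((seq, entry))

    def catch_up(self, feed: str, last_seen: int) -> Tuple[List[Entry], bool]:
        """Return entries newer than last_seen, plus whether the cache covers the gap.

        If the oldest cached entry is newer than last_seen + 1, something was
        evicted, so the client should ask another peer (or, as a last resort,
        the origin site).
        """
        q = self.feeds.get(feed)
        if not q:
            return [], False
        newer = [(s, e) for s, e in q if s > last_seen]
        complete = q[0][0] <= last_seen + 1
        return newer, complete
```

the `complete` flag is what keeps the original sites out of the loop in the common case: a client only falls back to the origin when no peer it asks has a long enough history.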

i don’t think there is going to be a quick and easy solution to this, but i hope to aid in the bootstrapping. if nothing else, can certainly gateway what it knows about blog updates into whatever system materializes. (it certainly can’t scale any worse than the existing cloud interface, which is pretty inefficient given the rate at which pings are coming in.)

a footnote on the signing mechanism: there’s the xml-signature syntax and processing specification that covers this. i haven’t looked at it in enough detail to know which parts of the problem it does or does not solve.

(anybody who suggests bittorrent as a key component of the solution will have to work much harder to get a passing grade.)


Manual track back :-)

» Ben Hyde » september 10, 2004 8:32am

Jim Winstead has touched an interesting topic in the content distribution world (and specifically in the world of RSS and Atom). In "decentralized web(site|log) update notifications and content distribution" he describes a scenario in which smart aggregation clients do not...

» Martin Jansen » september 10, 2004 1:20pm

FeedMesh is a group working to establish a "peering network" for decentralized web(site|log) update notifications and content distribution. The initial discussion happened on Sept 10th at Foo Camp. Companies and representatives involved so far are: Sc

» Sam Ruby » september 11, 2004 10:19am

Microsoft's recent troubles with RSS files have focused the RSS/Atom communities squarely on fundamental problems with feed syndication as practiced today. While some of us have been warning about this problem for quite a while, it has taken Microsoft's recent

» As I May Think... » september 11, 2004 3:16pm

FeedMesh is a group working to establish a "peering network" for decentralized web(site|log) update notifications and content distribution. [via Sam Ruby]...

» MeshForum » september 12, 2004 7:21pm

Jeff Barr has an interesting tale to tell regarding syndication, scalability, and mod-pubsub:

» Mod-pubsub blog » september 13, 2004 1:20pm

Jeremy Zawodny talks about the inevitability of search results as RSS that can be subscribed to, quoting Tim Bray: They’ve also done something way cool with their Google appliance; one of the bright geeks there has set up a thing...

» The Now Economy » september 13, 2004 1:45pm


» nicks.txt » august 7, 2009 7:20am
