From rogerpack2005 at gmail.com Thu Nov 1 15:09:49 2007
From: rogerpack2005 at gmail.com (Roger Pack)
Date: Thu, 1 Nov 2007 12:09:49 -0700
Subject: [distribustream-talk] what algorithm?
In-Reply-To:
References: <966599840710302009w5be8a237w9585630cbec0cd76@mail.gmail.com>
    <966599840710310638x6394381fu70e24fa8b9efca9@mail.gmail.com>
    <966599840710311126p54455e8dl46730da0746e1fdc@mail.gmail.com>
Message-ID: <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>

On 10/31/07, Tony Arcieri wrote:
>
> On 10/31/07, Roger Pack wrote:
> >
> > the comments I'd have are
> > 1) doesn't this create a bottleneck at the server? Don't you want peers
> > to be able to self organize so it can scale more easily?
>
> Self-organization can work quite well, particularly in trusted
> environments. Link State Routing is a great example. Thus far it's worked
> fairly well for P2P protocols in untrusted environments as well.
>
> DistribuStream uses a server-managed network for several reasons. The
> first is to dramatically simplify the client-side logic, making clients
> lightweight and easy to implement.

However it does require the servers themselves to be modified, making it a
little harder to roll out :)
More useful I think (I've never heard of one in existence--maybe one does)
would be to be able to p2p stream arbitrary streams (so the server never has
to be modified, clients can immediately benefit from the usefulness of such
a protocol).

> This also affords algorithmic improvements implemented on the server side
> only without changes to the clients. All client/server and peer-to-peer
> communication can be modeled in the form of state machines that clients must
> obey if they wish for transfers to continue. Since the clients are just
> state machines, algorithmic improvements can also be built with expectations
> of clients functioning in exactly the same way, preventing such improvements
> from being mitigated by differences in client implementations.
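
A minimal sketch of the state-machine idea in the quoted paragraph, assuming
hypothetical state and event names (IDLE, "transfer", "hash_ok" and so on are
illustrative only and are not taken from the PDTP specification). It shows how
a recorded trace of client events could be checked against a fixed transition
table:

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    TRANSFERRING = auto()
    VERIFYING = auto()


# Hypothetical transition table: (current state, event) -> next state.
# The state and event names are assumptions for illustration, not PDTP messages.
TRANSITIONS = {
    (State.IDLE, "transfer"): State.TRANSFERRING,
    (State.TRANSFERRING, "data_received"): State.VERIFYING,
    (State.VERIFYING, "hash_ok"): State.IDLE,
    (State.VERIFYING, "hash_mismatch"): State.IDLE,
}


def step(state, event):
    """Return the next state, or raise ValueError if the event is illegal here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}")


def is_well_behaved(events, start=State.IDLE):
    """Replay a recorded event trace and report whether every step was legal."""
    state = start
    for event in events:
        try:
            state = step(state, event)
        except ValueError:
            return False
    return True
```

With a table like this, a test harness only has to feed a client known events
in known states and compare its responses, which is the property the quoted
paragraph relies on.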

So since the clients are 'dumb' and server is easily modifiable it keeps
testing sane, is that right?

> In terms of this congesting the server, it's a potential problem. The
> client/server protocol uses lightweight JSON asynchronous messaging across a
> persistent connection, so message processing is as simple.
>
> DistribuStream is built on the Ruby/EventMachine library. On Linux, this
> library uses the epoll(4) system call for connection multiplexing, allowing
> it to scale to thousands of concurrent connections.

I was mostly concerned with single point of failure (well, I guess with a
stream the origin is always a single point of failure) and also after a while
a bottleneck of bandwidth/number of clients.

> > 2) It is nice for these protocols to 'just' use HTTP instead of creating and
> > using a whole new transport protocol. Then it's nicer on everyone :)
>
> DistribuStream is built around existing standards as much as possible.
> Using HTTP makes it effectively function as an ad hoc HTTP caching proxy
> network

I agree.
Sounds like a fun project :) Having access to planetlab might help with
testing.
=Roger
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/distribustream-talk/attachments/20071101/88fd266e/attachment-0001.html

From tony at clickcaster.com Thu Nov 1 17:28:53 2007
From: tony at clickcaster.com (Tony Arcieri)
Date: Thu, 1 Nov 2007 15:28:53 -0600
Subject: [distribustream-talk] what algorithm?
In-Reply-To: <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>
References: <966599840710302009w5be8a237w9585630cbec0cd76@mail.gmail.com>
    <966599840710310638x6394381fu70e24fa8b9efca9@mail.gmail.com>
    <966599840710311126p54455e8dl46730da0746e1fdc@mail.gmail.com>
    <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>
Message-ID:

On 11/1/07, Roger Pack wrote:
>
> However it does require the servers themselves to be modified, making it a
> little harder to roll out :)
> More useful I think (I've never heard of one in existence--maybe one does)
> would be to be able to p2p stream arbitrary streams (so the server never has
> to be modified, clients can immediately benefit from the usefulness of such
> a protocol).

I'm not really sure what you mean by that... can you describe that a little
more in depth?

> So since the clients are 'dumb' and server is easily modifiable it keeps
> testing sane, is that right?

Yes, since all communication can be modeled as state machines it's very easy
to check whether a given client (or server) is well-behaved, simply by
checking its responses to various events when it's in various known states.
Moreover it makes reasoning about the network much simpler, since you don't
have to deal with emergent behaviors. Bram Cohen himself (creator of
BitTorrent) said this makes testing changes to BitTorrent incredibly
difficult, since you have to account for network effects. Moreover, you have
to account for such effects in a network of heterogeneous client
implementations which may behave in different ways.

> I was mostly concerned with single point of failure (well, I guess with a
> stream the origin is always a single point of failure)
> and also after a while a bottleneck of bandwidth/number of clients.

Bandwidth-wise PDTP should be fairly similar to BitTorrent.
The main difference is the messages going between the server/tracker are
informational in BitTorrent (effectively providing a scoreboard that clients
use to self-organize), and commands in PDTP (with all authority relegated to
the server).

> I agree.
> Sounds like a fun project :) Having access to planetlab might help with
> testing.

We tried to get access to PlanetLab when it was a university project but for
various reasons could not (I don't remember the specific details offhand).

If anyone has access to it I'd be extremely grateful. Right now the only
integration testing I've been able to perform is effectively a simulation on
a single computer.

--
Tony Arcieri
ClickCaster, Inc.
tony at clickcaster.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/distribustream-talk/attachments/20071101/b76428a2/attachment.html

From rogerpack2005 at gmail.com Tue Nov 6 12:06:40 2007
From: rogerpack2005 at gmail.com (Roger Pack)
Date: Tue, 6 Nov 2007 10:06:40 -0700
Subject: [distribustream-talk] what algorithm?
In-Reply-To:
References: <966599840710302009w5be8a237w9585630cbec0cd76@mail.gmail.com>
    <966599840710310638x6394381fu70e24fa8b9efca9@mail.gmail.com>
    <966599840710311126p54455e8dl46730da0746e1fdc@mail.gmail.com>
    <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>
Message-ID: <966599840711060906q3560d14es173dd11a1b3b3000@mail.gmail.com>

On Nov 1, 2007 2:28 PM, Tony Arcieri wrote:
> On 11/1/07, Roger Pack wrote:
> >
> > However it does require the servers themselves to be modified, making it a
> > little harder to roll out :)
> > More useful I think (I've never heard of one in existence--maybe one does)
> > would be to be able to p2p stream arbitrary streams (so the server never has
> > to be modified, clients can immediately benefit from the usefulness of such
> > a protocol).
>
> I'm not really sure what you mean by that... can you describe that a
> little more in depth?

If we make the clients self-organizing and make it so that the server
'doesn't have to change' (we'll admit that most data streamed from a server
to clients is the same data, so, through some means, those clients could
share the data without offending the server :)

> > So since the clients are 'dumb' and server is easily modifiable it keeps
> > testing sane, is that right?
>
> Yes, since all communication can be modeled as state machines it's very
> easy to check whether a given client (or server) is well-behaved, simply by
> checking its responses to various events when it's in various known states.
> Moreover it makes reasoning about the network much simpler, since you don't
> have to deal with emergent behaviors. Bram Cohen himself (creator of
> BitTorrent) said this makes testing changes to BitTorrent incredibly
> difficult, since you have to account for network effects. Moreover, you
> have to account for such effects in a network of heterogeneous client
> implementations which may behave in different ways.

Yeah testing without a nice testbed would be hard!

So is the purpose of this protocol to 'research' into behavior (research
into how to make sure everyone participates), or...just to make a streaming
protocol that works and is efficient?

> > I was mostly concerned with single point of failure (well, I guess with a
> > stream the origin is always a single point of failure)
> > and also after a while a bottleneck of bandwidth/number of clients.
>
> Bandwidth-wise PDTP should be fairly similar to BitTorrent. The main
> difference is the messages going between the server/tracker are
> informational in BitTorrent (effectively providing a scoreboard that clients
> use to self-organize), and commands in PDTP (with all authority relegated to
> the server).

Gotcha. PDTP sends them commands saying who has what blocks and whom to get
them from, whereas in BT the tracker just tells them of the existence of
other peers, and the peers decide themselves who has what they want, they
self organize, etc. So it's a centralized BitTorrent.

> > I agree.
> > Sounds like a fun project :) Having access to planetlab might help with
> > testing.
>
> We tried to get access to PlanetLab when it was a university project but
> for various reasons could not (I don't remember the specific details
> offhand)
>
> If anyone has access to it I'd be extremely grateful. Right now the only
> integration testing I've been able to perform is effectively a simulation on
> a single computer.

I could give you access to my home computer :) DSL :)
If testing comes to it eventually I might have access to Planetlab. I have a
slice I'm using for my thesis but I might be able to talk to my advisor and
see about using it for this one, esp. after the thesis is done.

What I'd find fascinating to build would be a client-only system that
'works' for arbitrary streams. That would be cool. We could use opendht to
connect the peers (if we want to--I have libs for it), or let the peers make
their own DHT. Then just connect them appropriately. It would be a somewhat
decentralized way. That's what my thesis is on mostly :)
It depends if we want to research just one special niche, or just make
'something that works'. It might be worth researching into application layer
multicast and p2p dht multicast to see if they have some insights in there
on good things to do.
Anyway some things to think about.
Take care!

> --
> Tony Arcieri
> ClickCaster, Inc.
> tony at clickcaster.com

--
-Roger Pack
I like belief.
http://www.google.com/search?q=free+bible
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/distribustream-talk/attachments/20071106/0ce71822/attachment.html

From tony at clickcaster.com Tue Nov 6 14:13:20 2007
From: tony at clickcaster.com (Tony Arcieri)
Date: Tue, 6 Nov 2007 12:13:20 -0700
Subject: [distribustream-talk] what algorithm?
In-Reply-To: <966599840711060906q3560d14es173dd11a1b3b3000@mail.gmail.com>
References: <966599840710302009w5be8a237w9585630cbec0cd76@mail.gmail.com>
    <966599840710310638x6394381fu70e24fa8b9efca9@mail.gmail.com>
    <966599840710311126p54455e8dl46730da0746e1fdc@mail.gmail.com>
    <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>
    <966599840711060906q3560d14es173dd11a1b3b3000@mail.gmail.com>
Message-ID:

On 11/6/07, Roger Pack wrote:
>
> If we make the clients self-organizing and make it so that the server
> 'doesn't have to change' (we'll admit that most data streamed from a server
> to clients is the same data, so, through some means, those clients could
> share the data without offending the server :)

Self-organization cuts both ways. A peer knows more about itself and its
immediate partners than the server ever could (unless they did a lot of
reporting and the server trusted the peers implicitly), but self-organization
makes it immensely difficult to guarantee things like QoS.

That said, pretty much every BitTorrent-like peer-to-peer protocol is
attempting to use a decentralized approach. DistribuStream is experimenting
with a server-regulated approach.

> Yeah testing without a nice testbed would be hard!
>
> So is the purpose of this protocol to 'research' into behavior (research
> into how to make sure everyone participates), or...just to make a streaming
> protocol that works and is efficient?

It's a combination thereof, but the main goal is to make a working,
efficient protocol with QoS similar to traditional progressive download
protocols like HTTP.

> Gotcha. PDTP sends them commands saying who has what blocks and whom to get
> them from, whereas in BT the tracker just tells them of the existence of
> other peers, and the peers decide themselves who has what they want, they
> self organize, etc. So it's a centralized BitTorrent.

In PDTP clients do nothing except obey the server. If BitTorrent is
declarative (here's who has what... figure out how to transfer the file
yourself) PDTP is imperative (connect to this peer and download or upload
this chunk of this file).

Among other things this lets the server model the entire peer network as a
digraph, weight the edges based on what it's learned about the peers and the
connections between them (based on their transfer history), and perform some
graph theoretic analysis across the whole system to discover optimal network
configurations.

The server can continuously recalculate the optimal network topology (using
some sort of chaining algorithm), and send the peers commands to construct
it. The peers give feedback about what worked and what didn't, which can be
factored back into the model (e.g. this transfer failed, the hash didn't
match, etc.)

> I could give you access to my home computer :) DSL :)
> If testing comes to it eventually I might have access to Planetlab. I
> have a slice I'm using for my thesis but I might be able to talk to my
> advisor and see about using it for this one, esp. after the thesis is done.

That would be excellent, thank you.

> What I'd find fascinating to build would be a client-only system that
> 'works' for arbitrary streams. That would be cool. We could use opendht to
> connect the peers (if we want to--I have libs for it), or let the peers make
> their own DHT. Then just connect them appropriately. It would be a
> somewhat decentralized way. That's what my thesis is on mostly :)
> It depends if we want to research just one special niche, or just make
> 'something that works'.
> It might be worth researching into application
> layer multicast and p2p dht multicast to see if they have some insights in
> there on good things to do.

You might have a look into the many variations on the BitTorrent theme which
use a DHT to allow "trackerless" operation. DistribuStream is going for much
the opposite in terms of centralized server regulation.

--
Tony Arcieri
ClickCaster, Inc.
tony at clickcaster.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/distribustream-talk/attachments/20071106/2647254a/attachment-0001.html

From rogerpack2005 at gmail.com Tue Nov 6 18:21:49 2007
From: rogerpack2005 at gmail.com (Roger Pack)
Date: Tue, 6 Nov 2007 16:21:49 -0700
Subject: [distribustream-talk] what algorithm?
In-Reply-To:
References: <966599840710302009w5be8a237w9585630cbec0cd76@mail.gmail.com>
    <966599840710310638x6394381fu70e24fa8b9efca9@mail.gmail.com>
    <966599840710311126p54455e8dl46730da0746e1fdc@mail.gmail.com>
    <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>
    <966599840711060906q3560d14es173dd11a1b3b3000@mail.gmail.com>
Message-ID: <966599840711061521t137c3154pc2be37e7c4990bf1@mail.gmail.com>

> > What I'd find fascinating to build would be a client-only system that
> > 'works' for arbitrary streams. That would be cool. We could use opendht to
> > connect the peers (if we want to--I have libs for it), or let the peers make
> > their own DHT. Then just connect them appropriately. It would be a
> > somewhat decentralized way. That's what my thesis is on mostly :)
> > It depends if we want to research just one special niche, or just make
> > 'something that works'. It might be worth researching into application
> > layer multicast and p2p dht multicast to see if they have some insights in
> > there on good things to do.
>
> You might have a look into the many variations on the BitTorrent theme
> which use a DHT to allow "trackerless" operation.

Are there any industry streaming client-side only programs?
That might be a niche worth considering, though I can see that the other
would be interesting as well (if less scalable, less useful, and harder to
actually get people to use, therefore not as useful to humanity, hence my
preference for client-side only ones).
Take care!
-Roger
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/distribustream-talk/attachments/20071106/d0859974/attachment.html

From rogerpack2005 at gmail.com Tue Nov 6 18:24:01 2007
From: rogerpack2005 at gmail.com (Roger Pack)
Date: Tue, 6 Nov 2007 16:24:01 -0700
Subject: [distribustream-talk] what algorithm?
In-Reply-To: <966599840711061521t137c3154pc2be37e7c4990bf1@mail.gmail.com>
References: <966599840710302009w5be8a237w9585630cbec0cd76@mail.gmail.com>
    <966599840710310638x6394381fu70e24fa8b9efca9@mail.gmail.com>
    <966599840710311126p54455e8dl46730da0746e1fdc@mail.gmail.com>
    <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>
    <966599840711060906q3560d14es173dd11a1b3b3000@mail.gmail.com>
    <966599840711061521t137c3154pc2be37e7c4990bf1@mail.gmail.com>
Message-ID: <966599840711061524v856fe61t7952a35c85326db2@mail.gmail.com>

It would be interesting for peers to be able to volunteer as 'selfless
hosts' that just help stream whatever people want. Maybe hook up some
planetlab peers to do such.
It would also be interesting to see if using lots of peers serving you
different inter-leaved pieces of the file (a la move.tv) would help it or
not. Things that would be interesting.
-Roger
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/distribustream-talk/attachments/20071106/1dfe6da1/attachment.html

From tony at clickcaster.com Tue Nov 6 20:00:14 2007
From: tony at clickcaster.com (Tony Arcieri)
Date: Tue, 6 Nov 2007 18:00:14 -0700
Subject: [distribustream-talk] what algorithm?
In-Reply-To: <966599840711061524v856fe61t7952a35c85326db2@mail.gmail.com>
References: <966599840710302009w5be8a237w9585630cbec0cd76@mail.gmail.com>
    <966599840710311126p54455e8dl46730da0746e1fdc@mail.gmail.com>
    <966599840711011209i65687895l4576278ca2fb1412@mail.gmail.com>
    <966599840711060906q3560d14es173dd11a1b3b3000@mail.gmail.com>
    <966599840711061521t137c3154pc2be37e7c4990bf1@mail.gmail.com>
    <966599840711061524v856fe61t7952a35c85326db2@mail.gmail.com>
Message-ID:

On 11/6/07, Roger Pack wrote:
>
> Are there any industry streaming client-side only programs?

As far as I know there aren't any streaming client-side only programs, at
least none that support segmented downloading from multiple peers.

> That might be a niche worth considering, though I can see that the other
> would be interesting as well (if less scalable, less useful, and harder to
> actually get people to use, therefore not as useful to humanity, hence my
> preference for client-side only ones).

On the contrary, the goal of DistribuStream is "click the link and the video
plays". I don't know how you can get any easier than that.

Scalability is a concern, however...

> It would be interesting for peers to be able to volunteer as 'selfless
> hosts' that just help stream whatever people want. Maybe hook up some
> planetlab peers to do such.

That's certainly in the works, and not particularly difficult to accomplish
given the current implementation. You might have a look at the "piece
proxies" described in the Defcon presentation I did on PDTP 4 years ago:

http://althing.cs.dartmouth.edu/secref/resources/defcon12/dc-12-arcieri.ppt
(see page 18+)

These would effectively be "trusted clients" that the server prioritizes in
terms of chunk picking.

> It would also be interesting to see if using lots of peers serving you
> different inter-leaved pieces of the file (a la move.tv) would help it or
> not. Things that would be interesting.

The protocol already does that.
At present it sets up 10 concurrent transfer slots at a time. Hopefully in
the future concurrent transfers can either be specified by the client upon
registration or determined by the server algorithmically.

--
Tony Arcieri
ClickCaster, Inc.
tony at clickcaster.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/distribustream-talk/attachments/20071106/7ca2a977/attachment.html
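
A rough sketch of the server-directed ("imperative") scheduling discussed in
this thread, under stated assumptions: the Scheduler class, its method names,
the multiplicative weight update, and the greedy per-chunk choice are all
hypothetical and are not taken from the PDTP implementation. The thread
mentions graph-theoretic analysis and "some sort of chaining algorithm"; this
greedy stand-in does not attempt either. Only the default of 10 slots comes
from the message above.

```python
from collections import defaultdict

MAX_SLOTS = 10  # the thread mentions 10 concurrent transfer slots


class Scheduler:
    """Toy server-side model: peers are nodes, (src, dst) pairs are weighted edges."""

    def __init__(self, max_slots=MAX_SLOTS):
        self.max_slots = max_slots
        self.has = defaultdict(set)             # peer -> chunk ids it holds
        self.weight = defaultdict(lambda: 1.0)  # (src, dst) edge -> score

    def add_chunks(self, peer, chunks):
        """Record that a peer now holds the given chunks."""
        self.has[peer].update(chunks)

    def report(self, src, dst, ok):
        """Fold peer feedback into the edge weight (update factors are arbitrary)."""
        self.weight[(src, dst)] *= 1.5 if ok else 0.5

    def commands_for(self, dst, wanted):
        """Return up to max_slots (src, chunk) commands telling dst where to fetch."""
        commands = []
        for chunk in wanted:
            if len(commands) >= self.max_slots:
                break
            # Sort candidate sources so ties are broken deterministically.
            sources = sorted(p for p in self.has if p != dst and chunk in self.has[p])
            if not sources:
                continue  # nobody else holds this chunk yet
            best = max(sources, key=lambda p: self.weight[(p, dst)])
            commands.append((best, chunk))
        return commands
```

For example, with two sources and one successful transfer reported on the
b-to-c edge, the server would send peer c to b for the chunk both hold:

```python
s = Scheduler(max_slots=2)
s.add_chunks("a", {0, 1, 2})
s.add_chunks("b", {1})
s.report("b", "c", ok=True)
print(s.commands_for("c", [0, 1, 2]))  # [('a', 0), ('b', 1)]
```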