Old 23-09-02, 09:16 PM   #1
TankGirl
Madame Comrade
 
Join Date: May 2000
Location: Area 25
Posts: 5,587

How would you connect 10 million peers with decentralized p2p?



When Napster was in its heyday it provided a central meeting place for about a million online music lovers. Two years ago this million-seat filesharing venue was big enough to host almost the entire mp3 trading community. The p2p scene has evolved considerably since. A recent Odyssey study suggested that the number of online music swappers is already 40 million in the US alone. Worldwide there may be 100+ million p2p users, and they are already consuming as much as 60% of the bandwidth of some broadband ISPs. Comparing these figures to the total number of people on the Internet, about 560 million, we see that every fifth or sixth Internet user is already taking part in the fun.

Napster’s achievement of connecting a million people to each other’s shares and within each other’s chat reach was remarkable – both technically and socially. In terms of technical connectivity, decentralized p2p is still about two orders of magnitude below Napster’s centralized performance. There may be two million people simultaneously online on FastTrack or WPN, but not in any single space where they could see each other. No two users are guaranteed to find each other when they connect to the network - in fact they have a low chance of ending up in the same shared space (under the same supernode or under two linked supernodes). With each peer seeing just a fraction (perhaps 10,000 other peers) of the total population of millions, nice community-binding things like hotlists become unfeasible. And as each peer’s view of the content pool is similarly limited, the large network’s potential to provide rare and diverse content for everybody becomes seriously crippled. In other words, there is plenty of room for innovation and improvement!

The present decentralized connectivity level has been achieved with a self-organizing supernode architecture, where a number of powerful nodes volunteer to form a second connectivity level above the grass-roots level of 1-to-1 communications. The end result is not that different from Napster’s centralized architecture. P2p networks also have their own ‘server’ nodes (Gnutella’s ultrapeers, FastTrack’s supernodes, WinMX’s ‘primary connections’) acting as connectivity hosts, message brokers and search engines for the rest of the nodes. And just like Napster’s servers, these volunteering p2p supernodes are linked to each other to provide their local clients a wider view of the community and its resources.

Building connectivity on these two different platforms requires two very different approaches and architectures. It is almost like building two machines to do the same job, one working on Newtonian mechanics and the other on quantum mechanics. In the first case we have a deterministic system with components we know and control. Performance and reliability are built on powerful, reliable and well-maintained hardware. In the latter case we have a non-deterministic collective of unknown components interacting more or less freely (within the boundaries of the software) with each other. They keep appearing and disappearing at random, perform unpredictably and are subject to constant bandwidth fluctuations. There is plenty of collective power - technical and social - in large decentralized networks but it must be collected from a dispersed population of peers and its reliability (if any) must be based on redundancy and statistical behaviour of peer groups.

How to proceed from the supernode system? How to establish a smarter, more efficient connectivity infrastructure that could collect the growing megacommunities back into shared social spaces, into each other’s reach? The design target for next generation p2p should be connectivity for millions rather than for thousands. Should there be a number of 'hypernodes' selected from among the supernodes to form yet another layer of connectivity? If the present supernode architecture fails to scale up due to the increasing 'long distance' search and browse traffic between the linked supernodes, how would you regulate bandwidth usage with the possible additional connectivity layers? Or would you try some other approach altogether?

Think of it as a game with ten million active pieces in it. The pieces represent a wide spectrum of different computing powers, bandwidths and shared contents. You can instruct all pieces how to act in different situations but you cannot change the inherent properties (bandwidth etc.) of any particular piece. Nor can you control the online presence of any given piece although you can observe and analyse it. To make the challenge even tougher, many of the pieces will be shielded with firewalls which prevent them from responding to externally initiated contacts. How would you create a working decentralized hotlist in such an environment, so that an arbitrary peer A would find its hotlisted contact B in a reasonable time (even minutes would be acceptable in a large network) after popping online? And how would you handle the situation where peer B would not want to be found by A (ignore list)?

Your creative thoughts and reflections please, dear Napsterites.

- tg
Old 23-09-02, 10:26 PM   #2
assorted
WAH!
 
Join Date: Apr 2001
Posts: 725

i don't understand why a p2p proggie hasn't used a system similar to what dynamic ip people use to get their own web addresses (like http://www.dyndns.org). every username/password connects to a central server that records their current IP address. When you check someone's hotlist, you check that server, which spits back the ip address currently associated with that username, and then you contact them. It's what audiognome needed ages ago to make their pnp thingie work (and i don't think he ever implemented it).
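
roughly what i'm picturing, just as a sketch (made-up names and addresses, not any real client's code):

Code:
# Rough sketch of the username -> IP lookup idea (hypothetical, not any real client's API).
# A central registry maps a username to its last reported IP/port, like dynamic DNS does for hostnames.

registry = {}  # username -> (ip, port)

def announce(username, ip, port):
    """Called by a peer each time it comes online or its dynamic IP changes."""
    registry[username] = (ip, port)

def lookup(username):
    """Called when you open your hotlist and want to contact a buddy."""
    return registry.get(username)  # None if the user is offline/unknown

# Example: one peer announces itself, a friend looks it up.
announce("assorted", "24.5.6.7", 6699)
print(lookup("assorted"))   # ('24.5.6.7', 6699)
print(lookup("tankgirl"))   # None - not announced yet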

i pointed out that there might be a legal problem with this idea some time ago, and someone responded that if this kind of lookup from a centralized server were illegal, then the whole backbone of the web would be illegal too, for associating a warez url with a numerical ip address. I think that person was correct and this kind of lookup is pretty much on the safe side legally.

If there even was any question, the safest way to do it legally would be to get a 3rd party company to provide the service of connecting username to ip addy. maybe dyndns.org, for example, since they are essentially doing this service already. that would make any legal argument the media conglomerates come up with all the more difficult.
__________________
I hate hate haters
Old 24-09-02, 07:15 AM   #3
kento
Apprentice Napsterite
 
Join Date: Aug 2002
Location: Germany
Posts: 88

wow, good thoughts assorted...whaddayou think TG...could this work?
Old 24-09-02, 03:21 PM   #4
SA_Dave
Guardian of the Maturation Chamber
 
Join Date: May 2002
Location: Unimatrix Zero, Area 25
Posts: 462

I basically agree with assorted, but using DNS is not going to work IMO.
Quote:
If there even was any question, the safest way to do it legally would be to get a 3rd party company to provide the service of connecting username to ip addy.
The problem here is that you're relying completely on a third party with centralised servers! It might not be illegal for them to provide "connectivity", but they are open to intimidation, bankruptcy etc. How do you guarantee that the decentralised networks remain in the people's hands if you're dependent on a centralised, bureaucratic despot?

The solution is that you don't! There has to be a simple way to guarantee connectivity, even if it's only by using distributed host caches. Limewire uses these, which means the network now cannot be shut down, even if all the Limewire, Bearshare etc. caches are forced offline!
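
Roughly, the bootstrap logic would look something like this (a toy sketch with invented cache names and addresses):

Code:
# Minimal sketch of bootstrapping from several distributed host caches (hypothetical data,
# just to illustrate why no single cache is a point of failure).
import random

# Each "cache" is just a list of known entry-point peers (ip, port). In reality these
# would be fetched from independently hosted web caches.
caches = {
    "cache-a": [("10.0.0.1", 6346), ("10.0.0.2", 6346)],
    "cache-b": [("10.0.0.2", 6346), ("10.0.0.3", 6347)],
    "cache-c": [],  # this one is down / empty
}

def bootstrap(caches, want=3):
    """Collect candidate peers from whichever caches respond, then pick a few at random."""
    candidates = set()
    for name, peers in caches.items():
        candidates.update(peers)   # merge results; a dead cache simply contributes nothing
    return random.sample(sorted(candidates), min(want, len(candidates)))

print(bootstrap(caches))  # e.g. [('10.0.0.1', 6346), ('10.0.0.3', 6347), ('10.0.0.2', 6346)]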

Guaranteeing identity in a decentralised environment is another difficult challenge, especially if there are no centralised authorities (login servers, irc or similar chat). Keeping logs of ip-addresses is impractical and useless if you want to guarantee that "some address" is the same as "some peer" you interacted with previously. A dynamic DNS solution is a good idea, but the other problems with it are that it's too static for the purposes of a dynamic p2p network (DNS takes about 2 days to fully refresh) and it also requires that a user register & install a third-party utility. It shouldn't be that complicated for the user!

The problem with today's networks is their uncompromising nature. Direct Connect, eDonkey & even OpenNap, for example, have good content availability because the networks are generally user-controlled. You can create your own hub or server to cater to a particular niche. The problem is that this is done in a relatively centralised fashion, even though the networks are very dynamic. On the other hand, the more distributed systems like Gnutella & FastTrack leave little room for social, content-relevant groupings. There may be lots of content, but you aren't guaranteed that you'll even see it, let alone obtain it.

The solution in my mind is to create a large, completely decentralised network on the backbone of community or thematic groupings (which can generally be assumed to be similar). It's far better to be connected to 50 people who have everything you could possibly be interested in than to a million who only share naked Brittany pics. It would be ideal to be connected to millions, but this is only necessary if you have extremely broad interests. This class of individuals shouldn't be alienated, however: some might consider an mp3-only sharer to belong to a broader category, while others may not, depending on the actual specifics of the content involved.

So to sum up, I believe that relevant connectivity & visibility are more important factors than the number of arbitrary peers that you're connected to.
Old 24-09-02, 04:22 PM   #5
kento
Apprentice Napsterite
 
Join Date: Aug 2002
Location: Germany
Posts: 88

hmmm....SA_Dave (forgive me i keep wanting to call you "super dave" for some reason) i might be guilty of misunderstanding your post....but here i go again...(hehe...not understanding has never stopped me before so i will proceed)

okay i see what you mean about DNS companies like "deerfield" (dns2go.com) etcetera being open to intimidation by the various and sometimes "nefarious" parties that are interested in rooting out this new scourge called "p2p"

okay so what about letting the "supernodes" do the "dns" work for us? (how? i dunno) maybe that was what you meant by 'distributed host caches'. if it wasn't, can you please explain what is meant by "distributed host caches" as i am not familiar with that term.

wow..that wasn't as bad as i thought it was gonna be.

thanks, kento
Old 24-09-02, 09:26 PM   #6
Scyth
Registered User
 
Join Date: Apr 2001
Location: Vancouver, Canada
Posts: 454

While I'm not sure how future-generation scalable decentralized file-sharing networks will be organized, I'm sure they won't be organized around ultrapeers. Here's why:

Say you have 100 nodes that are sharing files, and you wish to search the file libraries of these nodes. The most obvious solution, and the one used by the original (0.4) GnutellaNet, is to send a query to every node. Now, if the query traffic of the network is 20KB/s, then every node must be able to support this level of traffic.

An improvement on this scheme is to use ultrapeers. One node out of the hundred is selected, all the other nodes report the contents of their libraries to it, and only that node is queried. This is somewhat of an improvement as it means that 99 out of the 100 nodes need support only a very small level of traffic. However, one node is still left needing to support the full 20KB/s of query traffic.

Consider an alternate solution: instead of every node reporting its library to a single other node, have every node report its library to every other node. With this configuration, each query needs to be sent to only one of the nodes. If the queries are evenly divided between the nodes, then each node need only handle 0.2KB/s of traffic.

It seems to me that the alternate solution is the better one. It still isn't perfect, however: at some point the overhead involved in reporting the library contents to every other node will outweigh the reduction in query traffic.
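
To put rough numbers on the three schemes (a toy calculation using the example figures above, ignoring the index-reporting overhead just mentioned):

Code:
# Back-of-the-envelope comparison of the three schemes for the example above:
# 100 nodes, 20 KB/s of total query traffic.

nodes = 100
query_traffic = 20.0  # KB/s across the whole network

# Flooding (Gnutella 0.4): every query reaches every node.
flood_per_node = query_traffic                 # 20 KB/s each

# Single ultrapeer: one node indexes everyone and answers all queries.
ultrapeer_load = query_traffic                 # 20 KB/s for that one node
leaf_load = 0.0                                # leaves see (almost) no query traffic

# Full index replication: every node holds every library, queries split evenly.
replicated_per_node = query_traffic / nodes    # 0.2 KB/s each

print(flood_per_node, ultrapeer_load, replicated_per_node)  # 20.0 20.0 0.2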

The problem with scalability in all current decentralized file-sharing networks is that adding nodes hurts the network. Using superpeers doesn't eliminate this problem, either; it simply causes a linear reduction in the apparent network size. In order to be scalable, adding nodes must not harm a network. Moreover, there's no reason to think a solution to this problem isn't possible. Each new node on a network contributes to the network's aggregate capacity. Why shouldn't this strengthen the network, rather than harm it?
Old 25-09-02, 12:26 AM   #7
greedy_lars
everything you do
 
Join Date: Dec 2000
Location: wlll come back around to you
Posts: 3,982

Put up some bad-ass hard-wired servers in Sealand. Or in rebel-held Colombia.
Old 25-09-02, 04:56 PM   #8
TankGirl
Madame Comrade
 
Join Date: May 2000
Location: Area 25
Posts: 5,587

Quote:
Originally posted by SA_Dave
So to sum up, I believe that relevant connectivity & visibility are more important factors than the number of arbitrary peers that you're connected to.
A good sum-up, SA_Dave. I also see relevant connectivity and relevant visibility as key concepts in tackling the decentralized scalability challenge. We can’t - and we don’t need to - be directly connected to all 10 million peers. What we want is to be well connected to those peers that are somehow useful, interesting or important to us. The same applies to the visibility of the content. We can live with slow searches from the total content pool if we have a good direct visibility to the stuff we are really interested in and to the people sharing it.

Quote:
Originally posted by Scyth
The problem with scalability in all current decentralized file-sharing networks is that adding nodes hurts the network. Using superpeers doesn't eliminate this problem, either; it simply causes a linear reduction in the apparent network size. In order to be scalable, adding nodes must not harm a network. Moreover, there's no reason to think a solution to this problem isn't possible. Each new node on a network contributes to the network's aggregate capacity. Why shouldn't this strengthen the network, rather than harm it?
Another good sum-up, and I agree 100 % with your conclusions, Scyth.

Like you say, each joining node contributes to the total capacity, to the total technical and social potential of the network. The quality and the efficiency of the connectivity infrastructure determine how much of this potential can be put to good use.

- tg
Old 29-09-02, 06:04 AM   #9
multi
Thanks for being with arse
 
Join Date: Jan 2002
Location: The other side of the world
Posts: 10,343

scuse my silly questions.....
would it be possible for each user to broadcast
what would be similar to a dynamic ip? maybe an encrypted version or something?
that only a host or client with the right software could find using ....say a search?
__________________

i beat the internet
- the end boss is hard
Old 30-09-02, 04:15 PM   #10
TankGirl
Madame Comrade
 
Join Date: May 2000
Location: Area 25
Posts: 5,587

Quote:
Originally posted by assorted
i don't understand why a p2p proggie hasn't used a system similar to what dynamic ip people use to get their own web addresses (like http://www.dyndns.org). every username/password connects to a central server that records their current IP address. When you check someone's hotlist, you check that server, which spits back the ip address currently associated with that username, and then you contact them. It's what audiognome needed ages ago to make their pnp thingie work (and i don't think he ever implemented it).

i pointed out that there might be a legal problem with this idea some time ago, and someone responded that if this kind of lookup from a centralized server were illegal, then the whole backbone of the web would be illegal too, for associating a warez url with a numerical ip address. I think that person was correct and this kind of lookup is pretty much on the safe side legally.

If there even was any question, the safest way to do it legally would be to get a 3rd party company to provide the service of connecting username to ip addy. maybe dyndns.org, for example, since they are essentially doing this service already. that would make any legal argument the media conglomerates come up with all the more difficult.
Quote:
Originally posted by multi inter user face
scuse my silly questions.....
would it be possible for each user to broadcast
what would be similar to a dynamic ip? maybe an encrypted version or something?
that only a host or client with the right software could find using ....say a search?
Both of these ideas relate to the challenge of getting connected to the network in the first place. For that challenge DNS is a useful mapping system and can be used in a number of ways. SA_Dave rightly pointed out the problems of centralized DNS services, so we don’t want to be in any way dependent on them. But that does not prevent us from using these services to our benefit, should they be conveniently available and usable without compromising peer security or privacy.

There is no need for every peer to regularly publish or broadcast its own port/IP info to the outside world. It is enough that a dedicated fraction of peers will provide some kind of DNS-mapped entry point service to the network. There are many ways to implement this, and some of them could be used side by side.

As suggested by assorted, some peers could make themselves publicly available on dynamic DNS addresses, acting directly as ‘doormen’ to the network. This approach might work at least for peers who wouldn’t mind revealing their IP addresses publicly and using a fixed port for the job.

Peers could also publish ‘host caches’ (IP/port listings of active peers providing entry point services) on personal websites. Equipping a p2p client with an FTP interface to upload such listings would not be too big a job. This approach would have many benefits. The bandwidth load of the entry point service would be on your ISP’s server and not on your own line. You would not need to reveal your own IP number and network identity in the lists you publish - a random selection of available peers would do, and would also help to balance the load between entry point peers. And using encryption you could also publish entry point lists for closed groups. A few pages on private websites would be enough to keep a smaller private group well connected; a few thousand similar pages could handle a population of millions, providing a wide and hard-to-attack entry surface from the public DNS-mapped webspace.
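
As a rough sketch of the uploading side (hostname, credentials and file name invented; the encryption for closed groups is left out):

Code:
# Sketch of a client publishing a small host cache to a personal website over FTP.
# The server name, login and file name below are purely hypothetical.
import io
import random
from ftplib import FTP

def publish_host_cache(known_peers, ftp_host, user, password, want=20):
    """Upload a random selection of active peers (not our own address) as a plain text list."""
    selection = random.sample(known_peers, min(want, len(known_peers)))
    listing = "\n".join(f"{ip}:{port}" for ip, port in selection).encode()
    with FTP(ftp_host) as ftp:             # connect to the ISP's web server
        ftp.login(user, password)
        ftp.storbinary("STOR hostcache.txt", io.BytesIO(listing))

# Usage (hypothetical): publish_host_cache(active_peers, "ftp.example-isp.net", "me", "secret")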

Apart from the challenge of getting connected to the network, there is the challenge of re-establishing your social connectivity in it. Assuming you can hook yourself in with the help of peer X, whose IP address and port you got from a known web address, your next question to X is: "where are my friends A, B and C?" Or: "can you find me somebody from group G so that I can connect to it?" As peer X (like any other peer) can know the IP addresses and ports of only a small fraction of the entire 10 million peer population, there must be a smart infrastructure to help peers search for each other by identity. If we have an internal connectivity system where any gatekeeper peer can help you find the specific peers and groups you are looking for from among 10 million sparsely connected peers, say in a minute or two, we can build working decentralized hotlists and communities on it.
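
To illustrate what such a lookup has to achieve, here is a deliberately naive sketch where an identity query is simply forwarded hop by hop with a hop limit - exactly the kind of brute force a smarter infrastructure would have to replace (all names invented):

Code:
# Naive illustration of the "where are my friends?" problem: peer X only knows a handful
# of neighbours, so an identity query has to be forwarded hop by hop. The tiny topology
# below is made up; flooding like this would not scale to 10 million peers.

network = {                      # peer -> peers it currently knows
    "X": ["P1", "P2"],
    "P1": ["P3"],
    "P2": ["P3", "B"],
    "P3": [],
    "B": [],
}

def find_peer(start, target, ttl=4):
    """Breadth-first forwarding of an identity query with a hop limit."""
    frontier, seen = [start], {start}
    for _ in range(ttl):
        next_frontier = []
        for peer in frontier:
            for neighbour in network[peer]:
                if neighbour == target:
                    return f"{target} found via {peer}"
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return f"{target} not found within {ttl} hops"

print(find_peer("X", "B"))   # B found via P2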

- tg