data integrity on p2p networks - an approach - P2P-Zone

indiana_jones · 04-03-02, 04:40 AM

with this thread i want to start some discussion about data integrity on sharing networks.
data integrity in principle concerns all data, but especially data that can do damage, like software, scripts, codecs etc.
i only want to consider the bad cases, the cases where data are infected or damaged somehow, willingly or unwillingly, and how to detect or avoid this.
(for short i call bad data simply infected, i.e. they may contain an unknown new virus or a piece of added malicious code).

where or when are the sources of infection?

1. the originator infects the data and puts them on p2p
2. a downloader and sharer infects the data and reshares them
3. data get somehow infected or damaged on their way over different p2p networks

all these things can happen without any change to the main characteristics like name, filesize or other details used by p2p networks to identify data. even the hashes are no real integrity criteria, because they are not used consistently in p2p nets (e.g. gnutella), are not calculated exhaustively (e.g. fasttrack only uses sample blocks of a file) and mainly differ from net to net, so they cannot be used consistently on all networks.
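A small sketch of why a hash over sample blocks is a weaker check than a hash over the whole file. The sampling scheme here (the first 4 bytes of every 16-byte window) is made up purely for illustration and is not the real fasttrack algorithm; `sampled_digest` and `full_digest` are hypothetical names.

```python
import hashlib


def sampled_digest(data: bytes, block: int = 4, stride: int = 16) -> str:
    """Hash only the first `block` bytes of every `stride`-byte window.

    Hypothetical sampling scheme, chosen only to show the weakness.
    """
    h = hashlib.md5()
    for start in range(0, len(data), stride):
        h.update(data[start:start + block])
    return h.hexdigest()


def full_digest(data: bytes) -> str:
    """Hash every byte of the data."""
    return hashlib.md5(data).hexdigest()


original = bytes(range(64))

# Flip one byte that lies between two sampled blocks (byte 10).
modified = bytearray(original)
modified[10] ^= 0xFF
modified = bytes(modified)

# The sampled hash never reads byte 10, so it cannot see the change.
sampled_match = sampled_digest(original) == sampled_digest(modified)
# The exhaustive hash reads every byte, so the digests differ.
full_match = full_digest(original) == full_digest(modified)

print("sampled digests equal:", sampled_match)
print("full digests equal:   ", full_match)
```

Anything added or changed in the unsampled regions goes unnoticed by the sampled digest, which is the gap an exhaustive calculation is meant to close.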

from all this i would say it is not possible for any user on any net to determine if just downloaded data are infected or not.

the only way to prove 100% whether data are infected or not would be to compare them to the original, but this is not how p2p works.

a second way is to have a characteristic which proves with a very high probability that data are identical to the original.

one way i know to do this is a signature which must be of a certain length and exhaustively calculated - which means it must make sure that every bit has its correct value and is in its correct position.

this signature has to be calculated by an open source tool that is independent of any p2p network client and placed in public places so it can be checked and commented on by users - this is the only way to cover case 1 (the bad originator).

all other cases are covered by this tool just using it after download on the data, recalculating the signature and comparing it to the published one.

i thought of a 32 byte 2*md5 hash, one in forward and one in backward direction (block by block), which gives a 44 character uuencoded hash string.
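A rough sketch of how such a signature could be computed and checked, under a few assumptions the post does not settle: the block size (64 KiB here), the reading of "backward" as visiting the same blocks from last to first, and base64 as the text encoding (32 bytes come out as 44 characters). `double_md5_signature` is a hypothetical name, and md5 is used only because the post names it.

```python
import base64
import hashlib
import io

BLOCK_SIZE = 64 * 1024  # assumed block size; the post does not give one


def double_md5_signature(stream, block_size: int = BLOCK_SIZE) -> str:
    """Return a 44-character signature for a seekable binary stream.

    One md5 runs over the blocks first to last, a second md5 runs over
    the same blocks last to first. The two 16-byte digests are joined
    and base64-encoded.
    """
    forward = hashlib.md5()
    backward = hashlib.md5()

    size = stream.seek(0, io.SEEK_END)

    # Forward pass: every block, in file order.
    stream.seek(0)
    for chunk in iter(lambda: stream.read(block_size), b""):
        forward.update(chunk)

    # Backward pass: same block boundaries, visited from the end.
    last_start = ((size - 1) // block_size) * block_size if size else 0
    for start in range(last_start, -1, -block_size):
        stream.seek(start)
        backward.update(stream.read(block_size))

    raw = forward.digest() + backward.digest()  # 32 bytes
    return base64.b64encode(raw).decode("ascii")


# The originator publishes the signature of the original data ...
original = io.BytesIO(b"example payload " * 10000)
published = double_md5_signature(original)

# ... and a downloader recalculates it after download and compares.
good_copy = io.BytesIO(b"example payload " * 10000)
bad_copy = io.BytesIO(b"example payload " * 9999 + b"example payl0ad ")

print(len(published), "characters")
print("good copy matches:", double_md5_signature(good_copy) == published)
print("bad copy matches: ", double_md5_signature(bad_copy) == published)
```

For a real file the stream would come from `open(path, "rb")`. Note that a file smaller than one block gets the same digest in both directions, so the backward pass only adds something for multi-block files.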

remarks, comments or anything else would be very welcome.
indy

Mowzer · 04-03-02, 06:16 AM

The most common virus infection that hits file share users, I've noticed, often doesn't originate from the file share software itself.

Email worms that spread .vbs infections to .mp3s and .jpgs are common. A user reads email, and due to stupidity ends up opening a worm. This in turn infects mp3s and jpegs by changing extensions, to madonna.mp3.vbs for example.

The users may have a huge mp3 collection, and although they get the virus taken care of, often they miss one or two mp3 files. Since they may have had kazaa or grokster running, and grokster and kazaa used to hide double extensions, other users were getting the same worms.
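A minimal sketch of how a client or a cleanup script could flag names like madonna.mp3.vbs. The two extension sets and the name `has_disguised_extension` are assumptions for illustration, not a complete list and not how any particular client does it.

```python
import os

# Assumed, incomplete lists for illustration only.
MEDIA_EXTS = {".mp3", ".jpg", ".jpeg"}
SCRIPT_EXTS = {".vbs", ".exe", ".scr", ".pif", ".bat"}


def has_disguised_extension(filename: str) -> bool:
    """True if a media extension is directly followed by a script extension."""
    stem, last = os.path.splitext(filename.lower())
    _, inner = os.path.splitext(stem)
    return last in SCRIPT_EXTS and inner in MEDIA_EXTS


for name in ["madonna.mp3.vbs", "madonna.mp3", "holiday.JPG.exe", "setup.exe"]:
    print(name, "->", has_disguised_extension(name))
```

This only looks at the name, so it catches the renaming trick but says nothing about what is actually inside the file.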

In this type of case the user doesn't realize they are sharing infected files.

Why some people don't clean up messes on their machines is beyond me. I know a girl who has her messenger sending out the message "Hi, I have new pics, take a look me.jpg(1)(@)(2)(1).exe" for others to download.

I told her she has a messenger virus running, and her reply was, "I know, it's been like that for months, I just tell my friends not to click it."

This is confusing, as I sent her a link to an easy step by step clean up article, and she read it, yet for some reason just never bothered cleaning it up.

I think you're right indy, about it being impossible to determine for all p2p nets if a file is damaged, without comparison to the original.

"this signature has to be calculated from a p2p network client independent open source tool and placed in public places so it could be proved and commented by users - this is the only way to cover case 1 (the bad originator)."

With the search page I have been working on, I have it so you can reverse check hashes or signatures, for verification purposes. But I am not really sure of the real need for that.

"all other cases are covered by this tool just using it after download on the data, recalculating the signature and comparing it to the published one.

i thought of a 32 byte 2*md5 hash, one in forward one in backward direction (block by block) which gives a 44 character uuencoded hash string."

For that to be really successful there needs to be a set of widely used standards, as well as unity on it throughout all p2p nets. In order for it to be used, it would need to be a built in feature of the p2p programs. It would also have to be a very easy to use feature.

It's good you posted this. Should be neat to see what others think about data integrity.

goldie · 04-03-02, 09:55 AM

What a good idea!! Seems to me that file sharing's biggest weakness IS that there is no way to be absolutely sure of what it is you're downloading and whether it contains harmful code or not.

We all take a chance of getting burned when using any of these programs.

It's already been proven that virus scanners can't be trusted to pick up EVERYthing; malicious users are becoming increasingly inventive at writing, spreading and hiding bad code in innocent-looking files. Bad or malicious files can conceivably cripple an entire network, and I'm surprised the owners of these networks/programs haven't tried to find an effective way to protect their own communities!!

Perhaps it's an impossibility...........

IF there was a way to verify a file's integrity across the entire file sharing community, it could ONLY be seen as the most miraculous creation ever invented. It'd be right up there with the ability to download millions of types of files from millions of strangers clear across the world!!

I'm not technically inclined, I don't have to tell ya that, BUT I'd find a way, within my means, to support ANYone or any organization willing to undertake this difficult task!

TankGirl · 04-03-02, 08:18 PM

The problem of infected files is closely related to the problem of faked files - faked in content or in quality. All these problems call for similar cures: trusted sources providing the original files and publishing their signatures, and the utilization of these signatures in the p2p software. This is in no way an unrealistic development target: there are good enough hash methods to provide unique and compact signatures for any sharable digital objects, and upgrading the existing p2p protocols and clients to support a reasonably common signature standard should not be too hard for the programmers.

Trusted sources imply permanent, verifiable identities and an evolving network of trust relationships between them. At first this may sound like a dangerous and alien approach for an average p2p user who is (wisely) used to keeping a low profile in the presence of the copyright nazis. But trust networks need not and should not be public except for those that wish to go public, including p2p-friendly artists to whom generic file signatures would give a good tool to help their own quality rips get distributed in the networks. For the majority of users wishing to take part in the sharing in high privacy, the root level technology (encryption, IP hiding etc.) should provide just that - a safe and enjoyable sharing environment with full control over one's sharing, privacy and forming of trusted personal relationships.

- tg
