NOTICE: This channel is no longer actively logged.
[00:02:08] <choykloun> question - are there any recommendations for max extra overhead etc ?
[00:02:15] <choykloun> ive only heard max packet size 1200 bytes
[00:02:30] <choykloun> and i know average dht client traffic etc so i do have some idea
[00:03:10] <choykloun> also
[00:03:18] <choykloun> building darknets based on dht is a fucking excellent idea
[00:03:41] <The_8472> you might want to look into I2P
[00:03:51] <choykloun> the thing is
[00:03:55] <choykloun> dht has several million users
[00:03:56] *** bt42 has joined #BitTorrent
[00:04:17] <choykloun> small groups of ppl using it for darknets would be totally impossible for outsiders to locate/map/target/etc
[00:04:32] <The_8472> as for the overhead... consider 8 IPv6 node addresses (ip, port, id), 8 IPv4 node addresses, ~30 IPv6 peer addresses, regular packet overhead
[00:04:39] <The_8472> those are the biggest possible packets
[00:04:58] <The_8472> IPv6 get peers containing nodes, nodes6 and values
[00:05:14] <The_8472> shouldn't really happen, but it's the worst case
[00:06:55] <choykloun> k
[00:07:07] <choykloun> coz i was thinking of what kinda crypto info i might be able to sneak into them
[00:07:52] <The_8472> as little as possible ^^
[00:08:30] <The_8472> if we'd start HMACing values then separate packets would be required for them
[00:08:38] <The_8472> since nodes lists wouldn't fit into them anymore
[00:08:44] <choykloun> well apparently attempting to initiate curve25519 with 1.7M nodes wasn't any performance issue so there might be some room
[00:09:00] <choykloun> well
[00:09:07] <choykloun> check out http://nacl.cace-project.eu/box.html
[00:09:29] <choykloun> encrypt+sign without increasing size of message
[00:09:39] <choykloun> however both parties must have each others public keys
[00:10:37] <The_8472> with 32byte node IDs this could be made implicit for one side.
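[Editor's note: the worst-case get_peers response The_8472 describes can be sized with a quick sketch. The byte counts assume the standard compact formats (IPv4 node entry = 20-byte ID + 4-byte IP + 2-byte port; IPv6 node entry = 20 + 16 + 2; IPv6 peer entry = 16 + 2); the 8/8/~30 counts are taken from the chat above.]

```python
# Rough size of the worst-case IPv6 get_peers response described above:
# "nodes" (8 IPv4 entries of 26 bytes), "nodes6" (8 IPv6 entries of 38 bytes),
# and "values" (~30 IPv6 peers, one bencoded 18-byte string each).

def bstr(n):
    """Size of a bencoded byte string of payload length n: '<len>:<payload>'."""
    return len(str(n)) + 1 + n

nodes   = bstr(8 * 26)        # one compact string for 8 IPv4 nodes
nodes6  = bstr(8 * 38)        # one compact string for 8 IPv6 nodes
values  = 2 + 30 * bstr(18)   # "l...e" list of 30 bencoded 18-byte strings

payload = nodes + nodes6 + values
print(payload)  # 1152 -- before dictionary keys, node id, and transaction id
```

At roughly 1152 bytes before framing, this sits right at the 1200-byte budget mentioned above, which is why The_8472 notes that HMACed values would no longer fit alongside the node lists.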
[00:11:44] <choykloun> ya
[00:12:23] <choykloun> and you could do KEX quite secure against MITM also with minimal overhead
[00:17:10] <choykloun> what about having a system where nodes learn eachothers pubkeys first time they speak
[00:17:13] <choykloun> just like ssh
[00:19:06] <The_8472> we wouldn't want that
[00:19:08] <alus> The_8472: I'm interested in supporting scrapes, yes
[00:19:25] <The_8472> alus, excellent. what about the bloom filters part?
[00:19:40] <The_8472> or the specific proposal i've made so far
[00:20:29] <The_8472> i.e. doing more than just sending seeds/peer counts as integers.
[00:20:33] <alus> I need to think about it more before I'll have a useful response
[00:20:39] <The_8472> ok
[00:23:00] <choykloun> would having a way to authenticate nodes be a reasonable idea?
[00:23:11] <choykloun> i dont see any reason why not if it can be done with minimal overhead
[00:23:17] <The_8472> authenticate by which measure?
[00:23:27] <TheSHAD0W> Just had a silly idea for reducing the ability for a denial of service for DHT lookup...
[00:23:37] <choykloun> well everyone has a keypair in addition to the nodeid
[00:23:54] *** bittwist has quit IRC
[00:23:56] <choykloun> first time nodes contact each other and deem it relevant they exchange pubkeys
[00:24:20] <The_8472> you don't want to do that for millions of nodes
[00:24:31] <choykloun> note 'deem it relevant' :-)
[00:24:45] <TheSHAD0W> How big is a lookup for DHT? 20 bytes?
[00:24:48] <TheSHAD0W> Or is it variable?
[00:25:01] <The_8472> hrm?
[00:25:06] <The_8472> you mean the packet size?
[00:25:07] <choykloun> lookup what what?
[00:25:19] <TheSHAD0W> The torrent ID.
[00:25:24] <The_8472> yeah, 20 bytes
[00:25:25] <choykloun> its not variable no
[00:26:29] <TheSHAD0W> Well, my idea is, if a peer doesn't find more than a certain number of other peers within a certain amount of time, it starts searching at sha(ID+'\00'), sha(ID+'\01'), etc.
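[Editor's note: TheSHAD0W's fallback idea (what The_8472 later calls "diversification" in the AZ DHT) amounts to deriving alternate lookup targets by appending a counter byte to the infohash. A minimal sketch; the function name and counter width are illustrative, not from any client.]

```python
import hashlib

def diversified_targets(infohash: bytes, n: int):
    """Yield alternate 160-bit targets: sha1(infohash + b'\\x00'), sha1(infohash + b'\\x01'), ..."""
    for i in range(n):
        yield hashlib.sha1(infohash + bytes([i])).digest()

# If a lookup on the primary target finds too few peers in time, a client
# could retry on these derived points, spreading one swarm over several
# independent regions of the keyspace.
targets = list(diversified_targets(hashlib.sha1(b"example torrent").digest(), 4))
print(len(targets), len(targets[0]))  # 4 20
```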
[00:26:40] <choykloun> the_8472: btw even if you had exchanged keys with 1M nodes that'd only mean 32e+6 bytes to store, which is probably much less than you download daily anyways :>
[00:27:16] <choykloun> if you want to find peers just run my experimental implementation with the proper settings
[00:27:16] <The_8472> well... we'd keep it in memory
[00:27:19] <choykloun> :)
[00:27:39] <choykloun> the_8472: only needed when actually talking with that node
[00:27:44] <The_8472> TheSHAD0W, we call that diversification in the AZ dht :)
[00:27:55] <TheSHAD0W> Damn!
[00:27:58] <TheSHAD0W> You guys are good!
[00:28:00] <TheSHAD0W> ;-)
[00:28:22] <The_8472> choykloun, yeah. but if you dump it to the disk you have to do disk reads all the time, not good either
[00:28:25] <choykloun> do you mean + as in appending, adding or XORing? :)
[00:28:40] <TheSHAD0W> I was talking appending.
[00:28:42] <The_8472> appending, since it's sha1
[00:28:47] <choykloun> the_8472: im not talking about using it for every exchange or such :)
[00:28:54] <choykloun> ah
[00:29:01] <choykloun> now i see what you mean
[00:29:26] <choykloun> ive done more or less the same thing
[00:29:33] <choykloun> as late as yesterday :)
[00:29:40] <The_8472> we also encrypt stores to make sure they can't be faked without knowledge what you're storing
[00:29:44] <The_8472> prevents sybil attacks
[00:29:51] <choykloun> but then my implementation isnt exactly soemthing you'd use in a client
[00:29:52] <The_8472> and passive eavesdropping
[00:30:58] <choykloun> thats also simiar to how i currently derive encryption keys from the agreed secret :)
[00:31:23] <choykloun> (and usually do it)
[00:32:09] <The_8472> well, that's how we derive keys in the obfuscation protocol
[00:32:14] <The_8472> it's common practice i think
[00:32:20] <choykloun> ya
[00:33:23] <choykloun> i use XOR with the first 4 bytes of the hash though
[00:33:33] <choykloun> to avoid having to have an additional buffer
[00:35:39] <choykloun> and in certain other cases its beneficial to do it that way for security reasons
[00:43:28] <choykloun> (i'm also a bit damaged from years of implementing crypto under circumstances where the difference in code length would make it fail and so on :P)
[00:53:33] <choykloun> oh by the way, something i just thought of: why the !@#!@ is 'values' a list instead of just a compact format in get_peers responses?
[00:55:54] *** Snoopotic has quit IRC
[00:55:57] <The_8472> it's actually a good thing
[00:56:01] <The_8472> ipv6
[01:01:28] <choykloun> whats wrong with having different string names? :p
[01:02:34] <TheSHAD0W> If you're going to revamp the protocol, you might as well do that too...
[01:02:35] <The_8472> you have to specify things
[01:02:45] <TheSHAD0W> Have a list of compact strings, either 4 bytes or 16.
[01:03:12] <The_8472> doesn't really matter though. you could squeeze in maybe 3-4 more addresses... that's it
[01:03:19] <The_8472> you're only losing 2 bytes per address
[01:03:34] <The_8472> or 3 in the case of v6
[01:04:06] <choykloun> all crypto stuff i do atm is backwards compatible by the way
[01:07:04] <choykloun> anyways.. the great opportunity now is that we can fix stuff BEFORE the bad guys even know about / are using it
[01:07:56] <choykloun> by the time ISPs/govts/etc starts messing with DHT most users could already run clients that prevents it, and so on..
[01:09:05] <The_8472> well, ECDH is only needed against mitm attacks though
[01:09:13] <The_8472> everything else can be done just with hashing
[01:09:42] <The_8472> someone once described hashes as the workhorses of cryptography
[01:09:53] <choykloun> haha yeah
[01:10:23] <choykloun> but check out the nacl curve25519 based encrypt+sign function :)
[01:11:04] <choykloun> both parties know each others pubkeys? great, message encrypted and signed without being a single byte longer!
[01:43:25] <alus> choykloun: http://code.google.com/p/obstcp/
[01:43:42] <alus> curve25519 is fun.
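[Editor's note: The_8472's "2 bytes per address, or 3 for v6" figure for the list-of-strings 'values' format comes straight from bencoding: each list element repeats a '<len>:' prefix that a single concatenated compact string (the hypothetical alternative discussed above) would pay only once. A quick sketch, assuming 6-byte IPv4 and 18-byte IPv6 peer entries:]

```python
# Overhead of "values" as a bencoded list of strings versus one compact string.

def list_size(n, addr):
    """Bencoded list of n address strings: 'l' + n * '<len>:<addr>' + 'e'."""
    return 2 + n * (len(str(addr)) + 1 + addr)

def compact_size(n, addr):
    """One concatenated compact string: '<total>:<addr * n>'."""
    total = n * addr
    return len(str(total)) + 1 + total

for addr, fam in ((6, "IPv4"), (18, "IPv6")):
    saved = list_size(30, addr) - compact_size(30, addr)
    print(fam, saved)  # bytes saved for 30 peers: IPv4 58, IPv6 88
```

That is just under 2 bytes per IPv4 address and 3 per IPv6 address for a 30-peer response, matching The_8472's estimate that compacting would only squeeze in another 3-4 addresses.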
[01:51:45] *** Andrius has quit IRC
[01:53:03] <TheSHAD0W> News update:
[01:53:31] <TheSHAD0W> The Nigerian in this latest airplane incident refused to come out of the airplane toilet because he had food poisoning.
[01:53:38] <TheSHAD0W> It was a bomb of a different sort. :-P
[02:04:07] <The_8472> maybe they should just pressurize the cabins more. like 30bar ... and install nozzles under each seat, so passangers could just sucked out at lightning speed if they pose a danger
[02:04:18] <The_8472> though it would make quite a mess
[02:04:41] <Switeck> plane's skin is too thin
[02:05:01] <Switeck> if they don't let some air out when they go to high altitude, it'd pop leaks
[02:05:28] <The_8472> details
[02:05:39] <The_8472> subs with wings!
[02:05:44] <The_8472> reverse subs with wings!
[02:05:56] * The_8472 pictures that design
[02:06:04] * The_8472 rofls
[02:06:21] * TheSHAD0W plucks off a wing and starts chewing on it
[02:06:28] * TheSHAD0W reaches for a drumstick next
[02:06:39] * The_8472 facepalms
[02:06:47] <TheSHAD0W> Yummy meat!
[02:07:44] <TheSHAD0W> BTW, it just hit freezing; it'll be in the mid-twenties tonight.
[02:07:55] <TheSHAD0W> Gotta love this global warming!
[02:08:45] <Switeck> I've been seeing 80 F weather in January here
[02:08:51] <Switeck> for the past 3 years
[02:08:54] <TheSHAD0W> Where is here?
[02:08:59] <Switeck> Alabama
[02:09:06] <TheSHAD0W> ...
[02:09:21] <kjetilho> it's a sham. in ten years time, we will be saying, "remember the IPCC?" and snicker, just like we do now about Y2K.
[02:09:41] <TheSHAD0W> Got a friend in Bama who certainly hasn't been complaining about heat.
[02:10:04] <Switeck> I'm close to Florida, northern 'Bama is much cooler
[02:19:58] <TheSHAD0W> kjetilho: Y2K was a self-correcting scare; if people hadn't paid attention to it, it would've been bad.
[02:20:35] <Switeck> what's the one with computer's jiffy clocks flipping over in a couple years?
[02:21:22] <TheSHAD0W> ?
[02:21:41] <TheSHAD0W> The 32-bit date is going to break in 2034 or so...
[02:21:52] <Switeck> that might be it
[02:22:09] <TheSHAD0W> 2038.
[02:22:15] <TheSHAD0W> http://en.wikipedia.org/wiki/Year_2038_problem
[02:24:01] <kjetilho> TheSHAD0W: and how is Nature's cycle different?
[02:26:23] <TheSHAD0W> Uh, because it's not as easy to analyze and is open not just to interpretation but apparently to manipulation?
[02:26:28] *** Waldorf has quit IRC
[03:13:39] <TheSHAD0W> http://adweek.blogs.com/adfreak/freakiest09popup22.html - look what global warming is going to do next!
[03:14:22] <choykloun> it sucks that curve25519 doesnt have a normal signing function yet :( :( :(
[03:16:04] <choykloun> i doubt you can just do like RSA and encrypt with the public key and maintain security .. ?
[03:16:24] <choykloun> feels like it'd even reveal your private key or so..
[03:17:00] <TheSHAD0W> If it's a true public key, then no.
[03:17:20] <TheSHAD0W> Otherwise anyone could encrypt something with the public key and break the security.
[03:17:25] <The_8472> signing with DH can be achieved by generating an ephemeral keypair, generating a key exchange, encrypting the hash and publishing hash + ephemeral private key
[03:17:47] <The_8472> but that's bulky compared to most signing schemes
[03:19:19] <The_8472> hrrm... no
[03:19:35] <choykloun> i meant signing the dh public value
[03:19:35] <The_8472> people actually can't verify it then since they lack your private key xD
[03:19:40] <choykloun> to prevent mitm
[03:19:53] <The_8472> with what would you sign it? ...
[03:20:08] <choykloun> you know like ssh and ssl and everyone else does it
[03:20:10] <choykloun> well
[03:20:29] <choykloun> maybe i wanna experiment with ssh-style hostkeys for nodes
[03:20:38] <The_8472> ssh can only sign ephemeral keypairs because you have persistent, previously known ones
[03:20:42] <choykloun> maybe i wanna do a dht-based darknet
[03:20:43] <choykloun> etc.
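[Editor's note: the 32-bit time_t rollover TheSHAD0W corrects himself on lands at a precise moment, easy to check:]

```python
from datetime import datetime, timezone

# A signed 32-bit time_t overflows one second after this instant:
last = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print(last.isoformat())  # 2038-01-19T03:14:07+00:00
```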
[03:20:45] <choykloun> exactly
[03:21:03] <choykloun> but im discussing the problem of doing signed DH using curve25519 all the way
[03:21:06] <The_8472> in a DHT where you don't personally know 99.9..% of the nodes that's useless
[03:21:13] <choykloun> because there currently isnt any function that does exactly that
[03:21:40] <The_8472> well, there's some generic way to sign with DH
[03:21:55] <The_8472> should work with ECDH too, but i can't remember what it was
[03:21:59] <choykloun> well we arent doing DH....
[03:22:03] <choykloun> or RSA..
[03:22:07] <choykloun> because they are both slow as fuck :)
[03:22:26] <choykloun> and the problem for me is that i know jack shit about EC math
[03:23:34] <choykloun> ok, one way to do signed DH if both have previously exchanged their respective keypairs
[03:23:49] <choykloun> each node hashes his secret key together with the other parties secret key
[03:24:11] <choykloun> encrypts a hash of the key exchange public value with that
[03:24:50] <choykloun> (or hmac all the way.. depends on exact implementation and circumstances)
[03:25:57] <choykloun> anyway both nodes store that hash
[03:26:07] <choykloun> but it obviously doesnt scale
[03:27:11] <choykloun> (and yes what i just wrote was a bit jumbled.. some stuff ended up in the wrong order)
[03:27:24] <choykloun> fucking interesting projects keeping me awake too long
[03:28:16] <TheSHAD0W> LOL
[03:28:22] <TheSHAD0W> Tomorrow's Monday. Go to bed.
[03:28:52] <choykloun> today is monday actually
[03:29:01] <TheSHAD0W> Even more reason.
[03:29:02] <choykloun> i need to be awake until 5pm when my staff leave
[03:29:08] <TheSHAD0W> ...
[03:29:11] <TheSHAD0W> Where are you?
[03:29:12] <TheSHAD0W> HK?
[03:29:22] <choykloun> KH
[03:29:29] <TheSHAD0W> ?
[03:29:37] <choykloun> KampucHea
[03:29:40] <TheSHAD0W> Ohh.
[03:29:54] <choykloun> my nick means 'go fuck yourself' in khmer
[03:29:56] <The_8472> http://en.wikipedia.org/wiki/Integrated_Encryption_Scheme <-
[03:29:58] <choykloun> but its a bit of a wordplay
[03:29:59] <TheSHAD0W> Heh.
[03:30:07] <choykloun> like a straight translation of an english curse into khmer
[03:30:22] <choykloun> what 'jchgout' (name of my DHT impl) means is a later project to explain
[03:31:14] <choykloun> and well its not possible to spell properly in the latin alphabet anyway.. http://knark.net/~user/chgout.png
[03:32:24] <The_8472> go. read. article.
[03:35:42] *** burris has quit IRC
[03:35:46] *** burris has joined #bittorrent
[03:36:23] <choykloun> ya was busy discussing another project on another irc network.
[03:37:22] <choykloun> fuck
[03:37:30] <choykloun> the second before i switched to this window
[03:37:35] <choykloun> i thought of exactly the same thing !
[03:38:16] <choykloun> "why not do a key exchange with each parties known 'hostkey' and use the shared secret for HMAC"
[03:39:10] <choykloun> how the hell could i avoid something that obvious
[03:39:15] <choykloun> i blame it on being used to rsa
[03:40:10] <choykloun> and it still works even if 1 party doesnt have a known pubkey
[03:40:29] <choykloun> however someone needs to check the security implications of reusing pubkeys for c25519 exchange
[03:41:02] <choykloun> if its secure, then we're all set
[03:41:23] *** ProperNoun has quit IRC
[03:41:29] <choykloun> or atleast i am, you still think node-keypairs are a bad idea :)
[03:43:50] <choykloun> best of all, this can be done without exchanging any additional data except for if each node knows the others pubkey!
[03:44:00] <The_8472> well, with IES you don't need to do a key exchange, you only need to know the destination's public key
[03:44:10] <The_8472> of course you cannot authenticate the sender directly this way
[03:44:26] <The_8472> but that's not necessary if you use transaction IDs and all that
[03:44:33] <The_8472> since a MITM wouldn't know them
[03:45:01] <choykloun> quick read and IES is basically a key exchange used in a creative way
[03:45:05] <The_8472> so you can send your pubkey in the message
[03:45:17] <The_8472> allowing the other one to respond
[03:45:29] <The_8472> yes, it basically generates an ephemeral key on the sender side
[03:46:53] <choykloun> you do a normal key exchange, and then hash the shared secret differently depending on whether keys are known
[03:49:25] <choykloun> if both parties know each others pubkeys, you do another key exchange with that and hash with the result etc.
[03:51:36] <choykloun> btw i currently do kex in ping query/response which includes node id
[03:51:46] <choykloun> so finding proper keys etc isnt exactly an issue
[03:52:42] *** The_8472 has quit IRC
[03:53:03] *** The_8472 has joined #bittorrent
[03:53:33] <choykloun> simple solution: i'll try!
[03:54:29] <The_8472> you're thinking way too complicated
[03:54:34] <choykloun> nah
[03:54:39] <choykloun> just explaining it too complicated
[03:55:07] <The_8472> in our situation we would only know the public key of other nodes and they wouldn't know ours. IES is quite appropriate in that situation
[03:55:14] <The_8472> at the cost of additional overhead of course
[03:55:31] <choykloun> in the scenario i think of the nodes would learn eachothers
[03:55:32] <The_8472> still less overhead than pinging them with key exchanges before sending the actual data though
[03:55:48] <The_8472> ... yes, but your scenario is mostly useless for the DHT
[03:56:01] <The_8472> 90% fire and forget messages...
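[Editor's note: the IES flow The_8472 describes (sender generates an ephemeral keypair, derives a shared secret against the receiver's long-term public key, and uses hashes of it for encryption plus a MAC, with no round trip) can be sketched with stdlib primitives. Classic mod-p DH stands in for curve25519 here, and the tiny Mersenne prime and "enc"/"mac" labels are demo assumptions only, NOT a secure parameter choice.]

```python
import hashlib, hmac, secrets

# Toy finite-field DH group (demo only; far too small for real use).
P = 2**127 - 1
G = 3

def keypair():
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def ies_seal(receiver_pub: int, msg: bytes):
    """Encrypt + MAC for the receiver, no prior key-exchange round trip."""
    eph_priv, eph_pub = keypair()                      # ephemeral sender key
    shared = pow(receiver_pub, eph_priv, P).to_bytes(16, "big")
    stream = hashlib.sha256(b"enc" + shared).digest()  # keystream (msg <= 32 bytes here)
    ct = bytes(m ^ k for m, k in zip(msg, stream))
    tag = hmac.new(hashlib.sha256(b"mac" + shared).digest(), ct, hashlib.sha256).digest()
    return eph_pub, ct, tag                            # eph_pub rides along in the packet

def ies_open(receiver_priv: int, eph_pub: int, ct: bytes, tag: bytes) -> bytes:
    shared = pow(eph_pub, receiver_priv, P).to_bytes(16, "big")
    good = hmac.new(hashlib.sha256(b"mac" + shared).digest(), ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, good):
        raise ValueError("bad MAC")
    stream = hashlib.sha256(b"enc" + shared).digest()
    return bytes(c ^ k for c, k in zip(ct, stream))

priv, pub = keypair()
eph_pub, ct, tag = ies_seal(pub, b"get_peers")
print(ies_open(priv, eph_pub, ct, tag))  # b'get_peers'
```

As noted in the chat, this authenticates nothing about the sender; a transaction ID echoed inside the ciphertext is what keeps a MITM out of the reply path.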
[03:56:13] <The_8472> doing key exchanges for those would double to # of packets
[03:56:28] <choykloun> besides
[03:56:37] <choykloun> IES is already exactly what nacl does
[03:56:47] <choykloun> with its authenticated encryption function
[03:57:50] <choykloun> ya its not for fire annd forget
[03:58:41] <choykloun> (well unsigned dh kex is very useful against many attacks in itself so that can stay.. its 32 bytes extra per exchange)
[03:59:11] <choykloun> but i wanna explore some options for trust systems, darknets, etc
[03:59:24] <choykloun> dht removes the biggest problems with darknets
[03:59:27] <choykloun> that you stand out from the lot
[04:02:43] <choykloun> and if a lot of non-darkneters routinely do key exchanges + encryption then its even better :)
[04:04:07] *** init0_ has quit IRC
[04:06:37] *** init0 has joined #bittorrent
[04:08:42] <choykloun> btw somehow DHT hasn't collapsed from seeing ping requests like these
[04:08:44] <choykloun> 64 31 3A 74 34 3A 36 02 8D BC 31 3A 76 34 3A 4C 67 30 30 31 3A 79 31 3A  d1:t4:6...1:v4:Lg001:y1:
[04:08:47] <choykloun> 71 31 3A 71 34 3A 70 69 6E 67 31 3A 61 64 32 3A 69 64 32 30 3A 3C 4A D7  q1:q4:ping1:ad2:id20:<J.
[04:08:50] <choykloun> 26 95 31 CE 2C 45 53 54 4F 59 4B 48 2E 43 4F 4D 00 65 31 31 3A 70 75 62  &.1.,ESTOYKH.COM.e11:pub
[04:08:54] <choykloun> 6B 65 79 32 35 35 31 39 33 32 3A 7B A7 18 0B D6 86 90 7E 9E 06 B3 A2 A9  key2551932:{......~.....
[04:08:57] <choykloun> 75 85 1A BF EA DB 98 50 4D 1B F2 66 53 47 2B F9 7D 69 69 65  u......PM..fSG+.}iie
[04:09:01] <choykloun> :)
[04:24:25] *** ProperNoun has joined #bittorrent
[04:37:39] <swolchok> so why are we doing our own brainstorming re: dht
[04:37:46] <swolchok> there is TONS of academic work on these things
[04:38:31] <swolchok> e.g. "Misusing Kademlia protocol to perform DDoS attacks"
[04:40:23] <swolchok> "Poisoning the Kad Network" has a Kad DDoS as well
[04:40:56] <swolchok> yes, Kad is a specific Kademlia implementation, but a lot of it should carry through to mainline because they're both relatively straightforward impls
[04:41:44] <swolchok> "exploiting p2p systems for ddos attacks"
[04:42:45] <swolchok> "On the feasibility of exploiting P2P systems to launch DDoS attacks"
[04:43:02] <swolchok> and so forth. try "kademlia denial of service" and "kademlia ddos" on Google Scholar.
[04:43:32] <choykloun> damn thats what i get for only reading academic computer science texts about data structures and nothing else
[04:44:07] <choykloun> but you should see my old collection of neurobiology/pharmacology papers!
[04:44:54] <choykloun> http://web.archive.org/web/20050314002549/http://www.anakata.hack.se/papers/
[05:03:26] *** ProperNoun has quit IRC
[05:07:53] *** bittwist has joined #BitTorrent
[05:11:02] *** ProperNoun has joined #bittorrent
[05:28:28] *** bt42 has quit IRC
[05:33:21] <TheSHAD0W> Hah.
[05:33:24] <TheSHAD0W> I just realized.
[05:33:37] <TheSHAD0W> There's a very good way to even out the keyspace.
[05:33:49] <TheSHAD0W> Have everyone search for the sha of the infohash rather than the infohash itself.
[05:34:40] <TheSHAD0W> Makes it very difficult to do a DoS.
[05:36:27] <choykloun> hm hm
[05:36:30] <choykloun> cant follow you
[05:36:42] <choykloun> ah
[05:36:53] <choykloun> you mean to avoid custering around popular torrents ?
[05:37:53] <TheSHAD0W> No, to keep people from crowding a portion of the keyspace and doing a DoS.
[05:37:54] <choykloun> well my data from my little dos experiments says that i got get_peers for almost uniformly distributed deltas
[05:38:18] <choykloun> DoS as in fucking up part of the keyspace?
[05:38:24] <TheSHAD0W> Right.
[05:38:31] <choykloun> could be onto something
[05:38:46] <TheSHAD0W> I think mole and I talked about something like that.
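[Editor's note: the extended ping choykloun dumped earlier is an ordinary bencoded query with one extra key, pubkey25519, in the arguments dictionary. A minimal encoder shows how little framing that costs. Field names follow the dump; the random id and key bytes are placeholders, and note the dumped client emits its keys unsorted while the spec asks for sorted byte order.]

```python
import os

def bencode(x):
    """Minimal bencoder for the types a DHT query needs."""
    if isinstance(x, bytes):
        return str(len(x)).encode() + b":" + x
    if isinstance(x, int):
        return b"i" + str(x).encode() + b"e"
    if isinstance(x, dict):  # the spec wants keys in sorted byte order
        return b"d" + b"".join(bencode(k) + bencode(x[k]) for k in sorted(x)) + b"e"
    if isinstance(x, list):
        return b"l" + b"".join(bencode(v) for v in x) + b"e"
    raise TypeError(type(x))

ping = {
    b"t": os.urandom(4),                      # transaction id
    b"y": b"q",
    b"q": b"ping",
    b"v": b"Lg00",                            # version token seen in the dump
    b"a": {b"id": os.urandom(20),             # node id (the dump uses a vanity one)
           b"pubkey25519": os.urandom(32)},   # the extra field from the dump
}
msg = bencode(ping)
print(len(msg))  # 116 bytes; the pubkey field accounts for only 49 of them
```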
[05:39:16] <TheSHAD0W> Organize based on the sha of the search term rather than the term itself.
[05:42:13] <TheSHAD0W> Then use hashcash to discourage DDoSes.
[05:42:58] <choykloun> ya
[05:43:12] <choykloun> i discussed hashcash earlier in connection with anti-traffic-analysis stuff
[05:44:43] <choykloun> however you should be aware that 1. incoming get_peers requests were almost randomly distributed and 2. not many per second were needed to achieve decent results
[05:45:11] <TheSHAD0W> Yes, but that is when no one is trying to screw up the keyspace.
[05:45:26] <choykloun> ah different dos
[05:45:30] <choykloun> totally following you
[05:46:09] <choykloun> what about this.. you dont assign your nodeid (or N bytes of it) yourself
[05:46:30] <choykloun> it gets assigned by some random factor in the network
[05:46:43] <choykloun> can think of about 150 different possible ones :)
[05:46:54] <TheSHAD0W> Difficult to manage, and is potentially subvertible if you have multiple suborned IPs.
[05:47:31] <TheSHAD0W> How well would your DDoS attacks have worked if it took you 3 seconds to insert a new node into the table?
[05:47:43] <choykloun> just as well
[05:47:52] <choykloun> let me check some actual timings here
[05:47:54] <TheSHAD0W> Well, then that's out. ^__^
[05:48:40] <choykloun> n 1261863601 Storing node (get_peers, nodeid 66c322d0, from 210.128.70.206:9122)
[05:48:43] <choykloun> n 1261863601 Storing node (get_peers, nodeid 912fbb3d, from 70.171.82.15:13104)
[05:48:46] <choykloun> n 1261863601 Storing node (get_peers, nodeid 65ae45df, from 125.215.105.233:11307)
[05:48:50] <choykloun> n 1261863602 Storing node (get_peers, nodeid 17714830, from 94.180.58.208:42276)
[05:48:53] <choykloun> n 1261863603 Storing node (get_peers, nodeid 5b08fa6f, from 123.243.61.12:29456)
[05:48:57] <choykloun> n 1261863603 Storing node (get_peers, nodeid 21c61a05, from 123.223.139.132:8891)
[05:49:00] <choykloun> n 1261863604 Storing node (get_peers, nodeid 6c27a317, from 70.69.175.33:26713)
[05:49:04] <choykloun> n 1261863604 Storing node (get_peers, nodeid 0e05bad0, from 88.8.105.55:4385)
[05:49:07] <choykloun> n 1261863604 Storing node (get_peers, nodeid 1e90c409, from 125.1.179.197:39116)
[05:49:10] <choykloun> n 1261863604 Storing node (get_peers, nodeid 35d643bb, from 221.31.206.16:18794)
[05:49:14] <choykloun> thats a snippet
[05:49:16] <choykloun> from one of the more intensive parts
[05:49:19] <choykloun> so a high rate isnt really needed
[05:49:21] <TheSHAD0W> So about 2 or more nodes per second?
[05:49:23] <choykloun> and i could even counter-measure by only answering to the nodes that are most likely to care :>
[05:49:47] <TheSHAD0W> Yeah, could slow that down a *little*, but not a ton.
[05:50:05] <choykloun> and rate of insertion doesnt really matter
[05:50:07] <TheSHAD0W> Hum.
[05:50:20] <TheSHAD0W> Hashcash won't really help then.
[05:50:28] <choykloun> if it took you 1 week to reach a rate to down a website you want to keep down for 1 month, whats the problem
[05:50:47] <choykloun> it'd be a good thing to implement in some places though
[05:51:04] <choykloun> maybe we can solve 2 problems in 1 and just use my curve25519 key exchange :>
[05:51:08] <TheSHAD0W> You don't want to have to do it too often either.
[05:51:28] <TheSHAD0W> If hashcash won't help, exchanging public keys won't either.
[05:51:38] <choykloun> hashcash is useful under other circumstances
[05:51:49] <choykloun> anyway
[05:51:51] <TheSHAD0W> Um.
[05:51:54] <choykloun> what i suggest for shortterm:
[05:52:02] <TheSHAD0W> One thing that might help...
[05:52:12] <TheSHAD0W> A UDP attack isn't as bad as a TCP connection attack.
[05:52:17] <choykloun> 1. make clients detect common non-bt protocols, and rate limit connections to <1024
[05:52:32] <choykloun> yeah but DHT udp port isnt always same as BT TCP :(
[05:52:48] <TheSHAD0W> Right, but if TCP isn't firewalled the UDP port probably isn't anyway.
[05:53:00] <choykloun> 50 udp packets/sec == nothing
[05:53:01] <TheSHAD0W> So make a UDP connect attempt to a peer on the DHT before connecting via TCP.
[05:53:11] <choykloun> 50 tcp connections/sec == brings down any standard apache server right away
[05:53:18] <choykloun> but you dont know the udp port :(
[05:53:26] <TheSHAD0W> Well, why not??
[05:53:31] <TheSHAD0W> Add it to the protocol!
[05:53:32] <choykloun> its not in the get_peers answer!
[05:53:43] <choykloun> ya
[05:53:58] <choykloun> thats ie one of the first things to add if were gonna break backwards compatibility
[05:54:17] <TheSHAD0W> Add that, and add the rehash organization...
[05:54:24] <TheSHAD0W> And add your damn ECC handshake too. :-P
[05:54:40] * TheSHAD0W runs away!
[05:55:19] <choykloun> i wanna do some odd stuff with dht
[05:55:26] <choykloun> like for example build darknets on it
[05:55:36] <TheSHAD0W> Well...
[05:56:16] <choykloun> and also stuff that benefits dht globally like stopping DoS, traffic shaping/filtering etc *BEFORE* the enemies find out :p
[05:56:19] <TheSHAD0W> I2P would be better.
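[Editor's note: adding the DHT UDP port to the get_peers answer, as proposed above, would just widen the compact peer entry by two bytes. This layout (IPv4 + TCP port + UDP port = 8 bytes) is a hypothetical extension from the chat, not any deployed format:]

```python
import socket, struct

# Hypothetical extended compact peer entry:
# 4-byte IPv4 + 2-byte TCP port + 2-byte DHT UDP port = 8 bytes.
def pack_peer(ip: str, tcp_port: int, udp_port: int) -> bytes:
    return socket.inet_aton(ip) + struct.pack(">HH", tcp_port, udp_port)

def unpack_peer(b: bytes):
    ip = socket.inet_ntoa(b[:4])
    tcp, udp = struct.unpack(">HH", b[4:8])
    return ip, tcp, udp

entry = pack_peer("203.0.113.7", 6881, 51413)
print(len(entry), unpack_peer(entry))  # 8 ('203.0.113.7', 6881, 51413)
```

With this in place a client could probe the peer's DHT port over UDP before committing to a TCP connection, which is the attack-dampening trick TheSHAD0W suggests.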
[05:56:46] <choykloun> DHT would make an excellent ground for a darknet because of its huge userbase
[05:56:50] <choykloun> disappear among the crowd
[05:56:58] <choykloun> freenet or whatever and you stand out like a sore thumb
[05:59:01] <TheSHAD0W> Well...
[05:59:08] <TheSHAD0W> Maybe base the new DHT via I2P?
[05:59:27] <TheSHAD0W> That would expand the I2P base, and help drown out darknet comms.
[05:59:46] <TheSHAD0W> Hell, make a file sharing app that runs via I2P.
[05:59:59] <choykloun> and i wanna use dht for distributing tor entry nodes!@
[06:00:16] <TheSHAD0W> Well.
[06:00:21] <TheSHAD0W> Tor has its uses...
[06:00:24] <choykloun> and a lot of other weird overlay stuff
[06:00:32] <TheSHAD0W> But doesn't work well for file sharing.
[06:00:38] <choykloun> yeah
[06:00:41] <choykloun> but on the other hand
[06:00:42] <TheSHAD0W> IMO I2P is a very nice network.
[06:00:48] <choykloun> DHT works very well for avoiding filtering etc
[06:00:53] <choykloun> and 1 node and you have 1 million others
[06:01:05] <choykloun> while for example tor entry nodes can be difficult to find
[06:01:10] <choykloun> or other addrs that are frequently blocked
[06:03:04] <TheSHAD0W> Any particular reason you don't like I2P?
[06:03:17] <choykloun> because dht has millions of users!
[06:03:27] <TheSHAD0W> ...Versus Tor.
[06:03:45] <choykloun> imagine you're in a country with strict censorship
[06:03:54] <TheSHAD0W> Okay...
[06:03:56] <choykloun> a lot of known tor nodes are blocked
[06:04:15] <choykloun> but if you just get 1 DHT node .. 48 bits of information
[06:04:22] <choykloun> you can get a 'few' more
[06:04:27] <TheSHAD0W> So you want to use DHT to organize Tor searches.
[06:04:32] <choykloun> and use it to query for tor (or whatever) nodes to use
[06:04:33] <choykloun> yeah
[06:04:37] <TheSHAD0W> Right.
[06:04:41] <choykloun> ive alreay implemented it
[06:04:55] <choykloun> a general format for mapping a resource identifier to an infohash
[06:12:06] <choykloun> user@thefin:~/jchgout-dev$ wc -l iplnodes.h
[06:12:06] <choykloun> 178172 iplnodes.h
[06:12:07] <choykloun> whops
[06:12:24] <choykloun> maybe i should use a bit fewer init nodes when NOT experimenting with DoS attacks ... :P
[06:15:12] <TheSHAD0W> LOL
[06:15:32] <choykloun> i waaaas wondering why the compilation took ages..
[06:24:12] <choykloun> hah i have this urge to install BT/DHT clients on all embedded systems we ship
[06:24:39] <TheSHAD0W> Well, DHT at least...
[06:24:40] <TheSHAD0W> :-)
[06:25:39] <choykloun> "yes, the central of your alarm is downloading TV series so our techs will have something to watch next time they visit"
[06:26:16] <TheSHAD0W> BT-based automatic updates.
[06:26:36] <TheSHAD0W> Hey, if you have that sort of latitude, put Tor entry nodes on them.
[06:26:38] <TheSHAD0W> ^__^
[06:29:29] <choykloun> actually they are purposedly stripped down to run NOTHING else than whats absolutely needed :P
[06:29:38] <choykloun> since people can die if a fire alarm fails...
[06:32:59] <TheSHAD0W> Well that's no fun. :-P
[06:33:51] *** _rafi_ has joined #bittorrent
[06:36:11] <choykloun> ill borrow a couple of the dev boxes for testing BT shit though :>
[06:36:25] <choykloun> just as long as there arent any traces of it left on the flash when it ships
[06:36:25] <choykloun> haha
[06:36:49] *** GTHK has quit IRC
[06:37:54] *** Switeck has quit IRC
[06:40:07] <swolchok> TheSHAD0W: I don't follow your suggestion. SHA-1(hotkey) is still hot.
[06:40:39] <TheSHAD0W> Here's the problem.
[06:41:23] <choykloun> i'm still thinking of not letting nodes assign their nodeid themselves
[06:41:43] <choykloun> while still keeping it somewhat constant over time
[06:41:55] <swolchok> choykloun: yeah, I believe that's a well-known measure that's fairly critical to prevent some attacks that were demonstrated on KAD. that's why Vuze uses SHA-1(IP:port).
[06:42:33] <swolchok> if you want to make a good argument for it, go dredge the academic literature on exploiting KAD (e.g., "Exploiting KAD").
[06:43:05] <choykloun> i only do academic stuff when it comes to chemistry,biology,etc :)
[06:43:30] <swolchok> Your choice.
[06:43:32] <choykloun> hm
[06:44:00] <choykloun> one crazy option would be to require the nodeid to be the result of some expensive trapdoor function
[06:44:11] <choykloun> so you have to spend quite a bit of cpu time to generate a valid nodeid
[06:44:15] <choykloun> but it can be verified easily
[06:44:50] <choykloun> i mean nobody will be bothered that much by "please wait 5 minutes before your bittorrent is ready" uh
[06:45:59] <swolchok> then you incentivize people to take a shortcut by reusing the same one over and over. "hax3d 4 fast install!!!1111"
[06:46:02] <choykloun> that would make it quite expensive to target a specific part of the network (well depends on exactly what function you use)
[06:46:12] <choykloun> haha
[06:48:37] <swolchok> choykloun: so your attack is eclipsing some IDs in the search space and returning the victim as the result for that ID? if so, http://www.sigcomm.org/ccr/drupal/files/p65-steiner.pdf has a similar attack and it's only 5 pages
[06:48:46] <choykloun> maybe allow the user to announce but not actually join the network
[06:48:56] <choykloun> swolchok: no we're discussing gneeral stuff
[06:49:02] <swolchok> section 5 is kind of fanciful though.
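[Editor's note: choykloun's "expensive to generate, cheap to verify" node ID is essentially hashcash applied to ID assignment: grind a nonce until sha1(ip || nonce) clears a difficulty target, then use that hash as the ID. A sketch with made-up difficulty and field layout; tying the hash to the IP also addresses swolchok's reuse objection, since a mined ID only verifies from the address it was mined for.]

```python
import hashlib

DIFFICULTY_BITS = 12  # toy value; a real deployment would tune this to minutes of CPU

def leading_zero_bits(h: bytes) -> int:
    bits = 0
    for byte in h:
        if byte == 0:
            bits += 8
            continue
        return bits + (8 - byte.bit_length())
    return bits

def mine_node_id(ip: bytes):
    """Grind nonces until sha1(ip || nonce) meets the difficulty target."""
    nonce = 0
    while True:
        h = hashlib.sha1(ip + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(h) >= DIFFICULTY_BITS:
            return h, nonce  # h becomes the node ID; the nonce is published with it
        nonce += 1

def verify_node_id(ip: bytes, node_id: bytes, nonce: int) -> bool:
    """One hash to verify what took ~2**DIFFICULTY_BITS hashes to mine."""
    h = hashlib.sha1(ip + nonce.to_bytes(8, "big")).digest()
    return h == node_id and leading_zero_bits(h) >= DIFFICULTY_BITS

node_id, nonce = mine_node_id(b"\xcb\x00\x71\x07")
print(verify_node_id(b"\xcb\x00\x71\x07", node_id, nonce))  # True
```

Note that, as the chat concedes, raising per-node cost does not stop the slow keyspace-crowding attack choykloun demonstrated; it only raises the price of churning many IDs quickly.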
[06:49:09] <choykloun> ive described the general outline of what i did yesterday before here
[06:49:14] *** MassaRoddel has quit IRC
[06:49:16] <swolchok> well, free ID choice definitely facilitates eclipsing.
[06:49:45] <choykloun> i tried it but turns out the distribution of get_peers i get is almost random anyways
[06:49:55] <choykloun> sure lower deltas are more common than higher
[06:49:58] <swolchok> that is quite odd.
[06:50:09] <choykloun> but its like 1/30 that were relevant
[06:50:11] <choykloun> nah not really
[06:50:17] <swolchok> I wonder if many popular clients have bogus routing implementations.
[06:50:21] <choykloun> think statistics + lot of initialization nodes + etc
[06:50:43] <choykloun> didnt you see my small list i was accidentally about to use now :)
[06:50:52] <choykloun> user@thefin:~/jchgout-dev$ wc -l iplnodes.h.pw
[06:50:52] <choykloun> 178172 iplnodes.h.pw
[06:51:10] <swolchok> oh, get_peers is kademlia FIND_VALUE, that makes perfect sense.
[06:51:27] <choykloun> anyway
[06:51:35] <choykloun> my implementation is a bit weird to start with
[06:51:47] <choykloun> because its not meant to ever be a normal dht client
[06:52:05] <choykloun> so exact details arent as important as fixing more general issues
[06:52:37] <swolchok> is there a laundry list of issues yet? 1) free id choice 2) ???
[06:52:47] <choykloun> we have a couple of ideas
[06:52:55] <choykloun> but we need some bandaid first
[06:53:04] <choykloun> to keep backwards compatibility
[06:53:05] <swolchok> also you're going to break the universe, that'll be fun
[06:59:34] <_rafi_> [20:27] <alus> include ping times to google or something running at the same time
[06:59:56] <_rafi_> I just remembered to ask: why is that important, when comparing to a reference ?
[07:01:34] <_rafi_> Yesterday, the result was ~140, and this morning it's ~90 . So, it's probably in Europe...
[07:02:01] *** Gottaname has quit IRC [07:02:14] *** Gottaname has joined #bittorrent [07:04:07] <_rafi_> off to uni... feel free to email a new test-build... :) ( http://forum.utorrent.com/viewtopic.php?pid=443331#p443331 ) [07:08:40] *** burris has quit IRC [07:15:32] *** ProperNoun has quit IRC [07:16:02] *** ProperNoun has joined #bittorrent [07:16:26] *** burris has joined #bittorrent [07:34:45] <n215_> can i prioritize ftp traffic on asa ? [07:34:55] <n215_> sorry [07:34:58] <n215_> wrong channel [07:46:37] <choykloun> fuck, i should stop making excuses and just implement REAL routing in jchgout [07:46:50] <choykloun> but its more fun to try crypto and dos and stuff :p [07:48:34] <choykloun> The_8472: by the way, i have a hypothesis from the data i collected that some client(s) might be more likely to send query you for peers if you give them a response with 'values', regardless of routing delta etc [08:22:21] <alus> _rafi_: the reference is important to see if latency is changing as uTorrent changes the speed limit [08:42:01] *** MassaRoddel has joined #bittorrent [09:07:15] <_rafi_> alus: you aer welcome to suggest a tool of your choice to graph the pings. I don't have any [09:08:55] <_rafi_> )... that will be synchable with uTorrent graph... (or add a test-ping-plot inside uT speed-graph) [09:09:04] <_rafi_> *are [09:10:31] <_rafi_> (what I meant was - since we have a test-run w/o the tcp_rate_control as a reference to compare with) [09:12:58] *** MassaRoddel has quit IRC [09:15:10] <_rafi_> and I think that ping test results that are measured when your connection bandwidth is almost fully utilized - are questionable at best [09:18:13] *** bt42 has joined #BitTorrent [09:36:57] *** bittwist has quit IRC [09:40:29] *** razvand has joined #bittorrent [09:45:51] <alus> _rafi_: just submit a ping log that ran the whole time [09:48:47] <_rafi_> k [10:34:03] *** ania has joined #bittorrent [10:34:26] <ania> hi u tere [10:35:30] <ania> private trackers idea? 
[10:35:32] <hlindhe> ania: not ontopic here [10:43:30] *** ania has left #bittorrent [11:05:13] *** Andrius has joined #bittorrent [11:12:39] *** Andrius has quit IRC [11:30:36] *** Waldorf_ has joined #bittorrent [11:32:05] *** MassaRoddel has joined #bittorrent [11:58:33] *** Snoopotic has joined #bittorrent [12:39:32] *** migi has joined #bittorrent [12:45:51] <migi> hey everyone. If in a cloud of peers, all using DHT without a central tracker, several peers fail in such an unfortunate way that the network gets "split" (say there's a major power down in a large area), how will the two groups reconnect with each other? [12:46:25] <migi> Or is that so improbable it'll just never happen? [12:48:31] *** rrr has quit IRC [12:57:56] *** Andrius has joined #bittorrent [12:58:49] <alus> migi: all peers will continue to announce to and get peers from the DHT [12:59:12] <alus> migi: it is Very Improbable that the DHT will be split [12:59:44] <alus> unless the split represents a real-world network split, like an office building being disconnected from the internet [13:00:01] <migi> alus: also for torrents with few peers? or is a DHT network not torrent-specific? [13:00:05] <alus> in which case you will get a mini DHT (!) and the peers will still transfer with each other [13:00:10] <alus> migi: the DHT is global [13:00:13] <migi> oh ok [13:00:25] <migi> so, yeah, I agree it would be extremely unprobable [13:05:56] *** MassaRoddel has quit IRC [13:11:27] *** kwinz2 has joined #bittorrent [13:17:15] *** MassaRoddel has joined #bittorrent [13:17:48] *** migi has quit IRC [13:31:09] <_rafi_> alus: a new test-example3 added including ping results ( http://forum.utorrent.com/viewtopic.php?pid=443331#p443331 ) No ISP traffic shaping at this time of day... last one Will be added - tonight [13:35:57] <alus> _rafi_: oh! you have a cache peer! 
[13:36:05] <alus> _rafi_: that explains it, I bet [13:36:39] <alus> _rafi_: so, most other people won't experience your problem, and it's probably easy to fix [13:40:54] <_rafi_> I have noted this cache peer yesterday as Well , but now it not over 25% of the traffic [13:41:29] <alus> if it, by itself, is greater that all uTP traffic, that would make sense [13:41:51] <_rafi_> yesterday - it was [13:42:04] <_rafi_> (see the peers list) [13:48:39] <_rafi_> the situation was 90% downloading from the cache peer (tcp) and 5-10% all the rest of the remote peers (150-200msec away) [13:53:06] <alus> you should tell the cache operator to use uT >= 1.8.5 ;) [13:57:38] *** Waldorf_ has quit IRC [13:57:49] <Astro> could anyone hint me to a spec for the scrape extension? [13:59:59] <alus> Astro: http://wiki.theory.org/BitTorrentSpecification#Tracker_.27scrape.27_Convention [14:01:40] <Astro> thxalot! [14:12:27] <_rafi_> alus: luck for us the cache operator is using bitcomet )as u can see in the peers list [14:12:46] <_rafi_> *lucky [14:12:55] <_rafi_> :p [14:20:59] *** Waldorf has joined #bittorrent [14:21:56] <The_8472> <choykloun> The_8472: by the way, i have a hypothesis from the data i collected that some client(s) might be more likely to send query you for peers if you give them a response with 'values', regardless of routing delta etc <- uhm. 
if you send them values then they already got peers, i'm not sure what you mean by "query for peers" [14:34:29] *** MassaRoddel has quit IRC [14:44:05] *** kwinz2 has quit IRC [14:48:32] *** rrr has joined #bittorrent [14:59:37] *** kwinz2 has joined #bittorrent [15:45:57] *** kwinz2 has quit IRC [15:51:16] *** rrr has quit IRC [15:52:36] *** rrr has joined #bittorrent [16:13:46] *** kwinz2 has joined #bittorrent [16:23:02] *** [NiNjA]UK has joined #bittorrent [16:27:28] *** klapaucjusz has joined #bittorrent [16:35:24] *** [NiNjA]UK has left #bittorrent [16:35:24] *** [NiNjA]UK has joined #bittorrent [17:49:35] *** burris has quit IRC [17:49:56] *** burris has joined #bittorrent [18:02:47] <_rafi_> alus: additional test+ping-log was added http://forum.utorrent.com/viewtopic.php?pid=443331#p443331 is there any more info that you require to handle the issue ? [18:07:22] <_rafi_> btw, it will be nice to know what are those controls for (also bt.ratelimit_tcp_only) and how they were intended to work [18:09:42] *** lxsameer has joined #bittorrent [18:10:01] <lxsameer> how can i speed up torrent downloading ? [18:10:16] <The_8472> by uploading faster [18:11:51] <lxsameer> my client connect to 1 or 2 peer at a time but its max is 60 , is there any way to force that to connect more peer [18:13:27] <alus> _rafi_: that's all I need, thanks [18:13:45] <alus> _rafi_: and those controls are advanced settings, intended for debugging like we just did :) [18:16:44] <_rafi_> np. hmmm... if they are just for debugging, why not just leave their default as "false"/inactive ? (and avoid all the hussle ?) [18:18:19] <_rafi_> oh, and I thought it was, and I qoute: "not 'buggy', just not as I intended it to work" ... :) [18:20:22] <_rafi_> ( [20:22] <alus> _rafi_: buggy is the wrong word. 
doesn't work like you expect ) [18:20:29] *** MassaRoddel has joined #bittorrent [18:28:10] *** lxsameer has quit IRC [18:53:15] *** ProperNoun has quit IRC [18:54:56] *** ProperNoun has joined #bittorrent [19:00:44] *** NanoSHAD0W has joined #bittorrent [19:42:00] *** NanoSHAD0W has quit IRC [19:54:12] <klapaucjusz> The_8472: I'm being told that there's been some discussion on DHT scrapes? [19:54:58] <The_8472> yes [19:55:09] <The_8472> well, no. mostly me investigating and posting my results [19:55:31] <The_8472> http://forum.bittorrent.org/viewtopic.php?pid=888#p888 [19:55:33] <klapaucjusz> Nothing beyond what you posted on forum.bittorrent.org? [19:55:55] <The_8472> the rest is in my head. [19:56:07] <klapaucjusz> Heh. [19:56:14] <klapaucjusz> Is that open source? [19:56:20] <The_8472> what? [19:56:25] <klapaucjusz> Your head. [19:56:48] <The_8472> hahahaha, sortof. if i can find the sources :P [19:57:51] <The_8472> i'm currently trying to devise a formula for intersect(S1,...,Sn) ... or rather trying to simplify and understand the formula for intersect(S1,S2) [19:59:48] <The_8472> |intersect()| actually, i only want the cardinality of the sets, not their elements [20:00:41] <klapaucjusz> The_8472: let's first discuss the new requests. [20:00:47] <klapaucjusz> Your suggestion is over-engineered. [20:01:12] <klapaucjusz> Suggestion: [20:01:30] <klapaucjusz> 1. get_peers has a new key "numwant", which is an integer. Can be 0. [20:01:34] <The_8472> maybe, but imo just doing counts is way too inaccurate since we cannot know how many are actually shared among the nodes [20:02:05] <klapaucjusz> 2. reply to get_peers SHOULD contain a key numpeers, which is a list of two integers (seeds, leechers). [20:02:22] <klapaucjusz> The_8472: in a well-functioning DHT, most keys are shared. [20:02:31] <klapaucjusz> Hence simply take the max. [20:02:47] <klapaucjusz> Or the average of the two largest values, if you're nervous. 
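[Editor's note: klapaucjusz's counts-only scrape suggestion reduces to a few lines. This is a sketch, not spec text; the function name and the `nervous` flag (mirroring "the average of the two largest values, if you're nervous") are invented for illustration.]

```python
def scrape_estimate(counts: list, nervous: bool = False) -> float:
    """Combine per-node seed (or leecher) counts gathered during a
    get_peers sweep into one swarm estimate. Plain mode takes the max;
    nervous mode averages the two largest. Both underestimate when the
    nodes' stored value sets are largely disjoint, which is the
    objection raised next in the log."""
    top = sorted(counts, reverse=True)[:2]
    if nervous and len(top) == 2:
        return (top[0] + top[1]) / 2
    return float(top[0])
```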
[20:03:11] <klapaucjusz> I particularly dislike your extensions to the want key, which becomes unmanageable. [20:03:39] <The_8472> a) noise b) inefficient implementations not reaching the target c) perimeter widening (be it intentional or not, e.g. to nodes being overloaded or their databases "full") [20:03:59] <The_8472> define "unmanageable" i find them quite simple [20:05:07] <The_8472> e) churn [20:06:56] <The_8472> i see too many uncertainties to just take the max(), you'll most likely under-estimate the amount of seeds/peers. [20:08:39] <The_8472> just a simple scenario: 10k seeds, 10k peers. 8 nodes responsible for 20k values. Now assume that an implementation limits its peer-sets to 200 values per key. If it is clever it starts rejecting announces with increasing probability before it reaches 200, ala RED [20:09:08] <The_8472> which will lead to 2 effects: a) not all nodes share all values b) you force the announcing peers to do perimeter widening, simply by omitting the token [20:09:54] <The_8472> in some cases you might even end up with almost completely disjoint sets stored on each node [20:10:31] <The_8472> but we do not know how disjoint the sets are or not [20:10:36] <The_8472> hence bloom filters [20:10:37] *** ajaya has joined #bittorrent [20:12:35] *** tolpico has joined #bittorrent [20:14:05] * The_8472 pokes klapaucjusz [20:21:54] <The_8472> i mean... just look at a node that has been up for some hours. other nodes store values on keys you shouldn't actually be responsible for [20:22:02] <The_8472> which means they're not hitting the target [20:22:19] <The_8472> which in turn means that values get "scattered" around the target, not always hitting the same nodes [20:25:49] <klapaucjusz> Sorry, I'll be back in 40 minutes. [20:32:44] <alus> _rafi_: they are just for debugging. leave them at their defaults (true) [20:34:10] <_rafi_> if you promise we'll have no download-speed degradation - I promise I will... 
:) [20:37:05] <klapaucjusz> The_8472: back. [20:37:07] *** quimby_ has joined #bittorrent [20:37:12] <klapaucjusz> Sure, you're going to get an underestimate. [20:37:24] <klapaucjusz> The trick is estimating by how much you're underestimating, and correct. [20:37:32] <klapaucjusz> My point is, you don't need accurate scrapes. [20:38:03] <klapaucjusz> You need scrapes in order to (1) order torrents in your queue, and (2) give a vague idea to the user. [20:38:22] <klapaucjusz> Having an underestimate, albeit a consistent underestimate, is fine for (1). [20:38:34] <klapaucjusz> As to (2), you just need to fudge the results to give the user a reasonable display. [20:38:42] *** quimby_ has left #bittorrent [20:38:47] *** [NiNjA]UK has left #bittorrent [20:38:48] *** [NiNjA]UK has joined #bittorrent [20:39:20] <The_8472> consistent underestimate... with tons of noise [20:39:41] <The_8472> especially since taking the max() basically gets you the outliers. [20:40:29] <The_8472> hrrm, no, not outliers. but still only 1 sample in a very noisy environment [20:40:40] <klapaucjusz> Then take alpha*(a + b) + beta * (a D b), a and b the two largest, for the right values of alpha, beta. [20:41:03] <klapaucjusz> What I mean is that you don't need to bloat your scrapes with a full node list, you just need to compute the right estimate from pure cardinality data. [20:41:04] <The_8472> the thing is, you can't accurately predict how much they share or not [20:41:13] <klapaucjusz> I know. [20:41:24] <klapaucjusz> You're repeating yourself, and not listening to what I'm saying. [20:41:41] <klapaucjusz> You don't need accurate values. You need a reasonable estimate. [20:41:44] <The_8472> pure cardinality data is worthless if you do not know whether you have to add or average. [20:42:13] <klapaucjusz> The_8472: may I most humbly request that you at least try to take into account what I'm saying? 
[20:42:27] <The_8472> if you take 2 values and can pick between add and average and pick the wrong one you're already off by an order of 2 [20:42:29] <klapaucjusz> Why do you need accurate values? [20:43:06] <The_8472> i'm not saying that i need 100% accurate values. but i want something that's at least showing the correct order of magnitude [20:43:14] <klapaucjusz> Okay, let's go back to the basics. [20:43:19] <klapaucjusz> Why are you scraping in the first place? [20:43:31] <klapaucjusz> I'm afraid that's where the misunderstanding lies. [20:43:35] <The_8472> <klapaucjusz> You need scrapes in order to (1) order torrents in your queue, and (2) give a vague idea to the user. <- [20:43:53] <klapaucjusz> So there's no misunderstanding there? [20:44:20] <The_8472> yes, but i don't consider off by a factor of 2 or more as "vague" i consider it as "simply wrong" [20:44:33] <klapaucjusz> I don't agree. [20:44:43] <klapaucjusz> Factor of two uncertainty is fine for your application. [20:45:09] <klapaucjusz> Obviously, if accurate values are possible to compute cheaply, that's better. [20:45:24] <klapaucjusz> But what you're doing is multiplying by 50 the amount of data a scrape reply carries. [20:45:26] <The_8472> yes, with bloom filters you can [20:45:33] <The_8472> oO [20:45:43] <klapaucjusz> memory usage is not the point. [20:45:46] <The_8472> it's 1 packet [20:45:52] <klapaucjusz> Network traffic is the point, and here, bloom filters buy you nothing. [20:46:09] <The_8472> they do, compared to sending exhaustive node lists [20:46:20] <klapaucjusz> Ah, sorry, you're putting the bloom filters in the packets themselves? [20:46:26] <The_8472> YES! [20:46:29] <klapaucjusz> Ah. [20:46:31] * klapaucjusz ponders [20:46:50] <The_8472> http://forum.bittorrent.org/viewtopic.php?pid=888#p888 [20:46:59] <The_8472> "response contains 2 bloom filters (details not yet specified) "seeds" and "peers"" [20:47:05] <The_8472> it's right there... 
[20:47:31] <The_8472> 'Then the requesting node can send a new RPC type, called "scrape", containing only the target key. the response contains two 512 byte long fields, named "seeds" and "peers", which are the bloom filters containing all seeds and peers for that infohash known by that node.' [20:47:33] <klapaucjusz> First reaction: 512 bytes is too much. Bloom filters are extremely efficient. [20:48:00] <The_8472> they're necessary if you're dealing with 10k+ values and then union/intersect the filters [20:48:32] <klapaucjusz> ? [20:49:02] <klapaucjusz> What's wrong with an 8x8 filter? [20:49:06] <The_8472> scrape = |union(filter1,...,filterN)| [20:49:17] <The_8472> unions set more and more bits to 1, increasing the false positive rate [20:49:38] <The_8472> so your error explodes astronomically if the filter is too small [20:49:56] <klapaucjusz> What's the value of N? [20:50:12] <klapaucjusz> No, it doesn't depend on the value of N. [20:50:16] <The_8472> 8 i'd say. maybe 16 for an exhaustive lookup with perimeter widening [20:50:18] <klapaucjusz> It depends on the cardinality of the set. [20:50:22] <The_8472> yes [20:50:30] <klapaucjusz> The set is <1000. [20:50:50] <The_8472> 10k peers... [20:50:52] <klapaucjusz> If it's >1000, then it's a large swarm, and you're not interested in the details. [20:51:03] <The_8472> yes, you are, for ordering [20:51:19] <klapaucjusz> What's the collision rate in a bloom filter, again? [20:51:31] <The_8472> depends on k [20:51:53] <klapaucjusz> In an 8x8 filter, which is also the easiest to implement? [20:52:25] <The_8472> 8x8? you mean 8 longs? [20:53:15] <klapaucjusz> Also, you can deal with a small number of false positives, can't you? [20:53:34] <The_8472> anyway, for an optimal selected k the error rate is about 0.618 ^(bits/cardinality) [20:56:13] <klapaucjusz> So if you say that you're willing to have an uncertainty factor of 2, it's enough to have 1000 bits? 
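[Editor's note: the filter-sizing argument above can be checked numerically. This is a minimal sketch, not the proposed wire format: the double-SHA-1 hashing scheme and the m = 2048, k = 2 parameters are assumptions chosen to match the sizes under discussion. Each node sends a fixed-size Bloom filter of its stored peers, the requester ORs the filters together, and the union's cardinality is estimated from the fraction of set bits; as the filter saturates the estimate diverges, which is the break-down-towards-infinity behaviour The_8472 describes.]

```python
import hashlib
import math

class PeerBloom:
    """Fixed-size Bloom filter over peer addresses (parameters
    illustrative, not a wire format)."""
    def __init__(self, m_bits: int = 2048, k: int = 2):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def add(self, peer: bytes) -> None:
        # k independent hash positions derived from SHA-1 with an index prefix
        for i in range(self.k):
            h = hashlib.sha1(bytes([i]) + peer).digest()
            idx = int.from_bytes(h[:4], "big") % self.m
            self.bits[idx // 8] |= 1 << (idx % 8)

    def union(self, other: "PeerBloom") -> "PeerBloom":
        # the scrape aggregation step: OR together filters from all queried nodes
        out = PeerBloom(self.m, self.k)
        out.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return out

    def estimate(self) -> float:
        """Cardinality estimate n ~ -(m/k) * ln(1 - t/m), t = set bits.
        Diverges to infinity as the filter saturates (t -> m)."""
        t = sum(bin(byte).count("1") for byte in self.bits)
        if t >= self.m:
            return math.inf
        return -(self.m / self.k) * math.log(1.0 - t / self.m)
```

[With 2048 bits (256 bytes) and k = 2, two filters holding overlapping 500-peer sets union to an estimate close to the true 750 unique peers, while stuffing tens of thousands of values in saturates the filter and blows the estimate up, consistent in spirit with the numbers quoted in the discussion.]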
[20:56:44] <The_8472> hrrm, i do think so [20:57:17] <The_8472> but the rate increases dramatically once you exceed your selected bit count [20:57:23] <The_8472> towards infinity [20:58:19] <The_8472> bloom filter with 512 bytes (4096 bits), k parametrized for 800 values => it breaks down at about 12k values. [20:59:00] <The_8472> breaks down means estimated size = infinity [21:00:01] <klapaucjusz> Which is fine for the applications outlined above. [21:00:34] <klapaucjusz> http://forum.bittorrent.org/viewtopic.php?pid=893#p893 [21:03:23] <The_8472> hrmhrm. well. get_peers packets basically are "full" anyway [21:03:30] <The_8472> consider nodes+nodes6+values [21:03:40] <The_8472> that's why i suggested a new one [21:03:51] <The_8472> and if we do a new packet... we can just use it completely [21:03:58] <The_8472> 512bytes for peers, 512 for seeds [21:04:11] <klapaucjusz> nodes: 80 bytes. [21:04:20] <klapaucjusz> nodes6: 200 bytes. [21:04:49] <klapaucjusz> values: 300 bytes in IPv4, 900 bytes in IPv6. [21:05:07] <The_8472> nodes = 208 bytes [21:05:10] <klapaucjusz> So that's 1180 worst case, yeah, you're right. [21:05:28] <The_8472> nodes6 = 304 bytes [21:05:33] <klapaucjusz> Eh? [21:05:51] <The_8472> (20 bytes node id + 6 bytes ip + 2 bytes port)*8 [21:05:57] <The_8472> (20 bytes node id + 16 bytes ip + 2 bytes port)*8 [21:06:04] <The_8472> err [21:06:06] <The_8472> 4 bytes ip [21:09:20] <The_8472> so nodes6+values alone is enough to leave little room for anything [21:09:27] <The_8472> as for my want flags... [21:10:11] <The_8472> if i could i'd actually get rid of get-peers with nodes and just do find_nodes with additional parameters, i.e. 
just plain find-node, find-value, find-scrape, find-token [21:10:28] <The_8472> and then an announce/scrape/values packet [21:10:33] <The_8472> depending on what you want [21:10:42] <The_8472> announce and values could even be rolled into one [21:10:47] <The_8472> since you usually want to do both anyway [21:11:06] <The_8472> hrrm, maybe not [21:11:21] <The_8472> but still. find_node and get_peers feels utterly redundant now [21:12:13] <The_8472> the only difference is when you terminate your lookup, get_peers may terminate sooner than find_node, but that's all requester-side logic [21:12:22] <The_8472> doesn't require separate messages in the protocol [21:13:09] * The_8472 pokes klapaucjusz once again [21:13:34] <klapaucjusz> Sorry. [21:13:41] <klapaucjusz> Shall we move over to the forum? [21:13:52] <klapaucjusz> I'm not very good at real-time technical discussions. [21:14:27] <The_8472> hrrm... [21:14:31] <The_8472> if you want to [21:14:32] <klapaucjusz> Well, it's not really important. [21:14:45] <klapaucjusz> What you're saying is that you want to do the multiplexing at a different level, [21:14:49] *** GTHK has joined #bittorrent [21:14:53] <klapaucjusz> ping is find-whatever(want=none) [21:15:03] <klapaucjusz> find_nodes is find-whatever(want=nodes) [21:15:17] <klapaucjusz> get-peers is find-whatever(want=nodes, peers, token) [21:15:43] <klapaucjusz> So you're just pushing the demultiplexing one level into the packet. [21:15:43] <klapaucjusz> Yeah, it's more elegant, but not worth an incompatible change. 
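[Editor's note: the unification sketched above (one query, demultiplexed by want flags) might look like this on the responder side. The message layout and helper names are hypothetical; BEP 5's actual messages stay as they are.]

```python
class LocalNode:
    """Minimal stand-in for a node's local state (hypothetical API)."""
    def __init__(self):
        self.storage = {}                 # infohash -> list of peer addresses

    def closest_nodes(self, target: bytes) -> list:
        return []                         # routing-table lookup elided

    def make_token(self, target: bytes) -> bytes:
        return b"opaque-write-token"      # a real token would bind to the requester's IP

def handle_find(node: LocalNode, target: bytes, want: set) -> dict:
    """One query to rule them all: ping is want=set(), find_node is
    want={'nodes'}, get_peers is want={'nodes', 'peers', 'token'}, and
    a scrape could add flags for seed/peer filters. Which messages a
    lookup sends, and when it terminates, stays requester-side logic."""
    response = {}
    if "nodes" in want:
        response["nodes"] = node.closest_nodes(target)
    if "peers" in want:
        response["values"] = node.storage.get(target, [])
    if "token" in want:
        response["token"] = node.make_token(target)
    return response                       # empty dict == plain ping reply
```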
[21:16:09] <The_8472> well, it gets really interesting when you want to scrape [21:16:21] <The_8472> since you need to know if the node has values to be scraped [21:16:31] <The_8472> but you don't really want the values, just an indication that it has values [21:17:10] <The_8472> the other want flags are obvious extensions that follow from distinguishing between seeds and peers [21:17:14] <klapaucjusz> I still don't understand what's wrong with get_peers(want=scrape) [21:17:35] <klapaucjusz> Er, no. [21:17:39] <The_8472> get_peers now always returns nodes lists. [21:17:52] <The_8472> not enough size for filters imo [21:18:14] <klapaucjusz> get_peers(peers=scrape) returns nodes lists from legacy nodes, and whatever data you want for a scrape from new nodes. [21:18:59] <The_8472> node lists are always returned to allow you to continue the lookup [21:20:18] <klapaucjusz> Sorry s/nodes/peers/ [21:21:11] <The_8472> just tested it. 256byte bloom filter with k=1 can't hold more than 17k values. i guess that would be ok if we mandate upper limits on how many values can be stored per node and i manage to figure out how to get inclusion-exclusion unioning to work [21:21:44] <The_8472> and we need 1 for seeds and 1 for peers. thus 512 bytes [21:22:14] <The_8472> though i'd rather like a k > 1 to increase accuracy at lower values [21:23:58] <The_8472> <klapaucjusz> get_peers(peers=scrape) returns nodes lists from legacy nodes, and whatever data you want for a scrape from new nodes. <- 304 bytes for nodes6 and a max packet size of 1200 bytes... so we'd only have 900 bytes for 2 bloom filters. since they should be a power of 2 for the ease of calculation that would only leave 256byte filters. [21:24:18] <klapaucjusz> Still not convinced that Bloom filters are necessary. [21:25:32] <klapaucjusz> What about trying out the relationship between the average of the two most busy nodes for a given torrent and the number of nodes obtained from a tracker scrape?
[21:25:59] <klapaucjusz> My gut instinct is that you'll get excellent correlation once you do the fit (not necessarily linear). [21:26:20] <The_8472> there might be a relationship, but you'd only see it with a large sample size due to the immense noise [21:26:38] <klapaucjusz> If there's anyone who can try it out, it's you. [21:27:34] <The_8472> well, no, i can't :/ [21:27:50] <The_8472> too few plugin nodes on the v4 dht and not enough announces on the v6 one [21:28:01] <klapaucjusz> Yes, you can. [21:28:18] <klapaucjusz> for t in $torrents; do [21:28:32] <The_8472> you're forgetting that i can't get the count [21:28:36] <The_8472> only peer lists [21:28:37] <klapaucjusz> choose a node-id so that I'm responsible for $t [21:28:42] <The_8472> which are by no means exhaustive [21:28:43] <klapaucjusz> Wait one hour [21:28:52] <klapaucjusz> Compute the number of stored values [21:28:58] <klapaucjusz> compare with the tracker [21:28:59] <klapaucjusz> done [21:29:39] <The_8472> then i can't compare with my neighbor ^^ [21:29:43] <klapaucjusz> The point is -- you're tracking the given torrent. [21:30:00] <klapaucjusz> No, but it will give you a good idea of how correlated the values are. [21:30:40] <The_8472> not the issue here [21:30:50] <klapaucjusz> ? [21:31:18] <The_8472> of course there's a correlation between count(Node_x) and count(tracker). the real issue here is the correlation between count(Node_x) and count(Node_y) [21:31:50] <The_8472> the correlation between node and tracker is as simple as: more peers = more entries on the node [21:32:33] <The_8472> but that correlation is mostly based on a) accuracy of lookups b) number of trackers for that torrent c) DHT-enabled clients [21:33:07] <The_8472> i.e. contains arbitrary, possibly torrent-specific constants [21:33:15] <The_8472> hence i'm not interested in it [21:34:35] <klapaucjusz> Why? [21:35:12] <klapaucjusz> We agreed earlier that you only need a rough estimate.
[21:35:28] <klapaucjusz> So we need to determine if the estimate obtained from the most loaded node is good enough. [21:35:31] <klapaucjusz> You claim it isn't. [21:35:36] <klapaucjusz> I claim I don't know either way. [21:35:45] <klapaucjusz> So I suggest you measure. [21:36:14] <The_8472> let me rephrase: the correlation between trackers and most loaded node depends on several external factors, not just related to the DHT [21:36:41] <The_8472> and it's probably quite noisy, a single measurement wouldn't cut it [21:36:59] <klapaucjusz> You think so? [21:37:36] <klapaucjusz> I'd really like to have some hard figures before we go the Bloom filter route. [21:37:37] <The_8472> i do, based on the horrible inaccuracy i've seen in other clients' lookup preocedures [21:37:53] <The_8472> *procedures [21:38:02] * klapaucjusz dislikes seeing complex data structures on the wire. [21:38:50] <The_8472> hrrm, yes. i understand that concern. since people would have to implement them to produce bit-exact copies from the same values [21:39:04] <The_8472> differently implemented bloom filters are worthless [21:39:47] <The_8472> we'd have to provide test vectors [21:40:47] *** tolpico has quit IRC [21:41:46] <klapaucjusz> A lot of work. [21:42:05] <The_8472> yes [21:42:08] <klapaucjusz> So while I have learnt to trust your intuition, I'd really like to see some hard proof that the simpler approach is unworkable first. [21:42:49] <The_8472> well, just an example: tracker reports 37 seeds, 7 peers. DHT found 69 nodes overall [21:42:55] <The_8472> more than the tracker even knows about [21:43:22] <klapaucjusz> That's very reasonable. [21:44:29] <The_8472> i'm just adding the unique values i get from the 8 closest nodes, could be more if they don't return everything [21:44:56] <klapaucjusz> Yeah, I understand that. [21:45:07] <klapaucjusz> But with 69, you're probably exhaustive. [21:45:36] <The_8472> approximately... 
you have to consider the birthday paradox [21:45:53] <The_8472> i ran into that while simulating bloom filters ^^ [21:46:58] <klapaucjusz> < klapaucjusz> So while I have learnt to trust your intuition, I'd really like to see some hard proof that the simpler approach is unworkable first. [21:47:47] <The_8472> next one. 31 peers/seeds from the tracker. 17 from the DHT. [21:48:10] <The_8472> 2 samples, already a difference of a factor of 4 [21:48:54] <The_8472> and i'm taking all values here, not just from the max() [21:50:10] <klapaucjusz> What are we measuring here exactly? [21:50:52] <The_8472> count of unique peer IPs returned by a get_peers lookup, over 8 nodes returning values [21:51:34] <klapaucjusz> What we are interested in is whether the number of unique DHT peers is correlated with the number of DHT peers stored in the most busy DHT node. [21:52:04] <klapaucjusz> Not whether the number of nodes that announce to the DHT is correlated with the number of nodes that announce to the tracker. [21:52:19] <The_8472> well, that was what you were talking about :P [21:52:24] <klapaucjusz> (The latter will depend on whether the tracker performs a NAT check.) [21:52:38] <klapaucjusz> Then I said something stupid. Let him who never has cast me the first stone ;-) [21:52:41] <The_8472> they're all big public trackers. which generally don't do nat checks [21:52:53] <The_8472> but ok [21:53:05] <klapaucjusz> ok about the stones? [21:53:50] <The_8472> *casts ball of mud* [21:54:13] <The_8472> the problem is i can only measure this for small torrents [21:54:24] <klapaucjusz> That's already an interesting result. [21:54:24] <The_8472> big torrents is where it gets really interesting [21:54:32] <klapaucjusz> I claim otherwise. [21:54:41] <klapaucjusz> big torrent -> you don't need precise data.
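[Editor's note: as an aside, not proposed anywhere in this discussion: the "how disjoint are the nodes' sets" question is exactly what a capture-recapture (Lincoln-Petersen) estimate answers when full peer lists from two nodes are available, and it illustrates why bare counts are ambiguous between "add" and "average".]

```python
import math

def lincoln_petersen(peers_a: set, peers_b: set) -> float:
    """Estimate total swarm size from two nodes' peer lists:
    N ~ |A| * |B| / |A & B|. Zero overlap means the sets may be
    completely disjoint, so the total is unbounded -- the same
    'add or average?' ambiguity that plain counts suffer from."""
    overlap = len(peers_a & peers_b)
    if overlap == 0:
        return math.inf
    return len(peers_a) * len(peers_b) / overlap
```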
[21:55:23] <The_8472> well, i can't even measure medium sized torrents since the sum of unique peers will already exceed what's returned by a single node list [21:55:36] <The_8472> anyway, i don't want my queue ordering to be determined by pure noise [21:55:51] <klapaucjusz> Hmm. [21:55:58] <The_8472> anyway, i'll take a few measurements on small torrents [21:56:08] *** GTHK has quit IRC [21:57:08] <klapaucjusz> Summary of the discussion. [21:57:28] <klapaucjusz> The good news is that we agree that sending bloom filters on the wire should be avoided if at all possible. [21:57:38] <klapaucjusz> The bad news is that we're not sure we see a different solution. [21:58:17] <The_8472> i don't really agree there, i'm just playing along :P [22:00:49] <The_8472> now let me do the testing ^^ [22:08:49] *** razvand has quit IRC [22:09:12] *** punto has joined #bittorrent [22:09:56] <punto> hi.. are there any tools to "visualize" the DHT network? [22:11:47] <klapaucjusz> Not to my knowledge. [22:13:29] <klapaucjusz> Fancy developing one? [22:13:41] <klapaucjusz> It could be fun. [22:15:11] <The_8472> well, the mldht plugin visualized the routing table [22:15:13] <The_8472> but that's it [22:15:37] <The_8472> klapaucjusz, extra fun.... peers return redundant IPs [22:19:45] <klapaucjusz> Heh. [22:19:51] <klapaucjusz> What's the v key? [22:20:14] <The_8472> version [22:20:26] <klapaucjusz> What's the v key in the packets with redundant IPs? [22:20:44] <The_8472> not printing that out [22:21:00] <The_8472> can check soon ^^ [22:26:07] <punto> what's needed to be sure that you're looking at the whole network? don't nodes only talk to other nodes that are "close" to them? [22:26:58] <klapaucjusz> The whole network is some 10 million nodes. [22:27:49] <punto> but is there a way to walk through all of them? does the protocol allow it? 
[22:28:19] <The_8472> well, you can write a crawler that enumerates all nodes [22:28:25] <The_8472> but that'll take a while [22:29:26] <The_8472> about an hour i guess if you have a decent upload speed and a router that won't fry with the UDP traffic [22:29:30] <The_8472> no router at all would be best [22:29:42] <punto> can you just ask for a node's routing table? [22:30:37] <The_8472> you can dump its routing table with maybe 30 queries [22:31:29] <The_8472> but the closest ~3 buckets should be the most interesting ones [22:35:27] * klapaucjusz has some trouble visualising an Internet with no routers. [22:35:33] <klapaucjusz> Are you confusing router with NAT box? [22:39:08] <punto> if I generated 30 ids and published 30 nodes with these ids, would I get requests from all over the infohash space? [22:43:17] <klapaucjusz> Aye. [22:43:21] <klapaucjusz> Sounds like a plan. [22:43:37] <klapaucjusz> Ehm, no, you wouldn't. [22:43:59] <klapaucjusz> You'd need some one million ids. [22:44:12] <klapaucjusz> Sorry for the confusion. [22:45:13] <The_8472> klapaucjusz, domestic router = always NAT [22:45:27] <The_8472> i didn't confuse, i implied [22:45:49] <klapaucjusz> The_8472: fair enough. I'll remain my pedantic self, though. [22:45:51] <punto> what are the "30 queries" then? [22:46:10] <klapaucjusz> punto: every node has 22-25 buckets in its routing table. [22:46:19] <choykloun> the_8472: i have a consumer-grade router at my home and it has NAT as an option (which i have disabled) [22:46:46] <choykloun> punto: yes you would [22:46:53] <choykloun> you can very well get it with 1 id [22:47:12] <The_8472> klapaucjusz: http://codepad.org/9qiim4et [22:47:17] <The_8472> looks like my data was too long, sec ^^ [22:47:28] <klapaucjusz> However, these 30 buckets are distributed non-uniformly. [22:48:06] <klapaucjusz> You need one million ids (roughly) in order to ensure that you appear at least once in the routing table of a node in each part of the id space.
[22:48:26] <klapaucjusz> The_8472: will you or will I file a bug? [22:49:01] <The_8472> http://pastebin.com/m41caf8fc <- ~~ [22:49:46] <klapaucjusz> (I'm so relieved I'm not guilty.) [22:49:57] <klapaucjusz> Will do, then. How do you want to be credited? As The_8472? [22:50:47] <The_8472> whitespace instead of underscore [22:50:55] <klapaucjusz> k [22:52:08] <The_8472> anyway, my finding is: in some cases the max() yields good results, but you have to perform exhaustive queries because due to inaccurate routing you basically already get perimeter widening and thus can miss the "core" nodes for a key. [22:52:51] <The_8472> and those some cases basically are tiny torrents with only a handful peers, where lookups always are exhaustive [22:53:07] <The_8472> with just 200 peers/seeds the accuracy drops [22:53:20] <The_8472> a lot [22:53:40] <klapaucjusz> The_8472: http://forum.utorrent.com/viewtopic.php?pid=443493 [22:53:46] <klapaucjusz> alus: http://forum.utorrent.com/viewtopic.php?pid=443493 [22:54:21] <klapaucjusz> So Bloom filters it is? [22:54:29] <klapaucjusz> Oh, my. [22:54:36] <klapaucjusz> I'm not writing the spec. [22:54:46] <punto> so I also can't dump a node's routing table in 30 requests? [22:54:53] <The_8472> you can [22:54:56] <klapaucjusz> Yes, you can. [22:55:13] <klapaucjusz> You just cannot expect to passively sit there and be contacted by everyone without having millions of ids. [22:55:27] <klapaucjusz> On the other hand, you can do some rather efficient things actively. [22:55:52] <klapaucjusz> Take a node id. [22:56:16] <klapaucjusz> Let id_k be id with its k-th bit inverted. [22:56:35] <klapaucjusz> Send to the node with id id find_nodes requests for id_k for k being 0 to 30. [22:56:44] <klapaucjusz> You'll get a full dump of the routing table. [22:57:04] <klapaucjusz> (For the right value of 30. I'd say 24 should be enough right now.) [22:58:41] <punto> so each bucket is like 128 bit space?
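[Editor's note: klapaucjusz's dump procedure above, sketched out. Assumptions for illustration: 20-byte (160-bit) IDs numbered most-significant bit first, and a probe depth of ~25 per the "22-25 buckets" figure in the discussion.]

```python
def probe_targets(node_id: bytes, depth: int = 25) -> list:
    """Targets for dumping a node's routing table: the node's own ID
    with bit k inverted, for k = 0..depth-1. Each id_k falls into a
    different bucket of the queried node's table, so one find_node
    per target (~25 queries) covers the whole table."""
    targets = []
    for k in range(depth):
        flipped = bytearray(node_id)
        flipped[k // 8] ^= 0x80 >> (k % 8)   # invert bit k, MSB first
        targets.append(bytes(flipped))
    return targets
```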
[22:58:43] <The_8472> well yeah, 24 should do, i was thinking of enumerating half-buckets to have some overlap [22:58:54] <The_8472> no [22:59:09] <The_8472> read this: http://infinite-source.de/az/whitepapers/kademlia_optimized.pdf [23:00:15] <choykloun> whats the origin of this conversation? sorry no scrollback.. some academic research into dht? [23:01:13] <klapaucjusz> punto is starting to think about software to visualise the DHT. [23:01:20] <punto> I was wondering if there were tools to visualize the dht network [23:01:22] <klapaucjusz> The_8472 and myself are cheering. [23:03:19] <choykloun> that will be like a map where everyone can come and draw their own paths by improperly / lazily implementing routing .. :) [23:05:06] <choykloun> i got around 1.7m unique nodes in a few hrs from a single nodeID and top100 infohashes (although that's an extreme example :P) [23:05:30] <The_8472> klapaucjusz, good news is: perimeter widening works awesomely well. there shouldn't be any problems with rejecting stores and thus actively limiting load on nodes [23:05:48] <choykloun> that would look more like an anthill than anything usable :) [23:06:32] *** _rafi_ has quit IRC [23:08:10] <choykloun> fuck, i should get around to implementing 100% correct by-the-books routing also [23:08:18] <choykloun> so its usable for anything else than crazy experiments [23:09:04] <choykloun> or maybe they get even nuttier when it behaves fine on the surface (evil grin) :) [23:09:12] <klapaucjusz> choykloun: what are you up to? [23:09:30] <klapaucjusz> The_8472: cool. [23:09:35] <klapaucjusz> I still don't understand why. [23:09:36] <choykloun> i do weird stuff with dht [23:09:48] <klapaucjusz> That much I had gathered. [23:09:59] <The_8472> he's dDoSing people [23:10:40] <choykloun> more like discovering critical design errors in the protocol and assisting in getting them fixed? :) [23:10:43] <punto> where do you get the top 100 infohashes? 
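[Editor's note: the routing-table dump klapaucjusz describes above — send the node find_node requests for its own ID with bit k inverted, k from 0 to ~24 — and The_8472's "no" to the bucket-size question can be made concrete. A minimal Python sketch, assuming standard 160-bit Kademlia IDs; the helper names are illustrative, not from any real client.]

```python
def flip_bit(node_id: bytes, k: int) -> bytes:
    """Return node_id with its k-th bit (counting from the most significant) inverted."""
    byte_index, bit_index = divmod(k, 8)
    flipped = bytearray(node_id)
    flipped[byte_index] ^= 0x80 >> bit_index
    return bytes(flipped)

def dump_targets(node_id: bytes, depth: int = 25) -> list:
    """Targets for find_node queries that together dump a node's routing table.

    Flipping bit k of the node's own ID yields a target in a different
    bucket for each k, so ~25-30 queries cover the whole table.
    """
    return [flip_bit(node_id, k) for k in range(depth)]

def bucket_fraction(i: int) -> float:
    """Fraction of the ID space covered by bucket i.

    Bucket i holds contacts whose XOR distance from our ID has exactly
    i leading zero bits: half of the space not covered by buckets 0..i-1.
    Buckets are not equal slices of the space, hence the "no" above.
    """
    return 2.0 ** -(i + 1)
```

Buckets 0..24 together cover all but 2^-25 of the keyspace, which is why ~25 queries suffice in practice.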
[23:11:09] <The_8472> you just listen for get_peers requests [23:11:09] <choykloun> dumped a top100 from some torrent site and did a bit of sed magic [23:11:32] <The_8472> though having a fixed node ID would introduce bias [23:11:44] <The_8472> biasing it towards torrent hashes near you [23:12:04] <choykloun> which as it turns out isn't really the case [23:12:05] <choykloun> well [23:12:10] <The_8472> so you'd have to violate the DHT assumptions to get a well-balanced view [23:13:14] <choykloun> you WILL get well-distributed deltas from just speaking with as many nodes as possible [23:13:23] <klapaucjusz> Okay, here's another issue. [23:13:31] <klapaucjusz> We need to develop an implementation of a bootstrap node. [23:13:41] <klapaucjusz> Currently, we're running normal DHT nodes for the bootstrap nodes. [23:14:06] <klapaucjusz> That's ungood: if a node gets into the routing table of a bootstrap node, it gets ddos'ed. [23:14:38] <klapaucjusz> My plan would be to have a node that, instead of maintaining a Kademlia routing table, maintains a huge (thousands of nodes) table that is evenly spread in the DHT space. [23:14:43] <choykloun> there's a bias but its not very big [23:14:43] <The_8472> it shouldn't, really. a good DHT implementation keeps a local cache of nodes and shouldn't ever need to contact the bootstrap node [23:14:52] <choykloun> it also depends on how you choose the initial nodes [23:15:15] <klapaucjusz> The_8472: in practice, however, the bootstrap nodes are contacted hundreds of times per second. [23:15:24] <klapaucjusz> thousands. [23:15:27] <choykloun> klapaucjusz: you dont need a lot of bootstrap nodes [23:15:31] <choykloun> ive compared using 200k to 200 [23:15:35] <klapaucjusz> No, we currently have 2. [23:15:41] <The_8472> klapaucjusz, sure. a bootstrap node doesn't need routing. but it has to ascertain the liveness of the nodes it hands out to some extent. 
[23:15:46] <choykloun> for purposes like this they were equal [23:15:56] <choykloun> as long as the nodes are randomly distributed [23:15:57] <klapaucjusz> Sure, normal liveness procedures. [23:16:08] <choykloun> ah [23:16:18] <choykloun> i meant bootstrap node as in node _I_ use for bootstrap [23:16:21] <klapaucjusz> Just instead of having a Kademlia routing table, have a uniform routing table of a few thousand nodes. [23:16:56] <The_8472> choykloun, uhm. you use one. not 200 or 200k. [23:17:14] <The_8472> bootstrap node = the node you fill an empty routing table with to get started [23:17:14] *** spopp has joined #bittorrent [23:17:19] <choykloun> the_8472: depends on what you are doing.. :p [23:17:21] <klapaucjusz> Additionally, we could change node ids every hour or so, to drop the bootstrap nodes from other nodes' routing tables. [23:17:22] <choykloun> yes [23:17:32] <The_8472> we're only talking about reasonable implementations here [23:17:43] *** Switeck has joined #bittorrent [23:17:45] <choykloun> (i call it IPL nodes though) [23:18:08] <The_8472> klapaucjusz, that's actually the reason why you were seeing so many DNS queries. nodes lookup the IP of the bootstrap node to actively remove it from their routing table ;) [23:18:12] <The_8472> at least mine does [23:18:45] <choykloun> anyway my point was that it doesnt really matter much how many nodes you bootstrap with [23:18:54] <The_8472> we know [23:18:59] <klapaucjusz> So, does my plan sound reasonable? [23:19:24] <klapaucjusz> Shall I write such an implementation, and replace one of our bootstrap nodes with it? [23:19:28] <choykloun> changing nodeIDs is usually a bad thing [23:19:36] <The_8472> sure, doesn't really matter what the bootstrap node does as long as it hands out any fresh and juicy... i.e. 
reachable nodes [23:19:56] <klapaucjusz> More radical: [23:20:12] <klapaucjusz> instead of returning the nodes the requestor asked for, [23:20:15] *** spoop has quit IRC [23:20:16] <klapaucjusz> return a random subset. [23:20:18] <klapaucjusz> Good idea? [23:20:26] <The_8472> no problem with that [23:20:32] <The_8472> since they'll just go to the root bucket [23:20:55] <choykloun> hm i dont think you should expect all implementations to behave perfectly... [23:20:56] <The_8472> the node can bootstrap itself from there on [23:21:03] <klapaucjusz> Ok. [23:21:08] <klapaucjusz> Then I'll give it a try. [23:21:18] <klapaucjusz> How many nodes shall I keep? [23:21:24] <The_8472> choykloun, mine does. borderline perfectly ^^ [23:21:26] <klapaucjusz> 10000? [23:21:40] <The_8472> just keep a rolling list. it doesn't have to be all that big [23:21:52] <The_8472> you have to consider that you have to do liveness checks after all [23:22:09] <klapaucjusz> Two circular lists: one of recently learnt-about nodes, one of confirmed-reachable nodes. [23:22:26] <The_8472> and randomly select from both [23:22:28] <The_8472> sounds good [23:22:39] <klapaucjusz> Eh, no. Randomly select from the confirmed-reachable nodes. [23:22:56] <The_8472> hrrm. and confirm from the recently learned one? [23:23:08] <The_8472> at a constant rate [23:23:31] <klapaucjusz> Yep. Using find-peers requests when the recently learned list is emptyish, using ping requests otherwise. [23:24:12] <The_8472> hrrm, excellent plan [23:24:52] <klapaucjusz> I'm glad you like it. [23:25:05] <klapaucjusz> Will think about hacking that together. [23:25:18] <klapaucjusz> alus: see above, might interest you. [23:25:28] <The_8472> though you should be careful of changing node IDs [23:25:37] <klapaucjusz> Yeah, it worries me too. [23:25:43] <klapaucjusz> I'm listening. 
[23:25:45] <The_8472> if you change it too often then you'll end up in too many local buckets [23:25:48] <choykloun> do it like this instead [23:25:52] <choykloun> have several IP addrs [23:25:56] <choykloun> dns round robin [23:25:59] <choykloun> 1 nodeID per ip addr [23:26:13] <klapaucjusz> Eh? [23:26:27] <The_8472> then he'd be responsible for even more IDs... [23:26:33] <choykloun> will prevent the problems changing nodeIDs frequently might cause [23:26:41] <klapaucjusz> What problems? [23:26:47] <choykloun> although in this case those might be good :> [23:26:53] <The_8472> the bootstrap node basically just needs an ID in a nice and silent corner of the keyspace. [23:27:01] <choykloun> ya [23:27:19] <The_8472> its neighbors might point to it on occasion, but then it'll just be a "bad" neighbor, i.e. not returning helpful results [23:27:26] <The_8472> but it'll only do that for 1 ID, that's it [23:27:27] <klapaucjusz> So what change frequency do you suggest, then? [23:27:32] <choykloun> ya and ya [23:28:07] <The_8472> change the ID once every few hours, that should do [23:28:32] <choykloun> is there, in real-life dht, any point in doing that? [23:28:39] <The_8472> hrrm [23:29:03] <The_8472> changing IPs would actually be better [23:29:11] <The_8472> so you get evicted from routing tables for sure [23:29:36] <The_8472> a bootstrap node is long-lived, so you might end up in many people's root buckets [23:29:41] <The_8472> though you do little harm there [23:29:52] <The_8472> since you don't expect "accurate" results from a root bucket node [23:29:57] <choykloun> just dont answer anything basically [23:29:58] <klapaucjusz> choykloun: you are aware that we're *not* speaking about a normal DHT node. We're speaking about a DHT node that's designed to spread the load more evenly, and deliberately violates the spec. 
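[Editor's note: klapaucjusz's bootstrap design from above — two rolling lists, one of recently learnt-about nodes and one of confirmed-reachable ones, liveness checks at a constant rate, random subsets handed out from the confirmed list — might be sketched like this. All names are illustrative and the `ping` callback is a placeholder, not any real client's API.]

```python
import random
from collections import deque

class BootstrapTable:
    """Two circular lists instead of a Kademlia routing table."""

    def __init__(self, capacity=10000):
        self.learned = deque(maxlen=capacity)    # heard about, not yet verified
        self.confirmed = deque(maxlen=capacity)  # responded to a liveness check

    def learn(self, addr):
        """Record a (host, port) we heard about from incoming traffic."""
        self.learned.append(addr)

    def verify_one(self, ping):
        """Run one liveness check; call this at a constant rate.

        ping(addr) -> bool is a placeholder for a real DHT ping (or a
        find_node when the learned list runs dry, as discussed above).
        """
        if not self.learned:
            return
        addr = self.learned.popleft()
        if ping(addr):
            self.confirmed.append(addr)

    def sample(self, n=8):
        """Hand out a random subset of confirmed-reachable nodes."""
        return random.sample(list(self.confirmed), min(n, len(self.confirmed)))
```

The bounded deques give the "rolling list" behavior The_8472 suggests: old entries fall off the far end instead of needing explicit eviction.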
[23:30:04] <choykloun> yes [23:30:13] <The_8472> but still, it might cause more traffic than necessary for the bootstrap node by virtue of being long-lived [23:30:35] <klapaucjusz> mkdir dht-bootstrap [23:30:39] <klapaucjusz> emacs dht-bootstrap.c [23:31:12] <The_8472> so, it doesn't really do any harm having a fixed node ID and fixed IP. you'll just see more traffic than necessary [23:32:21] <choykloun> basically the less you talk the less traffic you get [23:32:50] <The_8472> well, he has to answer find_node and ping queries to allow other nodes to bootstrap [23:32:53] <choykloun> i even believe some implementations pick nodes to send to based mostly on number of answers received regardless of the ID delta :p [23:32:59] <The_8472> that alone is enough to get into lots of routing tables [23:33:18] <klapaucjusz> choykloun: which implementations? [23:33:30] <choykloun> let me think here.. couldnt he use another port for "outgoing" DHT ? [23:33:37] <The_8472> no [23:33:50] <The_8472> nodes should match the source to destination [23:33:59] <The_8472> and reject non-matching responses [23:34:01] <choykloun> klapaucjusz: just what i suspect from real data [23:34:31] <choykloun> the_8472: i was meaning that the bootstrap node clients bootstrap from uses one DHT port [23:34:58] <The_8472> well, it does [23:35:04] <The_8472> otherwise we couldn't contact it [23:35:13] <choykloun> but when it goes on a routing-table-learning mission it uses a different port/addr so they can be kept apart [23:35:30] <choykloun> so for clients wanting to IPL/bootstrap its nice and talkative [23:35:39] <The_8472> ah yes, that could be done. 
but it wouldn't matter [23:35:51] <choykloun> for others its autistic :) [23:36:08] <The_8472> you end up in routing tables by virtue of being the bootstrap node [23:36:26] <The_8472> and then you get propagated because you're long-lived [23:36:35] <choykloun> still the less you talk and answer the less you will end up getting [23:36:43] <The_8472> since routing tables prefer long-lived stuff [23:37:07] <The_8472> sure, but it doesn't have to do much actively anyway [23:37:09] <choykloun> even when they dont even respond to ping much less find_node? :) [23:37:26] <The_8472> bootstrap node HAS to respond to ping and find_node [23:37:29] <choykloun> ye [23:37:30] <choykloun> s [23:37:33] <choykloun> from clients bootstrapping! [23:37:37] <choykloun> not the rest of DHT [23:37:42] <The_8472> it doesn't know that [23:37:50] <choykloun> thats why i suggested split horizon [23:38:12] <The_8472> no, i mean it doesn't know who's bootstrapping and who is just querying it because it heard from other nodes [23:39:11] <choykloun> couldnt you put some effort into making the bootstrap side distributed as little as possible [23:39:51] <The_8472> wouldn't do much good [23:40:07] <The_8472> at least not on the bootstrap node's side [23:40:28] <The_8472> on the client side you can do more. i resolve all bootstrap node addresses and kick them from the table [23:40:45] <klapaucjusz> choykloun: you suspect wrong. [23:41:01] <klapaucjusz> What you're seeing is just the way bucket replacement works. [23:42:02] <choykloun> well then why did i get a shitload of get_peers from nodes with basically random deltas? [23:42:08] <The_8472> especially if you fake response IDs then you'll most certainly get inserted into local buckets. 
since local buckets are never full [23:42:14] <choykloun> as in tens per second [23:43:05] <choykloun> there was a bias towards lower deltas but it was more like a summit on mount everest than the Petronas tower :) [23:43:28] <The_8472> i'm seeing 5 get_peers per second on a regular, well-behaved client [23:43:38] <The_8472> you only need to be a little more noisy to see more [23:43:51] <choykloun> but what bias does the deltas have for you? [23:44:29] <The_8472> idk, not measuring them. [23:44:40] <The_8472> should be exponentially distributed though [23:44:57] <choykloun> k [23:45:12] <choykloun> coz with my craziness it certainly wasnt [23:45:16] <choykloun> i mean [23:45:32] <The_8472> if you fake node IDs then all bets are off [23:45:42] <choykloun> it was more or less random except for the closest ones [23:45:47] <choykloun> nope single node ID [23:46:17] <choykloun> announcing popular infohashes [23:47:21] <The_8472> i suspect an array[random()] ^^ [23:47:32] <The_8472> called in a loop [23:50:18] <choykloun> klapaucjusz: btw to introduce myself, i'm basically a guy that likes playing with complex stuff [23:50:30] <choykloun> such as decentralized networking :) [23:50:50] <klapaucjusz> Yeah, I know. [23:50:56] <choykloun> i'm quite new to DHT [23:51:01] <klapaucjusz> We've already met. [23:51:04] <choykloun> ah [23:51:10] <klapaucjusz> It wasn't very pleasant ;-) [23:51:14] <choykloun> didnt recognize your nic [23:51:20] <choykloun> ah [23:51:22] <choykloun> its you [23:51:23] <choykloun> heh [23:51:26] <The_8472> bwhahahha [23:51:35] <choykloun> uhm [23:51:42] <klapaucjusz> Heh. [23:51:44] <choykloun> my DHT implementation is named jchgout [23:52:00] <choykloun> which is a combination of 'jch' and the khmer word for 'idiot' [23:52:01] <choykloun> :-) [23:52:15] <klapaucjusz> I'm flattered. [23:52:23] <klapaucjusz> The important thing is not that people speak well of me. 
[23:52:26] * The_8472 tips a barrel of liquid helium [23:52:27] <choykloun> idiot in khmer is like chgout/chkout [23:52:30] <klapaucjusz> The important thing is that they speak of me. [23:52:37] <choykloun> haha [23:53:18] <choykloun> http://knark.net/~user/chgout.png thats how its spelled properly :) [23:54:36] <klapaucjusz> Any chance you could spell jchgout in Khmer? [23:54:40] <The_8472> klapaucjusz, btw. if given the chance to do incompatible changes i'd do a lot more with the mainline DHT [23:54:55] <choykloun> klapaucjusz: not exactly [23:55:07] <choykloun> as it doesnt work like the latin alphabet does [23:55:07] <The_8472> http://forum.bittorrent.org/viewtopic.php?id=146 [23:55:12] <choykloun> each character is a syllable [23:55:41] <choykloun> then you write configuration parameters around it to change the details of it [23:56:08] <klapaucjusz> How do you code consonant clusters? Do you have a limit on their complexity? [23:56:29] <choykloun> you write the second consonant under the first [23:57:05] <choykloun> or well [23:57:09] <choykloun> sometimes it takes on special forms [23:57:20] <choykloun> so it gets very messy if you try to do it with more than 1 [23:57:30] <choykloun> especially when you later add signs to modify the vowel [23:58:01] <klapaucjusz> The_8472: yes, I've read that. [23:58:14] <klapaucjusz> I think that your major flaw is that you tend to over-engineer things. [23:58:16] <choykloun> the bottommost character of the 3 turns the o of the 'cho' syllable into more like oou [23:58:25] <klapaucjusz> Your love of complexity sometimes clouds your (otherwise sound) judgement. [23:58:28] <choykloun> then the rightmost one is a T [23:59:12] <choykloun> well i mostly work with stuff that HAS to be over-engineered [23:59:29] <choykloun> so you're half right half wrong [23:59:56] <The_8472> klapaucjusz, true. i do love complex systems. they give you room to tinker with [23:59:58] <choykloun> more in the periodic sense than the schroedingers sense
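[Editor's note: returning to the earlier exchange about delta bias — for uniformly random 160-bit IDs, the number of leading bits an ID shares with a target is geometrically distributed (P(at least k shared bits) = 2^-k), which is the exponential fall-off The_8472 expects from well-behaved traffic. A small illustrative Python sketch; the helper name is hypothetical.]

```python
import os

def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Number of leading bits a and b have in common (the Kademlia bucket index)."""
    bits = 0
    for x, y in zip(a, b):
        d = x ^ y
        if d == 0:
            bits += 8
            continue
        # count leading zero bits of the first differing byte
        for shift in range(7, -1, -1):
            if d & (1 << shift):
                return bits + (7 - shift)
    return bits

# Half of random IDs share no leading bit with the target, a quarter share
# one, and so on -- the observed flat distribution with a single fixed node
# ID therefore indicates faked or biased traffic, not normal routing.
target = os.urandom(20)
counts = {}
for _ in range(10000):
    k = shared_prefix_bits(os.urandom(20), target)
    counts[k] = counts.get(k, 0) + 1
```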