November 15, 2013
[00:00:39] <iAmTheDave> never used before. will take a look
[00:01:53] *** wez has joined #omnios
[00:07:25] *** wez is now known as wez|away
[00:08:01] *** wez|away is now known as wez
[00:11:32] *** wez is now known as wez|away
[00:21:15] *** wez|away is now known as wez
[00:28:07] *** esproul has quit IRC
[00:31:01] *** wuff has quit IRC
[00:38:51] <iAmTheDave> hmm. in prelim testing, the ssd backed instances are 2x read and write of non-SSD backed drives
[00:40:12] <apeiron> imo omnios is what you build your cloud out of
[00:40:17] <apeiron> rather than using EC2
[00:40:24] <apeiron> (opinions are very much not those of my employer)
[00:41:25] <iAmTheDave> apeiron: i hear you. this is more of an availability thing on my part.
[00:42:07] <iAmTheDave> on-demand, managed, point-and-click, that kind of stuff
[00:42:28] <iAmTheDave> unfortunately our entire API/backend development team plus our server management team together equals 1 person ;)
[00:42:39] *** joltman has quit IRC
[00:44:11] *** rjwill has quit IRC
[00:44:39] *** szaydel has quit IRC
[00:44:43] <danlarkin> iAmTheDave: so does the omnios ami boot on the c3 instances?
[00:45:51] <iAmTheDave> danlarkin: yup. it's still an EBS-backed instance, with two attached 80G SSDs that i'm putting in to a ZFS pool
[00:46:15] <danlarkin> oh, gotcha, so the ssds are instance-store volumes?
[00:46:28] <iAmTheDave> they are
[00:46:50] <iAmTheDave> the one thing i'm a bit confused about is how they're formatted (and if that matters)
[00:47:12] *** wuff has joined #omnios
[00:47:24] <danlarkin> whatdya mean formatted?
[00:48:45] <iAmTheDave> well, most disks in my non-unix world are formatted with a file system. adding them to a zfs pool seems... too fast ;)
[00:49:43] <iAmTheDave> getting 100-120 MB/s sustained write and 1.2GB sustained read, vs 50-55 MB/s sustained write and 675 MB/s sustained read
[00:49:58] <iAmTheDave> each 2 drives in a mirrored setup
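The "too fast" formatting iAmTheDave notices is because ZFS has no separate mkfs step; pool creation, formatting, and mounting happen in one operation. A minimal sketch of the two-SSD mirror setup described above — the device names are hypothetical (check `format` or `diskinfo` for the real ones), and the dataset name is illustrative:

```shell
# Mirror the two instance-store SSDs into one pool. Device names
# (c1t0d0, c1t1d0) are placeholders for this sketch.
zpool create tank mirror c1t0d0 c1t1d0

# Datasets are created and mounted immediately; no mkfs required.
zfs create -o compression=lz4 tank/riak
zpool status tank
```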
[00:51:37] *** szaydel has joined #omnios
[00:53:52] <danlarkin> iAmTheDave: can you share details about your ec2 usage with omnios?
[00:54:58] <iAmTheDave> danlarkin: in what way?
[00:55:01] <iAmTheDave> and sure
[00:55:33] *** theason has quit IRC
[00:55:48] <danlarkin> how many nodes, have you built custom amis, are you using autoscaling, what config management
[00:57:36] *** desai has joined #omnios
[00:57:56] <iAmTheDave> how should i share this info?
[00:58:35] <iAmTheDave> i mean, do you just want me to type it in hee?
[00:58:41] <iAmTheDave> *here?
[00:58:42] <danlarkin> right here is fine with me, but I guess you can pm me or email or whatever you want
[00:58:55] <iAmTheDave> oh, yeah, no, ok. i have no secrets
[00:58:59] <iAmTheDave> running omnios for riak
[00:59:29] <iAmTheDave> mostly for ZFS snapshots as a backup mechanism, instead of rolling node take-down file-copy backups
[00:59:59] <apeiron> if you're not sending your snapshots somewhere, they're not backups
[01:00:20] <iAmTheDave> apeiron: i will be sending them to a small instance with a ton of EBS-backed storage in another datacenter
[01:00:27] <apeiron> ++
[01:00:33] <danlarkin> why not s3?
[01:01:39] <iAmTheDave> friends of mine do this with 30TB+ of data, with 1 second snapshots. their setup is pretty sweet at this point in time
[01:01:54] <iAmTheDave> and they've volunteered to help with the setup
[01:02:19] *** [1]wuff has joined #omnios
[01:02:28] <danlarkin> nice
[01:02:34] <iAmTheDave> s3...? hmm. i don't know enough yet about ZFS backups and restores to tell you, but as i understand it, a computer needs to receive the diffs, no?
[01:02:54] <iAmTheDave> they turned me on to omnios to begin with, and i believe have a support contract with you guys
[01:03:31] <danlarkin> afaiu you can stick a snapshot wherever you want; it's just a file, right?
[01:04:04] <iAmTheDave> but when you're doing diffed snapshots every second, don't they need to be applied by a receiver?
[01:04:21] <iAmTheDave> i'll bring up s3 with them, since it's an option
[01:04:21] <danlarkin> yeah I guess for the diffs
[01:05:20] *** wuff has quit IRC
[01:05:21] *** [1]wuff is now known as wuff
[01:06:25] <iAmTheDave> in any case, i'll likely be running 5x the new c3.2xlarge instances. 15G RAM, 8 CPU (28 "ECU"), 2x 80G SSD
[01:06:59] <thebug> harder to restore if you're not receiving the diffs into a zpool, since all you have are base + diffs then, and you'd need to roll the whole thing forward until you've received every diff back into the pool/dataset you're trying to restore
[01:07:10] <iAmTheDave> cost before reserved instances comes in at around $2100/m - not the highest available bang-for-buck, but it'll hold until we get to bare metal
[01:07:27] <thebug> if you're receiving them into a pool/dataset on a 2nd box, then you can instantly pull files back out, or ship a merged new base back to the original box
[01:08:24] <iAmTheDave> thebug: that was my thinking - i could even replace servers by restoring the pools and not give riak any clue that there was a server replacement
[01:08:36] <iAmTheDave> less restore handoffs, etc.
[01:08:57] <thebug> it's probably more expensive that way than the s3 route, but much more convenient
[01:09:09] <thebug> (since you're paying for a 2nd machine + ebs)
[01:09:18] <iAmTheDave> thebug: i'm assuming i don't need a very powerful machine, just a bunch of storage
[01:09:33] <iAmTheDave> slow EBS is cheap, and so is a t1.micro or m1.small instance ;)
[01:09:48] <iAmTheDave> plus i can just add on EBS storage as needed
[01:09:49] *** Gibheer_ has joined #omnios
[01:09:53] <thebug> as long as the machine + the ebs backing it can keep up receiving the diffs as your normal machine hands them off, yeah
[01:10:00] *** Gibheer has quit IRC
[01:10:05] <iAmTheDave> right. that'll be something i'll have to test
[01:10:36] <thebug> I don't use ec2/ebs, but I do the snap zfs send -R -I ... -> ship over ssh -> zfs recv
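The snap → ship over ssh → recv pipeline thebug describes can be sketched roughly as below. Pool, dataset, and host names are assumptions; `-I` sends every intermediate snapshot between the two named ones, and `-R` replicates child datasets as well:

```shell
# Take a new recursive snapshot, then send everything since the last
# shipped snapshot to the backup host. "previous" stands in for the
# most recent snapshot already present on the receiver.
NOW=$(date +%Y%m%d%H%M%S)
zfs snapshot -r tank/riak@$NOW
zfs send -R -I tank/riak@previous tank/riak@$NOW | \
    ssh backup-host zfs recv -F backup/riak
```

A real script would also track which snapshot the receiver last accepted, since an incremental send fails if the base snapshot is missing on either end.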
[01:11:09] <thebug> not with riak, but I assume riak has some 'quiesce disk writes' mode you can smack it into while you're snapping
[01:11:11] <iAmTheDave> i assume that's how they're going to set me up
[01:11:33] <iAmTheDave> thebug: not sure if it does or not.
[01:11:52] <iAmTheDave> fact is, i may be overestimating the power of zfs snapshots, but I'm still working on it
[01:11:54] <thebug> worth checking into. I'm not familiar with riak, but I do that with postgresql
[01:12:16] <thebug> I doubt you're overestimating, you just want to ask the db nicely to clean up and stop writing for a moment while you snap
[01:12:18] <iAmTheDave> if i'm doing - for the sake of argument - 5s diff snapshots, why would i need to pause the db each time?
[01:12:41] <iAmTheDave> i think they're going to try to set me up with 1s snapshots - so they're incredibly small
[01:12:43] <thebug> well, the idea is you'd *rather* have the on-disk bits of the db be in a consistent state
[01:12:52] <iAmTheDave> ok
[01:13:09] <thebug> zfs will most definitely take an instantaneous snapshot, but if you can arrange that the db cleans up first, you'll be happier with the results should you have to restore from a snapshot
[01:13:53] <thebug> I would think 1s snaps would be rather abusive
[01:14:12] <iAmTheDave> thebug: maybe. not sure why they do it that way. will have to ask
[01:14:21] *** desai has quit IRC
[01:14:35] <iAmTheDave> well, actually i know - they can't lose any data, basically
[01:15:01] <thebug> I would think synchronous replication at the db level would help there, rather than relying on the fs
[01:15:14] <iAmTheDave> for all intents and purposes, 5x nodes is enough that i shouldn't NEED pool backups...
[01:15:21] <thebug> I love me some zfs, but that seems like maybe the wrong tool in this instance [for that particular functionality]
[01:15:24] <iAmTheDave> but i'm too skittish for that
[01:15:47] <iAmTheDave> they have like 5 layers of raid/mirrors/snapshots/etc.
[01:16:15] <iAmTheDave> and for them it's not a db, it's regular file storage. for me, it's db
[01:16:36] <arekinath> http://docs.basho.com/riak/latest/ops/running/backups/ might be relevant
[01:16:41] <arekinath> for riak backups
[01:16:45] <iAmTheDave> once i can afford an enterprise license with basho and have a second cluster, i may just stop doing the snapshots
[01:16:53] <arekinath> if you're using bitcask you can hot backup, but leveldb you have to shut the node down to be safe
[01:17:03] <iAmTheDave> yeah, we're on leveldb
[01:17:14] <thebug> yep, was just the random thoughts leaking off the top of my brain when you said it was for 'must not lose data' db backups
[01:17:15] <iAmTheDave> hence the desire to not have to do rolling take-downs for backups
[01:17:17] <arekinath> since google refuse to implement quiesce in leveldb, they closed the bug with WontFix
[01:17:26] <iAmTheDave> arekinath: is that right?
[01:17:26] <thebug> I don't know enough about riak to make any claims about the right way to deal :)
[01:17:37] <iAmTheDave> thebug: me neither ;)
[01:17:58] <iAmTheDave> oh. last but not least, they'll be in a VPC behind an ELB.
[01:18:05] <arekinath> you can use leveldb repair usually to restore things out of a hot riak backup, but it doesn't work very well, it's incredibly slow, and you might lose data
[01:18:11] <arekinath> so it's a terrible method of doing regular backups
[01:18:29] <arekinath> shutting the node down is the only way with riak+leveldb at the moment
[01:18:33] <iAmTheDave> yeah, for all the good that's in leveldb, it seems there's some nasty when you get to data corruption
[01:19:46] <iAmTheDave> arekinath: so what i'm reading is i'd have to pull it down, even with ZFS...
[01:20:05] <arekinath> yeah, but ZFS will let you pull it down for much less time than you would have to otherwise. :P
[01:20:13] <iAmTheDave> shoot. well, fair enough
[01:20:48] <iAmTheDave> that changes the strategy a bit. not too much, but i wasn't thrilled with the idea of having backups have to take down the node.
[01:21:08] <arekinath> yeah, it's not ideal. but that's why you should have plenty of nodes I guess
[01:21:40] <iAmTheDave> at that point, is it even worth backing up?
[01:22:00] <iAmTheDave> that's the whole point of the 5x minimum in production, as i understand it
[01:22:18] <arekinath> depends on your needs and how valuable the data is, of course
[01:22:19] <arekinath> haha
[01:22:30] <thebug> well, you could still use one of the 5 nodes as your frequent "shut it down, back up, restart" node
[01:22:40] <thebug> so it'll bounce in and out of the pool, if that's possible in riak
[01:22:49] <arekinath> you have to do it to all the nodes
[01:22:55] <thebug> ah
[01:22:55] <arekinath> to guarantee you back up every partition
[01:23:08] <arekinath> but yeah, riak handles nodes going down and up just fine
[01:23:10] <arekinath> it's designed for that
[01:23:14] <iAmTheDave> thebug: it's a 3x storage redundancy setup
[01:23:18] <thebug> was gonna say, at least then you have some backup, but I guess it's not actually running 5x copies, it's partitioned ...
[01:23:21] <thebug> right
[01:23:37] <iAmTheDave> yeah, the ease of handling downtime is half the reason i'm going with riak
[01:23:49] <arekinath> you could do it to a covering set if you know that all your buckets use n=3 and are set up the right way... but I wouldn't count on that
[01:23:56] <arekinath> back them all up if you're going to back it up at all
[01:24:07] <iAmTheDave> yeah. i'm not that kinda cowboy
[01:24:28] <iAmTheDave> our data is our entire business. if it goes away, so do the majority of our players
[01:24:37] <arekinath> then you probably want to back it up. :P
[01:24:37] <arekinath> haha
[01:24:54] <iAmTheDave> well, the snapshot can't take long, so i can deal with a cron job
[01:25:03] <iAmTheDave> frequency is the only thing i'll need to figure out
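The cron-driven "shut it down, snapshot, start it back up" cycle discussed above might look something like this sketch, assuming riak+leveldb per arekinath's advice that the node must be down for a consistent copy. Service commands are riak's standard control script; the dataset name is an assumption:

```shell
# Cold backup for a riak node on leveldb: the snapshot itself is
# near-instant, so the node is only down for the stop/start window.
riak stop
zfs snapshot tank/riak@backup-$(date +%Y%m%d%H%M)
riak start
# Ship the snapshot off-node afterwards with zfs send; riak's
# handoff machinery covers the brief downtime.
```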
[01:25:14] <arekinath> there's also the multi-DC replication in riak enterprise to consider
[01:25:25] <iAmTheDave> maybe i'll monitor snapshot sizes in graphite, so i can see when they're getting bigger as traffic increases
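One way to feed snapshot sizes into graphite, as suggested above, is to read the snapshot's `used` property and emit a line in graphite's plaintext protocol (`metric value timestamp`). The snapshot name and graphite host are assumptions for this sketch:

```shell
# Report how much space the latest snapshot consumes. -Hp gives a
# bare numeric value in bytes, suitable for graphing.
snap="tank/riak@latest"
bytes=$(zfs get -Hp -o value used "$snap")
echo "riak.snapshots.size $bytes $(date +%s)" | nc graphite-host 2003
```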
[01:25:49] <arekinath> then you can do the stop-snapshot-start on the cluster that's got less load on it at the time and things like that
[01:25:50] <iAmTheDave> can't quite afford enterprise yet - once i have the $ for that and another cluster, i'll do that and probably stop with the backups
[01:26:09] <iAmTheDave> or, like you're saying, do them only on the slower cluster
[01:26:14] <iAmTheDave> s/slower/lower load/
[01:27:15] <arekinath> the annoying part is, leveldb normally operates like an append-only store until it does a merge cycle
[01:27:19] <arekinath> it would be ideal for hot backups
[01:27:24] <arekinath> if you could pause the merge thread
[01:28:14] <arekinath> but as it is, if you snapshot in the middle of a merge, which on a busy leveldb is reasonably likely to happen...
[01:28:20] <arekinath> you can miss some of the data that's in flight
[01:28:36] <arekinath> but anyway
[01:28:42] <iAmTheDave> ah.
[01:29:29] <iAmTheDave> i'm using leveldb mostly for the LRU key purge from RAM (dynamic, now, in riak 2.0) - secondary indexes are nice, but i have no need for historical data to have keyspace in ram
[01:30:14] <arekinath> they were working on hanoidb for a while to solve some of these issues and get a backend with more consistent latency, better visibility and finer ops controls
[01:30:17] <arekinath> but it's been abandoned now
[01:30:25] <arekinath> leveldb is apparently "good enough" in spite of the issues
[01:30:26] <arekinath> :P
[01:33:00] <iAmTheDave> really? that's boo
[01:34:02] *** kohju_ has joined #omnios
[01:35:02] *** kohju has quit IRC
[01:37:06] *** desai has joined #omnios
[01:38:44] *** szaydel has quit IRC
[01:42:41] <iAmTheDave> question for anyone on AWS - if my riak data store is SSD-backed, do i need my root drive to be EBS-optimized for the OS?
[01:47:18] *** wez is now known as wez|away
[01:48:18] *** wez|away has quit IRC
[02:09:02] *** jpeach_ has joined #omnios
[02:10:37] *** wez has joined #omnios
[02:12:14] *** jpeach has quit IRC
[02:13:58] *** jpeach_ has quit IRC
[02:21:25] *** wez is now known as wez|away
[02:22:09] <berend> iAmTheDave: EBS optimisation is always optional.
[02:22:18] <berend> Unless you write a lot to your root drive, I wouldn't bother.
[02:25:44] <iAmTheDave> berend: thanks. i was hoping i could forgo that extra cost
[02:47:46] *** esproul has joined #omnios
[02:51:32] <ashley_w> iAmTheDave: i convinced work to let me switch our mongodb servers from using ext4 to zfs (stuck with centOS), and i have minutely snapshots running, while improving non-indexed read performance significantly and not hurting write performance (it might actually be better).
[02:53:17] <ashley_w> i turned off mongodb's journaling (which is slow) and flush to disk and lock the db for snapshots.
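The flush-lock-snapshot-unlock cycle ashley_w describes maps onto mongodb's fsync lock. A hedged sketch — the dataset name is an assumption, and `db.fsyncLock()` flushes dirty data and blocks writes so the on-disk state is consistent while the snapshot is taken:

```shell
# Quiesce mongodb, snapshot the dataset it lives on, then resume.
mongo --eval 'db.fsyncLock()'
zfs snapshot tank/mongodb@$(date +%Y%m%d%H%M)
mongo --eval 'db.fsyncUnlock()'
```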
[02:54:24] *** jeffrymolanus has quit IRC
[03:13:39] <esproul> I try to do something good, and all I get is pain
[03:13:51] <esproul> trying to update the ruby-19 we have in ms.omniti.com
[03:14:01] <esproul> 'cause it's, you know, old.
[03:14:18] <bdha> That's what you get.
[03:14:34] <esproul> what do I get for my trouble? undefined symbol 'signbit'
[03:14:52] <esproul> yo bdha how's it going?
[03:15:01] <bdha> Not too shabby.
[03:15:11] <bdha> Yourself?
[03:15:33] <esproul> good, mostly on Circonus now, haven't been able to keep up with OmniOS this cycle
[03:16:41] <esproul> so this is what happens: https://gist.github.com/esproul/7478041
[03:17:16] <esproul> same thing happens on the version we already have, which was built on r151002
[03:17:33] <esproul> so I'm guessing it's something in OmniOS that's changed in the last 18 months
[03:19:42] <bdha> http://mattconnolly.wordpress.com/2013/04/18/building-ruby-2-in-smartos/ # ruby2, but signbit.
[03:19:53] <bdha> https://bugs.ruby-lang.org/issues/8268
[03:20:07] <bdha> Check the pkgsrc patches?
[03:22:25] *** hlangeveld has quit IRC
[03:22:36] <richlowe> signbit should be in math.h
[03:22:48] <richlowe> probably only when c99 though.
[03:22:56] <bdha> Or Rich will
[03:22:59] <bdha> just fix it for you.
[03:23:01] <richlowe> because it's not like slavish adherence to header namespacing ever screwed anyone.
[03:23:02] <bdha> That's my strategy.
[03:23:23] <esproul> signbit(x) is different though, right?
[03:23:34] <esproul> that's in iso/math_c99.h
[03:23:41] <richlowe> which we include via math.d when you're c99
[03:24:11] <richlowe> and which, if we saw the macro, wouldn't leave a symbol reference...
[03:24:32] <richlowe> sneak -std=gnu99 onto the gcc command line, and I bet it finds it
[03:24:43] <richlowe> and I'd certainly _hope_ nothing would break
[03:24:55] <esproul> richlowe: I'll try that
[03:24:56] <richlowe> 'cos '99 is not exactly futuristic.
[03:25:01] <esproul> heh
[03:25:10] <esproul> gonna compile like it's 1999
[03:27:51] <apeiron> esproul, dude. upgrading the ruby is *PAIN*
[03:28:16] <apeiron> seriously it's what's in the box in that Dune scene
[03:28:22] <apeiron> I have some notes on it if you're interested
[03:28:45] <apeiron> the signbit thing is because of symbol visibility hilarity, iirc
[03:28:58] <esproul> apeiron: I knew that going in, but there are security bugs
[03:29:04] <apeiron> there are
[03:29:08] <apeiron> that's partly what motivated me
[03:29:10] <apeiron> see also: jtimberman
[03:29:52] <apeiron> oh, no, wait
[03:29:55] <apeiron> the signbit thing is not that
[03:30:25] <apeiron> it's this: https://www.illumos.org/issues/1989
[03:30:48] <apeiron> It originally appears to be visibility hilarity
[03:30:56] <apeiron> but the real issue is what the issue above outlines
[03:31:08] <apeiron> and then you run into issues with rubygems assuming the whole world is an x86 loonix box
[03:31:22] <apeiron> hardcoded assumptions that /usr/bin/make is gmake
[03:31:29] <apeiron> which requires a patch to rubygems to properly fix
[03:31:44] <apeiron> I'm not sure if that's been pushed out yet. it was supposed to be out RSN some months ago
[03:31:57] <apeiron> and then with those patches you'd be able to produce something viable
[03:32:07] <apeiron> let me see if I still have my omnios-build checkout that produces working bits
[03:33:00] <esproul> apeiron: to add to the hilarity, someone rolled p194 but didn't commit that to the build script
[03:33:07] <esproul> which still says p125
[03:33:17] <apeiron> er
[03:33:22] <esproul> so we don't really have a record of what was done to get p194 built
[03:33:32] <apeiron> there is.... a nonzero possibility that that was me
[03:33:38] <apeiron> as I was playing with it
[03:33:39] <esproul> no, it predates you
[03:33:43] <apeiron> oh
[03:33:46] <apeiron> \o/
[03:33:49] <esproul> may 2012
[03:34:05] <apeiron> do I get an achievement in taskman for trying to take blame for something I couldn't have done?
[03:34:25] <esproul> the Fall On Your Sword achievement
[03:34:55] <esproul> richlowe: jamming in some gnu99 did allow it to get past the miniruby linkage
[03:35:41] <esproul> and… omg. export CLFAGS
[03:35:49] <esproul> in our build script
[03:35:54] *** wez|away is now known as wez
[03:35:55] <esproul> no wonder it didn't take effect
[03:36:58] * apeiron git blames, sighs of relief
[03:38:46] <esproul> well hey look at that, it builds!
[03:39:48] <apeiron> \o/
[03:40:03] <apeiron> I filed a pull req about stuff for gems
[03:40:10] <apeiron> because otherwise nothing needing to do C linking works
[03:40:19] <apeiron> Theo had some objections, I think, or maybe Mark
[03:40:28] <esproul> is that true with the current version that we have published?
[03:40:42] <apeiron> hmm
[03:40:46] <apeiron> I don't think so
[03:40:56] <esproul> so I shouldn't publish this one just yet
[03:41:04] <apeiron> right
[03:41:09] <esproul> ko
[03:41:13] <apeiron> try publishing locally and building e.g. the ffi gem
[03:42:25] <arekinath> I seem to remember at one point resorting to adding a directory to PATH that just had "make", a symlink to gmake
[03:42:46] <arekinath> it worked and I got very depressed
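The PATH-shim trick arekinath mentions is simple enough to sketch: a scratch directory containing only a `make` symlink that points at GNU make, prepended to PATH so gems that hard-code `make` get gmake anyway. The fallback to plain `make` below is for systems where `make` already *is* GNU make:

```shell
# Build a one-entry PATH shim mapping "make" to GNU make.
shim=$(mktemp -d)
gmake_path=$(command -v gmake || command -v make)
ln -s "$gmake_path" "$shim/make"
export PATH="$shim:$PATH"
command -v make   # now resolves to the shim's symlink
```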
[03:43:43] <wuff> aha! just got the kernel panic to occur again.. gonna wait out the dump this time
[03:45:15] *** jpeach has joined #omnios
[03:45:21] <bdha> arekinath: export MAKE=gmake?
[03:45:43] <apeiron> essentially that's what you need to do for the ruby build
[03:45:48] <arekinath> bdha: quite a few gems actually hard-code "make" commands and expect it to be gmake
[03:45:51] <bdha> ah.
[03:45:56] <arekinath> bdha: that works for ruby itself though
[03:46:00] <bdha> Yeah.
[03:46:38] <apeiron> arekinath, that's because mkmf expects gmake syntax
[03:46:49] <apeiron> mkmf is like ExtUtils::MakeMaker if you're familiar with perl
[03:46:53] <apeiron> or gyp from node
[03:47:15] <apeiron> or something that's probably egg or spam-related if you're a python coder
[03:47:19] <wuff> does the console log help any with the dump? http://i.imgur.com/nGakKyS.png?1
[03:48:16] <apeiron> not immediately, but someone who's been following recent developments more might have more info
[03:49:38] <wuff> i guess i could upgrade to latest just to see if it's already been addressed..
[03:50:54] <apeiron> esproul, I found my patched bits if you're interested in looking at a diff
[03:51:03] *** gmason has quit IRC
[03:51:17] <apeiron> funny, I was going to erase this VM today and something stopped me
[03:51:25] <apeiron> "nah, keep it around, you'll regret it if you ditch it"
[03:51:28] <apeiron> (I have backups, but still)
[03:52:03] <esproul> apeiron: I have it built and installed locally on my build zone
[03:52:12] <apeiron> ok
[03:52:52] <wuff> interesting.. found this on nexenta forums: http://www.nexentastor.org/boards/2/topics/7680
[03:53:52] <richlowe> dump would definitely be necessary to make any real progress on that
[03:53:56] <apeiron> ^
[03:54:03] <richlowe> can you repeat it on demand?
[03:54:09] <apeiron> the panic message being the same is not immediately indicative of it being the same problem
[03:54:50] <richlowe> the last time I saw that panic it was wideranging, but subtle, memory corruption
[03:55:08] <richlowe> which doesn't mean it is this time, either.
[03:57:04] <wuff> yes, pretty much on demand.. all i need to do is use esx and have a bit of disk activity (configured for SRP)
[03:57:18] <wuff> first time it happened was during business hours today.. now i'm reproducing it
[03:57:36] <wuff> (adding new vmware hosts on top of hyper-v hosts)
[04:01:33] <esproul> apeiron: both existing and new ruby fails to build the ffi gem, for the same reason:
[04:01:34] <esproul> checking for ffi.h... *** extconf.rb failed ***
[04:01:51] <esproul> so the new one appears no more broken. :)
[04:01:56] <apeiron> yeah
[04:02:03] <apeiron> I have fixes that I have verified actually work
[04:02:15] <apeiron> though I want to look at the new rubygems release to see if it has the hooks we need
[04:02:26] <apeiron> (specifically, respecting the MAKE env variable)
[04:03:26] <esproul> ok
[04:03:28] <apeiron> you probably need to install the libffi IPS package
[04:03:45] <apeiron> hm, or... something, damnit
[04:03:50] <esproul> I have it
[04:03:53] <esproul> it's a build dep
[04:03:59] <apeiron> ok, hm
[04:04:00] <apeiron> that's curious
[04:04:15] <apeiron> when you build gems it should keep the extracted build directory around
[04:04:25] <apeiron> iirc there's a ./configure so you should have a config.log
[04:05:06] <esproul> I don't think it gets that far
[04:05:58] <richlowe> huh, I thought ESX + disk activity is pretty much what all of the Delphix systems were
[04:06:01] <richlowe> so I'd hoped it wasn't fragile
[04:08:57] <esproul> apeiron: heh, helps if I have a PATH that gets me 'gcc'. ;)
[04:09:04] <esproul> ffi gem works
[04:09:07] * apeiron grins
[04:09:14] <apeiron> yeah, that's kinda useful... :)
[04:10:50] <esproul> so, I guess we're good then.
[04:11:04] <apeiron> I would try building something nontrivial
[04:11:06] <apeiron> like chef
[04:11:11] <esproul> heh
[04:11:17] <esproul> not tonight
[04:11:20] <apeiron> ok
[04:11:26] <apeiron> I remember I got the ffi gem going and thought I was good
[04:11:28] <apeiron> then tried chef and kaboom
[04:11:31] <apeiron> so I had more stuff to fix
[04:11:43] <esproul> happy to drop you a p5a of the new version
[04:11:52] <apeiron> sure
[04:11:58] <esproul> hold please
[04:15:26] <apeiron> I'll play with that soon
[04:15:30] <wuff> who/where do i get help with this dump, if possible?
[04:15:32] <apeiron> probably after the 008 push is done
[04:15:48] <apeiron> wuff, start with the mailing lists
[04:15:56] <bdha> wuff: Put it on thoth.
[04:16:29] <bdha> https://github.com/joyent/manta-thoth
[04:16:30] <wuff> gmane.os.omnios.general?
[04:19:18] <apeiron> I don't know what that maps to
[04:19:22] <apeiron> if it's omnios-discuss, yes
[04:19:27] <apeiron> gmane's... weird
[04:20:04] <thebug> use http://lists.omniti.com/mailman/listinfo/omnios-discuss , not the gmane interface
[04:20:10] <wuff> yeah, just found that
[04:21:38] *** esproul has quit IRC
[04:21:54] *** desai has quit IRC
[04:28:08] <wuff> looks like i'm on OmniOS v11 r151006.. here was the console after reboot: http://i.imgur.com/wIx0g6J.png?1
[04:28:34] <wuff> going to update to r151006y
[04:28:49] <apeiron> that is expected output after a crash
[04:29:09] <apeiron> it tells you what to do to preserve the crashdump so you can upload it for people to look at
[04:33:32] <wuff> so i see something called /var/crash/unknown/vmdump.0 that is ~7GB.. that's it, right?
[04:34:18] <wuff> the system won't delete it if i leave it there correct?
[04:34:35] <apeiron> you did the savecore right?
[04:34:54] <apeiron> assuming you did the savecore, yeah, you're good
[04:35:13] <richlowe> bdha: didn't think normal people could
[04:35:29] <richlowe> pretty sure I can't, anyway.
[04:39:44] <wuff> i haven't done anything.. i just went to the /var/crash/unknown folder and checked what was there
[04:40:40] <apeiron> run savecore like the system asked you to
[04:40:57] <bdha> richlowe: I assume you need a JPC account.
[04:41:07] <bdha> I don't know what the pricing looks like for Manta storage, though.
[04:41:14] <bdha> (I could be confused)
[04:41:21] <apeiron> yeah, I don't think it's free
[04:41:24] <apeiron> but!
[04:41:27] <apeiron> they do offer you a free evaluation
[04:41:32] <apeiron> for two months I think
[04:41:43] <apeiron> should be enough to get your dump debugged, I think, or realize that no one wants to look at it
[04:42:06] <bdha> wuff: http://www.c0t0d0s0.org/archives/4389-Less-known-Solaris-features-About-crashes-and-cores-Part-4-Crashdump-analysis-for-beginners.html
[04:42:11] <bdha> That whole series may be useful for you.
[04:42:30] <apeiron> and, as always, the perennial favorite of reading the manuals is good too
[04:42:32] <wuff> looks like it was enabled:
[04:42:33] <apeiron> savecore, dumpadm
[04:42:34] <wuff> > dumpadm
[04:42:34] <wuff> Dump content: kernel pages
[04:42:34] <wuff> Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
[04:42:34] <wuff> Savecore directory: /var/crash/unknown
[04:42:35] <wuff> Savecore enabled: yes
[04:42:35] <wuff> Save compressed: on
[04:43:20] <apeiron> if you don't have the vmcore file you haven't run savecore
[04:43:46] <bdha> wuff: Right now the core is on your dump device. You need to run savecore to pull it off into /var/crash.
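The workflow bdha is pointing at, sketched for reference (run as root; the `unknown` directory name matches wuff's `dumpadm` output above, and the exact flags are worth confirming against the man pages):

```shell
# Pull the dump off the dump zvol into the savecore directory,
# producing a compressed vmdump.N file.
savecore

# Later, expand vmdump.0 into unix.0 + vmcore.0 for analysis:
savecore -f /var/crash/unknown/vmdump.0 /var/crash/unknown
mdb /var/crash/unknown/unix.0 /var/crash/unknown/vmcore.0
```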
[04:51:52] <richlowe> he did, hence the vmdump.0?
[04:51:54] <apeiron> ok, good
[04:51:54] <apeiron> sorry, must've misread the above
[04:52:01] *** lotheac_ has joined #omnios
[04:52:50] *** storkone has quit IRC
[04:52:54] *** lotheac has quit IRC
[05:07:16] *** infernix has quit IRC
[05:08:36] *** infernix has joined #omnios
[05:26:03] *** wez has quit IRC
[05:33:05] *** jpeach has quit IRC
[05:33:41] *** jpeach has joined #omnios
[05:38:42] *** jpeach has quit IRC
[06:04:00] *** wez has joined #omnios
[06:13:26] *** wez has quit IRC
[06:16:52] *** wez has joined #omnios
[06:24:59] *** _Tenchi_ has quit IRC
[06:25:27] *** _Tenchi_ has joined #omnios
[06:39:04] *** nefilim has quit IRC
[07:29:17] *** slx86 has joined #omnios
[07:29:38] *** slx86 has quit IRC
[07:31:14] *** slx86 has joined #omnios
[07:40:08] *** nikolam has joined #omnios
[08:54:45] *** Gibheer_ is now known as Gibheer
[08:58:07] *** KermitTheFragger has joined #omnios
[09:04:57] *** bens1 has joined #omnios
[09:13:02] *** bens1 has quit IRC
[09:16:01] *** ilovezfs has quit IRC
[09:27:54] *** ilovezfs has joined #omnios
[10:00:02] *** jeffrymolanus has joined #omnios
[10:06:14] *** lotheac_ is now known as lotheac
[10:19:54] *** wez has quit IRC
[10:21:02] *** wez has joined #omnios
[12:12:52] *** wez has quit IRC
[12:14:11] *** storkone has joined #omnios
[13:16:03] <lotheac> is it just me or does zetaback's failure mode kinda suck? it seems to run ssh $target $agent -l in a loop if there is some failure anywhere and that resulted in 18 simultaneously running agents on my agent machine because cron was running zetaback hourly on the storage node...
[13:17:21] <lotheac> it pains me greatly but I think we're going to have to NIH a zfs backup solution :)
[13:19:33] <lotheac> I don't get why none of the existing ones use zfs holds for example...
[13:23:58] *** khushildep has joined #omnios
[13:32:07] *** szaydel has joined #omnios
[13:50:17] *** slx86 has quit IRC
[14:00:12] <bdha> lotheac: Patches welcome?
[14:00:28] <bdha> lotheac: I've also used https://github.com/pobox/replicator with some success, but it has similar "only used by one org" problems.
[14:00:35] <bdha> Also, no docs. :)
[14:01:04] <bdha> I've also used zetaback successfully, so.
[14:01:04] <lotheac> bdha, I already submitted one PR to zetaback, but I kinda suck at perl :P
[14:01:45] <lotheac> it's also a bit painful to read
[14:01:49] <lotheac> due to being perl
[14:02:52] * bdha shrugs.
[14:03:35] <lotheac> I was thinking of a system that doesn't take or prune snapshots on its own, but uses existing ones and sends the latest ones to a backup host, using zfs hold on both ends afterwards to ensure nobody makes the next incremental fail
[14:03:55] <bdha> lotheac: zfs send -R is pretty easy to write shell around. If that solves your problem, it's easy.
[14:04:07] <lotheac> we actually use zfs-auto-snapshot for snapshotting still, dug it out from some old archive and packaged into our repo
[14:04:24] <lotheac> yeah, we've had a few scripts around that before
[14:04:51] <lotheac> but there have been problems such as what happens when you remove a snapshot from source or target, but zfs hold should work around that nicely
[14:05:22] <lotheac> also then you could just be dumb and try to prune every snapshot from the backup host if you wanted to save disk space (the held snap won't be destroyed)
[14:05:32] *** nick___ has joined #omnios
[14:05:43] *** nick___ is now known as nhubbard
[14:05:44] <lotheac> i find it a bit weird no backup solution seems to be using hold
[14:05:59] <lotheac> I'm experimenting it for offsites from home atm
[14:06:57] <lotheac> there's also the issue that the user receiving zetaback snapshots needs to have destroy permissions on the pool because it does pruning
[14:07:10] <lotheac> I'm a bit uncomfortable with that
[14:07:33] <lotheac> or, not pool, but target dataset (if receiving into datasets of course)
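The hold-based scheme lotheac describes could be sketched as follows: after a successful send/recv, place a hold on the snapshot at both ends so routine pruning can never destroy the last common snapshot. Dataset, host, and tag names are hypothetical:

```shell
# Ship the incremental, then pin the snapshot on both sides.
SNAP=tank/data@2013-11-15
zfs send -i tank/data@prev $SNAP | ssh backup zfs recv backup/data
zfs hold backup-ref $SNAP
ssh backup zfs hold backup-ref backup/data@2013-11-15
# A held snapshot survives `zfs destroy` until `zfs release` drops
# the hold, so even blind pruning of old snapshots is safe.
```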
[14:14:01] *** desai has joined #omnios
[14:16:08] *** sebasp_ is now known as sebasp
[14:24:47] *** ira has joined #omnios
[14:28:57] *** schleicher has joined #omnios
[14:37:14] *** desai has quit IRC
[15:00:39] *** nikolam has quit IRC
[15:02:42] *** gmason has joined #omnios
[15:02:45] *** schleicher has quit IRC
[15:12:51] *** gmason has quit IRC
[15:12:56] *** nikolam has joined #omnios
[15:15:24] *** gmason has joined #omnios
[15:46:44] *** gmason has quit IRC
[15:47:15] *** xeyed4good has joined #omnios
[15:47:42] *** xeyed4good has left #omnios
[16:08:15] *** slx86 has joined #omnios
[16:08:38] *** mayuresh has joined #omnios
[16:08:57] *** slx86 has quit IRC
[16:09:31] *** mayuresh has left #omnios
[16:12:22] *** slx86 has joined #omnios
[16:15:26] *** neophenix has joined #omnios
[16:24:25] *** slx86 has quit IRC
[16:24:58] *** wuff has quit IRC
[16:36:02] *** joltman has joined #omnios
[16:54:31] *** desai has joined #omnios
[16:57:06] *** rjwill has joined #omnios
[16:57:14] *** rjwill has left #omnios
[17:09:52] *** jpeach has joined #omnios
[17:39:43] *** desai has quit IRC
[17:40:56] *** desai has joined #omnios
[17:46:48] *** xeyed4good1 has joined #omnios
[17:49:45] *** wuff has joined #omnios
[17:50:52] *** nikolam has quit IRC
[17:51:07] *** xeyed4good1 has left #omnios
[18:04:48] *** ghost75 has joined #omnios
[18:05:11] <ghost75> is omnios supporting WOL ?
[18:07:43] <nahamu> "WOL"?
[18:07:52] <nahamu> Wake on LAN?
[18:08:58] <ghost75> yes
[18:09:02] <ghost75> didnt work on openindiana
[18:11:24] *** KermitTheFragger has quit IRC
[18:20:47] *** gmason has joined #omnios
[18:27:25] *** xeyed4good has joined #omnios
[18:31:25] *** xeyed4good has left #omnios
[18:37:37] *** wez has joined #omnios
[18:39:52] <Gibheer> isn't that something independent of the OS, managed by the mainboard directly?
[18:50:47] <nahamu> Right, I was under the impression that you need hardware support for Wake on Lan.
[18:50:52] <nahamu> If the OS isn't running, it's not there to listen for the magic packet.
[18:51:12] <nahamu> If it supposedly "works under Linux" then put linux on the same machine and see if it works.
[18:51:22] <nahamu> But I suspect the problem is your hardware, not the OS.
[18:51:34] <nahamu> Unless you're trying to find a way to *send* a WOL packet to some other system
[18:51:41] <nahamu> in which case you might be missing a piece of software.
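The *sending* side nahamu mentions is simple: a WOL magic packet is just 6 bytes of 0xFF followed by the target MAC repeated 16 times, broadcast over UDP. A minimal sketch (the MAC address and port choice are illustrative; UDP port 9, the discard port, is the common convention):

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a WOL magic packet: 6 bytes of 0xFF followed by the
    target MAC address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet via UDP so the sleeping NIC sees it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

Receiving, as the rest of the discussion establishes, is the hard part: the NIC, firmware, and (on shutdown) the OS driver all have to cooperate.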
[19:12:15] *** khushildep has quit IRC
[19:15:04] <ghost75> nope, it's not OS independent
[19:25:12] *** desai has quit IRC
[19:33:05] <nahamu> ghost75: meaning on that same hardware it "works under Linux"?
[19:36:39] *** klobucar has quit IRC
[19:38:17] *** szaydel has quit IRC
[19:38:41] *** szaydel has joined #omnios
[19:38:41] *** klobucar has joined #omnios
[19:46:22] *** xeyed4good has joined #omnios
[19:55:59] *** xeyed4good has left #omnios
[19:56:20] <richlowe> I thought WOL was either basically OS agnostic, or not going to work
[19:58:24] <lotheac> or both?
[20:08:21] <storkone> When the OS shuts down it must instruct the NIC driver to put the NIC in a WOL mode. (There're several variants) So both OS and driver need to support it, apart from the hardware.
[20:10:11] *** xeyed4good has joined #omnios
[20:10:18] *** xeyed4good has left #omnios
[20:11:54] *** wez has quit IRC
[20:12:13] *** slx86 has joined #omnios
[20:15:56] *** slx86 has quit IRC
[20:16:49] <nahamu> I thought you tell the BIOS to do that.
[20:17:34] <nahamu> Wake-on-LAN support is implemented on the motherboard of a computer and the network interface (firmware), and is consequently not dependent on the operating system running on the hardware. Some operating systems can control Wake-on-LAN behaviour via NIC drivers. If the network interface is a plug-in card rather than being integrated into the motherboard, the card may need to be connected to the motherboard by an additional cable. Motherboards with an embedde
[20:17:41] <nahamu> http://en.wikipedia.org/wiki/Wake-on-LAN#Hardware_requirements
[20:18:07] <nahamu> ghost75: is this an onboard NIC or a PCIe card?
[20:18:38] <storkone> You're entirely correct, but missing what happens after you've booted an OS.
[20:19:18] <nahamu> storkone: I was just pasting from wikipedia (should have put in quotation marks...)
[20:19:51] <nahamu> at any rate, I've certainly learned something new, which is great for me, but doesn't help ghost75 ...
[20:20:09] <storkone> The OS and driver take over control of the NIC after the boot. When shutting down, the BIOS can no longer do anything to the NIC
[20:21:09] <nahamu> http://mannlinstones.wordpress.com/2011/07/13/opensolaris-nas-with-wake-on-lan-wol-overview-opensolaris-is-master-howto/
[20:22:02] *** pygi has joined #omnios
[20:23:58] <storkone> So YES the WOL feature is OS agnostic like richlowe said, and YES the BIOS should support it as well. But the OS must instruct the driver upon shutdown to place the NIC in WOL mode. The driver can set, for example, which magic packet causes it to wake up.
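storkone's point — that the OS must arm the NIC — is visible on Linux, where `ethtool <iface>` reports which Wake-on modes the driver supports and which are currently armed (`ethtool -s <iface> wol g` arms magic-packet wake). A hedged sketch parsing that output; the "Supports Wake-on:" / "Wake-on:" lines are Linux ethtool conventions, and per this discussion no equivalent facility exists in illumos:

```python
import re

def wol_modes(ethtool_output: str) -> tuple[str, str]:
    """Extract (supported, current) Wake-on mode flags from the text
    output of `ethtool <iface>` on Linux. 'g' means wake on magic
    packet; 'd' means wake-on-LAN is disabled."""
    supported = re.search(r"Supports Wake-on:\s*(\S+)", ethtool_output)
    # MULTILINE anchor so the bare "Wake-on:" line is matched, not
    # the "Supports Wake-on:" line above it.
    current = re.search(r"^\s*Wake-on:\s*(\S+)", ethtool_output, re.MULTILINE)
    return (supported.group(1) if supported else "",
            current.group(1) if current else "")
```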
[20:24:40] *** desai has joined #omnios
[20:26:00] <nahamu> http://src.illumos.org/source/search?q=%22wake+on+lan%22&project=illumos-gate
[20:28:19] *** nhubbard has quit IRC
[20:29:20] <storkone> nahamu: That's an ugly solution in that link. Having both a Linux and Osol on the same boot disk. And letting them manipulate the order in grub. It more or less confirms that you need an OS and driver to put the NIC in WOL mode.
[20:29:36] <nahamu> I'm not suggesting it's a good idea
[20:29:52] <nahamu> just a relevant link.
[20:30:58] <nahamu> Like I said, I have now learned that you need OS support to put the NIC in WOL mode at shutdown. If someone wants to implement that in illumos, they should do it.
[20:31:08] <nahamu> My servers are generally always on.
[20:35:33] *** ira has quit IRC
[20:35:40] *** VerboEse has quit IRC
[20:35:45] *** lpsmith has quit IRC
[20:35:47] *** _jared_ has quit IRC
[20:35:48] *** richlowe has quit IRC
[20:35:49] <ashley_w> i never understood the purpose of WOL
[20:35:50] <storkone> It should be part of the old Brussels project but never got implemented.
[20:36:21] <storkone> I like WOL and would like to have it in illumos.
[20:37:37] *** richlowe has joined #omnios
[20:37:38] *** VerboEse has joined #omnios
[20:37:51] *** _jared_ has joined #omnios
[20:38:18] *** lpsmith has joined #omnios
[20:38:24] <storkone> ashley_w: My main use case is to turn on test machines. But I must admit that with the omnipresence of IPMI the need for WOL is very little.
[20:39:53] <thebug> ipmi being able to tweak things like "boot selection on next boot" and serial over lan are pretty massive wins over WOL
[20:40:54] <storkone> VMWare has a feature to turn off servers when the entire load could be consolidated in fewer servers. And turning them back on when the load increases. I don't know whether they use WOL or IPMI for that.
[20:45:51] <storkone> thebug: You're completely right. But thinking about WOL I know an installation where they use both WOL and IPMI. WOL is inband, it's on the same switches as user traffic. IPMI is out of band, and access is usually very restricted. Although IPMI has roles and access control. The researchers could turn on clusters with WOL, and the admin had IPMI.
[20:52:44] *** wez has joined #omnios
[20:52:57] *** nhubbard has joined #omnios
[21:03:12] *** neophenix has quit IRC
[21:07:05] *** wez is now known as wez|away
[21:16:05] *** wez|away is now known as wez
[21:26:16] <ashley_w> richlowe: i saw on SO you say "I don't think Oracle Solaris allows you to tune what SMF considers "too quickly"." what about omnios?
[21:27:23] <danlarkin> oh good question, I saw & wondered that too
[21:28:37] *** nhubbard has quit IRC
[21:30:29] *** ira has joined #omnios
[21:34:00] *** wez is now known as wez|away
[21:35:04] *** wez|away is now known as wez
[21:46:29] <wuff> ok silly question guys.. i sent my first email out to the omnios list, and subscribed to the digest. I just noticed I have a reply under my thread but didn't get an email notification for it. Now I'm wondering how I reply to that? http://lists.omniti.com/pipermail/omnios-discuss/2013-November/001774.html
[21:52:47] *** wez is now known as wez|away
[21:53:57] <lotheac> wuff, maybe turn digest mode off? :)
[21:56:26] *** sebasp is now known as sebasp_
[22:02:42] <ashley_w> (filter to folder, no digest)++
[22:07:52] *** wez|away is now known as wez
[22:08:38] <ghost75> WOL is very good for backup servers
[22:08:46] *** xeyed4good has joined #omnios
[22:09:33] <ghost75> or at home when you dont need it 24x7
[22:13:44] *** xeyed4good has quit IRC
[22:20:33] <wuff> well, digest mode is off now, but since I didn't get an email with the reply retroactively, all I needed to do to keep it in the same thread was to send an email to the list with "Re: <same subject>" and edit accordingly
[22:21:09] <lotheac> actually to be correct you would also need In-Reply-To and the message id
[22:21:23] <lotheac> lest you break threading :)
[22:21:29] <lotheac> (on other users' MUAs)
[22:21:51] <lotheac> though apparently mutt still threads it fine
[22:23:07] <wuff> yeah, i broke a level in threading heh
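lotheac's point about threading can be sketched with the stdlib: to reply correctly you need more than "Re: &lt;same subject&gt;" — you set In-Reply-To to the parent's Message-ID and carry it in References, which is what other users' MUAs thread on. The message ID below is hypothetical, and a real reply would append to the parent's full References chain rather than start a fresh one:

```python
from email.message import EmailMessage

def build_reply(parent_msgid: str, subject: str, body: str) -> EmailMessage:
    """Build a list reply that threads correctly in other users' MUAs."""
    msg = EmailMessage()
    # Prefix "Re: " only if it isn't already there.
    msg["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    # These two headers are what threading MUAs actually use:
    msg["In-Reply-To"] = parent_msgid
    msg["References"] = parent_msgid  # real replies append to the parent's chain
    msg.set_content(body)
    return msg
```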
[22:29:44] <ghost75> somebody has a e1000 nic?
[22:29:58] <lotheac> yes
[22:30:10] <ghost75> which flags do you see in ifconfig
[22:30:44] <lotheac> e1000g0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> e1000g0:1: flags=20002080841<UP,RUNNING,MULTICAST,ADDRCONF,IPv6>
[22:31:06] <lotheac> oh, missed one. e1000g0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4>
[22:31:27] <ghost75> mmhh ok no WOL
[22:38:05] <storkone> ghost75: There's no support for WOL in illumos. Please see this email http://marc.info/?l=opensolaris-networking-discuss&m=125336654402336&w=3
[22:38:23] <ghost75> ok
[22:38:47] <storkone> If you don't know the author, please look him up...
[22:40:01] <ghost75> that link is opensolaris from 2009
[22:40:47] <ghost75> but i guess Omni is the same
[22:48:17] *** RoyK has left #omnios
[22:51:26] *** patdk-wk has quit IRC
[22:52:41] *** Thrae has quit IRC
[22:54:12] *** patdk-wk has joined #omnios
[22:54:25] *** Thrae has joined #omnios
[22:59:15] *** RoyK^ has joined #omnios
[23:11:18] *** szaydel has quit IRC
[23:20:30] *** pygi has quit IRC
[23:24:42] *** szaydel has joined #omnios
[23:36:48] <ghost75> is freebsd loading opensolaris kernel to include zfs oO
[23:38:03] <thebug> no
[23:38:58] <thebug> the opensolaris module is all the glue to provide the facilities of the freebsd kernel VMM and VFS [and a few other bits] in a way that the rest of the zfs code expects
[23:44:33] *** szaydel has quit IRC
[23:53:33] *** wez is now known as wez|away
[23:55:03] *** wez|away has quit IRC
[23:55:44] *** Clory has quit IRC
[23:57:03] *** Clory has joined #omnios