   April 24, 2020  

NOTICE: This channel is no longer actively logged.

[00:01:25] <KungFuJesus> I mean the bottom of the stack is 0x000, so it's obviously a null pointer deref
[00:01:35] <KungFuJesus> but something must have corrupted the stack frame somehow
[00:01:47] *** andy_js <andy_js!~andy@> has quit IRC (Quit: andy_js)
[00:03:00] <KungFuJesus> at least I can break at main
[00:12:31] *** Kurlon_ <Kurlon_!~Kurlon@cpe-67-253-136-97.rochester.res.rr.com> has joined #illumos
[00:16:25] *** Kurlon <Kurlon!~Kurlon@bidd-pub-03.gwi.net> has quit IRC (Ping timeout: 264 seconds)
[00:16:37] <liv3010m> @rmustacc: Hi Robert, hope you are doing well! I wanted to report that I downloaded the SmartOS image 20200422T203030Z that was released yesterday, which includes many of the bge commits, but strangely enough it booted without network. bge was not loaded; I loaded it manually and nothing happened. I checked the driver_aliases file and it is there. I did an update_drv with the device id anyway and I got the infamous couldn't attach driver
[00:16:38] <liv3010m> message. mdb -ke msgbuf doesn't show anything.
[00:16:58] <liv3010m> ups, it was long
[00:17:45] *** jcea <jcea!~Thunderbi@2001:bc8:2ecd:caed:7670:6e00:7670:6e00> has joined #illumos
[00:17:48] <liv3010m> it should have worked out of box right?
[00:17:58] <rmustacc> Um, maybe? I don't work on SmartOS actively any more.
[00:18:52] <liv3010m> I ask because I saw all the commits on the change log
[00:18:53] <liv3010m> https://us-east.manta.joyent.com/Joyent_Dev/public/SmartOS/smartos.html#20200422T203030Z
[00:20:10] <liv3010m> so if driver isn't attaching maybe the driver binary wasn't updated?
[00:20:42] <LeftWing> It will have been rebuilt with everything else. If it's in those release notes, I expect it is in the image.
[00:21:32] <jbk> git wasn't showing any conflicts with the changes
[00:21:39] <liv3010m> That was my thinking too
[00:22:05] <LeftWing> Did the prior release work for you, or are you counting on support for the new hardware?
[00:22:32] <liv3010m> the patched bge driver Rober gave me was working
[00:22:37] <liv3010m> Robert*
[00:23:19] <liv3010m> the original driver was not working before the patches
[00:24:35] <jbk> unless i'm using the wrong magic git incantation, the sources are identical between illumos-joyent and illumos-gate
[00:25:18] <KungFuJesus> hmm, it seems like something may have broken with its interaction with libkstat
[00:26:32] <jbk> hrm
[00:27:02] <jbk> i think the /etc/driver_aliases entries are missing
[00:28:13] <jbk> liv3010m: want me to build you a new image to test?
[00:28:15] <LeftWing> There were certainly new ones added as part of one of those changes
[00:28:33] <liv3010m> i searched for my device 14e4,16b1 it was there
[00:29:15] <liv3010m> only difference was 0x before deviceid, but it was there for pci and pcie
[00:29:41] <liv3010m> jbk: sure, if you are willing to, of course
[00:30:00] <jbk> let me double check.. i think this proto area was after that was merged..
[00:30:57] <liv3010m> sure, take your time
[00:31:23] <liv3010m> thanks guys for your help
[00:31:40] <jbk> hmm.. no they are there.. maybe the 0x is throwing things off..
[00:34:23] <jbk> i can build one (it'll be very _slightly_ ahead of the PI that just came out)
[00:34:28] <jbk> without the 0x bits
[00:34:35] <jbk> what format would you need?
[00:34:54] <liv3010m> maybe but when i forced it to load with update_drv -a -i pci14e4,16b1 bge it complained it couldn't attach
[00:35:14] <liv3010m> USB image
[00:36:12] <liv3010m> but shouldn't it work now that I manually added the driver to the driver_aliases file, if it were the 0x stuff?
[00:38:39] <tsoome_> note the lciid needs to be properly quoted
[00:38:48] <tsoome_> pciid*
[00:39:25] <liv3010m> it was: update_drv -a -i 'pci14e4,16b1' bge
[00:40:01] <liv3010m> and in the file it "landed there" with double quotes like the other ones there
[00:40:21] <tsoome_> ok
[00:42:09] <liv3010m> https://i.ibb.co/FhXxGkD/IMG-2005.jpg
[00:42:11] <liv3010m> :)
[00:49:09] <jbk> it's goofy
[00:49:33] <jbk> i think you need: update_drv -a -i '"pci14e4,16b1"' bge
[00:49:35] <jbk> and maybe
[00:49:40] <jbk> i think you need: update_drv -a -i '"pciex14e4,16b1"' bge
[00:50:25] <jbk> you might also want to try: update_drv -d -i '"pci14e4,0x16b1"' bge
[00:50:30] <jbk> and see if it removes the 0x line
[00:50:35] <jbk> just to make sure
[00:51:15] <jbk> as well as the -d the pciex...,0x16b1 too
[01:02:31] <liv3010m> OK, let me try it
[01:07:48] <rmustacc> 14e4,0x16b1 will not work.
[01:07:54] <rmustacc> As an alias.
[01:08:06] <rmustacc> It looks like illumos-joyent added that to their /etc/driver_aliases.
[01:08:10] <rmustacc> So that's why it didn't autodetect.
[01:08:58] <rmustacc> https://github.com/joyent/illumos-joyent/blob/master/usr/src/uts/intel/os/driver_aliases#L189-L198 all have an incorrect 0x there. I'll file a bug.
[01:09:14] <jbk> i'm building a PI with those fixed as we speak :)
[01:09:37] <rmustacc> OK. Sounds good. I guess you don't need me to file a bug then?
[01:09:44] <jbk> doesn't matter
[01:09:59] <jbk> i'm happy to run with it
[01:10:25] *** jamtorus <jamtorus!~quassel@s91904423.blix.com> has joined #illumos
[01:10:49] *** jellydonut <jellydonut!~quassel@s91904423.blix.com> has quit IRC (Read error: Connection reset by peer)
[01:10:54] *** psarria <psarria!~psarria@37.red-79-146-98.dynamicip.rima-tde.net> has quit IRC (Ping timeout: 240 seconds)
[01:12:33] <Smithx10> rmustacc: I noticed that diskinfo -cH doesn't show the slot numbers. Is there another way to blink a drive from the OS for nvme?
[01:13:08] <rmustacc> Yes, ish.
[01:13:20] <rmustacc> svcadm enable hotplug and then there are some bits with the hotplug command.
[01:13:35] <rmustacc> I don't recall the specifics, but it kind of worked on some of the systems I was playing around with back in the day.
[01:13:47] <rmustacc> On the other hand, for slot numbers, there was a bunch of topo work to enable creating maps for NVMe based systems.
[01:13:54] <rmustacc> So depending on your system, it should be able to create slot numbers.
[01:14:24] <Smithx10> alright, I was able to get the Serial from iostat, and will do it in oob
[01:14:42] <Smithx10> thanks for the info
[01:15:47] <rmustacc> No problem.
[01:16:08] <rmustacc> It might not be too hard to rig up topo to do that for your system.
[01:16:53] <rmustacc> If you're interested, I might be able to point you to some of the stuff on that.
[01:17:11] <rmustacc> And some of what rigging up that all looks like in topo.
[01:20:55] <rmustacc> liv3010m: Let me know if jbk's update doesn't work and then I can figure out how to investigate more.
[01:21:04] <jbk> shouldn't be too much longer...
[01:21:15] <jbk> its on illumos-extra now
[01:21:18] <jbk> (perl)
[01:21:26] <rmustacc> No rush on my account.
[01:21:26] <liv3010m> guys sorry for taking this much
[01:21:51] <liv3010m> it's working either way, 0x is the culprit of not working out of box
[01:22:29] <liv3010m> it wasn't working initially when I did the update_drv stuff because of a typo of mine
[01:22:42] <rmustacc> Ah, gotcha. OK. So it's now working. Well, that's good.
[01:22:56] <liv3010m> it works either single quotes, double quotes, pciex, pci
[01:22:58] <liv3010m> :)
[01:23:06] <liv3010m> the problem is the 0x on the aliases file
[01:23:11] <liv3010m> yup
[01:23:27] <jbk> ok.. the fixed image should be finished building here soon
[01:23:38] <liv3010m> nice, thanks!
[01:24:17] <liv3010m> so this is strange right? I'm the only one with a bge adapter that doesn't like the 0x before the deviceid
[01:25:04] <rmustacc> It's not strange.
[01:25:18] <rmustacc> You have an on-board device, and those usually don't actually hook up PCI express.
[01:25:27] <rmustacc> So it would have attached to the pci variant of the ID.
[01:25:33] <rmustacc> But none matched, so nothing attached.
[01:26:52] <liv3010m> ah, I see
[01:27:54] <liv3010m> is it doable to have it native in the driver_aliases in illumos-gate?
[01:28:32] <LeftWing> SmartOS has a somewhat out-of-the-ordinary approach to driver management
[01:28:41] <rmustacc> But it looks like this came from me screwing up the manifest file and IPS not noticing.
[01:28:44] <rmustacc> So I should fix that!
[01:28:59] <rmustacc> This is the challenge of not having the hardware myself.
[01:29:16] <rmustacc> So I didn't notice it.
[01:29:17] <rmustacc> Sorry!
[01:29:28] <liv3010m> No worries!
[01:29:48] <rmustacc> Jerry probably copied what I screwed up verbatim.
[01:29:52] <liv3010m> so it was normally without the 0x then?
[01:30:06] <rmustacc> Correct.
[01:30:09] <rmustacc> The 0x isn't present.
[01:30:15] <liv3010m> excellent :)
[01:30:19] <rmustacc> If you look at prtconf -v, you'll see that most drivers have a 'compatible' array.
[01:30:24] <rmustacc> It needs to match that.
[01:30:46] <rmustacc> Or rather, a driver will attach to a device if one of the compatible entries match.
[01:31:40] <liv3010m> I see
[01:33:27] <Smithx10> Anyone know why a resilver might go back to 0% ? https://gist.github.com/Smithx10/87a82aecd8baee9802b6d446e4c131ac
[01:35:26] <jbk> what version are you running?
[01:36:11] <jbk> i see a reference to 'Fix estimated scrub completion time' that's in 20190523
[01:36:40] <Smithx10> SunOS ac-1f-6b-a5-b5-24 5.11 joyent_20200228T001732Z i86pc i386 i86pc
[01:36:49] <jbk> so not that :)
[01:36:55] <Smithx10> ahhhhh
[01:36:57] <Smithx10> zfs version
[01:36:58] <Smithx10> lolololl
[01:36:59] <Smithx10> sorry
[01:37:03] <jbk> no, PI version
[01:37:12] <LeftWing> It looks like you're having a lot of errors beyond the disk you're replacing there?
[01:37:21] <jbk> the release tags in git just include year + month + day, no timestamp
[01:37:25] <LeftWing> Might be a cable, expander, or HBA problem?
[01:37:33] <Smithx10> they are all NVME
[01:37:50] <Smithx10> I guess where they are plugged into
[01:37:51] <LeftWing> That doesn't mean there aren't expanders, though; they're just going to be PCIe instead
[01:37:53] <Smithx10> whatever that thingie is
[01:38:09] <LeftWing> They might still have firmware and components and such
[01:38:40] <LeftWing> You can take a look at "mdb -ke ::zfs_dbgmsg"
[01:38:53] <LeftWing> It can tell you things about what ZFS recently decided to do with respect to resilvering
[01:39:28] <jbk> liv3010m: https://us-east.manta.joyent.com/jbk/public/tmp/platform-20200423T223407Z.usb.gz
[01:39:31] <LeftWing> I also have https://gist.github.com/jclulow/19b8036617de0c74c732bb5e597901d4
[01:39:44] <LeftWing> To watch in real time
[01:40:12] <LeftWing> You might also look at the output of "fmdump -e" and "fmadm faulty" and such
[01:40:19] <LeftWing> and just "fmdump"
[01:40:35] <liv3010m> jbk: thanks! I'll start to download it
[01:40:38] <jbk> ok
[01:40:43] <jbk> let me know the results
[01:41:02] <liv3010m> I will, thank you
[01:41:09] <liv3010m> and thanks to you all
[01:41:12] <jbk> np
[01:41:15] <liv3010m> :)
[01:41:16] <jbk> sorry for the mixup
[01:41:26] <liv3010m> no problem!
[01:43:05] *** psarria <psarria!~psarria@63.red-79-152-159.dynamicip.rima-tde.net> has joined #illumos
[01:43:06] <liv3010m> It will take a while to download because it's only downloading at 100-125kB/s, sometimes it happens when downloading from joyent
[01:43:15] <Smithx10> LeftWing: looks like it's scanning: scanned dataset 250541 (zones/d2ac2759-e179-c48e-c4b0-984cf19985ba/data@zrepl_20200320_145631_000) with min=3666556 max=3666687; suspending=0
[01:43:23] <Smithx10> these machines alot
[01:43:30] <jbk> and if you ever need to use the rescue option on that image, the root password is 'root' (whatever you set during install should still work -- that's _only_ when using the rescue boot option where it doesn't mount anything)
[01:44:26] <liv3010m> perfect, thanks
[01:50:00] <Smithx10> ouch looks like april 11th fmadm was screaming at us
[01:50:00] <Smithx10> https://gist.github.com/Smithx10/fecb08f83f0b9ca4123a56362c475057
[01:50:48] <liv3010m> one funny thing I noticed, and I don't know if it is only with the 20200422 image or not (because I was not using smartos on this system before), was that loader wasn't picking up the 4_ configure other options entry: when I chose it, it went directly to boot. But go to boot loader prompt worked
[01:51:28] <liv3010m> I don't know if it's related to the system or not, but I was so excited to try the image that I didn't care
[01:52:07] *** arnoldoree <arnoldoree!~arnoldore@2001:d08:1a81:3a1f:2396:9257:1ced:1e22> has joined #illumos
[01:54:36] <jbk> rmustacc: ixgbe tx tries to get all the headers into their own mblk_t, but if it fails, it tries to undo all of that and return the original mblk_t unchanged (I'm wondering if somewhere along there, it's leaking mblk_ts) -- is that actually necessary, or is it just enough that what's returned (when reading via b_rptr) you see the same bytes?
[01:58:27] <danmcd> mc_tx(9E) says the driver's in charge of the memory, so it seems we could return a *modified* or *different* mblk chain if it has the same contents.
[02:04:38] *** cantstanya <cantstanya!~chatting@gateway/tor-sasl/cantstanya> has quit IRC (Remote host closed the connection)
[02:05:31] <jbk> liv3010m: i need to step away for a bit, but i'll be back and will check in to make sure it works
[02:06:21] *** cantstanya <cantstanya!~chatting@gateway/tor-sasl/cantstanya> has joined #illumos
[02:14:14] *** kovert <kovert!~kovert@> has quit IRC (Ping timeout: 256 seconds)
[02:57:19] *** kovert <kovert!~kovert@> has joined #illumos
[03:01:15] *** mscheker <mscheker!uid437521@gateway/web/irccloud.com/x-gqglzvoqxbzsdrbg> has quit IRC (Quit: Connection closed for inactivity)
[03:10:58] <liv3010m> jbk: confirmed, networking works OOTB :)
[03:14:54] <jbk> cool
[03:15:03] <jbk> glad it was an easy fix :)
[03:16:21] <liv3010m> yes, many thanks! :)
[03:19:24] <rmustacc> And here's the fix for illumos: https://code.illumos.org/c/illumos-gate/+/577
[03:20:29] <liv3010m> nice, thanks Robert
[03:22:23] *** jcea <jcea!~Thunderbi@2001:bc8:2ecd:caed:7670:6e00:7670:6e00> has quit IRC (Quit: jcea)
[03:23:34] *** varna_ <varna_!~varna@> has joined #illumos
[03:26:34] *** varna <varna!~varna@> has quit IRC (Ping timeout: 240 seconds)
[03:52:54] *** wacki <wacki!~wacki@i577B8ABE.versanet.de> has quit IRC (Ping timeout: 256 seconds)
[03:53:27] <jbk> i put up a PR for smartos -- i'll see in the morning if jerry wants to grab that or just wait and fix it when that gets merged in
[04:01:15] *** ypankov <ypankov!~ypankov@> has joined #illumos
[04:40:50] <jbk> tsoome_: what's the first forth file that gets run w/ loader?
[04:42:23] <LeftWing> Is it not /boot/loader.rc ?
[04:42:51] <jbk> it usually scrolls by too quick for me to see..
[04:43:00] <jbk> i was trying loader.4th
[04:43:04] <LeftWing> loader(5) mentions the sequence I believe
[04:43:27] <LeftWing> "During initialization, ..."
[04:43:42] <Smithx10> LeftWing: I think I'm in an infinite resilvering :(
[04:43:43] <jbk> (illumos#12448 seems to have broken the smartos customizations, so probably a bit crazy, but i managed to get a chroot setup w/ the files + ficl-sys working to test/debug)
[04:44:37] *** dopplerg- <dopplerg-!~dop@> has quit IRC (Ping timeout: 256 seconds)
[04:46:43] <richlowe> speaking of, are we in a place we can delete grub yet?
[04:46:59] <LeftWing> Good question!
[04:47:44] *** dopplergange <dopplergange!~dop@> has joined #illumos
[05:27:13] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Remote host closed the connection)
[05:41:25] <papertigers> anyone here using in kernel smb and have it configured with guestok=true?
[05:41:33] <papertigers> I can't seem to get it to work
[05:56:17] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[06:54:07] <jbk> liv3010m: still around?
[06:54:07] *** KungFuJesus <KungFuJesus!~adam@> has quit IRC (Remote host closed the connection)
[06:54:15] *** KungFuJesus <KungFuJesus!~adam@> has joined #illumos
[06:54:31] <KungFuJesus> which library is userspace's assert located in?
[06:54:40] <jbk> libc
[06:54:42] <KungFuJesus> linker seems to be having trouble finding that symbol
[06:55:02] <jbk> it's actually a macro iirc
[06:55:03] <KungFuJesus> https://pastebin.com/qKvH2fMw
[06:55:08] <KungFuJesus> ah, so ASSERT?
[06:55:15] <KungFuJesus> probably defined to _assert()?
[06:55:29] <jbk> i thought assfail(), but i'll need to check
[06:55:48] <jbk> __assert
[06:55:48] <KungFuJesus> I think I _may_ have figured out why gkrellm is broken, still looking into it, though
[06:55:52] <jbk> or __assert_c99
[06:55:58] <KungFuJesus> I don't know why I care about this program so much, lol
[06:56:25] <jbk> so some file is probably missing an #include
[06:56:41] <KungFuJesus> I have #include <assert.h>
[06:56:54] <KungFuJesus> man page says it's all I need, but it's clearly not finding it in libc
[06:57:16] <jbk> in sysdeps-unix.c ?
[06:57:39] <KungFuJesus> eh, that's a gkrellm source file
[06:57:46] <KungFuJesus> but yes
[06:58:01] <KungFuJesus> __assert did it, not sure why assert doesn't want to work, though
[06:58:48] <jbk> maybe something's causing the #include <assert.h> to be skipped, or something's redefining assert?
[06:59:08] <KungFuJesus> using gcc, maybe that's it?
[06:59:14] <jbk> or #undef assert
[06:59:28] <jbk> gcc from where?
[07:00:06] <KungFuJesus> in any case, I think that the size of kstat_io_t has changed at some point or another, and this thing is using stale headers or something. I'm pretty sure kstat_read is causing the stack corruption here, since it takes a void *buf, and that buf happens to be a stack pointer here
[07:00:14] <KungFuJesus> gcc from IPS
[07:00:20] <KungFuJesus> in OI
[07:01:34] <jbk> should be fine
[07:16:04] <KungFuJesus> alright so they are pulling in the headers directly from sys/kstat.h, but this is definitely the issue
[07:18:17] <KungFuJesus> for these "kc_chains", is it possible for something in this linked list to not be a kstat_io_t?
[07:19:39] <KungFuJesus> there's definitely a size mismatch going on
[07:20:01] <jbk> what stat?
[07:20:36] <KungFuJesus> they're referring to some extern for a kstat_ctl_t
[07:20:55] <KungFuJesus> I think before they were relying on the drive name only existing in kstat_io_t for the "name" stat
[07:21:04] <KungFuJesus> err, name field
[07:21:43] <jbk> there is a ks_type field that has the type
[07:21:56] <KungFuJesus> I suspect there's a missing comparison of ks_type with KSTAT_TYPE_IO
[07:24:04] <KungFuJesus> I do wonder why this is just breaking now
[07:25:07] <KungFuJesus> also I feel like assert() should have worked but didn't. And __assert() compiled but blew the assertion every time, even when true
[07:25:12] <KungFuJesus> so that's...odd
[07:26:39] <KungFuJesus> hmm, worked in a simple test
[07:31:00] <KungFuJesus> so this seems to have fixed the issue
[07:31:12] <KungFuJesus> where do I send a patch for oi-userland?
[07:33:11] <KungFuJesus> ah gotta get to sleep, will ask again in here tomorrow
[07:33:13] *** KungFuJesus <KungFuJesus!~adam@> has quit IRC (Quit: leaving)
[07:40:36] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Quit: Leaving)
[07:41:13] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[07:41:59] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Remote host closed the connection)
[07:42:33] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[07:43:16] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Remote host closed the connection)
[07:51:11] <tsoome_> jbk, yes loader.rc is the one.
[07:52:07] <tsoome_> richlowe I guess we should be able to remove grub, I haven't been pushy about it as it only adds an extra minute or so to the build time..
[07:53:36] <jbk> i think i got it fixed
[07:53:51] <tsoome_> :)
[07:54:17] <jbk> smartos has its own menu.rc file (joyent.menu.rc)... except some makefile bits a few directories over install joyent.menu.rc as menu.rc in the proto area
[07:54:25] <gitomat> [illumos-gate] 12590 bge: build error on sparc -- Toomas Soome <tsoome at me dot com>
[07:54:30] <jbk> so it took me a bit to figure out where things were coming from
[07:54:33] <jbk> and where to look
[07:54:36] <jbk> but then i could compare the two
[07:54:43] <tsoome_> makes sense.
[07:54:44] <jbk> and see the tweaks we needed in ours
[07:54:53] <jbk> to makes the changes andyf did
[07:54:56] <jbk> err make
[07:59:36] <ypankov> is lua support coming soon? it really looks saner than forth
[07:59:38] *** BOKALDO <BOKALDO!~BOKALDO@> has joined #illumos
[08:19:51] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[08:22:36] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Client Quit)
[08:22:40] <tsoome_> looks can deceive:)
[08:23:11] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[08:23:19] <tsoome_> it can be done ofc. its just not number 1 on my priority list.
[08:24:19] *** pwinder <pwinder!~pwinder@> has joined #illumos
[08:24:43] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Client Quit)
[08:25:18] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[08:25:35] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Remote host closed the connection)
[08:26:03] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[08:55:44] <LeftWing> If somebody wants to remove grub I am on board at this point
[08:55:59] <LeftWing> Feels like loader has had plenty of soak time
[08:56:11] <LeftWing> And it isn't like grub works with EFI etc
[08:57:35] *** phyre <phyre!~phyre___@> has joined #illumos
[09:04:05] *** varna_ <varna_!~varna@> has quit IRC (Ping timeout: 265 seconds)
[09:04:24] *** varna <varna!~varna@> has joined #illumos
[09:05:49] *** awordnot <awordnot!~awordnot@c-73-210-60-203.hsd1.il.comcast.net> has quit IRC (Quit: Ping timeout (120 seconds))
[09:06:13] *** awordnot <awordnot!~awordnot@c-73-210-60-203.hsd1.il.comcast.net> has joined #illumos
[09:11:34] <gitomat> [illumos-gate] 12582 CUBIC module should react immediately to CC_RTO congestion signal -- Cody Peter Mello <cody.mello at joyent dot com>
[09:11:35] <gitomat> [illumos-gate] 12583 Import FreeBSD congestion control updates -- Paul Winder <pwinder at racktopsystems dot com>
[09:11:36] <gitomat> [illumos-gate] 12581 sockets using cubic congestion control can block -- Paul Winder <paul at winders dot demon.co.uk>
[09:13:34] *** Obscurax <Obscurax!~Obscurax@unaffiliated/obscurax> has quit IRC (Quit: So long, and thanks for all the fish.)
[09:13:43] <gitomat> [illumos-gate] 12584 spitfire: '0' flag used with '%b' cmn_err format -- Toomas Soome <tsoome at me dot com>
[09:15:56] *** Obscurax <Obscurax!~Obscurax@unaffiliated/obscurax> has joined #illumos
[09:30:54] *** neirac <neirac!~neirac@pc-4-149-45-190.cm.vtr.net> has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
[09:31:10] *** neirac <neirac!~neirac@pc-4-149-45-190.cm.vtr.net> has joined #illumos
[09:35:33] <sjorge> tsoome_ oh... I thought we got rid of grub2 already? It's still compiled/in tree?!
[09:37:23] <LeftWing> We never had grub2, just grub 0.97
[09:37:27] <LeftWing> From a million years ago
[09:37:44] <LeftWing> And yes, we still have it in the tree, but it's probably time for it to go
[09:38:18] <LeftWing> But it is also definitely time for me to go to bed
[09:53:55] *** ptribble <ptribble!~ptribble@cpc92716-cmbg20-2-0-cust138.5-4.cable.virginm.net> has joined #illumos
[10:01:01] *** andy_js <andy_js!~andy@> has joined #illumos
[10:14:23] *** jamtorus is now known as jellydonut
[10:37:40] <sjorge> Gnite @LeftWing
[10:37:48] <sjorge> Oh wow ancient grub :O
[10:38:42] *** man_u <man_u!~manu@89-92-19-81.hfc.dyn.abo.bbox.fr> has joined #illumos
[10:39:31] *** khng300 <khng300!~khng300@unaffiliated/khng300> has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
[10:43:03] *** DanDan <DanDan!~DanDan@89-160-68-254.cust.bredband2.com> has quit IRC (Ping timeout: 272 seconds)
[10:43:36] *** DanDan <DanDan!~DanDan@89-160-68-254.cust.bredband2.com> has joined #illumos
[10:43:49] *** khng300 <khng300!~khng300@unaffiliated/khng300> has joined #illumos
[10:48:08] *** jimklimov <jimklimov!~jimklimov@ip-86-49-254-26.net.upcbroadband.cz> has quit IRC (Quit: Leaving.)
[11:15:39] *** jimklimov <jimklimov!~jimklimov@> has joined #illumos
[11:58:50] *** Alasdair <Alasdair!~alasdair@al.cloud.ec> has joined #illumos
[12:02:11] <andyf> We still have some OmniOS users who are using grub..
[12:02:24] <andyf> and I still use it to PXE boot my servers as it is massively faster than loader, even now
[12:17:01] <tsoome> probably need to check why
[12:17:56] <andyf> yes, it's just really hard to debug things on customer hardware without access, especially loader.
[12:18:08] <tsoome> btw, have you checked if uefi netboot is faster?
[12:18:21] <andyf> not recently
[12:18:37] <andyf> it does not mean we can't pull grub from illumos-gate still of course
[12:22:23] <liv3010m> jbk: sorry fell asleep but now I'm awake
[12:22:53] <tsoome> I have a few ideas about a possible performance boost but I haven't had any time for it…
[12:47:06] *** ldepandis <ldepandis!~ldepandis@unaffiliated/ldepandis> has joined #illumos
[12:59:44] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Remote host closed the connection)
[13:01:28] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[13:05:10] *** aszeszo <aszeszo!~aszeszo@unaffiliated/aszeszo> has joined #illumos
[13:06:25] * Alasdair waves at aszeszo
[13:07:03] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has quit IRC (Remote host closed the connection)
[13:07:06] * aszeszo waves back
[13:08:17] <Alasdair> brb
[13:09:18] *** mnrmnaugh <mnrmnaugh!~mnrmnaugh@unaffiliated/mnrmnaugh> has joined #illumos
[13:09:30] *** Alasdair <Alasdair!~alasdair@al.cloud.ec> has quit IRC (Remote host closed the connection)
[13:09:55] *** Alasdair <Alasdair!~alasdair@al.cloud.ec> has joined #illumos
[13:10:28] *** Alasdair <Alasdair!~alasdair@al.cloud.ec> has quit IRC (Client Quit)
[13:10:33] <tsoome> bloody ipfilter. IMPACT: svc:/network/ipfilter:default is unavailable.
[13:10:51] <tsoome> and I did disable damn thing before reboot…
[13:12:54] *** Alasdair <Alasdair!~alasdair@al.cloud.ec> has joined #illumos
[13:14:02] *** Alasdair <Alasdair!~alasdair@al.cloud.ec> has quit IRC (Client Quit)
[13:14:32] *** Alasdair <Alasdair!~alasdair@al.cloud.ec> has joined #illumos
[13:17:45] *** tsoome <tsoome!~tsoome@> has quit IRC (Quit: tsoome)
[13:31:33] *** am11 <am11!54f985dd@dsl-hkibng22-54f985-221.dhcp.inet.fi> has quit IRC (Remote host closed the connection)
[13:34:03] *** tsoome <tsoome!~tsoome@> has joined #illumos
[13:34:47] *** am11 <am11!54f985dd@dsl-hkibng22-54f985-221.dhcp.inet.fi> has joined #illumos
[13:38:47] *** tsoome <tsoome!~tsoome@> has quit IRC (Client Quit)
[13:40:03] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has joined #illumos
[13:40:03] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has quit IRC (Client Quit)
[13:40:45] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has joined #illumos
[13:42:18] *** wacki <wacki!~wacki@i577B09F8.versanet.de> has joined #illumos
[13:45:13] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has quit IRC (Ping timeout: 264 seconds)
[13:46:01] *** tsoome <tsoome!~tsoome@> has joined #illumos
[14:10:30] *** EisNerd <EisNerd!~manschwet@mail.cs-software-gmbh.de> has quit IRC (Ping timeout: 256 seconds)
[14:18:09] *** EisNerd <EisNerd!~manschwet@mail.cs-software-gmbh.de> has joined #illumos
[14:29:48] *** tsoome <tsoome!~tsoome@> has quit IRC (Read error: Connection reset by peer)
[14:30:24] *** tsoome <tsoome!~tsoome@> has joined #illumos
[14:45:12] *** phyre <phyre!~phyre___@> has quit IRC (Remote host closed the connection)
[14:45:58] *** Pocky <Pocky!~Pocky@> has joined #illumos
[14:52:34] *** gh34 <gh34!~textual@cpe-184-58-181-106.wi.res.rr.com> has joined #illumos
[14:56:38] *** Pocky <Pocky!~Pocky@> has quit IRC ()
[14:57:53] *** BOKALDO <BOKALDO!~BOKALDO@> has quit IRC (Quit: Leaving)
[14:59:34] <am11> jbk: i noticed that one of the libunwind passing test, Lrs-race, was passing for you but consistently failing on my box with SIGPIPE: https://paste2.org/5eMULv8w with `SunOS df333bd6-c7f8-63b6-b1c2-e947d740e084 5.11 joyent_20200408T231825Z i86pc i386 i86pc illumos`. was it fixed in a later version?
[15:05:12] *** jcea <jcea!~Thunderbi@2001:bc8:2ecd:caed:7670:6e00:7670:6e00> has joined #illumos
[15:30:02] *** phyre <phyre!~phyre___@> has joined #illumos
[15:37:50] *** varna <varna!~varna@> has quit IRC (Ping timeout: 256 seconds)
[15:37:54] <neirac> su
[15:38:04] <toastersonerson1> ENOPERM
[15:39:42] *** varna <varna!~varna@> has joined #illumos
[15:40:49] <Alasdair> Password:
[15:58:15] *** tsoome <tsoome!~tsoome@> has quit IRC (Read error: Connection reset by peer)
[15:59:42] *** tsoome <tsoome!~tsoome@> has joined #illumos
[16:08:04] *** reflekt <reflekt!~ghost1@c-73-245-52-228.hsd1.fl.comcast.net> has joined #illumos
[16:11:50] *** varna <varna!~varna@> has quit IRC (Ping timeout: 265 seconds)
[16:11:55] *** BOKALDO <BOKALDO!~BOKALDO@> has joined #illumos
[16:12:08] *** varna <varna!~varna@> has joined #illumos
[16:12:17] *** ypankov <ypankov!~ypankov@> has quit IRC (Remote host closed the connection)
[16:15:46] <neirac> toastersonerson1 have you tried hugo? I think there is an issue in there, it compiles ok, but then fails to serve the content, have not debugged it, I think we need to work in delve
[16:16:37] <toastersonerson1> hugo was missing filesystem event work IIRC. it will build but now watch continuously
[16:16:49] <toastersonerson1> s/now/not
[16:23:59] *** yomisei <yomisei!~void@ip4d16be28.dynamic.kabel-deutschland.de> has quit IRC (Ping timeout: 250 seconds)
[16:40:58] <neirac> toastersonerson1 this hugo right ? https://gohugo.io/getting-started/quick-start/
[16:41:15] *** idodeclare <idodeclare!~textual@cpe-76-185-177-63.satx.res.rr.com> has quit IRC (Ping timeout: 258 seconds)
[16:41:19] *** ChrisBF <ChrisBF!~ubuntu-st@host86-128-39-53.range86-128.btcentralplus.com> has joined #illumos
[16:41:21] <toastersonerson1> yes
[16:41:59] <neirac> toastersonerson1 oh you are right that part is missing
[16:42:06] *** ChrisBF <ChrisBF!~ubuntu-st@host86-128-39-53.range86-128.btcentralplus.com> has left #illumos
[16:42:09] *** reflekt <reflekt!~ghost1@c-73-245-52-228.hsd1.fl.comcast.net> has quit IRC (Quit: reflekt)
[16:42:24] <toastersonerson1> the upstream library work was never completed IIRC but there is a PR
[16:48:17] *** amrfrsh <amrfrsh!~Thunderbi@> has quit IRC (Quit: amrfrsh)
[16:48:41] *** idodeclare <idodeclare!~textual@cpe-76-185-177-63.satx.res.rr.com> has joined #illumos
[16:51:30] *** varna <varna!~varna@> has quit IRC (Ping timeout: 256 seconds)
[16:51:48] *** varna <varna!~varna@> has joined #illumos
[17:00:15] *** KungFuJesus <KungFuJesus!~adam@> has joined #illumos
[17:00:26] <KungFuJesus> if I had a patch for a package oi-userland, where would I put it?
[17:00:39] <KungFuJesus> and should I try to upstream it as well?
[17:00:51] <KungFuJesus> package in*
[17:01:27] <jlevon> depends on the patch?
[17:01:48] <jlevon> you can certainly open a PR against oi-userland though
[17:02:03] <KungFuJesus> ah, I know gate doesn't really use GH PRs
[17:02:09] <KungFuJesus> unless you mean problem report
[17:02:18] <jlevon> oi-userland does
[17:02:30] <KungFuJesus> it's a patch that makes gkrellmd work instead of segfaulting
[17:03:03] <jlevon> you should certainly try upstreaming if it's a bug fix then
[17:03:22] *** pwinder <pwinder!~pwinder@> has quit IRC (Read error: No route to host)
[17:03:35] <KungFuJesus> yeah, they were corrupting the stack by just assuming that any value read with kstat_read would be a kstat_io_t
[17:03:49] <KungFuJesus> I'm not sure how this ever had worked before, to be honest
[17:05:32] <KungFuJesus> unless maybe kstat_ctl_t's largest kstat happened to be kstat_io_t until recently
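A minimal sketch of the kind of check that avoids the stack corruption described above: verify a kstat's type and payload size before copying it into a fixed-size I/O structure. The `demo_*` names and struct layouts are hypothetical stand-ins, not the real `<kstat.h>` definitions or the actual gkrellmd patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kstat types; not the real <kstat.h> layout. */
enum demo_type { DEMO_TYPE_NAMED, DEMO_TYPE_IO };

struct demo_io {
	unsigned long long nread;
	unsigned long long nwritten;
	unsigned int reads;
	unsigned int writes;
};

struct demo_kstat {
	enum demo_type type;	/* what kind of payload this kstat carries */
	size_t data_size;	/* size of the payload in bytes */
	const void *data;	/* payload as supplied by the "kernel" */
};

/*
 * Copy an I/O payload into *out only if the kstat really is an I/O kstat
 * of the expected size. Returns 0 on success, -1 if the kstat is rejected.
 */
int
demo_read_io(const struct demo_kstat *ksp, struct demo_io *out)
{
	assert(ksp != NULL && out != NULL);	/* caller bug, not bad data */

	if (ksp->type != DEMO_TYPE_IO)
		return (-1);
	if (ksp->data_size != sizeof (*out))
		return (-1);

	memcpy(out, ksp->data, sizeof (*out));
	return (0);
}
```

A caller that loops over every kstat would skip entries for which this returns -1 instead of copying an oversized payload into a stack buffer.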
[17:10:23] *** liv3010m <liv3010m!~liv3010m@77-72-245-190.fibertel.com.ar> has quit IRC (Ping timeout: 258 seconds)
[17:18:24] *** yomisei <yomisei!~void@ip4d16be28.dynamic.kabel-deutschland.de> has joined #illumos
[17:19:09] *** liv3010m <liv3010m!~liv3010m@77-72-245-190.fibertel.com.ar> has joined #illumos
[17:21:22] *** tsoome <tsoome!~tsoome@> has quit IRC (Read error: Connection reset by peer)
[17:21:53] *** tsoome <tsoome!~tsoome@> has joined #illumos
[17:23:46] *** man_u <man_u!~manu@89-92-19-81.hfc.dyn.abo.bbox.fr> has quit IRC (Quit: man_u)
[17:40:56] <KungFuJesus> hmm, is it normal for an application using select to idle at around 10%-12% system cpu utilization?
[17:42:10] <jlevon> no
[17:42:31] <KungFuJesus> https://pastebin.com/89JWCGj6
[17:43:26] <KungFuJesus> is that probe sufficient to see the stacks for time spent in syscalls?
[17:44:48] <jlevon> no, there's no correlation between off/on cpu and syscalls. you want syscall entry/return
[17:45:12] <jlevon> also remember scheduling is per thread, not sure if that matters there
[17:45:35] <KungFuJesus> single threaded
[17:45:53] <KungFuJesus> dtruss tells me the vast majority of syscalls are ioctl's, with the first argument being 0x3
[17:49:06] *** am11 <am11!54f985dd@dsl-hkibng22-54f985-221.dhcp.inet.fi> has quit IRC (Remote host closed the connection)
[17:50:12] <KungFuJesus> https://pastebin.com/qNGSxWLm
[17:50:22] <KungFuJesus> reducing the update frequency doesn't seem to reduce the time
[18:01:53] *** amrfrsh <amrfrsh!~Thunderbi@> has joined #illumos
[18:05:50] <KungFuJesus> so it seems like _pollsys via select->pselect is taking most of the CPU cycles
[18:18:59] <jbk> 3 is just the fd
[18:22:48] <KungFuJesus> figured
[18:23:19] <KungFuJesus> a curious amount of time is happening in sys_net_read_data. How do I forcibly remove a network interface's driver?
[18:26:08] *** tsoome <tsoome!~tsoome@> has quit IRC (Read error: Connection reset by peer)
[18:26:40] <KungFuJesus> modunload tells me the interface is busy
[18:26:49] *** tsoome <tsoome!~tsoome@> has joined #illumos
[18:32:04] <KungFuJesus> sorry had to unplumb it
[18:32:08] <KungFuJesus> ok so this is interesting...
[18:33:14] <KungFuJesus> rmustacc: I found an oddity in the bge0 driver. For some reason when gkrellmd loops over kstat_chain_update it causes a lot of wasted time
[18:33:25] <KungFuJesus> when I unload that driver, the sys time drops to 0
[18:34:23] <KungFuJesus> something about this ioctl call is taking forever when bge0 is plumbed but not configured: ioctl(0x3, 0x4B02, 0x856B960)
[18:34:49] <KungFuJesus> so what's ioctl 0x4B02?
[18:36:48] <KungFuJesus> descriptor 3 is the kstat pseudodevice it looks like
[18:37:49] <tsoome> KSTAT_IOC_READ
[18:39:46] <KungFuJesus> yeah, so it must be something specific in a kstat that's being provided by bge0
[18:42:16] <KungFuJesus> rmustacc: I'm still using the module you sent me rather than what might be built by CI from the gate. Did you happen to send me a debug or unoptimized version, maybe?
[18:45:39] <ptribble> One thing I remember about the bge driver is that it exposes quite a lot of kstats that are transferred from the hardware each time you read them
[18:45:59] <KungFuJesus> ahhhh, that might do it
[18:46:32] <KungFuJesus> so I'm relatively screwed in regard to 10% of a core's cycles if I want to use this device
[18:47:05] <KungFuJesus> if I want to continue using gkrellm, anyway :-p
[18:47:54] <KungFuJesus> or anything else that polls kstats at any moderate frequency
[18:48:48] <KungFuJesus> loop rate here is 3 times a second by default. Reducing it to 1 doesn't really substantially improve it
[18:49:39] <toastersonerson1> neirac (IRC): there is also an inconsistency with the chtimes call
[18:50:00] <toastersonerson1> neirac (IRC): Error: Error copying static files: chtimes /root/src/quickstart/public/: value too large for defined data type
[18:52:59] <KungFuJesus> kind of wish kstat had a pub/sub sort of system...
[18:53:27] <KungFuJesus> kstat_open() should take an argument that lets you narrow scope or something
[19:03:06] <KungFuJesus> or at the very least there should be a selective kstat chain update, where you can modify the chain to only contain the kstats that you want, and the syscall only polls those
[19:03:52] <ptribble> Looking at the code, gkrellm reads all kstats of class "net" every time round the loop
[19:04:01] <ptribble> Regardless of whether they have useful data or not
[19:04:37] <ptribble> Looking at the bge driver, it marks all the stats as class "net"
[19:05:37] <ptribble> It may be that you can fix gkrellm to exclude the expensive kstats by name
[19:05:43] <KungFuJesus> right, but even if it didn't there really wouldn't be a solution to it, as the kstat_chain_update call is what causes the expensive syscall
[19:06:48] <KungFuJesus> the other crap is just looping over the stat names and doing string comparisons. The userspace time is quite small, it's all in that global update to kstat, which tells the kernel to copy every damn statistic into a new kchain id
[19:08:17] <KungFuJesus> sadly, unless I'm dealing with just an unoptimized kernel module, the solution is probably to improve the interface to kstat
[19:08:46] <KungFuJesus> The irony is that in Linux it does a shitload of sscanf's from procfs, but it's faster because the procfs tree still limits scope
[19:08:49] <KungFuJesus> quite a bit
[19:12:39] <KungFuJesus> In that same loop, I added a condition, as an experiment, to ignore things with the name bge0. The problem didn't go away, so it's definitely the chain update
[19:12:45] <LeftWing> KungFuJesus: You don't always have to do a chain update, right?
[19:12:59] <LeftWing> You can just read new values for the ones you're interested in
[19:13:51] <LeftWing> (Not that a selective interface isn't an interesting idea, just talking about options with the code as it is today)
[19:14:56] <ptribble> But kstat_chain_update should normally be a no-op
[19:15:33] <LeftWing> That's true, because the chain ID shouldn't be changing
[19:18:16] <LeftWing> So perhaps that's worth investigating: if every kstat_chain_update() does the full work, is it because kstat_chain_id keeps changing -- and if so, why is that?
[19:18:26] <LeftWing> kstats are not supposed to come and go a lot
[19:18:55] <KungFuJesus> well for sure conditioning the inside of the loop to not do the kstat_data_lookup call if the kstat was named bge0 certainly didn't improve things...
[19:21:56] <ptribble> you've skipped the kstat_read as well as the kstat_data_lookup?
[19:22:30] <KungFuJesus> yes, I added a strcmp in the conditional there that loops over the chain
[19:22:38] <KungFuJesus> && after the "net" class comparison
[19:22:53] <KungFuJesus> I did a print in there to confirm that they weren't being kstat_read, they aren't
[19:25:51] <KungFuJesus> doing a poor man's time between chain update start and finish with clock()
[19:25:59] <KungFuJesus> I'm guessing that'll be enough resolution
[19:28:06] <LeftWing> I would just use DTrace to time the calls
[19:31:19] *** arnoldoree <arnoldoree!~arnoldore@2001:d08:1a81:3a1f:2396:9257:1ced:1e22> has quit IRC (Quit: Leaving)
[19:32:08] <KungFuJesus> probably should just use fbt to do it
[19:32:33] <KungFuJesus> dtruss is telling me the kstat read ioctl is most of that
[19:32:40] <KungFuJesus> maybe it's mac_miscstats?
[19:33:04] <KungFuJesus> is bge providing a ton of those, maybe?
[19:33:52] *** wacki <wacki!~wacki@i577B09F8.versanet.de> has quit IRC (Ping timeout: 265 seconds)
[19:34:31] *** wacki <wacki!~wacki@i577B09F8.versanet.de> has joined #illumos
[19:35:10] <LeftWing> I'm not sure. If you look at the code for kstat_chain_update(), you'll see that it should be almost free in the event that the chain ID is not changing over and over
[19:36:09] <LeftWing> e.g., on my system, running: pfexec mdb -ke kstat_chain_id/X
[19:36:24] <LeftWing> It's just 0x864 over and over
[19:36:36] <LeftWing> Is your chain ID value changing a lot?
[19:40:25] *** Kurlon_ <Kurlon_!~Kurlon@cpe-67-253-136-97.rochester.res.rr.com> has quit IRC (Remote host closed the connection)
[19:41:02] *** Kurlon <Kurlon!~Kurlon@bidd-pub-03.gwi.net> has joined #illumos
[19:45:39] *** jimklimov <jimklimov!~jimklimov@> has quit IRC (Quit: Leaving.)
[19:46:20] *** bahamas10 <bahamas10!~dave@cpe-72-231-182-75.nycap.res.rr.com> has joined #illumos
[19:52:10] *** am11 <am11!54f985dd@dsl-hkibng22-54f985-221.dhcp.inet.fi> has joined #illumos
[19:53:17] <KungFuJesus> right, it updates what kstats are provided. And if no devices leave or join, it should be the same
[19:53:34] <gitomat> [illumos-gate] 12534 fm: NULL pointer errors -- Toomas Soome <tsoome at me dot com>
[19:55:29] <KungFuJesus> yeah, so it seems like if I only do a kstat_read on things named e1000g1, it's still spending a ton of cpu cycles. So somehow kstat_read is impacted by bge0 regardless of which kstat is being read from
[19:57:22] <KungFuJesus> this seems like a flaw in kstat's design. Are there locks or something serializing in that kstat_read that make other unrelated kstats impact that time?
[19:58:02] <gitomat> [illumos-gate] 12533 cfgadm_plugins: NULL pointer errors -- Toomas Soome <tsoome at me dot com>
[19:58:49] <KungFuJesus> any suggestions for which kernel functions to fbt?
[20:00:33] *** alanc <alanc!~alanc@inet-hqmc01-o.oracle.com> has quit IRC (Remote host closed the connection)
[20:00:59] *** alanc <alanc!~alanc@inet-hqmc01-o.oracle.com> has joined #illumos
[20:01:39] <KungFuJesus> where's the code that services the kstat_read ioctl?
[20:03:47] <gitomat> [illumos-gate] 12532 unix: NULL pointer errors -- Toomas Soome <tsoome at me dot com>
[20:05:02] <rmustacc> KungFuJesus: Most of the bge modules that I built and shipped were debug builds of the driver. Though I don't believe that would have an impact on stats.
[20:10:23] <gitomat> [illumos-gate] 12542 dtrace: NULL pointer errors -- Toomas Soome <tsoome at me dot com>
[20:10:32] <LeftWing> KungFuJesus: Did you confirm the chain ID is not changing?
[20:11:52] <KungFuJesus> I didn't but I timed that function and it's instantaneous
[20:12:30] <KungFuJesus> rmustacc: so it looks like there's some locking going on in read_kstat_data but it's per kstat id, so unless those are colliding, I don't think there's some contention there
[20:12:54] <rmustacc> I haven't really read back on the current issue you're debugging. Just was trying to answer the question about a debug build.
[20:13:14] <gitomat> [illumos-gate] 12547 pci_pci: NULL pointer errors -- Toomas Soome <tsoome at me dot com>
[20:13:47] <rmustacc> But yes, that sounds right regarding the locking. In general, different stats should have different locking.
[20:13:56] <rmustacc> Though groups of stats can be all covered by one lock and updated in one pass.
[20:16:30] <KungFuJesus> the gist is gkrellmd, when bge is loaded, is eating up 10-12% of the system cpu time. Unloading the module it goes down to zero. Syscalls seem to be spending most of the time in the kstat_read ioctl. Simply ignoring ones named bge0 does not make this time go down, so something in the driver must be impeding the reads of other unrelated kstats
[20:17:01] <rmustacc> OK.
[20:17:08] <KungFuJesus> (in addition to this, there's a bug in gkrellmd's disk kstat parsing code that causes stack corruption, but I've since fixed that locally)
[20:17:20] <rmustacc> Well, I'm not expert in that driver. I just made the mistake of trying to add support for a new chipset for some folks.
[20:17:25] <rmustacc> *no expert
[20:17:50] <KungFuJesus> hah no I appreciate it, I'm just trying to answer my own questions about it
[20:18:09] <rmustacc> so, the kstats that bge creates are all updated with the bge_statistics_update function.
[20:18:23] <KungFuJesus> trying to be helpful, not sanctimonious
[20:18:25] <rmustacc> I would probably use fbt to see how long that's being read for and from where.
[20:19:19] <KungFuJesus> sure, I'll try doing an image-update first, just to be sure it's not unoptimized binaries that are doing something really dumb (like dumping registers to stack and back)
[20:19:33] <rmustacc> I don't think so.
[20:19:49] <rmustacc> OK. I think I see what it's doing in the driver.
[20:20:14] <rmustacc> Let me try to summarize.
[20:21:25] <rmustacc> So the kernel kstat_t structure can be associated with a varying number of stats.
[20:21:39] <rmustacc> Each kstat_t can have an associated update function.
[20:22:10] <rmustacc> So when the kernel wants to update data, it'll call every ks_update function as appropriate.
[20:22:33] <rmustacc> Generally, the way I'd expect the driver to work is that it'd have a kstat_t for each group of stats that it updates at once.
[20:22:51] <rmustacc> So for i40e, for example, we have one kstat_t, with all of the named kstats.
[20:23:05] <rmustacc> And they're all updated at once when a read happens as a result.
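The grouping described above can be sketched as a toy model: one group object holds several named values plus a single update callback, so reading any one value refreshes the whole group. All `demo_*` names are hypothetical and the "hardware" is just an array; this is not the kernel's `kstat_t` or the bge driver's code.

```c
#include <assert.h>
#include <stddef.h>

#define	DEMO_NSTATS	3

/* Hypothetical stand-in for one kstat_t covering a group of stats. */
struct demo_group {
	const char *name;
	unsigned long long values[DEMO_NSTATS];	/* snapshot handed to readers */
	int (*update)(struct demo_group *);	/* refreshes every value */
	const unsigned long long *hw;		/* pretend hardware counters */
	unsigned int update_calls;		/* how often update ran */
};

/* Example update callback: copy all counters from the "hardware". */
int
demo_hw_update(struct demo_group *g)
{
	size_t i;

	for (i = 0; i < DEMO_NSTATS; i++)
		g->values[i] = g->hw[i];
	return (0);
}

/*
 * Read one value by index. The whole group is refreshed first, so the
 * cost of a read is the cost of the group's update callback.
 */
int
demo_group_read(struct demo_group *g, size_t idx, unsigned long long *out)
{
	assert(g != NULL && out != NULL);

	if (idx >= DEMO_NSTATS)
		return (-1);
	if (g->update != NULL) {
		g->update_calls++;
		if (g->update(g) != 0)
			return (-1);
	}
	*out = g->values[idx];
	return (0);
}
```

In this model, if the update callback has to pull every counter from a slow device, each read pays that full price no matter which single value the caller wanted.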
[20:23:07] *** jimklimov <jimklimov!~jimklimov@ip-86-49-254-26.net.upcbroadband.cz> has joined #illumos
[20:24:23] <rmustacc> The bge code is a little confusing to me, so I'm not sure if it's doing that in the right way or not or how many kstat_t's it's created with how many entries.
[20:24:48] <rmustacc> But I think if you look at the main update function with DTrace and do something like 'fbt::bge_statistics_update:entry{ @[stack()] = count(); }' that'll be pretty interesting.
[20:26:36] <rmustacc> KungFuJesus: Does that help at all?
[20:27:45] <am11> regarding https://github.com/dotnet/runtime/issues/35362#issuecomment-619020439, the `siginfo->si_addr` does not get persisted in user-defined sigsegv handler, is it a known issue?
[20:27:56] <KungFuJesus> hmm, not as interesting as I'd hoped: https://pastebin.com/u7VZd263
[20:28:08] <KungFuJesus> that's the only stacktrace it produced
[20:29:41] <rmustacc> Is that across the application doing something multiple times?
[20:30:01] <KungFuJesus> yes, it's monitoring on kstat interfaces like 3 times a second I think
[20:30:31] <KungFuJesus> now what might be interesting is if we enter that function while skipping over bge0
[20:30:48] <LeftWing> If you're hitting 10-15% SYS time I feel like making a flamegraph from kernel profiling will tell you where it is spending the time
[20:31:09] <KungFuJesus> good point
[20:31:27] <LeftWing> https://github.com/brendangregg/FlameGraph#dtrace -- I think the kernel stack example here still works
[20:31:45] <LeftWing> You could also constrain it to SYS time in the context of the gkrellm pid
[20:32:13] <KungFuJesus> ok so what's interesting is that function is entered even when I skip over that kstat...
[20:32:24] *** jemershaw <jemershaw!~jemershaw@c-68-83-252-28.hsd1.pa.comcast.net> has quit IRC (Quit: ZNC - http://znc.in)
[20:32:40] *** jemershaw <jemershaw!~jemershaw@c-68-83-252-28.hsd1.pa.comcast.net> has joined #illumos
[20:33:03] <KungFuJesus> LeftWing: I've constructed one already from userspace, but the userspace time is not all that interesting. The system time profile will probably be considerably more interesting
[20:33:21] <LeftWing> Neat! And yes, I hope so
[20:33:53] <KungFuJesus> so, when not running gkrellmd at all, I never hit that function. When running it and skipping over bge0 ks_name's, I still enter that function
[20:33:58] <KungFuJesus> this shouldn't happen, correct?
[20:34:54] <rmustacc> Probably? But I think looking at stack()/ustack() will be pretty interesting.
[20:38:13] <KungFuJesus> ok, aggregating stack()'s matching gkrellmd's pid
[20:39:49] *** tsoome__ <tsoome__!~tsoome@> has joined #illumos
[20:40:24] *** tsoome <tsoome!~tsoome@> has quit IRC (Read error: Connection reset by peer)
[20:40:25] *** tsoome__ is now known as tsoome
[20:41:32] <KungFuJesus> basically looks like the same stack trace, though I'm not getting ustacks + kstacks
[20:41:35] <KungFuJesus> is there a way to do that?
[20:41:50] <KungFuJesus> the example given is a duplicate of the userstacks one, I think brendan made a typo
[20:46:04] <KungFuJesus> where can I upload the flamegraph?
[20:48:30] <LeftWing> Maybe you can put it in a gist?
[20:48:43] <LeftWing> Not sure if that allows files other than markdown
[20:56:40] <KungFuJesus> https://gist.githubusercontent.com/KungFuJesus/f76c1922cd6fe42f80201e5027580c2a/raw/d944eafe0c221676c0c55a549a73371692fb470e/kernel.svg
[20:57:01] <KungFuJesus> ahh, the interaction is kinda messed up with that
[20:57:44] <KungFuJesus> but yeah, it's all in bge's kstat reads, despite them never being explicitly read from right now
[20:58:02] <KungFuJesus> no time spent in e1000g1's, at least not at the 997 hz profile resolution
[20:59:27] <KungFuJesus> in any case, that time shouldn't be as large as it is, even if it is being errantly called somewhere
[21:00:57] *** SashaRose <SashaRose!4d410d55@gateway/web/cgi-irc/kiwiirc.com/ip.> has joined #illumos
[21:03:11] *** SashaRose <SashaRose!4d410d55@gateway/web/cgi-irc/kiwiirc.com/ip.> has quit IRC (Client Quit)
[21:03:13] <KungFuJesus> the code seems to be using high resolution timestamps and reading the tsc, is this typical for kstats?
[21:06:41] *** Kruppt <Kruppt!~Kruppt@50-111-59-200.drhm.nc.frontiernet.net> has joined #illumos
[21:13:28] <KungFuJesus> hah, even weirder, when that module is loaded, but the interface is not plumbed, it cuts sys time in half. When the module is fully unloaded, it brings it down to basically 0
[21:19:14] *** neirac <neirac!~neirac@pc-4-149-45-190.cm.vtr.net> has quit IRC (Ping timeout: 265 seconds)
[21:25:40] *** igork <igork!~igork@> has quit IRC (Read error: Connection reset by peer)
[21:25:59] *** igork <igork!~igork@> has joined #illumos
[21:26:29] *** BrownBear <BrownBear!~BrownBear@> has quit IRC (Quit: Ping timeout (120 seconds))
[21:27:05] <gitomat> [illumos-gate] 12592 stmf_sbd: panic in _init on sparc -- Toomas Soome <tsoome at me dot com>
[21:27:30] *** BrownBear <BrownBear!~BrownBear@> has joined #illumos
[21:36:08] <gitomat> [illumos-gate] 12472 pam_list does not have 'group' option -- Jorge Schrauwen <sjorge at blackdot dot be>
[21:36:27] <andyf> sjorge, nice :)
[21:36:36] <danmcd> Congrats sjorge on your first push!
[21:38:16] <jlevon> sweet
[21:39:07] <tsoome> new contributor:)
[21:39:20] *** neirac <neirac!~neirac@pc-4-149-45-190.cm.vtr.net> has joined #illumos
[21:41:36] <rzezeski> congrats sjorge
[21:41:51] <sjorge> thanks!
[21:42:29] <sjorge> First upstream commit... and C
[21:42:41] <sjorge> well I have 2 one liners in zhyve, so I guess 2nd
[21:44:32] <neirac> sjorge nice!
[21:46:57] <LeftWing> Hooray!
[21:50:13] <Smithx10> What a stud
[21:56:18] *** neirac <neirac!~neirac@pc-4-149-45-190.cm.vtr.net> has quit IRC (Ping timeout: 256 seconds)
[21:57:24] *** ypankov <ypankov!~ypankov@> has joined #illumos
[21:58:20] *** BOKALDO <BOKALDO!~BOKALDO@> has quit IRC (Quit: Leaving)
[22:10:32] *** jimklimov <jimklimov!~jimklimov@ip-86-49-254-26.net.upcbroadband.cz> has quit IRC (Read error: Connection reset by peer)
[22:10:37] *** jimklimov1 <jimklimov1!~jimklimov@ip-86-49-254-26.net.upcbroadband.cz> has joined #illumos
[22:11:13] *** neirac <neirac!~neirac@pc-4-149-45-190.cm.vtr.net> has joined #illumos
[22:12:18] <sjorge> So the trick is get me something I really need and that is not too hard... I think
[22:14:21] <Smithx10> So what you're saying is, I need to trick you into thinking you really need something that I want that is not too hard :P
[22:16:45] <sjorge> I guess :p
[22:17:01] <sjorge> my list is full of way too complex things I want though haha, I was surprised how readable pam was
[22:19:20] <am11> is kernel code also in https://github.com/illumos/illumos-gate repo? trying to find the place where it populates `siginfo->si_addr` in user-defined signal handler (SIGSEGV in particular).
[22:22:44] <sjorge> am11 yes, both 'kernel', 'modules', and basic userland live there
[22:23:10] <sjorge> I'd suggest https://src.illumos.org though
[22:23:17] <sjorge> it's great for looking up defines and code searching
[22:24:33] <jlevon> am11: nearly all of kernel is under usr/src/uts
[22:26:02] <sjorge> uts -> unix time share
[22:26:04] <rmustacc> https://illumos.org/books/dev/layout.html may be useful.
[22:26:07] <sjorge> took me forever to figure out
[22:26:13] <rmustacc> Particularly https://illumos.org/books/dev/layout.html#source-tree-tour
[22:26:25] <sjorge> *sharing
[22:27:25] <am11> thanks! i was also looking up the expansion of this acronym. ;)
[22:28:11] <sjorge> from a quick glance with grok, I think trap.c might be the file you are looking for
[22:29:57] *** clapont <clapont!~clapont@unaffiliated/clapont> has quit IRC (Ping timeout: 256 seconds)
[22:31:45] *** jimklimov <jimklimov!~jimklimov@ip-86-49-254-26.net.upcbroadband.cz> has joined #illumos
[22:31:51] *** clapont <clapont!~clapont@unaffiliated/clapont> has joined #illumos
[22:32:17] *** jimklimov1 <jimklimov1!~jimklimov@ip-86-49-254-26.net.upcbroadband.cz> has quit IRC (Read error: Connection reset by peer)
[22:38:59] <andyf> LeftWing - now you've got me researching the differences between v1 and v2 pkgfmt output
[22:44:28] <LeftWing> andyf: Haha, I'm sorry
[22:44:37] <LeftWing> It was just punching pmooney in the face a bit
[22:44:44] <andyf> Heh - I've almost finished the man page update
[22:44:49] <LeftWing> And it took me a while to remember the v1 thing
[22:45:07] <andyf> It bit me when I started too, before I got used to always being in a bldenv
[22:45:21] <andyf> but I do wonder why we use v1 in gate.
[22:45:23] <LeftWing> https://www.illumos.org/issues/1751 is relevant
[22:45:27] <andyf> We could allow both
[22:45:33] <andyf> or just switch to v2
[22:45:47] <LeftWing> As I recall, the differences in v2 seemed capricious
[22:45:55] <LeftWing> And would thoroughly restyle everything
[22:46:31] <andyf> v2 is probably closer to how I write manifests, before formatting them to v1 :)
[22:46:48] <andyf> it's just cosmetic - does not matter
[22:47:05] <LeftWing> Right
[22:47:09] <andyf> when I've finished this man page, I'll drop a link..
[22:47:13] <LeftWing> Tah!
[22:47:19] <LeftWing> I mean we can always revisit it I'm sure
[22:47:26] <LeftWing> It has been a long time haha
[22:47:37] <LeftWing> Are OmniOS repos all in v2?
[22:47:46] <andyf> most likely
[22:47:54] <andyf> well, not the gate bits I imagine.
[22:47:59] <LeftWing> Ah right
[22:48:13] *** gh34 <gh34!~textual@cpe-184-58-181-106.wi.res.rr.com> has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[22:50:02] <jbk> i'm guessing no one should still be running a pkg version that doesn't support v2
[22:50:05] <jbk> at this point
[22:50:07] <jbk> ?
[22:52:17] *** ypankov <ypankov!~ypankov@> has quit IRC (Quit: leaving)
[22:52:45] *** ypankov <ypankov!~ypankov@> has joined #illumos
[22:55:26] * ypankov eyes #1751
[23:09:24] <andyf> jbk - I would be surprised if anyone didn't support it now
[23:10:28] <wilbury> sjorge: good work!
[23:15:51] *** wacki <wacki!~wacki@i577B09F8.versanet.de> has quit IRC (Quit: Lingo: www.lingoirc.com)
[23:16:10] *** ptribble <ptribble!~ptribble@cpc92716-cmbg20-2-0-cust138.5-4.cable.virginm.net> has quit IRC (Quit: Leaving)
[23:31:00] <gitomat> [illumos-gate] 12594 bge device IDs do not have a leading 0x -- Robert Mustacchi <rm at fingolfin dot org>
[23:34:55] *** tsoome <tsoome!~tsoome@> has quit IRC (Read error: Connection reset by peer)
[23:36:12] *** tsoome <tsoome!~tsoome@> has joined #illumos
[23:36:31] *** Kurlon_ <Kurlon_!~Kurlon@cpe-67-253-136-97.rochester.res.rr.com> has joined #illumos
[23:39:29] *** Kurlon <Kurlon!~Kurlon@bidd-pub-03.gwi.net> has quit IRC (Ping timeout: 250 seconds)
[23:43:59] <sjorge> LeftWing from a safe distance I hope :p
