   March 13, 2020
[00:12:47] *** andy_js <andy_js!~andy@51.146.99.40> has quit IRC (Quit: andy_js)
[00:33:29] *** Knez <Knez!~Knez@h-73-78.A444.priv.bahnhof.se> has quit IRC (Ping timeout: 258 seconds)
[00:40:13] *** Knez <Knez!~Knez@h-73-78.A444.priv.bahnhof.se> has joined #illumos
[00:50:43] *** Knez <Knez!~Knez@h-73-78.A444.priv.bahnhof.se> has quit IRC (Ping timeout: 260 seconds)
[00:57:37] *** Knez <Knez!~Knez@h-73-78.A444.priv.bahnhof.se> has joined #illumos
[01:07:03] *** clapont <clapont!~clapont@unaffiliated/clapont> has quit IRC (Ping timeout: 260 seconds)
[01:33:36] *** rzezeski <rzezeski!uid151901@gateway/web/irccloud.com/x-kcmejssfzxgyaoub> has joined #illumos
[01:36:34] *** freakazoid0223 <freakazoid0223!~matt@pool-96-227-98-169.phlapa.fios.verizon.net> has joined #illumos
[03:56:52] *** hawk <hawk!~hawk@d.qw.se> has quit IRC (Ping timeout: 256 seconds)
[04:15:56] *** freakazoid0223 <freakazoid0223!~matt@pool-96-227-98-169.phlapa.fios.verizon.net> has left #illumos ("Leaving")
[04:29:03] *** wl_ <wl_!~wl_@2605:6000:1b0c:46ad::87c> has quit IRC (Quit: Leaving)
[04:34:25] *** jcea <jcea!~Thunderbi@2001:bc8:2ecd:caed:7670:6e00:7670:6e00> has quit IRC (Ping timeout: 240 seconds)
[04:50:58] *** jcea <jcea!~Thunderbi@2001:bc8:2ecd:caed:7670:6e00:7670:6e00> has joined #illumos
[05:22:27] *** jcea <jcea!~Thunderbi@2001:bc8:2ecd:caed:7670:6e00:7670:6e00> has quit IRC (Quit: jcea)
[05:37:21] *** Fenix_ <Fenix_!~Fenix@75.170.89.113> has quit IRC (Quit: Leaving)
[07:13:15] *** BOKALDO <BOKALDO!~BOKALDO@91.105.118.203> has joined #illumos
[07:22:02] *** kerberizer <kerberizer!~luchesar@wikipedia/Iliev> has quit IRC (Ping timeout: 240 seconds)
[07:26:11] *** kerberizer <kerberizer!~luchesar@wikipedia/Iliev> has joined #illumos
[07:41:51] <tsoome> jimklimov, We just found the latest centos iscsi target is broken. switched to oracle linux and target was ok.
[07:42:38] *** neuroserve <neuroserve!~toens@ip-178-202-216-248.hsi09.unitymediagroup.de> has quit IRC (Ping timeout: 256 seconds)
[07:43:20] <tsoome> jimklimov in that setup, we had HDS F370 as initiator (iscsi lun connected as external storage to provide quorum device).
[07:55:02] *** neuroserve <neuroserve!~toens@ip-88-152-242-221.hsi03.unitymediagroup.de> has joined #illumos
[08:02:40] *** gitomat <gitomat!~nodebot@165.225.148.18> has quit IRC (Remote host closed the connection)
[08:02:51] *** gitomat <gitomat!~nodebot@165.225.148.18> has joined #illumos
[08:15:05] *** tsoome <tsoome!~tsoome@89f7-dd9d-1c71-9c29-2f80-4a40-07d0-2001.sta.estpak.ee> has quit IRC (Quit: This computer has gone to sleep)
[08:15:47] *** jimklimov1 <jimklimov1!~jimklimov@ip-86-49-254-26.net.upcbroadband.cz> has joined #illumos
[08:36:59] *** alanc <alanc!~alanc@inet-hqmc02-o.oracle.com> has quit IRC (Remote host closed the connection)
[08:37:26] *** alanc <alanc!~alanc@inet-hqmc02-o.oracle.com> has joined #illumos
[08:52:16] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has joined #illumos
[09:49:30] *** ptribble <ptribble!~ptribble@cpc92716-cmbg20-2-0-cust138.5-4.cable.virginm.net> has joined #illumos
[10:27:03] <jimklimov> tsoome: thanks... fwiw, this one is Debian 10.3.0
[10:27:35] <jimklimov> chugs for now, zpool was nice enough to just import :)
[10:27:46] <jimklimov> and znapzend is now sending to it
[10:28:42] <jimklimov> for some reason, could not figure out a way to pass the USB3 disk into my OI VM other than as a hard-connected storage device (SAS/SATA/...), and rebooting the VM to attach/detach it sucks
[10:29:05] <jimklimov> so had to spin another just to serve this hard-attached disk over iSCSI to the main workstation VM :)
[10:40:41] *** man_u <man_u!~manu@manu2.gandi.net> has joined #illumos
[10:46:36] *** andy_js <andy_js!~andy@51.146.99.40> has joined #illumos
[10:48:38] *** clapont <clapont!~clapont@unaffiliated/clapont> has joined #illumos
[12:14:50] *** mpana <mpana!~mpana@193.226.149.49> has joined #illumos
[12:39:26] *** ldepandis <ldepandis!~ldepandis@unaffiliated/ldepandis> has joined #illumos
[13:01:33] <toasterson1> alp, andyf: we have a problem with IPS, the rapidjson update is breaking it.
[13:01:59] <toasterson1> My guess is the linker is pulling in gcc-runtime on the build server.
[13:03:41] <toasterson1> Missing symbol __gxx_personality_v0
[13:06:31] <toasterson1> Hmm, it's not gcc-runtime. Which package included the gxx symbols again?
[13:07:14] <toasterson1> ld.so.1: python3.5: fatal: relocation error: file
[13:07:14] <toasterson1> > /usr/lib/python3.5/vendor-packages/rapidjson.cpython-35m.so: symbol
[13:07:15] <toasterson1> > __gxx_personality_v0: referenced symbol not found
[13:34:50] *** gh34 <gh34!~textual@cpe-184-58-181-106.wi.res.rr.com> has joined #illumos
[13:44:04] *** jenelizabeth <jenelizabeth!~jenelizab@cpc155793-brmb11-2-0-cust474.1-3.cable.virginm.net> has quit IRC (Read error: Connection reset by peer)
[13:48:06] <toasterson1> this is weird. __gxx_personality_v0 comes from system/library/g++-6-runtime but that is part of the dependencies
[13:51:04] *** idodeclare <idodeclare!~textual@cpe-76-185-177-63.satx.res.rr.com> has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[13:59:12] *** ptribble <ptribble!~ptribble@cpc92716-cmbg20-2-0-cust138.5-4.cable.virginm.net> has quit IRC (Quit: Leaving)
[14:43:19] *** neirac_ <neirac_!~cneir@pc-184-104-160-190.cm.vtr.net> has joined #illumos
[14:46:24] *** neirac <neirac!~cneir@pc-184-104-160-190.cm.vtr.net> has quit IRC (Ping timeout: 258 seconds)
[15:01:22] *** hawk <hawk!~hawk@d.qw.se> has joined #illumos
[15:02:07] *** neirac_ is now known as neirac
[15:18:43] *** ldepandis <ldepandis!~ldepandis@unaffiliated/ldepandis> has quit IRC (Ping timeout: 260 seconds)
[15:33:13] <leoric> Can somehow another gcc runtime from /usr/gnu/lib/amd64 be loaded?
[15:33:41] <leoric> yeah
[15:34:01] <jlevon> if your run paths are all messed up
[15:34:17] <leoric> they are not, it's more difficult
[15:34:33] <leoric> if something unexpected is going on, blame SFE, this always works
[15:34:36] <leoric> :)
[15:34:45] <leoric> tomww: ^
[15:34:53] <leoric> Hi, we have an issue here
[15:35:19] <leoric> we deliver some ncurses libraries in /usr/gnu/lib/amd64
[15:35:52] <leoric> So, python libraries likely have /usr/gnu/lib/amd64 runpath (or embed it into binaries while compiling)
[15:36:08] <leoric> Unluckily enough, SFE GCC runtime lives here
[15:36:15] <leoric> Now guess what?
[15:36:26] <jlevon> yuck
[15:38:07] <leoric> Now how should we fix it?....
[15:39:20] <jlevon> the runtime libs should all be under /usr/lib/<gccver>/whatever...
[15:40:20] <leoric> This means 'avoid using SFE compiler for SFE'
[15:40:53] <leoric> or put it to /somewhere/outside/of/system/paths
[15:41:04] <leoric> tomww will not be happy
[15:41:06] <jlevon> sounds fine to me hah
[15:42:05] <toasterson1> yikes. yeah figured other publishers were playing not nice
[15:42:27] <toasterson1> Mine is also working btw
[15:42:38] <toasterson1> *my installation of pkg is working
[15:46:09] <leoric> hm... The strong reason to have SFE publisher is LibreOffice...
[15:47:06] <toasterson1> should we finally package that?
[15:47:25] <leoric> Yes, but I don't know anyone who wants to bother with packaging it properly ;)
[15:48:01] * toasterson1 wishes for a build instruction conversion tool
[15:48:53] <leoric> no, when I've looked at it last time, it was full of hacks which needed proper rethinking
[15:50:45] <toasterson1> for example?
[15:52:56] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has quit IRC (Ping timeout: 265 seconds)
[16:04:37] <leoric> Like patching with sed...
[16:06:00] <toasterson1> oh
[16:06:01] <tomww> leoric: I'm happy if I get notice about a clash. How would I reproduce this?
[16:07:00] *** awordnot <awordnot!~awordnot@c-73-210-60-203.hsd1.il.comcast.net> has quit IRC (Ping timeout: 268 seconds)
[16:07:22] *** awordnot <awordnot!~awordnot@c-73-210-60-203.hsd1.il.comcast.net> has joined #illumos
[16:07:25] <toasterson1> tomww installing rapidjson-35 and looking at ldd
[16:07:27] <toasterson1> ldd /usr/lib/python3.5/vendor-packages/rapidjson.cpython-35m.so
[16:07:45] <tomww> leoric: one can uninstall the package gcc-runtime with only the symlinks in it. *for testing*. SFE binaries should still first look into /usr/gcc/<version>/lib for their runtime.
[16:07:54] <toasterson1> it should find libstdc++.so.6 => /usr/gnu/lib/amd64/libstdc++.so.6
[16:08:09] <tomww> and so often, LD_DEBUG is your friend to see what is really going on.
[16:08:30] <toasterson1> /usr/gcc/6/lib/amd64/libstdc++.so.6 should be loaded
[16:09:01] <leoric> update to latest OI, get something which requires SFE libstdc++.so.6
[16:09:08] <leoric> Get IPS broken ;)
[16:13:08] <tomww> well, I believe in show me LD_DEBUG=libs,files testprogram
[16:14:21] *** leoric <leoric!~alp@pyhalov.cc.rsu.ru> has quit IRC (Remote host closed the connection)
[16:14:43] <tomww> why does python look into /usr/gnu/lib at all? is this one directory too many in python's RPATH? elfdump should reveal this.
[16:15:07] <tomww> don't blame SFE if the research is not finished, please
[16:16:34] *** leoric <leoric!~alp@pyhalov.cc.rsu.ru> has joined #illumos
[16:17:06] <toasterson1> we deliver some ncurses libraries into /usr/gnu/lib
[16:18:20] *** jcea <jcea!~Thunderbi@2001:bc8:2ecd:caed:7670:6e00:7670:6e00> has joined #illumos
[16:20:07] <leoric> well, historically it looked for libncurses.so there
[16:20:29] <leoric> so, it was compiled with -R /usr/gnu/lib
[16:21:59] <leoric> Changing this now will make us recompile all python35 modules
[16:24:21] <leoric> can easily reproduce
[16:27:01] *** leoric <leoric!~alp@pyhalov.cc.rsu.ru> has quit IRC (Remote host closed the connection)
[16:33:51] *** leoric <leoric!~alp@pyhalov.cc.rsu.ru> has joined #illumos
[16:34:08] *** hemi770 <hemi770!~hemi666@unaffiliated/hemi770> has quit IRC (Ping timeout: 268 seconds)
[16:34:41] <leoric> quick fix, as expected is to rm /usr/gnu/lib/amd64/libstdc++.so.6
[16:35:23] <leoric> libreoffice works by chance after this, as it's 32-bit ;)
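The shadowing discussed above can be sketched as a first-match search over run path directories. This is a simplified model only: the order of the run path entries and the file set are assumptions made for illustration, and the real runtime linker consults more sources than a single list. The `resolve` helper is hypothetical.

```python
from typing import Iterable, Optional, Set


def resolve(lib: str, runpath: Iterable[str], existing: Set[str]) -> Optional[str]:
    """Return the first candidate path that exists, walking runpath in order."""
    for directory in runpath:
        candidate = f"{directory.rstrip('/')}/{lib}"
        if candidate in existing:
            return candidate
    return None


# Assumed run path order: the ncurses directory comes before the GCC 6 runtime.
RUNPATH = ["/usr/gnu/lib/amd64", "/usr/gcc/6/lib/amd64"]

# Assumed file set: both publishers deliver a libstdc++.so.6.
FILES = {
    "/usr/gnu/lib/amd64/libncurses.so",
    "/usr/gnu/lib/amd64/libstdc++.so.6",
    "/usr/gcc/6/lib/amd64/libstdc++.so.6",
}

# With both copies present, the one in /usr/gnu/lib/amd64 wins.
shadowed = resolve("libstdc++.so.6", RUNPATH, FILES)

# With that copy removed, the search falls through to the GCC 6 runtime.
fixed = resolve("libstdc++.so.6", RUNPATH, FILES - {"/usr/gnu/lib/amd64/libstdc++.so.6"})

print(shadowed)
print(fixed)
```

In this model the module never asks for the wrong library by name; it only inherits a directory that happens to contain one.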
[16:35:36] *** hemi770 <hemi770!~hemi666@unaffiliated/hemi770> has joined #illumos
[16:35:55] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has joined #illumos
[16:51:08] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has quit IRC (Quit: Leaving)
[16:51:17] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has joined #illumos
[16:54:48] *** spicywolf <spicywolf!~spicywolf@c-24-8-18-96.hsd1.co.comcast.net> has quit IRC (Read error: Connection reset by peer)
[16:56:05] *** ptribble <ptribble!~ptribble@cpc92716-cmbg20-2-0-cust138.5-4.cable.virginm.net> has joined #illumos
[17:20:47] *** man_u <man_u!~manu@manu2.gandi.net> has quit IRC (Quit: man_u)
[18:29:24] <tomww> leoric: haha, yes LO)
[18:31:28] <tomww> the point is different. the python libs should *not* think about loading GCC runtime libs by RPATH. The SFE way uses the old spec settings that come way before the runtime linker does its standard work. That way binaries compiled by SFE-gcc load the runtime libs from the private directory /usr/gcc-sfe/<version>/lib and this matches right.
[18:32:21] <tomww> I would like to see a binary load the runtime it was compiled against first, if not found go to a standard directory.
[18:33:02] *** ldepandis <ldepandis!~ldepandis@unaffiliated/ldepandis> has joined #illumos
[18:45:44] *** jimklimov <jimklimov!~jimklimov@78.80.224.132> has quit IRC (Quit: Leaving.)
[18:49:29] *** vneurosere <vneurosere!~toens@ip-178-202-216-248.hsi09.unitymediagroup.de> has joined #illumos
[18:49:36] *** neuroserve <neuroserve!~toens@ip-88-152-242-221.hsi03.unitymediagroup.de> has quit IRC (Ping timeout: 256 seconds)
[18:49:43] *** vneurosere is now known as neuroserve
[18:56:30] *** Kurlon_ <Kurlon_!~Kurlon@cpe-67-253-136-97.rochester.res.rr.com> has quit IRC (Remote host closed the connection)
[18:57:05] *** Kurlon <Kurlon!~Kurlon@bidd-pub-03.gwi.net> has joined #illumos
[19:14:00] *** ngchk1 <ngchk1!~ngchk1@b2b-92-50-91-166.unitymedia.biz> has quit IRC (Ping timeout: 265 seconds)
[19:19:39] <v_a_b> Are there any known problems with older Opteron processors in Illumos? I am fighting to get recent versions of OmniOS to work on a Sun Fire X4440 with four quad core Opteron 8380s. Older versions (026) ran fine. I never used the systems much, though.
[19:20:18] <v_a_b> I have BEs for OOCE r151030 and r151032, but both of them hang after boot. Can't log in on the serial console most of the time.
[19:21:18] <v_a_b> I have gone through all the ACPI options the loader menu gives me (except ACPI off, will try that next).
[19:22:31] <rmustacc> There were some cpuid changes that we did at some point for Zen, though I believe folks helped make sure that still worked on older Opterons.
[19:22:55] <rmustacc> Do you have a service processor there?
[19:23:52] <LeftWing> If somebody would like to test out an experimental "rustup"...
[19:23:54] <LeftWing> curl https://illumos.org/downloads/rustup/init.sh | bash
[19:24:01] <LeftWing> Be sure to tell it you want the "nightly" toolchain
[19:24:29] <toasterson1> LeftWing (IRC): oh interesting. that might make rust more useful
[19:24:51] <LeftWing> I'm inching towards illumos binaries being available through the official rustup, which will be cool
[19:25:05] <LeftWing> So that one might then be able to use all the different versions as they come out
[19:25:55] <v_a_b> Yes I have an ILOM/SP
[19:26:42] <rmustacc> So if you boot kmdb you may be able to inject an nmi that will be able to get us a better sense of where it is hung.
[19:26:48] *** ngchk1 <ngchk1!~ngchk1@b2b-92-50-91-166.unitymedia.biz> has joined #illumos
[19:27:44] <toasterson1> LeftWing (IRC): so no more gnarling bootstrap packaging stuff? We are still stuck on 1.32.0
[19:28:03] <LeftWing> OmniOS and pkgsrc are up to current I believe
[19:28:05] <LeftWing> for stable
[19:28:21] <toasterson1> LeftWing (IRC): yes it seems related to our build system
[19:28:29] <toasterson1> and it is deep linker stuff
[19:28:31] <jperkin> yeh I just published the 1.42 bootstrap, will push packages later
[19:28:38] <LeftWing> It's definitely easier to get from version to version more recently -- the bumps have been sanded down a lot by lots of folks like jbk etc
[19:29:13] *** igork <igork!~igork@91.204.56.74> has joined #illumos
[19:29:41] <v_a_b> rmustacc I did manage to break into kmdb when I had the "alternate break sequence" enabled. NMI should be easy. However I don't much know my way around kmdb.
[19:29:58] <toasterson1> LeftWing (IRC): the problem is the rust makefiles really don't like oi-userland, and always find new linker or compiler problems we really have trouble fixing
[19:31:25] <toasterson1> LeftWing (IRC): I might look into it again on the Weekend then maybe stuff is better now with bootstrapping from rust 1.42 to build packages.
[19:31:41] <rmustacc> v_a_b: In kmdb the most useful thing would be to do fun things like '::ptree', '::cpuinfo', and '::stacks'.
[19:31:50] <rmustacc> With those I can probably at least figure out a good starting point.
[19:32:00] <v_a_b> rmustacc Heh, these are the three I know :-)
[19:32:50] <rmustacc> OK. Well if we can get some data, I can try and help debug. Though it'll be a bit of a slow process for me.
[19:33:28] <v_a_b> NP I have sunk dozens of hours into those machines. I have learned patience. :-/
[19:34:13] <v_a_b> Hmmm... NMI panics the box. Do I need to enable both kmdb and "debug" (whatever that does) in the loader menu?
[19:34:55] <rmustacc> Honestly, if you can break in with the alternate break sequence, then that's fine as a starting point, tbh.
[19:35:06] <tsoome> it does drop you to kmdb prompt before kernel
[19:35:16] <v_a_b> I need to enable that on the system, which I can't log into :-(
[19:35:48] <rmustacc> Well, the nmi bit defaults to panicking, but there is a tunable to enter kmdb.
[19:36:13] <v_a_b> Yep, the loader menu gives me that.
[19:38:39] <v_a_b> Now rebooting with verbose on, single user on, reconfigure on, acpi default (= on I guess) and kmdb on...
[19:41:32] *** fanta1 <fanta1!~fanta1@p200300F76BC21600B56AEE0F8E68D3B8.dip0.t-ipconnect.de> has joined #illumos
[19:41:44] <v_a_b> Just sitting there after the "hostname" line, the "Configuring devices" never prints. NMI panics the box. I do get this:
[19:41:47] <v_a_b> module /kernel/misc/amd64/kmdbmod: text at [0xfffffffffb9842e0, 0xfffffffffba4339f] data at 0xfffffffffbffe400
[19:41:56] <v_a_b> So kmdb *should* be loaded, right?
[19:42:23] <Agnar> try to boot with flags: -kd
[19:42:29] <Agnar> on prompt hit :c
[19:42:38] <Agnar> when it freezes, try to break
[19:42:52] <Agnar> oh right, sorry
[19:42:56] <v_a_b> Yep, that's what loader "kmdb" and "debug" are supposed to give me.
[19:42:57] <Agnar> you did that already
[19:43:43] <v_a_b> I am currently on the r151030 BE. I guess using r151032 wouldn't make any difference.
[19:45:12] <v_a_b> OK, maybe I missed "debug" the last time. Anyway. I am in kmdb now.
[19:45:39] <v_a_b> Did a :c and we're off...
[19:46:17] <Agnar> :c is continue, do not type it when you are at the freezing point
[19:46:20] <rmustacc> If you're using IPMI, you should be able to break in by sending a virtual break sequence. Which with the default escape char of '~' is '~B'
[19:47:21] <v_a_b> panic[cpu2]/thread=fffffe008baffc20: NMI received :-(
[19:47:45] <v_a_b> As I understand it I need to do :c at the beginning, so that the kernel is loaded and booting begins.
[19:47:56] <Agnar> correct
[19:48:07] <v_a_b> Then when it freezes I send an NMI
[19:48:27] <v_a_b> ...which panics the box.
[19:49:02] <rmustacc> The reason is the value of apic_kmdb_on_nmi.
[19:49:21] <rmustacc> Simpler way than worrying about setting that is to run 'apic_nmi_intr::bp'
[19:50:10] <v_a_b> rmustacc Run that at the kmdb entry point?
[19:50:42] <rmustacc> If there, it probably needs to be actually '::bp pcplusmp`apic_nmi_intr'
[19:50:53] <v_a_b> [0]> apic_nmi_intr::bp
[19:50:53] <v_a_b> kmdb: failed to dereference symbol: unknown symbol name
[19:51:08] <rmustacc> Because this is before that module has loaded.
[19:51:21] <v_a_b> Better. Now :c and try again?
[19:51:39] <rmustacc> Yeah. And we'll see how good my memory is.
[19:52:48] *** ngchk1_ <ngchk1_!~ngchk1@b2b-92-50-91-166.unitymedia.biz> has joined #illumos
[19:53:55] <v_a_b> Looks good. In kmdb, ::cpus gives a line for every CPU core, all "idle"
[19:54:33] *** ngchk1 <ngchk1!~ngchk1@b2b-92-50-91-166.unitymedia.biz> has quit IRC (Ping timeout: 268 seconds)
[19:54:51] *** neirac_ <neirac_!~cneir@pc-184-104-160-190.cm.vtr.net> has joined #illumos
[19:56:11] <v_a_b> Output of ::ptree and the beginning of ::stacks is in http://www.bb-c.de/x4440-1.log
[19:58:28] *** neirac <neirac!~cneir@pc-184-104-160-190.cm.vtr.net> has quit IRC (Ping timeout: 255 seconds)
[20:00:52] <jbk> that cpio certainly looks suspicious (could be holding things up)
[20:01:14] <Agnar> it's probably updating the boot archive
[20:02:42] <v_a_b> Why would it do that, booting in single user mode? I can wait a bit longer, no problem... retrying now.
[20:03:11] <v_a_b> Probably the cpio is unpacking the boot archive :-)
[20:04:24] <Agnar> could be
[20:04:40] <Agnar> get the process flags
[20:04:47] <v_a_b> dedum dedum... the boot archive isn't THAT big....
[20:05:08] <v_a_b> That is where my kmdb knowledge abruptly ends :-(
[20:05:43] <v_a_b> OK I am back in kmdb, and ::ptree still shows that cpio
[20:07:49] <Agnar> v_a_b: get the pid of cpio
[20:08:24] <Agnar> then run: 0t<PIDofCPIO>::pid2proc |::print -a proc_t p_user.u_psargs
[20:08:36] <Agnar> replace <PIDofCPIO> with the pid of cpio
[20:09:31] <Agnar> also 0t<PIDofCPIO>::pid2proc |::pfiles could also be interesting
[20:10:36] <v_a_b> I guess I need to read the mdb manual... how do I get the process ID? mdb(1) says $? but what is the syntax?
[20:11:06] <Agnar> oh right
[20:11:08] <jbk> the ::ptree already shows the address of the proc_t
[20:11:10] <Agnar> easier
[20:11:14] <Agnar> just ::ps
[20:11:15] <jbk> fffffe645f1fd008
[20:11:19] <Agnar> and take the address
[20:11:25] <Agnar> instead of ::pid2proc
[20:12:26] <jbk> fffffe645f1fd008::walk thread | ::findstack might be interesting as well
[20:13:50] <v_a_b> Ah good, ::ps shows
[20:13:52] <v_a_b> R 125 104 9 9 0 0x4a004000 fffffe647b3c0040 cpio
[20:14:09] <v_a_b> (address is different because I rebooted in the meantime)
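A minimal sketch of pulling the proc_t address out of a `::ps` line like the one above and building the follow-up dcmds mentioned in this discussion. The column order (S, PID, PPID, PGID, SID, UID, FLAGS, ADDR, NAME) is assumed here, and `PsEntry` and `parse_ps_line` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class PsEntry(NamedTuple):
    state: str
    pid: int
    ppid: int
    pgid: int
    sid: int
    uid: int
    flags: int
    addr: str
    name: str


def parse_ps_line(line: str) -> PsEntry:
    """Split one ::ps output line into its nine assumed columns."""
    fields = line.split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    state, pid, ppid, pgid, sid, uid, flags, addr, name = fields
    return PsEntry(state, int(pid), int(ppid), int(pgid), int(sid),
                   int(uid), int(flags, 16), addr, name)


entry = parse_ps_line("R 125 104 9 9 0 0x4a004000 fffffe647b3c0040 cpio")

# The address can be used directly; the 0t prefix marks a decimal pid.
commands = [
    f"{entry.addr}::print -a proc_t p_user.u_psargs",
    f"{entry.addr}::walk thread | ::findstack -v",
    f"0t{entry.pid}::pid2proc | ::pfiles",
]
for command in commands:
    print(command)
```

Using the ADDR column skips the `::pid2proc` step entirely, which is why the address has to be looked up again after every reboot.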
[20:14:39] <Agnar> fffffe647b3c0040::print -a proc_t p_user.u_psargs
[20:14:53] <Agnar> fffffe647b3c0040::walk thread |::findstack -v
[20:15:01] <v_a_b> fffffe647b3c08d1 p_user.u_psargs = [ "cpio -qo -H odc" ]
[20:15:48] <Agnar> so it is updating the boot archive
[20:16:13] <v_a_b> findstack is in http://www.bb-c.de/x4440-2.log
[20:16:29] <Agnar> fffffe647b3c0040::pfiles
[20:17:19] <v_a_b> http://www.bb-c.de/x4440-3.log
[20:18:29] <jbk> so it's sitting there trying to read /platform/i86pc/ucode/GenuineIntel/00000F48-01
[20:19:16] <Agnar> but the X4440 is an AMD, right?
[20:19:36] <v_a_b> Yes indeed, Opteron 8380
[20:20:03] <Agnar> so, this explains it
[20:20:08] <v_a_b> If it updates the boot archive, it probably wants to read a bunch of files and stuff it in there?
[20:21:33] <jbk> i believe so
[20:21:40] <jbk> i don't think it's trying to actually use that file
[20:21:41] <v_a_b> Just checked another AMD OmniOS box I have up, and there's lots of microcode in that tree.
[20:21:42] <jbk> just copy it
[20:21:45] <v_a_b> yep
[20:21:47] <Agnar> oh it's a regular file
[20:22:00] <Agnar> I thought it would be a pipe or dev
[20:22:23] <v_a_b> no, just one of a gazillion intel ucode files.
[20:23:07] <Agnar> yeah
[20:23:54] <v_a_b> So I can reboot without breaking into kmdb and let it run for a bit longer, and then it should at least be reading a different file...
[20:24:04] <v_a_b> Unless there's more that I can investigate now.
[20:26:14] <Agnar> probably not
[20:29:24] <igork> you can try : moddebug 0x80000001
[20:29:54] <igork> in kmdb before continue :c
[20:30:04] <igork> and take a look how modules are loading
[20:30:50] <igork> it helps to see which module was the latest to be loaded (attached)
[20:32:10] *** gh34 <gh34!~textual@cpe-184-58-181-106.wi.res.rr.com> has quit IRC (Ping timeout: 268 seconds)
[20:32:21] <v_a_b> igork Thanks, I'll do that.
[20:32:54] <v_a_b> I'll have dinner and check the box when I return in an hour or so. That should be enough for any boot archive :-)
[20:33:05] <v_a_b> Only problem is that the beast eats 400W
[20:34:03] <v_a_b> This one has 72 GB memory and quad 4core CPUs. I have another box with the same problem. That one has 128 GB and 4x six-core CPUs. It eats even more...
[20:36:27] <igork> well - one additional step : try to boot from usb/iso -> mount working BE to dir: beadm mount <BEname> /mnt -> try to re-generate boot archive: bootadm update-archive -Q -fv -T<type:cpio|hsfs> - and try to check output of this process = checks for no errors with boot archive process
[20:36:44] <igork> i'm using hsfs type
[20:37:52] <igork> but it is dilos preference to be the same on Intel and SPARC - because CPIO not working on SPARC
[20:38:18] <v_a_b> [0]> moddebug 0x80000001
[20:38:18] <v_a_b> kmdb: syntax error near "0x80000001"
[20:39:21] <v_a_b> Hmmm... maybe the problems started in the version that switched to the cpio format. Have to investigate that.
[20:39:27] <jbk> if the cpio is hanging.. figuring out why that's happening would probably be a good start
[20:39:42] <v_a_b> yep
[20:39:57] <jbk> I can't recall if this is actually early enough, but since it's getting to userland
[20:40:11] <igork> v_a_b: moddebug/W 0x80000001
[20:40:14] <jbk> maybe trying to boot '-m milestone=none'
[20:40:44] <v_a_b> jbk Noted for later.
[20:40:53] <igork> v_a_b: https://illumos.org/docs/user-guide/debug-systems/
[20:40:53] <jbk> will allow you to poke around (though it might also be a very sparse userland at that point, which may limit what's available)
[20:41:18] <jbk> i.e. you may find a lot of the usual bits missing at that point since it's so early in the boot process
[20:41:58] *** clapont <clapont!~clapont@unaffiliated/clapont> has quit IRC (Ping timeout: 265 seconds)
[20:42:13] <v_a_b> igork tx for the link
[20:42:33] <jbk> are you booting off cd/usb, or internal drives?
[20:42:34] <igork> google "illumos moddebug" = first link :)
[20:42:51] <v_a_b> jbk I guess I need to drop to the loader prompt and use my own command line
[20:43:06] <v_a_b> jbk internal drive, 1068E HBA
[20:43:19] *** tsoome <tsoome!~tsoome@148-52-235-80.sta.estpak.ee> has quit IRC (Quit: This computer has gone to sleep)
[20:43:35] *** clapont <clapont!~clapont@unaffiliated/clapont> has joined #illumos
[20:44:18] <igork> v_a_b: huh - you have 1068E HBA ? it's closed mpt driver
[20:44:26] <igork> it can hang
[20:44:47] <v_a_b> Oh I need to :c after every module :-)
[20:44:51] <igork> try to use another one for debug problems with it
[20:45:03] <jbk> if it's mirrored, a simple test (could also be done from booting off a cd/usb) might be to dd from the block devices and see if any of them hang
[20:45:03] <igork> v_a_b: yes
[20:45:40] <v_a_b> igork yes it can hang. I have swapped the HBA out already. However, the machine worked fine. It's only after I updated to r151030 that the problems started.
[20:45:55] <v_a_b> Unfortunately most of the HBAs I have are mpt
[20:45:57] <igork> ok
[20:46:36] <v_a_b> 59 modules :-)
[20:46:39] <igork> you can try to use LSI 2008 = 9211-8i, or 3008 = 9311-8i
[20:46:49] <igork> they are not expensive now
[20:48:22] <v_a_b> Yes, I have thought about getting newer HBAs. But I don't really want to spend more money on these lab systems that I have up a few hours on the weekend.
[20:48:42] <igork> :)
[20:49:23] <v_a_b> 155 modules...
[20:49:39] <igork> and ? still fine ?
[20:50:27] <v_a_b> installing tavor, module id 162. :-)
[20:51:06] <igork> it means - kernel with modules still loading fine
[20:51:17] <igork> from boot archive
[20:51:44] <igork> but - it has not tried to access the hard drive yet :)
[20:52:06] <igork> and mpt has no traffic yet
[20:52:46] <v_a_b> No, not yet. Now it displays the hostname, and I will let it sit there while I have dinner. I'll be back in 45 minutes. Thanks guys for your help so far!
[20:53:13] <igork> v_a_b: i'm off for now, have a good luck :)
[21:12:32] *** fanta1 <fanta1!~fanta1@p200300F76BC21600B56AEE0F8E68D3B8.dip0.t-ipconnect.de> has quit IRC (Quit: fanta1)
[21:12:51] <Smithx10> Anyone using virtio / vmxnet 3 with illumos? I found this rather strange bug when doing a triton install https://github.com/joyent/sdc-imgapi/issues/32
[21:23:28] <LeftWing> Smithx10: I suspect it would be good to get a snoop trace of the stream that's crawling along
[21:24:03] *** tsoome <tsoome!~tsoome@8250-27cc-b372-68f6-2f80-4a40-07d0-2001.sta.estpak.ee> has joined #illumos
[21:24:25] <Smithx10> yeah, I'll blow away the working e1000g and get the dump
[21:24:38] <Smithx10> I imagine from both sides? imgapi and the gz?
[21:31:20] *** tsoome <tsoome!~tsoome@8250-27cc-b372-68f6-2f80-4a40-07d0-2001.sta.estpak.ee> has quit IRC (Quit: Leaving)
[21:31:29] *** tsoome <tsoome!~tsoome@8250-27cc-b372-68f6-2f80-4a40-07d0-2001.sta.estpak.ee> has joined #illumos
[21:41:06] <v_a_b> I'm back. Instead of hanging, the system now says: panic[cpu0]/thread=fffffe008b0ecc20: I/O to pool 'rpool' appears to be hung.
[21:41:06] <v_a_b> Full text is in http://www.bb-c.de/x4440-4.log
[21:41:59] <v_a_b> So igork may have guessed correctly. But why would the mpt driver work on Omnios 028 but not on 030 or 032? It's always the same closed binary.
[21:43:26] <v_a_b> The swap device is dedicated, and OmniOS happily writes the dump... via mpt.
[21:50:44] *** Kurlon_ <Kurlon_!~Kurlon@cpe-67-253-136-97.rochester.res.rr.com> has joined #illumos
[21:54:28] *** Kurlon <Kurlon!~Kurlon@bidd-pub-03.gwi.net> has quit IRC (Ping timeout: 265 seconds)
[22:08:38] *** neirac_ <neirac_!~cneir@pc-184-104-160-190.cm.vtr.net> has quit IRC (Read error: Connection reset by peer)
[22:09:08] *** idodeclare <idodeclare!~textual@2600:1700:1101:17c0:6466:8fce:4503:3036> has joined #illumos
[22:13:11] *** rzezeski <rzezeski!uid151901@gateway/web/irccloud.com/x-kcmejssfzxgyaoub> has quit IRC (Quit: Connection closed for inactivity)
[22:26:52] *** BOKALDO <BOKALDO!~BOKALDO@91.105.118.203> has quit IRC (Quit: Leaving)
[22:29:23] <v_a_b> OK I give up for tonight. If you have any ideas let me know and I'll try them first thing tomorrow morning. I just zapped one half of the mirrored rpool by netbooting via kayak...
[22:29:32] <toasterson1> v_a_b (IRC): hangs means one IO request timed out
[22:29:39] <toasterson1> not async but probably sync
[22:29:51] <toasterson1> disk is probably bad, had that a few times
[22:30:13] <v_a_b> I can swap in a new disk and reinstall... I broke half of the pool just now anyway.
[22:30:41] <toasterson1> no sleep for the wicked .)
[22:30:43] <v_a_b> Strange though that two X4440 will have that very same problem, with two pairs of mirrored disks...
[22:30:44] <toasterson1> :)
[22:30:45] <v_a_b> yep
[22:31:05] <v_a_b> Anyway, more tomorrow. Have a good evening everyone.
[22:31:19] <toasterson1> good night
[22:38:42] <Smithx10> LeftWing: here are links to the snoops https://gist.github.com/49e0f6c29cd8285d899d5d94cb638758 (imgapi) and https://gist.github.com/91c88161e4a64875bd797dd16914ddae (hn)
[22:40:46] <Smithx10> LeftWing: I guess it would be beneficial to see snoops from the working e1000g huh?
[22:50:20] *** ptribble <ptribble!~ptribble@cpc92716-cmbg20-2-0-cust138.5-4.cable.virginm.net> has quit IRC (Quit: Leaving)
[23:02:24] *** janci <janci!~janci@binaryparadise.com> has quit IRC (Read error: Connection reset by peer)
[23:02:43] *** janci <janci!~janci@binaryparadise.com> has joined #illumos
[23:05:06] *** pcd <pcd!~pcd@openzfs/developer/pcd> has quit IRC (Ping timeout: 268 seconds)
[23:06:08] *** pcd <pcd!~pcd@openzfs/developer/pcd> has joined #illumos
[23:50:24] *** idodeclare <idodeclare!~textual@2600:1700:1101:17c0:6466:8fce:4503:3036> has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)