Thread: hung backends stuck in spinlock heavy endless loop
On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. It ran ok several dozen times and froze up with no external trigger. There were at most 8 active backends that were stuck (the loader is threaded to a cap) -- each query typically resolves in a few seconds but they were hung for 30 minutes+. Had to do restart immediate as backends were not responding to cancel...but I snapped a 'perf top' before I did so. The results were interesting so I'm posting them here. So far I have not been able to reproduce...FYI

 61.03%  postgres            [.] s_lock
 13.56%  postgres            [.] LWLockRelease
 10.11%  postgres            [.] LWLockAcquire
  4.02%  perf                [.] 0x00000000000526d3
  1.65%  postgres            [.] _bt_compare
  1.60%  libc-2.17.so        [.] 0x0000000000081069
  0.66%  [kernel]            [k] kallsyms_expand_symbol.constprop.1
  0.60%  [kernel]            [k] format_decode
  0.57%  [kernel]            [k] number.isra.1
  0.47%  [kernel]            [k] memcpy
  0.44%  postgres            [.] ReleaseAndReadBuffer
  0.44%  postgres            [.] FunctionCall2Coll
  0.41%  [kernel]            [k] vsnprintf
  0.41%  [kernel]            [k] module_get_kallsym
  0.32%  postgres            [.] _bt_relandgetbuf
  0.31%  [kernel]            [k] string.isra.5
  0.31%  [kernel]            [k] strnlen
  0.31%  postgres            [.] _bt_moveright
  0.28%  libc-2.17.so        [.] getdelim
  0.22%  postgres            [.] LockBuffer
  0.16%  [kernel]            [k] seq_read
  0.16%  libc-2.17.so        [.] __libc_calloc
  0.13%  postgres            [.] _bt_checkpage
  0.09%  [kernel]            [k] pointer.isra.15
  0.09%  [kernel]            [k] update_iter
  0.08%  plugin_host         [.] PyObject_GetAttr
  0.06%  [kernel]            [k] strlcpy
  0.06%  [kernel]            [k] seq_vprintf
  0.06%  [kernel]            [k] copy_user_enhanced_fast_string
  0.06%  libc-2.17.so        [.] _IO_feof
  0.06%  postgres            [.] btoidcmp
  0.06%  [kernel]            [k] page_fault
  0.06%  libc-2.17.so        [.] free
  0.06%  libc-2.17.so        [.] memchr
  0.06%  libpthread-2.17.so  [.] __pthread_mutex_unlock_usercnt

merlin
Hi, On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. It > ran ok several dozen times and froze up with no external trigger. > There were at most 8 active backends that were stuck (the loader is > threaded to a cap) -- each query typically resolves in a few seconds > but they were hung for 30 minutes+. Interesting. > Had to do restart immediate as > backends were not responding to cancel...but I snapped a 'perf top' > before I did so. The results were interesting so I'm posting them > here. So far I have not been able to reproduce...FYI Can you compile postgres with -fno-omit-frame-pointer? Then, when this happens the next time, you can take a perf record -g, which will tell us which lock the contention is at. > 61.03% postgres [.] s_lock > 13.56% postgres [.] LWLockRelease > 10.11% postgres [.] LWLockAcquire That profile looks like it might end up being helped by the lwlock and/or freelist changes in 9.5. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 4:33 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: >> On my workstation today (running vanilla 9.4.0) I was testing some new >> code that does aggressive parallel loading to a couple of tables. It >> ran ok several dozen times and froze up with no external trigger. >> There were at most 8 active backends that were stuck (the loader is >> threaded to a cap) -- each query typically resolves in a few seconds >> but they were hung for 30 minutes+. > > Interesting. > >> Had to do restart immediate as >> backends were not responding to cancel...but I snapped a 'perf top' >> before I did so. The results were interesting so I'm posting them >> here. So far I have not been able to reproduce...FYI > > Can you compile postgres with -fno-omit-frame-pointer? Then, when this > happens the next time, you can take a perf record -g, which will tell us > which lock the contention is at. will do, and I'll loop it for a while and see if I can get it to re-occur. merlin
On Tue, Jan 13, 2015 at 2:29 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. Could you give more details, please? For example, I'd like to see representative data, or at least the table definitions involved. Do you have any idea what index is involved with the _bt_compare() calls above? I'm curious as to what its definition is. You might also consider using my B-Tree verification tool on a preserved data directory: http://www.postgresql.org/message-id/CAM3SWZRK2YxKegj94_2wKs_vbEMJeHc1yE23ECsS01wRAKzkjg@mail.gmail.com I'm inclined to think that this is a livelock, and so the problem isn't evident from the structure of the B-Tree, but it can't hurt to check. The tool is bleeding edge, so don't use it in production. Thanks -- Peter Geoghegan
On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > I'm inclined to think that this is a livelock, and so the problem > isn't evident from the structure of the B-Tree, but it can't hurt to > check. My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen profiles like this due to exactly that before - and it fits to parallel loading quite well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
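One way to get a rough read on whether a parallel load is churning shared buffers hard enough to hammer the freelist is to look at the buffer cache directly. A minimal sketch, assuming the contrib pg_buffercache extension is available; the join on relfilenode is approximate (it ignores mapped catalogs and other databases), so treat the output as a hint rather than a measurement:

create extension if not exists pg_buffercache;

-- how shared_buffers is currently distributed across relations, with dirty-buffer counts
select c.relname,
       count(*) as buffers,
       sum(case when b.isdirty then 1 else 0 end) as dirty
  from pg_buffercache b
  join pg_class c on c.relfilenode = b.relfilenode
 group by c.relname
 order by buffers desc
 limit 10;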
On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> I'm inclined to think that this is a livelock, and so the problem >> isn't evident from the structure of the B-Tree, but it can't hurt to >> check. > > My guess is rather that it's contention on the freelist lock via > StrategyGetBuffer's. I've seen profiles like this due to exactly that > before - and it fits to parallel loading quite well.

I think I've got it to pop again. s_lock is only showing 35% (increasing very slowly if at all) but performance is mostly halted. Frame pointer is compiled out. perf report attached.

merlin

 35.82%  postgres                       [.] s_lock
 23.71%  postgres                       [.] tas
 14.01%  postgres                       [.] tas
  6.82%  postgres                       [.] spin_delay
  5.93%  postgres                       [.] LWLockRelease
  4.36%  postgres                       [.] LWLockAcquireCommon
  2.34%  postgres                       [.] FunctionCall2Coll
  1.79%  postgres                       [.] _bt_compare
  0.80%  postgres                       [.] LockBuffer
  0.52%  postgres                       [.] btoidcmp
  0.49%  postgres                       [.] ReleaseAndReadBuffer
  0.47%  postgres                       [.] _bt_moveright
  0.36%  postgres                       [.] _bt_checkpage
  0.36%  postgres                       [.] _bt_relandgetbuf
  0.20%  perf                           [.] 0x000000000004312a
  0.19%  postgres                       [.] LWLockAcquire
  0.13%  libv8.so                       [.] 0x00000000001bbbe0
  0.11%  libc-2.17.so                   [.] 0x0000000000151134
  0.09%  libwebkit.so                   [.] 0x000000000106ccb7
  0.05%  libgdk_pixbuf-2.0.so.0.2800.1  [.] 0x00000000000139c7
  0.04%  Xorg                           [.] 0x00000000000efb00
  0.03%  libglib-2.0.so.0.3800.1        [.] 0x00000000000876a2
  0.03%  [kernel]                       [k] __ticket_spin_lock
  0.02%  perf-12966.map                 [.] 0x00000625db944621
  0.02%  libskia.so                     [.] 0x000000000021d15f
  0.02%  libcairo.so.2.11200.16         [.] 0x000000000002440b
  0.02%  libpthread-2.17.so             [.] __pthread_mutex_unlock_usercnt
  0.02%  [kernel]                       [k] copy_user_enhanced_fast_string
  0.02%  radeon_drv.so                  [.] 0x0000000000045002
  0.02%  libpthread-2.17.so             [.] pthread_mutex_lock
  0.02%  [kernel]                       [k] fget_light
  0.01%  [kernel]                       [k] __schedule
  0.01%  libexa.so                      [.] 0x0000000000006e1d
  0.01%  libgdk-x11-2.0.so.0.2400.20    [.] 0x000000000002f0b0
  0.01%  libvte.so.9.2800.2             [.] 0x0000000000039028
  0.01%  [radeon]                       [k] r100_mm_rreg
  0.01%  [kernel]                       [k] xhci_irq
  0.01%  [kernel]                       [k] native_write_msr_safe
  0.01%  [kernel]                       [k] update_cfs_rq_blocked_load
  0.01%  libglib-2.0.so.0.3800.1        [.] g_hash_table_lookup
  0.01%  libgobject-2.0.so.0.3800.1     [.] g_type_check_instance_is_a
Press '?' for help on key bindings
On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: > On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > >> I'm inclined to think that this is a livelock, and so the problem > >> isn't evident from the structure of the B-Tree, but it can't hurt to > >> check. > > > > My guess is rather that it's contention on the freelist lock via > > StrategyGetBuffer's. I've seen profiles like this due to exactly that > > before - and it fits to parallel loading quite well. > > I think I've got it to pop again. s_lock is only showing 35% > (increasing very slowly if at all) but performance is mostly halted. > Frame pointer is compiled out. perf report attached. > 35.82% postgres [.] s_lock > 23.71% postgres [.] tas > 14.01% postgres [.] tas > 6.82% postgres [.] spin_delay > 5.93% postgres [.] LWLockRelease > 4.36% postgres [.] LWLockAcquireCommon Interesting. This profile looks quite different? What kind of hardware is this on? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > My guess is rather that it's contention on the freelist lock via > StrategyGetBuffer's. I've seen profiles like this due to exactly that > before - and it fits to parallel loading quite well. I'm not saying you're wrong, but the breakdown of _bt_moveright() relative to _bt_relandgetbuf() calls seems a bit fishy to me. I don't remember seeing _bt_moveright() or _bt_compare() figuring so prominently, where _bt_binsrch() is nowhere to be seen. I can't see a reference to _bt_binsrch() in either profile. -- Peter Geoghegan
On Tue, Jan 13, 2015 at 5:49 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> My guess is rather that it's contention on the freelist lock via >> StrategyGetBuffer's. I've seen profiles like this due to exactly that >> before - and it fits to parallel loading quite well. > > I'm not saying you're wrong, but the breakdown of _bt_moveright() > relative to _bt_relandgetbuf() calls seems a bit fishy to me. > > I don't remember seeing _bt_moveright() or _bt_compare() figuring so > prominently, where _bt_binsrch() is nowhere to be seen. I can't see a > reference to _bt_binsrch() in either profile. hm, this is hand compiled now, I bet the symbols are wrong. merlin
On 2015-01-13 15:49:33 -0800, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > My guess is rather that it's contention on the freelist lock via > > StrategyGetBuffer's. I've seen profiles like this due to exactly that > > before - and it fits to parallel loading quite well. > > I'm not saying you're wrong, but the breakdown of _bt_moveright() > relative to _bt_relandgetbuf() calls seems a bit fishy to me. The hierarchical profile definitely doesn't look like my guess was right. > I don't remember seeing _bt_moveright() or _bt_compare() figuring so > prominently, where _bt_binsrch() is nowhere to be seen. I can't see a > reference to _bt_binsrch() in either profile. Well, we do a _bt_moveright pretty early on, so that actually might be cache misses we're primarily seeing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 5:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: >> On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> >> I'm inclined to think that this is a livelock, and so the problem >> >> isn't evident from the structure of the B-Tree, but it can't hurt to >> >> check. >> > >> > My guess is rather that it's contention on the freelist lock via >> > StrategyGetBuffer's. I've seen profiles like this due to exactly that >> > before - and it fits to parallel loading quite well. >> >> I think I've got it to pop again. s_lock is only showing 35% >> (increasing very slowly if at all) but performance is mostly halted. >> Frame pointer is compiled out. perf report attached. > >> 35.82% postgres [.] s_lock >> 23.71% postgres [.] tas >> 14.01% postgres [.] tas >> 6.82% postgres [.] spin_delay >> 5.93% postgres [.] LWLockRelease >> 4.36% postgres [.] LWLockAcquireCommon > > Interesting. This profile looks quite different?

yep, it's very stable, and the database is approximately frozen.

> What kind of hardware is this on?

linux on crap workstation box, except I have good ssd (intel 3500).

model name : Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz

top - 17:44:00 up 11 days, 6:25, 15 users, load average: 6.94, 6.97, 5.73
Tasks: 259 total, 8 running, 250 sleeping, 0 stopped, 1 zombie
%Cpu0 : 95.7 us, 0.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 93.5 us, 2.2 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 95.7 us, 0.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 93.6 us, 2.1 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 8131028 total, 7977608 used, 153420 free, 21424 buffers
KiB Swap: 8340476 total, 1884900 used, 6455576 free, 5201188 cached

  PID USER      PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
32492 mmoncure  20  0  175m  12m 9040 R 64.0  0.2  9:28.42 postgres
32367 mmoncure  20  0  175m  12m 9128 R 57.3  0.2  9:35.87 postgres
31792 mmoncure  20  0  175m  92m  88m R 52.9  1.2  9:57.50 postgres
32725 mmoncure  20  0  175m  13m 9852 R 52.9  0.2  9:08.50 postgres
31899 mmoncure  20  0  179m 104m  97m R 50.7  1.3  9:49.51 postgres
32368 mmoncure  20  0  175m  12m 9200 R 50.7  0.2  9:33.77 postgres
31282 mmoncure  20  0  184m 152m 140m R 46.3  1.9 10:26.29 postgres
  666 mmoncure  20  0 20640 1684 1116 R  2.2  0.0  0:00.01 top

Some more information about what's happening: This is a ghetto logical replication engine that migrates data from sql server to postgres, consolidating a sharded database into a single set of tables (of which there are only two). There is only one index on the destination table, and it's composite int,int in both cases. A single source table is replicated in a single transaction, and it's staged to a set of two temp tables, one for deletes, one for inserts. I log each step in the database. Here are the times:

cds2=# select cdsruntableid, finished - started, inserted from cdsruntable where cdsrunid = 62 order by started;
 cdsruntableid │    ?column?     │ inserted
───────────────┼─────────────────┼──────────
           833 │ 00:00:33.24044  │    11860
           834 │ 00:00:35.44981  │    19099
           835 │ 00:02:01.725186 │   530923
           836 │ 00:01:29.578811 │   211737
           838 │ 00:01:17.393461 │    64258
           837 │ 00:00:56.849106 │   227844
           839 │ 00:02:12.016504 │   630309
           840 │ 00:00:00.00111  │
           841 │ 00:01:09.058806 │   155309
           842 │ 00:01:54.723747 │   704422
           843 │ 00:01:14.965304 │
           844 │ 00:00:45.410953 │    59934
           845 │ 00:00:34.302632 │    14848
           846 │ 00:00:46.913832 │    89893
           848 │ 00:05:22.155189 │  2410622
           847 │ 00:01:25.199141 │   268157
           849 │ 00:01:16.168352 │   117511
           850 │ 00:00:34.809862 │     1175
           851 │ 00:01:12.565397 │    67214
           852 │ 00:01:03.734776 │    20129
           853 │ 00:00:41.038456 │    62765
           854 │ 00:01:03.478917 │    14967
           855 │ 00:00:33.88803  │     6901
           856 │ 00:00:36.381886 │     6670
           857 │ 00:00:36.626623 │     8268
           858 │ 00:01:14.391584 │   363448
           859 │ 00:01:44.619005 │   533395
           860 │ 00:01:02.042255 │   212202
           861 │ 00:00:00.001065 │
           863 │ 00:00:58.265506 │   215234
           862 │ 00:02:45.572635 │   827941
           865 │ 00:01:28.049165 │   241020
           864 │ 00:01:51.465643 │   531012
           866 │ 00:01:20.273391 │   419357
           868 │ 00:01:25.479443 │   294262
           869 │ 00:00:46.400825 │    46136
           870 │ 00:00:42.337286 │    25934
           871 │ 00:00:48.350152 │    91375
           872 │ 00:00:00.001743 │
           873 │ 00:01:04.596749 │   199744
           876 │ 00:01:15.210301 │   276276
           877 │ 00:02:36.447723 │  1693887
           878 │ 00:00:55.925104 │    36938
           879 │ 00:00:00.016874 │
           882 │ 00:00:50.479248 │   171901
           881 │                 │
           884 │                 │
           883 │                 │
           867 │                 │
           874 │                 │
           875 │                 │
           880 │                 │
(52 rows)

There is an 8-thread worker that walks that queue in id order and migrates one source table of data to the destination postgres server. It's been stuck for quite some time now, and I'm certain that if I restart the database, and rerun the incremental processing routine, it would work those last seven tables with no issues (at least that's what happened last time). It's been bogged with no progress now for a half hour again, so there's some kind of weird bug here.

merlin
On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > hm, this is hand compiled now, I bet the symbols are wrong. In case it isn't clear, I think that the proximate cause here may well be either one (or both) of commits efada2b8e920adfdf7418862e939925d2acd1b89 and/or 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think that the profile is roughly consistent with that, although I may well be wrong. -- Peter Geoghegan
On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >>> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >>> reference to _bt_binsrch() in either profile. >> >> hm, this is hand compiled now, I bet the symbols are wrong. > > In case it isn't clear, I think that the proximate cause here may well > be either one (or both) of commits > efada2b8e920adfdf7418862e939925d2acd1b89 and/or > 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think > that the profile is roughly consistent with that, although I may well > be wrong. I'm out of time for the day, but I'm fairly confident I can reproduce. I'll see if I can reverse those commits tomorrow and retest (I'm on development box). merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan <pg@heroku.com> wrote: >> In case it isn't clear, I think that the proximate cause here may well >> be either one (or both) of commits >> efada2b8e920adfdf7418862e939925d2acd1b89 and/or >> 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think >> that the profile is roughly consistent with that, although I may well >> be wrong. > I'm out of time for the day, but I'm fairly confident I can reproduce. > I'll see if I can reverse those commits tomorrow and retest (I'm on > development box). I'm not convinced that Peter is barking up the right tree. I'm noticing that the profiles seem rather skewed towards parser/planner work; so I suspect the contention is probably on access to system catalogs. No idea exactly why though. It would be good to collect a few stack traces from the hung backends, rather than relying on perf statistics. regards, tom lane
On 2015-01-13 19:05:10 -0500, Tom Lane wrote: > Merlin Moncure <mmoncure@gmail.com> writes: > > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan <pg@heroku.com> wrote: > >> In case it isn't clear, I think that the proximate cause here may well > >> be either one (or both) of commits > >> efada2b8e920adfdf7418862e939925d2acd1b89 and/or > >> 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think > >> that the profile is roughly consistent with that, although I may well > >> be wrong. > > > I'm out of time for the day, but I'm fairly confident I can reproduce. > > I'll see if I can reverse those commits tomorrow and retest (I'm on > > development box). > > I'm not convinced that Peter is barking up the right tree. I'm noticing > that the profiles seem rather skewed towards parser/planner work; so I > suspect the contention is probably on access to system catalogs. No > idea exactly why though. The plan contains plpgsql and exec_stmt_dynexecute(). So it might just be executing crazy amounts of dynamic statements. I'm still wondering if this isn't a different issue to the first one; the plans do look different. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not convinced that Peter is barking up the right tree. I'm noticing > that the profiles seem rather skewed towards parser/planner work; so I > suspect the contention is probably on access to system catalogs. No > idea exactly why though. I see no int4cmp() calls at all, but plenty of _bt_compare(), and some FunctionCall2Coll(). And yet, no _bt_binsrch(). I can see btoidcmp(), so if I'm right then I guess it's a system catalog index. I too would like to see a stack trace. Trivia: pg_attribute_relid_attnam_index has deleted B-Tree pages after a fresh initdb. -- Peter Geoghegan
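In a stock database, oid-keyed btrees are essentially all system catalog indexes, which is what makes the btoidcmp() hits suggestive. A small, purely illustrative query that lists the pg_catalog indexes with oid-typed key columns (pg_class_oid_index is among them):

select ic.relname as index_name, a.attname, format_type(a.atttypid, a.atttypmod) as key_type
  from pg_index i
  join pg_class ic on ic.oid = i.indexrelid
  join pg_namespace n on n.oid = ic.relnamespace
  join pg_attribute a on a.attrelid = i.indexrelid and a.attnum > 0
 where n.nspname = 'pg_catalog'
   and a.atttypid = 'oid'::regtype
 order by ic.relname, a.attnum;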
On Tue, Jan 13, 2015 at 3:54 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > Well, we do a _bt_moveright pretty early on, so that actually might be > cache misses we're primarily seing. Still, I see zero references to _bt_binsrch(). Not one. Even if there was only one non-meta page (i.e. pre-first-root-split), and regardless of whether this was a read or a write to the B-Tree, there'd still be some call there as the B-Tree was initially scanned (before _bt_next() calls, of which I also see none). -- Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Some more information what's happening: > This is a ghetto logical replication engine that migrates data from > sql sever to postgres, consolidating a sharded database into a single > set of tables (of which there are only two). There is only one index > on the destination table, and it's composite int,int in both cases. Does the logical replication engine perform dynamic DDL at all? Does it dynamically add columns to the table that everything is being consolidated to? -- Peter Geoghegan
On Tue, Jan 13, 2015 at 7:24 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> Some more information what's happening: >> This is a ghetto logical replication engine that migrates data from >> sql sever to postgres, consolidating a sharded database into a single >> set of tables (of which there are only two). There is only one index >> on the destination table, and it's composite int,int in both cases. > > Does the logical replication engine perform dynamic DDL at all? Does > it dynamically add columns to the table that everything is being > consolidated to?

I let the test burn in all night. It's still exactly where it was...this is not slow performance, it's stuck. It works like this:

Dynamically create staging table for inserts/change
Dynamically create staging table for deletes
Dynamically create master table if not present (in this last run, it was present)
Dynamically delete from master via delete table
Dynamically delete from master via insert/change table (based on int/int key)
Dynamically insert/select from inserts/change to permanent

There is also some log/tracking information going on in other tables.

Here's a backtrace:

#0  0x0000000000750a97 in spin_delay ()
#1  0x0000000000750b19 in s_lock ()
#2  0x0000000000750844 in LWLockRelease ()
#3  0x0000000000730000 in LockBuffer ()
#4  0x00000000004b2db4 in _bt_relandgetbuf ()
#5  0x00000000004b7116 in _bt_moveright ()
#6  0x00000000004b6e02 in _bt_search ()
#7  0x00000000004b8006 in _bt_first ()
#8  0x00000000004b5aad in btgettuple ()
#9  0x0000000000896656 in FunctionCall2Coll ()
#10 0x00000000004acd55 in index_getnext_tid ()
#11 0x00000000004acfad in index_getnext ()
#12 0x00000000004ac120 in systable_getnext ()
#13 0x0000000000878ce4 in ScanPgRelation ()
#14 0x0000000000879eb1 in RelationBuildDesc ()
#15 0x000000000087bb52 in RelationIdGetRelation ()
#16 0x0000000000495115 in relation_open ()
#17 0x00000000004952f9 in relation_openrv_extended ()
#18 0x000000000049555e in heap_openrv_extended ()
#19 0x000000000057ef0b in parserOpenTable ()
#20 0x000000000056776f in setTargetTable ()
#21 0x000000000053628f in transformDeleteStmt ()
#22 0x000000000053603c in transformStmt ()
#23 0x0000000000535fdb in transformTopLevelStmt ()
#24 0x0000000000535db8 in parse_analyze ()
#25 0x000000000075dd4f in pg_analyze_and_rewrite ()
#26 0x00000000006472a5 in _SPI_execute_plan ()
#27 0x0000000000644551 in SPI_execute ()
#28 0x00007f89b83a8ce7 in exec_stmt_dynexecute () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#29 0x00007f89b83a4c43 in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#30 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#31 0x00007f89b83a480a in exec_stmt_block () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#32 0x00007f89b83a2802 in plpgsql_exec_function () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#33 0x00007f89b839cd6a in plpgsql_call_handler () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#34 0x0000000000618805 in ExecMakeTableFunctionResult ()
#35 0x00000000006376ba in FunctionNext ()
#36 0x000000000061f868 in ExecScanFetch ()
#37 0x000000000061f8d7 in ExecScan ()
#38 0x0000000000637a1d in ExecFunctionScan ()
#39 0x0000000000614cfd in ExecProcNode ()
#40 0x0000000000612c32 in ExecutePlan ()
#41 0x00000000006110c3 in standard_ExecutorRun ()
#42 0x0000000000610fc5 in ExecutorRun ()
#43 0x0000000000647b02 in _SPI_pquery ()
#44 0x00000000006475e8 in _SPI_execute_plan ()
#45 0x000000000064471d in SPI_execute_plan_with_paramlist ()
#46 0x00007f89b83a85a5 in exec_stmt_execsql () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#47 0x00007f89b83a4c28 in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#48 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#49 0x00007f89b83a4429 in exec_stmt_block () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#50 0x00007f89b83a4a5d in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#51 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#52 0x00007f89b83a54a5 in exec_stmt_loop () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#53 0x00007f89b83a4aff in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#54 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#55 0x00007f89b83a5241 in exec_stmt_if () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#56 0x00007f89b83a4ac9 in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#57 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#58 0x00007f89b83a480a in exec_stmt_block () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#59 0x00007f89b83a2802 in plpgsql_exec_function () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#60 0x00007f89b839cd6a in plpgsql_call_handler () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#61 0x0000000000618403 in ExecMakeFunctionResultNoSets ()
#62 0x0000000000618e01 in ExecEvalFunc ()
#63 0x000000000061f0f7 in ExecTargetList ()
#64 0x000000000061f6dd in ExecProject ()
#65 0x0000000000638a03 in ExecResult ()
#66 0x0000000000614c16 in ExecProcNode ()
#67 0x0000000000612c32 in ExecutePlan ()
#68 0x00000000006110c3 in standard_ExecutorRun ()
#69 0x0000000000610fc5 in ExecutorRun ()
#70 0x0000000000764725 in PortalRunSelect ()
#71 0x00000000007643b4 in PortalRun ()
#72 0x000000000075e484 in exec_simple_query ()
#73 0x00000000007626e2 in PostgresMain ()
#74 0x00000000006f463f in BackendRun ()
#75 0x00000000006f3d44 in BackendStartup ()
#76 0x00000000006f075e in ServerLoop ()
#77 0x00000000006efdec in PostmasterMain ()
#78 0x00000000006594ec in main ()

merlin
On Wed, Jan 14, 2015 at 8:03 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Here's a backtrace: > > #0 0x0000000000750a97 in spin_delay () > #1 0x0000000000750b19 in s_lock () > #2 0x0000000000750844 in LWLockRelease () > #3 0x0000000000730000 in LockBuffer () > #4 0x00000000004b2db4 in _bt_relandgetbuf () > #5 0x00000000004b7116 in _bt_moveright () > #6 0x00000000004b6e02 in _bt_search () > #7 0x00000000004b8006 in _bt_first () > #8 0x00000000004b5aad in btgettuple () > #9 0x0000000000896656 in FunctionCall2Coll () > #10 0x00000000004acd55 in index_getnext_tid () > #11 0x00000000004acfad in index_getnext ()

There were seven processes with that exact backtrace (except that randomly they are sleeping in the spinloop). Something else interesting: autovacuum has been running all night as well. Unlike the other processes, however, cpu utilization does not register on top. I backtraced them as well. They are not identical. One of them looks like this:

#0  0x00007f89c1b21a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000006de073 in PGSemaphoreLock ()
#2  0x000000000074e398 in ProcWaitForSignal ()
#3  0x00000000007302cb in LockBufferForCleanup ()
#4  0x00000000004b699e in btvacuumpage ()
#5  0x00000000004b6627 in btvacuumscan ()
#6  0x00000000004b6389 in btbulkdelete ()
#7  0x00000000008968d7 in FunctionCall4Coll ()
#8  0x00000000004ad196 in index_bulk_delete ()
#9  0x000000000060b9ed in lazy_vacuum_index ()
#10 0x000000000060b019 in lazy_scan_heap ()
#11 0x0000000000609836 in lazy_vacuum_rel ()
#12 0x0000000000609383 in vacuum_rel ()
#13 0x0000000000607e4a in vacuum ()
#14 0x00000000006e2d40 in autovacuum_do_vac_analyze ()
#15 0x00000000006e2127 in do_autovacuum ()
#16 0x00000000006e0f19 in AutoVacWorkerMain ()
#17 0x00000000006e0b06 in StartAutoVacWorker ()
#18 0x00000000006f4fac in StartAutovacuumWorker ()
#19 0x00000000006f4816 in sigusr1_handler ()
#20 <signal handler called>
#21 0x00007f89c1b17dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x00000000006f0619 in ServerLoop ()
#23 0x00000000006efdec in PostmasterMain ()
#24 0x00000000006594ec in main ()

The other autovacuum process (0 cpu usage) looks like this:

#0  0x00007f89c1b17dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000008d3ac4 in pg_usleep ()
#2  0x0000000000750b69 in s_lock ()
#3  0x0000000000750844 in LWLockRelease ()
#4  0x0000000000730000 in LockBuffer ()
#5  0x00000000004b2db4 in _bt_relandgetbuf ()
#6  0x00000000004b7116 in _bt_moveright ()
#7  0x00000000004b6e02 in _bt_search ()
#8  0x00000000004b8006 in _bt_first ()
#9  0x00000000004b5aad in btgettuple ()
#10 0x0000000000896656 in FunctionCall2Coll ()
#11 0x00000000004acd55 in index_getnext_tid ()
#12 0x00000000004acfad in index_getnext ()
#13 0x00000000004ac120 in systable_getnext ()
#14 0x0000000000878ce4 in ScanPgRelation ()
#15 0x000000000087bcc4 in RelationReloadIndexInfo ()
#16 0x000000000087c1f1 in RelationClearRelation ()
#17 0x000000000087ca37 in RelationCacheInvalidate ()
#18 0x0000000000875fce in InvalidateSystemCaches ()
#19 0x0000000000741d81 in ReceiveSharedInvalidMessages ()
#20 0x0000000000876080 in AcceptInvalidationMessages ()
#21 0x0000000000745ce3 in LockRelationOid ()
#22 0x000000000049510b in relation_open ()
#23 0x0000000000495359 in heap_open ()
#24 0x0000000000608694 in vac_update_relstats ()
#25 0x000000000060bb98 in lazy_cleanup_index ()
#26 0x000000000060b0b6 in lazy_scan_heap ()
#27 0x0000000000609836 in lazy_vacuum_rel ()
#28 0x0000000000609383 in vacuum_rel ()
#29 0x0000000000607e4a in vacuum ()
#30 0x00000000006e2d40 in autovacuum_do_vac_analyze ()
#31 0x00000000006e2127 in do_autovacuum ()
#32 0x00000000006e0f19 in AutoVacWorkerMain ()
#33 0x00000000006e0b06 in StartAutoVacWorker ()
#34 0x00000000006f4fac in StartAutovacuumWorker ()
#35 0x00000000006f4816 in sigusr1_handler ()
#36 <signal handler called>
#37 0x00007f89c1b17dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#38 0x00000000006f0619 in ServerLoop ()
#39 0x00000000006efdec in PostmasterMain ()
#40 0x00000000006594ec in main ()

merlin
Merlin Moncure <mmoncure@gmail.com> writes: > There were seven process with that backtrace exact backtrace (except > that randomly they are sleeping in the spinloop). Something else > interesting: autovacuum has been running all night as well. Unlike > the other process however, cpu utilization does not register on top. > I backtraced them as well. They are not identical. One of them looks > like this: What are the autovac processes doing (according to pg_stat_activity)? regards, tom lane
On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> There were seven process with that backtrace exact backtrace (except >> that randomly they are sleeping in the spinloop). Something else >> interesting: autovacuum has been running all night as well. Unlike >> the other process however, cpu utilization does not register on top. >> I backtraced them as well. They are not identical. One of them looks >> like this: > > What are the autovac processes doing (according to pg_stat_activity)?

pid,running,waiting,query
27885,15:40:46.904039,f,autovacuum: VACUUM ANALYZE onesitepmc.propertyguestcard
7209,00:24:44.162684,f,"SELECT n.nspname as ""Schema"", c.relname as ""Name"", CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as ""Type"", pg_catalog.pg_get_userbyid(c.relowner) as ""Owner"" FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','v','m','S','f','') AND n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2;"
7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class
31282,15:28:29.070942,f,SELECT CDSReconcileRunTable(867)
31792,15:27:09.483365,f,SELECT CDSReconcileRunTable(874)
7135,00:26:08.977953,t,vacuum VERBOSE pg_class;
32725,15:24:46.008341,f,SELECT CDSReconcileRunTable(884)
32367,15:25:44.4206,f,SELECT CDSReconcileRunTable(881)
32492,15:25:31.27934,f,SELECT CDSReconcileRunTable(883)
31899,15:26:57.246415,f,SELECT CDSReconcileRunTable(875)
32368,15:25:44.418445,f,SELECT CDSReconcileRunTable(880)
7270,00:21:42.274104,f,"SELECT n.nspname as ""Schema"", c.relname as ""Name"", CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as ""Type"", pg_catalog.pg_get_userbyid(c.relowner) as ""Owner"" FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','v','m','S','f','') AND n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2;"
7403,00:00:00,f,"COPY ( select pid, now ( ) - query_start as running, waiting, query from pg_stat_activity ) TO STDOUT csv header ;"

notes:
*) SELECT CDSReconcileRunTable(xxx) are the replication processes. they are burning cpu.
*) SELECT n.nspname ... is via psql \d. they stick when invoked and do not respond to cancel. they are also burning cpu.
*) the autovacuum processes do not respond to cancel. however, one of them did respond earlier (tracing in from lazy_vacuum_index()). However, now it doesn't (upon being respawned via AV), with the same backtrace. The blocked manual vacuum verbose did cancel.

merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What are the autovac processes doing (according to pg_stat_activity)? > pid,running,waiting,query > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class Hah, I suspected as much. Is that the one that's stuck in LockBufferForCleanup, or the other one that's got a similar backtrace to all the user processes? regards, tom lane
On Wed, Jan 14, 2015 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> What are the autovac processes doing (according to pg_stat_activity)? > >> pid,running,waiting,query >> 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > Hah, I suspected as much. Is that the one that's stuck in > LockBufferForCleanup, or the other one that's got a similar > backtrace to all the user processes? Yes, it is; pg_class is coming from LockBufferForCleanup(). As you can see above, it has a shorter runtime. So it was killed off once about a half hour ago, which did not free up the logjam. However, AV spawned it again and now it does not respond to cancel. merlin
On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > Merlin Moncure <mmoncure@gmail.com> writes: > > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> What are the autovac processes doing (according to pg_stat_activity)? > > > pid,running,waiting,query > > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class It'd be interesting to know whether that vacuum gets very frequent semaphore wakeups. Could you strace it for a second or three? How did this perform < 9.4? Can you guess how many times these dynamic statements are planned? How many different relations are accessed in the dynamically planned queries? > Hah, I suspected as much. Is that the one that's stuck in > LockBufferForCleanup, or the other one that's got a similar backtrace > to all the user processes? Do you have a theory? Right now it primarily looks like contention on a single buffer due to the high number of dynamic statements, possibly made worse by the signalling between normal pinners and vacuum waiting for cleanup. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Merlin Moncure <mmoncure@gmail.com> writes: > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > can see above, it has a shorter runtime. So it was killed off once > about a half hour ago which did not free up the logjam. However, AV > spawned it again and now it does not respond to cancel. Interesting. That seems like there might be two separate issues at play. It's plausible that LockBufferForCleanup might be interfering with other attempts to scan the index, but then why wouldn't killing the AV have unstuck things? Anyway it's now seeming that Peter may have the right idea about where to look. regards, tom lane
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: > Merlin Moncure <mmoncure@gmail.com> writes: > > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > > can see above, it has a shorter runtime. So it was killed off once > > about a half hour ago which did not free up the logjam. However, AV > > spawned it again and now it does not respond to cancel. > > Interesting. That seems like there might be two separate issues at > play. It's plausible that LockBufferForCleanup might be interfering > with other attempts to scan the index, but then why wouldn't killing > the AV have unstuck things? LockBufferForCleanup() unfortunately isn't interruptible. I've every now and then seen vacuums being stuck for a long while, trying to acquire cleanup locks - IIRC I complained about that on list even. So autovac will only be cancelled when going to the next page. And even if the page ever gets a zero (well, one) refcount, by the time autovac is woken up via the semaphore, it'll often end up being used again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Hah, I suspected as much. Is that the one that's stuck in >> LockBufferForCleanup, or the other one that's got a similar backtrace >> to all the user processes? > Do you have a theory? Right now it primarily looks like contention on a > single buffer due to the high number of dynamic statements, possibly > made worse by the signalling between normal pinners and vacuum waiting > for cleanup. The readers shouldn't be contending with each other. It's certainly plausible enough that there's a steady stream of readers that would prevent LockBufferForCleanup from getting in, but that should not result in any readers getting blocked. So something's gotten broken in that area. regards, tom lane
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Merlin Moncure <mmoncure@gmail.com> writes: >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> What are the autovac processes doing (according to pg_stat_activity)? >> >> > pid,running,waiting,query >> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > It'd be interesting to know whether that vacuum gets very frequent > semaphore wakeups. Could you strace it for a second or three?

for 30 seconds+ it just looks like this:

mmoncure@mernix2 ~ $ sudo strace -p 7105
Process 7105 attached
semop(5701638, {{4, -1, 0}}, 1

all of the other processes are yielding out of the spinlock, for example:

select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout)

> How did this perform < 9.4?

this is a new project. However, I can run it vs an earlier version.

> Can you guess how many times these dynamic statements are planned? How many different relations are accessed in the dynamically planned queries?

only once or twice, and only a couple of tables. This is an operation that should only take a few seconds (inserting a few 10s of thousands of rows), but it has been blocked for many hours now. Usually it runs through taking a few seconds. This is either a deadlock or a deadlock-emulating sequence of operations.

I'll try to pull the commits that Peter suggested and see if that helps (I'm getting ready to bring the database down). I can send the code off-list if you guys think it'd help.

merlin
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > >> Merlin Moncure <mmoncure@gmail.com> writes: > >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> >> What are the autovac processes doing (according to pg_stat_activity)? > >> > >> > pid,running,waiting,query > >> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > > > It'd be interesting to know whether that vacuum gets very frequent > > semaphore wakeups. Could you strace it for a second or three? > > for 30 seconds+ it just looks like this: > mmoncure@mernix2 ~ $ sudo strace -p 7105 > Process 7105 attached > semop(5701638, {{4, -1, 0}}, 1 Ok. So that explains why it's not interruptible. > all of other processes are yielding out of the spinlock, for example: > select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout) Note the above isn't the spinlock, it's the process's semaphore. It'll only get set if the refcount ever indicates that nobody but autovac is holding the lock. > > How did this perform < 9.4? > this is a new project. However, I can run it vs earlier version. > > Can you guess how many times these dynamic > > statements are planned? How many different relations are accessed in the > > dynamically planned queries? > > only once or twice, and only a couple of tables. Hm. Odd. The first -g profile seemed to indicate a hell of a lot time was spent in LWLockRelease() - indicating that there's actually progress. Later profiles/backtraces were less clear. If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem. Alternatively set a breakpoint on ScanPgRelation() or so and see how often it's hit. > I can send the code off-list if you guys think it'd help. Might be interesting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: > If you gdb in, and type 'fin' a couple times, to wait till the function > finishes, is there actually any progress? I'm wondering whether it's > just many catalog accesses + contention, or some other > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > see how often it's hit.

well, i restarted the database, forgetting my looper was running, which immediately spun up and it got stuck again with a similar profile (lots of cpu in spinlock):

Samples: 3K of event 'cycles', Event count (approx.): 2695723228
 31.16%  postgres      [.] s_lock
 22.32%  postgres      [.] tas
 12.13%  postgres      [.] tas
  5.93%  postgres      [.] spin_delay
  5.69%  postgres      [.] LWLockRelease
  3.75%  postgres      [.] LWLockAcquireCommon
  3.61%  perf          [.] 0x00000000000526c4
  2.51%  postgres      [.] FunctionCall2Coll
  1.48%  libc-2.17.so  [.] 0x000000000016a132

> If you gdb in, and type 'fin' a couple times,

(gdb) fin
Run till exit from #0 0x00007ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
0x00000000006de073 in PGSemaphoreLock ()
(gdb) fin
Run till exit from #0 0x00000000006de073 in PGSemaphoreLock ()

It returned once. Second time, it didn't, at least so far (a minute or so).

>> I can send the code off-list if you guys think it'd help.
>
> Might be interesting.

sent (off-list).

merlin
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > If you gdb in, and type 'fin' a couple times, to wait till the function > > finishes, is there actually any progress? I'm wondering whether it's > > just many catalog accesses + contention, or some other > > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > > see how often it's hit. > > well, i restarted the database, forgetting my looper was running which > immediately spun up and it got stuck again with a similar profile > (lots of cpu in spinlock): > > Samples: 3K of event 'cycles', Event count (approx.): 2695723228 > 31.16% postgres [.] s_lock > 22.32% postgres [.] tas > 12.13% postgres [.] tas > 5.93% postgres [.] spin_delay > 5.69% postgres [.] LWLockRelease > 3.75% postgres [.] LWLockAcquireCommon > 3.61% perf [.] 0x00000000000526c4 > 2.51% postgres [.] FunctionCall2Coll > 1.48% libc-2.17.so [.] 0x000000000016a132 > > > If you gdb in, and type 'fin' a couple times, > (gdb) fin > Run till exit from #0 0x00007ff4c63f7a97 in semop () from > /lib/x86_64-linux-gnu/libc.so.6 > 0x00000000006de073 in PGSemaphoreLock () > (gdb) fin > Run till exit from #0 0x00000000006de073 in PGSemaphoreLock () > > It returned once. Second time, it didn't at least so far (minute or so). Hm, that's autovac though, not the normal user backends that actually do stuff, right? If you could additionally check those, it'd be great. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 14, 2015 at 7:22 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > I'll try to pull commits that Peter suggested and see if that helps > (I'm getting ready to bring the database down). I can send the code > off-list if you guys think it'd help. Thanks for the code! I think it would be interesting to see if the bug reproduces without the page split commit (40dae7ec537c5619fc93ad602c62f37be786d161) first (which was applied afterwards). Then, without the page deletion commit (efada2b8e920adfdf7418862e939925d2acd1b89). But without the page split commit seems most interesting at the moment. It would be easier to diagnose the issue if the bug still reproduces without that commit - that way, we can be reasonably confident that there are no confounding factors from the new page split code. There were some bugfixes to those two during the 9.4 beta cycle, too. It might be a bit tricky to generate a 9.4 that lacks one or both of those commits. Bugfix commits in reverse chronological order: c73669c0e0168923e3f9e787beec980f55af2bd8 (deletion) c91a9b5a285e20e54cf90f3660ce51ce3a5c2ef4 (incomplete split) 4fafc4ecd9e4d224d92c4a8549c5646860787a5d (deletion) 4a5d55ec2b711e13438a32d119a809a22ced410b (incomplete split) 77fe2b6d795f3f4ed282c9c980920e128a57624e (deletion) 7d98054f0dd115f57ad0ec1f424a66c13459013b (deletion) 954523cdfe229f1cb99a43a19e291a557ae2822d (incomplete split) I think that's all of them (apart from the original commits, of course). -- Peter Geoghegan
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: >> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> > If you gdb in, and type 'fin' a couple times, to wait till the function >> > finishes, is there actually any progress? I'm wondering whether it's >> > just many catalog accesses + contention, or some other >> > problem. Alternatively set a breakpoint on ScanPgRelation() or so and >> > see how often it's hit. >> >> well, i restarted the database, forgetting my looper was running which >> immediately spun up and it got stuck again with a similar profile >> (lots of cpu in spinlock): >> >> Samples: 3K of event 'cycles', Event count (approx.): 2695723228 >> 31.16% postgres [.] s_lock >> 22.32% postgres [.] tas >> 12.13% postgres [.] tas >> 5.93% postgres [.] spin_delay >> 5.69% postgres [.] LWLockRelease >> 3.75% postgres [.] LWLockAcquireCommon >> 3.61% perf [.] 0x00000000000526c4 >> 2.51% postgres [.] FunctionCall2Coll >> 1.48% libc-2.17.so [.] 0x000000000016a132 >> >> > If you gdb in, and type 'fin' a couple times, >> (gdb) fin >> Run till exit from #0 0x00007ff4c63f7a97 in semop () from >> /lib/x86_64-linux-gnu/libc.so.6 >> 0x00000000006de073 in PGSemaphoreLock () >> (gdb) fin >> Run till exit from #0 0x00000000006de073 in PGSemaphoreLock () >> >> It returned once. Second time, it didn't at least so far (minute or so). > > Hm, that's autovac though, not the normal user backends that actually do > stuff, right? If you could additionally check those, it'd be great.

got it to reproduce. had to take a break to get some work done.

(gdb) fin
Run till exit from #0 0x00007f7b07099dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
0x00000000008d3ac4 in pg_usleep ()
(gdb) fin
Run till exit from #0 0x00000000008d3ac4 in pg_usleep ()
0x0000000000750b69 in s_lock ()
(gdb) fin
Run till exit from #0 0x0000000000750b69 in s_lock ()
0x0000000000750844 in LWLockRelease ()
(gdb) fin
Run till exit from #0 0x0000000000750844 in LWLockRelease ()
0x0000000000730000 in LockBuffer ()
(gdb) fin
Run till exit from #0 0x0000000000730000 in LockBuffer ()
0x00000000004b2db4 in _bt_relandgetbuf ()
(gdb) fin
Run till exit from #0 0x00000000004b2db4 in _bt_relandgetbuf ()
0x00000000004b7116 in _bt_moveright ()
(gdb) fin
Run till exit from #0 0x00000000004b7116 in _bt_moveright ()
....

so it looks like nobody ever exits from _bt_moveright. any last requests before I start bisecting down?

merlin
On Wed, Jan 14, 2015 at 11:49 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > so it looks like nobody ever exits from _bt_moveright. any last > requests before I start bisecting down? Could you write some code to print out the block number (i.e. "BlockNumber blkno") if there are more than, say, 5 retries within _bt_moveright()? You'll also want to print out the name of the index (so rel->rd_rel->relname.data). That'll probably only trip when this happens (5 retries are very unusual). Then, use contrib/pageinspect to dump that block's contents (I think that'll work, since no one sits on a buffer lock here). I'd like to know what index we're dealing with here. I wonder if it's "pg_attribute_relid_attnam_index". So: call "bt_page_stats('pg_whatever', n)", n being the relevant block number. Try and do the same with the right link (which bt_page_stats() will show). Then try and call bt_page_items() against one or both of those blocks/pages. I'd like to see what that shows. I'm wondering if some invariant has been violated. Which _bt_moveright() caller are we talking about here, BTW? I guess it's just the _bt_first()/ _bt_search() one. That might vary, of course. -- Peter Geoghegan
On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan <pg@heroku.com> wrote: > Could you write some code to print out the block number (i.e. > "BlockNumber blkno") if there are more than, say, 5 retries within > _bt_moveright()? Obviously I mean that the block number should be printed, no matter whether or not the P_INCOMPLETE_SPLIT() path is taken or not. So you should just move this to the top of the for(;;) loop, so it's always available to print: BlockNumber blkno = BufferGetBlockNumber(buf); Thanks -- Peter Geoghegan
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan <pg@heroku.com> wrote: >> Could you write some code to print out the block number (i.e. >> "BlockNumber blkno") if there are more than, say, 5 retries within >> _bt_moveright()? > > Obviously I mean that the block number should be printed, no matter > whether or not the P_INCOMPLETE_SPLIT() path is taken or not. So you > should just move this to the top of the for(;;) loop, so it's always > available to print: > > BlockNumber blkno = BufferGetBlockNumber(buf);

(gdb) print BufferGetBlockNumber(buf)
$15 = 9

..and it stays 9, continuing several times after having set a breakpoint.

merlin
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > (gdb) print BufferGetBlockNumber(buf) > $15 = 9 > > ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm pretty sure that this is an internal page, no? -- Peter Geoghegan
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> (gdb) print BufferGetBlockNumber(buf) >> $15 = 9 >> >> ..and it stays 9, continuing several times having set breakpoint. > > And the index involved? I'm pretty sure that this is an internal page, no?

The index is the oid index on pg_class. Some more info:

*) temp table churn is fairly high. Several dozen get spawned and destroyed at the start of a replication run, all at once, due to some dodgy coding via dblink. During the replication run, the temp table churn rate drops.

*) running btreecheck, I see:

cds2=# select bt_index_verify('pg_class_oid_index');
NOTICE: page 7 of index "pg_class_oid_index" is deleted
NOTICE: page 10 of index "pg_class_oid_index" is deleted
NOTICE: page 12 of index "pg_class_oid_index" is deleted
 bt_index_verify
─────────────────

cds2=# select bt_leftright_verify('pg_class_oid_index');
WARNING: left link/right link pair don't comport at level 0, block 9, last: 2, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
[repeat infinity until cancel]

which looks like the index is corrupted? ISTM _bt_moveright is hanging because it's trying to move from block 9 to block 9 and so loops forever.

merlin
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroted at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the replication run, the temp table > churn rate drops. > > *) running btreecheck, I see: I knew I wrote that tool for a reason. :-) It would be great if I could get a raw dump of the index using contrib/pageinspect at this juncture too, as already described. -- Peter Geoghegan
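For reference, that kind of raw dump needs nothing beyond the stock contrib/pageinspect functions. A minimal sketch aimed at the block the stuck backends keep landing on (block 9 of pg_class_oid_index, per the gdb session above); the dumps downthread come from the same functions, run over every block of the index:

create extension if not exists pageinspect;

select * from bt_metap('pg_class_oid_index');
select * from bt_page_stats('pg_class_oid_index', 9);
select * from bt_page_items('pg_class_oid_index', 9);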
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan <pg@heroku.com> wrote: >> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> (gdb) print BufferGetBlockNumber(buf) >>> $15 = 9 >>> >>> ..and it stays 9, continuing several times having set breakpoint. >> >> And the index involved? I'm pretty sure that this is an internal page, no? > > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroyed at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the replication run, the temp table > churn rate drops. > > *) running btreecheck, I see: > cds2=# select bt_index_verify('pg_class_oid_index'); > NOTICE: page 7 of index "pg_class_oid_index" is deleted > NOTICE: page 10 of index "pg_class_oid_index" is deleted > NOTICE: page 12 of index "pg_class_oid_index" is deleted > bt_index_verify > ───────────────── > > cds2=# select bt_leftright_verify('pg_class_oid_index'); > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 2, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > [repeat infinity until cancel] > > which looks like the index is corrupted? ISTM _bt_moveright is > hanging because it's trying to move from block 9 to block 9 and so > loops forever.

per Peter the following might be useful:

cds2=# select * from bt_metap('pg_class_oid_index');
 magic  │ version │ root │ level │ fastroot │ fastlevel
────────┼─────────┼──────┼───────┼──────────┼───────────
 340322 │       2 │    3 │     1 │        3 │         1

cds2=# select (bt_page_stats('pg_class_oid_index', s)).* from generate_series(1,12) s;
 blkno │ type │ live_items │ dead_items │ avg_item_size │ page_size │ free_size │ btpo_prev │ btpo_next │ btpo  │ btpo_flags
───────┼──────┼────────────┼────────────┼───────────────┼───────────┼───────────┼───────────┼───────────┼───────┼────────────
     1 │ l    │        119 │          0 │            16 │      8192 │      5768 │         0 │         4 │     0 │          1
     2 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
     3 │ r    │          8 │          0 │            15 │      8192 │      7996 │         0 │         0 │     1 │          2
     4 │ l    │        178 │          0 │            16 │      8192 │      4588 │         1 │         2 │     0 │          1
     5 │ l    │          7 │          0 │            16 │      8192 │      8008 │         9 │        11 │     0 │          1
     6 │ l    │          5 │          0 │            16 │      8192 │      8048 │        11 │         8 │     0 │          1
     7 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
     8 │ l    │        187 │          0 │            16 │      8192 │      4408 │         6 │         0 │     0 │          1
     9 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
    10 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
    11 │ l    │          6 │          0 │            16 │      8192 │      8028 │         5 │         6 │     0 │          1
    12 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 10731 │          0

merlin
Attachment
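One thing that jumps out of the bt_page_stats output above: block 9's right link (btpo_next) points back at block 9 itself, which is exactly what would make _bt_moveright spin. A rough pageinspect sketch to flag that kind of self-referencing sibling link (it assumes the 1-12 block range shown above; adjust generate_series for a larger index):

select s as blkno, p.btpo_prev, p.btpo_next
from generate_series(1, 12) s,
     lateral bt_page_stats('pg_class_oid_index', s) p
where p.type <> 'd'                              -- skip deleted pages
  and (p.btpo_next = s or p.btpo_prev = s);      -- a page should never be its own sibling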
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan <pg@heroku.com> wrote:
> This is great, but it's not exactly clear which bt_page_items() page
> is which - some are skipped, but I can't be sure which. Would you mind
> rewriting that query to indicate which block is under consideration by
> bt_page_items()?

yeah. via:

cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv header;

merlin
Attachment
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > yeah. via: > cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', > s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv > header; My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct according to that metadata, as far as it goes). I think block 9 is supposed to have a right-link to block 5 (block 5 seems to think so, at least -- its left link is 9). So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more. -- Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan <pg@heroku.com> wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that metadata, as far as it goes). I think > block 9 is supposed to have a right-link to block 5 (block 5 seems to > think so, at least -- its left link is 9). For the convenience of others, here is a version with a normalized data column added (interpreting the datums as little-endian Oids). -- Peter Geoghegan
Attachment
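For anyone repeating that normalization step, a minimal SQL sketch (it assumes the bt_page_items() "data" column looks like 'a4 11 01 00 00 00 00 00', i.e. a 4-byte Oid key in little-endian order followed by alignment padding, and that the Oid values fit in a signed int4):

select itemoffset,
       ('x' || split_part(data, ' ', 4)
            || split_part(data, ' ', 3)
            || split_part(data, ' ', 2)
            || split_part(data, ' ', 1))::bit(32)::int as oid_value
from bt_page_items('pg_class_oid_index', 9);
-- Reverses the little-endian byte order of the first four data bytes and
-- reinterprets them as an integer; e.g. '24 11 01 00 ...' decodes to 69924.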
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan <pg@heroku.com> wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that metadata, as far as it goes). I think > block 9 is supposed to have a right-link to block 5 (block 5 seems to > think so, at least -- its left link is 9). I am mistaken on one detail here - blocks 2 and 9 are actually fully identical. I still have no idea why, though. -- Peter Geoghegan
On Wed, Jan 14, 2015 at 8:50 PM, Peter Geoghegan <pg@heroku.com> wrote: > I am mistaken on one detail here - blocks 2 and 9 are actually fully > identical. I still have no idea why, though. So, I've looked at it in more detail and it appears that the page of block 2 split at some point, thereby creating a new page (the block 9 page). There is a sane downlink in the root page for the new rightlink page. The root page looks totally sane, as does every other page - as I said, the problem is only that block 9 is spuriously identical to block 2. So the (correct) downlink in the root page, to block 9, is the same as the (incorrect) high key value in block 9 - Oid value 69924. To be clear: AFAICT everything is perfect except block 9, which is bizarrely identical to block 2. Now, since the sane page downlink located in the root (like every downlink, a lower bound on items in its child) is actually a copy of the high key on the page that is the child's left link (that is to say, it comes from the original target of a page split - it shares the target's high key value, Oid value 69924), there may have never been sane data in block 9, even though its downlink is sane (so maybe the page split patch is implicated). But it's hard to see how that could be true. The relevant code wasn't really what was changed about page splits in 9.4 anyway (plus this wasn't a non-leaf split, since there aren't enough pages for those to be a factor). There just isn't that many items on page 2 (or its bizarre identical twin, page 9), so a recent split seems unlikely. And, the target and new right page are locked together throughout both the split and down link insertion (even though there are two atomic operations/WAL inserts). So to reiterate, a close by page split that explains the problem seems unlikely. I'm going to focus on the page deletion patch for the time being. Merlin - it would be great if you could revert all the page split commits (which came after the page deletion fix). All the follow-up page split commits [1] were fairly straightforward bugs with recovery, so it should be easy enough to totally remove the page split stuff from 9.4 for the purposes of isolating the bug. [1] http://www.postgresql.org/message-id/CAM3SWZSpJ6M9HfHksjUiUof30aUWXyyb56FJbW1_doGQKbEO+g@mail.gmail.com -- Peter Geoghegan
On 01/15/2015 03:23 AM, Peter Geoghegan wrote: > So now the question is: how did that inconsistency arise? It didn't > necessarily arise at the time of the (presumed) split of block 2 to > create 9. It could be that the opaque area was changed by something > else, some time later. I'll investigate more. Merlin, could you re-run the test with a WAL archive (if you don't have one already), and then run pg_xlogdump, filtering it to show only the changes to the index? That should show us how the index got to be the way it is. Also, if you could post a copy of the raw relation file for pg_class_oid_index; I assume it's not too large. Something like: pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd cluster. In case it's not the same on your system, you can use oid2name to find it out. - Heikki
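As an aside, the relfilenode (and the path of the raw relation file, relative to the data directory) can also be confirmed from SQL instead of oid2name; a quick sketch:

select pg_relation_filenode('pg_class_oid_index') as relfilenode,
       pg_relation_filepath('pg_class_oid_index') as file_path;
-- relfilenode is the number to grep for in the pg_xlogdump output; file_path
-- points at the raw relation file to copy and post.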
On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >> >> So now the question is: how did that inconsistency arise? It didn't >> necessarily arise at the time of the (presumed) split of block 2 to >> create 9. It could be that the opaque area was changed by something >> else, some time later. I'll investigate more. > > > Merlin, could you re-run the test with a WAL archive (if you don't have one > already), and then run pg_xlogdump, filtering it to show only the changes to > the index? That should show us how the index got to be the way it is. Also, > if you could post a copy of the raw relation file for pg_class_oid_index; I > assume it's not too large. > > Something like: > > pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 > > 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd > cluster. In case it's not the same on your system, you can use oid2name to > find it out. I'm on it. Will try this first, then patch removal. Question: Coming in this morning I did an immediate restart and logged into the database and queried pg_class via index. Everything was fine, and the leftright verify returns nothing. How did it repair itself without a reindex? merlin
On Thu, Jan 15, 2015 at 6:02 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? Maybe the relevant WAL records are more or less correct, and recovery restores the database to its correct state, as opposed to the state it was actually in at the time of the crash due to a race condition or whatever. Could you possibly send bt_page_items() against block 9 of the index after recovery? It probably looks totally sane, but it would be nice to be sure, if you happen to still have the data. -- Peter Geoghegan
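For reference, that check is just the block-9 call of the earlier pageinspect query:

select * from bt_page_items('pg_class_oid_index', 9);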
On Thu, Jan 15, 2015 at 8:02 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >>> >>> So now the question is: how did that inconsistency arise? It didn't >>> necessarily arise at the time of the (presumed) split of block 2 to >>> create 9. It could be that the opaque area was changed by something >>> else, some time later. I'll investigate more. >> >> >> Merlin, could you re-run the test with a WAL archive (if you don't have one >> already), and then run pg_xlogdump, filtering it to show only the changes to >> the index? That should show us how the index got to be the way it is. Also, >> if you could post a copy of the raw relation file for pg_class_oid_index; I >> assume it's not too large. >> >> Something like: >> >> pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 >> >> 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd >> cluster. In case it's not the same on your system, you can use oid2name to >> find it out. > > I'm on it. Will try this first, then patch removal. > > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? The plot thickens! I looped the test, still stock 9.4 as of this time and went to lunch. When I came back, the database was in recovery mode. Here is the rough sequence of events. 1) REINDEXED pg_class (wanted clean slate for full reproduction) 2) before too long (unfortunately did not configure timestamps in the log) starting seeing: ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 at character 8 QUERY: UPDATE CDSRunTable SET Finished = clock_timestamp(), DidSucceed = _DidSucceed, ErrorMessage= _ErrorMessage WHERE CDSRunTableId = _CdsRunTableId CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement STATEMENT: SELECT CDSReconcileRunTable(2020) ..and again with a FATAL FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1 ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 CONTEXT: SQL statement "UPDATE CDSRunTable SET Finished = clock_timestamp(), DidSucceed = _DidSucceed, ErrorMessage = _ErrorMessage WHERE CDSRunTableId = _CdsRunTableId" PL/pgSQL function cdsreconcileruntable(bigint)line 82 at SQL statement 3) shortly (?) after that, I got: WARNING: did not find subXID 14955 in MyProc CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup WARNING: you don't own a lock of type RowExclusiveLock CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup LOG: could not send data to client: Broken pipe CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup STATEMENT: SELECT CDSReconcileRunTable(2151) WARNING: ReleaseLockIfHeld: failed?? CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup ERROR: failed to re-find shared proclock object CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup STATEMENT: SELECT CDSReconcileRunTable(2151) WARNING: AbortSubTransaction while in ABORT state WARNING: did not find subXID 14955 in MyProc WARNING: you don't own a lock of type AccessShareLock WARNING: ReleaseLockIfHeld: failed?? 
ERROR: failed to re-find shared proclock object WARNING: AbortSubTransaction while in ABORT state WARNING: did not find subXID 14955 in MyProc WARNING: you don't own a lock of type AccessShareLock WARNING: ReleaseLockIfHeld: failed?? WARNING: you don't own a lock of type ShareLock TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 1240) LOG: server process (PID 10117) was terminated by signal 6: Aborted LOG: terminating any other active server processes 4) while I was still at lunch, the database started up again, and quickly fell over: LOG: database system is ready to accept connections LOG: autovacuum launcher started ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2268) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2270) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2272) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2273) TRAP: FailedAssertion("!(( ((void) ((bool) ((! assert_enabled) || ! (!(((bool) (((const void*)(&htup->t_ctid) != ((void *)0)) && ((&htup->t_ctid)->ip_posid != 0) LOG: server process (PID 30520) was terminated by signal 6: Aborted LOG: terminating any other active server processes this repeated a very large number of times. Occasionally, but infrequently, I'd see during the database startup something like: LOG: unexpected pageaddr 50/1F2D8000 in log segment 0000000100000051000000E6, offset 2981888 during startup. the address would change sometimes: LOG: unexpected pageaddr 50/1F2DA000 in log segment 0000000100000051000000E6, offset 2990080 Basically, what is happening here is that any attempt to connect to cds database fails with assertion (I can connect to postgres ok). On Thu, Jan 15, 2015 at 12:47 PM, Peter Geoghegan <pg@heroku.com> wrote: > Maybe the relevant WAL records are more or less correct, and recovery > restores the database to its correct state, as opposed to the state it > was actually in at the time of the crash due to a race condition or > whatever. Could you possibly send bt_page_items() against block 9 of > the index after recovery? It probably looks totally sane, but it would > be nice to be sure, if you happen to still have the data. Too late: I reindexed it. going back and reviewing the log however, I did notice: ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 at character 8 in the log before the reindex was thrown. merlin
Hi, > The plot thickens! I looped the test, still stock 9.4 as of this time > and went to lunch. When I came back, the database was in recovery > mode. Here is the rough sequence of events. > Whoa. That looks scary. Did you see (some of) those errors before? Most of them should have been emitted independently of being built with assertions. > 1) REINDEXED pg_class (wanted clean slate for full reproduction) > 2) before too long (unfortunately did not configure timestamps in the > log) starting seeing: > > ERROR: root page 3 of index "pg_class_oid_index" has level 0, > expected 1 at character 8 > QUERY: UPDATE CDSRunTable SET > Finished = clock_timestamp(), > DidSucceed = _DidSucceed, > ErrorMessage = _ErrorMessage > WHERE CDSRunTableId = _CdsRunTableId > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at > SQL statement > STATEMENT: SELECT CDSReconcileRunTable(2020) > > ..and again with a FATAL > FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1 > ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 > CONTEXT: SQL statement "UPDATE CDSRunTable SET > Finished = clock_timestamp(), > DidSucceed = _DidSucceed, > ErrorMessage = _ErrorMessage > WHERE CDSRunTableId = _CdsRunTableId" > PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement Just to be absolutely clear, those happened *before* the following errors? And there were no 'during exception cleanup' like errors before? > 3) shortly (?) after that, I got: > > WARNING: did not find subXID 14955 in MyProc > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > WARNING: you don't own a lock of type RowExclusiveLock > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > LOG: could not send data to client: Broken pipe > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > STATEMENT: SELECT CDSReconcileRunTable(2151) > WARNING: ReleaseLockIfHeld: failed?? > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > ERROR: failed to re-find shared proclock object > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > STATEMENT: SELECT CDSReconcileRunTable(2151) > WARNING: AbortSubTransaction while in ABORT state > WARNING: did not find subXID 14955 in MyProc > WARNING: you don't own a lock of type AccessShareLock > WARNING: ReleaseLockIfHeld: failed?? > ERROR: failed to re-find shared proclock object > WARNING: AbortSubTransaction while in ABORT state > WARNING: did not find subXID 14955 in MyProc > WARNING: you don't own a lock of type AccessShareLock > WARNING: ReleaseLockIfHeld: failed?? > WARNING: you don't own a lock of type ShareLock > TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > > 0)", File: "lock.c", Line: 1240) > LOG: server process (PID 10117) was terminated by signal 6: Aborted > LOG: terminating any other active server processes Ick. Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log entries before? It's hard to know from here, but the 'during exception cleanup' indicates a problem in abort handling. Were there any deadlock detected errors closeby? You're catching deadlock errors in a subtransaction. Hm. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2015-01-15 20:15:42 +0100, Andres Freund wrote: > > WARNING: did not find subXID 14955 in MyProc > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > WARNING: you don't own a lock of type RowExclusiveLock > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > LOG: could not send data to client: Broken pipe > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > STATEMENT: SELECT CDSReconcileRunTable(2151) > > WARNING: ReleaseLockIfHeld: failed?? > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > ERROR: failed to re-find shared proclock object > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > STATEMENT: SELECT CDSReconcileRunTable(2151) > > WARNING: AbortSubTransaction while in ABORT state > > WARNING: did not find subXID 14955 in MyProc > > WARNING: you don't own a lock of type AccessShareLock > > WARNING: ReleaseLockIfHeld: failed?? > > ERROR: failed to re-find shared proclock object > > WARNING: AbortSubTransaction while in ABORT state > > WARNING: did not find subXID 14955 in MyProc > > WARNING: you don't own a lock of type AccessShareLock > > WARNING: ReleaseLockIfHeld: failed?? > > WARNING: you don't own a lock of type ShareLock > > TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > > > 0)", File: "lock.c", Line: 1240) > > LOG: server process (PID 10117) was terminated by signal 6: Aborted > > LOG: terminating any other active server processes > > Ick. > > Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log > entries before? It's hard to know from here, but the 'during exception > cleanup' indicates a problem in abort handling. Were there any deadlock > detected errors closeby? Alternatively were there any 'LOG: CdsRunTableId %s Failed' messages? If so, what was the cause? > You're catching deadlock errors in a subtransaction. Hm. A couple questions: * Do you also use lock_timeout/statement_timeout? If so, what are their settings + deadlock_timeout? * were any processes killed at that time? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 15, 2015 at 1:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
>> The plot thickens! I looped the test, still stock 9.4 as of this time
>> and went to lunch. When I came back, the database was in recovery
>> mode. Here is the rough sequence of events.
>
> Whoa. That looks scary. Did you see (some of) those errors before? Most
> of them should have been emitted independently of being built with
> assertions.

Nope.

>> 1) REINDEXED pg_class (wanted clean slate for full reproduction)
>> 2) before too long (unfortunately did not configure timestamps in the
>> log) started seeing:
>>
>> ERROR: root page 3 of index "pg_class_oid_index" has level 0,
>> expected 1 at character 8
>> QUERY: UPDATE CDSRunTable SET
>> Finished = clock_timestamp(),
>> DidSucceed = _DidSucceed,
>> ErrorMessage = _ErrorMessage
>> WHERE CDSRunTableId = _CdsRunTableId
>> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
>> SQL statement
>> STATEMENT: SELECT CDSReconcileRunTable(2020)
>>
>> ..and again with a FATAL
>> FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1
>> ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1
>> CONTEXT: SQL statement "UPDATE CDSRunTable SET
>> Finished = clock_timestamp(),
>> DidSucceed = _DidSucceed,
>> ErrorMessage = _ErrorMessage
>> WHERE CDSRunTableId = _CdsRunTableId"
>> PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement
>
> Just to be absolutely clear, those happened *before* the following
> errors? And there were no 'during exception cleanup' like errors before?

correct. unfortunately, I wasn't logging times so I'm not 100% sure if
the 'root page' error happened during the event or sometime before it.
I'm addressing that with the log prefix.

> Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
> entries before? It's hard to know from here, but the 'during exception
> cleanup' indicates a problem in abort handling. Were there any deadlock
> detected errors close by?
>
> You're catching deadlock errors in a subtransaction. Hm.

yup, exactly 1:
WARNING: LOG: Handling deadlock detected on CdsRunTableId 1626

Based on the id, that is proven to be before the crashing started,
although at least one manual server restart happened in the interim.

Since it's possible the database is a loss, do you see any value in
bootstrapping it again with checksums turned on? One point of note is
that this is a brand spanking new SSD, maybe we need to rule out
hardware-based corruption?

> Alternatively were there any 'LOG: CdsRunTableId %s Failed' messages?
> If so, what was the cause?

There was one, but unfortunately since the error was trapped the
relevant detail to catch the other side of the deadlock is missed.
Barring this analysis, I'm inclined to remove that guard: it was mostly
there to deal with what turned out to be some bugs in the code
(specifically a nonconstrained update on CDSTable).

> * Do you also use lock_timeout/statement_timeout?

not on this database -- this is a coding environment and not set up
with standard configuration

> were any processes killed at that time?

later on, yes, by manual restart, and the query was cancelled first: to wit:

WARNING: LOG: Handling deadlock detected on CdsRunTableId 1626
ERROR: canceling statement due to user request
STATEMENT: SELECT CDSReconcileRunTable(1626)

followed by restart -m fast

merlin
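The exact prefix Merlin set isn't shown in the thread; a hypothetical log_line_prefix along these lines would produce the bracketed "[dbname pid timestamp line]" entries that appear in the later log excerpts (log_line_prefix only needs a reload, not a restart):

alter system set log_line_prefix = '[%d %p %m %l]';  -- %d database, %p PID, %m timestamp with ms, %l session line number
select pg_reload_conf();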
On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Since it's possible the database is a loss, do you see any value in > bootstrappinng it again with checksums turned on? One point of note > is that this is a brand spanking new SSD, maybe we nee to rule out > hardware based corruption? hm! I bootstrapped a new database with checksums on, and lo, [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page verification failed, calculated checksum 28520 but expected 28541 [cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT: SQL statement "COPY lease_delete FROM '/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER" merlin
On Thu, Jan 15, 2015 at 4:03 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> Since it's possible the database is a loss, do you see any value in >> bootstrappinng it again with checksums turned on? One point of note >> is that this is a brand spanking new SSD, maybe we nee to rule out >> hardware based corruption? > > hm! I bootstrapped a new database with checksums on, and lo, > > [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page > verification failed, calculated checksum 28520 but expected 28541 > [cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT: SQL statement > "COPY lease_delete FROM > '/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER" Running this test on another set of hardware to verify -- if this turns out to be a false alarm which it may very well be, I can only offer my apologies! I've never had a new drive fail like that, in that manner. I'll burn the other hardware in overnight and report back. merlin
On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Running this test on another set of hardware to verify -- if this > turns out to be a false alarm which it may very well be, I can only > offer my apologies! I've never had a new drive fail like that, in > that manner. I'll burn the other hardware in overnight and report > back. Thanks. -- Peter Geoghegan
On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> Running this test on another set of hardware to verify -- if this >> turns out to be a false alarm which it may very well be, I can only >> offer my apologies! I've never had a new drive fail like that, in >> that manner. I'll burn the other hardware in overnight and report >> back. huh -- well possibly. not. This is on a virtual machine attached to a SAN. It ran clean for several (this is 9.4 vanilla, asserts off, checksums on) hours then the starting having issues: [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page verification failed, calculated checksum 59143 but expected 59137 at character 20 [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY: DELETE FROM "onesitepmc"."propertyguestcard" t WHEREEXISTS ( SELECT 1 FROM "propertyguestcard_insert" d WHERE (t."prptyid", t."gcardid") = (d."prptyid",d."gcardid") ) [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT: PL/pgSQL function cdsreconciletable(text,text,text,text,boolean) line 197 at EXECUTE statement SQL statement "SELECT * FROM CDSReconcileTable( t.CDSServer, t.CDSDatabase, t.SchemaName, t.TableName)" PL/pgSQL function cdsreconcileruntable(bigint) line 35at SQL statement After that, several hours of clean running, followed by: [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG: could not send data to client: Broken pipe [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT: SELECT CDSReconcileRunTable(1160) [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? 
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type ShareLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT: SELECT CDSReconcileRunTable(1160) [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type RowExclusiveLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type AccessExclusiveLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type RowExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type RowShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type ExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? 
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type RowExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type ShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type AccessExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type ShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type RowShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.029 CST 7549]WARNING: you don't own a lock of type AccessExclusiveLock [cds2 32353 2015-01-16 04:40:58.029 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.029 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.029 CST 7549]PANIC: ERRORDATA_STACK_SIZE exceeded [ 3093 2015-01-16 04:41:00.299 CST 0]LOG: server process (PID 32353) was terminated by signal 6: Aborted [ 3093 2015-01-16 04:41:00.300 CST 0]LOG: terminating any other active server processes After that, server resumed processing without further incident. merlin
On Fri, Jan 16, 2015 at 8:05 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: >> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> Running this test on another set of hardware to verify -- if this >>> turns out to be a false alarm which it may very well be, I can only >>> offer my apologies! I've never had a new drive fail like that, in >>> that manner. I'll burn the other hardware in overnight and report >>> back. > > huh -- well possibly. not. This is on a virtual machine attached to a > SAN. It ran clean for several (this is 9.4 vanilla, asserts off, > checksums on) hours then the starting having issues: I'm going to bisect this down...FYI. I'll start with the major releases first. This is not going to be a fast process...I'm out of pocket for the weekend and have a lot of other stuff going on. merlin
On 01/16/2015 04:05 PM, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> Running this test on another set of hardware to verify -- if this
>>> turns out to be a false alarm which it may very well be, I can only
>>> offer my apologies! I've never had a new drive fail like that, in
>>> that manner. I'll burn the other hardware in overnight and report
>>> back.
>
> huh -- well possibly. not. This is on a virtual machine attached to a
> SAN. It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours then started having issues:
>
> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20

The calculated checksum is suspiciously close to the expected one. It
could be coincidence, but the previous checksum warning you posted was
also quite close:

> [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page
> verification failed, calculated checksum 28520 but expected 28541

I believe the checksum algorithm is supposed to mix the bits quite
thoroughly, so that a difference in a single byte in the input will
lead to a completely different checksum. However, we add the block
number to the checksum last:

>     /* Mix in the block number to detect transposed pages */
>     checksum ^= blkno;
>
>     /*
>      * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
>      * one. That avoids checksums of zero, which seems like a good idea.
>      */
>     return (checksum % 65535) + 1;

It looks very much like a page has for some reason been moved to a
different block number. And that's exactly what Peter found out in his
investigation too; an index page was mysteriously copied to a different
block with identical content.

- Heikki
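To illustrate that point with a toy example (hypothetical numbers only; the constant below merely stands in for the fully mixed hash of one page image, it is not real page data): because the block number is only XOR-ed in at the very end, identical page contents stored at two nearby block numbers produce checksums that are themselves numerically adjacent.

with page(mixed) as (values (123456789))  -- stand-in for the mixed hash of one page image
select ((mixed # 2) % 65535) + 1 as checksum_if_stored_at_block_2,
       ((mixed # 9) % 65535) + 1 as checksum_if_stored_at_block_9
from page;
-- XORing in different small block numbers only perturbs the low-order bits,
-- so the two results come out a few counts apart, matching the
-- "calculated 59143 but expected 59137" pattern of a page copied to a nearby block.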
Hi, On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: > > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > >> Running this test on another set of hardware to verify -- if this > >> turns out to be a false alarm which it may very well be, I can only > >> offer my apologies! I've never had a new drive fail like that, in > >> that manner. I'll burn the other hardware in overnight and report > >> back. > > huh -- well possibly. not. This is on a virtual machine attached to a > SAN. It ran clean for several (this is 9.4 vanilla, asserts off, > checksums on) hours then the starting having issues: Damn. Is there any chance you can package this somehow so that others can run it locally? It looks hard to find the actual bug here without adding instrumentation to to postgres. > [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page > verification failed, calculated checksum 59143 but expected 59137 at > character 20 > [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY: > DELETE FROM "onesitepmc"."propertyguestcard" t > WHERE EXISTS > ( > SELECT 1 FROM "propertyguestcard_insert" d > WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid") > ) > [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT: PL/pgSQL > function cdsreconciletable(text,text,text,text,boolean) line 197 at > EXECUTE statement > SQL statement "SELECT * FROM CDSReconcileTable( > t.CDSServer, > t.CDSDatabase, > t.SchemaName, > t.TableName)" > PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement This was the first error? None of the 'could not find subXID' errors beforehand? > [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING: did not find > subXID 7553 in MyProc > [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG: could not send data > to client: Broken pipe > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT: SELECT > CDSReconcileRunTable(1160) > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? 
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a > lock of type ShareLock > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find > shared lock object > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT: SELECT > CDSReconcileRunTable(1160) > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > AbortSubTransaction while in ABORT state This indicates a bug in our subtransaction abort handling. It looks to me like there actually might be several. But it's probably a consequence of an earlier bug. It's hard to diagnose the actual issue, because we're not seing the original error(s) :(. Could you add a EmitErrorReport(); before the FlushErrorState() in pl_exec.c's exec_stmt_block()? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Is there any chance you can package this somehow so that others can run
> it locally? It looks hard to find the actual bug here without adding
> instrumentation to postgres.

That's possible but involves a lot of complexity in the setup because
of the source database (SQL Server) dependency. Thinking outside the
box here, I'm going to migrate the source to postgres. This will rule
out pl/sh, which is the only non-core dependency, but will take some
setup work on my end first. If I can still reproduce the error at that
point, maybe we can go in this direction, and it would make local
reproduction easier anyways.

>> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
>
> This was the first error? None of the 'could not find subXID' errors
> beforehand?

yep. no caught exceptions or anything interesting in the log before that.

> Could you add an EmitErrorReport(); before the FlushErrorState() in
> pl_exec.c's exec_stmt_block()?

will do.

merlin
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: >> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: >> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >> Running this test on another set of hardware to verify -- if this >> >> turns out to be a false alarm which it may very well be, I can only >> >> offer my apologies! I've never had a new drive fail like that, in >> >> that manner. I'll burn the other hardware in overnight and report >> >> back. >> >> huh -- well possibly. not. This is on a virtual machine attached to a >> SAN. It ran clean for several (this is 9.4 vanilla, asserts off, >> checksums on) hours then the starting having issues: > > Damn. > > Is there any chance you can package this somehow so that others can run > it locally? It looks hard to find the actual bug here without adding > instrumentation to to postgres. FYI, a two hour burn in on my workstation on 9.3 ran with no issues. An overnight run would probably be required to prove it, ruling out both hardware and pl/sh. If proven, it's possible we may be facing a regression, perhaps a serious one. ISTM the next step is to bisect the problem down over the weekend in order to to narrow the search. If that doesn't turn up anything productive I'll look into taking other steps. merlin
On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > ISTM the next step is to bisect the problem down over the weekend in > order to to narrow the search. If that doesn't turn up anything > productive I'll look into taking other steps. That might be the quickest way to do it, provided you can isolate the bug fairly reliably. It might be a bit tricky to write a shell script that assumes a certain amount of time having passed without the bug tripping indicates that it doesn't exist, and have that work consistently. I'm slightly concerned that you'll hit other bugs that have since been fixed, given the large number of possible symptoms here. -- Peter Geoghegan
On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > It looks very much like that a page has for some reason been moved to a > different block number. And that's exactly what Peter found out in his > investigation too; an index page was mysteriously copied to a different > block with identical content. What I found suspicious about that was that the spuriously identical pages were not physically adjacent, but logically adjacent (i.e. the bad page was considered the B-Tree right link of the good page by the good, spuriously-copied-by-bad page). It also seems likely that that small catalog index on pg_class(oid) was well cached in shared_buffers. So I agree that it's unlikely that this is actually a hardware or filesystem problem. Beyond that, if I had to guess, I'd say that the problem is more likely to be in the B-Tree code than it is in the buffer manager or whatever (so the "logically adjacent" thing is probably not an artifact of the order that the pages were accessed, since it appears there was a downlink to the bad page. This downlink was not added recently. Also, this logical adjacency is unlikely to be mere coincidence - Postgres seemed to fairly consistently break this way). Does anyone have a better developed sense of where the ultimate problem here is than I do? I guess I've never thought too much about how the system fails when a catalog index is this thoroughly broken. -- Peter Geoghegan
On Fri, Jan 16, 2015 at 5:20 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> ISTM the next step is to bisect the problem down over the weekend in
>> order to narrow the search. If that doesn't turn up anything
>> productive I'll look into taking other steps.
>
> That might be the quickest way to do it, provided you can isolate the
> bug fairly reliably. It might be a bit tricky to write a shell script
> that assumes a certain amount of time having passed without the bug
> tripping indicates that it doesn't exist, and have that work
> consistently. I'm slightly concerned that you'll hit other bugs that
> have since been fixed, given the large number of possible symptoms
> here.

Quick update: not done yet, but I'm making consistent progress, with
several false starts (for example, I had a .conf problem with the new
dynamic shared memory setting and git merrily bisected down to the
introduction of the feature). I have to triple check everything :(.
The problem is generally reproducible but I get false negatives that
throw off the bisection. I estimate that early next week I'll have it
narrowed down significantly if not to the exact offending revision.

So far, the 'nasty' damage seems to generally if not always follow a
checksum failure, and the checksum failures are always numerically
adjacent. For example:

[cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING: page verification failed, calculated checksum 9465 but expected 9477 at character 20
[cds2 21202 2015-01-22 13:10:18.172 CST 3196]WARNING: page verification failed, calculated checksum 61889 but expected 61903 at character 20
[cds2 29153 2015-01-22 14:49:04.831 CST 4803]WARNING: page verification failed, calculated checksum 27311 but expected 27316

I'm not up on the intricacies of our checksum algorithm, but this is
making me suspicious that we are looking at an improperly flipped
visibility bit via some obscure problem -- almost certainly with
vacuum playing a role. This fits the profile of catastrophic damage
that masquerades as numerous other problems. Or, perhaps, something is
flipping what it thinks is a visibility bit but on the wrong page.

I still haven't categorically ruled out pl/sh yet; that's something to
keep in mind.

In the 'plus' category, aside from flushing out this issue, I've had
zero runtime problems so far aside from the main problem; bisection
(at least on the 'bad' side) has been reliably engaged by simply
counting the number of warnings/errors/etc in the log. That's really
impressive.

merlin
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> So far, the 'nasty' damage seems to generally if not always follow a
> checksum failure and the checksum failures are always numerically
> adjacent. For example:
> [cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING: page
> verification failed, calculated checksum 9465 but expected 9477 at
> character 20
> [cds2 21202 2015-01-22 13:10:18.172 CST 3196]WARNING: page
> verification failed, calculated checksum 61889 but expected 61903 at
> character 20
> [cds2 29153 2015-01-22 14:49:04.831 CST 4803]WARNING: page
> verification failed, calculated checksum 27311 but expected 27316
> I'm not up on the intricacies of our checksum algorithm but this is
> making me suspicious that we are looking at an improperly flipped
> visibility bit via some obscure problem -- almost certainly with
> vacuum playing a role.
That very much sounds like the block is getting duplicated from one place to another.
Even flipping one hint bit (aren't these index pages? Do they have hint bits?) should thoroughly scramble the checksum.
Because the checksum adds in the block number after the scrambling has been done, copying a page to another nearby location will just move the (expected) checksum a little bit.
Cheers,
Jeff
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of the feature.). Thanks. Looking forward to your findings. Ideally, we'd be able to get a fix in for 9.4.1. -- Peter Geoghegan
On Thu, Jan 22, 2015 at 03:50:03PM -0600, Merlin Moncure wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of the feature.). > I have to triple check everything :(. The problem is generally > reproducible but I get false negatives that throws off the bisection. > I estimate that early next week I'll have it narrowed down > significantly if not to the exact offending revision. I've never used it but the BBChop project claims to be able to bisect even in the case of intermittent failure. https://github.com/Ealdwulf/bbchop/ It claims to be slow, but I don't think that's the limiting factor here... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > I still haven't categorically ruled out pl/sh yet; that's something to > keep in mind. Well, after bisection proved not to be fruitful, I replaced the pl/sh calls with dummy calls that approximated the same behavior and the problem went away. So again, it looks like this might be a lot of false alarm. A pl/sh driven failure might still be interesting if it's coming from the internal calls it's making, so I'm still chasing it down. merlin
On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> I still haven't categorically ruled out pl/sh yet; that's something to >> keep in mind. > > Well, after bisection proved not to be fruitful, I replaced the pl/sh > calls with dummy calls that approximated the same behavior and the > problem went away. So again, it looks like this might be a lot of > false alarm. A pl/sh driven failure might still be interesting if > it's coming from the internal calls it's making, so I'm still chasing > it down. ...hm, I spoke to soon. So I deleted everything, and booted up a new instance 9.4 vanilla with asserts on and took no other action. Applying the script with no data activity fails an assertion every single time: mmoncure@mernix2 12:25 PM (REL9_4_STABLE) ~/src/p94$ cat /mnt/ssd/data/pg_log/postgresql-28.log [ 12287 2015-01-28 12:24:24.080 CST 0]LOG: received smart shutdown request [ 13516 2015-01-28 12:24:24.080 CST 0]LOG: autovacuum launcher shutting down [ 13513 2015-01-28 12:24:24.081 CST 0]LOG: shutting down [ 13513 2015-01-28 12:24:24.083 CST 0]LOG: database system is shut down [ 14481 2015-01-28 12:24:25.127 CST 0]LOG: database system was shut down at 2015-01-28 12:24:24 CST [ 14457 2015-01-28 12:24:25.129 CST 0]LOG: database system is ready to accept connections [ 14485 2015-01-28 12:24:25.129 CST 0]LOG: autovacuum launcher started TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330) [ 14457 2015-01-28 12:24:47.983 CST 0]LOG: server process (PID 14545) was terminated by signal 6: Aborted [ 14457 2015-01-28 12:24:47.983 CST 0]DETAIL: Failed process was running: SELECT CDSStartRun() [ 14457 2015-01-28 12:24:47.983 CST 0]LOG: terminating any other active server processes [cds2 14546 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process [cds2 14546 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. [cds2 14546 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command. [ 14485 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process [ 14485 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. [ 14485 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command. [ 14457 2015-01-28 12:24:47.984 CST 0]LOG: all server processes terminated; reinitializing [ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was interrupted; last known up at 2015-01-28 12:24:25 CST [ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was not properly shut down; automatic recovery in progress [ 14554 2015-01-28 12:24:47.997 CST 0]LOG: invalid magic number 0000 in log segment 000000010000000000000001, offset 13000704 [ 14554 2015-01-28 12:24:47.997 CST 0]LOG: redo is not required [ 14457 2015-01-28 12:24:48.000 CST 0]LOG: database system is ready to accept connections [ 14558 2015-01-28 12:24:48.000 CST 0]LOG: autovacuum launcher started merlin
Merlin Moncure <mmoncure@gmail.com> writes: > ...hm, I spoke to soon. So I deleted everything, and booted up a new > instance 9.4 vanilla with asserts on and took no other action. > Applying the script with no data activity fails an assertion every > single time: > TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330) There's no Assert at line 330 in 9.4, though there is in HEAD. I suspect what you've got here is a version mismatch; in particular commit 4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c such that external modules would need to be recompiled to use it without error. I'm not real sure though how you are getting past the loadable-module version check. Anyway, I'd try "make distclean" and full rebuild before anything else. regards, tom lane
On Wed, Jan 28, 2015 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> ...hm, I spoke to soon. So I deleted everything, and booted up a new >> instance 9.4 vanilla with asserts on and took no other action. >> Applying the script with no data activity fails an assertion every >> single time: > >> TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330) > > There's no Assert at line 330 in 9.4, though there is in HEAD. I suspect > what you've got here is a version mismatch; in particular commit > 4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c > such that external modules would need to be recompiled to use it without > error. I'm not real sure though how you are getting past the > loadable-module version check. Anyway, I'd try "make distclean" and > full rebuild before anything else. you're right -- git got confused and built a 9.5 devel at some random commit. I reran it again on 9.4 and it worked ok :(. merlin