Thread: hung backends stuck in spinlock heavy endless loop
On my workstation today (running vanilla 9.4.0) I was testing some new code that does aggressive parallel loading to a couple of tables. It ran ok several dozen times and froze up with no external trigger. There were at most 8 active backends that were stuck (the loader is threaded to a cap) -- each query typically resolves in a few seconds but they were hung for 30 minutes+. Had to do restart immediate as backends were not responding to cancel...but I snapped a 'perf top' before I did so. The results were interesting so I'm posting them here. So far I have not been able to reproduce...FYI

 61.03%  postgres            [.] s_lock
 13.56%  postgres            [.] LWLockRelease
 10.11%  postgres            [.] LWLockAcquire
  4.02%  perf                [.] 0x00000000000526d3
  1.65%  postgres            [.] _bt_compare
  1.60%  libc-2.17.so        [.] 0x0000000000081069
  0.66%  [kernel]            [k] kallsyms_expand_symbol.constprop.1
  0.60%  [kernel]            [k] format_decode
  0.57%  [kernel]            [k] number.isra.1
  0.47%  [kernel]            [k] memcpy
  0.44%  postgres            [.] ReleaseAndReadBuffer
  0.44%  postgres            [.] FunctionCall2Coll
  0.41%  [kernel]            [k] vsnprintf
  0.41%  [kernel]            [k] module_get_kallsym
  0.32%  postgres            [.] _bt_relandgetbuf
  0.31%  [kernel]            [k] string.isra.5
  0.31%  [kernel]            [k] strnlen
  0.31%  postgres            [.] _bt_moveright
  0.28%  libc-2.17.so        [.] getdelim
  0.22%  postgres            [.] LockBuffer
  0.16%  [kernel]            [k] seq_read
  0.16%  libc-2.17.so        [.] __libc_calloc
  0.13%  postgres            [.] _bt_checkpage
  0.09%  [kernel]            [k] pointer.isra.15
  0.09%  [kernel]            [k] update_iter
  0.08%  plugin_host         [.] PyObject_GetAttr
  0.06%  [kernel]            [k] strlcpy
  0.06%  [kernel]            [k] seq_vprintf
  0.06%  [kernel]            [k] copy_user_enhanced_fast_string
  0.06%  libc-2.17.so        [.] _IO_feof
  0.06%  postgres            [.] btoidcmp
  0.06%  [kernel]            [k] page_fault
  0.06%  libc-2.17.so        [.] free
  0.06%  libc-2.17.so        [.] memchr
  0.06%  libpthread-2.17.so  [.] __pthread_mutex_unlock_usercnt

merlin
Hi, On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. It > ran ok several dozen times and froze up with no external trigger. > There were at most 8 active backends that were stuck (the loader is > threaded to a cap) -- each query typically resolves in a few seconds > but they were hung for 30 minutes+. Interesting. > Had to do restart immediate as > backends were not responding to cancel...but I snapped a 'perf top' > before I did so. The results were interesting so I'm posting them > here. So far I have not been able to reproduce...FYI Can you compile postgres with -fno-omit-frame-pointer? Then, when this happens the next time, you can take a perf record -g, which will tell us which lock the contention is at. > 61.03% postgres [.] s_lock > 13.56% postgres [.] LWLockRelease > 10.11% postgres [.] LWLockAcquire That profile looks like it might end up being helped by the lwlock and/or freelist changes in 9.5. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 4:33 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2015-01-13 16:29:51 -0600, Merlin Moncure wrote: >> On my workstation today (running vanilla 9.4.0) I was testing some new >> code that does aggressive parallel loading to a couple of tables. It >> ran ok several dozen times and froze up with no external trigger. >> There were at most 8 active backends that were stuck (the loader is >> threaded to a cap) -- each query typically resolves in a few seconds >> but they were hung for 30 minutes+. > > Interesting. > >> Had to do restart immediate as >> backends were not responding to cancel...but I snapped a 'perf top' >> before I did so. The results were interesting so I'm posting them >> here. So far I have not been able to reproduce...FYI > > Can you compile postgres with -fno-omit-frame-pointer? Then, when this > happens the next time, you can take a perf record -g, which will tell us > which lock the contention is at. will do, and I'll loop it for a while and see if I can get it to re-occur. merlin
On Tue, Jan 13, 2015 at 2:29 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On my workstation today (running vanilla 9.4.0) I was testing some new > code that does aggressive parallel loading to a couple of tables. Could you give more details, please? For example, I'd like to see representative data, or at least the table definitions involved. Do you have any idea what index is involved with the _bt_compare() calls above? I'm curious as to what its definition is. You might also consider using my B-Tree verification tool on a preserved data directory: http://www.postgresql.org/message-id/CAM3SWZRK2YxKegj94_2wKs_vbEMJeHc1yE23ECsS01wRAKzkjg@mail.gmail.com I'm inclined to think that this is a livelock, and so the problem isn't evident from the structure of the B-Tree, but it can't hurt to check. The tool is bleeding edge, so don't use it in production. Thanks -- Peter Geoghegan
On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > I'm inclined to think that this is a livelock, and so the problem > isn't evident from the structure of the B-Tree, but it can't hurt to > check. My guess is rather that it's contention on the freelist lock via StrategyGetBuffer's. I've seen profiles like this due to exactly that before - and it fits to parallel loading quite well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
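One way to get a rough read on whether a parallel load is churning shared buffers hard enough to hammer the freelist is to look at the buffer cache directly. A minimal sketch, assuming the contrib pg_buffercache extension is available; the join on relfilenode is approximate (it ignores mapped catalogs and other databases), so treat the output as a hint rather than a measurement:

create extension if not exists pg_buffercache;

-- how shared_buffers is currently distributed across relations, with dirty-buffer counts
select c.relname,
       count(*) as buffers,
       sum(case when b.isdirty then 1 else 0 end) as dirty
  from pg_buffercache b
  join pg_class c on c.relfilenode = b.relfilenode
 group by c.relname
 order by buffers desc
 limit 10;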
On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> I'm inclined to think that this is a livelock, and so the problem >> isn't evident from the structure of the B-Tree, but it can't hurt to >> check. > > My guess is rather that it's contention on the freelist lock via > StrategyGetBuffer's. I've seen profiles like this due to exactly that > before - and it fits to parallel loading quite well.

I think I've got it to pop again. s_lock is only showing 35% (increasing very slowly if at all) but performance is mostly halted. Frame pointer is compiled out. perf report attached.

merlin

 35.82%  postgres                       [.] s_lock
 23.71%  postgres                       [.] tas
 14.01%  postgres                       [.] tas
  6.82%  postgres                       [.] spin_delay
  5.93%  postgres                       [.] LWLockRelease
  4.36%  postgres                       [.] LWLockAcquireCommon
  2.34%  postgres                       [.] FunctionCall2Coll
  1.79%  postgres                       [.] _bt_compare
  0.80%  postgres                       [.] LockBuffer
  0.52%  postgres                       [.] btoidcmp
  0.49%  postgres                       [.] ReleaseAndReadBuffer
  0.47%  postgres                       [.] _bt_moveright
  0.36%  postgres                       [.] _bt_checkpage
  0.36%  postgres                       [.] _bt_relandgetbuf
  0.20%  perf                           [.] 0x000000000004312a
  0.19%  postgres                       [.] LWLockAcquire
  0.13%  libv8.so                       [.] 0x00000000001bbbe0
  0.11%  libc-2.17.so                   [.] 0x0000000000151134
  0.09%  libwebkit.so                   [.] 0x000000000106ccb7
  0.05%  libgdk_pixbuf-2.0.so.0.2800.1  [.] 0x00000000000139c7
  0.04%  Xorg                           [.] 0x00000000000efb00
  0.03%  libglib-2.0.so.0.3800.1        [.] 0x00000000000876a2
  0.03%  [kernel]                       [k] __ticket_spin_lock
  0.02%  perf-12966.map                 [.] 0x00000625db944621
  0.02%  libskia.so                     [.] 0x000000000021d15f
  0.02%  libcairo.so.2.11200.16         [.] 0x000000000002440b
  0.02%  libpthread-2.17.so             [.] __pthread_mutex_unlock_usercnt
  0.02%  [kernel]                       [k] copy_user_enhanced_fast_string
  0.02%  radeon_drv.so                  [.] 0x0000000000045002
  0.02%  libpthread-2.17.so             [.] pthread_mutex_lock
  0.02%  [kernel]                       [k] fget_light
  0.01%  [kernel]                       [k] __schedule
  0.01%  libexa.so                      [.] 0x0000000000006e1d
  0.01%  libgdk-x11-2.0.so.0.2400.20    [.] 0x000000000002f0b0
  0.01%  libvte.so.9.2800.2             [.] 0x0000000000039028
  0.01%  [radeon]                       [k] r100_mm_rreg
  0.01%  [kernel]                       [k] xhci_irq
  0.01%  [kernel]                       [k] native_write_msr_safe
  0.01%  [kernel]                       [k] update_cfs_rq_blocked_load
  0.01%  libglib-2.0.so.0.3800.1        [.] g_hash_table_lookup
  0.01%  libgobject-2.0.so.0.3800.1     [.] g_type_check_instance_is_a
Press '?' for help on key bindings
On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: > On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: > >> I'm inclined to think that this is a livelock, and so the problem > >> isn't evident from the structure of the B-Tree, but it can't hurt to > >> check. > > > > My guess is rather that it's contention on the freelist lock via > > StrategyGetBuffer's. I've seen profiles like this due to exactly that > > before - and it fits to parallel loading quite well. > > I think I've got it to pop again. s_lock is only showing 35% > (increasing very slowly if at all) but performance is mostly halted. > Frame pointer is compiled out. perf report attached. > 35.82% postgres [.] s_lock > 23.71% postgres [.] tas > 14.01% postgres [.] tas > 6.82% postgres [.] spin_delay > 5.93% postgres [.] LWLockRelease > 4.36% postgres [.] LWLockAcquireCommon Interesting. This profile looks quite different? What kind of hardware is this on? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > My guess is rather that it's contention on the freelist lock via > StrategyGetBuffer's. I've seen profiles like this due to exactly that > before - and it fits to parallel loading quite well. I'm not saying you're wrong, but the breakdown of _bt_moveright() relative to _bt_relandgetbuf() calls seems a bit fishy to me. I don't remember seeing _bt_moveright() or _bt_compare() figuring so prominently, where _bt_binsrch() is nowhere to be seen. I can't see a reference to _bt_binsrch() in either profile. -- Peter Geoghegan
On Tue, Jan 13, 2015 at 5:49 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> My guess is rather that it's contention on the freelist lock via >> StrategyGetBuffer's. I've seen profiles like this due to exactly that >> before - and it fits to parallel loading quite well. > > I'm not saying you're wrong, but the breakdown of _bt_moveright() > relative to _bt_relandgetbuf() calls seems a bit fishy to me. > > I don't remember seeing _bt_moveright() or _bt_compare() figuring so > prominently, where _bt_binsrch() is nowhere to be seen. I can't see a > reference to _bt_binsrch() in either profile. hm, this is hand compiled now, I bet the symbols are wrong. merlin
On 2015-01-13 15:49:33 -0800, Peter Geoghegan wrote: > On Tue, Jan 13, 2015 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > My guess is rather that it's contention on the freelist lock via > > StrategyGetBuffer's. I've seen profiles like this due to exactly that > > before - and it fits to parallel loading quite well. > > I'm not saying you're wrong, but the breakdown of _bt_moveright() > relative to _bt_relandgetbuf() calls seems a bit fishy to me. The hierarchical profile definitely doesn't look like my guess was right. > I don't remember seeing _bt_moveright() or _bt_compare() figuring so > prominently, where _bt_binsrch() is nowhere to be seen. I can't see a > reference to _bt_binsrch() in either profile. Well, we do a _bt_moveright pretty early on, so that actually might be cache misses we're primarily seeing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 5:42 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-13 17:39:09 -0600, Merlin Moncure wrote: >> On Tue, Jan 13, 2015 at 5:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > On 2015-01-13 15:17:15 -0800, Peter Geoghegan wrote: >> >> I'm inclined to think that this is a livelock, and so the problem >> >> isn't evident from the structure of the B-Tree, but it can't hurt to >> >> check. >> > >> > My guess is rather that it's contention on the freelist lock via >> > StrategyGetBuffer's. I've seen profiles like this due to exactly that >> > before - and it fits to parallel loading quite well. >> >> I think I've got it to pop again. s_lock is only showing 35% >> (increasing very slowly if at all) but performance is mostly halted. >> Frame pointer is compiled out. perf report attached. > >> 35.82% postgres [.] s_lock >> 23.71% postgres [.] tas >> 14.01% postgres [.] tas >> 6.82% postgres [.] spin_delay >> 5.93% postgres [.] LWLockRelease >> 4.36% postgres [.] LWLockAcquireCommon > > Interesting. This profile looks quite different?

yep, it's very stable, and the database is approximately frozen.

> What kind of hardware is this on?

linux on crap workstation box, except I have good ssd (intel 3500).

model name : Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz

top - 17:44:00 up 11 days, 6:25, 15 users, load average: 6.94, 6.97, 5.73
Tasks: 259 total, 8 running, 250 sleeping, 0 stopped, 1 zombie
%Cpu0 : 95.7 us, 0.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 93.5 us, 2.2 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 95.7 us, 0.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 93.6 us, 2.1 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 8131028 total, 7977608 used, 153420 free, 21424 buffers
KiB Swap: 8340476 total, 1884900 used, 6455576 free, 5201188 cached

  PID USER      PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
32492 mmoncure  20  0  175m  12m 9040 R 64.0  0.2  9:28.42 postgres
32367 mmoncure  20  0  175m  12m 9128 R 57.3  0.2  9:35.87 postgres
31792 mmoncure  20  0  175m  92m  88m R 52.9  1.2  9:57.50 postgres
32725 mmoncure  20  0  175m  13m 9852 R 52.9  0.2  9:08.50 postgres
31899 mmoncure  20  0  179m 104m  97m R 50.7  1.3  9:49.51 postgres
32368 mmoncure  20  0  175m  12m 9200 R 50.7  0.2  9:33.77 postgres
31282 mmoncure  20  0  184m 152m 140m R 46.3  1.9 10:26.29 postgres
  666 mmoncure  20  0 20640 1684 1116 R  2.2  0.0  0:00.01 top

Some more information about what's happening: This is a ghetto logical replication engine that migrates data from sql server to postgres, consolidating a sharded database into a single set of tables (of which there are only two). There is only one index on the destination table, and it's composite int,int in both cases. A single source table is replicated in a single transaction, and it's staged to a set of two temp tables, one for deletes, one for inserts. I log each step in the database. Here are the times:

cds2=# select cdsruntableid, finished - started, inserted from cdsruntable where cdsrunid = 62 order by started;
 cdsruntableid │    ?column?     │ inserted
───────────────┼─────────────────┼──────────
           833 │ 00:00:33.24044  │    11860
           834 │ 00:00:35.44981  │    19099
           835 │ 00:02:01.725186 │   530923
           836 │ 00:01:29.578811 │   211737
           838 │ 00:01:17.393461 │    64258
           837 │ 00:00:56.849106 │   227844
           839 │ 00:02:12.016504 │   630309
           840 │ 00:00:00.00111  │
           841 │ 00:01:09.058806 │   155309
           842 │ 00:01:54.723747 │   704422
           843 │ 00:01:14.965304 │
           844 │ 00:00:45.410953 │    59934
           845 │ 00:00:34.302632 │    14848
           846 │ 00:00:46.913832 │    89893
           848 │ 00:05:22.155189 │  2410622
           847 │ 00:01:25.199141 │   268157
           849 │ 00:01:16.168352 │   117511
           850 │ 00:00:34.809862 │     1175
           851 │ 00:01:12.565397 │    67214
           852 │ 00:01:03.734776 │    20129
           853 │ 00:00:41.038456 │    62765
           854 │ 00:01:03.478917 │    14967
           855 │ 00:00:33.88803  │     6901
           856 │ 00:00:36.381886 │     6670
           857 │ 00:00:36.626623 │     8268
           858 │ 00:01:14.391584 │   363448
           859 │ 00:01:44.619005 │   533395
           860 │ 00:01:02.042255 │   212202
           861 │ 00:00:00.001065 │
           863 │ 00:00:58.265506 │   215234
           862 │ 00:02:45.572635 │   827941
           865 │ 00:01:28.049165 │   241020
           864 │ 00:01:51.465643 │   531012
           866 │ 00:01:20.273391 │   419357
           868 │ 00:01:25.479443 │   294262
           869 │ 00:00:46.400825 │    46136
           870 │ 00:00:42.337286 │    25934
           871 │ 00:00:48.350152 │    91375
           872 │ 00:00:00.001743 │
           873 │ 00:01:04.596749 │   199744
           876 │ 00:01:15.210301 │   276276
           877 │ 00:02:36.447723 │  1693887
           878 │ 00:00:55.925104 │    36938
           879 │ 00:00:00.016874 │
           882 │ 00:00:50.479248 │   171901
           881 │                 │
           884 │                 │
           883 │                 │
           867 │                 │
           874 │                 │
           875 │                 │
           880 │                 │
(52 rows)

There is an 8-thread worker that walks that queue in id order and migrates one source table of data to the destination postgres server. It's been stuck for quite some time now, and I'm certain that if I restart the database, and rerun the incremental processing routine, it would work those last seven tables with no issues (at least that's what happened last time). It's been bogged with no progress now for a half hour again, so there's some kind of weird bug here.

merlin
On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > hm, this is hand compiled now, I bet the symbols are wrong. In case it isn't clear, I think that the proximate cause here may well be either one (or both) of commits efada2b8e920adfdf7418862e939925d2acd1b89 and/or 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think that the profile is roughly consistent with that, although I may well be wrong. -- Peter Geoghegan
On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Jan 13, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >>> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >>> reference to _bt_binsrch() in either profile. >> >> hm, this is hand compiled now, I bet the symbols are wrong. > > In case it isn't clear, I think that the proximate cause here may well > be either one (or both) of commits > efada2b8e920adfdf7418862e939925d2acd1b89 and/or > 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think > that the profile is roughly consistent with that, although I may well > be wrong. I'm out of time for the day, but I'm fairly confident I can reproduce. I'll see if I can reverse those commits tomorrow and retest (I'm on development box). merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan <pg@heroku.com> wrote: >> In case it isn't clear, I think that the proximate cause here may well >> be either one (or both) of commits >> efada2b8e920adfdf7418862e939925d2acd1b89 and/or >> 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think >> that the profile is roughly consistent with that, although I may well >> be wrong. > I'm out of time for the day, but I'm fairly confident I can reproduce. > I'll see if I can reverse those commits tomorrow and retest (I'm on > development box). I'm not convinced that Peter is barking up the right tree. I'm noticing that the profiles seem rather skewed towards parser/planner work; so I suspect the contention is probably on access to system catalogs. No idea exactly why though. It would be good to collect a few stack traces from the hung backends, rather than relying on perf statistics. regards, tom lane
On 2015-01-13 19:05:10 -0500, Tom Lane wrote: > Merlin Moncure <mmoncure@gmail.com> writes: > > On Tue, Jan 13, 2015 at 5:54 PM, Peter Geoghegan <pg@heroku.com> wrote: > >> In case it isn't clear, I think that the proximate cause here may well > >> be either one (or both) of commits > >> efada2b8e920adfdf7418862e939925d2acd1b89 and/or > >> 40dae7ec537c5619fc93ad602c62f37be786d161. Probably the latter. I think > >> that the profile is roughly consistent with that, although I may well > >> be wrong. > > > I'm out of time for the day, but I'm fairly confident I can reproduce. > > I'll see if I can reverse those commits tomorrow and retest (I'm on > > development box). > > I'm not convinced that Peter is barking up the right tree. I'm noticing > that the profiles seem rather skewed towards parser/planner work; so I > suspect the contention is probably on access to system catalogs. No > idea exactly why though. The plan contains plpgsql and exec_stmt_dynexecute(). So it might just be executing crazy amounts of dynamic statements. I'm still wondering if this isn't a different issue to the first one; the plans do look different. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 13, 2015 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not convinced that Peter is barking up the right tree. I'm noticing > that the profiles seem rather skewed towards parser/planner work; so I > suspect the contention is probably on access to system catalogs. No > idea exactly why though. I see no int4cmp() calls at all, but plenty of _bt_compare(), and some FunctionCall2Coll(). And yet, no _bt_binsrch(). I can see btoidcmp(), so if I'm right then I guess it's a system catalog index. I too would like to see a stack trace. Trivia: pg_attribute_relid_attnam_index has deleted B-Tree pages after a fresh initdb. -- Peter Geoghegan
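In a stock database, oid-keyed btrees are essentially all system catalog indexes, which is what makes the btoidcmp() hits suggestive. A small, purely illustrative query that lists the pg_catalog indexes with oid-typed key columns (pg_class_oid_index is among them):

select ic.relname as index_name, a.attname, format_type(a.atttypid, a.atttypmod) as key_type
  from pg_index i
  join pg_class ic on ic.oid = i.indexrelid
  join pg_namespace n on n.oid = ic.relnamespace
  join pg_attribute a on a.attrelid = i.indexrelid and a.attnum > 0
 where n.nspname = 'pg_catalog'
   and a.atttypid = 'oid'::regtype
 order by ic.relname, a.attnum;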
On Tue, Jan 13, 2015 at 3:54 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I don't remember seeing _bt_moveright() or _bt_compare() figuring so >> prominently, where _bt_binsrch() is nowhere to be seen. I can't see a >> reference to _bt_binsrch() in either profile. > > Well, we do a _bt_moveright pretty early on, so that actually might be > cache misses we're primarily seing. Still, I see zero references to _bt_binsrch(). Not one. Even if there was only one non-meta page (i.e. pre-first-root-split), and regardless of whether this was a read or a write to the B-Tree, there'd still be some call there as the B-Tree was initially scanned (before _bt_next() calls, of which I also see none). -- Peter Geoghegan
On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Some more information what's happening: > This is a ghetto logical replication engine that migrates data from > sql sever to postgres, consolidating a sharded database into a single > set of tables (of which there are only two). There is only one index > on the destination table, and it's composite int,int in both cases. Does the logical replication engine perform dynamic DDL at all? Does it dynamically add columns to the table that everything is being consolidated to? -- Peter Geoghegan
On Tue, Jan 13, 2015 at 7:24 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Jan 13, 2015 at 3:54 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> Some more information what's happening: >> This is a ghetto logical replication engine that migrates data from >> sql sever to postgres, consolidating a sharded database into a single >> set of tables (of which there are only two). There is only one index >> on the destination table, and it's composite int,int in both cases. > > Does the logical replication engine perform dynamic DDL at all? Does > it dynamically add columns to the table that everything is being > consolidated to?

I let the test burn in all night. It's still exactly where it was...this is not slow performance, it's stuck. It works like this:

Dynamically create staging table for inserts/change
Dynamically create staging table for deletes
Dynamically create master table if not present (in this last run, it was present)
Dynamically delete from master via delete table
Dynamically delete from master via insert/change table (based on int/int key)
Dynamically insert/select from inserts/change to permanent

There is also some log/tracking information going on in other tables.

Here's a backtrace:

#0  0x0000000000750a97 in spin_delay ()
#1  0x0000000000750b19 in s_lock ()
#2  0x0000000000750844 in LWLockRelease ()
#3  0x0000000000730000 in LockBuffer ()
#4  0x00000000004b2db4 in _bt_relandgetbuf ()
#5  0x00000000004b7116 in _bt_moveright ()
#6  0x00000000004b6e02 in _bt_search ()
#7  0x00000000004b8006 in _bt_first ()
#8  0x00000000004b5aad in btgettuple ()
#9  0x0000000000896656 in FunctionCall2Coll ()
#10 0x00000000004acd55 in index_getnext_tid ()
#11 0x00000000004acfad in index_getnext ()
#12 0x00000000004ac120 in systable_getnext ()
#13 0x0000000000878ce4 in ScanPgRelation ()
#14 0x0000000000879eb1 in RelationBuildDesc ()
#15 0x000000000087bb52 in RelationIdGetRelation ()
#16 0x0000000000495115 in relation_open ()
#17 0x00000000004952f9 in relation_openrv_extended ()
#18 0x000000000049555e in heap_openrv_extended ()
#19 0x000000000057ef0b in parserOpenTable ()
#20 0x000000000056776f in setTargetTable ()
#21 0x000000000053628f in transformDeleteStmt ()
#22 0x000000000053603c in transformStmt ()
#23 0x0000000000535fdb in transformTopLevelStmt ()
#24 0x0000000000535db8 in parse_analyze ()
#25 0x000000000075dd4f in pg_analyze_and_rewrite ()
#26 0x00000000006472a5 in _SPI_execute_plan ()
#27 0x0000000000644551 in SPI_execute ()
#28 0x00007f89b83a8ce7 in exec_stmt_dynexecute () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#29 0x00007f89b83a4c43 in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#30 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#31 0x00007f89b83a480a in exec_stmt_block () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#32 0x00007f89b83a2802 in plpgsql_exec_function () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#33 0x00007f89b839cd6a in plpgsql_call_handler () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#34 0x0000000000618805 in ExecMakeTableFunctionResult ()
#35 0x00000000006376ba in FunctionNext ()
#36 0x000000000061f868 in ExecScanFetch ()
#37 0x000000000061f8d7 in ExecScan ()
#38 0x0000000000637a1d in ExecFunctionScan ()
#39 0x0000000000614cfd in ExecProcNode ()
#40 0x0000000000612c32 in ExecutePlan ()
#41 0x00000000006110c3 in standard_ExecutorRun ()
#42 0x0000000000610fc5 in ExecutorRun ()
#43 0x0000000000647b02 in _SPI_pquery ()
#44 0x00000000006475e8 in _SPI_execute_plan ()
#45 0x000000000064471d in SPI_execute_plan_with_paramlist ()
#46 0x00007f89b83a85a5 in exec_stmt_execsql () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#47 0x00007f89b83a4c28 in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#48 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#49 0x00007f89b83a4429 in exec_stmt_block () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#50 0x00007f89b83a4a5d in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#51 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#52 0x00007f89b83a54a5 in exec_stmt_loop () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#53 0x00007f89b83a4aff in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#54 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#55 0x00007f89b83a5241 in exec_stmt_if () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#56 0x00007f89b83a4ac9 in exec_stmt () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#57 0x00007f89b83a495f in exec_stmts () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#58 0x00007f89b83a480a in exec_stmt_block () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#59 0x00007f89b83a2802 in plpgsql_exec_function () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#60 0x00007f89b839cd6a in plpgsql_call_handler () from /home/mmoncure/pg94/lib/postgresql/plpgsql.so
#61 0x0000000000618403 in ExecMakeFunctionResultNoSets ()
#62 0x0000000000618e01 in ExecEvalFunc ()
#63 0x000000000061f0f7 in ExecTargetList ()
#64 0x000000000061f6dd in ExecProject ()
#65 0x0000000000638a03 in ExecResult ()
#66 0x0000000000614c16 in ExecProcNode ()
#67 0x0000000000612c32 in ExecutePlan ()
#68 0x00000000006110c3 in standard_ExecutorRun ()
#69 0x0000000000610fc5 in ExecutorRun ()
#70 0x0000000000764725 in PortalRunSelect ()
#71 0x00000000007643b4 in PortalRun ()
#72 0x000000000075e484 in exec_simple_query ()
#73 0x00000000007626e2 in PostgresMain ()
#74 0x00000000006f463f in BackendRun ()
#75 0x00000000006f3d44 in BackendStartup ()
#76 0x00000000006f075e in ServerLoop ()
#77 0x00000000006efdec in PostmasterMain ()
#78 0x00000000006594ec in main ()

merlin
On Wed, Jan 14, 2015 at 8:03 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Here's a backtrace: > > #0 0x0000000000750a97 in spin_delay () > #1 0x0000000000750b19 in s_lock () > #2 0x0000000000750844 in LWLockRelease () > #3 0x0000000000730000 in LockBuffer () > #4 0x00000000004b2db4 in _bt_relandgetbuf () > #5 0x00000000004b7116 in _bt_moveright () > #6 0x00000000004b6e02 in _bt_search () > #7 0x00000000004b8006 in _bt_first () > #8 0x00000000004b5aad in btgettuple () > #9 0x0000000000896656 in FunctionCall2Coll () > #10 0x00000000004acd55 in index_getnext_tid () > #11 0x00000000004acfad in index_getnext ()

There were seven processes with that exact backtrace (except that randomly they are sleeping in the spinloop). Something else interesting: autovacuum has been running all night as well. Unlike the other processes, however, cpu utilization does not register on top. I backtraced them as well. They are not identical. One of them looks like this:

#0  0x00007f89c1b21a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000006de073 in PGSemaphoreLock ()
#2  0x000000000074e398 in ProcWaitForSignal ()
#3  0x00000000007302cb in LockBufferForCleanup ()
#4  0x00000000004b699e in btvacuumpage ()
#5  0x00000000004b6627 in btvacuumscan ()
#6  0x00000000004b6389 in btbulkdelete ()
#7  0x00000000008968d7 in FunctionCall4Coll ()
#8  0x00000000004ad196 in index_bulk_delete ()
#9  0x000000000060b9ed in lazy_vacuum_index ()
#10 0x000000000060b019 in lazy_scan_heap ()
#11 0x0000000000609836 in lazy_vacuum_rel ()
#12 0x0000000000609383 in vacuum_rel ()
#13 0x0000000000607e4a in vacuum ()
#14 0x00000000006e2d40 in autovacuum_do_vac_analyze ()
#15 0x00000000006e2127 in do_autovacuum ()
#16 0x00000000006e0f19 in AutoVacWorkerMain ()
#17 0x00000000006e0b06 in StartAutoVacWorker ()
#18 0x00000000006f4fac in StartAutovacuumWorker ()
#19 0x00000000006f4816 in sigusr1_handler ()
#20 <signal handler called>
#21 0x00007f89c1b17dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#22 0x00000000006f0619 in ServerLoop ()
#23 0x00000000006efdec in PostmasterMain ()
#24 0x00000000006594ec in main ()

The other autovacuum process (0 cpu usage) looks like this:

#0  0x00007f89c1b17dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000008d3ac4 in pg_usleep ()
#2  0x0000000000750b69 in s_lock ()
#3  0x0000000000750844 in LWLockRelease ()
#4  0x0000000000730000 in LockBuffer ()
#5  0x00000000004b2db4 in _bt_relandgetbuf ()
#6  0x00000000004b7116 in _bt_moveright ()
#7  0x00000000004b6e02 in _bt_search ()
#8  0x00000000004b8006 in _bt_first ()
#9  0x00000000004b5aad in btgettuple ()
#10 0x0000000000896656 in FunctionCall2Coll ()
#11 0x00000000004acd55 in index_getnext_tid ()
#12 0x00000000004acfad in index_getnext ()
#13 0x00000000004ac120 in systable_getnext ()
#14 0x0000000000878ce4 in ScanPgRelation ()
#15 0x000000000087bcc4 in RelationReloadIndexInfo ()
#16 0x000000000087c1f1 in RelationClearRelation ()
#17 0x000000000087ca37 in RelationCacheInvalidate ()
#18 0x0000000000875fce in InvalidateSystemCaches ()
#19 0x0000000000741d81 in ReceiveSharedInvalidMessages ()
#20 0x0000000000876080 in AcceptInvalidationMessages ()
#21 0x0000000000745ce3 in LockRelationOid ()
#22 0x000000000049510b in relation_open ()
#23 0x0000000000495359 in heap_open ()
#24 0x0000000000608694 in vac_update_relstats ()
#25 0x000000000060bb98 in lazy_cleanup_index ()
#26 0x000000000060b0b6 in lazy_scan_heap ()
#27 0x0000000000609836 in lazy_vacuum_rel ()
#28 0x0000000000609383 in vacuum_rel ()
#29 0x0000000000607e4a in vacuum ()
#30 0x00000000006e2d40 in autovacuum_do_vac_analyze ()
#31 0x00000000006e2127 in do_autovacuum ()
#32 0x00000000006e0f19 in AutoVacWorkerMain ()
#33 0x00000000006e0b06 in StartAutoVacWorker ()
#34 0x00000000006f4fac in StartAutovacuumWorker ()
#35 0x00000000006f4816 in sigusr1_handler ()
#36 <signal handler called>
#37 0x00007f89c1b17dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#38 0x00000000006f0619 in ServerLoop ()
#39 0x00000000006efdec in PostmasterMain ()
#40 0x00000000006594ec in main ()

merlin
Merlin Moncure <mmoncure@gmail.com> writes: > There were seven process with that backtrace exact backtrace (except > that randomly they are sleeping in the spinloop). Something else > interesting: autovacuum has been running all night as well. Unlike > the other process however, cpu utilization does not register on top. > I backtraced them as well. They are not identical. One of them looks > like this: What are the autovac processes doing (according to pg_stat_activity)? regards, tom lane
On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> There were seven process with that backtrace exact backtrace (except >> that randomly they are sleeping in the spinloop). Something else >> interesting: autovacuum has been running all night as well. Unlike >> the other process however, cpu utilization does not register on top. >> I backtraced them as well. They are not identical. One of them looks >> like this: > > What are the autovac processes doing (according to pg_stat_activity)?

pid,running,waiting,query
27885,15:40:46.904039,f,autovacuum: VACUUM ANALYZE onesitepmc.propertyguestcard
7209,00:24:44.162684,f,"SELECT n.nspname as ""Schema"", c.relname as ""Name"", CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as ""Type"", pg_catalog.pg_get_userbyid(c.relowner) as ""Owner"" FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','v','m','S','f','') AND n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2;"
7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class
31282,15:28:29.070942,f,SELECT CDSReconcileRunTable(867)
31792,15:27:09.483365,f,SELECT CDSReconcileRunTable(874)
7135,00:26:08.977953,t,vacuum VERBOSE pg_class;
32725,15:24:46.008341,f,SELECT CDSReconcileRunTable(884)
32367,15:25:44.4206,f,SELECT CDSReconcileRunTable(881)
32492,15:25:31.27934,f,SELECT CDSReconcileRunTable(883)
31899,15:26:57.246415,f,SELECT CDSReconcileRunTable(875)
32368,15:25:44.418445,f,SELECT CDSReconcileRunTable(880)
7270,00:21:42.274104,f,"SELECT n.nspname as ""Schema"", c.relname as ""Name"", CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as ""Type"", pg_catalog.pg_get_userbyid(c.relowner) as ""Owner"" FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind IN ('r','v','m','S','f','') AND n.nspname <> 'pg_catalog' AND n.nspname <> 'information_schema' AND n.nspname !~ '^pg_toast' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 1,2;"
7403,00:00:00,f,"COPY ( select pid, now ( ) - query_start as running, waiting, query from pg_stat_activity ) TO STDOUT csv header ;"

notes:
*) SELECT CDSReconcileRunTable(xxx) are the replication processes. they are burning cpu.
*) SELECT n.nspname ... is via psql \d. they stick when invoked and do not respond to cancel. they are also burning cpu.
*) the autovacuum processes do not respond to cancel. however, one of them did respond earlier (tracing in from lazy_vacuum_index()). However, now it doesn't (upon being respawned via AV), with the same backtrace. The blocked manual vacuum verbose did cancel.

merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What are the autovac processes doing (according to pg_stat_activity)? > pid,running,waiting,query > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class Hah, I suspected as much. Is that the one that's stuck in LockBufferForCleanup, or the other one that's got a similar backtrace to all the user processes? regards, tom lane
On Wed, Jan 14, 2015 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> What are the autovac processes doing (according to pg_stat_activity)? > >> pid,running,waiting,query >> 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > Hah, I suspected as much. Is that the one that's stuck in > LockBufferForCleanup, or the other one that's got a similar > backtrace to all the user processes? Yes, it is; pg_class is coming from LockBufferForCleanup(). As you can see above, it has a shorter runtime. So it was killed off once about a half hour ago, which did not free up the logjam. However, AV spawned it again and now it does not respond to cancel. merlin
On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > Merlin Moncure <mmoncure@gmail.com> writes: > > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> What are the autovac processes doing (according to pg_stat_activity)? > > > pid,running,waiting,query > > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class It'd be interesting to know whether that vacuum gets very frequent semaphore wakeups. Could you strace it for a second or three? How did this perform < 9.4? Can you guess how many times these dynamic statements are planned? How many different relations are accessed in the dynamically planned queries? > Hah, I suspected as much. Is that the one that's stuck in > LockBufferForCleanup, or the other one that's got a similar backtrace > to all the user processes? Do you have a theory? Right now it primarily looks like contention on a single buffer due to the high number of dynamic statements, possibly made worse by the signalling between normal pinners and vacuum waiting for cleanup. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Merlin Moncure <mmoncure@gmail.com> writes: > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > can see above, it has a shorter runtime. So it was killed off once > about a half hour ago which did not free up the logjam. However, AV > spawned it again and now it does not respond to cancel. Interesting. That seems like there might be two separate issues at play. It's plausible that LockBufferForCleanup might be interfering with other attempts to scan the index, but then why wouldn't killing the AV have unstuck things? Anyway it's now seeming that Peter may have the right idea about where to look. regards, tom lane
On 2015-01-14 10:13:32 -0500, Tom Lane wrote: > Merlin Moncure <mmoncure@gmail.com> writes: > > Yes, it is pg_class is coming from LockBufferForCleanup (). As you > > can see above, it has a shorter runtime. So it was killed off once > > about a half hour ago which did not free up the logjam. However, AV > > spawned it again and now it does not respond to cancel. > > Interesting. That seems like there might be two separate issues at > play. It's plausible that LockBufferForCleanup might be interfering > with other attempts to scan the index, but then why wouldn't killing > the AV have unstuck things? LockBufferForCleanup() unfortunately isn't interruptible. I've every now and then seen vacuums being stuck for a long while, trying to acquire cleanup locks - IIRC I complained about that on list even. So autovac will only be cancelled when going to the next page. And even if the page ever gets a zero (well, one) refcount, by the time autovac is woken up via the semaphore, it'll often end up being used again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Hah, I suspected as much. Is that the one that's stuck in >> LockBufferForCleanup, or the other one that's got a similar backtrace >> to all the user processes? > Do you have a theory? Right now it primarily looks like contention on a > single buffer due to the high number of dynamic statements, possibly > made worse by the signalling between normal pinners and vacuum waiting > for cleanup. The readers shouldn't be contending with each other. It's certainly plausible enough that there's a steady stream of readers that would prevent LockBufferForCleanup from getting in, but that should not result in any readers getting blocked. So something's gotten broken in that area. regards, tom lane
On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: >> Merlin Moncure <mmoncure@gmail.com> writes: >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> What are the autovac processes doing (according to pg_stat_activity)? >> >> > pid,running,waiting,query >> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > It'd be interesting to know whether that vacuum gets very frequent > semaphore wakeups. Could you strace it for a second or three?

for 30 seconds+ it just looks like this:

mmoncure@mernix2 ~ $ sudo strace -p 7105
Process 7105 attached
semop(5701638, {{4, -1, 0}}, 1

all of the other processes are yielding out of the spinlock, for example:

select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout)

> How did this perform < 9.4?

this is a new project. However, I can run it vs an earlier version.

> Can you guess how many times these dynamic statements are planned? How many different relations are accessed in the dynamically planned queries?

only once or twice, and only a couple of tables. This is an operation that should only take a few seconds (inserting a few 10s of thousands of rows), but it has been blocked for many hours now. Usually it runs through taking a few seconds. This is either a deadlock or a deadlock-emulating sequence of operations.

I'll try to pull the commits that Peter suggested and see if that helps (I'm getting ready to bring the database down). I can send the code off-list if you guys think it'd help.

merlin
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2015-01-14 10:05:01 -0500, Tom Lane wrote: > >> Merlin Moncure <mmoncure@gmail.com> writes: > >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> >> What are the autovac processes doing (according to pg_stat_activity)? > >> > >> > pid,running,waiting,query > >> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class > > > > It'd be interesting to know whether that vacuum gets very frequent > > semaphore wakeups. Could you strace it for a second or three? > > for 30 seconds+ it just looks like this: > mmoncure@mernix2 ~ $ sudo strace -p 7105 > Process 7105 attached > semop(5701638, {{4, -1, 0}}, 1 Ok. So that explains why it's not interruptible. > all of other processes are yielding out of the spinlock, for example: > select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout) Note the above isn't the spinlock, it's the process's semaphore. It'll only get set if the refcount ever indicates that nobody but autovac is holding the lock. > > How did this perform < 9.4? > this is a new project. However, I can run it vs earlier version. > > Can you guess how many times these dynamic > > statements are planned? How many different relations are accessed in the > > dynamically planned queries? > > only once or twice, and only a couple of tables. Hm. Odd. The first -g profile seemed to indicate a hell of a lot time was spent in LWLockRelease() - indicating that there's actually progress. Later profiles/backtraces were less clear. If you gdb in, and type 'fin' a couple times, to wait till the function finishes, is there actually any progress? I'm wondering whether it's just many catalog accesses + contention, or some other problem. Alternatively set a breakpoint on ScanPgRelation() or so and see how often it's hit. > I can send the code off-list if you guys think it'd help. Might be interesting. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: > If you gdb in, and type 'fin' a couple times, to wait till the function > finishes, is there actually any progress? I'm wondering whether it's > just many catalog accesses + contention, or some other > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > see how often it's hit.

well, i restarted the database, forgetting my looper was running, which immediately spun up and it got stuck again with a similar profile (lots of cpu in spinlock):

Samples: 3K of event 'cycles', Event count (approx.): 2695723228
 31.16%  postgres      [.] s_lock
 22.32%  postgres      [.] tas
 12.13%  postgres      [.] tas
  5.93%  postgres      [.] spin_delay
  5.69%  postgres      [.] LWLockRelease
  3.75%  postgres      [.] LWLockAcquireCommon
  3.61%  perf          [.] 0x00000000000526c4
  2.51%  postgres      [.] FunctionCall2Coll
  1.48%  libc-2.17.so  [.] 0x000000000016a132

> If you gdb in, and type 'fin' a couple times,

(gdb) fin
Run till exit from #0 0x00007ff4c63f7a97 in semop () from /lib/x86_64-linux-gnu/libc.so.6
0x00000000006de073 in PGSemaphoreLock ()
(gdb) fin
Run till exit from #0 0x00000000006de073 in PGSemaphoreLock ()

It returned once. Second time, it didn't, at least so far (a minute or so).

>> I can send the code off-list if you guys think it'd help.
>
> Might be interesting.

sent (off-list).

merlin
On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: > On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > If you gdb in, and type 'fin' a couple times, to wait till the function > > finishes, is there actually any progress? I'm wondering whether it's > > just many catalog accesses + contention, or some other > > problem. Alternatively set a breakpoint on ScanPgRelation() or so and > > see how often it's hit. > > well, i restarted the database, forgetting my looper was running which > immediately spun up and it got stuck again with a similar profile > (lots of cpu in spinlock): > > Samples: 3K of event 'cycles', Event count (approx.): 2695723228 > 31.16% postgres [.] s_lock > 22.32% postgres [.] tas > 12.13% postgres [.] tas > 5.93% postgres [.] spin_delay > 5.69% postgres [.] LWLockRelease > 3.75% postgres [.] LWLockAcquireCommon > 3.61% perf [.] 0x00000000000526c4 > 2.51% postgres [.] FunctionCall2Coll > 1.48% libc-2.17.so [.] 0x000000000016a132 > > > If you gdb in, and type 'fin' a couple times, > (gdb) fin > Run till exit from #0 0x00007ff4c63f7a97 in semop () from > /lib/x86_64-linux-gnu/libc.so.6 > 0x00000000006de073 in PGSemaphoreLock () > (gdb) fin > Run till exit from #0 0x00000000006de073 in PGSemaphoreLock () > > It returned once. Second time, it didn't at least so far (minute or so). Hm, that's autovac though, not the normal user backends that actually do stuff, right? If you could additionally check those, it'd be great. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 14, 2015 at 7:22 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > I'll try to pull commits that Peter suggested and see if that helps > (I'm getting ready to bring the database down). I can send the code > off-list if you guys think it'd help. Thanks for the code! I think it would be interesting to see if the bug reproduces without the page split commit (40dae7ec537c5619fc93ad602c62f37be786d161) first (which was applied afterwards). Then, without the page deletion commit (efada2b8e920adfdf7418862e939925d2acd1b89). But without the page split commit seems most interesting at the moment. It would be easier to diagnose the issue if the bug still reproduces without that commit - that way, we can be reasonably confident that there are no confounding factors from the new page split code. There were some bugfixes to those two during the 9.4 beta cycle, too. It might be a bit tricky to generate a 9.4 that lacks one or both of those commits. Bugfix commits in reverse chronological order: c73669c0e0168923e3f9e787beec980f55af2bd8 (deletion) c91a9b5a285e20e54cf90f3660ce51ce3a5c2ef4 (incomplete split) 4fafc4ecd9e4d224d92c4a8549c5646860787a5d (deletion) 4a5d55ec2b711e13438a32d119a809a22ced410b (incomplete split) 77fe2b6d795f3f4ed282c9c980920e128a57624e (deletion) 7d98054f0dd115f57ad0ec1f424a66c13459013b (deletion) 954523cdfe229f1cb99a43a19e291a557ae2822d (incomplete split) I think that's all of them (apart from the original commits, of course). -- Peter Geoghegan
On Wed, Jan 14, 2015 at 9:49 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2015-01-14 09:47:19 -0600, Merlin Moncure wrote: >> On Wed, Jan 14, 2015 at 9:30 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> > If you gdb in, and type 'fin' a couple times, to wait till the function >> > finishes, is there actually any progress? I'm wondering whether it's >> > just many catalog accesses + contention, or some other >> > problem. Alternatively set a breakpoint on ScanPgRelation() or so and >> > see how often it's hit. >> >> well, i restarted the database, forgetting my looper was running which >> immediately spun up and it got stuck again with a similar profile >> (lots of cpu in spinlock): >> >> Samples: 3K of event 'cycles', Event count (approx.): 2695723228 >> 31.16% postgres [.] s_lock >> 22.32% postgres [.] tas >> 12.13% postgres [.] tas >> 5.93% postgres [.] spin_delay >> 5.69% postgres [.] LWLockRelease >> 3.75% postgres [.] LWLockAcquireCommon >> 3.61% perf [.] 0x00000000000526c4 >> 2.51% postgres [.] FunctionCall2Coll >> 1.48% libc-2.17.so [.] 0x000000000016a132 >> >> > If you gdb in, and type 'fin' a couple times, >> (gdb) fin >> Run till exit from #0 0x00007ff4c63f7a97 in semop () from >> /lib/x86_64-linux-gnu/libc.so.6 >> 0x00000000006de073 in PGSemaphoreLock () >> (gdb) fin >> Run till exit from #0 0x00000000006de073 in PGSemaphoreLock () >> >> It returned once. Second time, it didn't at least so far (minute or so). > > Hm, that's autovac though, not the normal user backends that actually do > stuff, right? If you could additionally check those, it'd be great.

got it to reproduce. had to take a break to get some work done.

(gdb) fin
Run till exit from #0 0x00007f7b07099dc3 in select () from /lib/x86_64-linux-gnu/libc.so.6
0x00000000008d3ac4 in pg_usleep ()
(gdb) fin
Run till exit from #0 0x00000000008d3ac4 in pg_usleep ()
0x0000000000750b69 in s_lock ()
(gdb) fin
Run till exit from #0 0x0000000000750b69 in s_lock ()
0x0000000000750844 in LWLockRelease ()
(gdb) fin
Run till exit from #0 0x0000000000750844 in LWLockRelease ()
0x0000000000730000 in LockBuffer ()
(gdb) fin
Run till exit from #0 0x0000000000730000 in LockBuffer ()
0x00000000004b2db4 in _bt_relandgetbuf ()
(gdb) fin
Run till exit from #0 0x00000000004b2db4 in _bt_relandgetbuf ()
0x00000000004b7116 in _bt_moveright ()
(gdb) fin
Run till exit from #0 0x00000000004b7116 in _bt_moveright ()
....

so it looks like nobody ever exits from _bt_moveright. any last requests before I start bisecting down?

merlin
On Wed, Jan 14, 2015 at 11:49 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > so it looks like nobody ever exits from _bt_moveright. any last > requests before I start bisecting down? Could you write some code to print out the block number (i.e. "BlockNumber blkno") if there are more than, say, 5 retries within _bt_moveright()? You'll also want to print out the name of the index (so rel->rd_rel->relname.data). That'll probably only trip when this happens (5 retries are very unusual). Then, use contrib/pageinspect to dump that block's contents (I think that'll work, since no one sits on a buffer lock here). I'd like to know what index we're dealing with here. I wonder if it's "pg_attribute_relid_attnam_index". So: call "bt_page_stats('pg_whatever', n)", n being the relevant block number. Try and do the same with the right link (which bt_page_stats() will show). Then try and call bt_page_items() against one or both of those blocks/pages. I'd like to see what that shows. I'm wondering if some invariant has been violated. Which _bt_moveright() caller are we talking about here, BTW? I guess it's just the _bt_first()/ _bt_search() one. That might vary, of course. -- Peter Geoghegan
On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan <pg@heroku.com> wrote: > Could you write some code to print out the block number (i.e. > "BlockNumber blkno") if there are more than, say, 5 retries within > _bt_moveright()? Obviously I mean that the block number should be printed, no matter whether or not the P_INCOMPLETE_SPLIT() path is taken or not. So you should just move this to the top of the for(;;) loop, so it's always available to print: BlockNumber blkno = BufferGetBlockNumber(buf); Thanks -- Peter Geoghegan
On Wed, Jan 14, 2015 at 2:32 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Jan 14, 2015 at 12:24 PM, Peter Geoghegan <pg@heroku.com> wrote: >> Could you write some code to print out the block number (i.e. >> "BlockNumber blkno") if there are more than, say, 5 retries within >> _bt_moveright()? > > Obviously I mean that the block number should be printed, no matter > whether or not the P_INCOMPLETE_SPLIT() path is taken or not. So you > should just move this to the top of the for(;;) loop, so it's always > available to print: > > BlockNumber blkno = BufferGetBlockNumber(buf);

(gdb) print BufferGetBlockNumber(buf)
$15 = 9

..and it stays 9, continuing several times after having set a breakpoint.

merlin
On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > (gdb) print BufferGetBlockNumber(buf) > $15 = 9 > > ..and it stays 9, continuing several times having set breakpoint. And the index involved? I'm pretty sure that this is an internal page, no? -- Peter Geoghegan
On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> (gdb) print BufferGetBlockNumber(buf) >> $15 = 9 >> >> ..and it stays 9, continuing several times having set breakpoint. > > And the index involved? I'm pretty sure that this is an internal page, no?

The index is the oid index on pg_class. Some more info:

*) temp table churn is fairly high. Several dozen get spawned and destroyed at the start of a replication run, all at once, due to some dodgy coding via dblink. During the replication run, the temp table churn rate drops.

*) running btreecheck, I see:

cds2=# select bt_index_verify('pg_class_oid_index');
NOTICE: page 7 of index "pg_class_oid_index" is deleted
NOTICE: page 10 of index "pg_class_oid_index" is deleted
NOTICE: page 12 of index "pg_class_oid_index" is deleted
 bt_index_verify
─────────────────

cds2=# select bt_leftright_verify('pg_class_oid_index');
WARNING: left link/right link pair don't comport at level 0, block 9, last: 2, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
WARNING: left link/right link pair don't comport at level 0, block 9, last: 9, current left: 4
[repeat infinity until cancel]

which looks like the index is corrupted? ISTM _bt_moveright is hanging because it's trying to move from block 9 to block 9 and so loops forever.

merlin
On Wed, Jan 14, 2015 at 4:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroted at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the replication run, the temp table > churn rate drops. > > *) running btreecheck, I see: I knew I wrote that tool for a reason. :-) It would be great if I could get a raw dump of the index using contrib/pageinspect at this juncture too, as already described. -- Peter Geoghegan
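For reference, that kind of raw dump needs nothing beyond the stock contrib/pageinspect functions. A minimal sketch aimed at the block the stuck backends keep landing on (block 9 of pg_class_oid_index, per the gdb session above); the dumps downthread come from the same functions, run over every block of the index:

create extension if not exists pageinspect;

select * from bt_metap('pg_class_oid_index');
select * from bt_page_stats('pg_class_oid_index', 9);
select * from bt_page_items('pg_class_oid_index', 9);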
On Wed, Jan 14, 2015 at 6:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Jan 14, 2015 at 5:39 PM, Peter Geoghegan <pg@heroku.com> wrote: >> On Wed, Jan 14, 2015 at 3:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> (gdb) print BufferGetBlockNumber(buf) >>> $15 = 9 >>> >>> ..and it stays 9, continuing several times having set breakpoint. >> >> And the index involved? I'm pretty sure that this is an internal page, no? > > The index is the oid index on pg_class. Some more info: > > *) temp table churn is fairly high. Several dozen get spawned and > destroyed at the start of a replication run, all at once, due to some > dodgy coding via dblink. During the replication run, the temp table > churn rate drops. > > *) running btreecheck, I see: > cds2=# select bt_index_verify('pg_class_oid_index'); > NOTICE: page 7 of index "pg_class_oid_index" is deleted > NOTICE: page 10 of index "pg_class_oid_index" is deleted > NOTICE: page 12 of index "pg_class_oid_index" is deleted > bt_index_verify > ───────────────── > > cds2=# select bt_leftright_verify('pg_class_oid_index'); > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 2, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > WARNING: left link/right link pair don't comport at level 0, block 9, > last: 9, current left: 4 > [repeat infinity until cancel] > > which looks like the index is corrupted? ISTM _bt_moveright is > hanging because it's trying to move from block 9 to block 9 and so > loops forever.

per Peter the following might be useful:

cds2=# select * from bt_metap('pg_class_oid_index');
 magic  │ version │ root │ level │ fastroot │ fastlevel
────────┼─────────┼──────┼───────┼──────────┼───────────
 340322 │       2 │    3 │     1 │        3 │         1

cds2=# select (bt_page_stats('pg_class_oid_index', s)).* from generate_series(1,12) s;
 blkno │ type │ live_items │ dead_items │ avg_item_size │ page_size │ free_size │ btpo_prev │ btpo_next │ btpo  │ btpo_flags
───────┼──────┼────────────┼────────────┼───────────────┼───────────┼───────────┼───────────┼───────────┼───────┼────────────
     1 │ l    │        119 │          0 │            16 │      8192 │      5768 │         0 │         4 │     0 │          1
     2 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
     3 │ r    │          8 │          0 │            15 │      8192 │      7996 │         0 │         0 │     1 │          2
     4 │ l    │        178 │          0 │            16 │      8192 │      4588 │         1 │         2 │     0 │          1
     5 │ l    │          7 │          0 │            16 │      8192 │      8008 │         9 │        11 │     0 │          1
     6 │ l    │          5 │          0 │            16 │      8192 │      8048 │        11 │         8 │     0 │          1
     7 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
     8 │ l    │        187 │          0 │            16 │      8192 │      4408 │         6 │         0 │     0 │          1
     9 │ l    │         25 │          0 │            16 │      8192 │      7648 │         4 │         9 │     0 │          1
    10 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 12366 │          0
    11 │ l    │          6 │          0 │            16 │      8192 │      8028 │         5 │         6 │     0 │          1
    12 │ d    │          0 │          0 │             0 │      8192 │         0 │        -1 │        -1 │ 10731 │          0

merlin
Attachment
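One thing that jumps out of the bt_page_stats output above: block 9's right link (btpo_next) points back at block 9 itself, which is exactly what would make _bt_moveright spin. A rough pageinspect sketch to flag that kind of self-referencing sibling link (it assumes the 1-12 block range shown above; adjust generate_series for a larger index):

select s as blkno, p.btpo_prev, p.btpo_next
from generate_series(1, 12) s,
     lateral bt_page_stats('pg_class_oid_index', s) p
where p.type <> 'd'                              -- skip deleted pages
  and (p.btpo_next = s or p.btpo_prev = s);      -- a page should never be its own sibling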
This is great, but it's not exactly clear which bt_page_items() page is which - some are skipped, but I can't be sure which. Would you mind rewriting that query to indicate which block is under consideration by bt_page_items()? Thanks -- Peter Geoghegan
On Wed, Jan 14, 2015 at 6:50 PM, Peter Geoghegan <pg@heroku.com> wrote:
> This is great, but it's not exactly clear which bt_page_items() page
> is which - some are skipped, but I can't be sure which. Would you mind
> rewriting that query to indicate which block is under consideration by
> bt_page_items()?

yeah. via:

cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv header;

merlin
Attachment
On Wed, Jan 14, 2015 at 4:53 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > yeah. via: > cds2=# \copy (select s as page, (bt_page_items('pg_class_oid_index', > s)).* from generate_series(1,12) s) to '/tmp/page_items.csv' csv > header; My immediate observation here is that blocks 2 and 9 have identical metadata (from their page opaque area), but partially non-matching data items (however, the number of items on each block is consistent and correct according to that metadata, as far as it goes). I think block 9 is supposed to have a right-link to block 5 (block 5 seems to think so, at least -- its left link is 9). So now the question is: how did that inconsistency arise? It didn't necessarily arise at the time of the (presumed) split of block 2 to create 9. It could be that the opaque area was changed by something else, some time later. I'll investigate more. -- Peter Geoghegan
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan <pg@heroku.com> wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that metadata, as far as it goes). I think > block 9 is supposed to have a right-link to block 5 (block 5 seems to > think so, at least -- its left link is 9). For the convenience of others, here is a version with a normalized data column added (interpreting the datums as little-endian Oids). -- Peter Geoghegan
Attachment
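For anyone repeating that normalization step, a minimal SQL sketch (it assumes the bt_page_items() "data" column looks like 'a4 11 01 00 00 00 00 00', i.e. a 4-byte Oid key in little-endian order followed by alignment padding, and that the Oid values fit in a signed int4):

select itemoffset,
       ('x' || split_part(data, ' ', 4)
            || split_part(data, ' ', 3)
            || split_part(data, ' ', 2)
            || split_part(data, ' ', 1))::bit(32)::int as oid_value
from bt_page_items('pg_class_oid_index', 9);
-- Reverses the little-endian byte order of the first four data bytes and
-- reinterprets them as an integer; e.g. '24 11 01 00 ...' decodes to 69924.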
On Wed, Jan 14, 2015 at 5:23 PM, Peter Geoghegan <pg@heroku.com> wrote: > My immediate observation here is that blocks 2 and 9 have identical > metadata (from their page opaque area), but partially non-matching > data items (however, the number of items on each block is consistent > and correct according to that metadata, as far as it goes). I think > block 9 is supposed to have a right-link to block 5 (block 5 seems to > think so, at least -- its left link is 9). I am mistaken on one detail here - blocks 2 and 9 are actually fully identical. I still have no idea why, though. -- Peter Geoghegan
On Wed, Jan 14, 2015 at 8:50 PM, Peter Geoghegan <pg@heroku.com> wrote: > I am mistaken on one detail here - blocks 2 and 9 are actually fully > identical. I still have no idea why, though. So, I've looked at it in more detail and it appears that the page of block 2 split at some point, thereby creating a new page (the block 9 page). There is a sane downlink in the root page for the new rightlink page. The root page looks totally sane, as does every other page - as I said, the problem is only that block 9 is spuriously identical to block 2. So the (correct) downlink in the root page, to block 9, is the same as the (incorrect) high key value in block 9 - Oid value 69924. To be clear: AFAICT everything is perfect except block 9, which is bizarrely identical to block 2. Now, since the sane page downlink located in the root (like every downlink, a lower bound on items in its child) is actually a copy of the high key on the page that is the child's left link (that is to say, it comes from the original target of a page split - it shares the target's high key value, Oid value 69924), there may have never been sane data in block 9, even though its downlink is sane (so maybe the page split patch is implicated). But it's hard to see how that could be true. The relevant code wasn't really what was changed about page splits in 9.4 anyway (plus this wasn't a non-leaf split, since there aren't enough pages for those to be a factor). There just isn't that many items on page 2 (or its bizarre identical twin, page 9), so a recent split seems unlikely. And, the target and new right page are locked together throughout both the split and down link insertion (even though there are two atomic operations/WAL inserts). So to reiterate, a close by page split that explains the problem seems unlikely. I'm going to focus on the page deletion patch for the time being. Merlin - it would be great if you could revert all the page split commits (which came after the page deletion fix). All the follow-up page split commits [1] were fairly straightforward bugs with recovery, so it should be easy enough to totally remove the page split stuff from 9.4 for the purposes of isolating the bug. [1] http://www.postgresql.org/message-id/CAM3SWZSpJ6M9HfHksjUiUof30aUWXyyb56FJbW1_doGQKbEO+g@mail.gmail.com -- Peter Geoghegan
On 01/15/2015 03:23 AM, Peter Geoghegan wrote: > So now the question is: how did that inconsistency arise? It didn't > necessarily arise at the time of the (presumed) split of block 2 to > create 9. It could be that the opaque area was changed by something > else, some time later. I'll investigate more. Merlin, could you re-run the test with a WAL archive (if you don't have one already), and then run pg_xlogdump, filtering it to show only the changes to the index? That should show us how the index got to be the way it is. Also, if you could post a copy of the raw relation file for pg_class_oid_index; I assume it's not too large. Something like: pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd cluster. In case it's not the same on your system, you can use oid2name to find it out. - Heikki
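As an aside, the relfilenode (and the path of the raw relation file, relative to the data directory) can also be confirmed from SQL instead of oid2name; a quick sketch:

select pg_relation_filenode('pg_class_oid_index') as relfilenode,
       pg_relation_filepath('pg_class_oid_index') as file_path;
-- relfilenode is the number to grep for in the pg_xlogdump output; file_path
-- points at the raw relation file to copy and post.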
On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >> >> So now the question is: how did that inconsistency arise? It didn't >> necessarily arise at the time of the (presumed) split of block 2 to >> create 9. It could be that the opaque area was changed by something >> else, some time later. I'll investigate more. > > > Merlin, could you re-run the test with a WAL archive (if you don't have one > already), and then run pg_xlogdump, filtering it to show only the changes to > the index? That should show us how the index got to be the way it is. Also, > if you could post a copy of the raw relation file for pg_class_oid_index; I > assume it's not too large. > > Something like: > > pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 > > 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd > cluster. In case it's not the same on your system, you can use oid2name to > find it out. I'm on it. Will try this first, then patch removal. Question: Coming in this morning I did an immediate restart and logged into the database and queried pg_class via index. Everything was fine, and the leftright verify returns nothing. How did it repair itself without a reindex? merlin
On Thu, Jan 15, 2015 at 6:02 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? Maybe the relevant WAL records are more or less correct, and recovery restores the database to its correct state, as opposed to the state it was actually in at the time of the crash due to a race condition or whatever. Could you possibly send bt_page_items() against block 9 of the index after recovery? It probably looks totally sane, but it would be nice to be sure, if you happen to still have the data. -- Peter Geoghegan
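For reference, that check is just the block-9 call of the earlier pageinspect query:

select * from bt_page_items('pg_class_oid_index', 9);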
On Thu, Jan 15, 2015 at 8:02 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >>> >>> So now the question is: how did that inconsistency arise? It didn't >>> necessarily arise at the time of the (presumed) split of block 2 to >>> create 9. It could be that the opaque area was changed by something >>> else, some time later. I'll investigate more. >> >> >> Merlin, could you re-run the test with a WAL archive (if you don't have one >> already), and then run pg_xlogdump, filtering it to show only the changes to >> the index? That should show us how the index got to be the way it is. Also, >> if you could post a copy of the raw relation file for pg_class_oid_index; I >> assume it's not too large. >> >> Something like: >> >> pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 >> >> 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd >> cluster. In case it's not the same on your system, you can use oid2name to >> find it out. > > I'm on it. Will try this first, then patch removal. > > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? The plot thickens! I looped the test, still stock 9.4 as of this time and went to lunch. When I came back, the database was in recovery mode. Here is the rough sequence of events. 1) REINDEXED pg_class (wanted clean slate for full reproduction) 2) before too long (unfortunately did not configure timestamps in the log) starting seeing: ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 at character 8 QUERY: UPDATE CDSRunTable SET Finished = clock_timestamp(), DidSucceed = _DidSucceed, ErrorMessage= _ErrorMessage WHERE CDSRunTableId = _CdsRunTableId CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement STATEMENT: SELECT CDSReconcileRunTable(2020) ..and again with a FATAL FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1 ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 CONTEXT: SQL statement "UPDATE CDSRunTable SET Finished = clock_timestamp(), DidSucceed = _DidSucceed, ErrorMessage = _ErrorMessage WHERE CDSRunTableId = _CdsRunTableId" PL/pgSQL function cdsreconcileruntable(bigint)line 82 at SQL statement 3) shortly (?) after that, I got: WARNING: did not find subXID 14955 in MyProc CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup WARNING: you don't own a lock of type RowExclusiveLock CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup LOG: could not send data to client: Broken pipe CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup STATEMENT: SELECT CDSReconcileRunTable(2151) WARNING: ReleaseLockIfHeld: failed?? CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup ERROR: failed to re-find shared proclock object CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup STATEMENT: SELECT CDSReconcileRunTable(2151) WARNING: AbortSubTransaction while in ABORT state WARNING: did not find subXID 14955 in MyProc WARNING: you don't own a lock of type AccessShareLock WARNING: ReleaseLockIfHeld: failed?? 
ERROR: failed to re-find shared proclock object WARNING: AbortSubTransaction while in ABORT state WARNING: did not find subXID 14955 in MyProc WARNING: you don't own a lock of type AccessShareLock WARNING: ReleaseLockIfHeld: failed?? WARNING: you don't own a lock of type ShareLock TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 1240) LOG: server process (PID 10117) was terminated by signal 6: Aborted LOG: terminating any other active server processes 4) while I was still at lunch, the database started up again, and quickly fell over: LOG: database system is ready to accept connections LOG: autovacuum launcher started ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2268) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2270) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2272) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2273) TRAP: FailedAssertion("!(( ((void) ((bool) ((! assert_enabled) || ! (!(((bool) (((const void*)(&htup->t_ctid) != ((void *)0)) && ((&htup->t_ctid)->ip_posid != 0) LOG: server process (PID 30520) was terminated by signal 6: Aborted LOG: terminating any other active server processes this repeated a very large number of times. Occasionally, but infrequently, I'd see during the database startup something like: LOG: unexpected pageaddr 50/1F2D8000 in log segment 0000000100000051000000E6, offset 2981888 during startup. the address would change sometimes: LOG: unexpected pageaddr 50/1F2DA000 in log segment 0000000100000051000000E6, offset 2990080 Basically, what is happening here is that any attempt to connect to cds database fails with assertion (I can connect to postgres ok). On Thu, Jan 15, 2015 at 12:47 PM, Peter Geoghegan <pg@heroku.com> wrote: > Maybe the relevant WAL records are more or less correct, and recovery > restores the database to its correct state, as opposed to the state it > was actually in at the time of the crash due to a race condition or > whatever. Could you possibly send bt_page_items() against block 9 of > the index after recovery? It probably looks totally sane, but it would > be nice to be sure, if you happen to still have the data. Too late: I reindexed it. going back and reviewing the log however, I did notice: ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 at character 8 in the log before the reindex was thrown. merlin
Hi, > The plot thickens! I looped the test, still stock 9.4 as of this time > and went to lunch. When I came back, the database was in recovery > mode. Here is the rough sequence of events. > Whoa. That looks scary. Did you see (some of) those errors before? Most of them should have been emitted independently of being built with assertions. > 1) REINDEXED pg_class (wanted clean slate for full reproduction) > 2) before too long (unfortunately did not configure timestamps in the > log) starting seeing: > > ERROR: root page 3 of index "pg_class_oid_index" has level 0, > expected 1 at character 8 > QUERY: UPDATE CDSRunTable SET > Finished = clock_timestamp(), > DidSucceed = _DidSucceed, > ErrorMessage = _ErrorMessage > WHERE CDSRunTableId = _CdsRunTableId > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at > SQL statement > STATEMENT: SELECT CDSReconcileRunTable(2020) > > ..and again with a FATAL > FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1 > ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 > CONTEXT: SQL statement "UPDATE CDSRunTable SET > Finished = clock_timestamp(), > DidSucceed = _DidSucceed, > ErrorMessage = _ErrorMessage > WHERE CDSRunTableId = _CdsRunTableId" > PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement Just to be absolutely clear, those happened *before* the following errors? And there were no 'during exception cleanup' like errors before? > 3) shortly (?) after that, I got: > > WARNING: did not find subXID 14955 in MyProc > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > WARNING: you don't own a lock of type RowExclusiveLock > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > LOG: could not send data to client: Broken pipe > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > STATEMENT: SELECT CDSReconcileRunTable(2151) > WARNING: ReleaseLockIfHeld: failed?? > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > ERROR: failed to re-find shared proclock object > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > STATEMENT: SELECT CDSReconcileRunTable(2151) > WARNING: AbortSubTransaction while in ABORT state > WARNING: did not find subXID 14955 in MyProc > WARNING: you don't own a lock of type AccessShareLock > WARNING: ReleaseLockIfHeld: failed?? > ERROR: failed to re-find shared proclock object > WARNING: AbortSubTransaction while in ABORT state > WARNING: did not find subXID 14955 in MyProc > WARNING: you don't own a lock of type AccessShareLock > WARNING: ReleaseLockIfHeld: failed?? > WARNING: you don't own a lock of type ShareLock > TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > > 0)", File: "lock.c", Line: 1240) > LOG: server process (PID 10117) was terminated by signal 6: Aborted > LOG: terminating any other active server processes Ick. Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log entries before? It's hard to know from here, but the 'during exception cleanup' indicates a problem in abort handling. Were there any deadlock detected errors closeby? You're catching deadlock errors in a subtransaction. Hm. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2015-01-15 20:15:42 +0100, Andres Freund wrote: > > WARNING: did not find subXID 14955 in MyProc > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > WARNING: you don't own a lock of type RowExclusiveLock > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > LOG: could not send data to client: Broken pipe > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > STATEMENT: SELECT CDSReconcileRunTable(2151) > > WARNING: ReleaseLockIfHeld: failed?? > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > ERROR: failed to re-find shared proclock object > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > STATEMENT: SELECT CDSReconcileRunTable(2151) > > WARNING: AbortSubTransaction while in ABORT state > > WARNING: did not find subXID 14955 in MyProc > > WARNING: you don't own a lock of type AccessShareLock > > WARNING: ReleaseLockIfHeld: failed?? > > ERROR: failed to re-find shared proclock object > > WARNING: AbortSubTransaction while in ABORT state > > WARNING: did not find subXID 14955 in MyProc > > WARNING: you don't own a lock of type AccessShareLock > > WARNING: ReleaseLockIfHeld: failed?? > > WARNING: you don't own a lock of type ShareLock > > TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > > > 0)", File: "lock.c", Line: 1240) > > LOG: server process (PID 10117) was terminated by signal 6: Aborted > > LOG: terminating any other active server processes > > Ick. > > Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log > entries before? It's hard to know from here, but the 'during exception > cleanup' indicates a problem in abort handling. Were there any deadlock > detected errors closeby? Alternatively were there any 'LOG: CdsRunTableId %s Failed' messages? If so, what was the cause? > You're catching deadlock errors in a subtransaction. Hm. A couple questions: * Do you also use lock_timeout/statement_timeout? If so, what are their settings + deadlock_timeout? * were any processes killed at that time? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 15, 2015 at 1:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
>> The plot thickens! I looped the test, still stock 9.4 as of this time
>> and went to lunch. When I came back, the database was in recovery
>> mode. Here is the rough sequence of events.
>
> Whoa. That looks scary. Did you see (some of) those errors before? Most
> of them should have been emitted independently of being built with
> assertions.

Nope.

>> 1) REINDEXED pg_class (wanted clean slate for full reproduction)
>> 2) before too long (unfortunately did not configure timestamps in the
>> log) started seeing:
>>
>> ERROR: root page 3 of index "pg_class_oid_index" has level 0,
>> expected 1 at character 8
>> QUERY: UPDATE CDSRunTable SET
>> Finished = clock_timestamp(),
>> DidSucceed = _DidSucceed,
>> ErrorMessage = _ErrorMessage
>> WHERE CDSRunTableId = _CdsRunTableId
>> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
>> SQL statement
>> STATEMENT: SELECT CDSReconcileRunTable(2020)
>>
>> ..and again with a FATAL
>> FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1
>> ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1
>> CONTEXT: SQL statement "UPDATE CDSRunTable SET
>> Finished = clock_timestamp(),
>> DidSucceed = _DidSucceed,
>> ErrorMessage = _ErrorMessage
>> WHERE CDSRunTableId = _CdsRunTableId"
>> PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement
>
> Just to be absolutely clear, those happened *before* the following
> errors? And there were no 'during exception cleanup' like errors before?

correct. unfortunately, I wasn't logging times so I'm not 100% sure if
the 'root page' error happened during the event or sometime before it.
I'm addressing that with the log prefix.

> Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
> entries before? It's hard to know from here, but the 'during exception
> cleanup' indicates a problem in abort handling. Were there any deadlock
> detected errors close by?
>
> You're catching deadlock errors in a subtransaction. Hm.

yup, exactly 1:
WARNING: LOG: Handling deadlock detected on CdsRunTableId 1626

Based on the id, that is proven to be before the crashing started,
although at least one manual server restart happened in the interim.

Since it's possible the database is a loss, do you see any value in
bootstrapping it again with checksums turned on? One point of note is
that this is a brand spanking new SSD, maybe we need to rule out
hardware-based corruption?

> Alternatively were there any 'LOG: CdsRunTableId %s Failed' messages?
> If so, what was the cause?

There was one, but unfortunately since the error was trapped the
relevant detail to catch the other side of the deadlock is missed.
Barring this analysis, I'm inclined to remove that guard: it was mostly
there to deal with what turned out to be some bugs in the code
(specifically a nonconstrained update on CDSTable).

> * Do you also use lock_timeout/statement_timeout?

not on this database -- this is a coding environment and not set up
with standard configuration

> were any processes killed at that time?

later on, yes, by manual restart, and the query was cancelled first: to wit:

WARNING: LOG: Handling deadlock detected on CdsRunTableId 1626
ERROR: canceling statement due to user request
STATEMENT: SELECT CDSReconcileRunTable(1626)

followed by restart -m fast

merlin
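The exact prefix Merlin set isn't shown in the thread; a hypothetical log_line_prefix along these lines would produce the bracketed "[dbname pid timestamp line]" entries that appear in the later log excerpts (log_line_prefix only needs a reload, not a restart):

alter system set log_line_prefix = '[%d %p %m %l]';  -- %d database, %p PID, %m timestamp with ms, %l session line number
select pg_reload_conf();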
On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Since it's possible the database is a loss, do you see any value in > bootstrappinng it again with checksums turned on? One point of note > is that this is a brand spanking new SSD, maybe we nee to rule out > hardware based corruption? hm! I bootstrapped a new database with checksums on, and lo, [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page verification failed, calculated checksum 28520 but expected 28541 [cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT: SQL statement "COPY lease_delete FROM '/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER" merlin
On Thu, Jan 15, 2015 at 4:03 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> Since it's possible the database is a loss, do you see any value in >> bootstrappinng it again with checksums turned on? One point of note >> is that this is a brand spanking new SSD, maybe we nee to rule out >> hardware based corruption? > > hm! I bootstrapped a new database with checksums on, and lo, > > [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page > verification failed, calculated checksum 28520 but expected 28541 > [cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT: SQL statement > "COPY lease_delete FROM > '/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER" Running this test on another set of hardware to verify -- if this turns out to be a false alarm which it may very well be, I can only offer my apologies! I've never had a new drive fail like that, in that manner. I'll burn the other hardware in overnight and report back. merlin
On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Running this test on another set of hardware to verify -- if this > turns out to be a false alarm which it may very well be, I can only > offer my apologies! I've never had a new drive fail like that, in > that manner. I'll burn the other hardware in overnight and report > back. Thanks. -- Peter Geoghegan
On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> Running this test on another set of hardware to verify -- if this >> turns out to be a false alarm which it may very well be, I can only >> offer my apologies! I've never had a new drive fail like that, in >> that manner. I'll burn the other hardware in overnight and report >> back. huh -- well possibly. not. This is on a virtual machine attached to a SAN. It ran clean for several (this is 9.4 vanilla, asserts off, checksums on) hours then the starting having issues: [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page verification failed, calculated checksum 59143 but expected 59137 at character 20 [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY: DELETE FROM "onesitepmc"."propertyguestcard" t WHEREEXISTS ( SELECT 1 FROM "propertyguestcard_insert" d WHERE (t."prptyid", t."gcardid") = (d."prptyid",d."gcardid") ) [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT: PL/pgSQL function cdsreconciletable(text,text,text,text,boolean) line 197 at EXECUTE statement SQL statement "SELECT * FROM CDSReconcileTable( t.CDSServer, t.CDSDatabase, t.SchemaName, t.TableName)" PL/pgSQL function cdsreconcileruntable(bigint) line 35at SQL statement After that, several hours of clean running, followed by: [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG: could not send data to client: Broken pipe [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT: SELECT CDSReconcileRunTable(1160) [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? 
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type ShareLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT: SELECT CDSReconcileRunTable(1160) [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type RowExclusiveLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type AccessExclusiveLock [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a lock of type RowExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type RowShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type ExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? 
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: AbortSubTransaction while in ABORT state [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: did not find subXID 7553 in MyProc [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type RowExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type ShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type AccessExclusiveLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type ShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type AccessShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a lock of type RowShareLock [cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.029 CST 7549]WARNING: you don't own a lock of type AccessExclusiveLock [cds2 32353 2015-01-16 04:40:58.029 CST 7549]WARNING: ReleaseLockIfHeld: failed?? [cds2 32353 2015-01-16 04:40:58.029 CST 7549]ERROR: failed to re-find shared lock object [cds2 32353 2015-01-16 04:40:58.029 CST 7549]PANIC: ERRORDATA_STACK_SIZE exceeded [ 3093 2015-01-16 04:41:00.299 CST 0]LOG: server process (PID 32353) was terminated by signal 6: Aborted [ 3093 2015-01-16 04:41:00.300 CST 0]LOG: terminating any other active server processes After that, server resumed processing without further incident. merlin
On Fri, Jan 16, 2015 at 8:05 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: >> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> Running this test on another set of hardware to verify -- if this >>> turns out to be a false alarm which it may very well be, I can only >>> offer my apologies! I've never had a new drive fail like that, in >>> that manner. I'll burn the other hardware in overnight and report >>> back. > > huh -- well possibly. not. This is on a virtual machine attached to a > SAN. It ran clean for several (this is 9.4 vanilla, asserts off, > checksums on) hours then the starting having issues: I'm going to bisect this down...FYI. I'll start with the major releases first. This is not going to be a fast process...I'm out of pocket for the weekend and have a lot of other stuff going on. merlin
On 01/16/2015 04:05 PM, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> Running this test on another set of hardware to verify -- if this
>>> turns out to be a false alarm which it may very well be, I can only
>>> offer my apologies! I've never had a new drive fail like that, in
>>> that manner. I'll burn the other hardware in overnight and report
>>> back.
>
> huh -- well possibly. not. This is on a virtual machine attached to a
> SAN. It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours then started having issues:
>
> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20

The calculated checksum is suspiciously close to the expected one. It
could be coincidence, but the previous checksum warning you posted was
also quite close:

> [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page
> verification failed, calculated checksum 28520 but expected 28541

I believe the checksum algorithm is supposed to mix the bits quite
thoroughly, so that a difference in a single byte in the input will
lead to a completely different checksum. However, we add the block
number to the checksum last:

>     /* Mix in the block number to detect transposed pages */
>     checksum ^= blkno;
>
>     /*
>      * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
>      * one. That avoids checksums of zero, which seems like a good idea.
>      */
>     return (checksum % 65535) + 1;

It looks very much like a page has for some reason been moved to a
different block number. And that's exactly what Peter found out in his
investigation too; an index page was mysteriously copied to a different
block with identical content.

- Heikki
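To illustrate that point with a toy example (hypothetical numbers only; the constant below merely stands in for the fully mixed hash of one page image, it is not real page data): because the block number is only XOR-ed in at the very end, identical page contents stored at two nearby block numbers produce checksums that are themselves numerically adjacent.

with page(mixed) as (values (123456789))  -- stand-in for the mixed hash of one page image
select ((mixed # 2) % 65535) + 1 as checksum_if_stored_at_block_2,
       ((mixed # 9) % 65535) + 1 as checksum_if_stored_at_block_9
from page;
-- XORing in different small block numbers only perturbs the low-order bits,
-- so the two results come out a few counts apart, matching the
-- "calculated 59143 but expected 59137" pattern of a page copied to a nearby block.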
Hi, On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: > > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > >> Running this test on another set of hardware to verify -- if this > >> turns out to be a false alarm which it may very well be, I can only > >> offer my apologies! I've never had a new drive fail like that, in > >> that manner. I'll burn the other hardware in overnight and report > >> back. > > huh -- well possibly. not. This is on a virtual machine attached to a > SAN. It ran clean for several (this is 9.4 vanilla, asserts off, > checksums on) hours then the starting having issues: Damn. Is there any chance you can package this somehow so that others can run it locally? It looks hard to find the actual bug here without adding instrumentation to to postgres. > [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page > verification failed, calculated checksum 59143 but expected 59137 at > character 20 > [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY: > DELETE FROM "onesitepmc"."propertyguestcard" t > WHERE EXISTS > ( > SELECT 1 FROM "propertyguestcard_insert" d > WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid") > ) > [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT: PL/pgSQL > function cdsreconciletable(text,text,text,text,boolean) line 197 at > EXECUTE statement > SQL statement "SELECT * FROM CDSReconcileTable( > t.CDSServer, > t.CDSDatabase, > t.SchemaName, > t.TableName)" > PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement This was the first error? None of the 'could not find subXID' errors beforehand? > [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING: did not find > subXID 7553 in MyProc > [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG: could not send data > to client: Broken pipe > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT: SELECT > CDSReconcileRunTable(1160) > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? 
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a > lock of type AccessShareLock > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a > lock of type ShareLock > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > ReleaseLockIfHeld: failed?? > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find > shared lock object > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL > function cdsreconcileruntable(bigint) line 35 during exception cleanup > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT: SELECT > CDSReconcileRunTable(1160) > [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: > AbortSubTransaction while in ABORT state This indicates a bug in our subtransaction abort handling. It looks to me like there actually might be several. But it's probably a consequence of an earlier bug. It's hard to diagnose the actual issue, because we're not seing the original error(s) :(. Could you add a EmitErrorReport(); before the FlushErrorState() in pl_exec.c's exec_stmt_block()? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Is there any chance you can package this somehow so that others can run
> it locally? It looks hard to find the actual bug here without adding
> instrumentation to postgres.

That's possible but involves a lot of complexity in the setup because
of the source database (SQL Server) dependency. Thinking outside the
box here, I'm going to migrate the source to postgres. This will rule
out pl/sh, which is the only non-core dependency, but will take some
setup work on my end first. If I can still reproduce the error at that
point, maybe we can go in this direction, and it would make local
reproduction easier anyways.

>> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
>
> This was the first error? None of the 'could not find subXID' errors
> beforehand?

yep. no caught exceptions or anything interesting in the log before that.

> Could you add an EmitErrorReport(); before the FlushErrorState() in
> pl_exec.c's exec_stmt_block()?

will do.

merlin
On Fri, Jan 16, 2015 at 8:22 AM, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote: >> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote: >> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >> Running this test on another set of hardware to verify -- if this >> >> turns out to be a false alarm which it may very well be, I can only >> >> offer my apologies! I've never had a new drive fail like that, in >> >> that manner. I'll burn the other hardware in overnight and report >> >> back. >> >> huh -- well possibly. not. This is on a virtual machine attached to a >> SAN. It ran clean for several (this is 9.4 vanilla, asserts off, >> checksums on) hours then the starting having issues: > > Damn. > > Is there any chance you can package this somehow so that others can run > it locally? It looks hard to find the actual bug here without adding > instrumentation to to postgres. FYI, a two hour burn in on my workstation on 9.3 ran with no issues. An overnight run would probably be required to prove it, ruling out both hardware and pl/sh. If proven, it's possible we may be facing a regression, perhaps a serious one. ISTM the next step is to bisect the problem down over the weekend in order to to narrow the search. If that doesn't turn up anything productive I'll look into taking other steps. merlin
On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > ISTM the next step is to bisect the problem down over the weekend in > order to to narrow the search. If that doesn't turn up anything > productive I'll look into taking other steps. That might be the quickest way to do it, provided you can isolate the bug fairly reliably. It might be a bit tricky to write a shell script that assumes a certain amount of time having passed without the bug tripping indicates that it doesn't exist, and have that work consistently. I'm slightly concerned that you'll hit other bugs that have since been fixed, given the large number of possible symptoms here. -- Peter Geoghegan
On Fri, Jan 16, 2015 at 6:21 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > It looks very much like that a page has for some reason been moved to a > different block number. And that's exactly what Peter found out in his > investigation too; an index page was mysteriously copied to a different > block with identical content. What I found suspicious about that was that the spuriously identical pages were not physically adjacent, but logically adjacent (i.e. the bad page was considered the B-Tree right link of the good page by the good, spuriously-copied-by-bad page). It also seems likely that that small catalog index on pg_class(oid) was well cached in shared_buffers. So I agree that it's unlikely that this is actually a hardware or filesystem problem. Beyond that, if I had to guess, I'd say that the problem is more likely to be in the B-Tree code than it is in the buffer manager or whatever (so the "logically adjacent" thing is probably not an artifact of the order that the pages were accessed, since it appears there was a downlink to the bad page. This downlink was not added recently. Also, this logical adjacency is unlikely to be mere coincidence - Postgres seemed to fairly consistently break this way). Does anyone have a better developed sense of where the ultimate problem here is than I do? I guess I've never thought too much about how the system fails when a catalog index is this thoroughly broken. -- Peter Geoghegan
On Fri, Jan 16, 2015 at 5:20 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Jan 16, 2015 at 10:33 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> ISTM the next step is to bisect the problem down over the weekend in
>> order to narrow the search. If that doesn't turn up anything
>> productive I'll look into taking other steps.
>
> That might be the quickest way to do it, provided you can isolate the
> bug fairly reliably. It might be a bit tricky to write a shell script
> that assumes a certain amount of time having passed without the bug
> tripping indicates that it doesn't exist, and have that work
> consistently. I'm slightly concerned that you'll hit other bugs that
> have since been fixed, given the large number of possible symptoms
> here.

Quick update: not done yet, but I'm making consistent progress, with
several false starts (for example, I had a .conf problem with the new
dynamic shared memory setting and git merrily bisected down to the
introduction of the feature). I have to triple check everything :(.
The problem is generally reproducible but I get false negatives that
throw off the bisection. I estimate that early next week I'll have it
narrowed down significantly if not to the exact offending revision.

So far, the 'nasty' damage seems to generally if not always follow a
checksum failure, and the checksum failures are always numerically
adjacent. For example:

[cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING: page verification failed, calculated checksum 9465 but expected 9477 at character 20
[cds2 21202 2015-01-22 13:10:18.172 CST 3196]WARNING: page verification failed, calculated checksum 61889 but expected 61903 at character 20
[cds2 29153 2015-01-22 14:49:04.831 CST 4803]WARNING: page verification failed, calculated checksum 27311 but expected 27316

I'm not up on the intricacies of our checksum algorithm, but this is
making me suspicious that we are looking at an improperly flipped
visibility bit via some obscure problem -- almost certainly with
vacuum playing a role. This fits the profile of catastrophic damage
that masquerades as numerous other problems. Or, perhaps, something is
flipping what it thinks is a visibility bit but on the wrong page.

I still haven't categorically ruled out pl/sh yet; that's something to
keep in mind.

In the 'plus' category, aside from flushing out this issue, I've had
zero runtime problems so far aside from the main problem; bisection
(at least on the 'bad' side) has been reliably engaged by simply
counting the number of warnings/errors/etc in the log. That's really
impressive.

merlin
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> So far, the 'nasty' damage seems to generally if not always follow a
> checksum failure and the checksum failures are always numerically
> adjacent. For example:
> [cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING: page
> verification failed, calculated checksum 9465 but expected 9477 at
> character 20
> [cds2 21202 2015-01-22 13:10:18.172 CST 3196]WARNING: page
> verification failed, calculated checksum 61889 but expected 61903 at
> character 20
> [cds2 29153 2015-01-22 14:49:04.831 CST 4803]WARNING: page
> verification failed, calculated checksum 27311 but expected 27316
> I'm not up on the intricacies of our checksum algorithm but this is
> making me suspicious that we are looking at an improperly flipped
> visibility bit via some obscure problem -- almost certainly with
> vacuum playing a role.
That very much sounds like the block is getting duplicated from one place to another.
Even flipping one hint bit (aren't these index pages? Do they have hint bits?) should thoroughly scramble the checksum.
Because the checksum adds in the block number after the scrambling has been done, copying a page to another nearby location will just move the (expected) checksum a little bit.
Cheers,
Jeff
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of the feature.). Thanks. Looking forward to your findings. Ideally, we'd be able to get a fix in for 9.4.1. -- Peter Geoghegan
On Thu, Jan 22, 2015 at 03:50:03PM -0600, Merlin Moncure wrote: > Quick update: not done yet, but I'm making consistent progress, with > several false starts. (for example, I had a .conf problem with the > new dynamic shared memory setting and git merrily bisected down to the > introduction of the feature.). > I have to triple check everything :(. The problem is generally > reproducible but I get false negatives that throws off the bisection. > I estimate that early next week I'll have it narrowed down > significantly if not to the exact offending revision. I've never used it but the BBChop project claims to be able to bisect even in the case of intermittent failure. https://github.com/Ealdwulf/bbchop/ It claims to be slow, but I don't think that's the limiting factor here... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > I still haven't categorically ruled out pl/sh yet; that's something to > keep in mind. Well, after bisection proved not to be fruitful, I replaced the pl/sh calls with dummy calls that approximated the same behavior and the problem went away. So again, it looks like this might be a lot of false alarm. A pl/sh driven failure might still be interesting if it's coming from the internal calls it's making, so I'm still chasing it down. merlin
On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> I still haven't categorically ruled out pl/sh yet; that's something to >> keep in mind. > > Well, after bisection proved not to be fruitful, I replaced the pl/sh > calls with dummy calls that approximated the same behavior and the > problem went away. So again, it looks like this might be a lot of > false alarm. A pl/sh driven failure might still be interesting if > it's coming from the internal calls it's making, so I'm still chasing > it down. ...hm, I spoke to soon. So I deleted everything, and booted up a new instance 9.4 vanilla with asserts on and took no other action. Applying the script with no data activity fails an assertion every single time: mmoncure@mernix2 12:25 PM (REL9_4_STABLE) ~/src/p94$ cat /mnt/ssd/data/pg_log/postgresql-28.log [ 12287 2015-01-28 12:24:24.080 CST 0]LOG: received smart shutdown request [ 13516 2015-01-28 12:24:24.080 CST 0]LOG: autovacuum launcher shutting down [ 13513 2015-01-28 12:24:24.081 CST 0]LOG: shutting down [ 13513 2015-01-28 12:24:24.083 CST 0]LOG: database system is shut down [ 14481 2015-01-28 12:24:25.127 CST 0]LOG: database system was shut down at 2015-01-28 12:24:24 CST [ 14457 2015-01-28 12:24:25.129 CST 0]LOG: database system is ready to accept connections [ 14485 2015-01-28 12:24:25.129 CST 0]LOG: autovacuum launcher started TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330) [ 14457 2015-01-28 12:24:47.983 CST 0]LOG: server process (PID 14545) was terminated by signal 6: Aborted [ 14457 2015-01-28 12:24:47.983 CST 0]DETAIL: Failed process was running: SELECT CDSStartRun() [ 14457 2015-01-28 12:24:47.983 CST 0]LOG: terminating any other active server processes [cds2 14546 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process [cds2 14546 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. [cds2 14546 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command. [ 14485 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process [ 14485 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. [ 14485 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command. [ 14457 2015-01-28 12:24:47.984 CST 0]LOG: all server processes terminated; reinitializing [ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was interrupted; last known up at 2015-01-28 12:24:25 CST [ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was not properly shut down; automatic recovery in progress [ 14554 2015-01-28 12:24:47.997 CST 0]LOG: invalid magic number 0000 in log segment 000000010000000000000001, offset 13000704 [ 14554 2015-01-28 12:24:47.997 CST 0]LOG: redo is not required [ 14457 2015-01-28 12:24:48.000 CST 0]LOG: database system is ready to accept connections [ 14558 2015-01-28 12:24:48.000 CST 0]LOG: autovacuum launcher started merlin
Merlin Moncure <mmoncure@gmail.com> writes: > ...hm, I spoke to soon. So I deleted everything, and booted up a new > instance 9.4 vanilla with asserts on and took no other action. > Applying the script with no data activity fails an assertion every > single time: > TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330) There's no Assert at line 330 in 9.4, though there is in HEAD. I suspect what you've got here is a version mismatch; in particular commit 4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c such that external modules would need to be recompiled to use it without error. I'm not real sure though how you are getting past the loadable-module version check. Anyway, I'd try "make distclean" and full rebuild before anything else. regards, tom lane
On Wed, Jan 28, 2015 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> ...hm, I spoke to soon. So I deleted everything, and booted up a new >> instance 9.4 vanilla with asserts on and took no other action. >> Applying the script with no data activity fails an assertion every >> single time: > >> TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330) > > There's no Assert at line 330 in 9.4, though there is in HEAD. I suspect > what you've got here is a version mismatch; in particular commit > 4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c > such that external modules would need to be recompiled to use it without > error. I'm not real sure though how you are getting past the > loadable-module version check. Anyway, I'd try "make distclean" and > full rebuild before anything else. you're right -- git got confused and built a 9.5 devel at some random commit. I reran it again on 9.4 and it worked ok :(. merlin