Thread: [HACKERS] segfault in hot standby for hash indexes

[HACKERS] segfault in hot standby for hash indexes

From
Jeff Janes
Date:
Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.

Using the attached files, I start the test case like this:

nice sh do_nocrash_sr.sh >& do_nocrash_sr.err &

And start the replica like:

rm -r /tmp/data2_replica/ ; 
psql -p 9876 -c "select pg_create_physical_replication_slot('foo')"; 
~/pgsql/pure_origin/bin/pg_basebackup -p 9876 -D /tmp/data2_replica -R -S foo;
~/pgsql/pure_origin/bin/pg_ctl start -D /tmp/data2_replica/ -o "--port=9874"
 
After less than a minute, the replica fails.

#0  0x00000000004b85fe in hash_xlog_vacuum_get_latestRemovedXid (record=0x1925418) at hash_xlog.c:1006
1006                    iitemid = PageGetItemId(ipage, unused[i]);
(gdb) bt
#0  0x00000000004b85fe in hash_xlog_vacuum_get_latestRemovedXid (record=0x1925418) at hash_xlog.c:1006
#1  0x00000000004b881f in hash_xlog_vacuum_one_page (record=0x1925418) at hash_xlog.c:1113
#2  0x00000000004b8bed in hash_redo (record=0x1925418) at hash_xlog.c:1217
#3  0x000000000051a96c in StartupXLOG () at xlog.c:7152
#4  0x0000000000789ffb in StartupProcessMain () at startup.c:216
#5  0x000000000052d617 in AuxiliaryProcessMain (argc=2, argv=0x7fffe7661500) at bootstrap.c:421
#6  0x00000000007890cf in StartChildProcess (type=StartupProcess) at postmaster.c:5256
#7  0x000000000078419d in PostmasterMain (argc=4, argv=0x18fc300) at postmaster.c:1329
#8  0x00000000006cd78e in main (argc=4, argv=0x18fc300) at main.c:228

'unused' is NULL at that point.

Cheers,

Jeff
Attachment

Re: [HACKERS] segfault in hot standby for hash indexes

From
Amit Kapila
Date:
On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.
>

I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid().
It seems to me that we are using one block_id for registering the
deleted items in the XLOG_HASH_VACUUM_ONE_PAGE record and a different
block_id for fetching those items in
hash_xlog_vacuum_get_latestRemovedXid().  So matching those should
fix this issue (instead of fetching the block number and items from
block_id 1, we should use block_id 0).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] segfault in hot standby for hash indexes

From
Ashutosh Sharma
Date:
Hi Jeff,

On Tue, Mar 21, 2017 at 1:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.
>>
>
> I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid().
> It seems to me that we are using different block_id for registering
> the deleted items in xlog XLOG_HASH_VACUUM_ONE_PAGE and then using
> different block_id for fetching those items in
> hash_xlog_vacuum_get_latestRemovedXid().  So probably matching those
> will fix this issue (instead of fetching block number and items from
> block_id 1, we should use block_id 0).
>

Thanks for reporting this issue. As Amit said, it is happening due to
a block_id mismatch. Attached is a patch that fixes it; I apologise
for such a silly mistake. Please note that I was not able to
reproduce the issue on my local machine using the test script you
shared. Could you please check whether you still see the issue with
the attached patch? Thanks in advance.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] segfault in hot standby for hash indexes

From
Jeff Janes
Date:
On Tue, Mar 21, 2017 at 4:00 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Jeff,

On Tue, Mar 21, 2017 at 1:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.
>>
>
> I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid().
> It seems to me that we are using different block_id for registering
> the deleted items in xlog XLOG_HASH_VACUUM_ONE_PAGE and then using
> different block_id for fetching those items in
> hash_xlog_vacuum_get_latestRemovedXid().  So probably matching those
> will fix this issue (instead of fetching block number and items from
> block_id 1, we should use block_id 0).
>

Thanks for reporting this issue. As Amit said, it is happening due to
block_id mismatch. Attached is the patch that fixes the same. I
apologise for such a silly mistake. Please note that  I was not able
to reproduce the issue on my local machine using the test script you
shared. Could you please check with the attached patch if you are
still seeing the issue. Thanks in advance.


Hi Ashutosh,

I can confirm that that fixes the seg faults for me.

Did you mean you couldn't reproduce the problem in the first place, or that you could reproduce it and the patch now fixes it?  If the former, I forgot to say that you have to wait for the hot standby to reach consistency and open for connections, and then connect to the standby ("psql -p 9874"), before the segfault will be triggered.

But there are places where hash_xlog_vacuum_get_latestRemovedXid diverges from btree_xlog_delete_get_latestRemovedXid, and I don't understand the reason for the divergence.  Is there a reason we dropped the PANIC if we have not reached consistency?  That is a case which should never happen, but it seems worth preserving the PANIC.  And why does this code get 'unused' from XLogRecGetBlockData(record, 0, &len), while the btree code gets it from xlrec?  Is that because the record being replayed is structured differently between btree and hash, or is there some other reason?

Thanks,

Jeff

Re: [HACKERS] segfault in hot standby for hash indexes

From
Ashutosh Sharma
Date:
Hi Jeff,

>
> I can confirm that that fixes the seg faults for me.

Thanks for the confirmation.

>
> Did you mean you couldn't reproduce the problem in the first place, or that
> you could reproduce it and now the patch fixes it?  If the first of those, I
> forget to say you do have to wait for hot standby to reach a consistency and
> open for connections, and then connect to the standby ("psql -p 9874"),
> before the seg fault will be triggered.

I meant that I was not able to reproduce the issue on HEAD.

>
> But, there are places where hash_xlog_vacuum_get_latestRemovedXid diverges
> from btree_xlog_delete_get_latestRemovedXid, which I don't understand the
> reason for the divergence.  Is there a reason we dropped the PANIC if we
> have not reached consistency?

Well, I'm not quite sure how the standby would allow any backend to
connect to it until it has reached a consistent state. If you look at
the definition of btree_xlog_delete_get_latestRemovedXid(), just
before the consistency check there is an if-condition 'if
(CountDBBackends(InvalidOid) == 0)', which means we check for a
consistent state only after knowing that there are some backends
connected to the standby. So, is there any possibility of having a
backend connected to the standby server without it being in a
consistent state?

> That is a case which should never happen, but
> it seems worth preserving the PANIC.  And why does this code get 'unused'
> from XLogRecGetBlockData(record, 0, &len), while the btree code gets it from
> xlrec?  Is that because the record being replayed is structured differently
> between btree and hash, or is there some other reason?
>

Yes, that's right.  In the btree case we use XLogRegisterData() to
add the data to the WAL record, and XLogRecGetData() to fetch it
back. In the hash case we associate the same data with a registered
buffer using XLogRegisterBufData(0, ...), and fetch it with
XLogRecGetBlockData(0, ...).  Now, if you look at
XLogRecordAssemble(), it first adds all the data associated with
registered buffers to the WAL record, followed by the main data.
Hence, the WAL records for btree and hash are organised differently.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: [HACKERS] segfault in hot standby for hash indexes

From
Amit Kapila
Date:
On Tue, Mar 21, 2017 at 11:49 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>
>> I can confirm that that fixes the seg faults for me.
>
> Thanks for confirmation.
>
>>
>> Did you mean you couldn't reproduce the problem in the first place, or that
>> you could reproduce it and now the patch fixes it?  If the first of those, I
>> forget to say you do have to wait for hot standby to reach a consistency and
>> open for connections, and then connect to the standby ("psql -p 9874"),
>> before the seg fault will be triggered.
>
> I meant that I was not able to reproduce the issue on HEAD.
>
>>
>> But, there are places where hash_xlog_vacuum_get_latestRemovedXid diverges
>> from btree_xlog_delete_get_latestRemovedXid, which I don't understand the
>> reason for the divergence.  Is there a reason we dropped the PANIC if we
>> have not reached consistency?
>
> Well, I'm not quite sure how would standby allow any backend to
> connect to it until it has reached to a consistent state. If you see
> the definition of btree_xlog_delete_get_latestRemovedXid(), just
> before consistency check there is a if-condition 'if
> (CountDBBackends(InvalidOid) == 0)' which means
> we are checking for consistent state only after knowing that there are
> some backends connected to the standby. So, Is there a possibility of
> having some backend connected to standby server without having it in
> consistent state.
>

I don't think so, but I think we should have the reachedConsistency
check and elog(PANIC, ...) similar to btree.  If you look at the
other conditions where we PANIC in the btree or hash xlog code, you
will notice that those are also theoretically impossible cases.  It
seems this is done to save the database from getting corrupted or
behaving insanely if, for some reason (like a coding error), the
check fails.

In a quick look, I don't find any other divergence between the two
functions. Is there any other divergence? If so, I think we should at
the very least mention something about it in the function header.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] segfault in hot standby for hash indexes

From
Ashutosh Sharma
Date:
Hi,

On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 11:49 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>
>>> I can confirm that that fixes the seg faults for me.
>>
>> Thanks for confirmation.
>>
>>>
>>> Did you mean you couldn't reproduce the problem in the first place, or that
>>> you could reproduce it and now the patch fixes it?  If the first of those, I
>>> forget to say you do have to wait for hot standby to reach a consistency and
>>> open for connections, and then connect to the standby ("psql -p 9874"),
>>> before the seg fault will be triggered.
>>
>> I meant that I was not able to reproduce the issue on HEAD.
>>
>>>
>>> But, there are places where hash_xlog_vacuum_get_latestRemovedXid diverges
>>> from btree_xlog_delete_get_latestRemovedXid, which I don't understand the
>>> reason for the divergence.  Is there a reason we dropped the PANIC if we
>>> have not reached consistency?
>>
>> Well, I'm not quite sure how would standby allow any backend to
>> connect to it until it has reached to a consistent state. If you see
>> the definition of btree_xlog_delete_get_latestRemovedXid(), just
>> before consistency check there is a if-condition 'if
>> (CountDBBackends(InvalidOid) == 0)' which means
>> we are checking for consistent state only after knowing that there are
>> some backends connected to the standby. So, Is there a possibility of
>> having some backend connected to standby server without having it in
>> consistent state.
>>
>
> I don't think so, but I think we should have reachedConsistency check
> and elog(PANIC,..) similar to btree.  If you see other conditions
> where we PANIC in btree or hash xlog code, you will notice that those
> are also theoretically not possible cases.  It seems this is to save
> database from getting corrupt or behaving insanely if due to some
> reason (like a coding error or others) the check fails.

okay, agreed. I have included it in the attached patch.

However, I am still able to see the crash reported by Jeff upthread
[1]. I think there are a couple of things that need to be corrected
inside hash_xlog_vacuum_get_latestRemovedXid().

1) As Amit mentioned in his earlier mail [2], the block_id used for
registering the deleted items in XLOG_HASH_VACUUM_ONE_PAGE is
different from the block_id used for fetching those items. I fixed
this and shared a patch earlier. With that patch applied, I still see
the crash Jeff reported yesterday [1].

2) When a full page image of a registered block is taken, the
modified data associated with that block is not included in the WAL
record. In that case, we won't be able to fetch the items array that
we included while creating the XLOG_HASH_VACUUM_ONE_PAGE record using
the following call:

   XLogRegisterBufData(0, (char *) deletable, ndeletable * sizeof(OffsetNumber));

If that data is not included in the WAL record, fetching it using
'XLogRecGetBlockData(record, 0, &len);' will return a NULL
pointer.

 ptr = XLogRecGetBlockData(record, 0, &len);
 unused = (OffsetNumber *) ptr;
 ............
 iitemid = PageGetItemId(ipage, unused[i]);

Here, if ptr is NULL, dereferencing unused will cause a crash.

To fix this, I think we should pass 'REGBUF_KEEP_DATA' while
registering the buffer. Something like this,

-                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
+                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD | REGBUF_KEEP_DATA);

Attached is the patch that fixes this issue. Please have a look and
let me know if you still have any concerns. Thank you.

[1] - https://www.postgresql.org/message-id/CAMkU%3D1w-9Qe%3DFf1o6bSaXpNO9wqpo7_9GL8_CVhw4BoVVHasqg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1%2BYUedok0%2Bmeynnf8K3fqFsfdMpEFz1JiKLyyNv46hjaA%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] segfault in hot standby for hash indexes

From
Amit Kapila
Date:
On Wed, Mar 22, 2017 at 3:39 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Hi,
>
> On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> To fix this, I think we should pass 'REGBUF_KEEP_DATA' while
> registering the buffer. Something like this,
>
> -                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
> +                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD |
> REGBUF_KEEP_DATA);
>
> Attached is the patch that fixes this issue.
>

I think this will work, but I am not sure there is merit in deviating
from what btree does to handle this case.  One thing I find slightly
awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
using the number of tuples registered as part of the fixed data
(xl_hash_vacuum_one_page) to traverse the data registered as buf
data.  I think it will be better if we register the offsets in the
fixed part of the data as well, as we do in the btree case.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] segfault in hot standby for hash indexes

From
Amit Kapila
Date:
On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 3:39 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Hi,
>>
>> On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> To fix this, I think we should pass 'REGBUF_KEEP_DATA' while
>> registering the buffer. Something like this,
>>
>> -                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
>> +                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD |
>> REGBUF_KEEP_DATA);
>>
>> Attached is the patch that fixes this issue.
>>
>
> I think this will work, but not sure if there is a merit to deviate
> from what btree does to handle this case.   One thing I find slightly
> awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
> using a number of tuples registered as part of fixed data
> (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
> I think it will be better if we register offsets also in fixed part of
> data as we are doing btree case.
>
>

Also, another small point in this regard: do we need two separate
variables to track the number of deleted items in the code below?  I
think one variable is sufficient.

_hash_vacuum_one_page()
{
..
deletable[ndeletable++] = offnum;
tuples_removed += 1;
..
}



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] segfault in hot standby for hash indexes

From
Ashutosh Sharma
Date:
On Thu, Mar 23, 2017 at 9:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Mar 22, 2017 at 3:39 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >> Hi,
> >>
> >> On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> To fix this, I think we should pass 'REGBUF_KEEP_DATA' while
> >> registering the buffer. Something like this,
> >>
> >> -                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
> >> +                       XLogRegisterBuffer(0, buf, REGBUF_STANDARD |
> >> REGBUF_KEEP_DATA);
> >>
> >> Attached is the patch that fixes this issue.
> >>
> >
> > I think this will work, but not sure if there is a merit to deviate
> > from what btree does to handle this case.   One thing I find slightly
> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
> > using a number of tuples registered as part of fixed data
> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
> > I think it will be better if we register offsets also in fixed part of
> > data as we are doing btree case.

Agreed. I have made the changes accordingly. Please check the attached v2 patch.

>
> >
> >
>
> Also another small point in this regard, do we need two separate
> variables to track number of deleted items in below code?  I think one
> variable is sufficient.
>
> _hash_vacuum_one_page()
> {
> ..
> deletable[ndeletable++] = offnum;
> tuples_removed += 1;
> ..
> }
>

Yes, I think 'ndeletable' alone should be fine.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] segfault in hot standby for hash indexes

From
Amit Kapila
Date:
On Thu, Mar 23, 2017 at 4:26 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 9:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > I think this will work, but not sure if there is a merit to deviate
>> > from what btree does to handle this case.   One thing I find slightly
>> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
>> > using a number of tuples registered as part of fixed data
>> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
>> > I think it will be better if we register offsets also in fixed part of
>> > data as we are doing btree case.
>
> Agreed. I have made the changes accordingly. Please check attached v2 patch.
>

Changes look good to me.  I think you can modify the comment in the
structure xl_hash_vacuum_one_page to mention "TARGET OFFSET NUMBERS
FOLLOW AT THE END".

>>
>> >
>> >
>>
>> Also another small point in this regard, do we need two separate
>> variables to track number of deleted items in below code?  I think one
>> variable is sufficient.
>>
>> _hash_vacuum_one_page()
>> {
>> ..
>> deletable[ndeletable++] = offnum;
>> tuples_removed += 1;
>> ..
>> }
>>
>
> Yes, I think 'ndeletable' alone should be fine.
>

I think it would probably have been okay to use *int* for ntuples, as
that matches what you are actually assigning in the function.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] segfault in hot standby for hash indexes

From
Ashutosh Sharma
Date:
>>> > I think this will work, but not sure if there is a merit to deviate
>>> > from what btree does to handle this case.   One thing I find slightly
>>> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
>>> > using a number of tuples registered as part of fixed data
>>> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
>>> > I think it will be better if we register offsets also in fixed part of
>>> > data as we are doing btree case.
>>
>> Agreed. I have made the changes accordingly. Please check attached v2 patch.
>>
>
> Changes look good to me.   I think you can modify the comments in
> structure xl_hash_vacuum_one_page to mention "TARGET OFFSET NUMBERS
> FOLLOW AT THE END"
>

Added the comment in xl_hash_vacuum_one_page structure.

>>>
>>> >
>>> >
>>>
>>> Also another small point in this regard, do we need two separate
>>> variables to track number of deleted items in below code?  I think one
>>> variable is sufficient.
>>>
>>> _hash_vacuum_one_page()
>>> {
>>> ..
>>> deletable[ndeletable++] = offnum;
>>> tuples_removed += 1;
>>>
>>
>> Yes, I think 'ndeletable' alone should be fine.
>>
>
> I think it would have been probably okay to use *int* for ntuples as
> that matches with what you are actually assigning in the function.

Okay, corrected. Attached is a newer version of the patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] segfault in hot standby for hash indexes

From
Amit Kapila
Date:
On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>
>> I think it would have been probably okay to use *int* for ntuples as
>> that matches with what you are actually assigning in the function.
>
> okay, corrected it. Attached is newer version of patch.
>

Thanks, this version looks good to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: segfault in hot standby for hash indexes

From
Jeff Janes
Date:
On Fri, Mar 24, 2017 at 12:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>
>> I think it would have been probably okay to use *int* for ntuples as
>> that matches with what you are actually assigning in the function.
>
> okay, corrected it. Attached is newer version of patch.
>

Thanks, this version looks good to me.

It solves the problem for me.  I'd like to verify that I get the right answer on the standby, not just the absence of a crash, but I don't know how to do that effectively.  Has anyone used the new WAL replay block-consistency tool on hash indexes since this microvacuum code was committed?

Jeff

Re: segfault in hot standby for hash indexes

From
Ashutosh Sharma
Date:
>> >>
>> >> I think it would have been probably okay to use *int* for ntuples as
>> >> that matches with what you are actually assigning in the function.
>> >
>> > okay, corrected it. Attached is newer version of patch.
>> >
>>
>> Thanks, this version looks good to me.
>
>
> It solves the problem for me.

Great, thanks for confirming.

> I'd like to test that I get the right answer
> on the standby, not just the absence of a crash, but I don't know how to do
> that effectively.  Has anyone used the new wal replay block consistency tool
> on hash indexes since this microvacuum code was committed?

Yes, I have used it for hash indexes, but only after the commit below:

commit 9abbf4727de746222ad8fc15b17348065389ae43
Author: Robert Haas <rhaas@postgresql.org>
Date:   Mon Mar 20 15:55:27 2017 -0400

    Another fix for single-page hash index vacuum.

    The WAL consistency checking code needed to be updated for the new
    page status bit, but that didn't get done previously.

All I did was set wal_consistency_checking = 'hash' in
postgresql.conf and run the test cases on the primary server. If
there is an inconsistent block on the standby, the tool should
terminate the recovery process and you would see the following
message in the server logfile:

"inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u"
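For reference, the primary-side setting Ashutosh describes is a one-line postgresql.conf entry:

```
# check WAL consistency for hash-index records only
wal_consistency_checking = 'hash'
```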

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: segfault in hot standby for hash indexes

From
Robert Haas
Date:
On Fri, Mar 24, 2017 at 3:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>
>>> I think it would have been probably okay to use *int* for ntuples as
>>> that matches with what you are actually assigning in the function.
>>
>> okay, corrected it. Attached is newer version of patch.
>
> Thanks, this version looks good to me.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: segfault in hot standby for hash indexes

From
Ashutosh Sharma
Date:
On Mar 27, 2017 22:25, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Fri, Mar 24, 2017 at 3:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>
>>> I think it would have been probably okay to use *int* for ntuples as
>>> that matches with what you are actually assigning in the function.
>>
>> okay, corrected it. Attached is newer version of patch.
>
> Thanks, this version looks good to me.

Committed.

Thank you.