Thread: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag

[HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag

From
Wang Hao
Date:
An unlogged table has an initialization fork. The initialization fork does not have an BM_PERMANENT flag when get a buffer.
In checkpoint (not shutdown or end of recovery), it will not write to disk. 
after a crash recovery, the page of initialization fork will not correctly, then make the main fork not correctly too.

Here is an example for GIN index.

create unlogged table gin_test_tbl(i int4[]);
create index gin_test_idx on gin_test_tbl using gin (i);
checkpoint;

kill all the postgres process, and restart again.

vacuum gin_test_tbl;  -- crash.

It seems have same problem in BRIN, GIN, GiST and HASH index which using buffer for meta page initialize in ambuildempty function.

Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Michael Paquier
Date:
(Adding Robert in CC.)

On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet@gmail.com> wrote:
> An unlogged table has an initialization fork. The initialization fork does
> not have an BM_PERMANENT flag when get a buffer.
> In checkpoint (not shutdown or end of recovery), it will not write to disk.
> after a crash recovery, the page of initialization fork will not correctly,
> then make the main fork not correctly too.

For init forks the flush need absolutely to happen, so that's really
not good. We ought to fix BufferAlloc() appropriately here.

> Here is an example for GIN index.
>
> create unlogged table gin_test_tbl(i int4[]);
> create index gin_test_idx on gin_test_tbl using gin (i);
> checkpoint;
>
> kill all the postgres process, and restart again.
>
> vacuum gin_test_tbl;  -- crash.
>
> It seems have same problem in BRIN, GIN, GiST and HASH index which using
> buffer for meta page initialize in ambuildempty function.

Yeah, other index AMs deal directly with the sync of the page, that's
why there is no issue for them.

So the patch attached fixes the problem by changing BufferAlloc() in
such a way that initialization forks are permanently written to disk,
which is what you are suggesting. As a simple fix for back-branches
that's enough, though on HEAD I think that we should really rework the
empty() routines so as the write goes through shared buffers first,
that seems more solid than relying on the sgmr routines to do this
work. Robert, what do you think?
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Michael Paquier
Date:
On Thu, Jan 26, 2017 at 9:14 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> So the patch attached fixes the problem by changing BufferAlloc() in
> such a way that initialization forks are permanently written to disk,
> which is what you are suggesting. As a simple fix for back-branches
> that's enough, though on HEAD I think that we should really rework the
> empty() routines so as the write goes through shared buffers first,
> that seems more solid than relying on the sgmr routines to do this
> work. Robert, what do you think?

Attached is what I have in mind for HEAD. btree, gist, spgist and
bloom indexes are changed so as the init forks created go through the
shared buffers instead of having their empty() routines handle the
flush of the page created. This removes any kind of race conditions
between the checkpointer and the init fork creations, which is I think
a good thing.

Here are the tests I have done.
First running those commands to create all types of indexes.
create extension bloom;
create extension btree_gist;
create extension btree_gin;
create unlogged table foo (a int);
create index foo_bt on foo(a);
create index foo_bloom on foo using bloom(a);
create index foo_gin on foo using gin (a);
create index foo_gist on foo using gist (a);
create index foo_brin on foo using brin (a);
create unlogged table foo_geo (a box);
create index foo_spgist ON foo_geo using spgist(a);
checkpoint;

Then crash the server, restart it, and the following vacuums are able
to complete.
vacuum foo;
vacuum foo_geo;

I have as well created a CF entry for this set of patches:
https://commitfest.postgresql.org/13/971/

Thanks,
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Artur Zakirov
Date:
Hello,

I wanted to review the patch. But the patch is applied with errors. I've 
rebased the local copy and have done review on it. I'm not sure is it 
properly to send rebased patch by reviewer, so I haven't sent it to 
avoid confuses.

On 29.01.2017 17:00, Michael Paquier wrote:
> Attached is what I have in mind for HEAD. btree, gist, spgist and
> bloom indexes are changed so as the init forks created go through the
> shared buffers instead of having their empty() routines handle the
> flush of the page created. This removes any kind of race conditions
> between the checkpointer and the init fork creations, which is I think
> a good thing.

I think this is good fixes. I've checked them. And in my opinion they 
are correct.

The code also is good.

>
> Here are the tests I have done.
> First running those commands to create all types of indexes.
> create extension bloom;
> create extension btree_gist;
> create extension btree_gin;
> create unlogged table foo (a int);
> create index foo_bt on foo(a);
> create index foo_bloom on foo using bloom(a);
> create index foo_gin on foo using gin (a);
> create index foo_gist on foo using gist (a);
> create index foo_brin on foo using brin (a);
> create unlogged table foo_geo (a box);
> create index foo_spgist ON foo_geo using spgist(a);
> checkpoint;
>
> Then crash the server, restart it, and the following vacuums are able
> to complete.
> vacuum foo;
> vacuum foo_geo;
>

I've done this tests. Before the patch server crashes on vacuum command. 
After applying the patch server doesn't crash on vacuum command.

I have run regression and TAP tests. They all passed without error.

I think the patch can be marked as "Ready for Committer" after rebase.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Michael Paquier
Date:
On Thu, Mar 9, 2017 at 10:25 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> I think this is good fixes. I've checked them. And in my opinion they are
> correct.
>
> The code also is good.

Having something with conflicts is not nice, so attached is a rebased version.

> I have run regression and TAP tests. They all passed without error.
>
> I think the patch can be marked as "Ready for Committer" after rebase.

Thanks for the review.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Artur Zakirov
Date:
On 10.03.2017 04:00, Michael Paquier wrote:
> On Thu, Mar 9, 2017 at 10:25 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
>> I think this is good fixes. I've checked them. And in my opinion they are
>> correct.
>>
>> The code also is good.
>
> Having something with conflicts is not nice, so attached is a rebased version.

Thank you!

I've rerun regression and TAP tests. They all passed.

Also maybe it will be good to fix comments.

In buf_internals.h:
> #define BM_PERMANENT            (1U << 31)        /* permanent relation (not
>                                                    * unlogged) */

And in FlushBuffer():
>     /*
>      * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
>      * rule that log updates must hit disk before any of the data-file changes
>      * they describe do.
>      *
>      * However, this rule does not apply to unlogged relations, which will be
>      * lost after a crash anyway.  Most unlogged relation pages do not bear

Because BM_PERMANENT is used for init forks of unlogged indexes now.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Michael Paquier
Date:
On Sat, Mar 11, 2017 at 12:03 AM, Artur Zakirov
<a.zakirov@postgrespro.ru> wrote:
> Because BM_PERMANENT is used for init forks of unlogged indexes now.

Yes, indeed.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Robert Haas
Date:
On Wed, Jan 25, 2017 at 7:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> (Adding Robert in CC.)
>
> On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet@gmail.com> wrote:
>> An unlogged table has an initialization fork. The initialization fork does
>> not have an BM_PERMANENT flag when get a buffer.
>> In checkpoint (not shutdown or end of recovery), it will not write to disk.
>> after a crash recovery, the page of initialization fork will not correctly,
>> then make the main fork not correctly too.
>
> For init forks the flush need absolutely to happen, so that's really
> not good. We ought to fix BufferAlloc() appropriately here.

I agree with that, but I propose the attached version instead.  It
seems cleaner to have the entire test for setting BM_PERMANENT in one
place rather than splitting it up as you did.

I believe this sets a record for the longest-lived data corruption bug
in a commit made by me.  Six years and change, woohoo.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Michael Paquier
Date:
On Tue, Mar 14, 2017 at 4:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 25, 2017 at 7:14 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> (Adding Robert in CC.)
>>
>> On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet@gmail.com> wrote:
>>> An unlogged table has an initialization fork. The initialization fork does
>>> not have an BM_PERMANENT flag when get a buffer.
>>> In checkpoint (not shutdown or end of recovery), it will not write to disk.
>>> after a crash recovery, the page of initialization fork will not correctly,
>>> then make the main fork not correctly too.
>>
>> For init forks the flush need absolutely to happen, so that's really
>> not good. We ought to fix BufferAlloc() appropriately here.
>
> I agree with that, but I propose the attached version instead.  It
> seems cleaner to have the entire test for setting BM_PERMANENT in one
> place rather than splitting it up as you did.

Fine for me. You may want to update the comment of BM_PERMANENT in
buf_internals.h as Artur has mentioned upthread. For example by just
adding "and init forks".

> I believe this sets a record for the longest-lived data corruption bug
> in a commit made by me.

Really? I'll need to double-check the git history here.

> Six years and change, woohoo.  :-(

And that much for someone to report it.
-- 
Michael



Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Robert Haas
Date:
On Mon, Mar 13, 2017 at 8:51 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> I agree with that, but I propose the attached version instead.  It
>> seems cleaner to have the entire test for setting BM_PERMANENT in one
>> place rather than splitting it up as you did.
>
> Fine for me. You may want to update the comment of BM_PERMANENT in
> buf_internals.h as Artur has mentioned upthread. For example by just
> adding "and init forks".

OK, done, and back-patched all the way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should buffer of initialization fork have aBM_PERMANENT flag

From
Michael Paquier
Date:
On Wed, Mar 15, 2017 at 1:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 13, 2017 at 8:51 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> I agree with that, but I propose the attached version instead.  It
>>> seems cleaner to have the entire test for setting BM_PERMANENT in one
>>> place rather than splitting it up as you did.
>>
>> Fine for me. You may want to update the comment of BM_PERMANENT in
>> buf_internals.h as Artur has mentioned upthread. For example by just
>> adding "and init forks".
>
> OK, done, and back-patched all the way.

Thanks.
-- 
Michael