Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag
Date
Msg-id CAB7nPqRfMS13gm2dN7DTfOfnhsXe2LtpJ=RA2xYZhw2yzYmMSw@mail.gmail.com
In response to [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag  (Wang Hao <whberet@gmail.com>)
Responses Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
(Adding Robert in CC.)

On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet@gmail.com> wrote:
> An unlogged table has an initialization fork. A buffer for the
> initialization fork does not get the BM_PERMANENT flag when it is allocated.
> A checkpoint (other than shutdown or end of recovery) will therefore not
> write it to disk. After crash recovery, the page of the initialization fork
> will not be correct, which makes the main fork incorrect too.

For init forks the flush absolutely needs to happen, so that's really
not good. We ought to fix BufferAlloc() appropriately here.
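
To make the failure mode concrete, here is a minimal standalone sketch of
the checkpoint filter as described in the report. The sk_* names and bit
values are illustrative stand-ins chosen for this example, not the real
PostgreSQL definitions, and the function only models the decision, not the
actual buffer manager code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins (assumptions), not PostgreSQL's real flag values. */
#define SK_BM_DIRTY             (1u << 0)
#define SK_BM_PERMANENT         (1u << 1)

#define SK_CKPT_IS_SHUTDOWN     (1u << 0)
#define SK_CKPT_END_OF_RECOVERY (1u << 1)

/*
 * Decide whether a checkpoint writes a buffer out.
 *
 * A dirty buffer is always a candidate. Unless the checkpoint is a
 * shutdown or end-of-recovery one, the buffer must also carry the
 * "permanent" flag, so a dirty init-fork buffer lacking that flag is
 * silently skipped by an ordinary checkpoint.
 */
static bool
sk_checkpoint_writes_buffer(uint32_t buf_flags, uint32_t ckpt_flags)
{
    uint32_t mask = SK_BM_DIRTY;

    if ((ckpt_flags & (SK_CKPT_IS_SHUTDOWN | SK_CKPT_END_OF_RECOVERY)) == 0)
        mask |= SK_BM_PERMANENT;

    return (buf_flags & mask) == mask;
}
```

With these stand-ins, a dirty buffer without the permanent flag is written
only when one of the two special checkpoint flags is set, which matches the
behavior reported for the GIN meta page.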

> Here is an example for GIN index.
>
> create unlogged table gin_test_tbl(i int4[]);
> create index gin_test_idx on gin_test_tbl using gin (i);
> checkpoint;
>
> kill all the postgres processes, and restart.
>
> vacuum gin_test_tbl;  -- crash.
>
> It seems the BRIN, GIN, GiST and HASH indexes have the same problem, since
> they use a buffer to initialize the meta page in their ambuildempty function.

Yeah, the other index AMs deal directly with the sync of the page, which
is why there is no issue for them.

So the patch attached fixes the problem by changing BufferAlloc() in
such a way that buffers of initialization forks are treated as permanent
and so get written to disk, which is what you are suggesting. As a simple
fix for back-branches that's enough, though on HEAD I think that we should
really rework the empty() routines so that the write goes through shared
buffers first; that seems more solid than relying on the smgr routines to
do this work. Robert, what do you think?
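
For readers without the attachment, the idea can be sketched as a small
standalone C function. This is only an illustration of the decision
described above, not the patch itself: the sk_* types, names and bit values
are assumptions made up for this example and do not match PostgreSQL's real
definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the relation fork and persistence kinds. */
typedef enum { SK_FORK_MAIN, SK_FORK_FSM, SK_FORK_VM, SK_FORK_INIT } SkFork;
typedef enum { SK_PERSIST_PERMANENT, SK_PERSIST_UNLOGGED } SkPersistence;

/* Illustrative flag bits (assumptions), not PostgreSQL's real values. */
#define SK_BM_PERMANENT (1u << 1)
#define SK_BM_TAG_VALID (1u << 2)

/*
 * Compute the flags given to a freshly allocated buffer.
 *
 * Buffers of permanent relations are marked permanent. The init fork of
 * an unlogged relation is marked permanent as well, because it must
 * survive a crash in order to reset the other forks.
 */
static uint32_t
sk_initial_buffer_flags(SkPersistence persistence, SkFork fork)
{
    uint32_t flags = SK_BM_TAG_VALID;

    if (persistence == SK_PERSIST_PERMANENT || fork == SK_FORK_INIT)
        flags |= SK_BM_PERMANENT;

    return flags;
}
```

Under this sketch the main fork of an unlogged table still lacks the
permanent flag, so only its init fork changes behavior at checkpoint time.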
-- 
Michael
