Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist - Mailing list pgsql-hackers

From Andrey Lepikhov
Subject Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist
Date
Msg-id 670845ba-91a5-0bc5-9cf6-6b0e8ba287cf@postgrespro.ru
Whole thread Raw
In response to Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist
List pgsql-hackers

On 25/03/2019 15:21, Heikki Linnakangas wrote:
> On 25/03/2019 09:57, David Steele wrote:
>> On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
>>> The patchset had a problem with all-zero pages, has appeared at index
>>> build stage: the generic_log_relation() routine sends all pages into the
>>> WAL. So  lsn field at all-zero page was initialized and the
>>> PageIsVerified() routine detects it as a bad page.
>>> The solution may be:
>>> 1. To improve index build algorithms and eliminate the possibility of
>>> not used pages appearing.
>>> 2. To mark each page as 'dirty' right after initialization. In this case
>>> we will got 'empty' page instead of the all-zeroed.
>>> 3. Do not write into the WAL all-zero pages.
> 
> Hmm. When do we create all-zero pages during index build? That seems 
> pretty surprising.
GIST uses buffered pages. During GIST build it is possible (very rarely) 
what no one index tuple was written to the block page before new block 
was allocated. And the page has become an all-zero page.
You can't have problems in the current GIST code, because it writes into 
the WAL only changed pages.
But the idea of the patch is traversing blocks of index relation 
one-by-one after end of index building process and write this blocks to 
the WAL. In this case we will see all-zeroed pages and need to check it.
> 
>>> On 04.02.2019 10:04, Michael Paquier wrote:
>>>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>>>> Ok. It is used only for demonstration.
>>>>
>>>> The latest patch set needs a rebase, so moved to next CF, waiting on
>>>> author as this got no reviews.
>>
>> The patch no longer applies so marked Waiting on Author.
>>
>> Alexander, Heikki, are either of you planning to review the patch in
>> this CF?
> 
> I had another quick look.
> 
> I still think using the "generic xlog AM" for this is a wrong level of 
> abstraction, and we should use the XLOG_FPI records for this directly. 
> We can extend XLOG_FPI so that it can store multiple pages in a single 
> record, if it doesn't already handle it.
> 
> Another counter-point to using the generic xlog record is that you're 
> currently doing unnecessary two memcpy's of all pages in the index, in 
> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
> 
> I guess the generic_log_relation() function can stay where it is, but it 
> should use XLogRegisterBuffer() and XLogInsert() directly.
Ok. This patch waited feedback for a long time. I will check the GIST 
code changes from previous review and will try to use your advice.
> 
> - Heikki

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


pgsql-hackers by date:

Previous
From: Pavan Deolasee
Date:
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Next
From: Peter Eisentraut
Date:
Subject: Re: psql display of foreign keys