Re: Avoiding smgrimmedsync() during nbtree index builds - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Avoiding smgrimmedsync() during nbtree index builds
Date
Msg-id d410c275-7181-52ce-63ae-0af229b99a70@iki.fi
Whole thread Raw
In response to Re: Avoiding smgrimmedsync() during nbtree index builds  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On 05/03/2022 00:03, Melanie Plageman wrote:
> On Wed, Mar 2, 2022 at 8:09 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>>
>> Rebased to appease cfbot.
>>
>> I ran these paches under a branch which shows code coverage in cirrus.  It
>> looks good to my eyes.
>> https://api.cirrus-ci.com/v1/artifact/task/5212346552418304/coverage/coverage/00-index.html
> 
> thanks for doing that and for the rebase! since I made updates, the
> attached version 6 is also rebased.

I'm surprised by all the changes in nbtsort.c to choose between using 
the buffer manager or the new API. I would've expected the new API to 
abstract that away. Otherwise we need to copy that logic to all the 
index AMs.

I'd suggest an API along the lines of:

/*
  * Start building a relation in bulk.
  *
  * If the relation is going to be small, we will use the buffer manager,
  * but if it's going to be large, this will skip the buffer manager and
  * write the pages directly to disk.
  */
bulk_init(SmgrRelation smgr, BlockNumber estimated_size)

/*
  * Extend relation by one page
  */
bulk_extend(SmgrRelation, BlockNumber, Page)

/*
  * Finish building the relation
  *
  * This will fsync() the data to disk, if required.
  */
bulk_finish()


Behind this interface, you can encapsulate the logic to choose whether 
to use the buffer manager or not. And I think bulk_extend() could do the 
WAL-logging too.

Or you could make the interface look more like the buffer manager:

/* as above */
bulk_init(SmgrRelation smgr, BlockNumber estimated_size)
bulk_finish()

/*
  * Get a buffer for writing out a page.
  *
  * The contents of the buffer are uninitialized. The caller
  * must fill it in before releasing it.
  */
BulkBuffer
bulk_getbuf(SmgrRelation smgr, BlockNumber blkno)

/*
  * Release buffer. It will be WAL-logged and written out to disk.
  * Not necessarily immediately, but at bulk_finish() at latest.
  *
  * NOTE: There is no way to read back the page after you release
  * it, until you finish the build with bulk_finish().
  */
void
bulk_releasebuf(SmgrRelation smgr, BulkBuffer buf)


With this interface, you could also batch multiple pages and WAL-log 
them all in one WAL record with log_newpage_range(), for example.

- Heikki



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: A proposal for shared memory based backup infrastructure
Next
From: Andrey Borodin
Date:
Subject: Re: Amcheck verification of GiST and GIN