Relation bulk write facility - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Relation bulk write facility
Date	September 19, 2023 15:13:47
Msg-id	30e8f366-58b3-b239-c521-422122dd5150@iki.fi
Responses	Re: Relation bulk write facility
List	pgsql-hackers

Several places bypass the buffer manager and use direct smgrextend() 
calls to populate a new relation: Index AM build methods, rewriteheap.c 
and RelationCopyStorage(). There's a fair amount of duplicated code to 
WAL-log the pages, calculate checksums, call smgrextend(), and finally 
call smgrimmedsync() if needed. The duplication is tedious and 
error-prone. For example, if we want to optimize by WAL-logging multiple 
pages in one record, that needs to be implemented in each AM separately. 
Currently only the sorted GiST index build does that, but it would be 
equally beneficial in all of those places.

And I believe we got the smgrimmedsync() logic slightly wrong in a 
number of places [1]. It's also not great for latency: we could let the 
checkpointer do the fsyncing lazily, like Robert mentioned in the same 
thread.

The attached patch centralizes that pattern into a new bulk writing 
facility, and changes all those AMs to use it. The facility buffers 32 
pages, WAL-logs them in one record, and calculates their checksums. You 
could imagine a lot of further optimizations, like writing those 32 
pages in one vectored pwritev() call [2], and not skipping the buffer 
manager when the relation is small. But the scope of this initial 
version is mostly to refactor the existing code.
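
To illustrate the buffering idea and the vectored-write optimization 
mentioned above, here is a minimal standalone sketch. It is not code from 
the patch: BulkWriter, bulk_init(), bulk_write_page(), bulk_flush() and 
bulk_free() are hypothetical names, the 8192-byte page size is an 
assumption, and the WAL-logging and checksum steps are left out. It only 
shows 32 separately allocated pages being collected and then written with 
a single POSIX pwritev() call.

```c
#define _DEFAULT_SOURCE         /* exposes pwritev() and mkstemp() on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SIZE_BYTES 8192    /* assumed page size */
#define BULK_BUF_PAGES  32      /* pages buffered before each flush */

/* Hypothetical writer state; not the struct used by the patch. */
typedef struct BulkWriter
{
    int      fd;                        /* destination file */
    uint32_t next_block;                /* block number of pages[0] */
    int      npending;                  /* pages currently buffered */
    char    *pages[BULK_BUF_PAGES];     /* separately allocated page buffers */
} BulkWriter;

/* Allocate the page buffers. Returns 0 on success, -1 on out-of-memory. */
static int
bulk_init(BulkWriter *bw, int fd)
{
    bw->fd = fd;
    bw->next_block = 0;
    bw->npending = 0;
    for (int i = 0; i < BULK_BUF_PAGES; i++)
    {
        bw->pages[i] = malloc(PAGE_SIZE_BYTES);
        if (bw->pages[i] == NULL)
        {
            while (--i >= 0)
                free(bw->pages[i]);
            return -1;
        }
    }
    return 0;
}

/* Write all pending pages with one pwritev() call. Returns 0 or -1. */
static int
bulk_flush(BulkWriter *bw)
{
    struct iovec iov[BULK_BUF_PAGES];
    off_t        offset = (off_t) bw->next_block * PAGE_SIZE_BYTES;
    ssize_t      expected = (ssize_t) bw->npending * PAGE_SIZE_BYTES;

    if (bw->npending == 0)
        return 0;

    for (int i = 0; i < bw->npending; i++)
    {
        iov[i].iov_base = bw->pages[i];
        iov[i].iov_len = PAGE_SIZE_BYTES;
    }

    /* This is where WAL-logging and checksumming would go (omitted here). */

    /* For brevity, a short write is treated as an error rather than retried. */
    if (pwritev(bw->fd, iov, bw->npending, offset) != expected)
        return -1;

    bw->next_block += (uint32_t) bw->npending;
    bw->npending = 0;
    return 0;
}

/* Copy one page into the buffer, flushing once the buffer is full. */
static int
bulk_write_page(BulkWriter *bw, const void *page)
{
    assert(bw->npending < BULK_BUF_PAGES);
    memcpy(bw->pages[bw->npending++], page, PAGE_SIZE_BYTES);
    if (bw->npending == BULK_BUF_PAGES)
        return bulk_flush(bw);
    return 0;
}

/* Release the page buffers; the caller flushes and closes the file. */
static void
bulk_free(BulkWriter *bw)
{
    for (int i = 0; i < BULK_BUF_PAGES; i++)
        free(bw->pages[i]);
}
```

A caller would invoke bulk_write_page() once per page and finish with one 
bulk_flush() to write out the remaining partial batch. The iovec array is 
what lets the pages live in separate allocations and still go out in a 
single system call.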

One new optimization included here is to let the checkpointer do the 
fsyncing if possible. That gives a big speedup when e.g. restoring a 
schema-only dump with lots of relations.

[1] 
https://www.postgresql.org/message-id/58effc10-c160-b4a6-4eb7-384e95e6f9e3%40iki.fi

[2] 
https://www.postgresql.org/message-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

v1-0001-Introduce-a-new-bulk-loading-facility.patch
