Several places bypass the buffer manager and use direct smgrextend()
calls to populate a new relation: Index AM build methods, rewriteheap.c
and RelationCopyStorage(). There's a fair amount of duplicated code to
WAL-log the pages, calculate checksums, call smgrextend(), and finally
call smgrimmedsync() if needed. The duplication is tedious and
error-prone. For example, if we want to optimize by WAL-logging multiple
pages in one record, that needs to be implemented in each AM separately.
Currently only the sorted GiST index build does that, but it would be equally
beneficial in all of those places.
I also believe we got the smgrimmedsync() logic slightly wrong in a
number of places [1]. It's not great for latency either; we could let the
checkpointer do the fsyncing lazily, as Robert mentioned in the same
thread.
The attached patch centralizes that pattern to a new bulk writing
facility, and changes all those AMs to use it. The facility buffers 32
pages, WAL-logs them in one record, and calculates their checksums. You could
imagine a lot of further optimizations, like writing those 32 pages in
one vectored pwritev() call [2], and not skipping the buffer manager
when the relation is small. But the scope of this initial version is
mostly to refactor the existing code.
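
To make the batching idea concrete, here is a toy sketch of the buffering pattern described above. It is not the patch's code: every toy_* name, the ToyBulkWriter struct, and the byte-sum checksum are made up for illustration, WAL-logging is represented only by a counter, and the flush already uses the vectored pwritev() call that the text mentions as a possible future optimization.

```c
#define _DEFAULT_SOURCE         /* exposes pwritev() and mkstemp() on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define TOY_PAGE_SIZE 8192      /* PostgreSQL's default block size */
#define TOY_MAX_PAGES 32        /* pages buffered before a flush */

/* Hypothetical toy state; not a PostgreSQL structure. */
typedef struct ToyBulkWriter
{
    int      fd;                /* destination file */
    uint32_t next_blkno;        /* block number of the first buffered page */
    int      npending;          /* number of buffered pages */
    int      nwal_records;      /* stand-in for WAL records emitted */
    int      nwrites;           /* number of pwritev() calls issued */
    unsigned char pages[TOY_MAX_PAGES][TOY_PAGE_SIZE];
} ToyBulkWriter;

/* Placeholder checksum (byte sum in the first two bytes), NOT PostgreSQL's. */
static void
toy_set_checksum(unsigned char *page)
{
    uint16_t sum = 0;

    for (size_t i = 2; i < TOY_PAGE_SIZE; i++)
        sum = (uint16_t) (sum + page[i]);
    memcpy(page, &sum, sizeof(sum));
}

/* "WAL-log" the batch as one record, checksum it, write it in one call. */
static int
toy_bulk_flush(ToyBulkWriter *w)
{
    struct iovec iov[TOY_MAX_PAGES];
    ssize_t  written;

    if (w->npending == 0)
        return 0;
    w->nwal_records++;          /* one record covers the whole batch */
    for (int i = 0; i < w->npending; i++)
    {
        toy_set_checksum(w->pages[i]);
        iov[i].iov_base = w->pages[i];
        iov[i].iov_len = TOY_PAGE_SIZE;
    }
    written = pwritev(w->fd, iov, w->npending,
                      (off_t) w->next_blkno * TOY_PAGE_SIZE);
    if (written != (ssize_t) w->npending * TOY_PAGE_SIZE)
        return -1;              /* real code would retry short writes */
    w->nwrites++;
    w->next_blkno += (uint32_t) w->npending;
    w->npending = 0;
    return 0;
}

/* Buffer one page; flush automatically when the buffer is full. */
static int
toy_bulk_write(ToyBulkWriter *w, const unsigned char *page)
{
    memcpy(w->pages[w->npending++], page, TOY_PAGE_SIZE);
    return (w->npending == TOY_MAX_PAGES) ? toy_bulk_flush(w) : 0;
}

/* Write npages pages to a temp file; return the pwritev() count, or -1. */
static int
toy_bulk_demo(int npages)
{
    static ToyBulkWriter w;     /* static: the buffer is 256 kB */
    unsigned char page[TOY_PAGE_SIZE];
    char     path[] = "/tmp/toybulkXXXXXX";
    int      ok = 1;

    memset(&w, 0, sizeof(w));
    w.fd = mkstemp(path);
    if (w.fd < 0)
        return -1;
    for (int i = 0; i < npages && ok; i++)
    {
        memset(page, i & 0xff, sizeof(page));
        ok = (toy_bulk_write(&w, page) == 0);
    }
    ok = ok && toy_bulk_flush(&w) == 0;     /* flush the partial last batch */
    ok = ok && lseek(w.fd, 0, SEEK_END) == (off_t) npages * TOY_PAGE_SIZE;
    close(w.fd);
    unlink(path);
    return ok ? w.nwrites : -1;
}
```

With this sketch, toy_bulk_demo(70) issues three writes (32 + 32 + 6 pages) and counts three stand-in WAL records, where a page-at-a-time loop would issue 70 of each. The caller only ever deals with toy_bulk_write() and a final toy_bulk_flush(), which is the point of centralizing the pattern.
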
One new optimization included here is to let the checkpointer do the
fsyncing if possible. That gives a big speedup when e.g. restoring a
schema-only dump with lots of relations.
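
As a rough illustration of why deferring the fsync helps, here is a toy model, again not PostgreSQL code. The ToySyncQueue struct and all toy_* functions are hypothetical, and the "fall back to an immediate fsync when the queue is full" rule is only my stand-in for the "if possible" condition, not the actual condition the patch uses.

```c
#define _DEFAULT_SOURCE         /* exposes mkstemp() and fsync() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define TOY_MAX_PENDING 64      /* capacity of the toy request queue */
#define TOY_MAX_FILES   128     /* upper bound accepted by the demo */

/* Hypothetical queue of files waiting for a deferred fsync. */
typedef struct ToySyncQueue
{
    int fds[TOY_MAX_PENDING];
    int npending;
    int nforeground;            /* fsyncs the writer had to do itself */
} ToySyncQueue;

/* Hand the fsync to the "checkpointer" if possible, else do it now. */
static int
toy_request_sync(ToySyncQueue *q, int fd)
{
    if (q->npending < TOY_MAX_PENDING)
    {
        q->fds[q->npending++] = fd;
        return 0;
    }
    q->nforeground++;           /* assumed fallback: sync immediately */
    return fsync(fd);
}

/* Toy "checkpoint": sync every queued file; return how many were synced. */
static int
toy_checkpoint(ToySyncQueue *q)
{
    int nsynced = 0;

    for (int i = 0; i < q->npending; i++)
    {
        if (fsync(q->fds[i]) != 0)
            return -1;
        nsynced++;
    }
    q->npending = 0;
    return nsynced;
}

/* Create nfiles small files; return the number of foreground fsyncs, or -1. */
static int
toy_sync_demo(int nfiles)
{
    ToySyncQueue q = {0};
    int fds[TOY_MAX_FILES];
    int ncreated = 0;
    int ok = (nfiles >= 0 && nfiles <= TOY_MAX_FILES);

    for (int i = 0; i < nfiles && ok; i++)
    {
        char path[] = "/tmp/toysyncXXXXXX";
        int  fd = mkstemp(path);

        if (fd < 0)
        {
            ok = 0;
            break;
        }
        unlink(path);           /* file lives on until the fd is closed */
        fds[ncreated++] = fd;
        ok = write(fd, "x", 1) == 1 && toy_request_sync(&q, fd) == 0;
    }
    ok = ok && toy_checkpoint(&q) >= 0;
    for (int i = 0; i < ncreated; i++)
        close(fds[i]);
    return ok ? q.nforeground : -1;
}
```

In this model, creating 10 files costs the writer no fsync() calls at all, since all of them are absorbed by the later toy_checkpoint(). That is the shape of the win when restoring a schema with many small relations.
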
[1]
https://www.postgresql.org/message-id/58effc10-c160-b4a6-4eb7-384e95e6f9e3%40iki.fi
[2]
https://www.postgresql.org/message-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
--
Heikki Linnakangas
Neon (https://neon.tech)