Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers

From Andres Freund
Subject Re: refactoring relation extension and BufferAlloc(), faster COPY
Date
Msg-id 20230301223515.pucbj7nb54n4i4nv@awork3.anarazel.de
Whole thread Raw
In response to Re: refactoring relation extension and BufferAlloc(), faster COPY  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: refactoring relation extension and BufferAlloc(), faster COPY  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2023-03-02 00:04:14 +0200, Heikki Linnakangas wrote:
> On 01/03/2023 09:33, Andres Freund wrote:
> > typedef enum ExtendBufferedFlags {
> >     EB_SKIP_EXTENSION_LOCK = (1 << 0),
> >     EB_IN_RECOVERY = (1 << 1),
> >     EB_CREATE_FORK_IF_NEEDED = (1 << 2),
> >     EB_LOCK_FIRST = (1 << 3),
> >
> >     /* internal flags follow */
> >     EB_RELEASE_PINS = (1 << 4),
> > } ExtendBufferedFlags;
>
> Is EB_IN_RECOVERY always set when RecoveryInProgress()? Is it really needed?

Right now it's just passed in from the caller. It's at the moment just needed
to know what to pass to smgrcreate(isRedo).

However, XLogReadBufferExtended() doesn't currently use this path, so maybe
it's not actually needed.


> What does EB_LOCK_FIRST do?

Lock the first returned buffer, this is basically the replacement for
num_locked_buffers from the earlier version. I think it's likely that either
locking the first, or potentially at some later point locking all buffers, is
all that's needed for ExtendBufferedRelBy().

EB_LOCK_FIRST_BUFFER would perhaps be better?


> > /*
> >   * To identify the relation - either relation or smgr + relpersistence has to
> >   * be specified. Used via the EB_REL()/EB_SMGR() macros below. This allows us
> >   * to use the same function for both crash recovery and normal operation.
> >   */
> > typedef struct ExtendBufferedWhat
> > {
> >     Relation rel;
> >     struct SMgrRelationData *smgr;
> >     char relpersistence;
> > } ExtendBufferedWhat;
> >
> > #define EB_REL(p_rel) ((ExtendBufferedWhat){.rel = p_rel})
> > /* requires use of EB_SKIP_EXTENSION_LOCK */
> > #define EB_SMGR(p_smgr, p_relpersistence) ((ExtendBufferedWhat){.smgr = p_smgr, .relpersistence =
p_relpersistence})
>
> Clever. I'm still not 100% convinced we need the EB_SMGR variant, but with
> this we'll have the flexibility in any case.

Hm - how would you use it from XLogReadBufferExtended() without that?


XLogReadBufferExtended() spends a disappointing amount of time in
smgropen(). Quite visible in profiles.

In the plan read case at least one in XLogReadBufferExtended()
itself, then in ReadBufferWithoutRelcache(). The extension path right now is
worse - it does one smgropen() for each extended block.

I think we should avoid using ReadBufferWithoutRelcache() in
XLogReadBufferExtended() in the read path as well, but that's for later.


> > extern Buffer ExtendBufferedRel(ExtendBufferedWhat eb,
> >                                 ForkNumber forkNum,
> >                                 BufferAccessStrategy strategy,
> >                                 uint32 flags);
> > extern BlockNumber ExtendBufferedRelBy(ExtendBufferedWhat eb,
> >                                        ForkNumber fork,
> >                                        BufferAccessStrategy strategy,
> >                                        uint32 flags,
> >                                        uint32 extend_by,
> >                                        Buffer *buffers,
> >                                        uint32 *extended_by);
> > extern Buffer ExtendBufferedRelTo(ExtendBufferedWhat eb,
> >                                   ForkNumber fork,
> >                                   BufferAccessStrategy strategy,
> >                                   uint32 flags,
> >                                   BlockNumber extend_to,
> >                                   ReadBufferMode mode);
> >
> > As you can see I removed ReadBufferMode from most of the functions (as
> > suggested by Heikki earlier). When extending by 1/multiple pages, we only need
> > to know whether to lock or not.
>
> Ok, that's better. Still complex and a lot of arguments, but I don't have
> any great suggestions on how to improve it.

I don't think there are going to be all that many callers of
ExtendBufferedRelBy() and ExtendBufferedRelTo(), most are going to be
ExtendBufferedRel(), I think. So the complexity seems acceptable.


> > The reason ExtendBufferedRelTo() has a 'mode' argument is that that allows to
> > fall back to reading page normally if there was a concurrent relation
> > extension.
>
> Hmm, I think you'll need another return value, to let the caller know if the
> relation was extended or not. Or a flag to ereport(ERROR) if the page
> already exists, for ginbuild() and friends.

I don't think ginbuild() et al need to use ExtendBufferedRelTo()? A plain
ExtendBufferedRel() should suffice.  The places that do need it are
fsm_extend() and vm_extend() - I did end up avoiding the redundant lookup.

But I was wondering about a flag controlling this as well.


Attached is my current version of this. Still needs more polishing, including
comments explaining the flags. But I thought it'd be useful to have it out
there.

There's two new patches in the series:
- a patch to not initialize pages in the loop in fsm_extend(), vm_extend()
  anymore - we have a check about initializing pages at a later point, so
  there doesn't really seem to be a need for it?
- a patch to use the new ExtendBufferedRelTo() in fsm_extend(), vm_extend()
  and XLogReadBufferExtended()


In this version I also tries to address some of the other feedback raised in
the thread. One thing I haven't decided what to do about yet is David's
feedback about a version of PinLocalBuffer() that doesn't adjust the
usagecount, which wouldn't need to read the buf_state.

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: cataloguing NOT NULL constraints
Next
From: Nathan Bossart
Date:
Subject: Re: stopgap fix for signal handling during restore_command