Re: pgsql: Add contrib/pg_walinspect. - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pgsql: Add contrib/pg_walinspect.
Date
Msg-id CALj2ACWmf5YE4VvJ3b-n8MkygXudVLfweuCz5ZVvkUq7gzcnOg@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Add contrib/pg_walinspect.  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: pgsql: Add contrib/pg_walinspect.  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Thu, Apr 28, 2022 at 8:41 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2022-04-28 at 12:11 +1200, Thomas Munro wrote:
> >
> > Another option might be to abandon this whole no-wait concept and
> > revert 2258e76f's changes to xlogutils.c.  pg_walinspect already does
> > preliminary checks that LSNs are in range, so you can't enter a value
> > that will wait indefinitely, and it's interruptible (it's a 1ms
> > sleep/check loop, not my favourite programming pattern but that's
> > pre-existing code).  If you're unlucky enough to hit the case where
> > the LSN is judged to be valid but is in the middle of a record that
> > hasn't been totally flushed yet, it'll just be a bit slower to return
> > as we wait for the inevitable later flush(es) to happen.  The rest of
> > your record will *surely* be flushed pretty soon (or the flushing
> > backend panics the whole system and time ends).  I don't imagine this
> > is performance critical work, so maybe that'd be acceptable?
>
> I'm inclined toward this option. I was a bit iffy on those xlogutils.c
> changes to begin with, and they are causing a problem now, so I'd like
> to avoid layering on more workarounds.
>
> The time when we need to wait is very narrow, only in this case where
> it's earlier than the flush pointer, and the flush pointer is in the
> middle of a record that's not fully flushed. And as you say, we won't
> be waiting very long in that case, because once we start to write a WAL
> record it better finish soon.
>
> Bharath, thoughts? When you originally introduced the nowait behavior,
> I believe that was to solve the case where someone specifies an LSN
> range well in the future, but we can still catch that and throw an
> error if we see that it's beyond the flush pointer. Do you see a
> problem with just doing that and getting rid of the nowait changes?

It's not just the flush ptr, without no wait mode, the functions would
wait if start/input lsn is, say current flush lsn - 1 or 2 or more
(before the previous record) bytes. If the functions were to wait, by
how much time should they wait? a timeout? forever? This is what the
complexity we wanted to avoid. I would still vote for the private_data
approach and if the end of WAL reaches when flush lsn falls in the
middle of the record, let's just exit the loop and report the results
back to the client.

I addressed Thomas's review comment and attached v3 patch. Please have a look.

Regards,
Bharath Rupireddy.

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: bogus: logical replication rows/cols combinations
Next
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply