Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait
Msg-id 20170120014557.GB18360@tamriel.snowman.net
In response to Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
* Andres Freund (andres@anarazel.de) wrote:
> On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > WAL replay does do more work, generally speaking (the WAL has to be
> > read, the checksum validated on it, and then the write has to go out,
> > while the checkpointer just writes the page out from memory), but it's
> > also dealing with less contention on the system (there aren't a bunch of
> > backends hammering the disks to pull data in with reads when you're
> > doing crash recovery...).
>
> There's a huge difference though: WAL replay is single threaded, whereas
> generating WAL is not.

I'm aware, but *checkpointing* is still single-threaded, unless, as I
mentioned, you end up with backends pushing out their own changes to the
heap to make room for new pages to come in.  Or is there some other way
the checkpoint ends up being performed with multiple processes?

> Especially if there's synchronous IO required
> (most commonly reading in data, because more data was modified in the
> current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> buffers), you can be significantly slower than generating the WAL.

That is an interesting point, if I'm following what you're saying
correctly: during replay we can end up with more modified pages than
fit in shared buffers, which means that we have to read back in pages
that we had already pushed out in order to apply the non-FPI WAL changes
to them.  I wonder if we should have a way to configure the amount of
memory allowed to be used for WAL replay, independent of shared_buffers?

I mean, really, during crash recovery on a dedicated database box, you'd
probably want to say "ALL the memory can be used if it makes crash
recovery faster!".  That said, I wonder if our eviction algorithm could
be improved/changed when performing WAL replay too to reduce the chances
that we'll have to read a page back in.
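
To make the effect concrete, here is a toy sketch, not PostgreSQL code.
It assumes a plain LRU cache standing in for shared buffers (the real
buffer manager uses a clock-sweep, not LRU) and assumes the first touch
of each page after the checkpoint carries a full-page image, so only
later touches of an evicted page need a synchronous read.  The function
and parameter names are made up for illustration.

```python
from collections import OrderedDict


def replay_rereads(wal_page_refs, n_buffers):
    """Count synchronous reads while replaying page references.

    wal_page_refs: page numbers touched by successive WAL records.
    n_buffers: size of the (hypothetical) LRU buffer cache.
    """
    cache = OrderedDict()   # page -> True, ordered oldest to newest use
    seen = set()            # pages already touched since the checkpoint
    reads = 0
    for page in wal_page_refs:
        if page in cache:
            cache.move_to_end(page)      # buffer hit, refresh recency
            continue
        # Buffer miss.  The first touch is assumed to be an FPI, which
        # fills the buffer without a read; later misses must read.
        if page in seen:
            reads += 1
        seen.add(page)
        cache[page] = True
        if len(cache) > n_buffers:
            cache.popitem(last=False)    # evict least recently used
    return reads


refs = [0, 1, 2, 3] * 3                  # four pages, touched three times
print(replay_rereads(refs, n_buffers=4))  # working set fits
print(replay_rereads(refs, n_buffers=2))  # working set does not fit
```

With four buffers the sketch needs no reads at all; with two, every
touch after the first pass becomes a read, which is the kind of
slowdown that more replay memory, or a smarter eviction choice, would
be trying to avoid.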

Very interesting.

Thanks!

Stephen
