Re: posix_fadvise missing in the walsender - Mailing list pgsql-hackers

From	Joachim Wieland
Subject	Re: posix_fadvise missing in the walsender
Date	February 21, 2013 02:49:30
Msg-id	CACw0+13C6mbCtGmYAyx1_+RsjCiKJhJ7WpG3JhfosF2CXxGndA@mail.gmail.com
In response to	Re: posix_fadvise missing in the walsender (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: posix_fadvise missing in the walsender
List	pgsql-hackers

On Wed, Feb 20, 2013 at 4:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 19, 2013 at 5:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I agree with Merlin and Joachim - if we have the call in one place, we
>> should have it in both.
>
> We might want to assess whether we even want to have it one place.
> I've seen cases where the existing call hurts performance, because of
> WAL file recycling.

That's interesting; I hadn't thought about WAL recycling.

I now agree that this whole thing is even more complicated. You might
have an archive_command set as well, "cp" for instance, which reads
the WAL file in again, possibly right after we called posix_fadvise
on it.

It appears to me that the right strategy depends on a few factors:

a) what ratio of your active dataset fits into RAM?
b) how many WAL files do you have?
c) how long does it take for them to get recycled?
d) archive_command set / wal_senders active?

And recommendations for the two extremes would be:

If your dataset mostly fits into RAM and you have only a few WAL
files that get recycled quickly, then you don't want to evict the WAL
file from the buffer cache.
On the other hand, if your dataset doesn't fit into RAM and you have
many WAL files that take a while to get recycled, then you should
consider hinting to the OS.

If you're in that second category (I am) and you're also using the
archive_command you could just piggyback the posix_fadvise call onto
your archive_command, assuming that the walsender is already done with
the file at that moment. And I'm also pretty certain that Robert's
setup that he used for the write scalability tests fell into the first
category.
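
The piggyback idea above can be sketched roughly as follows. This is
only an illustration, not anything that ships with PostgreSQL: the
function name archive_and_drop_cache and the script name used in the
usage note are hypothetical, and the sketch assumes a platform where
Python exposes os.posix_fadvise (it skips the hint otherwise). It copies
the finished segment like "cp" would and then advises the OS that the
source file's cached pages are no longer needed.

```python
import os
import shutil
import sys


def archive_and_drop_cache(src, dst):
    """Copy a finished WAL segment to dst, then hint that src's cached pages can go.

    Hypothetical helper for illustration; returns True if the hint was issued.
    """
    # Never overwrite an already archived segment.
    if os.path.exists(dst):
        raise FileExistsError(dst)
    shutil.copyfile(src, dst)

    hinted = False
    fd = os.open(src, os.O_RDONLY)
    try:
        # posix_fadvise is not available on every platform, so guard the call.
        if hasattr(os, "posix_fadvise"):
            # offset 0 and length 0 mean "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            hinted = True
    finally:
        os.close(fd)
    return hinted


# When run as a script with two arguments: source path, destination path.
if __name__ == "__main__" and len(sys.argv) == 3:
    archive_and_drop_cache(sys.argv[1], sys.argv[2])
```

With the script saved under a hypothetical path, it could be wired in as
archive_command = 'python3 /path/to/archive_wal.py %p /some/archive/dir/%f'.
Any exception makes the script exit non-zero, so a failed copy is
reported as a failed archive attempt. The sketch does not fsync the
archived copy, and it still assumes the walsender is done with the
segment at that point.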

So given the above, I think it's possible to come up with benchmarks
that prove whatever you want to prove :-)

Joachim
