Re: posix_fadvise missing in the walsender - Mailing list pgsql-hackers

From Joachim Wieland
Subject Re: posix_fadvise missing in the walsender
Date
Msg-id CACw0+13C6mbCtGmYAyx1_+RsjCiKJhJ7WpG3JhfosF2CXxGndA@mail.gmail.com
In response to Re: posix_fadvise missing in the walsender  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: posix_fadvise missing in the walsender  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Feb 20, 2013 at 4:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 19, 2013 at 5:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I agree with Merlin and Joachim - if we have the call in one place, we
>> should have it in both.
>
> We might want to assess whether we even want to have it one place.
> I've seen cases where the existing call hurts performance, because of
> WAL file recycling.

That's interesting, I hadn't thought about WAL recycling.

I now agree that this whole thing is even more complicated. You might
have an archive_command set as well, "cp" for instance, that reads the
WAL file in again, possibly right after we called posix_fadvise on it.

It appears to me that the right strategy depends on a few factors:

a) what ratio of your active dataset fits into RAM?
b) how many WAL files do you have?
c) how long does it take for them to get recycled?
d) archive_command set / wal_senders active?

And recommendations for the two extremes would be:

If your dataset fits mostly into RAM and you have only a few WAL
files that get recycled quickly, then you don't want to evict the WAL
files from the OS page cache.
On the other hand, if your dataset doesn't fit into RAM and you have
many WAL files that take a while to get recycled, then you
should consider hinting to the OS.
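The two extremes above can be written down as a small decision helper. This is only a sketch: the enum and function names are hypothetical, the inputs are yes/no versions of factors a) through d), and anything between the two extremes is deliberately left as "measure it" rather than guessed at.

```python
from enum import Enum


class WalCacheAdvice(Enum):
    """Hypothetical labels for the recommendations discussed above."""
    KEEP_CACHED = "leave WAL pages in the OS page cache"
    HINT_DONTNEED = "hint the OS with POSIX_FADV_DONTNEED"
    HINT_AFTER_LAST_READER = "hint, but only after archive_command/walsenders have read the file"
    MEASURE = "between the extremes: benchmark your own workload"


def suggest_advice(dataset_fits_in_ram: bool,
                   few_wal_files: bool,
                   recycled_quickly: bool,
                   other_readers: bool) -> WalCacheAdvice:
    """Map factors a) to d) onto the two extreme cases; everything else is MEASURE."""
    # First extreme: dataset mostly in RAM, few WAL files, fast recycling.
    # Here evicting WAL pages has been observed to hurt rather than help.
    if dataset_fits_in_ram and few_wal_files and recycled_quickly:
        return WalCacheAdvice.KEEP_CACHED
    # Second extreme: dataset larger than RAM and a long-lived WAL backlog.
    if not dataset_fits_in_ram and not few_wal_files and not recycled_quickly:
        # An archive_command or walsender that still has to read the segment
        # would pull it straight back into the cache if we hinted too early.
        if other_readers:
            return WalCacheAdvice.HINT_AFTER_LAST_READER
        return WalCacheAdvice.HINT_DONTNEED
    # Mixed cases: no clear default, so benchmark.
    return WalCacheAdvice.MEASURE
```

The point of returning MEASURE for the mixed cases is that neither extreme's reasoning carries over cleanly, which is exactly where benchmarks can be made to show either answer.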

If you're in that second category (I am) and you're also using
archive_command, you could just piggyback the posix_fadvise call onto
your archive_command, assuming that the walsender is already done with
the file at that moment. I'm also pretty certain that the setup Robert
used for the write scalability tests fell into the first category.
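A minimal sketch of that piggyback approach, written in Python for brevity. The script name archive_wal.py and the functions archive_and_evict() and main() are hypothetical. The script copies the segment to a temporary name, fsyncs the copy, calls os.posix_fadvise(..., os.POSIX_FADV_DONTNEED) on both the source segment and the copy, and then renames the copy into place. It assumes a platform where Python exposes posix_fadvise (e.g. Linux; the call is skipped elsewhere) and, as above, that no walsender still needs the segment.

```python
import os
import sys


def archive_and_evict(src_path: str, dest_path: str, chunk_size: int = 1 << 20) -> None:
    """Copy one WAL segment to the archive, then advise the kernel to drop it from cache."""
    # Never overwrite a segment that has already been archived.
    if os.path.exists(dest_path):
        raise FileExistsError(dest_path)
    tmp_path = dest_path + ".tmp"
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
        dst.flush()
        # Make the copy durable first; clean pages are what DONTNEED can drop.
        os.fsync(dst.fileno())
        # Advisory only (offset 0, len 0 = whole file). Assumes the walsender
        # has finished with the source segment, as discussed above.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(src.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            os.posix_fadvise(dst.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    # Atomic on POSIX: a half-written copy never appears under the final name.
    os.rename(tmp_path, dest_path)


def main(argv: list[str]) -> int:
    """Entry point; argv is e.g. ['archive_wal.py', '<%p>', '<archive dir>/<%f>']."""
    if len(argv) != 3:
        print("usage: archive_wal.py SRC DEST", file=sys.stderr)
        return 2
    try:
        archive_and_evict(argv[1], argv[2])
    except OSError as exc:
        # Non-zero exit tells PostgreSQL the archive attempt failed.
        print(f"archive failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

To use it, the script's entry point would call sys.exit(main(sys.argv)), with something like archive_command = 'python3 /path/to/archive_wal.py %p /mnt/archive/%f' (both paths hypothetical). Because the hint is issued from archive_command, it naturally runs after the "cp" step has read the file rather than before it.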

So given the above, I think it's possible to come up with benchmarks
that prove whatever you want to prove :-)


Joachim


