Thread: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Merlin Moncure
Date:
Reference: http://postgresql.1045698.n5.nabble.com/Simple-join-doesn-t-use-index-td5738689.html

This is a pretty common gotcha: user sets shared_buffers but misses
the esoteric but important effective_cache_size.  ISTM
effective_cache_size should always be >= shared buffers -- this is a
soft configuration error that could be reported as a warning and
perhaps overridden on the fly.
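
As a rough illustration only (this is not from any actual patch), the kind of
soft check being suggested could look something like this inside the server,
where both GUCs are plain ints measured in 8kB blocks:

    /* Illustrative sketch: warn if effective_cache_size < shared_buffers
     * and quietly use shared_buffers as a floor. */
    if (effective_cache_size < NBuffers)
    {
        ereport(WARNING,
                (errmsg("effective_cache_size (%d blocks) is smaller than shared_buffers (%d blocks)",
                        effective_cache_size, NBuffers),
                 errhint("Consider raising effective_cache_size.")));
        effective_cache_size = NBuffers;
    }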

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Claudio Freire
Date:
On Tue, Jan 8, 2013 at 11:39 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Reference: http://postgresql.1045698.n5.nabble.com/Simple-join-doesn-t-use-index-td5738689.html
>
> This is a pretty common gotcha: user sets shared_buffers but misses
> the esoteric but important effective_cache_size.  ISTM
> effective_cache_size should always be >= shared buffers -- this is a
> soft configuration error that could be reported as a warning and
> perhaps overridden on the fly.

Not true. If there are many concurrent users running concurrent
queries against parallel databases, such as some test systems I have
that contain many databases for many test environments, such a setting
wouldn't make sense. If a DBA sets it to lower than shared_buffers,
that setting has to be honored.

Rather, I'd propose the default setting should be "-1" or something
"default" and "automagic" that works most of the time (but not all).



On Tue, Jan 8, 2013 at 9:53 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Jan 8, 2013 at 11:39 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Reference: http://postgresql.1045698.n5.nabble.com/Simple-join-doesn-t-use-index-td5738689.html
>>
>> This is a pretty common gotcha: user sets shared_buffers but misses
>> the esoteric but important effective_cache_size.  ISTM
>> effective_cache_size should always be >= shared buffers -- this is a
>> soft configuration error that could be reported as a warning and
>> perhaps overridden on the fly.
>
> Not true. If there are many concurrent users running concurrent
> queries against parallel databases, such as some test systems I have
> that contain many databases for many test environments, such a setting
> wouldn't make sense. If a DBA sets it to lower than shared_buffers,
> that setting has to be honored.
>
> Rather, I'd propose the default setting should be "-1" or something
> "default" and "automagic" that works most of the time (but not all).

+1.  I've found that a value of three-quarters of system memory works
pretty well most of the time.  Of course, there's not a single,
portable way of getting that on every platform we support.  If I
remember my last investigation into this area, to use that particular
rule we would probably need at least three paths - one for Windows,
one for System V descendants, and one for BSD descendants.  And there
might still be obscure things that wouldn't be covered.  Of course
this also makes the admittedly unwarranted assumption that the
database owns the box, which could be wrong too, but purposely
lowballing effective_cache_size to discourage index-scan plans seems
unlikely to be a winning strategy.
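
A rough, untested sketch of those three paths (shown here with macOS standing
in for the BSD branch; the other BSDs spell the sysctl name differently, e.g.
HW_PHYSMEM or HW_PHYSMEM64):

    #include <stdint.h>
    #if defined(_WIN32)
    #include <windows.h>
    #elif defined(__APPLE__)
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #else
    #include <unistd.h>
    #endif

    /* Best-effort guess at the box's physical RAM, in bytes (-1 on failure). */
    static int64_t
    physical_ram_bytes(void)
    {
    #if defined(_WIN32)
        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof(ms);
        return GlobalMemoryStatusEx(&ms) ? (int64_t) ms.ullTotalPhys : -1;
    #elif defined(__APPLE__)
        int64_t mem = 0;
        size_t  len = sizeof(mem);
        int     mib[2] = {CTL_HW, HW_MEMSIZE};
        return sysctl(mib, 2, &mem, &len, NULL, 0) == 0 ? mem : -1;
    #else
        /* System V style: works on Linux, Solaris, etc. */
        return (int64_t) sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);
    #endif
    }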

A cruder heuristic that might be useful is 3 * shared_buffers.  If
people follow the guidance to set shared_buffers around 25% of RAM,
then this will work out to around 75% again.  Of course, many people,
for valid reasons, use smaller values of shared_buffers than that, and
a few use larger ones.  It might still be better than no auto-tuning,
though I wouldn't swear to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Tue, Jan  8, 2013 at 05:23:36PM -0500, Robert Haas wrote:
> > Rather, I'd propose the default setting should be "-1" or something
> > "default" and "automagic" that works most of the time (but not all).
> 
> +1.  I've found that a value of three-quarters of system memory works
> pretty well most of the time.  Of course, there's not a single,
> portable way of getting that on every platform we support.  If I
> remember my last investigation into this area, to use that particular
> rule we would probably need at least three paths - one for Windows,
> one for System V descendants, and one for BSD descendants.  And there
> might still be obscure things that wouldn't be covered.  Of course
> this also makes the admittedly unwarranted assumption that the
> database owns the box, which could be wrong too, but purposely
> lowballing effective_cache_size to discourage index-scan plans seems
> unlikely to be a winning strategy.
> 
> A cruder heuristic that might be useful is 3 * shared_buffers.  If
> people follow the guidance to set shared_buffers around 25% of RAM,
> then this will work out to around 75% again.  Of course, many people,
> for valid reasons, use smaller values of shared_buffers than that, and
> a few use larger ones.  It might still be better than no auto-tuning,
> though I wouldn't swear to it.

Agreed.  This is similar to the fudge we have about random_page_cost:
        Random access to mechanical disk storage is normally much more expensive
        than four times sequential access.  However, a lower default is used
        (4.0) because the majority of random accesses to disk, such as indexed
        reads, are assumed to be in cache.  The default value can be thought of
        as modeling random access as 40 times slower than sequential, while
        expecting 90% of random reads to be cached.

effective_cache_size is impossible to set accurately because you have no
idea what other things might be in the cache, or what other concurrent
queries might be filling the cache.  Going with something at least
partly reasonable makes a lot more sense.  While we don't know the size
of RAM, we do know the size of shared_buffers, and keying on that for a
default seems like a no-brainer, rather than using 128MB.
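
(Working out the arithmetic in the random_page_cost text quoted above: if an
uncached random read is modeled as 40 times a sequential read and 90% of
random reads hit cache at roughly no cost, the expected multiplier is
0.10 * 40 = 4, which is the 4.0 default.)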

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 8, 2013 at 9:53 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> Rather, I'd propose the default setting should be "-1" or something
>> "default" and "automagic" that works most of the time (but not all).

> A cruder heuristic that might be useful is 3 * shared_buffers.

Both parts of that work for me.  It's certainly silly that the default
value of effective_cache_size is now equivalent to the default value
of shared_buffers.  And I don't especially like the idea of trying to
make it depend directly on the box's physical RAM, for the same
practical reasons Robert mentioned.

It might be better to use 4 * shared_buffers, as that corresponds to the
multiple that's been the default since 8.2 or so (ie 128MB vs 32MB), and
3x just seems kinda oddball.
        regards, tom lane



On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jan 8, 2013 at 9:53 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> Rather, I'd propose the default setting should be "-1" or something
>>> "default" and "automagic" that works most of the time (but not all).
>
>> A cruder heuristic that might be useful is 3 * shared_buffers.
>
> Both parts of that work for me.  It's certainly silly that the default
> value of effective_cache_size is now equivalent to the default value
> of shared_buffers.  And I don't especially like the idea of trying to
> make it depend directly on the box's physical RAM, for the same
> practical reasons Robert mentioned.

For the record, I don't believe those problems would be particularly
hard to solve.

> It might be better to use 4 * shared_buffers, as that corresponds to the
> multiple that's been the default since 8.2 or so (ie 128MB vs 32MB), and
> 3x just seems kinda oddball.

I suspect that would be OK, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ...  And I don't especially like the idea of trying to
>> make it depend directly on the box's physical RAM, for the same
>> practical reasons Robert mentioned.

> For the record, I don't believe those problems would be particularly
> hard to solve.

Well, the problem of "find out the box's physical RAM" is doubtless
solvable if we're willing to put enough sweat and tears into it, but
I'm dubious that it's worth the trouble.  The harder part is how to know
if the box is supposed to be dedicated to the database.  Bear in mind
that the starting point of this debate was the idea that we're talking
about an inexperienced DBA who doesn't know about any configuration knob
we might provide for the purpose.

I'd prefer to go with a default that's predictable and not totally
foolish --- and some multiple of shared_buffers seems like it'd fit the
bill.
        regards, tom lane



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Andrew Dunstan
Date:
On 01/08/2013 08:08 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ...  And I don't especially like the idea of trying to
>>> make it depend directly on the box's physical RAM, for the same
>>> practical reasons Robert mentioned.
>> For the record, I don't believe those problems would be particularly
>> hard to solve.
> Well, the problem of "find out the box's physical RAM" is doubtless
> solvable if we're willing to put enough sweat and tears into it, but
> I'm dubious that it's worth the trouble.  The harder part is how to know
> if the box is supposed to be dedicated to the database.  Bear in mind
> that the starting point of this debate was the idea that we're talking
> about an inexperienced DBA who doesn't know about any configuration knob
> we might provide for the purpose.
>
> I'd prefer to go with a default that's predictable and not totally
> foolish --- and some multiple of shared_buffers seems like it'd fit the
> bill.

+1. That seems to be by far the biggest bang for the buck. Anything else 
will surely involve a lot more code for not much more benefit.

cheers

andrew



All,

>> Well, the problem of "find out the box's physical RAM" is doubtless
>> solvable if we're willing to put enough sweat and tears into it, but
>> I'm dubious that it's worth the trouble.  The harder part is how to know
>> if the box is supposed to be dedicated to the database.  Bear in mind
>> that the starting point of this debate was the idea that we're talking
>> about an inexperienced DBA who doesn't know about any configuration knob
>> we might provide for the purpose.

Frankly, you'd need to go through a whole decision tree to do this right:

- How much RAM do you have?
- Is that RAM shared with other services?
- Is this a DW or OLTP server?
... etc.

We just don't want to get into that in the core code.  However ...

>> I'd prefer to go with a default that's predictable and not totally
>> foolish --- and some multiple of shared_buffers seems like it'd fit the
>> bill.
> 
> +1. That seems to be by far the biggest bang for the buck. Anything else
> will surely involve a lot more code for not much more benefit.

I don't think we're going far enough here.  I think there should be an
optional setting in postgresql.conf called:

available_ram

Then shared_buffers, wal_buffers, and effective_cache_size (and possibly
other future settings) can be set to -1.  If they are set to -1, then we
use the figures:

shared_buffers = available_ram * 0.25 (with a ceiling of 8GB)
wal_buffers = available_ram * 0.05 (with a ceiling of 32MB)
effective_cache_size = available_ram * 0.75 (with a floor of 128MB)

If they are set to an amount, then we use the amount they are set to.
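
To make the proposal concrete, a standalone sketch of that derivation, using
the ceilings and floors above (the function and variable names here are made
up for illustration):

    #include <stdint.h>

    #define MB ((int64_t) 1024 * 1024)
    #define GB (1024 * MB)

    static int64_t cap(int64_t v, int64_t ceiling) { return v > ceiling ? ceiling : v; }
    static int64_t floor_at(int64_t v, int64_t fl) { return v < fl ? fl : v; }

    /* Resolve any setting left at -1 from a single available_ram value (bytes). */
    static void
    derive_defaults(int64_t available_ram, int64_t *shared_buffers,
                    int64_t *wal_buffers, int64_t *effective_cache_size)
    {
        if (*shared_buffers == -1)
            *shared_buffers = cap(available_ram / 4, 8 * GB);         /* 25%, ceiling 8GB */
        if (*wal_buffers == -1)
            *wal_buffers = cap(available_ram / 20, 32 * MB);          /* 5%, ceiling 32MB */
        if (*effective_cache_size == -1)
            *effective_cache_size = floor_at(available_ram * 3 / 4, 128 * MB); /* 75%, floor 128MB */
    }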

It would be nice to also automatically set work_mem, maint_work_mem,
temp_buffers, etc. based on the above, but that would be considerably
more difficult and require performance testing we haven't done yet.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Benedikt Grundmann
Date:



On Wed, Jan 9, 2013 at 2:01 AM, Josh Berkus <josh@agliodbs.com> wrote:
All,

>> Well, the problem of "find out the box's physical RAM" is doubtless
>> solvable if we're willing to put enough sweat and tears into it, but
>> I'm dubious that it's worth the trouble.  The harder part is how to know
>> if the box is supposed to be dedicated to the database.  Bear in mind
>> that the starting point of this debate was the idea that we're talking
>> about an inexperienced DBA who doesn't know about any configuration knob
>> we might provide for the purpose.

For what it is worth even if it is a dedicated database box 75% might be way too high.  I remember
investigating bad performance on our biggest database server, that in the end turned out to be a too
high setting of effective_cache_size.  From reading the code back then my rationale for it being too
high was that the code that makes use of the effective_cache_size tries very hard to account for what
the current query would do to the cache but doesn't take into account how many queries (on separate
datasets!) are currently being executed (and competing for the same cache).  On that box we often have
100+ active connections and many looking at different big datasets.

Cheers,

bene

Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Well, the problem of "find out the box's physical RAM" is doubtless
> solvable if we're willing to put enough sweat and tears into it, but
> I'm dubious that it's worth the trouble.  The harder part is how to know
> if the box is supposed to be dedicated to the database.  Bear in mind
> that the starting point of this debate was the idea that we're talking
> about an inexperienced DBA who doesn't know about any configuration knob
> we might provide for the purpose.

It seems to me that pgfincore has the smarts we need to know about that,
and that Cédric has code and references for making it work on all
platforms we care about (linux, bsd, windows for starters).

Then we could have a pretty decent snapshot of the information, and
maybe a background worker of some sort would be able to refresh the
information while the server is running.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Dimitri,

> It seems to me that pgfincore has the smarts we need to know about that,
> and that Cédric has code and references for making it work on all
> platforms we care about (linux, bsd, windows for starters).

Well, fincore is Linux-only, and for that matter only more recent
versions of linux.  It would be great if we could just detect the RAM,
but then we *still* have to ask the user if this is a dedicated box or not.

> Then we could have a pretty decent snapshot of the information, and
> maybe a background worker of some sort would be able to refresh the
> information while the server is running.

I don't see a background worker as necessary or even desirable; people
don't frequently add RAM to their systems.  A manually callable command
would be completely adequate.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Claudio Freire
Date:
On Wed, Jan 9, 2013 at 3:39 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> It seems to me that pgfincore has the smarts we need to know about that,
>> and that Cédric has code and refenrences for making it work on all
>> platforms we care about (linux, bsd, windows for starters).
>
> Well, fincore is Linux-only, and for that matter only more recent
> versions of linux.  It would be great if we could just detect the RAM,
> but then we *still* have to ask the user if this is a dedicated box or not.

Not really. I'm convinced, and not only for e_c_s, that
autoconfiguration is within the realm of possibility.

However, since such a statement holds little weight without a patch
attached, I've been keeping it to myself (until I can invest the
effort needed for that patch).

In any case, as eavesdroppers can infer a cryptographic key by timing
operations or measuring power consumption, I'm pretty sure postgres
can infer cost metrics and/or time sharing with clever
instrumentation. The trick lies in making such instrumentation
unintrusive.



Claudio,

> Not really. I'm convinced, and not only for e_c_s, that
> autoconfiguration is within the realm of possibility.

Hey, if you can do it, my hat's off to you.

> In any case, as eavesdroppers can infer a cryptographic key by timing
> operations or measuring power consumption, I'm pretty sure postgres
> can infer cost metrics and/or time sharing with clever
> instrumentation. The trick lies in making such instrumentation
> unintrusive.

... and not requiring a great deal of code maintenance for each and
every release of Linux and Windows.

Anyway, we could do something for 9.3 if we just make "available RAM" a
manual setting.  Asking the user "how much RAM is available for
Postgres" is not a terribly difficult question.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Wed, Jan 9, 2013 at 12:38 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:

> For what it is worth even if it is a dedicated database box 75% might be way
> too high. I remember investigating bad performance on our biggest database
> server, that in the end turned out to be a too high setting of
> effective_cache_size. From reading the code back then my rationale for it
> being too high was that the code that makes use of the effective_cache_size
> tries very hard to account for what the current query would do to the cache
> but doesn't take into account how many queries (on separate datasets!) are
> currently being executed (and competing for the same cache).  On that box we
> often have 100+ active connections and many looking at different big
> datasets.

I think that most busy installations either run a lot of small queries
(for which effective_cache_size is irrelevant), or a few large
queries.  Your case is probably somewhat rare, and so as far as
defaults go, it would be sacrificed for the common good.  The docs do
anticipate that the value needs to be discounted to account for multiple
concurrent queries when deciding how to set effective_cache_size, but
perhaps the wording could be improved.

Out of curiosity, what did your queries look like after you lowered
effective_cache_size?  Were there a lot of sequential scans, or did it
just choose different indexes than it had before?  If a lot of
sequential scans, were they mostly on just a few tables that each had
many sequential scans going on simultaneously, or was it 100+
different tables each with one sequential scan going on?  (You said
different big datasets, but I don't know if these are in different
tables, or in common tables with a column to distinguish them.)

Cheers,

Jeff



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Tue, Jan  8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote:
>
> On 01/08/2013 08:08 PM, Tom Lane wrote:
> >Robert Haas <robertmhaas@gmail.com> writes:
> >>On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>...  And I don't especially like the idea of trying to
> >>>make it depend directly on the box's physical RAM, for the same
> >>>practical reasons Robert mentioned.
> >>For the record, I don't believe those problems would be particularly
> >>hard to solve.
> >Well, the problem of "find out the box's physical RAM" is doubtless
> >solvable if we're willing to put enough sweat and tears into it, but
> >I'm dubious that it's worth the trouble.  The harder part is how to know
> >if the box is supposed to be dedicated to the database.  Bear in mind
> >that the starting point of this debate was the idea that we're talking
> >about an inexperienced DBA who doesn't know about any configuration knob
> >we might provide for the purpose.
> >
> >I'd prefer to go with a default that's predictable and not totally
> >foolish --- and some multiple of shared_buffers seems like it'd fit the
> >bill.
>
> +1. That seems to be by far the biggest bang for the buck. Anything
> else will surely involve a lot more code for not much more benefit.

I have developed the attached patch which implements an auto-tuned
effective_cache_size which is 4x the size of shared buffers.  I had to
set effective_cache_size to its old 128MB default so the EXPLAIN
regression tests would pass unchanged.

I considered a new available_ram variable but that just gives us another
variable, and in a way shared_buffers is a fixed amount, while
effective_cache_size is an estimate, so I thought driving everything
from shared_buffers made sense.
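
The mechanism is roughly a GUC check hook of this shape (a sketch, not taken
from the attached patch, which may differ in details), resolving the -1
sentinel against shared_buffers, with both values counted in 8kB blocks:

    /* Sketch: resolve effective_cache_size = -1 to 4 * shared_buffers. */
    static bool
    check_effective_cache_size(int *newval, void **extra, GucSource source)
    {
        if (*newval == -1)
            *newval = NBuffers * 4;     /* NBuffers == shared_buffers, in blocks */
        return true;
    }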

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Magnus Hagander
Date:
On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jan  8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote:
>>
>> On 01/08/2013 08:08 PM, Tom Lane wrote:
>> >Robert Haas <robertmhaas@gmail.com> writes:
>> >>On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >>>...  And I don't especially like the idea of trying to
>> >>>make it depend directly on the box's physical RAM, for the same
>> >>>practical reasons Robert mentioned.
>> >>For the record, I don't believe those problems would be particularly
>> >>hard to solve.
>> >Well, the problem of "find out the box's physical RAM" is doubtless
>> >solvable if we're willing to put enough sweat and tears into it, but
>> >I'm dubious that it's worth the trouble.  The harder part is how to know
>> >if the box is supposed to be dedicated to the database.  Bear in mind
>> >that the starting point of this debate was the idea that we're talking
>> >about an inexperienced DBA who doesn't know about any configuration knob
>> >we might provide for the purpose.
>> >
>> >I'd prefer to go with a default that's predictable and not totally
>> >foolish --- and some multiple of shared_buffers seems like it'd fit the
>> >bill.
>>
>> +1. That seems to be by far the biggest bang for the buck. Anything
>> else will surely involve a lot more code for not much more benefit.
>
> I have developed the attached patch which implements an auto-tuned
> effective_cache_size which is 4x the size of shared buffers.  I had to
> set effective_cache_size to its old 128MB default so the EXPLAIN
> regression tests would pass unchanged.

That's not really autotuning though. ISTM that making the *default* 4
x shared_buffers might make perfect sense, but do we really need to
hijack the value of "-1" for that? That might be useful for some time
when we have actual autotuning, that somehow inspects the system and
tunes it from there.

I also don't think it should be called autotuning, when it's just a
"smarter default value".

I like the feature, though, just not the packaging.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> I have developed the attached patch which implements an auto-tuned
>> effective_cache_size which is 4x the size of shared buffers.  I had to
>> set effective_cache_size to its old 128MB default so the EXPLAIN
>> regression tests would pass unchanged.

> That's not really autotuning though. ISTM that making the *default* 4
> x shared_buffers might make perfect sense, but do we really need to
> hijack the value of "-1" for that? That might be useful for some time
> when we have actual autotuning, that somehow inspects the system and
> tunes it from there.

Well, the real problem with this patch is that it documents what the
auto-tuning algorithm is; without that commitment, just saying "-1 means
autotune" might be fine.

Did you consider the alternative of just tweaking initdb to insert a
default for effective_cache_size that's 4x whatever it picks for
shared_buffers?  That would probably be about 3 lines of code, and it
wouldn't nail down any particular server-side behavior.
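
For illustration, that tweak would follow the token-substitution pattern
initdb already uses when it writes shared_buffers; the token string and the
n_buffers_mb variable below are hypothetical stand-ins, not actual initdb.c
code:

    /* In initdb's setup_config(), after shared_buffers has been chosen: */
    snprintf(repltok, sizeof(repltok),
             "effective_cache_size = %dMB", n_buffers_mb * 4);
    conflines = replace_token(conflines,
                              "#effective_cache_size = 128MB", repltok);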
        regards, tom lane



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Thu, Sep  5, 2013 at 06:14:33PM +0200, Magnus Hagander wrote:
> > I have developed the attached patch which implements an auto-tuned
> > effective_cache_size which is 4x the size of shared buffers.  I had to
> > set effective_cache_size to its old 128MB default so the EXPLAIN
> > regression tests would pass unchanged.
> 
> That's not really autotuning though. ISTM that making the *default* 4
> x shared_buffers might make perfect sense, but do we really need to
> hijack the value of "-1" for that? That might be useful for some time
> when we have actual autotuning, that somehow inspects the system and
> tunes it from there.
>
> I also don't think it should be called autotuning, when it's just a
> "smarter default value".
> 
> I like the feature, though, just not the packaging.

That "auto-tuning" text came from the wal_buffer documentation, which
does exactly this based on shared_buffers:
       The contents of the WAL buffers are written out to disk at every       transaction commit, so extremely large
valuesare unlikely to       provide a significant benefit.  However, setting this value to at       least a few
megabytescan improve write performance on a busy
 
-->     server where many clients are committing at once.  The auto-tuning
               -----------       selected by the default setting of -1 should give reasonable       results in most
cases.

I am fine with rewording and not using -1, but we should change the
wal_buffers default and documentation too then.  I am not sure what other
value than -1 to use?  0?  I figure if we ever get better auto-tuning,
we would just remove this functionality and make it better.
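
For reference, that wal_buffers auto-tuning boils down to roughly the
following (a paraphrase of XLOGChooseNumBuffers() in xlog.c; the exact clamps
may differ by version): 1/32 of shared_buffers, capped at one WAL segment and
floored at 8 buffers.

    /* Paraphrase of how wal_buffers = -1 is resolved at server start. */
    static int
    XLOGChooseNumBuffers(void)
    {
        int     xbuffers = NBuffers / 32;

        if (xbuffers > XLOG_SEG_SIZE / XLOG_BLCKSZ)     /* cap: one 16MB WAL segment */
            xbuffers = XLOG_SEG_SIZE / XLOG_BLCKSZ;
        if (xbuffers < 8)                               /* floor: 8 pages (64kB) */
            xbuffers = 8;
        return xbuffers;
    }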

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Thu, Sep  5, 2013 at 12:48:54PM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> I have developed the attached patch which implements an auto-tuned
> >> effective_cache_size which is 4x the size of shared buffers.  I had to
> >> set effective_cache_size to its old 128MB default so the EXPLAIN
> >> regression tests would pass unchanged.
> 
> > That's not really autotuning though. ISTM that making the *default* 4
> > x shared_buffers might make perfect sense, but do we really need to
> > hijack the value of "-1" for that? That might be useful for some time
> > when we have actual autotuning, that somehow inspects the system and
> > tunes it from there.
> 
> Well, the real problem with this patch is that it documents what the
> auto-tuning algorithm is; without that commitment, just saying "-1 means
> autotune" might be fine.

OK, but I did this based on wal_buffers, which has a -1 default, calls
it auto-tuning, and explains how the default is computed.

> Did you consider the alternative of just tweaking initdb to insert a
> default for effective_cache_size that's 4x whatever it picks for
> shared_buffers?  That would probably be about 3 lines of code, and it
> wouldn't nail down any particular server-side behavior.

The problem there is that many users are told to tune shared_buffers,
but don't touch effective cache size.  Having initdb set the
effective_cache_size value would not help there.  Again, this is all
based on the auto-tuning of wal_buffers.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Josh Berkus
Date:
On 09/05/2013 02:16 PM, Bruce Momjian wrote:
>> Well, the real problem with this patch is that it documents what the
>> auto-tuning algorithm is; without that commitment, just saying "-1 means
>> autotune" might be fine.
> 
> OK, but I did this based on wal_buffers, which has a -1 default, calls
> it auto-tuning, and explains how the default is computed.

I don't see a real problem with this.  For users who have set their
shared_buffers correctly, effective_cache_size should also be correct.

> The problem there is that many users are told to tune shared_buffers,
> but don't touch effective cache size.  Having initdb set the
> effective_cache_size value would not help there.  Again, this is all
> based on the auto-tuning of wal_buffers.

Standard advice we've given in the past is 25% shared buffers, 75%
effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
Maybe we're changing the conventional calculation, but I thought I'd
point that out.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Merlin Moncure
Date:
On Thu, Sep 5, 2013 at 5:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/05/2013 02:16 PM, Bruce Momjian wrote:
>>> Well, the real problem with this patch is that it documents what the
>>> auto-tuning algorithm is; without that commitment, just saying "-1 means
>>> autotune" might be fine.
>>
>> OK, but I did this based on wal_buffers, which has a -1 default, calls
>> it auto-tuning, and explains how the default is computed.
>
> I don't see a real problem with this.  For users who have set their
> shared_buffers correctly, effective_cache_size should also be correct.

Agreed.  I think -1 is the right setting for autotune as things stand
today. If we want something else, then we should change other settings
as well (like wal_buffers) and that is not in the scope of this patch.

>> The problem there is that many users are told to tune shared_buffers,
>> but don't touch effective cache size.  Having initdb set the
>> effective_cache_size value would not help there.  Again, this is all
>> based on the auto-tuning of wal_buffers.
>
> Standard advice we've given in the past is 25% shared buffers, 75%
> effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
>  Maybe we're changing the conventional calculation, but I thought I'd
> point that out.

This was debated upthread.

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Thu, Sep  5, 2013 at 03:11:53PM -0700, Josh Berkus wrote:
> On 09/05/2013 02:16 PM, Bruce Momjian wrote:
> >> Well, the real problem with this patch is that it documents what the
> >> auto-tuning algorithm is; without that commitment, just saying "-1 means
> >> autotune" might be fine.
> > 
> > OK, but I did this based on wal_buffers, which has a -1 default, calls
> > it auto-tuning, and explains how the default is computed.
> 
> I don't see a real problem with this.  For users who have set their
> shared_buffers correctly, effective_cache_size should also be correct.
> 
> > The problem there is that many users are told to tune shared_buffers,
> > but don't touch effective cache size.  Having initdb set the
> > effective_cache_size value would not help there.  Again, this is all
> > based on the auto-tuning of wal_buffers.
> 
> Standard advice we've given in the past is 25% shared buffers, 75%
> effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
>  Maybe we're changing the conventional calculation, but I thought I'd
> point that out.

Yes, I had wondered that myself, and 3x and 4x were thrown out as
options.  There were more people who liked 4x, but one of the reasons
was that 3x sounded odd --- not sure what to make of that, but I went
with the most popular.  I am fine with 3x, and I do think it logically
makes more sense, and is less likely to over-estimate than 4x.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Josh Berkus
Date:
On 09/05/2013 03:30 PM, Merlin Moncure wrote:

>> Standard advice we've given in the past is 25% shared buffers, 75%
>> effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
>>  Maybe we're changing the conventional calculation, but I thought I'd
>> point that out.
> 
> This was debated upthread.

Actually, no, it wasn't.  Tom threw out a suggestion that we use 4X for
historical reasons.  That's all, there was no discussion.

So, my point stands: our historical advice has been to set EFS to 75% of
RAM.  Maybe we're changing that advice, but if so, let's change it.
Otherwise 3X makes more sense.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Cédric Villemain
Date:
On Thursday, September 5, 2013 17:14:37, Bruce Momjian wrote:
> On Thu, Sep  5, 2013 at 06:14:33PM +0200, Magnus Hagander wrote:
> > > I have developed the attached patch which implements an auto-tuned
> > > effective_cache_size which is 4x the size of shared buffers.  I had to
> > > set effective_cache_size to its old 128MB default so the EXPLAIN
> > > regression tests would pass unchanged.
> >
> > That's not really autotuning though. ISTM that making the *default* 4
> > x shared_buffers might make perfect sense, but do we really need to
> > hijack the value of "-1" for that? That might be useful for some time
> > when we have actual autotuning, that somehow inspects the system and
> > tunes it from there.
> >
> > I also don't think it should be called autotuning, when it's just a
> > "smarter default value".
> >
> > I like the feature, though, just not the packaging.
>
> That "auto-tuning" text came from the wal_buffer documentation, which
> does exactly this based on shared_buffers:
>
>         The contents of the WAL buffers are written out to disk at every
>         transaction commit, so extremely large values are unlikely to
>         provide a significant benefit.  However, setting this value to at
>         least a few megabytes can improve write performance on a busy
> -->     server where many clients are committing at once.  The auto-tuning
>                                                                -----------
>         selected by the default setting of -1 should give reasonable
>         results in most cases.
>
> I am fine with rewording and not using -1, but we should change the
> wal_buffer default and documentation too then.  I am not sure what other
> value than -1 to use?  0?  I figure if we ever get better auto-tuning,
> we would just remove this functionality and make it better.

I'm fine with a -1 for auto-tune or an intelligent default: it means (for me)
that you don't need to care about this parameter in most cases.

A negative impact of the simpler multiplier might be that if someone suddenly
reduces the shared_buffers size to fix some strange behavior, they then also
need to manually increase effective_cache_size (which remains the sum of the
caches on the system, at least on a box dedicated to PostgreSQL).

IMHO it is easy to know exactly how much of the memory is (or can be) used
for/by PostgreSQL; we can compute that and update effective_cache_size at
regular points in time.  (Just an idea, I know there are arguments against
that too.)

Maybe the case for a 4x multiplier instead of 3x is that effective_cache_size
can then be larger than required.  That's not a big problem.  With all the
things around NUMA we may just need to revisit that area (non-linear memory
access costs, double/triple caching, ...).
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Thu, Sep  5, 2013 at 09:02:27PM -0700, Josh Berkus wrote:
> On 09/05/2013 03:30 PM, Merlin Moncure wrote:
> 
> >> Standard advice we've given in the past is 25% shared buffers, 75%
> >> effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
> >>  Maybe we're changing the conventional calculation, but I thought I'd
> >> point that out.
> > 
> > This was debated upthread.
> 
> Actually, no, it wasn't.  Tom threw out a suggestion that we use 4X for
> historical reasons.  That's all, there was no discussion.
> 
> So, my point stands: our historical advice has been to set EFS to 75% of
> RAM.  Maybe we're changing that advice, but if so, let's change it.
> Otherwise 3X makes more sense.

So, what do we want the effective_cache_size default to be?  3x or 4x?
We clearly state:
        If you have a dedicated database server with 1GB or more of RAM,
        a reasonable starting value for shared_buffers is 25% of the
        memory in your system.  There are some workloads where even

If we make the default 4x, that means that people using the above
suggestion would be setting their effective_cache_size to 100% of RAM? 
If we go with 4x, which I believe was the majority opinion, what shall
we answer to someone who asks about this contradiction?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



On Mon, Sep 9, 2013 at 6:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Sep  5, 2013 at 09:02:27PM -0700, Josh Berkus wrote:
>> On 09/05/2013 03:30 PM, Merlin Moncure wrote:
>>
>> >> Standard advice we've given in the past is 25% shared buffers, 75%
>> >> effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
>> >>  Maybe we're changing the conventional calculation, but I thought I'd
>> >> point that out.
>> >
>> > This was debated upthread.
>>
>> Actually, no, it wasn't.  Tom threw out a suggestion that we use 4X for
>> historical reasons.  That's all, there was no discussion.
>>
>> So, my point stands: our historical advice has been to set EFS to 75% of
>> RAM.  Maybe we're changing that advice, but if so, let's change it.
>> Otherwise 3X makes more sense.
>
> So, what do we want the effective_cache_size default to be?  3x or 4x?
> We clearly state:
>
>         If you have a dedicated database server with 1GB or more of RAM,
>         a reasonable starting value for shared_buffers is 25% of the
>         memory in your system.  There are some workloads where even
>
> If we make the default 4x, that means that people using the above
> suggestion would be setting their effective_cache_size to 100% of RAM?
> If we go with 4x, which I believe was the majority opinion, what shall
> we answer to someone who asks about this contradiction?

I vote for 3x.  The past defaults had a different ratio, but we are
changing things to make them better, not to leave them the same.  We
should change it to be consistent with the advice we have long given.
Sure, 3 is not a power of 2, but I usually root for the underdog.

Cheers,

Jeff



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Merlin Moncure
Date:
On Tue, Sep 10, 2013 at 11:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Sep 9, 2013 at 6:29 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> On Thu, Sep  5, 2013 at 09:02:27PM -0700, Josh Berkus wrote:
>>> On 09/05/2013 03:30 PM, Merlin Moncure wrote:
>>>
>>> >> Standard advice we've given in the past is 25% shared buffers, 75%
>>> >> effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
>>> >>  Maybe we're changing the conventional calculation, but I thought I'd
>>> >> point that out.
>>> >
>>> > This was debated upthread.
>>>
>>> Actually, no, it wasn't.  Tom threw out a suggestion that we use 4X for
>>> historical reasons.  That's all, there was no discussion.
>>>
>>> So, my point stands: our historical advice has been to set EFS to 75% of
>>> RAM.  Maybe we're changing that advice, but if so, let's change it.
>>> Otherwise 3X makes more sense.
>>
>> So, what do we want the effective_cache_size default to be?  3x or 4x?
>> We clearly state:
>>
>>         If you have a dedicated database server with 1GB or more of RAM,
>>         a reasonable starting value for shared_buffers is 25% of the
>>         memory in your system.  There are some workloads where even
>>
>> If we make the default 4x, that means that people using the above
>> suggestion would be setting their effective_cache_size to 100% of RAM?
>> If we go with 4x, which I believe was the majority opinion, what shall
>> we answer to someone who asks about this contradiction?
>
> I vote for 3x.  The past defaults had a different ratio, but we are
> changing things to make them better, not to leave them the same.  We
> should change it to be consistent with the advice we have long given.
> Sure, 3 is not a power of 2, but I usually root for the underdog.

I vote 4x on the basis that for this setting (unlike almost all the
other memory settings) the ramifications for setting it too high
generally aren't too bad.  Also, the o/s and temporary memory usage as
a share of total physical memory has been declining over time
(meaning, that if you have a 256gb memory server and follow the advice
to set to 64gb, your memory for caching is approximately 64gb).

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Josh Berkus
Date:
Merlin,

> I vote 4x on the basis that for this setting (unlike almost all the
> other memory settings) the ramifications for setting it too high
> generally aren't too bad.  Also, the o/s and temporary memory usage as
> a share of total physical memory has been declining over time

If we're doing that, then we should change our general advice on this
setting as well.

Another argument in favor: this is a default setting, and by default,
shared_buffers won't be 25% of RAM.

> (meaning, that if you have a 256gb memory server and follow the advice
> to set to 64gb, your memory for caching is approximately 64gb).

Wait, what?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Merlin Moncure
Date:
On Tue, Sep 10, 2013 at 5:08 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Merlin,
>
>> I vote 4x on the basis that for this setting (unlike almost all the
>> other memory settings) the ramifications for setting it too high
>> generally aren't too bad.  Also, the o/s and temporary memory usage as
>> a share of total physical memory has been declining over time
>
> If we're doing that, then we should change our general advice on this
> setting as well.
>
> Another argument in favor: this is a default setting, and by default,
> shared_buffers won't be 25% of RAM.
>
>> (meaning, that if you have a 256gb memory server and follow the advice
>> to set to 64gb, your memory for caching is approximately 64gb).

oops, meant to say "approximately 256gb".

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Tue, Sep 10, 2013 at 03:08:24PM -0700, Josh Berkus wrote:
> Merlin,
> 
> > I vote 4x on the basis that for this setting (unlike almost all the
> > other memory settings) the ramifications for setting it too high
> > generally aren't too bad.  Also, the o/s and temporary memory usage as
> > a share of total physical memory has been declining over time
> 
> If we're doing that, then we should change our general advice on this
> setting as well.

Uh, what general advice?  I don't see 4x mentioned anywhere.

> Another argument in favor: this is a default setting, and by default,
> shared_buffers won't be 25% of RAM.

So, are you saying you like 4x now?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Wed, Sep 11, 2013 at 09:18:30AM -0400, Bruce Momjian wrote:
> On Tue, Sep 10, 2013 at 03:08:24PM -0700, Josh Berkus wrote:
> > Merlin,
> > 
> > > I vote 4x on the basis that for this setting (unlike almost all the
> > > other memory settings) the ramifications for setting it too high
> > > generally aren't too bad.  Also, the o/s and temporary memory usage as
> > > a share of total physical memory has been declining over time
> > 
> > If we're doing that, then we should change our general advice on this
> > setting as well.
> 
> Uh, what general advice?  I don't see 4x mentioned anywhere.
> 
> > Another argument in favor: this is a default setting, and by default,
> > shared_buffers won't be 25% of RAM.
> 
> So, are you saying you like 4x now?

Here is an argument for 3x.  First, using the documented 25% of RAM, 3x
puts our effective_cache_size as 75% of RAM, giving us no room for
kernel, backend memory, and work_mem usage.  If anything it should be
lower than 3x, not higher.

Second, if the machine is not a dedicated machine, and supposing 10% of
RAM is used for shared_buffers, 4x would put effective cache size at 40%
of RAM, which again seems too high, considering others are using the
machine and filling the kernel cache.  3x also seems too high, but
acceptable at 30% of RAM.

I basically can't imagine a case where you set shared_buffers to a
reasonable value and would still have 4x of that available for kernel
cache.

Finally, for those who like the idea of 4x, you can think of
shared_buffers (1x) + effective_cache_size (3x) as totalling 4x.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> > So, are you saying you like 4x now?
> 
> > Here is an argument for 3x.  First, using the documented 25% of RAM, 3x
> puts our effective_cache_size as 75% of RAM, giving us no room for
> kernel, backend memory, and work_mem usage.  If anything it should be
> lower than 3x, not higher.

The other argument I see for the 3x value is that it is a compromise.
People with really large servers will want to increase it; people with
very small servers will want to reduce it.


> Finally, for those who like the idea of 4x, you can think of
> shared_buffers (1x) + effective_cache_size (3x) as totalling 4x.

This part of your argument doesn't work really, because AFAIR the
effective_cache_size value ought to consider that shared_buffers is part
of it (so e_c_s is shared_buffers + kernel cache).  So if you're seeing
the 4x as e_c_s + s_b, you would be counting s_b twice.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Wed, Sep 11, 2013 at 12:43:07PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > > So, are you saying you like 4x now?
> > 
> > Here is an argument for 3x.  First, using the documented 25% of RAM, 3x
> > puts our effective_cache_size as 75% of RAM, giving us no room for
> > kernel, backend memory, and work_mem usage.  If anything it should be
> > lower than 3x, not higher.
> 
> The other argument I see for the 3x value is that it is a compromise.
> People with really large servers will want to increase it; people with
> very small servers will want to reduce it.

Yes, you could make the argument that 2x is the right default,
especially considering work_mem.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Josh Berkus
Date:
On 09/11/2013 08:27 AM, Bruce Momjian wrote:
> On Wed, Sep 11, 2013 at 09:18:30AM -0400, Bruce Momjian wrote:
>> On Tue, Sep 10, 2013 at 03:08:24PM -0700, Josh Berkus wrote:
>>> Another argument in favor: this is a default setting, and by default,
>>> shared_buffers won't be 25% of RAM.
>>
>> So, are you saying you like 4x now?

I still prefer 3X.  However, I acknowledge that there are legitimate
arguments for higher values for a *default*.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Andres Freund
Date:
On 2013-09-11 12:53:29 -0400, Bruce Momjian wrote:
> On Wed, Sep 11, 2013 at 12:43:07PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> > > > So, are you saying you like 4x now?
> > > 
> > > Here is an argument for 3x.  First, using the documented 25% of RAM, 3x
> > > puts our effective_cache_size as 75% of RAM, giving us no room for
> > > kernel, backend memory, and work_mem usage.  If anything it should be
> > > lower than 3x, not higher.
> > 
> > The other argument I see for the 3x value is that it is a compromise.
> > People with really large servers will want to increase it; people with
> > very small servers will want to reduce it.
> 
> Yes, you could make the argument that 2x is the right default,
> especially considering work_mem.

I think that most of the arguments in this thread drastically
overestimate the precision and the effect of effective_cache_size. The
planner logic behind it basically only uses it to calculate things
within a single index scan. That alone shows that any precise
calculation cannot be very meaningful.
It also does *NOT* directly influence how the kernel caches disk
io. It's there to guess how likely it is something is still cached when
accessing things repeatedly.

I think nearly all practical experience shows that setting it too low is
more likely to cause problems than setting it too high.  We shouldn't be
too skimpy here.
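
Concretely, the main consumer is index_pages_fetched() in
optimizer/path/costsize.c; very loosely paraphrased (the real
Mackert-Lohman-based code is more careful than this), effective_cache_size
only pro-rates a cache share for the relation and discounts repeated fetches
of the same pages within one scan:

    /* Loose paraphrase: pages estimated to be fetched by one index scan. */
    static double
    pages_fetched_sketch(double tuples_fetched, double rel_pages,
                         double total_pages_in_query, double eff_cache_pages)
    {
        /* pro-rated share of the assumed cache for this relation */
        double  b = eff_cache_pages * rel_pages / total_pages_in_query;
        double  fetched;

        if (rel_pages <= b)
        {
            /* relation assumed to stay cached: repeat visits are discounted */
            fetched = (2.0 * rel_pages * tuples_fetched) /
                      (2.0 * rel_pages + tuples_fetched);
            if (fetched > rel_pages)
                fetched = rel_pages;
        }
        else
            fetched = tuples_fetched;   /* the real code has a subtler branch here */

        return fetched;
    }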

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Claudio Freire
Date:
On Wed, Sep 11, 2013 at 12:27 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Another argument in favor: this is a default setting, and by default,
>> > shared_buffers won't be 25% of RAM.
>>
>> So, are you saying you like 4x now?
>
> Here is an argument for 3x.  First, using the documented 25% of RAM, 3x
> puts our effective_cache_size as 75% of RAM, giving us no room for
> kernel, backend memory, and work_mem usage.  If anything it should be
> lower than 3x, not higher.


AFAIK, e_c_s must include shared_buffers, so 25% + 75% = 75%

And your statement seems to assume 25% + 75% = 100%. Which isn't
universally true, no matter what your math teacher probably taught you
;-)



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Josh Berkus
Date:
> I think that most of the arguments in this thread drastically
> overestimate the precision and the effect of effective_cache_size. The
> planner logic behind it basically only uses it to calculate things
> within a single index scan. That alone shows that any precise
> calculation cannot be very meaningful.
> It also does *NOT* directly influence how the kernel caches disk
> io. It's there to guess how likely it is something is still cached when
> accessing things repeatedly.

Agreed.  I think we should take the patch as-is, and spend the rest of
the 9.4 dev cycle arguing about 3x vs. 4x.

;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Robert Haas
Date:
On Wed, Sep 11, 2013 at 3:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> I think that most of the arguments in this thread drastically
>> overestimate the precision and the effect of effective_cache_size. The
>> planner logic behind it basically only uses it to calculate things
>> within a single index scan. That alone shows that any precise
>> calculation cannot be very meaningful.
>> It also does *NOT* directly influence how the kernel caches disk
>> io. It's there to guess how likely it is something is still cached when
>> accessing things repeatedly.
>
> Agreed.  I think we should take the patch as-is, and spend the rest of
> the 9.4 dev cycle arguing about 3x vs. 4x.
>
> ;-)

I'm happy with that option, but I think the larger point here is that
this only has a hope of being right if you're setting shared_buffers
to 25% of system memory.  And more and more, people are not doing
that, because of the other recommendation, not much discussed here, to
cap shared_buffers at about 8GB.  Systems whose total memory is far
larger than 32GB are becoming quite commonplace, and only figure to
become moreso.  So while I don't particularly object to this proposal,
it would have had a lot more value if we'd done it 5 years ago.

Now the good news is that right now the default is 128MB, and under
any of these proposals the default will go up, quite a bit.  Default
shared_buffers is now 128MB, so we're looking at raising the default
to at least 384MB, and for people who also tune shared_buffers but
might not bother with effective cache size, it'll go up a lot more.
That's clearly a move in the right direction even if the accuracy of
the formula is suspect (which it is).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Merlin Moncure
Date:
On Fri, Sep 13, 2013 at 10:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 11, 2013 at 3:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> I think that most of the arguments in this thread drastically
>>> overestimate the precision and the effect of effective_cache_size. The
>>> planner logic behind it basically only uses it to calculate things
>>> within a single index scan. That alone shows that any precise
>>> calculation cannot be very meaningful.
>>> It also does *NOT* directly influence how the kernel caches disk
>>> io. It's there to guess how likely it is something is still cached when
>>> accessing things repeatedly.
>>
>> Agreed.  I think we should take the patch as-is, and spend the rest of
>> the 9.4 dev cycle arguing about 3x vs. 4x.
>>
>> ;-)
>
> I'm happy with that option, but I think the larger point here is that
> this only has a hope of being right if you're setting shared_buffers
> to 25% of system memory.  And more and more, people are not doing
> that, because of the other recommendation, not much discussed here, to
> cap shared_buffers at about 8GB.  Systems whose total memory is far
> larger than 32GB are becoming quite commonplace, and only figure to
> become moreso.  So while I don't particularly object to this proposal,
> it would have had a lot more value if we'd done it 5 years ago.
>
> Now the good news is that right now the default is 128MB, and under
> any of these proposals the default will go up, quite a bit.  Default
> shared_buffers is now 128MB, so we're looking at raising the default
> to at least 384MB, and for people who also tune shared_buffers but
> might not bother with effective cache size, it'll go up a lot more.
> That's clearly a move in the right direction even if the accuracy of
> the formula is suspect (which it is).

This is a very important point: the 8gb cap is also too high.  We have
a very high transaction rate server here that exploded at 32GB, was
downgraded to 2GB and ran fine, then upgraded to 4GB (over my strenuous
objection) and exploded again.

The stock documentation advice probably needs to be revised so that
it's the lesser of 2GB and 25%.  I'm more and more coming around to
the opinion that in terms of shared buffers we have some major
problems that manifest in high end servers.  So +1 to your point,
although I'm still ok with the auto-setting on the basis that for very
high end servers most of the settings end up being manually tweaked
anyway.  We do need to be cautious though; it's not impossible that
improvements to the buffer system might cause the 25% advice to be
revised upwards in the not-too-near future if some of the problems get solved.

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Andres Freund
Date:
On 2013-09-13 10:50:06 -0500, Merlin Moncure wrote:
> The stock documentation advice probably needs to be revised so that
> it's the lesser of 2GB and 25%.

I think that would be a pretty bad idea. There are lots of workloads
where people have postgres happily chugging along with s_b lots bigger
than that and see benefits.
We have a couple people reporting mostly undiagnosed (because that turns
out to be hard!) problems that seem to be avoided with smaller s_b. We
don't even remotely know enough about the problem to make such general
recommendations.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Merlin Moncure
Date:
On Fri, Sep 13, 2013 at 11:07 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-13 10:50:06 -0500, Merlin Moncure wrote:
>> The stock documentation advice probably needs to be revised so that
>> it's the lesser of 2GB and 25%.
>
> I think that would be a pretty bad idea. There are lots of workloads
> where people have postgres happily chugging along with s_b lots bigger
> than that and see benefits.
> We have a couple people reporting mostly undiagnosed (because that turns
> out to be hard!) problems that seem to be avoided with smaller s_b. We
> don't even remotely know enough about the problem to make such general
> recommendations.

I happen to be one of those "couple" people.  Load goes from 0.1 to
500 without warning then back to 0.1 equally without warning.
Unfortunately the server is in a different jurisdiction such that it
makes deep forensic analysis impossible.  I think this is happening
more and more often as postgres is becoming increasingly deployed on
high(er) -end servers.  I've personally (alone) dealt with 4-5
confirmed cases and there have been many more.  We have a problem.

But, to address your point, the "big s_b" benefits are equally hard to
quantify (unless your database happens to fit in s_b) -- they mostly
help high write activity servers where the write activity fits a very
specific pattern.  The risks of low s_b (mostly slightly higher i/o
and query latency) are much easier to deal with than the risks of high
s_b (even if those are less likely): random inexplicable server stalls
and other weird manifestations.  My "stock" advice remains to set it to
a max of 2GB until having a reason (of which there can be many) to set
it otherwise.

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Andres Freund
Date:
On 2013-09-13 11:27:03 -0500, Merlin Moncure wrote:
> On Fri, Sep 13, 2013 at 11:07 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-09-13 10:50:06 -0500, Merlin Moncure wrote:
> >> The stock documentation advice probably needs to be revised so that
> >> it's the lesser of 2GB and 25%.
> >
> > I think that would be a pretty bad idea. There are lots of workloads
> > where people have postgres happily chugging along with s_b lots bigger
> > than that and see benefits.
> > We have a couple people reporting mostly undiagnosed (because that turns
> > out to be hard!) problems that seem to be avoided with smaller s_b. We
> > don't even remotely know enough about the problem to make such general
> > recommendations.

> I happen to be one of those "couple" people.  Load goes from 0.1 to
> 500 without warning then back to 0.1 equally without warning.
> Unfortunately the server is in a different jurisdiction such that it
> makes deep forensic analysis impossible.  I think this is happening
> more and more often as postgres is becoming increasingly deployed on
> high(er) -end servers.  I've personally (alone) dealt with 4-5
> confirmed cases and there have been many more.  We have a problem.

Absolutely not claiming the contrary. I think it sucks that we couldn't
fully figure out what's happening in detail. I'd love to get my hand on
a setup where it can be reliably reproduced.

> But, to address your point, the "big s_b" benefits are equally hard to
> quantify (unless your database happens to fit in s_b)

Databases where the hot dataset fits in s_b are a pretty honking big
use case, though.  That's one of the primary reasons to buy machines
with craploads of memory.

That said, I think having a note in the docs that large s_b can cause
such a problem might not be a bad idea and I surely wouldn't argue
against it.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 09/13/2013 09:27 AM, Merlin Moncure wrote:
> I happen to be one of those "couple" people.  Load goes from 0.1 to
> 500 without warning then back to 0.1 equally without warning.
> Unfortunately the server is in a different jurisdiction such that it
> makes deep forensic analysis impossible.  I think this is happening
> more and more often as postgres is becoming increasingly deployed on
> high(er) -end servers.  I've personally (alone) dealt with 4-5
> confirmed cases and there have been many more.  We have a problem.

Can you explain a bit more about this?  I'm currently grappling with a
db cluster which has periodic mysterious total LWLock paralysis, and is
configured with 8GB shared_buffers (and 512GB ram installed).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Fri, Sep 13, 2013 at 2:36 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/13/2013 09:27 AM, Merlin Moncure wrote:
>> I happen to be one of those "couple" people.  Load goes from 0.1 to
>> 500 without warning then back to 0.1 equally without warning.
>> Unfortunately the server is in a different jurisdiction such that it
>> makes deep forensic analysis impossible.  I think this is happening
>> more and more often as postgres is becoming increasingly deployed on
>> high(er) -end servers.  I've personally (alone) dealt with 4-5
>> confirmed cases and there have been many more.  We have a problem.
>
> Can you explain a bit more about this?  I'm currently grappling with a
> db cluster which has periodic mysterious total LWLock paralysis, and is
> configured with 8GB shared_buffers (and 512GB ram installed).

see thread: http://postgresql.1045698.n5.nabble.com/StrategyGetBuffer-optimization-take-2-td5766307.html
plus others.  In this particular case, s_b needs to be set to 2GB or
lower (they tried raising it to 4GB for no good reason) and the problem
recurred.

what are the specific symptoms of your problem?  anything interesting
in pg_locks?  is $client willing to experiment with custom patches?

merlin



On 09/13/2013 12:55 PM, Merlin Moncure wrote:
> what are the specific symptoms of your problem?  anything interesting
> in pg_locks?  is $client willing to experiment with custom patches?

3 servers: 1 master, two replicas.
32-core Xeon, hyperthreaded to 64 cores
512GB RAM each
s_b set to 8GB
Load-balanced between all 3
~~ 11 different databases
combined database size around 600GB
using pgbouncer

Irregularly, during periods of high activity (although not necessarily
peak activity) one or another of the systems will go into paralysis,
with all backends apparently waiting on LWLocks (we need more tracing
information to completely confirm this).  Activity at this time is
usually somewhere between 50 and 100 concurrent queries (and 80 to 150
connections).  pg_locks doesn't think anything is waiting on a lock.

What's notable is that sometimes it's just *one* of the replicas which
goes into paralysis.  If the master gets this issue though, the replicas
experience it soon afterwards.  Increasing wal_buffers from 16GB to 64GB
seems to make this issue happen less frequently, but it doesn't go away
entirely.  Only a restart of the server, or killing all backends, ends
the lockup.

The workload is OLTP, essentially, around 20/80 write/read.  They use
PostGIS.  The other notable thing about their workload is that due to an
ORM defect, they get idle-in-transactions which last from 5 to 15
seconds several times a minute.

They are willing to use experimental patches, but only if those patches
can be applied only to a replica.
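
(For reference, the quick checks behind the "pg_locks doesn't think
anything is waiting" observation above are nothing fancier than the
following; column names assume the 9.2/9.3-era catalogs:)

    -- backends that report waiting on a heavyweight lock (none during these episodes)
    SELECT pid, state, waiting, left(query, 60) AS query
    FROM pg_stat_activity
    WHERE waiting;

    -- ungranted heavyweight locks, if any
    SELECT locktype, relation::regclass, mode, pid
    FROM pg_locks
    WHERE NOT granted;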

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Josh Berkus <josh@agliodbs.com> schrieb:

>What's notable is that sometimes it's just *one* of the replicas which
>goes into paralysis.  If the master gets this issue though, the
>replicas
>experience it soon afterwards.  Increasing wal_buffers from 16GB to
>64GB
>seems to make this issue happen less frequently, but it doesn't go away
>entirely.  

You're talking about MB, not GB here, right?

Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 09/13/2013 01:25 PM, Andres Freund wrote:
> 
> 
> Josh Berkus <josh@agliodbs.com> schrieb:
> 
>> What's notable is that sometimes it's just *one* of the replicas which
>> goes into paralysis.  If the master gets this issue though, the
>> replicas
>> experience it soon afterwards.  Increasing wal_buffers from 16GB to
>> 64GB
>> seems to make this issue happen less frequently, but it doesn't go away
>> entirely.  
> 
> You're talking about MB, not GB here, right?

Oh, yes, right.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Fri, Sep 13, 2013 at 3:20 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/13/2013 12:55 PM, Merlin Moncure wrote:
>> what are the specific symptoms of your problem?  anything interesting
>> in pg_locks?  is $client willing to experiment with custom patches?
>
> 3 servers: 1 master, two replicas.
> 32-core Xeon, hyperthreaded to 64 cores
> 512GB RAM each
> s_b set to 8GB
> Load-balanced between all 3
> ~~ 11 different databases
> combined database size around 600GB
> using pgbouncer
>
> Irregularly, during periods of high activity (although not necessarily
> peak activity) one or another of the systems will go into paralysis,
> with all backends apparently waiting on LWLocks (we need more tracing
> information to completely confirm this).  Activity at this time is
> usually somewhere between 50 and 100 concurrent queries (and 80 to 150
> connections).  pg_locks doesn't think anything is waiting on a lock.
>
> What's notable is that sometimes it's just *one* of the replicas which
> goes into paralysis.  If the master gets this issue though, the replicas
> experience it soon afterwards.  Increasing wal_buffers from 16GB to 64GB
> seems to make this issue happen less frequently, but it doesn't go away
> entirely.  Only a restart of the server, or killing all backends, ends
> the lockup.
>
> The workload is OLTP, essentially, around 20/80 write/read.  They use
> PostGIS.  The other notable thing about their workload is that due to an
> ORM defect, they get idle-in-transactions which last from 5 to 15
> seconds several times a minute.
>
> They are willing to use experimental patches, but only if those patches
> can be applied only to a replica.

ok, points similar:
*) master/slave config (two slaves for me)
*) 'big' server 256GB mem, 32 core
*) approx. 80% read (perhaps more)
*) some spatial searching (but not very much)
*) OLTP
*) presentation of the load spike, although in my case it did resolve
on its own, anywhere from 30 secs to half an hour later
*) aside from the spike, 100% healthy

points different
*) application side pooling: 96 app servers, max 5 connections each
(aside: are you using transaction mode pgbouncer?)
*) I saw gripes about relation extension in pg_locks

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Kevin Grittner
Date:
Andres Freund <andres@2ndquadrant.com> wrote:

> Absolutely not claiming the contrary. I think it sucks that we
> couldn't fully figure out what's happening in detail. I'd love to
> get my hand on a setup where it can be reliably reproduced.

I have seen two completely different causes for symptoms like this,
and I suspect that these aren't the only two.

(1)  The dirty page avalanche: PostgreSQL hangs on to a large
number of dirty buffers and then dumps a lot of them at once.  The
OS does the same.  When PostgreSQL dumps its buffers to the OS it
pushes the OS over a "tipping point" where it is writing dirty
buffers too fast for the controller's BBU cache to absorb them.
Everything freezes until the controller writes and accepts OS
writes for a lot of data.  This can take several minutes, during
which time the database seems "frozen".  Cure is some combination
of these: reduce shared_buffers, make the background writer more
aggressive, checkpoint more often, make the OS dirty page writing
more aggressive, add more BBU RAM to the controller.
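
(For anyone wanting to see where a given server stands on the
PostgreSQL side of that list, a quick sketch using only setting names
that exist today:)

    -- knobs most relevant to the dirty-page avalanche, with current values
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers',
                   'bgwriter_delay',
                   'bgwriter_lru_maxpages',
                   'checkpoint_segments',
                   'checkpoint_completion_target');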

(2)  Transparent huge page support goes haywire on its defrag work.
Clues on this include very high "system" CPU time during an
episode, and `perf top` shows more time in kernel spinlock
functions than anywhere else.  The database doesn't completely lock
up like with the dirty page avalanche, but it is slow enough that
users often describe it that way.  So far I have only seen this
cured by disabling THP support (in spite of some people urging that
just the defrag be disabled).  It does make me wonder whether there
is something we could do in PostgreSQL to interact better with
THPs.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Merlin Moncure
Date:
On Fri, Sep 13, 2013 at 4:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
>> Absolutely not claiming the contrary. I think it sucks that we
>> couldn't fully figure out what's happening in detail. I'd love to
>> get my hand on a setup where it can be reliably reproduced.
>
> I have seen two completely different causes for symptoms like this,
> and I suspect that these aren't the only two.
>
> (1)  The dirty page avalanche: PostgreSQL hangs on to a large
> number of dirty buffers and then dumps a lot of them at once.  The
> OS does the same.  When PostgreSQL dumps its buffers to the OS it
> pushes the OS over a "tipping point" where it is writing dirty
> buffers too fast for the controller's BBU cache to absorb them.
> Everything freezes until the controller writes and accepts OS
> writes for a lot of data.  This can take several minutes, during
> which time the database seems "frozen".  Cure is some combination
> of these: reduce shared_buffers, make the background writer more
> aggressive, checkpoint more often, make the OS dirty page writing
> more aggressive, add more BBU RAM to the controller.

Yeah -- I've seen this too, and it's a well understood problem.
Getting the O/S to spin dirty pages out faster is the name of the game
I think.  Storage is getting so fast that it's (mostly) moot anyway.
Also, this is under the umbrella of 'high i/o' -- the stuff I've been
seeing is low- or no-I/O.

> (2)  Transparent huge page support goes haywire on its defrag work.
> Clues on this include very high "system" CPU time during an
> episode, and `perf top` shows more time in kernel spinlock
> functions than anywhere else.  The database doesn't completely lock
> up like with the dirty page avalanche, but it is slow enough that
> users often describe it that way.  So far I have only seen this
> cured by disabling THP support (in spite of some people urging that
> just the defrag be disabled).  It does make me wonder whether there
> is something we could do in PostgreSQL to interact better with
> THPs.

Ah, that's a useful tip; need to research that, thanks.  Maybe Josh
might be able to give it a whirl...

merlin



On 09/13/2013 01:58 PM, Merlin Moncure wrote:
> ok, points similar:
> *) master/slave config (two slaves for me)
> *) 'big' server 256GB mem, 32 core
> *) 80% approx. (perhaps more)
> *) some spacial searching (but not very much)
> *) OLTP
> *) presentation of load, although in my case it did resolve anywhere
> from 30 secs to half hour
> *) aside from the spike, 100% healthy
> 
> points different
> *) application side pooling: 96 app servers, max 5 connections each
> (aside: are you using transaction mode pgbouncer?)

Yes.

> *) I saw gripes about relation extension in pg_locks

I'll check for that next time.

We're also working on seeing if we can reproduce this under test conditions.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Fri, Sep 13, 2013 at 4:15 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/13/2013 01:58 PM, Merlin Moncure wrote:
>> ok, points similar:
>> *) master/slave config (two slaves for me)
>> *) 'big' server 256GB mem, 32 core
>> *) 80% approx. (perhaps more)
>> *) some spacial searching (but not very much)
>> *) OLTP
>> *) presentation of load, although in my case it did resolve anywhere
>> from 30 secs to half hour
>> *) aside from the spike, 100% healthy
>>
>> points different
>> *) application side pooling: 96 app servers, max 5 connections each
>> (aside: are you using transaction mode pgbouncer?)
>
> Yes

Given that, I would be curious to see what dropping the connection pool
down to somewhere between 32-64 connections would do.  This is a
band-aid obviously since not everyone can or wants to run pgbouncer.
But the first thing that pops into my mind in these situations is
that it might alleviate the symptoms.
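
(If it helps gauge that, a trivial way to see how many of those pooled
connections are actually doing work at any instant:)

    -- breakdown of connection states; 'active' is what competes for locks and CPU
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY count(*) DESC;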

merlin



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Andres Freund
Date:
On 2013-09-13 14:04:55 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > Absolutely not claiming the contrary. I think it sucks that we
> > couldn't fully figure out what's happening in detail. I'd love to
> > get my hand on a setup where it can be reliably reproduced.
>
> I have seen two completely different causes for symptoms like this,
> and I suspect that these aren't the only two.
>
> (1)  The dirty page avalanche: PostgreSQL hangs on to a large
> number of dirty buffers and then dumps a lot of them at once.  The
> OS does the same.  When PostgreSQL dumps its buffers to the OS it
> pushes the OS over a "tipping point" where it is writing dirty
> buffers too fast for the controller's BBU cache to absorb them.
> Everything freezes until the controller writes and accepts OS
> writes for a lot of data.  This can take several minutes, during
> which time the database seems "frozen".  Cure is some combination
> of these: reduce shared_buffers, make the background writer more
> aggressive, checkpoint more often, make the OS dirty page writing
> more aggressive, add more BBU RAM to the controller.

That should hopefully be diagnosable from other system stats like the
dirty rate.

> (2)  Transparent huge page support goes haywire on its defrag work.
> Clues on this include very high "system" CPU time during an
> episode, and `perf top` shows more time in kernel spinlock
> functions than anywhere else.  The database doesn't completely lock
> up like with the dirty page avalanche, but it is slow enough that
> users often describe it that way.  So far I have only seen this
> cured by disabling THP support (in spite of some people urging that
> just the defrag be disabled). 

Yes, I have seen that issue a couple of times now as well. I can confirm
that in at least two cases disabling defragmentation alone proved to be
enough to fix the issue.
Annoyingly enough there are different ways to disable
defragmentation/THP depending on whether you're using THP backported by
redhat or the upstream version...

> It does make me wonder whether there
> is something we could do in PostgreSQL to interact better with
> THPs.

The best thing I see is to just use explicit hugepages. I've previously
sent a prototype for that which has since been turned into an actual
implementation by Christian Kruse.
A colleague of mine is working on polishing that patch into something
committable.
If you use large s_b, the memory savings alone (some 100kB instead of
dozens of megabytes per backend) can be worth it, not to talk of actual
performance gains.

Updating the kernel helps as well, they've improved the efficiency of
defragmentation a good bit.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Thu, Sep  5, 2013 at 05:14:37PM -0400, Bruce Momjian wrote:
> On Thu, Sep  5, 2013 at 06:14:33PM +0200, Magnus Hagander wrote:
> > > I have developed the attached patch which implements an auto-tuned
> > > effective_cache_size which is 4x the size of shared buffers.  I had to
> > > set effective_cache_size to its old 128MB default so the EXPLAIN
> > > regression tests would pass unchanged.
> > 
> > That's not really autotuning though. ISTM that making the *default* 4
> > x shared_buffers might make perfect sense, but do we really need to
> > hijack the value of "-1" for that? That might be useful for some time
> > when we have actual autotuning, that somehow inspects the system and
> > tunes it from there.
> >
> > I also don't think it should be called autotuning, when it's just a
> > "smarter default value".
> > 
> > I like the feature, though, just not the packaging.
> 
> That "auto-tuning" text came from the wal_buffer documentation, which
> does exactly this based on shared_buffers:
> 
>         The contents of the WAL buffers are written out to disk at every
>         transaction commit, so extremely large values are unlikely to
>         provide a significant benefit.  However, setting this value to at
>         least a few megabytes can improve write performance on a busy
> -->     server where many clients are committing at once.  The auto-tuning
>                                                                -----------
>         selected by the default setting of -1 should give reasonable
>         results in most cases.
> 
> I am fine with rewording and not using -1, but we should change the
> wal_buffer default and documentation too then.  I am not sure what other
> value than -1 to use?  0?  I figure if we ever get better auto-tuning,
> we would just remove this functionality and make it better.

Patch applied with a default of 4x shared buffers.  I have added a 9.4
TODO that we might want to revisit this.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Kevin Hale Boyes
Date:
The patch contains a small typo in config.sgml.  Probably just drop the "is" from "is can".

+        results if this database cluster is can utilize most of the memory

Kevin.


On 8 October 2013 10:13, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep  5, 2013 at 05:14:37PM -0400, Bruce Momjian wrote:
> On Thu, Sep  5, 2013 at 06:14:33PM +0200, Magnus Hagander wrote:
> > > I have developed the attached patch which implements an auto-tuned
> > > effective_cache_size which is 4x the size of shared buffers.  I had to
> > > set effective_cache_size to its old 128MB default so the EXPLAIN
> > > regression tests would pass unchanged.
> >
> > That's not really autotuning though. ISTM that making the *default* 4
> > x shared_buffers might make perfect sense, but do we really need to
> > hijack the value of "-1" for that? That might be useful for some time
> > when we have actual autotuning, that somehow inspects the system and
> > tunes it from there.
> >
> > I also don't think it should be called autotuning, when it's just a
> > "smarter default value".
> >
> > I like the feature, though, just not the packaging.
>
> That "auto-tuning" text came from the wal_buffer documentation, which
> does exactly this based on shared_buffers:
>
>         The contents of the WAL buffers are written out to disk at every
>         transaction commit, so extremely large values are unlikely to
>         provide a significant benefit.  However, setting this value to at
>         least a few megabytes can improve write performance on a busy
> -->     server where many clients are committing at once.  The auto-tuning
>                                                                -----------
>         selected by the default setting of -1 should give reasonable
>         results in most cases.
>
> I am fine with rewording and not using -1, but we should change the
> wal_buffer default and documentation too then.  I am not sure what other
> value than -1 to use?  0?  I figure if we ever get better auto-tuning,
> we would just remove this functionality and make it better.

Patch applied with a default of 4x shared buffers.  I have added a 9.4
TODO that we might want to revisit this.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

From
Bruce Momjian
Date:
On Tue, Oct  8, 2013 at 01:04:18PM -0600, Kevin Hale Boyes wrote:
> The patch contains a small typo in config.sgml.  Probably just drop the "is"
> from "is can".
> 
> +        results if this database cluster is can utilize most of the memory
> 
> Kevin.

Thank you, fixed.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



On 8 October 2013 17:13, Bruce Momjian <bruce@momjian.us> wrote:

> Patch applied with a default of 4x shared buffers.  I have added a 9.4
> TODO that we might want to revisit this.

I certainly want to revisit this patch and this setting.

How can we possibly justify a default setting that could be more than
physical RAM?

The maximum known safe value is the setting of shared_buffers itself,
without external knowledge. But how can we possibly set it even that
high?

Does anyone have any evidence at all on how to set this? How can we
possibly autotune it?

I prefer the idea of removing "effective_cache_size" completely, since
it has so little effect on workloads and is very frequently
misunderstood by users. It's just dangerous, without being useful.

Why do we autotune the much more important synch scan threshold, yet
allow tuning of e_c_s?

Lets fix e_c_s at 25% of shared_buffers and remove the parameter
completely, just as we do with so many other performance parameters.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 2014-05-06 15:09:15 +0100, Simon Riggs wrote:
> On 8 October 2013 17:13, Bruce Momjian <bruce@momjian.us> wrote:
> 
> > Patch applied with a default of 4x shared buffers.  I have added a 9.4
> > TODO that we might want to revisit this.
> 
> I certainly want to revisit this patch and this setting.
> 
> How can we possibly justify a default setting that could be more than
> physical RAM?

Because it doesn't hurt overly much if it's set too large?

> The maximum known safe value is the setting of shared_buffers itself,
> without external knowledge. But how can we possibly set it even that
> high?
>
> Does anyone have any evidence at all on how to set this? How can we
> possibly autotune it?

It's just a different default setting? I think the new value will cause
fewer problems than the old one, which frequently leads to index scans
not being used even where they would be beneficial.
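
(As a rough per-session illustration -- the table and column names here
are made up, and whether the plan actually flips depends on the data,
but this is the mechanism:)

    -- with a small value the planner assumes repeated page visits mostly
    -- miss the cache, which inflates the cost of the index scan
    SET effective_cache_size = '128MB';
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- a value reflecting the actual OS cache makes the index scan look cheaper
    SET effective_cache_size = '16GB';
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;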

> I prefer the idea of removing "effective_cache_size" completely, since
> it has so little effect on workloads and is very frequently
> misunderstood by users. It's just dangerous, without being useful.

-many.

> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
> completely, just as we do with so many other performance parameters.

That'd cause *massive* regression for many installations. Without
significantly overhauling costsize.c that's really not feasible. There's
lots of installations that use relatively small s_b settings for good
reasons. If we fix e_c_s to 25% of s_b many queries on those won't use
indexes anymore.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Simon Riggs <simon@2ndQuadrant.com> writes:
> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
> completely, just as we do with so many other performance parameters.

Apparently, you don't even understand what this parameter is for.
Setting it smaller than shared_buffers is insane.
        regards, tom lane



On 6 May 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
>> completely, just as we do with so many other performance parameters.
>
> Apparently, you don't even understand what this parameter is for.
> Setting it smaller than shared_buffers is insane.

You know you can't justify that comment and so do I. What workload is
so badly affected as to justify use of the word insane in this
context?

I can read code. But it appears nobody apart from me actually does, or
at least understands the behaviour that results.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 6 May 2014 15:17, Andres Freund <andres@2ndquadrant.com> wrote:

>> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
>> completely, just as we do with so many other performance parameters.
>
> That'd cause *massive* regression for many installations. Without
> significantly overhauling costsize.c that's really not feasible. There's
> lots of installations that use relatively small s_b settings for good
> reasons. If we fix e_c_s to 25% of s_b many queries on those won't use
> indexes anymore.

"many queries" can't be correct. All this changes is the cost of
IndexScans that would use more than 25% of shared_buffers worth of
data. Hopefully not many of those in your workload. Changing the cost
doesn't necessarily prevent index scans either. And if there are many
of those in your workload AND you run more than one at the same time, then
the larger setting will work against you. So the benefit window for
such a high setting is slim, at best.

I specifically picked 25% of shared_buffers because that is the point
at which sequential scans become more efficient and use the cache more
efficiently. If our cost models are correct, then switching away from
index scans shouldn't hurt at all.

Assuming we can use large tranches of memory for single queries has a
very bad effect on cache hit ratios. Encouraging such usage seems to
fall into the category of insane, from my perspective. Having it as a
default setting is bad.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Simon Riggs <simon@2ndQuadrant.com> writes:
> On 6 May 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
>>> completely, just as we do with so many other performance parameters.

>> Apparently, you don't even understand what this parameter is for.
>> Setting it smaller than shared_buffers is insane.

> You know you can't justify that comment and so do I.

What I meant is that your comments indicate complete lack of understanding
of the parameter.  It's *supposed* to be larger than shared_buffers, and
there is no "safety risk" involved in setting it too high.
        regards, tom lane



On 2014-05-06 17:43:45 +0100, Simon Riggs wrote:
> On 6 May 2014 15:17, Andres Freund <andres@2ndquadrant.com> wrote:
> 
> >> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
> >> completely, just as we do with so many other performance parameters.
> >
> > That'd cause *massive* regression for many installations. Without
> > significantly overhauling costsize.c that's really not feasible. There's
> > lots of installations that use relatively small s_b settings for good
> > reasons. If we fix e_c_s to 25% of s_b many queries on those won't use
> > indexes anymore.
> 
> "many queries" can't be correct.

It is.

> All this changes is the cost of
> IndexScans that would use more than 25% of shared_buffers worth of
> data. Hopefully not many of those in your workload. Changing the cost
> doesn't necessarily prevent index scans either. And if there are many
> of those in your workload AND you run more than one at same time, then
> the larger setting will work against you. So the benefit window for
> such a high setting is slim, at best.

Why? There's many workloads where indexes are larger than shared buffers
but fit into the operating system's cache. And that's precisely what
effective_cache_size is about.
Especially on bigger machines shared_buffers can't be set big enough to
actually use all the machine's memory. It's not uncommon to have 4GB
shared buffers on a machine with 512GB RAM... It'd be absolutely
disastrous to set effective_cache_size to 1GB for an analytics workload.

> I specifically picked 25% of shared_buffers because that is the point
> at which sequential scans become more efficient and use the cache more
> efficiently. If our cost models are correct, then switching away from
> index scans shouldn't hurt at all.

More often than not indexes are smaller than the table size, so this
argument doesn't seem to make much sense.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 05/06/2014 08:41 AM, Simon Riggs wrote:
> On 6 May 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
>>> completely, just as we do with so many other performance parameters.
>>
>> Apparently, you don't even understand what this parameter is for.
>> Setting it smaller than shared_buffers is insane.
> 
> You know you can't justify that comment and so do I. What workload is
> so badly affected as to justify use of the word insane in this
> context?

Most of them?  Really?

I have to tell you, your post sounds like you've missed out on the last
12 years of PostgreSQL query tuning.  Which is a little shocking
considering where you've spent that 12 years.

> I can read code. But it appears nobody apart from me actually does, or
> at least understand the behaviour that results.

So, break it down for us: explain how we'll get desirable query plans
out of the current code if:

(1) Table & Index is larger than shared_buffers;
(2) Table & Index is smaller than RAM;
(3) Selectivity is 0.02
(4) ECS is set lower than shared_buffers

I think the current cost math does a pretty good job of choosing the
correct behavior if ECS is set correctly.  But if it's not, no.

If I'm wrong, then you've successfully found a bug in our costing math,
so I'd love to see it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Tue, May 6, 2014 at 7:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
> completely, just as we do with so many other performance parameters.

Apparently, you don't even understand what this parameter is for.
Setting it smaller than shared_buffers is insane.

The e_c_s is assumed to be usable for each backend trying to run queries sensitive to it.  If you have dozens of such queries running simultaneously (not something I personally witness, but also not insane) and each of these queries has its own peculiar working set, then having e_c_s smaller than s_b makes sense.

I have a hard time believing that this is at all common, however.
Certainly not common enough to justify cranking the setting all the way
the other direction and then removing the crank handle.

Cheers,

Jeff



On 6 May 2014 18:08, Josh Berkus <josh@agliodbs.com> wrote:
> On 05/06/2014 08:41 AM, Simon Riggs wrote:
>> On 6 May 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>>> Lets fix e_c_s at 25% of shared_buffers and remove the parameter
>>>> completely, just as we do with so many other performance parameters.
>>>
>>> Apparently, you don't even understand what this parameter is for.
>>> Setting it smaller than shared_buffers is insane.
>>
>> You know you can't justify that comment and so do I. What workload is
>> so badly affected as to justify use of the word insane in this
>> context?
>
> Most of them?  Really?

I didn't use the word "most" anywhere. So not really clear what you are saying.


> I have to tell you, your post sounds like you've missed out on the last
> 12 years of PostgreSQL query tuning.  Which is a little shocking
> considering where you've spent that 12 years.
I read the code, think what to say and then say what I think, not
rely on dogma.

I tried to help years ago by changing the docs on e_c_s, but that's
been mostly ignored down the years, as it is again here.


>> I can read code. But it appears nobody apart from me actually does, or
>> at least understand the behaviour that results.
>
> So, break it down for us: explain how we'll get desirable query plans
> out of the current code if:
>
> (1) Table & Index is larger than shared_buffers;
> (2) Table & Index is smaller than RAM;
> (3) Selectivity is 0.02
> (4) ECS is set lower than shared_buffers

Is that it? The above use case is the basis for a default setting??

It's a circular argument, since you're assuming we've all followed
your advice of setting shared_buffers to 25% of RAM, which then
presumes a large gap between (1) and (2). It also ignores that if ECS
is set low then it increases the cost, but does not actually preclude
index scans larger than that setting. It also ignores that if your
database fits in RAM, your random_page_cost setting is wrong and
lowering that appropriately will increase the incidence of index scans
again.

You should also include

(5) You're only running one query at a time (which you know, how?)
(6) You don't care if you flush your cache for later queries
(7) You've got big tables yet are not partitioning them effectively

> I think the current cost math does a pretty good job of choosing the
> correct behavior if ECS is set correctly.  But if it's not, no.
>
> If I'm wrong, then you've successfully found a bug in our costing math,
> so I'd love to see it.

Setting it high generates lovely EXPLAINs for a single query, but do
we have any evidence that whole workloads are better off with higher
settings? And that represents the general case?

And it makes sense even if it makes it bigger than actual RAM??

If you assume that you can use all of that memory, you're badly wrong.
Presumably you also set work_mem larger than shared_buffers, since
that will induce exactly the same behaviour and have the same
downsides. (Large memory usage for single query, but causes cache
churn, plus problems if we try to overuse RAM because of concurrent
usage).

In the absence of  performance measurements that show the genuine
effect on workloads, I am attempting to make a principle-based
argument. I suggested 25% of shared_buffers because we already use
that as the point where other features cut in to minimise cache churn.
I'm making the argument that if *that* setting is the right one to
control cache churn, then why is it acceptable for index scans to
churn up even bigger chunks of cache?

In case it wasn't clear, I am only suggesting 25% of shared_buffers
for large settings, not for micro-configurations. My proposal to
remove the setting completely was a rhetorical question, asking why we
have a setting for this parameter and yet no tunables for other
things.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 6 May 2014 20:41, Jeff Janes <jeff.janes@gmail.com> wrote:

> The e_c_s is assumed to be usable for each backend trying to run queries
> sensitive to it.  If you have dozens of such queries running simultaneously
> (not something I personally witness, but also not insane) and each of these
> queries has its own peculiar working set, then having e_c_s smaller than s_b
> makes sense.
>
> I have a hard time believe that this is at all common, however.

If larger queries are frequent enough to care about, they will happen together.

We should be acting conservatively with default settings. You can be
as aggressive as you like with your own config.

> Certainly
> not common enough so to justify cranking the setting all the way the other
> direction and then removing the crank handle.

Yes, that part was mostly rhetorical, I wasn't arguing for complete
removal, especially when autotuning is unclear.

I am worried about people that set effective_cache_size but not
shared_buffers, which is too common. If we link the two parameters it
should work in both directions by default.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On Tue, May 6, 2014 at 4:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>  I read the code, think what to say and then say what I think, not
> rely on dogma.
>
> I tried to help years ago by changing the docs on e_c_s, but that's
> been mostly ignored down the years, as it is again here.

Well, for what it's worth, I've encountered systems where setting
effective_cache_size too low resulted in bad query plans, but I've
never encountered the reverse situation.  My personal sample size is
pretty small, though.  And, when I did a study of 100+
pgsql-performance reports for last year's PGCon talk, I didn't turn up
any that seemed related to effective_cache_size.  Here's the subset of
those reports that appeared settings-related:

https://sites.google.com/site/robertmhaas/query-performance/settings

I think the basic problem with effective_cache_size is that it's a
pretty weak knob.  I don't think it's a secret that we more often
seq-scan when we should have index-scanned than the other way
around.  So if I had to hard-code a value for effective_cache_size,
I'd probably pick positive infinity.  Yeah, that could be overkill -
but I bet I'd be able to compensate by frobbing seq_page_cost and
random_page_cost in a pinch.
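
(The "frobbing" I have in mind is nothing more exotic than tweaks like
the following; the particular numbers are made up:)

    -- soften the seq-scan bias when the working set is known to be mostly cached
    SET random_page_cost = 1.5;   -- default is 4.0
    SET seq_page_cost = 1.0;      -- the default, shown only for contrast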

I basically think the auto-tuning we've installed for
effective_cache_size is stupid.  Most people are going to run with
only a few GB of shared_buffers, so setting effective_cache_size to a
small multiple of that isn't going to make many more people happy than
just raising the value - say from the current default of 128MB to, oh,
4GB - especially because in my experience queries aren't very
sensitive to the exact value; it just has to not be way too small.  I
bet the number of PostgreSQL users who would be made happy by a much
higher hard-coded default is not too different from the number that
will be made happy by the (completely unprincipled) auto-tuning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 05/06/2014 05:54 PM, Robert Haas wrote:
> On Tue, May 6, 2014 at 4:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>   I read the code, think what to say and then say what I think, not
>> rely on dogma.
>>
>> I tried to help years ago by changing the docs on e_c_s, but that's
>> been mostly ignored down the years, as it is again here.
> Well, for what it's worth, I've encountered systems where setting
> effective_cache_size too low resulted in bad query plans, but I've
> never encountered the reverse situation.


I have encountered both. Recently I discovered that a client's 
performance problems were solved pretty instantly by reducing a 
ridiculously high setting down to something more reasonable (in this 
case about 50% of physical RAM is what we set it to).

cheers

andrew



On 6 May 2014 22:54, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 6, 2014 at 4:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>  I read the code, think what to say and then say what I think, not
>> rely on dogma.
>>
>> I tried to help years ago by changing the docs on e_c_s, but that's
>> been mostly ignored down the years, as it is again here.
>
> Well, for what it's worth, I've encountered systems where setting
> effective_cache_size too low resulted in bad query plans, but I've
> never encountered the reverse situation.

I agree with that.

Though that misses my point, which is that you can't know that all of
that memory is truly available on a server with many concurrent users.
Choosing settings that undercost memory-intensive plans is not the
best choice for a default strategy in a mixed workload when cache may
be better used elsewhere, even if such settings make sense for some
individual users.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Robert Haas <robertmhaas@gmail.com> writes:
> I basically think the auto-tuning we've installed for
> effective_cache_size is stupid.  Most people are going to run with
> only a few GB of shared_buffers, so setting effective_cache_size to a
> small multiple of that isn't going to make many more people happy than
> just raising the value - say from the current default of 128MB to, oh,
> 4GB - especially because in my experience queries aren't very
> sensitive to the exact value; it just has to not be way too small.  I
> bet the number of PostgreSQL users who would be made happy by a much
> higher hard-coded default is not too different from the number that
> will be made happy by the (completely unprincipled) auto-tuning.

There is a lot to be said for that argument, especially considering
that we're not even really happy with the auto-tuning mechanism,
never mind the behavior it's trying to implement.
        regards, tom lane



On 05/06/2014 01:38 PM, Simon Riggs wrote:
>> Most of them?  Really?
> 
> I didn't use the word "most" anywhere. So not really clear what you are saying.

Sorry, those were supposed to be periods, not question marks.  As in
"Most of them.  Really."

>> I have to tell you, your post sounds like you've missed out on the last
>> 12 years of PostgreSQL query tuning.  Which is a little shocking
>> considering where you've spent that 12 years.
> 
>  I read the code, think what to say and then say what I think, not
> rely on dogma.
> 
> I tried to help years ago by changing the docs on e_c_s, but that's
> been mostly ignored down the years, as it is again here.

Well, if you're going to buck the conventional wisdom, you need to
provide a factual and numerical basis for your arguments.  So far, I
haven't seen you do so, although I'll admit that I haven't read 100% of
hackers traffic.   So if you have previously presented benchmarking
results or math, please post a link to the archives.

>> (1) Table & Index is larger than shared_buffers;
>> (2) Table & Index is smaller than RAM;
>> (3) Selectivity is 0.02
>> (4) ECS is set lower than shared_buffers
> 
> Is that it? The above use case is the basis for a default setting??

Are you just going to ask rhetorical questions?

> It's a circular argument, since you're assuming we've all followed
> your advice of setting shared_buffers to 25% of RAM, which then
> presumes a large gap between (1) and (2).

That 20% to 25% recommendation had a factual and numerical basis, based
on extensive testing using DBT2 at OSDL.  While we have reason to
believe that advice may be somewhat dated, nobody has undertaken the
benchmarking work to create a new advice basis. If you have done so, you
have not shared the results.

> Setting it high generates lovely EXPLAINs for a single query, but do
> we have any evidence that whole workloads are better off with higher
> settings? And that represents the general case?

So?  Create a benchmark.  Prove that you're right.  I'd love to see it,
we're suffering from a serious lack of data here.

> In the absence of  performance measurements that show the genuine
> effect on workloads, I am attempting to make a principle-based
> argument. I suggested 25% of shared_buffers because we already use
> that as the point where other features cut in to minimise cache churn.

That makes no sense whatsoever.  Again, show me the math.

> In case it wasn't clear, I am only suggesting 25% of shared_buffers
> for large settings, not for micro-configurations. My proposal to
> remove the setting completely was a rhetorical question, asking why we
> have a setting for this parameter and yet no tunables for other
> things.

Asking rhetorical questions based on extreme perspectives that you don't
really believe in is *definitionally* trolling.  If you're going to make
an argument in favor of different tuning advice, then do it based on
something in which you actually believe, based on hard evidence.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Robert, Tom:

On 05/06/2014 03:28 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I basically think the auto-tuning we've installed for
>> effective_cache_size is stupid.  Most people are going to run with
>> only a few GB of shared_buffers, so setting effective_cache_size to a
>> small multiple of that isn't going to make many more people happy than
>> just raising the value - say from the current default of 128MB to, oh,
>> 4GB - especially because in my experience queries aren't very
>> sensitive to the exact value; it just has to not be way too small.  I
>> bet the number of PostgreSQL users who would be made happy by a much
>> higher hard-coded default is not too different from the number that
>> will be made happy by the (completely unprincipled) auto-tuning.
> 
> There is a lot to be said for that argument, especially considering
> that we're not even really happy with the auto-tuning mechanism,
> never mind the behavior it's trying to implement.

Right, the decisive question with this patch is: does it improve things
over what would have happened with most users anyway?

Based on the users I deal with ... which skew rather strongly toward
EC2 web applications and one-off data warehouses ... most users don't set
effective_cache_size *at all* until they hire me or chat me up on IRC.
This means that ECS is running with the default of 128MB, which means
that Postgres is seriously underestimating the probability of cached
data on most machines today.

The users I deal with are a lot more likely to have set shared_buffers
themselves, based on the 25% conventional wisdom (or based on some other
advice).  And, when they want to tinker with pushing index usage, they
change random_page_cost instead, sometimes to silly values (like 0.5).

So, based solely on the users I deal with, I would find automatically
setting effective_cache_size to 3X or 4X shared_buffers to be a benefit
compared to leaving it at its current fixed low default.

Other people on this list deal with different kinds of users, so if
people work with a class of users where a default of 4 X shared_buffers
would be definitely a worse idea than the current default, then please
speak up.

ECS is definitely a "weak knob", as Robert says, but it's *good* that
it's a weak knob.  This means that we can make large adjustments in it
without introducing a lot of highly variable behavior ... unlike
random_page_cost, which does.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On 6 May 2014 23:47, Josh Berkus <josh@agliodbs.com> wrote:

> If you're going to make
> an argument in favor of different tuning advice, then do it based on
> something in which you actually believe, based on hard evidence.

The proposed default setting of 4x shared_buffers is unprincipled
*and* lacks hard evidence from you and everybody else.

You've used the phrase "conventional wisdom" to describe things which
you have spoken loudly about.  I personally have not seen sufficient
evidence to rely on that as wisdom.

I note that my detailed comments as to why it is unsafe have been
ignored, again.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 6 May 2014 23:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I basically think the auto-tuning we've installed for
>> effective_cache_size is stupid.  Most people are going to run with
>> only a few GB of shared_buffers, so setting effective_cache_size to a
>> small multiple of that isn't going to make many more people happy than
>> just raising the value - say from the current default of 128MB to, oh,
>> 4GB - especially because in my experience queries aren't very
>> sensitive to the exact value; it just has to not be way too small.  I
>> bet the number of PostgreSQL users who would be made happy by a much
>> higher hard-coded default is not too different from the number that
>> will be made happy by the (completely unprincipled) auto-tuning.
>
> There is a lot to be said for that argument, especially considering
> that we're not even really happy with the auto-tuning mechanism,
> never mind the behavior it's trying to implement.

+1

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On Tue, May 6, 2014 at 10:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 6 May 2014 23:47, Josh Berkus <josh@agliodbs.com> wrote:
>
>> If you're going to make
>> an argument in favor of different tuning advice, then do it based on
>> something in which you actually believe, based on hard evidence.
>
> The proposed default setting of 4x shared_buffers is unprincipled
> *and* lacks hard evidence from you and everybody else.

+1. In my view, we probably should have set it to a much higher
absolute default value. The main problem with setting it to any
multiple of shared_buffers that I can see is that shared_buffers is a
very poor proxy for what effective_cache_size is supposed to
represent. In general, the folk wisdom around sizing shared_buffers
has passed its sell-by date.

-- 
Peter Geoghegan



On 07/05/14 17:35, Peter Geoghegan wrote:
> On Tue, May 6, 2014 at 10:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 6 May 2014 23:47, Josh Berkus <josh@agliodbs.com> wrote:
>>
>>> If you're going to make
>>> an argument in favor of different tuning advice, then do it based on
>>> something in which you actually believe, based on hard evidence.
>> The proposed default setting of 4x shared_buffers is unprincipled
>> *and* lacks hard evidence from you and everybody else.
> +1. In my view, we probably should have set it to a much higher
> absolute default value. The main problem with setting it to any
> multiple of shared_buffers that I can see is that shared_buffers is a
> very poor proxy for what effective_cache_size is supposed to
> represent. In general, the folk wisdom around sizing shared_buffers
> has past its sell-by date.
>

+1. ISTM the only sensible approach to auto-tuning this requires us to
have a plugin to detect how much RAM the system has (and then set it
to, say, 1/2 of that). I wonder if it might be worthwhile writing
plugins for the handful of popular platforms. For the remainder maybe
we could leave it defaulting to the current (small) value, and
encourage volunteers to code the missing ones if they want something
better.

Regards

Mark



On 6 May 2014 17:55, Andres Freund <andres@2ndquadrant.com> wrote:

>> All this changes is the cost of
>> IndexScans that would use more than 25% of shared_buffers worth of
>> data. Hopefully not many of those in your workload. Changing the cost
>> doesn't necessarily prevent index scans either. And if there are many
>> of those in your workload AND you run more than one at same time, then
>> the larger setting will work against you. So the benefit window for
>> such a high setting is slim, at best.
>
> Why? There's many workloads where indexes are larger than shared buffers
> but fit into the operating system's cache. And that's precisely what
> effective_cache_size is about.
> Especially on bigger machines shared_buffers can't be set big enough to
> actually use all the machine's memory. It's not uncommon to have 4GB
> shared buffers on a machine with 512GB RAM... It'd be absolutely
> disastrous to set effective_cache_size to 1GB for an analytics workload.

In this case, a setting of effective_cache_size > (4 * shared_buffers)
could be appropriate, as long as we are certain we have the memory.

We don't have any stats on peak memory usage to be certain - although
in that case it's pretty clear.

If we had stats on how often an indexscan re-hits blocks it has already
read, we'd be able to autotune, but I accept that without that we do
still need the parameter.

>> I specifically picked 25% of shared_buffers because that is the point
>> at which sequential scans become more efficient and use the cache more
>> efficiently. If our cost models are correct, then switching away from
>> index scans shouldn't hurt at all.
>
> More often than not indexes are smaller than the table size, so this
> argument doesn't seem to make much sense.

If we believe that 25% of shared_buffers worth of heap blocks would
flush the cache doing a SeqScan, why should we allow 400% of
shared_buffers worth of index blocks? In your example, that would be
1GB of heap blocks, or 16GB of index blocks.  If our table is 100GB
with a 32GB index, then yes, that is 1% of the heap and 50% of the
index. But that doesn't matter, since I am discussing the point at
which we prevent the cache being churned. Given your example we do not
allow a SeqScan of a table larger than 1GB to flush cache, since we
use BAS_BULKREAD. If we allow an indexscan plan that touches 16GB of
an index, that will very clearly flush out our 4GB of shared_buffers,
increasing the time for later queries even if they only have to read
from OS buffers back into shared_buffers. That will still
show itself as a CPU spike, which is what people say they are seeing.

I think I'm arguing myself towards using a BufferAccessStrategy of
BAS_BULKREAD for large IndexScans, BitMapIndexScans and
BitMapHeapScans. Yes, we can make plans assuming we can use OS cache,
but we shouldn't be churning shared_buffers when we execute those
plans. "large" here meaning the same thing as it does for SeqScans,
which is a scan that seems likely to touch more than 25% of shared
buffers. I'll work up a patch.

Perhaps it would also be useful to consider using a sequential scan of
the index relation for less selective BitmapIndexScans, just as we do
very effectively during VACUUM. Maybe that is a better idea than
bitmap indexes.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



On Wed, May 7, 2014 at 3:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> If we believe that 25% of shared_buffers worth of heap blocks would
> flush the cache doing a SeqScan, why should we allow 400% of
> shared_buffers worth of index blocks?

I think you're comparing apples and oranges.  The 25% threshold is
answering the question "How big does a sequential scan have to be
before it's likely to flush so much unrelated data out of
shared_buffers that it hurts the performance of other things running
on the system?". So it's not really about whether or not things will
*fit* in the cache, but rather a judgement about at what point caching
that stuff is going to be of less value than continuing to cache other
things.  Also, it's specifically a judgement about shared_buffers, not
system memory.

But effective_cache_size is used to estimate the likelihood that a heap
or index block accessed twice by an index scan will still be in cache on
the second hit, and thus need to be faulted in only once.  So this *is*
a judgment about what will fit - generally
over a very short time scale.  And, since bringing a page into
shared_buffers from the OS cache is much less expensive than bringing
a page into memory from disk, it's really about what will fit in
overall system memory, not just shared_buffers.
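
As a rough illustration (hypothetical table and column names, and the
plan flip is not guaranteed - it depends on statistics and the other
cost settings), the effect shows up directly in EXPLAIN when the value
is changed within a session:

  -- big_table is assumed to be much larger than shared_buffers,
  -- with a btree index on indexed_col
  SET effective_cache_size = '128MB';
  EXPLAIN SELECT * FROM big_table WHERE indexed_col BETWEEN 1000 AND 500000;
  -- with a small value, repeated block visits are costed as disk reads,
  -- so the planner leans toward a sequential scan

  SET effective_cache_size = '48GB';
  EXPLAIN SELECT * FROM big_table WHERE indexed_col BETWEEN 1000 AND 500000;
  -- with a large value, repeated visits are assumed to be cache hits,
  -- so the index scan looks comparatively cheaper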

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 7 May 2014 13:31, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 7, 2014 at 3:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> If we believe that 25% of shared_buffers worth of heap blocks would
>> flush the cache doing a SeqScan, why should we allow 400% of
>> shared_buffers worth of index blocks?
>
> I think you're comparing apples and oranges.

I understood the distinction, which is why I changed the direction of
my thinking to say

> Yes, we can make plans assuming we can use OS cache,
> but we shouldn't be churning shared_buffers when we execute those
> plans.

and hence why I proposed

> I think I'm arguing myself towards using a BufferAccessStrategy of
> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
> BitMapHeapScans.

which I hope will be effective in avoiding churn in shared_buffers
even though we may use much larger memory from the OS.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Simon Riggs <simon@2ndQuadrant.com> writes:
> I think I'm arguing myself towards using a BufferAccessStrategy of
> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
> BitMapHeapScans.

As soon as you've got some hard evidence to present in favor of such
changes, we can discuss it.  I've got other things to do besides
hypothesize.

In the meantime, it seems like there is an emerging consensus that nobody
much likes the existing auto-tuning behavior for effective_cache_size,
and that we should revert that in favor of just increasing the fixed
default value significantly.  I see no problem with a value of say 4GB;
that's very unlikely to be worse than the pre-9.4 default (128MB) on any
modern machine.

Votes for or against?
        regards, tom lane



On Wed, May 7, 2014 at 9:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> I think I'm arguing myself towards using a BufferAccessStrategy of
>> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
>> BitMapHeapScans.
>
> As soon as you've got some hard evidence to present in favor of such
> changes, we can discuss it.  I've got other things to do besides
> hypothesize.

Let me throw out one last point:

It's pretty likely that s_b is going to be raised higher as a
percentage of RAM.   I never really bought into the conventional
wisdom of 25% and have had to set it lower many times.  Nevertheless,
it was a documented suggestion.

The core issues are:
1) There is no place to enter total system memory available to the
database in postgresql.conf
2) Memory settings (except for the above) are given as absolute
amounts, not percentages.

It would be a lot easier to standardize configurations, particularly if
there was a way to support #1 programmatically with auto-detection.
Then e_c_s, s_b, work_mem, and various other settings could be given as
standard (and perhaps somewhat conservative) percentages, based on the
best and hopefully factually supported recommendations.   I
oversee dozens of servers in a virtualized environment (as most
enterprise shops are these days).  Everything is 'right sized', often
on demand, and often nobody bothers to adjust the various settings.
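
For what it's worth, auditing what the absolute values currently are is
easy enough from SQL (a read-only look at pg_settings, nothing new here):

  SELECT name, setting, unit, source
  FROM   pg_settings
  WHERE  name IN ('shared_buffers', 'effective_cache_size', 'work_mem');

But that still leaves every box to be sized by hand.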

> In the meantime, it seems like there is an emerging consensus that nobody
> much likes the existing auto-tuning behavior for effective_cache_size,
> and that we should revert that in favor of just increasing the fixed
> default value significantly.  I see no problem with a value of say 4GB;
> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> modern machine.

In lieu of something fancy like the above, adjusting the defaults
seems a better way to go (so I vote to revert).

merlin



On 2014-05-07 10:07:07 -0400, Tom Lane wrote:
> In the meantime, it seems like there is an emerging consensus that nobody
> much likes the existing auto-tuning behavior for effective_cache_size,
> and that we should revert that in favor of just increasing the fixed
> default value significantly.  I see no problem with a value of say 4GB;
> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> modern machine.
> 
> Votes for or against?

+1 for increasing it to 4GB and removing the autotuning. I don't like the
current integration into guc.c much, and a new static default doesn't
seem to be worse than the current autotuning.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Wed, May 7, 2014 at 10:12 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-07 10:07:07 -0400, Tom Lane wrote:
>> In the meantime, it seems like there is an emerging consensus that nobody
>> much likes the existing auto-tuning behavior for effective_cache_size,
>> and that we should revert that in favor of just increasing the fixed
>> default value significantly.  I see no problem with a value of say 4GB;
>> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
>> modern machine.
>>
>> Votes for or against?
>
> +1 for increasing it to 4GB and remove the autotuning. I don't like the
> current integration into guc.c much and a new static default doesn't
> seem to be worse than the current autotuning.

It was my proposal originally, so I assume I'd be counted as in favor,
but for the sake of clarity: +1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 7, 2014 at 4:12 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-07 10:07:07 -0400, Tom Lane wrote:
>> In the meantime, it seems like there is an emerging consensus that nobody
>> much likes the existing auto-tuning behavior for effective_cache_size,
>> and that we should revert that in favor of just increasing the fixed
>> default value significantly.  I see no problem with a value of say 4GB;
>> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
>> modern machine.
>>
>> Votes for or against?
>
> +1 for increasing it to 4GB and remove the autotuning. I don't like the
> current integration into guc.c much and a new static default doesn't
> seem to be worse than the current autotuning.

+1.

If we can't make the autotuning better than that, we're better off holding off on that one until we can actually figure out something better. (At which point perhaps we can reach the level where we can just remove it. But that's all handwaving about the future, of course.)
 

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, May 7, 2014 at 3:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> If we believe that 25% of shared_buffers worth of heap blocks would
>> flush the cache doing a SeqScan, why should we allow 400% of
>> shared_buffers worth of index blocks?

> I think you're comparing apples and oranges.  The 25% threshold is
> answering the question "How big does a sequential scan have to be
> before it's likely to flush so much so much unrelated data out of
> shared_buffers that it hurts the performance of other things running
> on the system?". So it's not really about whether or not things will
> *fit* in the cache, but rather a judgement about at what point caching
> that stuff is going to be less value than continuing to cache other
> things.  Also, it's specifically a judgement about shared_buffers, not
> system memory.

> But effective_cache_size is used to estimate the likelihood that an
> index scan which accesses the same heap or index block twice will
> still be in cache on the second hit, and thus need to be faulted in
> only once.  So this *is* a judgment about what will fit - generally
> over a very short time scale.  And, since bringing a page into
> shared_buffers from the OS cache is much less expensive than bringing
> a page into memory from disk, it's really about what will fit in
> overall system memory, not just shared_buffers.

Another point is that the 25% seqscan threshold actually controls some
specific caching decisions, which effective_cache_size does not.  Raising
effective_cache_size "too high" is unlikely to result in cache thrashing;
in fact I'd guess the opposite.  What that would do is cause the planner
to prefer indexscans over seqscans in more cases involving large tables.
But if you've got a table+index that's bigger than RAM, seqscans are
probably going to be worse for the OS cache than indexscans, because
they're going to require bringing in more data.
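
To ground that a little: whether a given table plus its indexes even
fits in RAM is easy to check with the standard size functions (relation
names will differ, of course; this is just a convenient starting point):

  SELECT relname,
         pg_size_pretty(pg_table_size(oid))   AS table_size,
         pg_size_pretty(pg_indexes_size(oid)) AS index_size
  FROM   pg_class
  WHERE  relkind = 'r'
  ORDER  BY pg_total_relation_size(oid) DESC
  LIMIT  10;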

So I still think this whole argument is founded on shaky hypotheses
with a complete lack of hard data showing that a smaller default for
effective_cache_size would be better.  The evidence we have points
in the other direction.
        regards, tom lane



On 05/07/2014 10:12 AM, Andres Freund wrote:
> On 2014-05-07 10:07:07 -0400, Tom Lane wrote:
>> In the meantime, it seems like there is an emerging consensus that nobody
>> much likes the existing auto-tuning behavior for effective_cache_size,
>> and that we should revert that in favor of just increasing the fixed
>> default value significantly.  I see no problem with a value of say 4GB;
>> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
>> modern machine.
>>
>> Votes for or against?
> +1 for increasing it to 4GB and remove the autotuning. I don't like the
> current integration into guc.c much and a new static default doesn't
> seem to be worse than the current autotuning.
>
>


+1. If we ever want to implement an auto-tuning heuristic it seems we're 
going to need some hard empirical evidence to support it, and that 
doesn't seem likely to appear any time soon.

cheers

andrew



On 7 May 2014 15:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> I think I'm arguing myself towards using a BufferAccessStrategy of
>> BAS_BULKREAD for large IndexScans, BitMapIndexScans and
>> BitMapHeapScans.
>
> As soon as you've got some hard evidence to present in favor of such
> changes, we can discuss it.  I've got other things to do besides
> hypothesize.

Now we have a theory to test, I'll write a patch and we can collect
evidence for, or against.

> In the meantime, it seems like there is an emerging consensus that nobody
> much likes the existing auto-tuning behavior for effective_cache_size,
> and that we should revert that in favor of just increasing the fixed
> default value significantly.  I see no problem with a value of say 4GB;
> that's very unlikely to be worse than the pre-9.4 default (128MB) on any
> modern machine.
>
> Votes for or against?

+1 for fixed 4GB and remove the auto-tuning code.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



On 7 May 2014 15:10, Merlin Moncure <mmoncure@gmail.com> wrote:

> The core issues are:
> 1) There is no place to enter total system memory available to the
> database in postgresql.conf
> 2) Memory settings (except for the above) are given as absolute
> amounts, not percentages.

Those sound useful starting points.

The key issue for me is that effective_cache_size is a USERSET. It
applies per-query, just like work_mem (though work_mem is per query
node).

If we had "total system memory" we wouldn't know how to divide it up
amongst users since we have no functionality for "workload
management".

It would be very nice to be able to tell Postgres that "I have 64GB
RAM, use it wisely". At present, any and all users can set
effective_cache_size and work_mem to any value they please, any time
they wish, and thus overuse available memory, which is why I've had to
write plugins to manage memory allocation better in userspace.
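
To make the USERSET point concrete, a minimal sketch (no superuser
required; any role can do this for its own session or transaction):

  -- per-session override, affects every plan built in this session
  SET effective_cache_size = '64GB';
  SET work_mem = '1GB';

  -- or scoped to a single transaction
  BEGIN;
  SET LOCAL work_mem = '2GB';
  -- run the memory-hungry query here
  COMMIT;

So the server-wide settings are only defaults, not limits.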

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



On 05/07/2014 07:31 AM, Andrew Dunstan wrote:

> +1. If we ever want to implement an auto-tuning heuristic it seems we're
> going to need some hard empirical evidence to support it, and that
> doesn't seem likely to appear any time soon.

4GB default it is, then.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On 05/06/2014 10:35 PM, Peter Geoghegan wrote:
> +1. In my view, we probably should have set it to a much higher
> absolute default value. The main problem with setting it to any
> multiple of shared_buffers that I can see is that shared_buffers is a
> very poor proxy for what effective_cache_size is supposed to
> represent. In general, the folk wisdom around sizing shared_buffers
> has past its sell-by date.

Unfortunately nobody has the time/resources to do the kind of testing
required for a new recommendation for shared_buffers.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Wed, May 7, 2014 at 11:04 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Unfortunately nobody has the time/resources to do the kind of testing
> required for a new recommendation for shared_buffers.

I meant to suggest that the buffer manager could be improved to the
point that the old advice becomes obsolete. Right now, it's much
harder to analyze shared_buffers than it should be, presumably because
of the problems with the buffer manager. I think that if we could
formulate better *actionable* advice around what we have right now,
that would have already happened.

We ought to be realistic about the fact that the current
recommendations around sizing shared_buffers are nothing more than
folk wisdom. That's the best we have right now, but that seems quite
unsatisfactory to me.

-- 
Peter Geoghegan



On Tue, May 6, 2014 at 9:55 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-06 17:43:45 +0100, Simon Riggs wrote:
>> All this changes is the cost of
>> IndexScans that would use more than 25% of shared_buffers worth of
>> data. Hopefully not many of those in your workload. Changing the cost
>> doesn't necessarily prevent index scans either. And if there are many
>> of those in your workload AND you run more than one at same time, then
>> the larger setting will work against you. So the benefit window for
>> such a high setting is slim, at best.

Not only do you need to run more than one at a time, but they also must
use mostly disjoint sets of data, in order for the larger estimate to be bad.

> Why? There's many workloads where indexes are larger than shared buffers
> but fit into the operating system's cache. And that's precisely what
> effective_cache_size is about.

It is more about the size of the table referenced by the index, rather than the size of the index.  The point is that doing a large index scan might lead you to visit the same table blocks repeatedly within quick succession.  (If a small index scan is on the inner side of a nested loop, then you might access the same index leaf blocks and the same table blocks repeatedly--that is why it is only mostly about the table size, rather than exclusively).
 
Cheers,

Jeff



On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, May 7, 2014 at 11:04 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> Unfortunately nobody has the time/resources to do the kind of testing
>> required for a new recommendation for shared_buffers.
>
> I meant to suggest that the buffer manager could be improved to the
> point that the old advice becomes obsolete. Right now, it's much
> harder to analyze shared_buffers than it should be, presumably because
> of the problems with the buffer manager. I think that if we could
> formulate better *actionable* advice around what we have right now,
> that would have already happened.
>
> We ought to be realistic about the fact that the current
> recommendations around sizing shared_buffers are nothing more than
> folk wisdom. That's the best we have right now, but that seems quite
> unsatisfactory to me.

I think the stock advice is worse than nothing because it is A. based
on obsolete assumptions and B. doesn't indicate what the tradeoffs are
or what kinds of symptoms adjusting the setting could alleviate.  The
documentation should be reduced to things that are known, for example:

*) raising shared buffers does not 'give more memory to postgres for
caching' -- it can only reduce it via double paging
*) hits in shared buffers are generally somewhat faster than faulting
pages in from o/s buffers
*) an s_b larger than the working dataset size can be a good
configuration, especially for read-only loads
*) large settings have bad interplay with the o/s in some configurations
*) shared buffers can reduce write i/o in certain workloads
*) s_b interacts with checkpoint behavior
*) shared buffers have different mechanisms for managing contention
than o/s buffers

merlin



On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > On Wed, May 7, 2014 at 11:04 AM, Josh Berkus <josh@agliodbs.com> wrote:
> >> Unfortunately nobody has the time/resources to do the kind of testing
> >> required for a new recommendation for shared_buffers.
> >
> > I meant to suggest that the buffer manager could be improved to the
> > point that the old advice becomes obsolete. Right now, it's much
> > harder to analyze shared_buffers than it should be, presumably because
> > of the problems with the buffer manager. I think that if we could
> > formulate better *actionable* advice around what we have right now,
> > that would have already happened.
> >
> > We ought to be realistic about the fact that the current
> > recommendations around sizing shared_buffers are nothing more than
> > folk wisdom. That's the best we have right now, but that seems quite
> > unsatisfactory to me.
> 
> I think the stock advice is worse then nothing because it is A. based
> on obsolete assumptions and B. doesn't indicate what the tradeoffs are
> or what kinds of symptoms adjusting the setting could alleviate.  The
> documentation should be reduced to things that are known, for example:
> 
> *) raising shared buffers does not 'give more memory to postgres for
> caching' -- it can only reduce it via double paging

That's absolutely not a necessary consequence. If pages are in s_b for a
while the OS will be perfectly happy to throw them away.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
> We ought to be realistic about the fact that the current
> recommendations around sizing shared_buffers are nothing more than
> folk wisdom. That's the best we have right now, but that seems quite
> unsatisfactory to me.

So, as one of several people who put literally hundreds of hours into
the original benchmarking which established the sizing recommendations
for shared_buffers (and other settings), I find the phrase "folk wisdom"
personally offensive.  So, can we stop with this?

Otherwise, I don't think I can usefully participate in this discussion.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Wed, May 7, 2014 at 11:38 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> *) raising shared buffers does not 'give more memory to postgres for
>> caching' -- it can only reduce it via double paging
>
> That's absolutely not a necessary consequence. If pages are in s_b for a
> while the OS will be perfectly happy to throw them away.

The biggest problem with double buffering is not that it wastes
memory. Rather, it's that it wastes memory bandwidth. I think that
lessening that problem will be the major benefit of making larger
shared_buffers settings practical.

-- 
Peter Geoghegan



On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 11:38 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> *) raising shared buffers does not 'give more memory to postgres for
> >> caching' -- it can only reduce it via double paging
> >
> > That's absolutely not a necessary consequence. If pages are in s_b for a
> > while the OS will be perfectly happy to throw them away.
> 
> The biggest problem with double buffering is not that it wastes
> memory. Rather, it's that it wastes memory bandwidth.

Doesn't match my experience. Even with the current buffer manager
there's usually enough locality to keep important pages in s_b for a
meaningful time. I *have* seen workloads that should have fit into
memory not fit because of double buffering.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Wed, May 7, 2014 at 2:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
>> We ought to be realistic about the fact that the current
>> recommendations around sizing shared_buffers are nothing more than
>> folk wisdom. That's the best we have right now, but that seems quite
>> unsatisfactory to me.
>
> So, as one of several people who put literally hundreds of hours into
> the original benchmarking which established the sizing recommendations
> for shared_buffers (and other settings), I find the phrase "folk wisdom"
> personally offensive.  So, can we stop with this?
>
> Otherwise, I don't think I can usefully participate in this discussion.

+1.

I think it is quite accurate to say that we can't predict precisely
what value of shared_buffers will perform best for a particular
workload and on a particular system.  There are people out there using
very large values and very small ones, according to what they have
found most effective.  But that does not mean, as the phrase "folk
wisdom" might be taken to imply, that we don't know anything at all
about what actually works well in practice.  Because we do know quite
a bit about that.  I and people I work with have been able to improve
performance greatly on many systems by providing guidance based on
what this community has been able to understand on this topic, and
dismissing it as rubbish is wrong.

Also, I seriously doubt that a one-size-fits-all guideline about
setting shared_buffers will ever be right for every workload.
Workloads, by their nature, are complex beasts.  The size of the
workload varies, and which portions of it are how hot vary, and the
read-write mix varies, and those are not problems with PostgreSQL;
those are problems with data.  That is not to say that we can't do
anything to make PostgreSQL work better across a wider range of
settings for shared_buffers, but it is to say that no matter how much
work we do on the code, setting this optimally for every workload will
probably remain complex.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 7, 2014 at 2:49 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote:
>> On Wed, May 7, 2014 at 11:38 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> >> *) raising shared buffers does not 'give more memory to postgres for
>> >> caching' -- it can only reduce it via double paging
>> >
>> > That's absolutely not a necessary consequence. If pages are in s_b for a
>> > while the OS will be perfectly happy to throw them away.
>>
>> The biggest problem with double buffering is not that it wastes
>> memory. Rather, it's that it wastes memory bandwidth.
>
> Doesn't match my experience. Even with the current buffer manager
> there's usually enough locality to keep important pages in s_b for a
> meaningful time. I *have* seen workloads that should have fit into
> memory not fit because of double buffering.

Same here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 7, 2014 at 11:40 AM, Josh Berkus <josh@agliodbs.com> wrote:
> So, as one of several people who put literally hundreds of hours into
> the original benchmarking which established the sizing recommendations
> for shared_buffers (and other settings), I find the phrase "folk wisdom"
> personally offensive.  So, can we stop with this?

I have also put a lot of time into benchmarking. No personal offence
was intended, and I'm glad that we have some advice to give to users,
but the fact of the matter is that current *official* recommendations
are very vague.


-- 
Peter Geoghegan



On Wed, May 7, 2014 at 11:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> But that does not mean, as the phrase "folk
> wisdom" might be taken to imply, that we don't know anything at all
> about what actually works well in practice.

Folk wisdom doesn't imply that. It implies that we think this works,
and we may well be right, but there isn't all that much rigor behind
some of it. I'm not blaming anyone for this state of affairs. I've
heard plenty of people repeat the "don't exceed 8GB" rule - I
regularly repeated it myself. I cannot find any rigorous defense of
this, though. If you're aware of one, please point it out to me.

-- 
Peter Geoghegan



On 05/07/2014 11:52 AM, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 11:40 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> So, as one of several people who put literally hundreds of hours into
>> the original benchmarking which established the sizing recommendations
>> for shared_buffers (and other settings), I find the phrase "folk wisdom"
>> personally offensive.  So, can we stop with this?
> 
> I have also put a lot of time into benchmarking. No personal offence
> was intended, and I'm glad that we have some advice to give to users,
> but the fact of the matter is that current *official* recommendations
> are very vague.

Well, they should be vague; the only hard data we have is rather
out-of-date (I think 8.2 was our last set of tests).  If we gave users
specific, detailed recommendations, we'd be misleading them.

For that matter, our advice on shared_buffers ... and our design for it
... is going to need to change radically soon, since Linux is getting an
ARC with a frequency cache as well as a recency cache, and FreeBSD and
OpenSolaris already have them.

FWIW, if someone could fund me for a month, I'd be happy to create a
benchmarking setup where we could test these kinds of things; I have
pretty clear ideas how to build one.  I imagine some of our other
consultants could make the same offer.  However, it's too much work for
anyone to get done "in their spare time".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Wed, May 7, 2014 at 11:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Doesn't match my experience. Even with the current buffer manager
>> there's usually enough locality to keep important pages in s_b for a
>> meaningful time. I *have* seen workloads that should have fit into
>> memory not fit because of double buffering.
>
> Same here.

I think that it depends on whether or not you're thinking about the
worst case. Most people are not going to be in the category you
describe here. Plenty of people in the Postgres community run with
very large shared_buffers settings, on non i/o bound workloads, and
report good results - often massive, quickly apparent improvements.
I'm mostly concerned with obsoleting the 8GB hard ceiling rule here.

It probably doesn't matter whether, and by how much, one factor is worse
than the other, though. I found the section "5.2 Temporal Control:
Buffering" in the following paper, which speaks to the subject, quite
interesting: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
--
Peter Geoghegan



On Wed, May 7, 2014 at 2:58 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, May 7, 2014 at 11:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> But that does not mean, as the phrase "folk
>> wisdom" might be taken to imply, that we don't know anything at all
>> about what actually works well in practice.
>
> Folk wisdom doesn't imply that. It implies that we think this works,
> and we may well be right, but there isn't all that much rigor behind
> some of it. I'm not blaming anyone for this state of affairs. I've
> heard plenty of people repeat the "don't exceed 8GB" rule - I
> regularly repeated it myself. I cannot find any rigorous defense of
> this, though. If you're aware of one, please point it out to me.

I'm not sure the level of rigor you'd like to see is going to be
available here.  Complex systems have complex behavior; that's life.

At any rate, I'm not aware of any rigorous defense of the "don't
exceed 8GB" rule.  But, #1, I'd never put it that simply.   What I've
found is more like this: If it's possible to size shared_buffers so
that the working set fits entirely within shared_buffers, that
configuration is worthy of strong consideration.  Otherwise, you
probably want to keep shared_buffers low in order to avoid
checkpoint-related I/O spikes and minimize double buffering; try 25%
of system memory up to 512MB on Windows or up to 2GB on 32-bit Linux
or up to 8GB on 64-bit Linux for starters, and then tune based on your
workload.
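
As a sketch only (my numbers, not a recommendation): on a hypothetical
64GB 64-bit Linux box whose working set does not fit in memory, that
starting point could be written down with 9.4's ALTER SYSTEM roughly
like this, with 48GB standing in for "most of what the OS can cache":

  -- 25% of 64GB would be 16GB, so cap at the 8GB starting point
  ALTER SYSTEM SET shared_buffers = '8GB';
  -- effective_cache_size describes shared_buffers plus the OS cache
  ALTER SYSTEM SET effective_cache_size = '48GB';
  SELECT pg_reload_conf();
  -- effective_cache_size takes effect on reload; shared_buffers needs a restart

From there you measure and adjust, as above.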

And #2, I think the origin of the 8GB number on 64-bit non-Windows
systems is that people found that checkpoint-related I/O spikes became
intolerable when you went too much above that number.  On some
systems, the threshold is lower than that - for example, I believe
Merlin and others have reported numbers more like 2GB than 8GB - and
on other systems, the threshold is higher - indeed, some people go way
higher and never hit it at all.  I agree that it would be nice to
better-characterize why different users hit it at different levels,
but it's probably highly dependent on hardware, workload, and kernel
version, so I tend to doubt it can be characterized very simply.

If I had to guess, I'd bet that fixing Linux's abominable behavior
around the fsync() call would probably go a long way toward making
higher values of shared_buffers more practical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 7, 2014 at 11:04 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 05/06/2014 10:35 PM, Peter Geoghegan wrote:
>> +1. In my view, we probably should have set it to a much higher
>> absolute default value. The main problem with setting it to any
>> multiple of shared_buffers that I can see is that shared_buffers is a
>> very poor proxy for what effective_cache_size is supposed to
>> represent. In general, the folk wisdom around sizing shared_buffers
>> has past its sell-by date.
>
> Unfortunately nobody has the time/resources to do the kind of testing
> required for a new recommendation for shared_buffers.

I think it is worse than that.  I don't think we know what such testing would even look like.  SSD?  BBU? max_connections=20000 with 256 cores?  pgbench -N?  capture and replay of Amazon's workload?

If we could spell out/agree upon what kind of testing we would find convincing, that would probably go a long way to getting some people to work on carrying out the tests.  Unless the conclusion was "please have 3TB of RAM and a 50-disk RAID", in which case there might be few takers.
 
Cheers,

Jeff



On 05/07/2014 01:36 PM, Jeff Janes wrote:
> On Wed, May 7, 2014 at 11:04 AM, Josh Berkus <josh@agliodbs.com> wrote:

>> Unfortunately nobody has the time/resources to do the kind of testing
>> required for a new recommendation for shared_buffers.

> I think it is worse than that.  I don't think we know what such testing
> would even look like.  SSD?  BBU? max_connections=20000 with 256 cores?
>  pgbench -N?  capture and replay of Amazon's workload?
> 
> If we could spell out/agree upon what kind of testing we would find
> convincing, that would probably go a long way to getting some people to
> work on carrying out the tests.  Unless the conclusion was "please have 3TB
> or RAM and a 50 disk RAID", then there might be few takers.

Well, step #1 would be writing some easy-to-run benchmarks which carry
out selected workloads and measure response times.  The minimum starting
set would include one OLTP/Web benchmark, and one DW benchmark.

I'm not talking about the software to run the workload; we have that, in
several varieties.  I'm talking about the actual database generator and
queries to run.  That's the hard work.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



On Wed, May 7, 2014 at 11:38 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
>>
>> *) raising shared buffers does not 'give more memory to postgres for
>> caching' -- it can only reduce it via double paging
>
> That's absolutely not a necessary consequence. If pages are in s_b for a
> while the OS will be perfectly happy to throw them away.

Is that an empirical observation?  I ran some simulations a couple of years ago, and also wrote some instrumentation to test that theory under favorably engineered (but still plausible) conditions, and couldn't get more than a small fraction of s_b to be so tightly bound that the kernel could forget about those pages.  Unless of course the entire workload, or close to it, fits in s_b.

Cheers,

Jeff



On 2014-05-07 13:51:57 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 11:38 AM, Andres Freund <andres@2ndquadrant.com>wrote:
> 
> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> > >
> > > *) raising shared buffers does not 'give more memory to postgres for
> > > caching' -- it can only reduce it via double paging
> >
> > That's absolutely not a necessary consequence. If pages are in s_b for a
> > while the OS will be perfectly happy to throw them away.
> >
> 
> Is that an empirical observation?

Yes.

> I've run some simulations a couple years
> ago, and also wrote some instrumentation to test that theory under
> favorably engineered (but still plausible) conditions, and couldn't get
> more than a small fraction of s_b to be so tightly bound in that the kernel
> could forget about them.  Unless of course the entire workload or close to
> it fits in s_b.

I think it depends on your IO access patterns. If the whole working set
fits into the kernel's page cache and there's no other demand for pages
it will stay in. If you constantly rewrite almost all your pages they'll
also stay in the OS cache because they'll get written out. If the churn
in shared_buffers is so high (because it's so small in comparison to the
core hot data set) that there are dozens if not hundreds of clock sweeps
a second, you'll also have no locality.
It's also *hugely* kernel version specific :(

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Wed, May 7, 2014 at 4:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-07 13:51:57 -0700, Jeff Janes wrote:
>> On Wed, May 7, 2014 at 11:38 AM, Andres Freund <andres@2ndquadrant.com>wrote:
>>
>> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
>> > >
>> > > *) raising shared buffers does not 'give more memory to postgres for
>> > > caching' -- it can only reduce it via double paging
>> >
>> > That's absolutely not a necessary consequence. If pages are in s_b for a
>> > while the OS will be perfectly happy to throw them away.
>> >
>>
>> Is that an empirical observation?
>
> Yes.
>
>> I've run some simulations a couple years
>> ago, and also wrote some instrumentation to test that theory under
>> favorably engineered (but still plausible) conditions, and couldn't get
>> more than a small fraction of s_b to be so tightly bound in that the kernel
>> could forget about them.  Unless of course the entire workload or close to
>> it fits in s_b.
>
> I think it depends on your IO access patterns. If the whole working set
> fits into the kernel's page cache and there's no other demand for pages
> it will stay in. If you constantly rewrite most all your pages they'll
> also stay in the OS cache because they'll get written out. If the churn
> in shared_buffers is so high (because it's so small in comparison to the
> core hot data set) that there'll be dozens if not hundreds clock sweeps
> a second you'll also have no locality.
> It's also *hugely* kernel version specific :(

right.  This is, IMNSHO, exactly the sort of language that belongs in the docs.

merlin



On Wed, May 7, 2014 at 2:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> right.  This is, IMNSHO, exactly the sort of language that belongs in the docs.

+1

-- 
Peter Geoghegan



On 2014-05-07 16:24:53 -0500, Merlin Moncure wrote:
> On Wed, May 7, 2014 at 4:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-05-07 13:51:57 -0700, Jeff Janes wrote:
> >> On Wed, May 7, 2014 at 11:38 AM, Andres Freund <andres@2ndquadrant.com>wrote:
> >>
> >> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> >> > >
> >> > > *) raising shared buffers does not 'give more memory to postgres for
> >> > > caching' -- it can only reduce it via double paging
> >> >
> >> > That's absolutely not a necessary consequence. If pages are in s_b for a
> >> > while the OS will be perfectly happy to throw them away.
> >> >
> >>
> >> Is that an empirical observation?
> >
> > Yes.
> >
> >> I've run some simulations a couple years
> >> ago, and also wrote some instrumentation to test that theory under
> >> favorably engineered (but still plausible) conditions, and couldn't get
> >> more than a small fraction of s_b to be so tightly bound in that the kernel
> >> could forget about them.  Unless of course the entire workload or close to
> >> it fits in s_b.
> >
> > I think it depends on your IO access patterns. If the whole working set
> > fits into the kernel's page cache and there's no other demand for pages
> > it will stay in. If you constantly rewrite most all your pages they'll
> > also stay in the OS cache because they'll get written out. If the churn
> > in shared_buffers is so high (because it's so small in comparison to the
> > core hot data set) that there'll be dozens if not hundreds clock sweeps
> > a second you'll also have no locality.
> > It's also *hugely* kernel version specific :(
> 
> right.  This is, IMNSHO, exactly the sort of language that belongs in the docs.

Well, that's just the tip of the iceberg though. Whether you can accept
a small shared_buffers to counteract double buffering or not is also a
hard-to-answer question... That again heavily depends on the usage
patterns. If you have high concurrency and your working set has some
locality, it's very important to have a high s_b lest you fall afoul of
the freelist lock. If you have high concurrency but 90%+ of your page
lookups *aren't* going to be in the cache, you need to be very careful
with a large s_b because the clock sweeps to lower the usage counts can
increase the lock contention.
Then there are memory and cache efficiency questions around both the
PrivateRefCount array and the lwlocks....

In short: I think it's pretty hard to transfer this into language that's
a) agreed upon and b) understandable to someone who hasn't discovered
several of the facts for him/herself.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



On Wed, May 7, 2014 at 12:06 PM, Josh Berkus <josh@agliodbs.com> wrote:
> For that matter, our advice on shared_buffers ... and our design for it
> ... is going to need to change radically soon, since Linux is getting an
> ARC with a frequency cache as well as a recency cache, and FreeBSD and
> OpenSolaris already have them.

I knew about ZFS, but Linux is implementing ARC? There are good
reasons to avoid ARC. CAR seems like a more plausible candidate, since
it apparently acknowledges ARC's shortcomings and fixes them.

-- 
Peter Geoghegan



On Tue, May  6, 2014 at 11:15:17PM +0100, Simon Riggs wrote:
> > Well, for what it's worth, I've encountered systems where setting
> > effective_cache_size too low resulted in bad query plans, but I've
> > never encountered the reverse situation.
> 
> I agree with that.
> 
> Though that misses my point, which is that you can't know that all of
> that memory is truly available on a server with many concurrent users.
> Choosing settings that undercost memory intensive plans are not the
> best choice for a default strategy in a mixed workload when cache may
> be better used elsewhere, even if such settings make sense for some
> individual users.

This is the same problem we had with auto-tuning work_mem, in that we
didn't know what other concurrent activity was happening.  Seems we need
concurrent activity detection before auto-tuning work_mem and
effective_cache_size.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
> This is the same problem we had with auto-tuning work_mem, in that we
> didn't know what other concurrent activity was happening.  Seems we need
> concurrent activity detection before auto-tuning work_mem and
> effective_cache_size.
>

Perhaps I am missing something obvious here, but would mmgr have any
useful numbers on this? Like any book-keeping info maintained by
mcxt.c/aset.c? Would extending that interface help?

--
Amit



On Thu, May 15, 2014 at 10:23:19PM +0900, Amit Langote wrote:
> On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > This is the same problem we had with auto-tuning work_mem, in that we
> > didn't know what other concurrent activity was happening.  Seems we need
> > concurrent activity detection before auto-tuning work_mem and
> > effective_cache_size.
> >
> 
> Perhaps I am missing something obvious here, but would mmgr have any
> useful numbers on this? Like any book-keeping info maintained by
> mcxt.c/aset.c? Would extending that interface help?

No, all memory allocation is per-process, except for shared memory.  We
probably need a way to record our large local memory allocations in
PGPROC that other backends can see;  same for effective cache size
assumptions we make.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



On Thu, May 15, 2014 at 11:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, May 15, 2014 at 10:23:19PM +0900, Amit Langote wrote:
>> On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> >
>> > This is the same problem we had with auto-tuning work_mem, in that we
>> > didn't know what other concurrent activity was happening.  Seems we need
>> > concurrent activity detection before auto-tuning work_mem and
>> > effective_cache_size.
>> >
>>
>> Perhaps I am missing something obvious here, but would mmgr have any
>> useful numbers on this? Like any book-keeping info maintained by
>> mcxt.c/aset.c? Would extending that interface help?
>
> No, all memory allocat is per-process, except for shared memory.  We
> probably need a way to record our large local memory allocations in
> PGPROC that other backends can see;  same for effective cache size
> assumptions we make.
>

I see. I thought there would be some centralised way to traverse, say,
a linked list of contexts that individual backends create or something
like that. But, I suppose it would not be straightforward to make any
of that work for what we are after here.

--
Amit



On Thu, May 15, 2014 at 11:36:51PM +0900, Amit Langote wrote:
> > No, all memory allocat is per-process, except for shared memory.  We
> > probably need a way to record our large local memory allocations in
> > PGPROC that other backends can see;  same for effective cache size
> > assumptions we make.
> >
> 
> I see. I thought there would be some centralised way to traverse, say,
> a linked list of contexts that individual backends create or something
> like that. But, I suppose it would not be straightforward to make any
> of that work for what we are after here.

The problem is locking overhead between sessions.  Right now we avoid
all of that, and I think if we just put the value in PGPROC, it will be
good enough with limited locking required.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



On Thu, May 15, 2014 at 8:06 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, May  6, 2014 at 11:15:17PM +0100, Simon Riggs wrote:
>> > Well, for what it's worth, I've encountered systems where setting
>> > effective_cache_size too low resulted in bad query plans, but I've
>> > never encountered the reverse situation.
>>
>> I agree with that.
>>
>> Though that misses my point, which is that you can't know that all of
>> that memory is truly available on a server with many concurrent users.
>> Choosing settings that undercost memory intensive plans are not the
>> best choice for a default strategy in a mixed workload when cache may
>> be better used elsewhere, even if such settings make sense for some
>> individual users.
>
> This is the same problem we had with auto-tuning work_mem, in that we
> didn't know what other concurrent activity was happening.  Seems we need
> concurrent activity detection before auto-tuning work_mem and
> effective_cache_size.

I think it's worse than that: we don't even know what else is
happening *in the same query*.  For example, look at this:

http://www.postgresql.org/message-id/16161.1324414006@sss.pgh.pa.us

That's pretty awful, and it's just one example of a broader class of
problems that we haven't even tried to solve.  We really need a way to
limit memory usage on a per-query basis rather than a per-node basis.
For example, consider a query plan that needs to do four sorts.  If
work_mem = 64MB, we'll happily use 256MB total, 64MB for each sort.
Now, that might cause the system to swap: since there are four sorts,
maybe we ought to have used only 16MB per sort, and switched to a heap
sort if that wasn't enough.  But it's even subtler than that: if we
had known when building the query plan that we only had 16MB per sort
rather than 64MB per sort, we would potentially have estimated higher
costs for those sorts in the first place, which might have led to a
different plan that needed fewer sorts to begin with.
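
A hedged sketch of the per-node behavior (hypothetical tables; the exact
plan shape will vary, and this is only an illustration of the accounting):

  SET work_mem = '64MB';
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT a.id, a.rnk, b.rnk
  FROM  (SELECT id, rank() OVER (ORDER BY score) AS rnk FROM scores_a) a
  JOIN  (SELECT id, rank() OVER (ORDER BY score) AS rnk FROM scores_b) b
         USING (id)
  ORDER BY a.rnk, b.rnk;
  -- Each window function's ORDER BY can get its own Sort node, the join
  -- may add more, and the final ORDER BY is yet another; every Sort node
  -- is entitled to work_mem before spilling, so peak memory is a multiple
  -- of the per-node setting, not a per-query cap.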

When you start to try to balance memory usage across multiple
backends, things get even more complicated.  If the first query that
starts up is allowed to use all the available memory, and we respond
to that by lowering the effective value of work_mem to something very
small, a second query that shows up a bit later might choose a very
inefficient plan as a result.  That in turn might cause heavy I/O load
on the system for a long time, making the first query run very slowly.
We might have been better off just letting the first query finish, and
then running the second one (with a much better plan) after it was
done.  Or, maybe we should have only let the first query take a
certain fraction (half? 10%?) of the available memory, so that there
was more left for the second guy.  But that could be wrong too - it
might cause the first plan to be unnecessarily inefficient when nobody
was planning to run any other queries anyway.  Plus, DBAs hate it when
plans change on them unexpectedly, so anything that involves a
feedback loop between current utilization and query plans will be
unpopular with some people for that reason.

These are hard problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company