Thread: per table random-page-cost?
Currently random_page_cost is a GUC. I propose that this could be set per-table.

I think this is a good idea for widely-wanted planner hints. This way you can say "I do NOT want this table to be index-scanned, because I know it is not cached" by setting its random_page_cost to a large value (and obviously you can do the other way around: by setting random_page_cost to 1 you say "I don't care how you fetch the pages, they are all in cache").

The value for the per-table setting could be inferred from pg_stat(io)?.*tables . We could have a tool to suggest appropriate values.

We could call it something like cached_percentage (and have the cost of a random tuple fetch be inferred from the global random_page_cost, seq_tuple_cost and the per-table cached_percentage). Then we could set the global random_page_cost to a sane value like 200. Now one can wonder why the planner works while having such blatantly unrealistic values for random_page_cost :)

What do you think?

Greetings
Marcin Mank
On Mon, Oct 19, 2009 at 5:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> Currently random_page_cost is a GUC. I propose that this could be set per-table.
>
> I think this is a good idea for widely-wanted planner hints. This way
> You can say "I do NOT want this table to be index-scanned, because I
> know it is not cached" by setting it`s random_page_cost to a large
> value (an obviously You can do the other way around, when setting the
> random_page_cost to 1 You say "I don`t care how You fetch the pages,
> they are all in cache")
>
> The value for the per-table setting could be inferred from
> pg_stat(io)?.*tables . We could have a tool to suggest appropriate
> values.
>
> We could call it something like cached_percentage (and have the cost
> of a random tuple fetch be inferred from the global random_page_cost,
> seq_tuple_cost and the per-table cached_percentage). Then we could set
> the global random_page_cost to a sane value like 200. Now one can
> wonder why the planner works while having such blantantly unrealistic
> values for random_page_cost :)
>
> What do You think?

I've been thinking about this a bit, too. I've been wondering if it might make sense to have a "random_page_cost" and "seq_page_cost" setting for each TABLESPACE, to compensate for the fact that different media might be faster or slower, and a percent-cached setting for each table over top of that.

...Robert
On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> Currently random_page_cost is a GUC. I propose that this could be set per-table.

Or per-tablespace.

Yes, I think there is a class of GUCs which describe the physical attributes of the storage system and which should be per-table or per-tablespace. random_page_cost, sequential_page_cost, effective_io_concurrency come to mind.

While this isn't a simple flag to change, it does seem like a bit of a SMOP. The GUC infrastructure stores these values in global variables which the planner and other systems consult directly. They would instead have to be made storage parameters which the planner and other systems check on the appropriate table, defaulting to the global GUC if they're not set.

--
greg
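The fallback scheme Greg describes (per-table storage parameter, else per-tablespace setting, else global GUC) can be sketched in a few lines. This is an illustrative Python model, not PostgreSQL's actual C implementation; the names GLOBAL_GUCS and planner_cost are invented here for the sketch.

```python
# Hypothetical lookup order: table reloption > tablespace option > global GUC.
GLOBAL_GUCS = {"random_page_cost": 4.0, "seq_page_cost": 1.0}

def planner_cost(param, reloptions=None, spcoptions=None):
    """Return the effective cost parameter for a relation.

    reloptions  -- per-table storage parameters (may be None/empty)
    spcoptions  -- per-tablespace storage parameters (may be None/empty)
    Falls back to the global GUC when neither override is set.
    """
    if reloptions and param in reloptions:
        return reloptions[param]
    if spcoptions and param in spcoptions:
        return spcoptions[param]
    return GLOBAL_GUCS[param]
```

For example, a table the DBA never wants index-scanned could carry `{"random_page_cost": 200.0}`, while everything else silently inherits the global default.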
Robert Haas <robertmhaas@gmail.com> wrote:
> I've been wondering if it might make sense to have a
> "random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
> to compensate for the fact that different media might be faster or
> slower, and a percent-cached setting for each table over top of
> that.

[after recovering from the initial cringing reaction...]

How about calculating an effective percentage based on other information? effective_cache_size, along with relation and database size, come to mind. How about the particular index being considered for the plan? Of course, you might have to be careful about working in TOAST table size for a particular query, based on the columns retrieved.

I have no doubt that there would be some major performance regressions in the first cut of anything like this, for at least *some* queries. The toughest part of this might be to get adequate testing to tune it for a wide enough variety of real-life situations.

-Kevin
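One naive version of the calculation Kevin suggests, sketched in Python under the assumption that a relation's cached fraction is bounded by effective_cache_size over the relation's size. (As pointed out later in the thread, this ignores competition between tables for the same cache, so it is at best a starting heuristic; the function name is invented here.)

```python
def naive_cached_fraction(relation_pages, effective_cache_pages):
    """Crude guess at the fraction of a relation that is cached:
    a relation no larger than the cache *might* be fully cached;
    a larger one can have at most cache/relation of its pages resident."""
    if relation_pages <= 0:
        return 1.0  # empty relation: treat as fully cached
    return min(1.0, effective_cache_pages / relation_pages)
```

So a 1000-page table under a 4000-page cache is guessed fully cached, while a 4000-page table under a 1000-page cache is guessed 25% cached, regardless of how hot either table actually is.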
> I've been thinking about this a bit, too. I've been wondering if it
> might make sense to have a "random_page_cost" and "seq_page_cost"
> setting for each TABLESPACE, to compensate for the fact that different
> media might be faster or slower, and a percent-cached setting for each
> table over top of that.

I thought about making it per-table, but realistically I think most people don't use tablespaces now. I would not want to be telling people "to be able to hint the planner to (not) index-scan the table, you must move it to a separate tablespace". A global default, a per-tablespace default overriding it, and a per-table value overriding them both seems like over-engineering to me.

Greetings
Marcin
> I thought about making it per-table***space***, but realistically I
marcin mank <marcin.mank@gmail.com> writes:
> I thought about making it per-table, but realistically I think most
> people don`t use tablespaces now. I would not want to be telling
> people "to be able to hint the planner to (not) index-scan the table,
> You must move it to a separate tablespace".

Per-table is not physically sensible. Per-tablespace has some rationale to it.

			regards, tom lane
marcin mank wrote:
>> I've been thinking about this a bit, too. I've been wondering if it
>> might make sense to have a "random_page_cost" and "seq_page_cost"
>> setting for each TABLESPACE, to compensate for the fact that different
>> media might be faster or slower, and a percent-cached setting for each
>> table over top of that.
>
> I thought about making it per-table, but realistically I think most
> people don`t use tablespaces now. I would not want to be telling
> people "to be able to hint the planner to (not) index-scan the table,
> You must move it to a separate tablespace".

This is just plain wrong, in my experience. *Every* large installation I deal with uses tablespaces.

This proposal is just "hints by the back door", ISTM. As Tom says, there is a justification for having it on tablespaces but not on individual tables. If you want to argue for full-blown planner hints, that's a whole other story. Have you read the previous debates on the subject?

cheers
> This proposal is just "hints by the back door", ISTM. As Tom says, there is
> a justification for having it on tablespaces but not on individual tables.

If the parameter is defined as "the chance that a page is in cache", there is very real physical meaning to it. And this is per-table, not per-tablespace. A "users" table will likely be fetched from cache all the time, while a "billing_records" table will be fetched mostly from disk.

Greetings
Marcin
marcin mank <marcin.mank@gmail.com> writes:
>> This proposal is just "hints by the back door", ISTM. As Tom says, there is
>> a justification for having it on tablespaces but not on individual tables.

> If the parameter is defined as "the chance that a page is in cache"
> there is very real physical meaning to it.

We have no such parameter...

			regards, tom lane
On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> marcin mank <marcin.mank@gmail.com> writes:
>>> This proposal is just "hints by the back door", ISTM. As Tom says, there is
>>> a justification for having it on tablespaces but not on individual tables.
>
>> If the parameter is defined as "the chance that a page is in cache"
>> there is very real physical meaning to it.
>
> We have no such parameter...

And we want our parameters to be things the DBA has a chance of being able to estimate. How would you come up with sensible figures for this hypothetical parameter?

--
greg
On Mon, Oct 19, 2009 at 2:54 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> How about calculating an effective percentage based on other
> information. effective_cache_size, along with relation and database
> size, come to mind.

I think previous proposals for this have fallen down when you actually try to work out a formula for it. The problem is that you could have a table which is much smaller than effective_cache_size but is never in cache, due to it being one of many such tables.

I think it would still be good to have some naive kind of heuristic here as long as it's fairly predictable for DBAs.

But the long-term strategy here, I think, is to actually have some way to measure the real cache hit rate on a per-table basis. Whether it's by timing i/o operations, programmatic access to dtrace, or some other kind of OS interface, if we could know the real cache hit rate it would be very helpful.

Perhaps we could extrapolate from the shared buffer cache percentage. If there's a moderately high percentage in shared buffers, then it seems like a reasonable supposition to assume the filesystem cache would have a similar distribution.

--
greg
On Mon, Oct 19, 2009 at 5:54 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I've been wondering if it might make sense to have a
>> "random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
>> to compensate for the fact that different media might be faster or
>> slower, and a percent-cached setting for each table over top of
>> that.
>
> [after recovering from the initial cringing reaction...]
>
> How about calculating an effective percentage based on other
> information. effective_cache_size, along with relation and database
> size, come to mind. How about the particular index being considered
> for the plan? Of course, you might have to be careful about working
> in TOAST table size for a particular query, based on the columns
> retrieved.

I think that a per-tablespace page cost should be set by the DBA, same as we do with global page costs now. OTOH, I think that a per-relation percent-in-cache should be automatically calculated by the database (somehow), and the DBA should have an option to override it in case the database does the wrong thing. I gave a lightning talk on this topic at PGCon.

> I have no doubt that there would be some major performance regressions
> in the first cut of anything like this, for at least *some* queries.
> The toughest part of this might be to get adequate testing to tune it
> for a wide enough variety of real-life situations.

Agreed.

...Robert
On Mon, 2009-10-19 at 16:39 -0700, Greg Stark wrote:
> But the long-term strategy here I think is to actually have some way
> to measure the real cache hit rate on a per-table basis. Whether it's
> by timing i/o operations, programmatic access to dtrace, or some other
> kind of os interface, if we could know the real cache hit rate it
> would be very helpful.

Maybe it would be simpler to just get the high-order bit: is this table likely to be completely in cache (shared buffers or OS buffer cache), or not?

The lower cache hit ratios are uninteresting: the performance difference between 1% and 50% is only a factor of two. And hit ratios that are high but short of "almost 100%" seem unlikely in practice: what kind of scenario would involve a stable 90% cache hit ratio for a table?

Regards,
Jeff Davis
On Mon, Oct 19, 2009 at 4:29 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> marcin mank <marcin.mank@gmail.com> writes:
>>>> This proposal is just "hints by the back door", ISTM. As Tom says, there is
>>>> a justification for having it on tablespaces but not on individual tables.
>>
>>> If the parameter is defined as "the chance that a page is in cache"
>>> there is very real physical meaning to it.
>>
>> We have no such parameter...
>
> And we want our parameters to be things the DBA has a chance of being
> able to estimate.

Do the current parameters meet that standard? When setting seq_page_cost now, don't people have a lot of "well, we're about this likely to find it in the cache anyway" built into their settings?

Jeff
Jeff Davis <pgsql@j-davis.com> wrote:
> what kind of scenario
> would involve a stable 90% cache hit ratio for a table?

I'd bet accounts receivable applications often hit that. (Most payments on recent billings; a sprinkling on older ones.) I'm sure there are others.

-Kevin
On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
> I'd bet accounts receivable applications often hit that.
> (Most payments on recent billings; a sprinkling on older ones.)
> I'm sure there are others.

You worded the examples in terms of writes (I think), and we're talking about read caching, so I still don't entirely understand.

Also, the example sounds like you'd like to optimize across queries. There's no mechanism for the planner to remember some query executed a while ago and match it up to some new query that it's trying to plan. Maybe there should be, but that's an entirely different feature.

I'm not clear on the scenario that we're trying to improve.

Regards,
Jeff Davis
On Mon, 19 Oct 2009, Jeff Davis wrote:
> On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
>> I'd bet accounts receivable applications often hit that.
>> (Most payments on recent billings; a sprinkling on older ones.)
>> I'm sure there are others.
>
> You worded the examples in terms of writes (I think), and we're talking
> about read caching, so I still don't entirely understand.

No, that part was fair. The unfortunate reality of accounts receivable is that reports run to list people who owe one money happen much more often than posting payments into the system does.

> Also, the example sounds like you'd like to optimize across queries.
> There's no mechanism for the planner to remember some query executed a
> while ago, and match it up to some new query that it's trying to plan.

Some of the use-cases here involve situations where you know most of a relation is likely to be in cache just because there's not much going on that might evict it. In any case, something that attempts to model some average percentage you can expect a relation to be in cache is in effect serving as a memory of past queries.

> I'm not clear on the scenario that we're trying to improve.

Duh, that would be the situation where someone wants optimizer hints but can't call them that, because then the idea would be reflexively rejected!

Looks like I should dust off the much more complicated proposal for tracking and using in-cache hit percentages that I keep not having time to finish writing up. Allowing a user-set value for that is a lot more reasonable if the system computes a reasonable one itself under normal circumstances. That's what I think people really want, even if it's not what they're asking for.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Tue, Oct 20, 2009 at 1:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If the parameter is defined as "the chance that a page is in cache"
>> there is very real physical meaning to it.
>
> We have no such parameter...

What a simple person like me would think would work is:

- call the parameter "cached_probability".
- invent a way to store it (I'd actually try to do it the exact same way the recent "alter table set statistics distinct" does it)

a) less radical idea: replace all usage of random_page_cost with
   seq_page_cost * cached_probability + random_page_cost * (1 - cached_probability)

b) more radical idea:
   b1) invent a new GUC: cached_page_cost
   b2) replace all usage of seq_page_cost with
       cached_page_cost * cached_probability + seq_page_cost * (1 - cached_probability)
   b3) replace all usage of random_page_cost with
       cached_page_cost * cached_probability + random_page_cost * (1 - cached_probability)

> How would you come up with sensible figures for this hypothetical parameter?

select schemaname, relname,
       heap_blks_hit / cast(heap_blks_read + heap_blks_hit + 1 as float)
from pg_statio_all_tables

would be a nice starting point.

Greetings
Marcin
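Marcin's blending formulas (a, b2, b3) are all the same convex combination with different inputs. A small Python sketch makes the arithmetic concrete; cached_page_cost and the default values are illustrative, not actual PostgreSQL settings.

```python
def blended_cost(base_cost, cached_probability, cached_page_cost=0.01):
    """Blend the cost of a cached fetch with the cost of an uncached one,
    weighted by the chance the page is already in cache (ideas b2/b3 above).
    With cached_page_cost equal to seq_page_cost this reduces to idea a."""
    return (cached_page_cost * cached_probability
            + base_cost * (1 - cached_probability))
```

With random_page_cost = 4.0: a fully uncached table keeps its full random cost, a fully cached one pays only cached_page_cost, and a 50%-cached one lands in between. This is also why a deliberately "unrealistic" global random_page_cost like 200 could still plan well: any table with a high cached_probability would see only a small effective cost.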
Greg Smith <gsmith@gregsmith.com> wrote:
> The unfortunate reality of accounts receivable is that reports run
> to list people who owe one money happen much more often than posting
> payments into the system does.

How often do you have to print a list of past due accounts? I've generally seen that done weekly or monthly, in the same places where many people staff payment windows full time just to collect money from those lining up to pay. When they bring in a document which identifies the receivable (like, in our case, a traffic or parking ticket), there's no need to look at any older data in the database.

Heck, even our case management applications likely follow the 90% to 95% cache hit pattern in counties which aren't fully cached, since there's a lot more activity on court cases filed this year and last year than on cases filed 30 years ago.

-Kevin
On Tue, 20 Oct 2009, Kevin Grittner wrote:
> How often do you have to print a list of past due accounts? I've
> generally seen that done weekly or monthly, in the same places that
> there are many people standing full time in payment windows just to
> collect money from those lining up to pay.

This is only because your A/R collections staff includes people with guns.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Tuesday 20 October 2009 06:30:26, Greg Smith wrote:
> On Mon, 19 Oct 2009, Jeff Davis wrote:
> > On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
> >> I'd bet accounts receivable applications often hit that.
> >> (Most payments on recent billings; a sprinkling on older ones.)
> >> I'm sure there are others.
> >
> > You worded the examples in terms of writes (I think), and we're talking
> > about read caching, so I still don't entirely understand.
>
> No, that part was fair. The unfortunate reality of accounts receivable is
> that reports run to list people who owe one money happen much more often
> than posting payments into the system does.
>
> > Also, the example sounds like you'd like to optimize across queries.
> > There's no mechanism for the planner to remember some query executed a
> > while ago, and match it up to some new query that it's trying to plan.
>
> Some of the use-cases here involve situations where you know most of a
> relation is likely to be in cache just because there's not much going on
> that might evict it. In any case, something that attempts to model some
> average percentage you can expect a relation to be in cache is in effect
> serving as a memory of past queries.
>
> > I'm not clear on the scenario that we're trying to improve.
>
> Duh, that would be the situation where someone wants optimizer hints but
> can't call them that because then the idea would be reflexively rejected!
>
> Looks like I should dust off the much more complicated proposal for
> tracking and using in-cache hit percentages I keep not having time to
> finish writing up. Allowing a user-set value for that is a lot more
> reasonable if the system computes a reasonable one itself under normal
> circumstances. That's what I think people really want, even if it's not
> what they're asking for.

Do you already have some work on this in a git repository somewhere?
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
On Monday 19 October 2009 23:14:40, Robert Haas wrote:
> On Mon, Oct 19, 2009 at 5:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
> >
> > I think this is a good idea for widely-wanted planner hints. This way
> > You can say "I do NOT want this table to be index-scanned, because I
> > know it is not cached" by setting it`s random_page_cost to a large
> > value (an obviously You can do the other way around, when setting the
> > random_page_cost to 1 You say "I don`t care how You fetch the pages,
> > they are all in cache")
> >
> > The value for the per-table setting could be inferred from
> > pg_stat(io)?.*tables . We could have a tool to suggest appropriate
> > values.
> >
> > We could call it something like cached_percentage (and have the cost
> > of a random tuple fetch be inferred from the global random_page_cost,
> > seq_tuple_cost and the per-table cached_percentage). Then we could set
> > the global random_page_cost to a sane value like 200. Now one can
> > wonder why the planner works while having such blantantly unrealistic
> > values for random_page_cost :)
> >
> > What do You think?
>
> I've been thinking about this a bit, too. I've been wondering if it
> might make sense to have a "random_page_cost" and "seq_page_cost"
> setting for each TABLESPACE, to compensate for the fact that different
> media might be faster or slower, and a percent-cached setting for each
> table over top of that.

At least settings by TABLESPACE should exist. I totally agree with that.

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
On Monday 19 October 2009 23:27:20, Greg Stark wrote:
> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
>
> Or per-tablespace.
>
> Yes, I think there are a class of GUCs which describe the physical
> attributes of the storage system which should be per-table or
> per-tablespace. random_page_cost, sequential_page_cost,
> effective_io_concurrency come to mind.

and, perhaps, effective_cache_size.

You can have situations where you don't want some tables to go into OS memory (you can disable that at the filesystem level; I'd like to be able to do that at the postgres level, but that is another point).

So you put those tables in a separate tablespace, and tell postgresql that the effective_cache_size is 0 (for this tablespace); up to postgres to do the right thing with that ;)

> While this isn't a simple flag to change it does seem like a bit of a
> SMOP. The GUC infrastructure stores these values in global variables
> which the planner and other systems consult directly. They would
> instead have to be made storage parameters which the planner and other
> systems check on the appropriate table and default to the global GUC
> if they're not set.

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain <cedric.villemain@dalibo.com> wrote:
> Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
>> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
>> > Currently random_page_cost is a GUC. I propose that this could be set
>> > per-table.
>>
>> Or per-tablespace.
>>
>> Yes, I think there are a class of GUCs which describe the physical
>> attributes of the storage system which should be per-table or
>> per-tablespace. random_page_cost, sequential_page_cost,
>> effective_io_concurrency come to mind.
>
> and, perhaps effective_cache_size.
>
> You can have situation where you don't want some tables go to OS memory (you
> can disabled that at filesystem level, ... l'd like to be able to do that at
> postgres level but it is another point)
>
> So you put those tables in a separate tablespace, and tell postgresql that the
> effective_cache_size is 0 (for this tablespace), up to postgres to do the right
> thing with that ;)

Why would you ever want to set effective_cache_size to 0?

...Robert
On Thu, Oct 22, 2009 at 8:16 AM, Cédric Villemain <cedric.villemain@dalibo.com> wrote:
> You can have situation where you don't want some tables go to OS memory

I don't think this is a configuration we want to cater for. The sysadmin shouldn't be required to understand the i/o pattern of postgres. He or she cannot know whether the database will want to access the same blocks twice for internal algorithms that aren't visible from the user's point of view.

The scenario where you might want to do this would be if you know there are tables which are accessed very randomly, with no locality and very low cache hit rates. I think the direction we want to head is towards making sure the cache manager is automatically resistant to such data.

There is another use case which perhaps needs to be addressed: if the user has some queries which are very latency sensitive and others which are not. In that case it might be very important to keep the pages of data used by the high-priority queries in the cache. That's something we should have a high-level abstract interface for, not depend on low-level system features.

--
greg
All,

Wouldn't per-*tablespace* costs make more sense?

--Josh
Greg Stark <gsstark@mit.edu> wrote:
> There is another use case which perhaps needs to be addressed: if
> the user has some queries which are very latency sensitive and
> others which are not latency sensitive.

Yes. Some products allow you to create a named cache and bind particular objects to it. This can be used both to keep a large object with a low cache hit rate from pushing other things out of the cache, and to create a pseudo "memory resident" set of objects by binding them to a cache which is sized a little bigger than those objects. I don't know if you have any other suggestions for this problem, but the named cache idea didn't go over well the last time it was suggested.

In all fairness, PostgreSQL does a good enough job in general that I haven't missed this feature nearly as much as I thought I would; and its absence means one less thing to worry about keeping properly tuned.

-Kevin
On Thu, Oct 22, 2009 at 2:28 PM, Josh Berkus <josh@agliodbs.com> wrote:
> All,
>
> Wouldn't per *tablespace* costs make more sense?
>
> --Josh

Yes, we already had several votes in favor of that approach. See upthread.

...Robert
Well, I think we need some way to accomplish the same high-level goal of guaranteeing response times for latency-critical queries. However, my point is that cache policy is an internal implementation detail we don't want to expose in a user interface.

--
Greg

On 2009-10-22, at 11:41 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Greg Stark <gsstark@mit.edu> wrote:
>
>> There is another use case which perhaps needs to be addressed: if
>> the user has some queries which are very latency sensitive and
>> others which are not latency sensitive.
>
> Yes. Some products allow you to create a named cache and bind
> particular objects to it. This can be used both to keep a large
> object with a low cache hit rate from pushing other things out of the
> cache or to create a pseudo "memory resident" set of objects by
> binding them to a cache which is sized a little bigger than those
> objects. I don't know if you have any other suggestions for this
> problem, but the named cache idea didn't go over well last time it was
> suggested.
>
> In all fairness, PostgreSQL does a good enough job in general that I
> haven't missed this feature nearly as much as I thought I would; and
> its absence means one less thing to worry about keeping properly
> tuned.
>
> -Kevin
On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
> On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
> <cedric.villemain@dalibo.com> wrote:
> > Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
> >> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> >> > Currently random_page_cost is a GUC. I propose that this could be set
> >> > per-table.
> >>
> >> Or per-tablespace.
> >>
> >> Yes, I think there are a class of GUCs which describe the physical
> >> attributes of the storage system which should be per-table or
> >> per-tablespace. random_page_cost, sequential_page_cost,
> >> effective_io_concurrency come to mind.
> >
> > and, perhaps effective_cache_size.
> >
> > You can have situation where you don't want some tables go to OS memory (you
> > can disabled that at filesystem level, ... l'd like to be able to do that at
> > postgres level but it is another point)
> >
> > So you put those tables in a separate tablespace, and tell postgresql that the
> > effective_cache_size is 0 (for this tablespace), up to postgres to do the right
> > thing with that ;)
>
> Why would you ever want to set effective_cache_size to 0?

I think this is a misunderstanding of how effective_cache_size works. I can't think of any reason to do that. I could see a reason to tell the OS to not throw a relation into cache, but that is a different thing.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
Greg Stark wrote:
> There is another use case which perhaps needs to be addressed: if the
> user has some queries which are very latency sensitive and others
> which are not latency sensitive. In that case it might be very
> important to keep the pages of data used by the high priority queries
> in the cache. That's something we should have a high level abstract
> interface for, not depend on low level system features.

Yeah. I wonder if the right thing for this is to mark the objects (pages), or the queries, as needing special attention. If you mark the queries, then perhaps they could behave slightly differently, like adding +2 or so to the buffer usage count instead of +1, so that those buffers take longer than normal ones to get evicted. This way you don't force the admin to figure out what's the right size ratio for different named caches, etc.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
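A toy model of Alvaro's usage-count idea, assuming a simplified clock sweep in the spirit of PostgreSQL's buffer manager (the real one has pins, a usage-count cap, and free lists; this class and its names are purely illustrative). A high-priority touch adds +2 instead of +1, so the buffer survives roughly one extra pass of the sweep before becoming a victim.

```python
class ClockSweep:
    """Minimal clock-sweep eviction with priority-weighted usage counts."""

    def __init__(self, nbuffers, max_usage=5):
        self.usage = [0] * nbuffers  # usage count per buffer
        self.hand = 0                # clock hand position
        self.max_usage = max_usage

    def touch(self, buf, high_priority=False):
        # Alvaro's tweak: latency-critical queries bump the count by 2.
        inc = 2 if high_priority else 1
        self.usage[buf] = min(self.max_usage, self.usage[buf] + inc)

    def evict(self):
        # Sweep, decrementing usage counts, until a zero-count victim is found.
        while True:
            if self.usage[self.hand] == 0:
                victim = self.hand
                self.hand = (self.hand + 1) % len(self.usage)
                return victim
            self.usage[self.hand] -= 1
            self.hand = (self.hand + 1) % len(self.usage)
```

With four buffers where buffer 0 was touched normally and buffer 1 by a high-priority query, the untouched buffers 2 and 3 are evicted first, then buffer 0; the high-priority buffer 1 outlives its normally-touched peer without any named-cache sizing decisions by the admin.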
On Friday 23 October 2009 01:08:15, Joshua D. Drake wrote:
> On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
> > On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
> > <cedric.villemain@dalibo.com> wrote:
> > > Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
> > >> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> > >> > Currently random_page_cost is a GUC. I propose that this could be
> > >> > set per-table.
> > >>
> > >> Or per-tablespace.
> > >>
> > >> Yes, I think there are a class of GUCs which describe the physical
> > >> attributes of the storage system which should be per-table or
> > >> per-tablespace. random_page_cost, sequential_page_cost,
> > >> effective_io_concurrency come to mind.
> > >
> > > and, perhaps effective_cache_size.
> > >
> > > You can have situation where you don't want some tables go to OS memory
> > > (you can disabled that at filesystem level, ... l'd like to be able to
> > > do that at postgres level but it is another point)
> > >
> > > So you put those tables in a separate tablespace, and tell postgresql
> > > that the effective_cache_size is 0 (for this tablespace), up to
> > > postgres to do the right thing with that ;)
> >
> > Why would you ever want to set effective_cache_size to 0?
>
> I think this is a misunderstanding of how effective_cache_size works. I
> can't think of any reason to do that. I could see a reason to tell the
> OS to not throw a relation into cache but that is a different thing.

Well, the effective_cache_size in this context is OS cache memory (0 in my case) plus an estimation of shared_buffers... ah, so the DBA should estimate the amount in shared_buffers only, ok.

So consider effective_cache_size = 0 + what pg_buffercache will tell you.

My case is a table containing 29 GB of bytea in a database of 52 GB. Every row of the 29 GB table is read only a few times, and it just churns the OS cache every time (the server has only 8 GB of RAM).
So when I remove this table (not the index) from the OS cache memory, I keep more interesting blocks in the OS cache memory. And disk + raid are quick enought to bypass the OS cache memory for this tablespace. Are things a bit clearer and usage not so silly ? > > Joshua D. Drake > > > ...Robert > -- Cédric Villemain Administrateur de Base de Données Cel: +33 (0)6 74 15 56 53 http://dalibo.com - http://dalibo.org
On Fri, Oct 23, 2009 at 5:23 AM, Cédric Villemain
<cedric.villemain@dalibo.com> wrote:
> Well, the effective_cache_size in this context is OS cache memory (0 in my
> case) plus an estimate of shared_buffers... ah, so the DBA should estimate
> the amount in shared_buffers only, OK.
>
> So consider effective_cache_size = 0 + what pg_buffercache will tell you.
>
> My case is a table containing 29 GB of bytea in a database of 52 GB. Every
> row in the 29 GB table is grabbed only a few times, and it just churns the
> OS cache every time (the server has only 8 GB of RAM).
> So when I keep this table (but not its index) out of the OS cache, I keep
> more interesting blocks in the OS cache.
>
> And the disks + RAID are quick enough that bypassing the OS cache for this
> tablespace is fine.
>
> Are things a bit clearer, and the usage not so silly?

Well, I think you're vastly overestimating the power of
effective_cache_size. effective_cache_size changes the planner's
estimate of how likely a repeated partial index scan is to find the
same block in cache. So it only affects
nested-loop-with-inner-indexscan plans, and if effective_cache_size is
set to a value larger than the size of the index (or maybe the
relation; I'm too lazy to go reread the code right now), one value is
as good as another. For a typical user, I think you could set
effective_cache_size to, say, a terabyte, and it wouldn't make a bit
of difference. Heck, why not 2TB?

As far as I can see, the only possible benefit of setting this knob to
a value other than positive infinity is that, if you have a huge dataset
that's nowhere close to fitting in memory, it might cause the planner to
pick a merge join over a nested loop with inner indexscan, which might
be better if it makes the I/O sequential rather than random. Anyone
think I'm a pessimist?

...Robert
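Robert's "one value is as good as another" claim is visible directly in the approximation the planner uses: index_pages_fetched() in src/backend/optimizer/path/costsize.c implements the Mackert-Lohman formula. Below is a simplified standalone sketch (the function name and the libm-free ceil are mine; the real function additionally scales effective_cache_size by the relation's share of the pages the whole query touches, which is omitted here):

```c
/* Truncating ceil for non-negative values, to avoid needing libm. */
static double
d_ceil(double x)
{
    double t = (double) (long long) x;
    return (t < x) ? t + 1.0 : t;
}

/*
 * Mackert-Lohman estimate of page fetches: scanning tuples_fetched
 * random tuples from a table of T pages, with b cache pages available
 * (effective_cache_size). Note that once b >= T the first branch
 * applies and b drops out of the formula entirely.
 */
static double
index_pages_fetched_approx(double tuples_fetched, double T, double b)
{
    double pages;

    if (T <= b)
    {
        /* Table fits in cache: each page is fetched at most once. */
        pages = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
        if (pages >= T)
            pages = T;
        else
            pages = d_ceil(pages);
    }
    else
    {
        /* Table exceeds cache: past `lim` tuples, pages get refetched. */
        double lim = (2.0 * T * b) / (2.0 * T - b);

        if (tuples_fetched <= lim)
            pages = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
        else
            pages = b + (tuples_fetched - lim) * (T - b) / T;
        pages = d_ceil(pages);
    }
    return pages;
}
```

For a 1000-page table and 10,000 tuple fetches, any b >= 1000 yields 1000 page fetches, while b = 500 yields 5167: exactly Robert's point that raising effective_cache_size past the size of what you're scanning stops mattering.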
On Friday, October 23, 2009 at 14:23:09, Robert Haas wrote:
> Well, I think you're vastly overestimating the power of
> effective_cache_size. [...] For a typical user, I think you could set
> effective_cache_size to, say, a terabyte, and it wouldn't make a bit
> of difference. Heck, why not 2TB?

OK, I don't care too much about this parameter, then. Since we were talking
about parameters that could be tablespace-specific, I thought this one could
take different values too.

> As far as I can see, the only possible benefit of setting this knob to
> a value other than positive infinity is that, if you have a huge dataset
> that's nowhere close to fitting in memory, it might cause the planner to
> pick a merge join over a nested loop with inner indexscan, which might
> be better if it makes the I/O sequential rather than random. Anyone
> think I'm a pessimist?
>
> ...Robert

--
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
Cédric,

> My case is a table containing 29 GB of bytea in a database of 52 GB. Every
> row in the 29 GB table is grabbed only a few times, and it just churns the
> OS cache every time (the server has only 8 GB of RAM).
> So when I keep this table (but not its index) out of the OS cache, I keep
> more interesting blocks in the OS cache.

effective_cache_size doesn't control what gets cached; it just tells the
planner about it.

Now, if we had an OS which could be convinced to handle caching
differently for different physical devices, then I could see wanting
this setting to be per-tablespace. For example, it would make a lot of
sense not to FS-cache any data which is on a ramdisk or superfast SSD
array. The same goes for archive data which you expect to be slow and
infrequently accessed on a NAS device. If your OS can do that while
caching data from other sources, then it would make sense.

However, I don't know of any current OS which allows for this. Does
anyone else?

--Josh Berkus
Hi!

Josh Berkus writes:
> Now, if we had an OS which could be convinced to handle caching
> differently for different physical devices, then I could see wanting
> this setting to be per-tablespace. For example, it would make a lot of
> sense not to FS-cache any data which is on a ramdisk or superfast SSD
> array. The same goes for archive data which you expect to be slow and
> infrequently accessed on a NAS device. If your OS can do that while
> caching data from other sources, then it would make sense.
>
> However, I don't know of any current OS which allows for this. Does
> anyone else?

Isn't bypassing the buffer cache exactly what direct I/O is about?
Solaris UFS has a "forcedirectio" mount option, AIX JFS2 calls it "dio",
and Veritas VxFS uses the "convosync=direct" option to disable caching
of the filesystem's contents.

--
Stefan
On Saturday, October 24, 2009 at 01:04:19, Josh Berkus wrote:
> effective_cache_size doesn't control what gets cached; it just tells the
> planner about it.
>
> Now, if we had an OS which could be convinced to handle caching
> differently for different physical devices, then I could see wanting
> this setting to be per-tablespace. [...]
>
> However, I don't know of any current OS which allows for this. Does
> anyone else?

Isn't that what "fadvise -dontneed" lets you do?

Josh, I am talking about effective_cache_size per tablespace *exactly* for
the reason you explain.

--
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
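Cédric's "fadvise -dontneed" presumably refers to posix_fadvise() with POSIX_FADV_DONTNEED, which asks the kernel to drop a file's already-cached pages. Unlike the mount options Stefan lists, it is advisory and one-shot: future reads repopulate the cache, so it would have to be reissued periodically. A minimal sketch (the helper name is mine):

```c
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise */
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to evict any cached pages for the whole file
 * (offset 0, len 0 means "to end of file"). This is only a hint:
 * it does not prevent the pages from being cached again later.
 */
static int
drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return rc;      /* 0 on success, an errno value on failure */
}
```

A hypothetical background job could walk the data files of the 29 GB table's tablespace and call this after each batch of reads, approximating the "don't let this table push out more interesting blocks" behavior discussed above.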
Robert Haas wrote:
> On Thu, Oct 22, 2009 at 2:28 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > All,
> >
> > Wouldn't per *tablespace* costs make more sense?
> >
> > --Josh
>
> Yes, we already had several votes in favor of that approach. See upthread.

Added to TODO:

	Allow per-tablespace random_page_cost

	* http://archives.postgresql.org/pgsql-hackers/2009-10/msg01128.php

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
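For the record, marcin's per-table cached_percentage idea from the top of the thread (fold an assumed cache-hit fraction into the random fetch cost, so the global random_page_cost can be set to a "truthful" uncached value like 200) could be sketched roughly as below. The cached_fraction parameter and the use of 1.0 (the conventional seq_page_cost) as the cached-page cost are my assumptions, not anything specified in the proposal:

```c
/*
 * Hypothetical sketch of marcin's "cached_percentage" proposal: derive
 * an effective per-table random page cost from the global (uncached)
 * random_page_cost and the fraction of the table assumed cached.
 * Assumption: a cached page is charged 1.0, like a sequential page.
 */
static double
effective_random_page_cost(double random_page_cost, double cached_fraction)
{
    return cached_fraction * 1.0
         + (1.0 - cached_fraction) * random_page_cost;
}
```

A fully cached table then costs like sequential pages, a fully uncached one pays the raw device penalty, and intermediate values interpolate (e.g. 50% cached with random_page_cost = 200 gives 100.5).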