Thread: per table random-page-cost?
Currently random_page_cost is a GUC. I propose that this could be set per-table.

I think this is a good idea for widely-wanted planner hints. This way you can say "I do NOT want this table to be index-scanned, because I know it is not cached" by setting its random_page_cost to a large value (and obviously you can do the other way around: by setting random_page_cost to 1 you say "I don't care how you fetch the pages, they are all in cache").

The value for the per-table setting could be inferred from pg_stat(io)?.*tables . We could have a tool to suggest appropriate values.

We could call it something like cached_percentage (and have the cost of a random tuple fetch be inferred from the global random_page_cost, seq_tuple_cost and the per-table cached_percentage). Then we could set the global random_page_cost to a sane value like 200. Now one can wonder why the planner works while having such blatantly unrealistic values for random_page_cost :)

What do you think?

Greetings
Marcin Mank
On Mon, Oct 19, 2009 at 5:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> Currently random_page_cost is a GUC. I propose that this could be set per-table.
>
> I think this is a good idea for widely-wanted planner hints. This way
> You can say "I do NOT want this table to be index-scanned, because I
> know it is not cached" by setting it`s random_page_cost to a large
> value (an obviously You can do the other way around, when setting the
> random_page_cost to 1 You say "I don`t care how You fetch the pages,
> they are all in cache")
>
> The value for the per-table setting could be inferred from
> pg_stat(io)?.*tables . We could have a tool to suggest appropriate
> values.
>
> We could call it something like cached_percentage (and have the cost
> of a random tuple fetch be inferred from the global random_page_cost,
> seq_tuple_cost and the per-table cached_percentage). Then we could set
> the global random_page_cost to a sane value like 200. Now one can
> wonder why the planner works while having such blantantly unrealistic
> values for random_page_cost :)
>
> What do You think?

I've been thinking about this a bit, too. I've been wondering if it might make sense to have a "random_page_cost" and "seq_page_cost" setting for each TABLESPACE, to compensate for the fact that different media might be faster or slower, and a percent-cached setting for each table over top of that.

...Robert
On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> Currently random_page_cost is a GUC. I propose that this could be set per-table.

Or per-tablespace.

Yes, I think there is a class of GUCs which describe the physical attributes of the storage system and which should be per-table or per-tablespace. random_page_cost, sequential_page_cost, effective_io_concurrency come to mind.

While this isn't a simple flag to change, it does seem like a bit of a SMOP. The GUC infrastructure stores these values in global variables which the planner and other systems consult directly. They would instead have to be made storage parameters which the planner and other systems check on the appropriate table, defaulting to the global GUC if they're not set.

--
greg
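The fallback scheme Greg describes (per-table storage parameter, else per-tablespace setting, else global GUC) can be sketched in a few lines. This is an illustrative Python model, not PostgreSQL's actual C implementation; the names GLOBAL_GUCS and planner_cost are invented here for the sketch.

```python
# Hypothetical lookup order: table reloption > tablespace option > global GUC.
GLOBAL_GUCS = {"random_page_cost": 4.0, "seq_page_cost": 1.0}

def planner_cost(param, reloptions=None, spcoptions=None):
    """Return the effective cost parameter for a relation.

    reloptions  -- per-table storage parameters (may be None/empty)
    spcoptions  -- per-tablespace storage parameters (may be None/empty)
    Falls back to the global GUC when neither override is set.
    """
    if reloptions and param in reloptions:
        return reloptions[param]
    if spcoptions and param in spcoptions:
        return spcoptions[param]
    return GLOBAL_GUCS[param]
```

For example, a table the DBA never wants index-scanned could carry `{"random_page_cost": 200.0}`, while everything else silently inherits the global default.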
Robert Haas <robertmhaas@gmail.com> wrote:
> I've been wondering if it might make sense to have a
> "random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
> to compensate for the fact that different media might be faster or
> slower, and a percent-cached setting for each table over top of
> that.

[after recovering from the initial cringing reaction...]

How about calculating an effective percentage based on other information? effective_cache_size, along with relation and database size, come to mind. How about the particular index being considered for the plan? Of course, you might have to be careful about working in TOAST table size for a particular query, based on the columns retrieved.

I have no doubt that there would be some major performance regressions in the first cut of anything like this, for at least *some* queries. The toughest part of this might be to get adequate testing to tune it for a wide enough variety of real-life situations.

-Kevin
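One naive version of the calculation Kevin suggests, sketched in Python under the assumption that a relation's cached fraction is bounded by effective_cache_size over the relation's size. (As pointed out later in the thread, this ignores competition between tables for the same cache, so it is at best a starting heuristic; the function name is invented here.)

```python
def naive_cached_fraction(relation_pages, effective_cache_pages):
    """Crude guess at the fraction of a relation that is cached:
    a relation no larger than the cache *might* be fully cached;
    a larger one can have at most cache/relation of its pages resident."""
    if relation_pages <= 0:
        return 1.0  # empty relation: treat as fully cached
    return min(1.0, effective_cache_pages / relation_pages)
```

So a 1000-page table under a 4000-page cache is guessed fully cached, while a 4000-page table under a 1000-page cache is guessed 25% cached, regardless of how hot either table actually is.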
> I've been thinking about this a bit, too. I've been wondering if it
> might make sense to have a "random_page_cost" and "seq_page_cost"
> setting for each TABLESPACE, to compensate for the fact that different
> media might be faster or slower, and a percent-cached setting for each
> table over top of that.

I thought about making it per-table, but realistically I think most people don't use tablespaces now. I would not want to be telling people "to be able to hint the planner to (not) index-scan the table, you must move it to a separate tablespace". A global default, a per-tablespace default overriding it, and a per-table value overriding them both seems like over-engineering to me.

Greetings
Marcin
> I thought about making it per-table***space***, but realistically I
marcin mank <marcin.mank@gmail.com> writes:
> I thought about making it per-table, but realistically I think most
> people don`t use tablespaces now. I would not want to be telling
> people "to be able to hint the planner to (not) index-scan the table,
> You must move it to a separate tablespace".

Per-table is not physically sensible. Per-tablespace has some rationale to it.

			regards, tom lane
marcin mank wrote:
>> I've been thinking about this a bit, too. I've been wondering if it
>> might make sense to have a "random_page_cost" and "seq_page_cost"
>> setting for each TABLESPACE, to compensate for the fact that different
>> media might be faster or slower, and a percent-cached setting for each
>> table over top of that.
>
> I thought about making it per-table, but realistically I think most
> people don`t use tablespaces now. I would not want to be telling
> people "to be able to hint the planner to (not) index-scan the table,
> You must move it to a separate tablespace".

This is just plain wrong, in my experience. *Every* large installation I deal with uses tablespaces.

This proposal is just "hints by the back door", ISTM. As Tom says, there is a justification for having it on tablespaces but not on individual tables. If you want to argue for full-blown planner hints, that's a whole other story. Have you read the previous debates on the subject?

cheers
> This proposal is just "hints by the back door", ISTM. As Tom says, there is
> a justification for having it on tablespaces but not on individual tables.

If the parameter is defined as "the chance that a page is in cache", there is very real physical meaning to it. And this is per-table, not per-tablespace. A "users" table will likely be fetched from cache all the time, while a "billing_records" table will be fetched mostly from disk.

Greetings
Marcin
marcin mank <marcin.mank@gmail.com> writes:
>> This proposal is just "hints by the back door", ISTM. As Tom says, there is
>> a justification for having it on tablespaces but not on individual tables.

> If the parameter is defined as "the chance that a page is in cache"
> there is very real physical meaning to it.

We have no such parameter...

			regards, tom lane
On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> marcin mank <marcin.mank@gmail.com> writes:
>>> This proposal is just "hints by the back door", ISTM. As Tom says, there is
>>> a justification for having it on tablespaces but not on individual tables.
>
>> If the parameter is defined as "the chance that a page is in cache"
>> there is very real physical meaning to it.
>
> We have no such parameter...

And we want our parameters to be things the DBA has a chance of being able to estimate. How would you come up with sensible figures for this hypothetical parameter?

--
greg
On Mon, Oct 19, 2009 at 2:54 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> How about calculating an effective percentage based on other
> information. effective_cache_size, along with relation and database
> size, come to mind.

I think previous proposals for this have fallen down when you actually try to work out a formula for it. The problem is that you could have a table which is much smaller than effective_cache_size but is never in cache, due to it being one of many such tables.

I think it would still be good to have some naive kind of heuristic here as long as it's fairly predictable for DBAs.

But the long-term strategy here, I think, is to actually have some way to measure the real cache hit rate on a per-table basis. Whether it's by timing i/o operations, programmatic access to dtrace, or some other kind of OS interface, if we could know the real cache hit rate it would be very helpful.

Perhaps we could extrapolate from the shared buffer cache percentage. If there's a moderately high percentage in shared buffers, then it seems like a reasonable supposition to assume the filesystem cache would have a similar distribution.

--
greg
On Mon, Oct 19, 2009 at 5:54 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I've been wondering if it might make sense to have a
>> "random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
>> to compensate for the fact that different media might be faster or
>> slower, and a percent-cached setting for each table over top of
>> that.
>
> [after recovering from the initial cringing reaction...]
>
> How about calculating an effective percentage based on other
> information. effective_cache_size, along with relation and database
> size, come to mind. How about the particular index being considered
> for the plan? Of course, you might have to be careful about working
> in TOAST table size for a particular query, based on the columns
> retrieved.

I think that a per-tablespace page cost should be set by the DBA, same as we do with global page costs now. OTOH, I think that a per-relation percent-in-cache should be automatically calculated by the database (somehow), and the DBA should have an option to override it in case the database does the wrong thing. I gave a lightning talk on this topic at PGCon.

> I have no doubt that there would be some major performance regressions
> in the first cut of anything like this, for at least *some* queries.
> The toughest part of this might be to get adequate testing to tune it
> for a wide enough variety of real-life situations.

Agreed.

...Robert
On Mon, 2009-10-19 at 16:39 -0700, Greg Stark wrote:
> But the long-term strategy here I think is to actually have some way
> to measure the real cache hit rate on a per-table basis. Whether it's
> by timing i/o operations, programmatic access to dtrace, or some other
> kind of os interface, if we could know the real cache hit rate it
> would be very helpful.

Maybe it would be simpler to just get the high-order bit: is this table likely to be completely in cache (shared buffers or OS buffer cache), or not?

The lower cache hit ratios are uninteresting: the performance difference between 1% and 50% is only a factor of two. And hit ratios that are high but short of "almost 100%" seem unlikely in practice: what kind of scenario would involve a stable 90% cache hit ratio for a table?

Regards,
Jeff Davis
On Mon, Oct 19, 2009 at 4:29 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> marcin mank <marcin.mank@gmail.com> writes:
>>>> This proposal is just "hints by the back door", ISTM. As Tom says, there is
>>>> a justification for having it on tablespaces but not on individual tables.
>>
>>> If the parameter is defined as "the chance that a page is in cache"
>>> there is very real physical meaning to it.
>>
>> We have no such parameter...
>
> And we want our parameters to be things the DBA has a chance of being
> able to estimate.

Do the current parameters meet that standard? When setting seq_page_cost now, don't people have a lot of "well, we're about this likely to find it in the cache anyway" built into their settings?

Jeff
Jeff Davis <pgsql@j-davis.com> wrote:
> what kind of scenario
> would involve a stable 90% cache hit ratio for a table?

I'd bet accounts receivable applications often hit that. (Most payments on recent billings; a sprinkling on older ones.) I'm sure there are others.

-Kevin
On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
> I'd bet accounts receivable applications often hit that.
> (Most payments on recent billings; a sprinkling on older ones.)
> I'm sure there are others.

You worded the examples in terms of writes (I think), and we're talking about read caching, so I still don't entirely understand.

Also, the example sounds like you'd like to optimize across queries. There's no mechanism for the planner to remember some query executed a while ago and match it up to some new query that it's trying to plan. Maybe there should be, but that's an entirely different feature.

I'm not clear on the scenario that we're trying to improve.

Regards,
Jeff Davis
On Mon, 19 Oct 2009, Jeff Davis wrote:
> On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
>> I'd bet accounts receivable applications often hit that.
>> (Most payments on recent billings; a sprinkling on older ones.)
>> I'm sure there are others.
>
> You worded the examples in terms of writes (I think), and we're talking
> about read caching, so I still don't entirely understand.

No, that part was fair. The unfortunate reality of accounts receivable is that reports run to list people who owe one money happen much more often than posting payments into the system does.

> Also, the example sounds like you'd like to optimize across queries.
> There's no mechanism for the planner to remember some query executed a
> while ago, and match it up to some new query that it's trying to plan.

Some of the use-cases here involve situations where you know most of a relation is likely to be in cache just because there's not much going on that might evict it. In any case, something that attempts to model some average percentage you can expect a relation to be in cache is in effect serving as a memory of past queries.

> I'm not clear on the scenario that we're trying to improve.

Duh, that would be the situation where someone wants optimizer hints but can't call them that, because then the idea would be reflexively rejected!

Looks like I should dust off the much more complicated proposal for tracking and using in-cache hit percentages that I keep not having time to finish writing up. Allowing a user-set value for that is a lot more reasonable if the system computes a reasonable one itself under normal circumstances. That's what I think people really want, even if it's not what they're asking for.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Tue, Oct 20, 2009 at 1:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If the parameter is defined as "the chance that a page is in cache"
>> there is very real physical meaning to it.
>
> We have no such parameter...

What a simple person like me would think would work is:

- call the parameter "cached_probability".
- invent a way to store it (I'd actually try to do it the exact same way the recent "alter table set statistics distinct" does it)

a) less radical idea: replace all usage of random_page_cost with
   seq_page_cost * cached_probability + random_page_cost * (1 - cached_probability)

b) more radical idea:
   b1) invent a new GUC: cached_page_cost
   b2) replace all usage of seq_page_cost with
       cached_page_cost * cached_probability + seq_page_cost * (1 - cached_probability)
   b3) replace all usage of random_page_cost with
       cached_page_cost * cached_probability + random_page_cost * (1 - cached_probability)

> How would you come up with sensible figures for this hypothetical parameter?

select schemaname, relname,
       heap_blks_hit / cast(heap_blks_read + heap_blks_hit + 1 as float)
from pg_statio_all_tables

would be a nice starting point.

Greetings
Marcin
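Marcin's blending formulas (a, b2, b3) are all the same convex combination with different inputs. A small Python sketch makes the arithmetic concrete; cached_page_cost and the default values are illustrative, not actual PostgreSQL settings.

```python
def blended_cost(base_cost, cached_probability, cached_page_cost=0.01):
    """Blend the cost of a cached fetch with the cost of an uncached one,
    weighted by the chance the page is already in cache (ideas b2/b3 above).
    With cached_page_cost equal to seq_page_cost this reduces to idea a."""
    return (cached_page_cost * cached_probability
            + base_cost * (1 - cached_probability))
```

With random_page_cost = 4.0: a fully uncached table keeps its full random cost, a fully cached one pays only cached_page_cost, and a 50%-cached one lands in between. This is also why a deliberately "unrealistic" global random_page_cost like 200 could still plan well: any table with a high cached_probability would see only a small effective cost.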
Greg Smith <gsmith@gregsmith.com> wrote:
> The unfortunate reality of accounts receivable is that reports run
> to list people who owe one money happen much more often than posting
> payments into the system does.

How often do you have to print a list of past due accounts? I've generally seen that done weekly or monthly, in the same places where many people staff payment windows full time just to collect money from those lining up to pay. When they bring in a document which identifies the receivable (like, in our case, a traffic or parking ticket), there's no need to look at any older data in the database.

Heck, even our case management applications likely follow the 90% to 95% cache hit pattern in counties which aren't fully cached, since there's a lot more activity on court cases filed this year and last year than on cases filed 30 years ago.

-Kevin
On Tue, 20 Oct 2009, Kevin Grittner wrote:
> How often do you have to print a list of past due accounts? I've
> generally seen that done weekly or monthly, in the same places that
> there are many people standing full time in payment windows just to
> collect money from those lining up to pay.

This is only because your A/R collections staff includes people with guns.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Tuesday 20 October 2009 06:30:26, Greg Smith wrote:
> On Mon, 19 Oct 2009, Jeff Davis wrote:
> > On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
> >> I'd bet accounts receivable applications often hit that.
> >> (Most payments on recent billings; a sprinkling on older ones.)
> >> I'm sure there are others.
> >
> > You worded the examples in terms of writes (I think), and we're talking
> > about read caching, so I still don't entirely understand.
>
> No, that part was fair. The unfortunate reality of accounts receivable is
> that reports run to list people who owe one money happen much more often
> than posting payments into the system does.
>
> > Also, the example sounds like you'd like to optimize across queries.
> > There's no mechanism for the planner to remember some query executed a
> > while ago, and match it up to some new query that it's trying to plan.
>
> Some of the use-cases here involve situations where you know most of a
> relation is likely to be in cache just because there's not much going on
> that might evict it. In any case, something that attempts to model some
> average percentage you can expect a relation to be in cache is in effect
> serving as a memory of past queries.
>
> > I'm not clear on the scenario that we're trying to improve.
>
> Duh, that would be the situation where someone wants optimizer hints but
> can't call them that because then the idea would be reflexively rejected!
>
> Looks like I should dust off the much more complicated proposal for
> tracking and using in-cache hit percentages I keep not having time to
> finish writing up. Allowing a user-set value for that is a lot more
> reasonable if the system computes a reasonable one itself under normal
> circumstances. That's what I think people really want, even if it's not
> what they're asking for.

Do you already have some work on this in a git repository somewhere?
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
On Monday 19 October 2009 23:14:40, Robert Haas wrote:
> On Mon, Oct 19, 2009 at 5:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
> >
> > I think this is a good idea for widely-wanted planner hints. This way
> > You can say "I do NOT want this table to be index-scanned, because I
> > know it is not cached" by setting it`s random_page_cost to a large
> > value (an obviously You can do the other way around, when setting the
> > random_page_cost to 1 You say "I don`t care how You fetch the pages,
> > they are all in cache")
> >
> > The value for the per-table setting could be inferred from
> > pg_stat(io)?.*tables . We could have a tool to suggest appropriate
> > values.
> >
> > We could call it something like cached_percentage (and have the cost
> > of a random tuple fetch be inferred from the global random_page_cost,
> > seq_tuple_cost and the per-table cached_percentage). Then we could set
> > the global random_page_cost to a sane value like 200. Now one can
> > wonder why the planner works while having such blantantly unrealistic
> > values for random_page_cost :)
> >
> > What do You think?
>
> I've been thinking about this a bit, too. I've been wondering if it
> might make sense to have a "random_page_cost" and "seq_page_cost"
> setting for each TABLESPACE, to compensate for the fact that different
> media might be faster or slower, and a percent-cached setting for each
> table over top of that.

At least settings by TABLESPACE should exist. I totally agree with that.

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
On Monday 19 October 2009 23:27:20, Greg Stark wrote:
> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
>
> Or per-tablespace.
>
> Yes, I think there are a class of GUCs which describe the physical
> attributes of the storage system which should be per-table or
> per-tablespace. random_page_cost, sequential_page_cost,
> effective_io_concurrency come to mind.

and, perhaps, effective_cache_size.

You can have situations where you don't want some tables to go into OS memory (you can disable that at the filesystem level; I'd like to be able to do that at the postgres level, but that is another point).

So you put those tables in a separate tablespace, and tell postgresql that the effective_cache_size is 0 (for this tablespace); up to postgres to do the right thing with that ;)

> While this isn't a simple flag to change it does seem like a bit of a
> SMOP. The GUC infrastructure stores these values in global variables
> which the planner and other systems consult directly. They would
> instead have to be made storage parameters which the planner and other
> systems check on the appropriate table and default to the global GUC
> if they're not set.

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain <cedric.villemain@dalibo.com> wrote:
> Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
>> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
>> > Currently random_page_cost is a GUC. I propose that this could be set
>> > per-table.
>>
>> Or per-tablespace.
>>
>> Yes, I think there are a class of GUCs which describe the physical
>> attributes of the storage system which should be per-table or
>> per-tablespace. random_page_cost, sequential_page_cost,
>> effective_io_concurrency come to mind.
>
> and, perhaps effective_cache_size.
>
> You can have situation where you don't want some tables go to OS memory (you
> can disabled that at filesystem level, ... l'd like to be able to do that at
> postgres level but it is another point)
>
> So you put those tables in a separate tablespace, and tell postgresql that the
> effective_cache_size is 0 (for this tablespace), up to postgres to do the right
> thing with that ;)

Why would you ever want to set effective_cache_size to 0?

...Robert
On Thu, Oct 22, 2009 at 8:16 AM, Cédric Villemain <cedric.villemain@dalibo.com> wrote:
> You can have situation where you don't want some tables go to OS memory

I don't think this is a configuration we want to cater for. The sysadmin shouldn't be required to understand the i/o pattern of postgres. He or she cannot know whether the database will want to access the same blocks twice for internal algorithms that aren't visible from the user's point of view.

The scenario where you might want to do this would be if you know there are tables which are accessed very randomly, with no locality and very low cache hit rates. I think the direction we want to head is towards making sure the cache manager is automatically resistant to such data.

There is another use case which perhaps needs to be addressed: if the user has some queries which are very latency sensitive and others which are not. In that case it might be very important to keep the pages of data used by the high-priority queries in the cache. That's something we should have a high-level abstract interface for, not depend on low-level system features.

--
greg
All,

Wouldn't per-*tablespace* costs make more sense?

--Josh
Greg Stark <gsstark@mit.edu> wrote:
> There is another use case which perhaps needs to be addressed: if
> the user has some queries which are very latency sensitive and
> others which are not latency sensitive.

Yes. Some products allow you to create a named cache and bind particular objects to it. This can be used both to keep a large object with a low cache hit rate from pushing other things out of the cache, and to create a pseudo "memory resident" set of objects by binding them to a cache which is sized a little bigger than those objects. I don't know if you have any other suggestions for this problem, but the named cache idea didn't go over well the last time it was suggested.

In all fairness, PostgreSQL does a good enough job in general that I haven't missed this feature nearly as much as I thought I would; and its absence means one less thing to worry about keeping properly tuned.

-Kevin
On Thu, Oct 22, 2009 at 2:28 PM, Josh Berkus <josh@agliodbs.com> wrote:
> All,
>
> Wouldn't per *tablespace* costs make more sense?
>
> --Josh

Yes, we already had several votes in favor of that approach. See upthread.

...Robert
Well, I think we need some way to accomplish the same high-level goal of guaranteeing response times for latency-critical queries. However, my point is that cache policy is an internal implementation detail we don't want to expose in a user interface.

--
Greg

On 2009-10-22, at 11:41 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Greg Stark <gsstark@mit.edu> wrote:
>
>> There is another use case which perhaps needs to be addressed: if
>> the user has some queries which are very latency sensitive and
>> others which are not latency sensitive.
>
> Yes. Some products allow you to create a named cache and bind
> particular objects to it. This can be used both to keep a large
> object with a low cache hit rate from pushing other things out of the
> cache or to create a pseudo "memory resident" set of objects by
> binding them to a cache which is sized a little bigger than those
> objects. I don't know if you have any other suggestions for this
> problem, but the named cache idea didn't go over well last time it was
> suggested.
>
> In all fairness, PostgreSQL does a good enough job in general that I
> haven't missed this feature nearly as much as I thought I would; and
> its absence means one less thing to worry about keeping properly
> tuned.
>
> -Kevin
On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
> On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
> <cedric.villemain@dalibo.com> wrote:
> > Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
> >> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> >> > Currently random_page_cost is a GUC. I propose that this could be set
> >> > per-table.
> >>
> >> Or per-tablespace.
> >>
> >> Yes, I think there are a class of GUCs which describe the physical
> >> attributes of the storage system which should be per-table or
> >> per-tablespace. random_page_cost, sequential_page_cost,
> >> effective_io_concurrency come to mind.
> >
> > and, perhaps effective_cache_size.
> >
> > You can have situation where you don't want some tables go to OS memory (you
> > can disabled that at filesystem level, ... l'd like to be able to do that at
> > postgres level but it is another point)
> >
> > So you put those tables in a separate tablespace, and tell postgresql that the
> > effective_cache_size is 0 (for this tablespace), up to postgres to do the right
> > thing with that ;)
>
> Why would you ever want to set effective_cache_size to 0?

I think this is a misunderstanding of how effective_cache_size works. I can't think of any reason to do that. I could see a reason to tell the OS to not throw a relation into cache, but that is a different thing.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
Greg Stark wrote:
> There is another use case which perhaps needs to be addressed: if the
> user has some queries which are very latency sensitive and others
> which are not latency sensitive. In that case it might be very
> important to keep the pages of data used by the high priority queries
> in the cache. That's something we should have a high level abstract
> interface for, not depend on low level system features.

Yeah. I wonder if the right thing for this is to mark the objects (pages), or the queries, as needing special attention. If you mark the queries, then perhaps they could behave slightly differently, like adding +2 or so to the buffer usage count instead of +1, so that those buffers take longer than normal ones to get evicted. This way you don't force the admin to figure out what's the right size ratio for different named caches, etc.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
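A toy model of Alvaro's usage-count idea, assuming a simplified clock sweep in the spirit of PostgreSQL's buffer manager (the real one has pins, a usage-count cap, and free lists; this class and its names are purely illustrative). A high-priority touch adds +2 instead of +1, so the buffer survives roughly one extra pass of the sweep before becoming a victim.

```python
class ClockSweep:
    """Minimal clock-sweep eviction with priority-weighted usage counts."""

    def __init__(self, nbuffers, max_usage=5):
        self.usage = [0] * nbuffers  # usage count per buffer
        self.hand = 0                # clock hand position
        self.max_usage = max_usage

    def touch(self, buf, high_priority=False):
        # Alvaro's tweak: latency-critical queries bump the count by 2.
        inc = 2 if high_priority else 1
        self.usage[buf] = min(self.max_usage, self.usage[buf] + inc)

    def evict(self):
        # Sweep, decrementing usage counts, until a zero-count victim is found.
        while True:
            if self.usage[self.hand] == 0:
                victim = self.hand
                self.hand = (self.hand + 1) % len(self.usage)
                return victim
            self.usage[self.hand] -= 1
            self.hand = (self.hand + 1) % len(self.usage)
```

With four buffers where buffer 0 was touched normally and buffer 1 by a high-priority query, the untouched buffers 2 and 3 are evicted first, then buffer 0; the high-priority buffer 1 outlives its normally-touched peer without any named-cache sizing decisions by the admin.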
On Friday 23 October 2009 01:08:15, Joshua D. Drake wrote:
> On Thu, 2009-10-22 at 11:33 -0400, Robert Haas wrote:
> > On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
> > <cedric.villemain@dalibo.com> wrote:
> > > Le lundi 19 octobre 2009 23:27:20, Greg Stark a écrit :
> > >> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank@gmail.com> wrote:
> > >> > Currently random_page_cost is a GUC. I propose that this could be
> > >> > set per-table.
> > >>
> > >> Or per-tablespace.
> > >>
> > >> Yes, I think there are a class of GUCs which describe the physical
> > >> attributes of the storage system which should be per-table or
> > >> per-tablespace. random_page_cost, sequential_page_cost,
> > >> effective_io_concurrency come to mind.
> > >
> > > and, perhaps effective_cache_size.
> > >
> > > You can have situation where you don't want some tables go to OS memory
> > > (you can disabled that at filesystem level, ... l'd like to be able to
> > > do that at postgres level but it is another point)
> > >
> > > So you put those tables in a separate tablespace, and tell postgresql
> > > that the effective_cache_size is 0 (for this tablespace), up to
> > > postgres to do the right thing with that ;)
> >
> > Why would you ever want to set effective_cache_size to 0?
>
> I think this is a misunderstanding of how effective_cache_size works. I
> can't think of any reason to do that. I could see a reason to tell the
> OS to not throw a relation into cache but that is a different thing.

Well, the effective_cache_size in this context is OS cache memory (0 in my case) plus an estimation of shared_buffers... ah, so the DBA should estimate the amount in shared_buffers only, ok.

So consider effective_cache_size = 0 + what pg_buffercache will tell you.

My case is a table containing 29 GB of bytea in a database of 52 GB. Every row of the 29 GB table is read only a few times, and it just churns the OS cache every time (the server has only 8 GB of RAM).
So when I remove this table (not the index) from the OS cache memory, I keep more interesting blocks in the OS cache memory. And disk + raid are quick enought to bypass the OS cache memory for this tablespace. Are things a bit clearer and usage not so silly ? > > Joshua D. Drake > > > ...Robert > -- Cédric Villemain Administrateur de Base de Données Cel: +33 (0)6 74 15 56 53 http://dalibo.com - http://dalibo.org
On Fri, Oct 23, 2009 at 5:23 AM, Cédric Villemain
<cedric.villemain@dalibo.com> wrote:
> Well, the effective_cache_size in this context is OS cache memory (0 in my
> case) plus an estimate of shared_buffers... ah, so the DBA should estimate
> the amount in shared_buffers only, OK.
>
> So consider effective_cache_size = 0 + what pg_buffercache will tell you.
>
> My case is a table containing 29 GB of bytea in a database of 52 GB. Every
> row in the 29 GB table is grabbed only a few times, and it just churns the
> OS cache every time (the server has only 8 GB of RAM).
> So when I keep this table (but not its index) out of the OS cache, I keep
> more interesting blocks in the OS cache.
>
> And the disks + RAID are quick enough that bypassing the OS cache for this
> tablespace is fine.
>
> Are things a bit clearer, and the usage not so silly?

Well, I think you're vastly overestimating the power of
effective_cache_size. effective_cache_size changes the planner's
estimate of how likely a repeated partial index scan is to find the
same block in cache. So it only affects
nested-loop-with-inner-indexscan plans, and if effective_cache_size is
set to a value larger than the size of the index (or maybe the
relation; I'm too lazy to go reread the code right now), one value is
as good as another. For a typical user, I think you could set
effective_cache_size to, say, a terabyte, and it wouldn't make a bit
of difference. Heck, why not 2TB?

As far as I can see, the only possible benefit of setting this knob to
a value other than positive infinity is that, if you have a huge dataset
that's nowhere close to fitting in memory, it might cause the planner to
pick a merge join over a nested loop with inner indexscan, which might
be better if it makes the I/O sequential rather than random. Anyone
think I'm a pessimist?

...Robert
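Robert's "one value is as good as another" claim is visible directly in the approximation the planner uses: index_pages_fetched() in src/backend/optimizer/path/costsize.c implements the Mackert-Lohman formula. Below is a simplified standalone sketch (the function name and the libm-free ceil are mine; the real function additionally scales effective_cache_size by the relation's share of the pages the whole query touches, which is omitted here):

```c
/* Truncating ceil for non-negative values, to avoid needing libm. */
static double
d_ceil(double x)
{
    double t = (double) (long long) x;
    return (t < x) ? t + 1.0 : t;
}

/*
 * Mackert-Lohman estimate of page fetches: scanning tuples_fetched
 * random tuples from a table of T pages, with b cache pages available
 * (effective_cache_size). Note that once b >= T the first branch
 * applies and b drops out of the formula entirely.
 */
static double
index_pages_fetched_approx(double tuples_fetched, double T, double b)
{
    double pages;

    if (T <= b)
    {
        /* Table fits in cache: each page is fetched at most once. */
        pages = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
        if (pages >= T)
            pages = T;
        else
            pages = d_ceil(pages);
    }
    else
    {
        /* Table exceeds cache: past `lim` tuples, pages get refetched. */
        double lim = (2.0 * T * b) / (2.0 * T - b);

        if (tuples_fetched <= lim)
            pages = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
        else
            pages = b + (tuples_fetched - lim) * (T - b) / T;
        pages = d_ceil(pages);
    }
    return pages;
}
```

For a 1000-page table and 10,000 tuple fetches, any b >= 1000 yields 1000 page fetches, while b = 500 yields 5167: exactly Robert's point that raising effective_cache_size past the size of what you're scanning stops mattering.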
On Friday, October 23, 2009 at 14:23:09, Robert Haas wrote:
> Well, I think you're vastly overestimating the power of
> effective_cache_size. [...] For a typical user, I think you could set
> effective_cache_size to, say, a terabyte, and it wouldn't make a bit
> of difference. Heck, why not 2TB?

OK, I don't care too much about this parameter, then. Since we were talking
about parameters that could be tablespace-specific, I thought this one could
take different values too.

> As far as I can see, the only possible benefit of setting this knob to
> a value other than positive infinity is that, if you have a huge dataset
> that's nowhere close to fitting in memory, it might cause the planner to
> pick a merge join over a nested loop with inner indexscan, which might
> be better if it makes the I/O sequential rather than random. Anyone
> think I'm a pessimist?
>
> ...Robert

--
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
Cédric,

> My case is a table containing 29 GB of bytea in a database of 52 GB. Every
> row in the 29 GB table is grabbed only a few times, and it just churns the
> OS cache every time (the server has only 8 GB of RAM).
> So when I keep this table (but not its index) out of the OS cache, I keep
> more interesting blocks in the OS cache.

effective_cache_size doesn't control what gets cached; it just tells the
planner about it.

Now, if we had an OS which could be convinced to handle caching
differently for different physical devices, then I could see wanting
this setting to be per-tablespace. For example, it would make a lot of
sense not to FS-cache any data which is on a ramdisk or superfast SSD
array. The same goes for archive data which you expect to be slow and
infrequently accessed on a NAS device. If your OS can do that while
caching data from other sources, then it would make sense.

However, I don't know of any current OS which allows for this. Does
anyone else?

--Josh Berkus
Hi!

Josh Berkus writes:
> Now, if we had an OS which could be convinced to handle caching
> differently for different physical devices, then I could see wanting
> this setting to be per-tablespace. For example, it would make a lot of
> sense not to FS-cache any data which is on a ramdisk or superfast SSD
> array. The same goes for archive data which you expect to be slow and
> infrequently accessed on a NAS device. If your OS can do that while
> caching data from other sources, then it would make sense.
>
> However, I don't know of any current OS which allows for this. Does
> anyone else?

Isn't bypassing the buffer cache exactly what direct I/O is about?
Solaris UFS has a "forcedirectio" mount option, AIX JFS2 calls it "dio",
and Veritas VxFS uses the "convosync=direct" option to disable caching
of the filesystem's contents.

--
Stefan
On Saturday, October 24, 2009 at 01:04:19, Josh Berkus wrote:
> effective_cache_size doesn't control what gets cached; it just tells the
> planner about it.
>
> Now, if we had an OS which could be convinced to handle caching
> differently for different physical devices, then I could see wanting
> this setting to be per-tablespace. [...]
>
> However, I don't know of any current OS which allows for this. Does
> anyone else?

Isn't that what "fadvise -dontneed" lets you do?

Josh, I am talking about effective_cache_size per tablespace *exactly* for
the reason you explain.

--
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
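Cédric's "fadvise -dontneed" presumably refers to posix_fadvise() with POSIX_FADV_DONTNEED, which asks the kernel to drop a file's already-cached pages. Unlike the mount options Stefan lists, it is advisory and one-shot: future reads repopulate the cache, so it would have to be reissued periodically. A minimal sketch (the helper name is mine):

```c
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise */
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to evict any cached pages for the whole file
 * (offset 0, len 0 means "to end of file"). This is only a hint:
 * it does not prevent the pages from being cached again later.
 */
static int
drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return rc;      /* 0 on success, an errno value on failure */
}
```

A hypothetical background job could walk the data files of the 29 GB table's tablespace and call this after each batch of reads, approximating the "don't let this table push out more interesting blocks" behavior discussed above.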
Robert Haas wrote:
> On Thu, Oct 22, 2009 at 2:28 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > All,
> >
> > Wouldn't per *tablespace* costs make more sense?
> >
> > --Josh
>
> Yes, we already had several votes in favor of that approach. See upthread.

Added to TODO:

	Allow per-tablespace random_page_cost

	* http://archives.postgresql.org/pgsql-hackers/2009-10/msg01128.php

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
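For the record, marcin's per-table cached_percentage idea from the top of the thread (fold an assumed cache-hit fraction into the random fetch cost, so the global random_page_cost can be set to a "truthful" uncached value like 200) could be sketched roughly as below. The cached_fraction parameter and the use of 1.0 (the conventional seq_page_cost) as the cached-page cost are my assumptions, not anything specified in the proposal:

```c
/*
 * Hypothetical sketch of marcin's "cached_percentage" proposal: derive
 * an effective per-table random page cost from the global (uncached)
 * random_page_cost and the fraction of the table assumed cached.
 * Assumption: a cached page is charged 1.0, like a sequential page.
 */
static double
effective_random_page_cost(double random_page_cost, double cached_fraction)
{
    return cached_fraction * 1.0
         + (1.0 - cached_fraction) * random_page_cost;
}
```

A fully cached table then costs like sequential pages, a fully uncached one pays the raw device penalty, and intermediate values interpolate (e.g. 50% cached with random_page_cost = 200 gives 100.5).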