Thread: Stats for inheritance trees
Following up on the discussion here http://archives.postgresql.org/message-id/4B3875C6020000250002D97D@gw.wicourts.gov I'd like to propose making the following changes that would allow saner planning for queries involving inheritance: 1. Currently the primary key of pg_statistic is (starelid, staattnum) indicating the table and column the stats entry is for. I propose adding a bool stainherit to the pkey. "false" means the stats entry is for just that table column, ie, the traditional interpretation. "true" means the stats entry covers that column and all its inheritance children. Such entries could be used directly by the planner in cases where it currently punts and delivers a default estimate. 2. When ANALYZE is invoked on a table that has inheritance children, it will perform its normal duties for just that table (creating or updating entries with stainherit = false) and then perform a second scan that covers that table and all its children. This will be used to create or update entries with stainherit = true. It might be possible to avoid scanning the parent table itself twice, but I won't contort the code too much to avoid that, since in most practical applications the parent is empty or small anyway. 3. Ideally autovacuum would know enough to perform ANALYZEs on inheritance parents after enough churn has occurred in their child table(s). I am not entirely clear about a good way to do that. We could have it just directly force an ANALYZE on parent(s) of any table it has chosen to ANALYZE, but that might be overkill --- in particular leading to excess ANALYZEs when several children receive a lot of updates. Even without a really smart solution to #3, this would be a big step forward for inheritance queries. BTW, while at it I'm inclined to add a non-unique index on pg_inherits.inhparent, so that find_inheritance_children won't have to seqscan pg_inherits anymore. It's surprising people haven't complained about that before. The code says * XXX might be a good idea to create an index on pg_inherits' inhparent * field, so that we can use an indexscan insteadof sequential scan here. * However, in typical databases pg_inherits won't have enough entries to * justifyan indexscan... but we've long since learned that people stress databases in odd ways. Comments? regards, tom lane
On Mon, 2009-12-28 at 17:41 -0500, Tom Lane wrote: > Following up on the discussion here > http://archives.postgresql.org/message-id/4B3875C6020000250002D97D@gw.wicourts.gov > I'd like to propose making the following changes that would allow saner > planning for queries involving inheritance: Sounds good. > 1. Currently the primary key of pg_statistic is (starelid, staattnum) > indicating the table and column the stats entry is for. I propose > adding a bool stainherit to the pkey. "false" means the stats entry > is for just that table column, ie, the traditional interpretation. > "true" means the stats entry covers that column and all its inheritance > children. Such entries could be used directly by the planner in cases > where it currently punts and delivers a default estimate. > > 2. When ANALYZE is invoked on a table that has inheritance children, > it will perform its normal duties for just that table (creating or > updating entries with stainherit = false) and then perform a second > scan that covers that table and all its children. This will be used > to create or update entries with stainherit = true. It might be > possible to avoid scanning the parent table itself twice, but I won't > contort the code too much to avoid that, since in most practical > applications the parent is empty or small anyway. Will there be logic to decide how stainherit should be set? Many columns in an inherited set have similar values in different children, e.g. OrderValue, OrderStatus but many columns also have very different values in different children. e.g. OrderId, OrderPlacementDate, OrderFulfillmentDate > 3. Ideally autovacuum would know enough to perform ANALYZEs on > inheritance parents after enough churn has occurred in their child > table(s). I am not entirely clear about a good way to do that. > We could have it just directly force an ANALYZE on parent(s) of any > table it has chosen to ANALYZE, but that might be overkill --- in > particular leading to excess ANALYZEs when several children receive > a lot of updates. > > Even without a really smart solution to #3, this would be a big step > forward for inheritance queries. > > BTW, while at it I'm inclined to add a non-unique index on > pg_inherits.inhparent, so that find_inheritance_children won't have to > seqscan pg_inherits anymore. It's surprising people haven't complained > about that before. They have, we just haven't done anything about it. > The code says > > * XXX might be a good idea to create an index on pg_inherits' inhparent > * field, so that we can use an indexscan instead of sequential scan here. > * However, in typical databases pg_inherits won't have enough entries to > * justify an indexscan... > > but we've long since learned that people stress databases in odd ways. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, 2009-12-28 at 17:41 -0500, Tom Lane wrote: >> 2. When ANALYZE is invoked on a table that has inheritance children, >> it will perform its normal duties for just that table (creating or >> updating entries with stainherit = false) and then perform a second >> scan that covers that table and all its children. This will be used >> to create or update entries with stainherit = true. It might be >> possible to avoid scanning the parent table itself twice, but I won't >> contort the code too much to avoid that, since in most practical >> applications the parent is empty or small anyway. > Will there be logic to decide how stainherit should be set? Many columns > in an inherited set have similar values in different children, e.g. > OrderValue, OrderStatus but many columns also have very different values > in different children. e.g. OrderId, OrderPlacementDate, > OrderFulfillmentDate I don't see that that's relevant here. We're trying to estimate the overall result of a join against an inheritance tree. The fact that some or even all of the matching rows might come from specific child tables is not relevant at this level of detail. When it is relevant, we'll be looking at the child-table stats (and constraints), not at the parent's. regards, tom lane
I wrote: > Following up on the discussion here > http://archives.postgresql.org/message-id/4B3875C6020000250002D97D@gw.wicourts.gov > I'd like to propose making the following changes that would allow saner > planning for queries involving inheritance: I've committed the easy aspects of this (still need to work on autovacuum). I ran into one unexpected loose end: what shall we do with ALTER TABLE SET STATISTICS DISTINCT? As committed, any manually set value of attdistinct will be applied to both the relation-local and inherited stats for the column. From a logical standpoint this is clearly the Wrong Thing, because it's quite possible that different values would be needed. On the other hand, I'm not sure how much it matters in practice. In the typical cases where you need to force a value, you're probably going to force a fractional value, and those would be likely be OK for both. The only "real" fix I could see would be to create an additional pg_attribute column and a separate command to set it. But it really doesn't seem worth that much trouble. Or we could think about some heuristics, like applying the manual value to inherited stats only if it's fractional (negative). But I'm afraid any rule like that would get in the way as often as it would help. Thoughts? regards, tom lane
I wrote: > 3. Ideally autovacuum would know enough to perform ANALYZEs on > inheritance parents after enough churn has occurred in their child > table(s). I am not entirely clear about a good way to do that. > We could have it just directly force an ANALYZE on parent(s) of any > table it has chosen to ANALYZE, but that might be overkill --- in > particular leading to excess ANALYZEs when several children receive > a lot of updates. I've been looking at this for a bit, and I think the only reasonable way to do it is to make the pgstats mechanism track the need for ANALYZE on a parent table. A hack like I suggested above would make the autovacuum.c code even messier than it already is, and it seems inevitable that we'd get duplicate analyze actions from different autovac workers. Now, I don't really want to add Yet Another per-table counter to pgstats for this. The stats are big enough already. However, the existing mechanism for triggering ANALYZE looks pretty bogus to me as I look at it now: there's a last_anl_tuples value with a very hazy definition, and what's worse it's being computed off numbers that may be only crude estimates from ANALYZE. What I propose doing is to replace that counter with a "changes_since_analyze" counter, which can be managed very simply: * when a tabstat message comes in, increment changes_since_analyze by the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted; * when an analyze report message comes in, reset changes_since_analyze to zero. This gives us a number that is actually pretty credible and can still be compared to the analyze threshold the same as before. I think the current definition dates from before we had accurate insert/delete/update tracking, but now that we have that, we should use it. Now, having done that, what I would suggest doing is having autovacuum propagate the changes_since_analyze count that it sees up to the parent table(s) whenever it does an autoanalyze. (This would require adding a new message type that allows reporting a changes_since_analyze increment independently of inserted/updated/deleted, or else adding changes_since_analyze as an independent field in regular tabstat messages.) In most cases, with the parent table probably smaller than the child tables, this would immediately make the parent a candidate for analyze. That might be overkill, in which case we could try multiplying the count by some sort of derating factor, but getting hold of a good derating factor might be more expensive than it's worth --- I think you'd have to look at all the other children of the same parent to see how big the current one is compared to the rest. Thoughts? regards, tom lane
On Tue, Dec 29, 2009 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Following up on the discussion here >> http://archives.postgresql.org/message-id/4B3875C6020000250002D97D@gw.wicourts.gov >> I'd like to propose making the following changes that would allow saner >> planning for queries involving inheritance: > > I've committed the easy aspects of this (still need to work on autovacuum). > I ran into one unexpected loose end: what shall we do with ALTER TABLE > SET STATISTICS DISTINCT? > > As committed, any manually set value of attdistinct will be applied to > both the relation-local and inherited stats for the column. From a > logical standpoint this is clearly the Wrong Thing, because it's quite > possible that different values would be needed. On the other hand, > I'm not sure how much it matters in practice. In the typical cases > where you need to force a value, you're probably going to force a > fractional value, and those would be likely be OK for both. > > The only "real" fix I could see would be to create an additional > pg_attribute column and a separate command to set it. But it really > doesn't seem worth that much trouble. > > Or we could think about some heuristics, like applying the manual > value to inherited stats only if it's fractional (negative). > But I'm afraid any rule like that would get in the way as often > as it would help. Having separate properties for regular attdistinct and inherited attdistinct seems fairly worthwhile, but I share your lack of enthusiasm for solving the problem by adding more columns to pg_attribute. One possibility would be to create a new system catalog to hold "non-critical" information on pg_attribute properties - that is, anything that isn't likely to be needed to plan and execute ordinary queries. attstattarget and attdistinct would certainly qualify, and there may be others. This would avoid bloating pg_attribute with things that frequently won't be needed, and as a side benefit non-bootstrap catalogs are less of a PITA to modify. A second possibility would be to generalize the concept of reloptions to apply to columns. Per previous discussion, my per-tablespace random_page_cost/seq_page_cost patch will generalize reloptions to apply to tablespaces as well, under the name spcoptions. We could add attoptions as well, reusing most of the same code and potentially allowing us to support future options with less recoding. I rather like this option; it seems like a good fit for any sort of knob that we want to make available, but don't expect to be used frequently. (I think, however, that if we're going to do this, I should go ahead and commit my tablespace patch first, to avoid needless rebase hell.) These two options aren't completely mutually exclusive; we could decide that it makes sense to do both, though off the top of my head it doesn't seem worth the trouble. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Having separate properties for regular attdistinct and inherited > attdistinct seems fairly worthwhile, but I share your lack of > enthusiasm for solving the problem by adding more columns to > pg_attribute. One possibility would be to create a new system catalog > to hold "non-critical" information on pg_attribute properties - that > is, anything that isn't likely to be needed to plan and execute > ordinary queries. attstattarget and attdistinct would certainly > qualify, and there may be others. Hmm... offhand it seems like all of these columns would be potential candidates for pushing out to a secondary catalog: attstattarget | integer | not nullattdistinct | real | not nullattndims | integer | not nullattislocal | boolean | not nullattinhcount | integer | not nullattacl | aclitem[] | But even with another ndistinct column, that barely amounts to 2 dozen bytes saved. Maybe it's worth the trouble in order to shave space from tuple descriptors, but it seems pretty marginal. (Actually, it looks to me like we could lose attndims altogether with little pain ...) > A second possibility would be to generalize the concept of reloptions > to apply to columns. Hm ... the base assumption in the reloptions code is that the user is free to twiddle all the values. attdistinct and attstattarget might fit that bill but none of the other stuff would, so we couldn't go very far in terms of pushing things out of the core attributes. Still there's some attraction in not having to alter pg_attribute the next time we think of something like these. regards, tom lane
On Tue, Dec 29, 2009 at 8:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Having separate properties for regular attdistinct and inherited >> attdistinct seems fairly worthwhile, but I share your lack of >> enthusiasm for solving the problem by adding more columns to >> pg_attribute. One possibility would be to create a new system catalog >> to hold "non-critical" information on pg_attribute properties - that >> is, anything that isn't likely to be needed to plan and execute >> ordinary queries. attstattarget and attdistinct would certainly >> qualify, and there may be others. > > Hmm... offhand it seems like all of these columns would be potential > candidates for pushing out to a secondary catalog: > > attstattarget | integer | not null > attdistinct | real | not null > attndims | integer | not null > attislocal | boolean | not null > attinhcount | integer | not null > attacl | aclitem[] | > > But even with another ndistinct column, that barely amounts to 2 dozen > bytes saved. Maybe it's worth the trouble in order to shave space from > tuple descriptors, but it seems pretty marginal. Maybe. I would think that attacl[] would need to be consulted frequently enough that you'd want to have it around, but maybe not. > (Actually, it looks to me like we could lose attndims altogether with > little pain ...) > >> A second possibility would be to generalize the concept of reloptions >> to apply to columns. > > Hm ... the base assumption in the reloptions code is that the user is > free to twiddle all the values. attdistinct and attstattarget might fit > that bill but none of the other stuff would, so we couldn't go very far > in terms of pushing things out of the core attributes. Still there's > some attraction in not having to alter pg_attribute the next time we > think of something like these. Yep. It would also lower the barrier to future innovations of that type, which would be a good thing, IMO. Unfortunately we'd likely need to continue to support the existing syntax at least for attstattarget, which is kind of a bummer, but seems managable. I think we could throw over the syntax for ALTER TABLE ... ADD STATISTICS DISTINCT since it is an 8.5-ism. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Yep. It would also lower the barrier to future innovations of that > type, which would be a good thing, IMO. Unfortunately we'd likely > need to continue to support the existing syntax at least for > attstattarget, which is kind of a bummer, but seems managable. I > think we could throw over the syntax for ALTER TABLE ... ADD > STATISTICS DISTINCT since it is an 8.5-ism. Yeah --- if we think we might want to do this, now is the time, before we're stuck with supporting that syntax. (I was thinking earlier today that attdistinct was already in 8.4, but it's not.) regards, tom lane
On Tue, Dec 29, 2009 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Yep. It would also lower the barrier to future innovations of that >> type, which would be a good thing, IMO. Unfortunately we'd likely >> need to continue to support the existing syntax at least for >> attstattarget, which is kind of a bummer, but seems managable. I >> think we could throw over the syntax for ALTER TABLE ... ADD >> STATISTICS DISTINCT since it is an 8.5-ism. > > Yeah --- if we think we might want to do this, now is the time, > before we're stuck with supporting that syntax. (I was thinking > earlier today that attdistinct was already in 8.4, but it's not.) If you'd be willing to look over the latest version of my per-tablespace random_page_cost/seq_page_cost patch, which I posted to -hackers some time in the last few days, I can get that committed and then start working on this, if you'd like. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > If you'd be willing to look over the latest version of my > per-tablespace random_page_cost/seq_page_cost patch, which I posted to > -hackers some time in the last few days, I can get that committed and > then start working on this, if you'd like. I think Alvaro would actually be the right person to review that, since the reloptions code is almost entirely his work. regards, tom lane
Tom Lane escribió: > Robert Haas <robertmhaas@gmail.com> writes: > > If you'd be willing to look over the latest version of my > > per-tablespace random_page_cost/seq_page_cost patch, which I posted to > > -hackers some time in the last few days, I can get that committed and > > then start working on this, if you'd like. > > I think Alvaro would actually be the right person to review that, > since the reloptions code is almost entirely his work. I can't promise anything right now though, as my wife could get with labour very soon ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, Dec 30, 2009 at 10:24 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Tom Lane escribió: >> Robert Haas <robertmhaas@gmail.com> writes: >> > If you'd be willing to look over the latest version of my >> > per-tablespace random_page_cost/seq_page_cost patch, which I posted to >> > -hackers some time in the last few days, I can get that committed and >> > then start working on this, if you'd like. >> >> I think Alvaro would actually be the right person to review that, >> since the reloptions code is almost entirely his work. > > I can't promise anything right now though, as my wife could get with > labour very soon ... In terms of reasons for not being able to guarantee anything, I'd have to say that's one of the best I've heard. In all honesty, I'm not very worried about the reloptions stuff proper. I have copied the existing coding pattern so closely that it's a little hard to imagine that I've broken anything too badly. My main concerns are: 1. Am I leaking memory anywhere? and 2. Can anything bad happen as a result of invalidation events and/or concurrent updates to pg_tablespace? If anyone feels qualified to check my work on those two points, that would be great. In reality, even if I've done something relatively stupid, it isn't likely to have much practical impact since pg_tablespace updates figure to be infrequent and many people won't use this feature at all. But I'd still rather not do something stupid. Thanks, ...Robert
On Tue, Dec 29, 2009 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Yep. It would also lower the barrier to future innovations of that >> type, which would be a good thing, IMO. Unfortunately we'd likely >> need to continue to support the existing syntax at least for >> attstattarget, which is kind of a bummer, but seems managable. I >> think we could throw over the syntax for ALTER TABLE ... ADD >> STATISTICS DISTINCT since it is an 8.5-ism. > > Yeah --- if we think we might want to do this, now is the time, > before we're stuck with supporting that syntax. (I was thinking > earlier today that attdistinct was already in 8.4, but it's not.) I am just starting to look at this now. One of the questions I have is what we should call the options. We could call the regular options something like "ndistinct" or "distinct", but I'm not too sure what to call the "for-inheritance-trees" version of that. I suppose we could just use the familiar "inh" prefix and call it "inhndistinct", but that looks suspiciously like gobbledygook. Someone's understanding of just what that is supposed to mean might be a little... indistinct (ba dum). Another option would be to call it "inherits_ndistinct", or something like that, which seems a little more readable, but not great: I don't think there's going to be any getting around the need to RTFM before setting these parameters. In terms of syntax, I'm thinking something like: ALTER TABLE name ALTER COLUMN column SET ( column_parameter = value [, ...] ) I am also very tempted before beginning this work to rename reloptions.c to options.c or genoptions.c or somesuch. If we're going to use it for relations, attributes, and tablespaces, chances are good we're going to use it for other things, too. The FDW stuff is already borrowing from it as well. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Another option would be to call it "inherits_ndistinct", or something > like that, which seems a little more readable, but not great: I don't > think there's going to be any getting around the need to RTFM before > setting these parameters. Well, the previously agreed-to syntax was SET STATISTICS DISTINCT. I don't see a problem with using "distinct" and "inherited_distinct" or something like that. "ndistinct" is probably not a good name to expose to non-programmers. The must-RTFM argument is fairly weak, though, since these are knobs that only advanced users would twiddle anyway. regards, tom lane
On Tue, Jan 5, 2010 at 1:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Another option would be to call it "inherits_ndistinct", or something >> like that, which seems a little more readable, but not great: I don't >> think there's going to be any getting around the need to RTFM before >> setting these parameters. > > Well, the previously agreed-to syntax was SET STATISTICS DISTINCT. > I don't see a problem with using "distinct" and "inherited_distinct" > or something like that. "ndistinct" is probably not a good name to > expose to non-programmers. I like ndistinct because it makes the whole thing sound related to statistics, which it is. But I'll do it your way unless there are other votes. ...Robert
On Tue, Jan 5, 2010 at 1:09 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jan 5, 2010 at 1:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Another option would be to call it "inherits_ndistinct", or something >>> like that, which seems a little more readable, but not great: I don't >>> think there's going to be any getting around the need to RTFM before >>> setting these parameters. >> >> Well, the previously agreed-to syntax was SET STATISTICS DISTINCT. >> I don't see a problem with using "distinct" and "inherited_distinct" >> or something like that. "ndistinct" is probably not a good name to >> expose to non-programmers. > > I like ndistinct because it makes the whole thing sound related to > statistics, which it is. But I'll do it your way unless there are > other votes. It's probably also worth noting that the reason I used DISTINCT originally is because it's already a keyword. That's a moot point here. But as I say I'll stick with your names unless there are contravening votes. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > It's probably also worth noting that the reason I used DISTINCT > originally is because it's already a keyword. True. It occurs to me that the pg_stats view already exposes "n_distinct" as a column name. I wouldn't object to using "n_distinct" and "n_distinct_inherited" or some such. regards, tom lane
On Tue, Jan 5, 2010 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> It's probably also worth noting that the reason I used DISTINCT >> originally is because it's already a keyword. > > True. > > It occurs to me that the pg_stats view already exposes "n_distinct" > as a column name. I wouldn't object to using "n_distinct" and > "n_distinct_inherited" or some such. OK. So we have: 1. distinct and inherited_distinct, or 2. n_distinct and n_distinct_inherited Any other votes/thoughts/opinions/color commentary? ...Robert
On Tue, Dec 29, 2009 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Yep. It would also lower the barrier to future innovations of that >> type, which would be a good thing, IMO. Unfortunately we'd likely >> need to continue to support the existing syntax at least for >> attstattarget, which is kind of a bummer, but seems managable. I >> think we could throw over the syntax for ALTER TABLE ... ADD >> STATISTICS DISTINCT since it is an 8.5-ism. > > Yeah --- if we think we might want to do this, now is the time, > before we're stuck with supporting that syntax. (I was thinking > earlier today that attdistinct was already in 8.4, but it's not.) [ Hack, hack, hack. ] I'm not quite sure what the correct approach is to making attoptions available to examine_attribute(), which can't get at them in any very obvious way because the relation descriptor it gets passed as an argument only includes the fixed-size portion of each pg_attribute tuple. The obvious approaches are: 1. Extract attrelid and attnum from the tuple and issue a syscache lookup to get the full tuple. 2. Make RelationBuildTupleDesc() parse the attoptions and store them into a new RelationData member. I'm leaning toward the latter method. The upside of that method is that making attoptions part of the Relation descriptor means that they'll be conveniently available to all clients of the relcache. The downside is that right now, only ANALYZE actually needs to care at the moment, and yet we're incurring the overhead across the board. My thought is that that's OK, but I wonder if anyone thinks it might cause a measurable performance hit? WIP patch attached. Right now this just adds the ability to set and store attoptions, but doesn't actually do anything useful with them. No doc changes, no pg_dump support, no psql support, doesn't remove the SET STATISTICS DISTINCT code. All these warts will be fixed in a future version once I decide what to do about the problem mentioned above. ...Robert
Attachment
On Dec 29, 2009, at 6:29 PM, Tom Lane wrote: > * when a tabstat message comes in, increment changes_since_analyze by > the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted; > > * when an analyze report message comes in, reset changes_since_analyze > to zero. If that's being added, could we extend the concept to also keep a reltuples_delta column (name suggestions welcome!) thatis = reltuples_delta + t_tuples_inserted - t_tuples_deleted, and then set reltuples_delta back to 0 after an analyze(or anything else that would reset reltuples)? That means you could use reltuples + reltuples_delta as a fairly accuraterow count. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
decibel <decibel@decibel.org> writes: > On Dec 29, 2009, at 6:29 PM, Tom Lane wrote: >> * when a tabstat message comes in, increment changes_since_analyze by >> the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted; >> >> * when an analyze report message comes in, reset changes_since_analyze >> to zero. > If that's being added, could we extend the concept to also keep a reltuples_delta column (name suggestions welcome!) thatis = reltuples_delta + t_tuples_inserted - t_tuples_deleted, and then set reltuples_delta back to 0 after an analyze(or anything else that would reset reltuples)? That means you could use reltuples + reltuples_delta as a fairly accuraterow count. We already have a fairly accurate row count. regards, tom lane
On Tue, Jan 5, 2010 at 4:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Dec 29, 2009 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Yep. It would also lower the barrier to future innovations of that >>> type, which would be a good thing, IMO. Unfortunately we'd likely >>> need to continue to support the existing syntax at least for >>> attstattarget, which is kind of a bummer, but seems managable. I >>> think we could throw over the syntax for ALTER TABLE ... ADD >>> STATISTICS DISTINCT since it is an 8.5-ism. >> >> Yeah --- if we think we might want to do this, now is the time, >> before we're stuck with supporting that syntax. (I was thinking >> earlier today that attdistinct was already in 8.4, but it's not.) > > [ Hack, hack, hack. ] > > I'm not quite sure what the correct approach is to making attoptions > available to examine_attribute(), which can't get at them in any very > obvious way because the relation descriptor it gets passed as an > argument only includes the fixed-size portion of each pg_attribute > tuple. The obvious approaches are: > > 1. Extract attrelid and attnum from the tuple and issue a syscache > lookup to get the full tuple. > 2. Make RelationBuildTupleDesc() parse the attoptions and store them > into a new RelationData member. > > I'm leaning toward the latter method. Upon further study, this does not look easy at all. There are complex assumptions embedded in this code about what aspects of an index can change on the fly and how those changes must be handled. Trying to modify this seems likely to lead to (1) a further increase in code complexity and (2) breakage. I'm thinking maybe the best approach here is to store this information in a separate cache indexed by <attrelid, attnum>. ...Robert