Thread: WIP: cross column correlation ...
hello everybody, we have finally spent some time attacking cross column correlation, an issue which has been bugging us in a couple of applications for some years now. this is a WIP patch which adds special cross column correlation specific syntax:

CREATE CROSS COLUMN STATISTICS ON tablename (field, ...);
DROP CROSS COLUMN STATISTICS ON tablename (field, ...);

we use explicit syntax because we simply cannot keep track of all possible correlations in the DB, so the admin can take care of things explicitly. some distant day somebody might want to write a mechanism to derive the desired stats automatically, but this is beyond the scope of our project for now. as far as the patch is concerned: it is patched nicely into clauselist_selectivity(), but has some rough edges - even when a cross-col stat is found, the single-col selectivities are still counted ( = lowering the selectivity even more); this is a TODO. this patch adds the grammar and the start of planner integration with a static selectivity value for now; the previous discussion about cross-column statistics can be continued and perhaps comes to fruition soon. how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the selectivities. this should make estimates a lot more precise. the current approach can be extended to work with expressions as well as "straight" conditions. goal: to make cross column correlation work for 9.2 ... the purpose of this mail is mostly to get the race for a patch going and to see if the approach as such is reasonable / feasible. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
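a tiny example of the problem and of the proposed syntax (table, columns and data are made up purely for illustration):

CREATE TABLE t AS
    SELECT 'Vienna'::text AS city, 'Austria'::text AS country
      FROM generate_series(1, 50000)
    UNION ALL
    SELECT 'Berlin', 'Germany'
      FROM generate_series(1, 50000);
ANALYZE t;

-- city = 'Vienna' and country = 'Austria' are each ~50% selective, so the
-- planner multiplies them and estimates roughly 0.5 * 0.5 * 100000 = 25000
-- rows, while the real answer is 50000:
EXPLAIN SELECT * FROM t WHERE city = 'Vienna' AND country = 'Austria';

-- with the proposed (WIP) syntax the admin would declare the dependency:
CREATE CROSS COLUMN STATISTICS ON t (city, country);
-- and later remove it again:
DROP CROSS COLUMN STATISTICS ON t (city, country);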
2011/2/22 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>: > how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the selectivities. this should make estimates a lot more precise. > the current approach can be extended to work with expressions as well as "straight" conditions. /me prepares to go down in flames. Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this: WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); Then, having provided a method for the DBA to extinguish the raging flames of searing agony which are consuming them while a crocodile chews off their leg and their boss asks them why they didn't use Oracle, we can continue bikeshedding about the best way of fixing this problem in a more user-transparent fashion. As to the approach you've proposed here, I'm not sure I understand what this is actually doing. Selectivity estimates aren't made directly for predicates; they're made based on MCV and histogram information for predicates. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
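Spelled out in a full query it might look like this - entirely hypothetical syntax, and the table and column names are invented:

-- Hypothetical only; no such SELECTIVITY clause exists in PostgreSQL.
SELECT *
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
 WHERE (o.x < 5 AND o.y = 1) SELECTIVITY (0.1);
-- the planner would take 0.1 as the combined selectivity of the
-- parenthesized condition instead of multiplying its own two estimates.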
Robert Haas <robertmhaas@gmail.com> writes: > /me prepares to go down in flames. > Personally, I think the first thing we ought to do is add a real, bona > fide planner hint to override the selectivity calculation manually, > maybe something like this: > WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); One of the criteria we've always had for a suitable hint-or-whatever-you-call-it design is that it *not* involve decorating the queries. There are a number of reasons for that, some of the killer ones being (1) People frequently *can't* adjust their queries that way, because they're coming out of some broken query generator or other. (Crappy query generators are of course one of the prime reasons for poor-performing queries in the first place, so you can't write this off as not being a key use case.) (2) Anything we do like that, we'd be locked into supporting forever, even after we think of better solutions. (3) People don't like decorating their queries with nonstandard stuff; it smells of vendor lock-in. Especially if it's actually SQL syntax and not comments. Once you put something into the DML it's just too hard to fix applications to get rid of it (the inverse case of point #1). I haven't looked at Hans' patch in any detail, and don't intend to do so while the CF is still running; but at least he got this point right. regards, tom lane
On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> /me prepares to go down in flames. > >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: > >> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); > > One of the criteria we've always had for a suitable hint-or-whatever- > you-call-it design is that it *not* involve decorating the queries. > There are a number of reasons for that, some of the killer ones being > > (1) People frequently *can't* adjust their queries that way, because > they're coming out of some broken query generator or other. (Crappy > query generators are of course one of the prime reasons for > poor-performing queries in the first place, so you can't write this off > as not being a key use case.) > > (2) Anything we do like that, we'd be locked into supporting forever, > even after we think of better solutions. > > (3) People don't like decorating their queries with nonstandard stuff; > it smells of vendor lock-in. Especially if it's actually SQL syntax > and not comments. Once you put something into the DML it's just too > hard to fix applications to get rid of it (the inverse case of point > #1). Those are real problems, but I still want it. The last time I hit this problem I spent two days redesigning my schema and adding triggers all over the place to make things work. If I had been dealing with a 30TB database instead of a 300MB database I would have been royally up a creek. To put that another way, it's true that some people can't adjust their queries, but also some people can. It's true that nonstandard stuff sucks, but queries that don't work suck, too. And as for better solutions, how many major release cycles do we expect people to wait for them? Even one major release cycle is an eternity when you're trying to get the application working before your company runs out of money, and this particular problem has had a lot of cycles expended on it without producing anything very tangible (proposed patch, which like you I can't spare a lot of cycles to look at just now, possibly excepted). I agree that if we can get something that actually works that doesn't involve decorating the queries, that is better. But I would surely rather decorate the queries than rewrite the entire application around the problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One of the criteria we've always had for a suitable hint-or-whatever- >> you-call-it design is that it *not* involve decorating the queries. > [ snip ] > To put that another way, it's true that some people can't adjust their > queries, but also some people can. It's true that nonstandard stuff > sucks, but queries that don't work suck, too. And as for better > solutions, how many major release cycles do we expect people to wait > for them? Well, a decorating-the-queries solution that isn't utter crap is not going to be a small amount of work, either. regards, tom lane
> Personally, I think the first thing we ought to do is add a real, bona > fide planner hint to override the selectivity calculation manually, > maybe something like this: > > WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); > If you're going to go that far, why not just collect statistics on that specific predicate? ie, ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1); Then it won't fall subject to all of the pitfalls that Tom outlines below. Selectivities are easy to estimate if we know the predicate. They only become hard when they have to work for every possible predicate. Best, Nathan
On Feb 23, 2011, at 2:58 AM, Robert Haas wrote: > 2011/2/22 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>: >> how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the selectivities. this should make estimates a lot more precise. >> the current approach can be extended to work with expressions as well as "straight" conditions. > > /me prepares to go down in flames. > > Personally, I think the first thing we ought to do is add a real, bona > fide planner hint to override the selectivity calculation manually, > maybe something like this: > > WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); i thought there was an agreement that we don't want planner hints? as tom pointed out - many broken queries come out of some query generator where even the design to make the design is broken by design. personally i like query generators as long as other people use them ... telling people that this is the wrong way to go is actually financing my holiday next week ... ;). in general - hibernate and stuff like that is a no-go. personally i like the type of planner hints oleg and teodor came up with - i think we should do more of those hooks they are using but hiding it in some syntax is not a good idea. it does not change the query and it still gives a lot of room to toy around. it looks like a compromise. however, oleg's contrib module does not fix the core problem of cross column statistics because a hint is usually static but you want flexible selectivity. regards, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
> Those are real problems, but I still want it. The last time I hit > this problem I spent two days redesigning my schema and adding > triggers all over the place to make things work. If I had been > dealing with a 30TB database instead of a 300MB database I would have > been royally up a creek. > > To put that another way, it's true that some people can't adjust their > queries, but also some people can. It's true that nonstandard stuff > sucks, but queries that don't work suck, too. And as for better > solutions, how many major release cycles do we expect people to wait > for them? Even one major release cycle is an eternity when you're > trying to get the application working before your company runs out of > money, and this particular problem has had a lot of cycles expended on > it without producing anything very tangible (proposed patch, which > like you I can't spare a lot of cycles to look at just now, possibly > excepted). cheapest and easiest solution if you run into this: add "fake" functions which the planner cannot estimate properly. use OR to artificially prop up estimates or use AND to artificially lower them. there is actually no need to redesign the schema to get around it but it is such an ugly solution that it does not even deserve to be called "ugly" ... however, fast and reliable way to get around it. regards, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
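a minimal sketch of the trick (names are made up; if i remember right the planner falls back to a default selectivity of about 0.5 for an opaque boolean function, so OR raises the estimate and AND NOT lowers it, while the query results stay the same because the function always returns false):

-- an opaque function the planner cannot constant-fold or see through:
CREATE FUNCTION always_false() RETURNS boolean AS
$$ SELECT false $$ LANGUAGE sql VOLATILE;

-- OR-ing it in props the row estimate up (results unchanged):
SELECT * FROM some_table WHERE (a = 1 AND b = 2) OR always_false();

-- AND-ing in its negation pushes the estimate down (again without changing results):
SELECT * FROM some_table WHERE a = 1 AND b = 2 AND NOT always_false();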
On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley <npboley@gmail.com> wrote: >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: >> >> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); > > If you're going to go that far, why not just collect statistics on > that specific predicate? > > ie, ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1); > > Then it won't fall subject to all of the pitfalls that Tom outlines below. > > Selectivities are easy to estimate if we know the predicate. They only > become hard when they have to work for every possible predicate. Fair point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2011/2/23 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>: >> Those are real problems, but I still want it. The last time I hit >> this problem I spent two days redesigning my schema and adding >> triggers all over the place to make things work. If I had been >> dealing with a 30TB database instead of a 300MB database I would have >> been royally up a creek. >> >> To put that another way, it's true that some people can't adjust their >> queries, but also some people can. It's true that nonstandard stuff >> sucks, but queries that don't work suck, too. And as for better >> solutions, how many major release cycles do we expect people to wait >> for them? Even one major release cycle is an eternity when you're >> trying to get the application working before your company runs out of >> money, and this particular problem has had a lot of cycles expended on >> it without producing anything very tangible (proposed patch, which >> like you I can't spare a lot of cycles to look at just now, possibly >> excepted). > > cheapest and easiest solution if you run into this: add "fake" functions which the planner cannot estimate properly. > use OR to artificially prop up estimates or use AND to artificially lower them. there is actually no need to redesign the schema to get around it but it is such an ugly solution that it does not even deserve to be called "ugly" ... > however, fast and reliable way to get around it. We couldn't possibly design a hint mechanism that would be uglier or less future-proof than this workaround (which, by the way, I'll keep in mind for the next time I get bitten by this). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>> >> >> cheapest and easiest solution if you run into this: add "fake" functions which the planner cannot estimate properly. >> use OR to artificially prop up estimates or use AND to artificially lower them. there is actually no need to redesign the schema to get around it but it is such an ugly solution that it does not even deserve to be called "ugly" ... >> however, fast and reliable way to get around it. > > We couldn't possibly design a hint mechanism that would be uglier or > less future-proof than this workaround (which, by the way, I'll keep > in mind for the next time I get bitten by this). > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > i think the main issue is: what we do is ugly because of despair and a lack of alternative ... what you proposed is ugly by design ;). overall: the workaround will win the ugliness contest, however ;). many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
On Feb 23, 2011, at 3:46 PM, Robert Haas wrote: > On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley <npboley@gmail.com> wrote: >>> Personally, I think the first thing we ought to do is add a real, bona >>> fide planner hint to override the selectivity calculation manually, >>> maybe something like this: >>> >>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); >> >> If you're going to go that far, why not just collect statistics on >> that specific predicate? >> >> ie, ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1); >> >> Then it won't fall subject to all of the pitfalls that Tom outlines below. >> >> Selectivities are easy to estimate if we know the predicate. They only >> become hard when they have to work for every possible predicate. > > Fair point. > > -- > Robert Haas basically we got the idea of allowing "expressions" in cross column stuff. i think this can be very useful. it would fix the problem of a query like that: SELECT * FROM table WHERE cos(field) = some_number; this takes a constant fraction of the table which is usually plain wrong as well (and the error tends to multiply inside the plan). i am just not sure if i have understood all corner cases of that already. ultimate goal: get it right for join estimates (this is why a syntax extension is definitely needed - you cannot track all of them automatically). many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
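as an aside, for a single expression you can already get part of the way there with an expression index, since ANALYZE collects statistics on the indexed expression - a quick sketch with a made-up table:

CREATE TABLE m AS
    SELECT g AS id, (g % 10) AS field
      FROM generate_series(1, 100000) g;
ANALYZE m;

-- without anything else, cos(field) = ... gets a flat default fraction:
EXPLAIN SELECT * FROM m WHERE cos(field) = cos(3);

-- an expression index makes ANALYZE collect MCVs/histogram on cos(field):
CREATE INDEX m_cos_idx ON m ((cos(field)));
ANALYZE m;
EXPLAIN SELECT * FROM m WHERE cos(field) = cos(3);  -- estimate now ~10% instead of the default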
2011/2/23 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>: > i thought there was an agreement that we don't want planner hints? Well, I want them. I think some other people do, too. Whether those people are more numerous than the people who don't want them, and how much that matters either way, is another question. I don't want to have to use them very often, but I like to have an out when I get desperate. > as tom pointed out - many broken queries come out of some query generator where even the design to make the design is broken by design. > personally i like query generators as long as other people use them ... telling people that this is the wrong way to go is actually financing my holiday next week ... ;). in general - hibernate and stuff like that is a no-go. > > personally i like the type of planner hints oleg and teodor came up with - i think we should do more of those hooks they are using but hiding it in some syntax is not a good idea. > it does not change the query and it still gives a lot of room to toy around. it looks like a compromise. > > however, oleg's contrib module does not fix the core problem of cross column statistics because a hint is usually static but you want flexible selectivity. IIRC, what Teodor and Oleg did was a contrib module that excluded a certain index from consideration based on a GUC. That to me is a little more hacky than just wiring the selectivity estimate. You're going to need to set that just before each query that needs it, and reset it afterwards, so it's actually worse than just decorating the queries, IMHO. Also, I haven't run into any actual problems in the field that would be solved by this approach, though I am sure others have. IME, most bad query plans are caused by either incorrect estimates of selectivity, or wrongheaded notions about what's likely to be cached. If we could find a way, automated or manual, of providing the planner some better information about the facts of life in those areas, I think we'd be way better off. I'm open to ideas about what the best way to do that is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
PostgreSQL - Hans-Jürgen Schönig wrote: > > Those are real problems, but I still want it. The last time I hit > > this problem I spent two days redesigning my schema and adding > > triggers all over the place to make things work. If I had been > > dealing with a 30TB database instead of a 300MB database I would have > > been royally up a creek. > > > > To put that another way, it's true that some people can't adjust their > > queries, but also some people can. It's true that nonstandard stuff > > sucks, but queries that don't work suck, too. And as for better > > solutions, how many major release cycles do we expect people to wait > > for them? Even one major release cycle is an eternity when you're > > trying to get the application working before your company runs out of > > money, and this particular problem has had a lot of cycles expended on > > it without producing anything very tangible (proposed patch, which > > like you I can't spare a lot of cycles to look at just now, possibly > > excepted). > > > > cheapest and easiest solution if you run into this: add "fake" functions > which the planner cannot estimate properly. use OR to artificially > prop up estimates or use AND to artificially lower them. there is > actually no need to redesign the schema to get around it but it is such > an ugly solution that it does not even deserve to be called "ugly" ... > however, fast and reliable way to get around it. I agree that is super-ugly and we do need to address the cross-column statistics better. I personally like the 2-D histogram idea: http://archives.postgresql.org/pgsql-hackers/2010-12/msg00913.php -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Robert Haas wrote: > 2011/2/23 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>: > > i thought there was an agreement that we don't want planner hints? > > Well, I want them. I think some other people do, too. Whether those > people are more numerous than the people who don't want them, and > how much that matters either way, is another question. I don't want > to have to use them very often, but I like to have an out when I get > desperate. > > > as tom pointed out - many broken queries come out of some query generator where even the design to make the design is broken by design. > > personally i like query generators as long as other people use them ... telling people that this is the wrong way to go is actually financing my holiday next week ... ;). in general - hibernate and stuff like that is a no-go. > > > > personally i like the type of planner hints oleg and teodor came up with - i think we should do more of those hooks they are using but hiding it in some syntax is not a good idea. > > it does not change the query and it still gives a lot of room to toy around. it looks like a compromise. > > > > however, oleg's contrib module does not fix the core problem of cross column statistics because a hint is usually static but you want flexible selectivity. > > IIRC, what Teodor and Oleg did was a contrib module that excluded a > certain index from consideration based on a GUC. That to me is a > little more hacky than just wiring the selectivity estimate. You're > going to need to set that just before each query that needs it, and > reset it afterwards, so it's actually worse than just decorating the > queries, IMHO. Also, I haven't run into any actual problems in the > field that would be solved by this approach, though I am sure others > have. IME, most bad query plans are caused by either incorrect > estimates of selectivity, or wrongheaded notions about what's likely > to be cached. If we could find a way, automated or manual, of > providing the planner some better information about the facts of life > in those areas, I think we'd be way better off. I'm open to ideas > about what the best way to do that is. For me the key is finding a way to get that information to the planner so all queries can benefit, not just the queries we decorate. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
> Personally, I think the first thing we ought to do is add a real, bona > fide planner hint to override the selectivity calculation manually, > maybe something like this: > > WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); > > Then, having provided a method for the DBA to extinguish the raging > flames of searing agony which are consuming them while a crocodile > chews off their leg and their boss asks them why they didn't use > Oracle, we can continue bikeshedding about the best way of fixing this > problem in a more user-transparent fashion. Is there some way we can do that without adding the selectivity hint to the query itself? That's the biggest issue with hints. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Wed, Feb 23, 2011 at 8:09 PM, Josh Berkus <josh@agliodbs.com> wrote: >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: >> >> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); >> >> Then, having provided a method for the DBA to extinguish the raging >> flames of searing agony which are consuming them while a crocodile >> chews off their leg and their boss asks them why they didn't use >> Oracle, we can continue bikeshedding about the best way of fixing this >> problem in a more user-transparent fashion. > > Is there some way we can do that without adding the selectivity hint to > the query itself? That's the biggest issue with hints. I've been mulling this issue over a bit more - Nathan Boley raised a similar point upthread. I think it's useful to consider some concrete cases which can occur. 1. Default estimate. The planner tends to estimate that the selectivity of <something> = <something> is 0.005, and that the selectivity of <something> != <something> is 0.995, when it doesn't know any better. This estimate often sucks. Sometimes it sucks because it's too high, other times because it's too low, and of course sometimes it is close enough for government work. 2. One special customer. Suppose we have a database that contains lots and lots of people and associates different attributes to those people, including customer_id. We put all of our employees in the table too, and assign them customer_id = 1, since the record with customer.id = 1 represents us. I've built this kind of system for several different employers over the years. Turns out, the subset of the person table with customer_id = 1 looks very different, in terms of the MCVs on the remaining columns and the distribution of the values otherwise, than the records with customer_id != 1. I'm sure this problem comes up in different forms in other domains; this is just where I've seen it the most. 3. The mostly-redundant condition. Something like creation_date > 'some timestamp' AND active. Turns out, most of the not active stuff is also... old. A variant of this is creation_date > 'some timestamp' AND customer_id = 1, which overlaps #2. For extra fun the creation date and customer_id may be in different tables, with some intermediate join muddying the waters. 4. The condition that's redundant except when it isn't. The classic example here is WHERE zipcode = <constant> AND state = <constant>. Most of the time, the selectivity of the two clauses together is much higher than the product of their individual selectivities; you might as well ignore the second part altogether. But if some numbskull user enters a state that doesn't match the zipcode, then suddenly it matters a lot - the selectivity drops to zero when the second part is added. 5. The bitfield. Conditions like (x & 64) != 0. I know disk is cheap, but people keep doing this. There are probably some others I'm missing, too. That's just off the top of my head. Now here are some possible approaches to fixing it: A. Decorate the query. This would often be useful for case #1, and some instances of #3 and #5. It's useless for #2 and #4. B. Specify a particular predicate and the selectivity thereof. Like, whenever you see (x & 64) = 0, assume the selectivity is 0.5. Upon reflection, this seems pretty terrible in every respect. Unless you only ever issue an extremely limited range of queries, you're going to be hardwiring a lot of selectivities. 
I think this really only handles case #5 well, and maybe some instances of case #1. C. Specify an expression and gather statistics on it as if it were a column: i.e. ALTER TABLE tab ADD VIRTUAL STATISTICS COLUMN x & 64. This is pretty good. It is pretty much ideal for #2 and also handles #5 and some cases of #3 and #1 well. You could even make it handle some instances of #4 if you made the virtual column ROW(state, zipcode) and rewrote the query as a row comparison. D. N x N implicativeness matrix. Record for each pair of attributes the extent to which a given value for A implies a value for B, and derate the selectivity multipliers based on this information. This is an idea of Heikki's. It seemed good to me when he proposed it, and I think he proposed it in regards to #4, but I'm not sure we really ever figured out how to make it work. E. Given a set of columns (A1, .., An), collect MCVs and make a histogram for ROW(A1, ..., An), and then use it to handle cases like #4. This is similar to C and is intended to handle the zipcode problem, but it's not as flexible (because you are only specifying columns, not expressions). However, it's intended to work without rewriting the state/zipcode comparisons as a rowcompare. If you want to take the above as in any way an exhaustive survey of the landscape (which it isn't), C seems like a standout, maybe augmented by making the planner able to notice that A1 = x1 AND A2 = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite queries as much. I don't really know how to handle the join selectivity problem. I am not convinced that there is a better solution to that than decorating the query. After all the join selectivity depends not only on the join clause itself, but also on what you've filtered out of each table in the meantime. Note that I am not sure whether any of this is similar to what the WIP patch already implements, so apologies for possibly rampaging off in a different direction and/or reinventing your ideas. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
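To make C a bit more concrete, the DDL might look something like this (all invented syntax, of course; nothing like it exists today):

-- collect MCVs and a histogram on an expression, without paying for an index:
ALTER TABLE tab ADD VIRTUAL STATISTICS COLUMN ((x & 64) <> 0);
-- and on a row expression for the zipcode/state case:
ALTER TABLE addresses ADD VIRTUAL STATISTICS COLUMN (ROW(state, zipcode));
ANALYZE tab;
ANALYZE addresses;
-- WHERE (x & 64) <> 0, or WHERE (state, zipcode) = ('PA', '15213') after the
-- rewrite mentioned above, would then be estimated from those stats directly.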
Robert Haas wrote: > If you want to take the above as in any way an exhaustive survey of > the landscape (which it isn't), C seems like a standout, maybe > augmented by the making the planner able to notice that A1 = x1 AND A2 > = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite > queries as much. > > I don't really know how to handle the join selectivity problem. I am > not convinced that there is a better solution to that than decorating > the query. After all the join selectivity depends not only on the > join clause itself, but also on what you've filtered out of each table > in the meantime. Thinking some more, I think another downside to the "decorate the query" idea is that many queries use constants that are supplied only at runtime, so there would be no way to hard-code a selectivity value into a query when you don't know the value. Could a selectivity function handle that? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> If you want to take the above as in any way an exhaustive survey of >> the landscape (which it isn't), C seems like a standout, maybe >> augmented by the making the planner able to notice that A1 = x1 AND A2 >> = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite >> queries as much. >> >> I don't really know how to handle the join selectivity problem. I am >> not convinced that there is a better solution to that than decorating >> the query. After all the join selectivity depends not only on the >> join clause itself, but also on what you've filtered out of each table >> in the meantime. > > Thinking some more, I think another downside to the "decorate the query" > idea is that many queries use constants that are supplied only at > runtime, so there would be no way to hard-code a selectivity value into > a query when you don't know the value. Could a selectivity function > handle that? Beats me. What do you have in mind? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Feb 24, 2011, at 2:09 AM, Josh Berkus wrote: > >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: >> >> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); >> >> Then, having provided a method for the DBA to extinguish the raging >> flames of searing agony which are consuming them while a crocodile >> chews off their leg and their boss asks them why they didn't use >> Oracle, we can continue bikeshedding about the best way of fixing this >> problem in a more user-transparent fashion. > > Is there some way we can do that without adding the selectivity hint to > the query itself? That's the biggest issue with hints. > well, you could hide this hint in the system table - say; instead of decorating the query you could store the decoration in some system relation ... but, if you get it right, you call this decoration histogram ;). i think the patch with a multi-dim histogram is good (i have seen something similar for PostGIS). what is still needed in our patch is a.) multi-dim sampling (no idea how to get it right) and b.) investigating how to deal with joins and expressions (e.g. cos(id) ). hints into the right direction are highly welcome. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
Robert Haas wrote: > On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Robert Haas wrote: > >> If you want to take the above as in any way an exhaustive survey of > >> the landscape (which it isn't), C seems like a standout, maybe > >> augmented by the making the planner able to notice that A1 = x1 AND A2 > >> = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite > >> queries as much. > >> > >> I don't really know how to handle the join selectivity problem. I am > >> not convinced that there is a better solution to that than decorating > >> the query. After all the join selectivity depends not only on the > >> join clause itself, but also on what you've filtered out of each table > >> in the meantime. > > > > Thinking some more, I think another downside to the "decorate the query" > > idea is that many queries use constants that are supplied only at > > runtime, so there would be no way to hard-code a selectivity value into > > a query when you don't know the value. Could a selectivity function > > handle that? > > Beats me. What do you have in mind? My point is just that many queries have constants whose values are not known at the time the query is written, so any system should have a way to handle that somehow. This is why query decoration is usually not a good solution, and why something more flexible that is stored as part of the column is preferred. Perhaps a selectivity function that has easy access to the computed selectivity of the constant involved might be a win. For example, for the zip code/state code case we could have something like: function mysel(zip, state) { pgsel(zip)} meaning we would still use the selectivities found in the optimizer statistics (pgsel), but modify them in some way. In the case above, the selectivity only comes from the zip code. You could also do things like: function mysel(x, y) { pgsel(x) * pgsel(y) * 0.001} Such functions have a higher probability of working for all queries involving that column. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
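A slightly more concrete sketch of that: the function itself could be plain SQL, though the way of attaching it to the column pair below is invented syntax:

-- "the state adds no information once we know the zip code";
-- $1 is the zip code's selectivity, $2 the state's:
CREATE FUNCTION mysel(zip_sel float8, state_sel float8)
RETURNS float8 AS $$ SELECT $1 $$ LANGUAGE sql IMMUTABLE;

-- invented syntax for wiring it up, so the planner would call it with the
-- per-column selectivities it already computed for the runtime constants:
-- ALTER TABLE addresses SET SELECTIVITY FOR (zipcode, state) USING mysel;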
On 2/23/11 7:10 AM, Robert Haas wrote: > IME, most bad query plans are caused by either incorrect > estimates of selectivity, or wrongheaded notions about what's likely > to be cached. If we could find a way, automated or manual, of > providing the planner some better information about the facts of life > in those areas, I think we'd be way better off. I'm open to ideas > about what the best way to do that is. As previously discussed, I'm fine with approaches which involve modifying database objects. These are auditable and centrally manageable and aren't devastating to upgrades. So things like the proposed "CREATE SELECTIVITY" would be OK in a way that decorating queries would not. Similarly, I would love to be able to set "cache %" on a per-relation basis, and override the whole dubious calculation involving random_page_cost for scans of that table. The great thing about object decorations is that we could then collect data on which ones worked and which didn't through the performance list and then use those to improve the query planner. I doubt such would work with query decorations. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
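For example - hypothetical syntax, and the relation option name is invented:

-- a named, auditable selectivity override that lives in the catalog:
CREATE SELECTIVITY zip_implies_state ON addresses (zipcode, state) AS 0.0001;

-- and a per-relation caching assumption, so scans of this table bypass the
-- generic random_page_cost arithmetic:
ALTER TABLE addresses SET (assumed_cache_fraction = 0.9);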
Josh Berkus wrote: > On 2/23/11 7:10 AM, Robert Haas wrote: > > IME, most bad query plans are caused by either incorrect > > estimates of selectivity, or wrongheaded notions about what's likely > > to be cached. If we could find a way, automated or manual, of > > providing the planner some better information about the facts of life > > in those areas, I think we'd be way better off. I'm open to ideas > > about what the best way to do that is. > > As previously discussed, I'm fine with approaches which involve > modifying database objects. These are auditable and centrally manageable > and aren't devastating to upgrades. > > So things like the proposed "CREATE SELECTIVITY" would be OK in a way > that decorating queries would not. > > Similarly, I would love to be able to set "cache %" on a per-relation > basis, and override the whole dubious calculation involving > random_page_cost for scans of that table. We should just find a way of checking what percentage of a table is already in the shared buffers. That doesn't help us with the kernel cache, but it would be a good start and something that doesn't require user tuning. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
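Something along these lines is already possible for the shared_buffers part with the pg_buffercache contrib module (it only sees shared_buffers, not the kernel cache; the percentage column is just an addition to the example query in its docs):

-- contrib/pg_buffercache (CREATE EXTENSION on 9.1+, the contrib SQL script before that)
CREATE EXTENSION pg_buffercache;

SELECT c.relname,
       count(*) AS buffers,
       round(100.0 * count(*) * current_setting('block_size')::int
             / nullif(pg_relation_size(c.oid), 0), 1) AS pct_in_shared_buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
 GROUP BY c.relname, c.oid
 ORDER BY buffers DESC
 LIMIT 10;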
On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian <bruce@momjian.us> wrote: > Josh Berkus wrote: >> On 2/23/11 7:10 AM, Robert Haas wrote: >> > IME, most bad query plans are caused by either incorrect >> > estimates of selectivity, or wrongheaded notions about what's likely >> > to be cached. If we could find a way, automated or manual, of >> > providing the planner some better information about the facts of life >> > in those areas, I think we'd be way better off. I'm open to ideas >> > about what the best way to do that is. >> >> As previously discussed, I'm fine with approaches which involve >> modifying database objects. These are auditable and centrally manageable >> and aren't devastating to upgrades. >> >> So things like the proposed "CREATE SELECTIVITY" would be OK in a way >> that decorating queries would not. >> >> Similarly, I would love to be able to set "cache %" on a per-relation >> basis, and override the whole dubious calculation involving >> random_page_cost for scans of that table. > > We should just find a way of checking what percentage of a table is > already in the shared buffers. That doesn't help us with the kernel > cache, but it would be a good start and something that doesn't require > user tuning. You're reinventing a wheel that's already been discarded multiple times. There are at least four separate problems: 1. The percentage of the table which is cached in shared_buffers at plan time need not match the percentage that is cached at execution time. A delay of even a few seconds between planning and execution could make the numbers totally different, and plans can be cached for much longer than that. 2. Because shared_buffers can turn over quite quickly, planning the statement multiple times in relatively quick succession could give different results each time. Previous discussions on this topic have concluded that DBAs hate plan instability, and hate GEQO because it causes plan instability, and this would inject plan instability into the main planner. 3. The percentage of the table which is cached in shared_buffers is not necessarily representative of the percentage which is cached in general. On a large machine, shared_buffers may be less than 10% of the total cache. It would be unwise to make guesses about what is and is not cached based on a small percentage of the cache. 4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cached? Hard to say, right? It could be none of it or all of it. The second scenario is easy to imagine - just suppose the query's been executed twice. The first scenario isn't hard to imagine either. One idea Tom and I kicked around previously is to set an assumed caching percentage for each table based on its size relative to effective_cache_size - in other words, assume that the smaller a table is, the more of it will be cached. Consider a system with 8GB of RAM, and a table which is 64kB. It is probably unwise to make any plan based on the assumption that that table is less than fully cached. If it isn't before the query executes, it soon will be. Going to any amount of work elsewhere in the plan to avoid the work of reading that table in from disk is probably a dumb idea. 
Of course, one downside of this approach is that it doesn't know which tables are hot and which tables are cold, but it would probably still be an improvement over the status quo. All that having been said, I think that while Josh is thinking fuzzily about the mathematics of his proposal, the basic idea is pretty sensible. It is not easy - likely not possible - for the system to have a good idea which things will be in some kind of cache at the time the query is executed; it could even change mid-query. The execution of one part of the query could evict from the cache data which some other part of the plan assumed would be cached. But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more. PostgreSQL is extremely easy to administer compared with some of its competitors, and it's frequently necessary to change very little. But there's a difference between what you absolutely have to change to make it work and what you have the option to change when necessary. We need to decrease the amount of stuff in the first category (as we recently did with wal_buffers) and increase the amount of stuff in the second category. People coming from Oracle are not favorably impressed either by the amount of monitoring data PostgreSQL can gather or by the number of knobs that are available to fix problems when they occur. We don't need to have as many knobs as Oracle and we probably don't want to, and for that matter we probably couldn't if we did want to for lack of manpower, but that doesn't mean we should have none. Maybe sometime during my life someone will invent a self-driving car where I can just get in and say "take me to Bruce's house" and an hour later it'll parallel park at the end of his driveway. That will be great. But I think that the first generation of self-driving cars will still have a steering wheel, and a brake pedal, and a little switch that turns self-driving mode OFF. It is one thing to say that you have a system which is really good and does not need much manual adjustment, and we have such a system. It is another thing altogether to systematically remove, or refuse to add, any controls that might permit adjustment in cases where it is necessary. That can be the right thing to do if the system is of such high quality that such manual adjustment is so unlikely to be necessary as not to be worth worrying about; but we are not at that point. And frankly I think if we don't add some knobs to let this stuff be tuned manually, we will never get the experience we need to write good auto-tuning algorithms. Greg Smith would not have known what algorithm to propose for tuning the wal_buffers option if he had not had a bunch of experience setting it by hand. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
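As a back-of-the-envelope illustration of that heuristic, written as SQL purely for illustration - the real thing would live in the costing code, and pg_size_bytes() is just a convenient way to parse the GUC here (an assumption about helper availability, not part of the idea):

SELECT c.relname,
       least(1.0::float8,
             pg_size_bytes(current_setting('effective_cache_size'))::float8
             / greatest(pg_relation_size(c.oid), 1)) AS assumed_cached_fraction
  FROM pg_class c
 WHERE c.relkind = 'r';
-- a 64kB table on a machine with effective_cache_size = 8GB comes out as 1.0
-- (assume fully cached), while a 500GB table comes out as roughly 0.016.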
2011/2/25 Robert Haas <robertmhaas@gmail.com>: > On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian <bruce@momjian.us> wrote: >> Josh Berkus wrote: >>> On 2/23/11 7:10 AM, Robert Haas wrote: >>> > IME, most bad query plans are caused by either incorrect >>> > estimates of selectivity, or wrongheaded notions about what's likely >>> > to be cached. If we could find a way, automated or manual, of >>> > providing the planner some better information about the facts of life >>> > in those areas, I think we'd be way better off. I'm open to ideas >>> > about what the best way to do that is. >>> >>> As previously discussed, I'm fine with approaches which involve >>> modifying database objects. These are auditable and centrally managable >>> and aren't devastating to upgrades. >>> >>> So thinks like the proposed "CREATE SELECTIVITY" would be OK in a way >>> that decorating queries would not. >>> >>> Similiarly, I would love to be able to set "cache %" on a per-relation >>> basis, and override the whole dubious calculation involving >>> random_page_cost for scans of that table. >> >> We should just fine a way of checking what percentage of a table is >> already in the shared buffers. That doesn't help us with the kernel >> cache, but it would be a good start and something that doesn't require >> user tuning. > > You're reinventing a wheel that's already been discarded multiple > times. There are at least four separate problems: > > 1. The percentage of the table which is cached in shared_buffers at > plan time need not match the percentage that is cached at execution > time. A delay of even a few seconds between planning and execution > could make the numbers totally different, and plans can be cached for > much longer than that. > > 2. Because shared_buffers can turn over quite quickly, planning the > statement multiple times in relatively quick succession could give > different results each time. Previous discussions on this topic have > concluded that DBAs hate plan instability, and hate GEQO because it > causes plan instability, and this would inject plan instabiilty into > the main planner. > > 3. The percentage of the table which is cached in shared_buffers is > not necessarily representative of the percentage which is cached in > general. On a large machine, shared_buffers may be less than 10% of > the total cache. It would be unwise to make guesses about what is and > is not cached based on a small percentage of the cache. > > 4. Even if we could accurately estimate the percentage of the table > that is cached, what then? For example, suppose that a user issues a > query which retrieves 1% of a table, and we know that 1% of that table > is cached. How much of the data that the user asked for is cache? > Hard to say, right? It could be none of it or all of it. The second > scenario is easy to imagine - just suppose the query's been executed > twice. The first scenario isn't hard to imagine either. > > One idea Tom and I kicked around previously is to set an assumed > caching percentage for each table based on its size relative to > effective_cache_size - in other words, assume that the smaller a table > is, the more of it will be cached. Consider a system with 8GB of RAM, > and a table which is 64kB. It is probably unwise to make any plan > based on the assumption that that table is less than fully cached. If > it isn't before the query executes, it soon will be. Going to any > amount of work elsewhere in the plan to avoid the work of reading that > table in from disk is probably a dumb idea. 
> Of course, one downside > of this approach is that it doesn't know which tables are hot and > which tables are cold, but it would probably still be an improvement > over the status quo. Yes, good idea. > > All that having been said, I think that while Josh is thinking fuzzily > about the mathematics of his proposal, the basic idea is pretty > sensible. It is not easy - likely not possible - for the system to > have a good idea which things will be in some kind of cache at the > time the query is executed; it could even change mid-query. The > execution of one part of the query could evict from the cache data > which some other part of the plan assumed would be cached. But DBAs > frequently have a very good idea of which stuff is in cache - they can > make observations over a period of time and then adjust settings and > then observe some more and adjust some more. I believe we can maintain a small map of areas of a relation which are in the OS buffer cache (shared buffers move more), or at least a percentage of the relation in OS cache. Getting autovacuum daemon being able to update those maps/counters might be enough and easy to do, it is really near what auto-analyze does. My observation is that the percentage in cache is stable on a production workload after some tens of minutes needed to warm the server. What should really help here is to have hooks in the cost functions to test those ideas without the need to patch postgresql-core a lot. Will it be ok to have hooks or will it add too much CPU consumption in a sensible part of the code ? > > PostgreSQL is extremely easy to administer compared with some of its > competitors, and it's frequently necessary to change very little. But > there's a difference between what you absolutely have to change to > make it work and what you have the option to change when necessary. > We need to decrease the amount of stuff in the first category (as we > recently did with wal_buffers) and increase the amount of stuff in the > second category. People coming from Oracle are not favorably > impressed either by the amount of monitoring data PostgreSQL can > gather or by the number of knobs that are available to fix problems > when they occur. We don't need to have as many knobs as Oracle and we > probably don't want to, and for that matter we probably couldn't if we > did want to for lack of manpower, but that doesn't mean we should have > none. > > Maybe sometime during my life someone will invent a self-driving car > where I can just get in and say "take me to Bruce's house" and an hour > later it'll parallel park at the end of his driveway. That will be > great. But I think that the first generation of self-driving cars > will still have a steering wheel, and a brake pedal, and a little > switch that turns self-driving mode OFF. It is one thing to say that > you have a system which is really good and does not need much manual > adjustment, and we have such a system. It is another thing altogether > to systematically remove, or refuse to add, any controls that might > permit adjustment in cases where it is necessary. That can be the > right thing to do if the system is of such high quality that such > manual adjustment is so unlikely to be necessary as not to be worth > worrying about; but we are not at that point. And frankly I think if > we don't add some knobs to let this stuff be tuned manually, we will > never get the experience we need to write good auto-tuning algorithms. 
> Greg Smith would not have known what algorithm to propose for tuning > the wal_buffers option if he had not had a bunch of experience setting > it by hand. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
> 4. Even if we could accurately estimate the percentage of the table
> that is cached, what then? For example, suppose that a user issues a
> query which retrieves 1% of a table, and we know that 1% of that table
> is cached. How much of the data that the user asked for is cached?
> Hard to say, right? It could be none of it or all of it. The second
> scenario is easy to imagine - just suppose the query's been executed
> twice. The first scenario isn't hard to imagine either.
I have a set of slow disks which can impact performance nearly as much as cached in memory versus the fast disks.
How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and generate some kind of a map for expected response times for the pieces of data it is analysing?
It may well discover, on its own, that recent data (1 month old or less) has a random read response time of N, older data (1 year old) in a different section of the relation tends to have a response time of 1000N, and really old data (5 years old) tends to have a response time of 3000N.
2011/2/25 Cédric Villemain <cedric.villemain.debian@gmail.com>: >> All that having been said, I think that while Josh is thinking fuzzily >> about the mathematics of his proposal, the basic idea is pretty >> sensible. It is not easy - likely not possible - for the system to >> have a good idea which things will be in some kind of cache at the >> time the query is executed; it could even change mid-query. The >> execution of one part of the query could evict from the cache data >> which some other part of the plan assumed would be cached. But DBAs >> frequently have a very good idea of which stuff is in cache - they can >> make observations over a period of time and then adjust settings and >> then observe some more and adjust some more. > > I believe we can maintain a small map of areas of a relation which are > in the OS buffer cache (shared buffers move more), or at least a > percentage of the relation in OS cache. Getting autovacuum daemon > being able to update those maps/counters might be enough and easy to > do, it is really near what auto-analyze does. My observation is that > the percentage in cache is stable on a production workload after some > tens of minutes needed to warm the server. I don't think we can assume that will be true in all workloads. Imagine a server doing batch processing. People submit large batches of work that take, say, an hour to complete. Not all batches use the same set of tables - maybe they even run in different databases. After a big batch process finishes crunching numbers in database A, very little of database B will be cached. But it's not necessarily right to assume that when we start queries for a new batch in database B, although it's more likely to be right for large tables (which will take a long time to get cached meaningfully, if they ever do) than small ones. Also, it could lead to strange issues where batches run much faster or slower depending on which batch immediately preceded them. If we're going to do something a lot of times, it'd be better to bite the bullet and read it all in rather than going to more work elsewhere, but if we're only going to touch it once, then not so much. You might also have this issue on systems that run OLTP workloads all day and then do some batch processing at night to get ready for the next business day. Kevin Grittner wrote previously about those jobs needing some different settings in his environment (I'm not remembering which settings at the moment). Suppose that the batch process is going to issue a query that can be planned in one of two possible ways. One way involves reading 10% of a relation, and the other way involves reading the whole thing. The first plan takes 200 s to execute if the relation is not cached, and 180 s if the relevant portion is cached. The second plan takes 300 s to execute if the relation is not cached, and 100 s if it is cached. At the start of the batch run, the relation won't be cached, because it's used *only* by the overnight job and not by the daily OLTP traffic. Which way should we execute the query? The answer is that if the batch job only needs to execute that query *once*, we should do it the first way. But if it needs to execute it three or more times, the second way is better, but only if we use the second plan every time. If we start out with the first plan, we're always better off sticking with it *unless* we know that we're going to repeat the query at least twice more after the iteration we're currently planning. 
To make the right decision, the query planner needs a crystal ball. Or, a little help from the DBA. > What should really help here is to have hooks in the cost functions to > test those ideas without the need to patch postgresql-core a lot. Will > it be ok to have hooks or will it add too much CPU consumption in a > sensible part of the code ? Depends on where you put them, I guess. Hooks are pretty cheap, but they're also pretty hard to use. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
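To spell out the arithmetic for the numbers above (first execution cold, later ones warm):

SELECT n AS executions,
       200 + 180 * (n - 1) AS ten_percent_plan_total_s,
       300 + 100 * (n - 1) AS full_scan_plan_total_s
  FROM generate_series(1, 4) AS n;
-- 1 run:  200 vs 300      2 runs: 380 vs 400
-- 3 runs: 560 vs 500      4 runs: 740 vs 600
-- i.e. the full-scan plan only pays off from the third execution onwards.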
Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > How practical would it be for analyze to keep a record of response times for > given sections of a table as it randomly accesses them and generate some > kind of a map for expected response times for the pieces of data it is > analysing? I think what you want is random_page_cost that can be tailored per tablespace. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
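That exists as of 9.0, e.g. (tablespace names made up):

-- put the relations living on the slow spindles into their own tablespace
-- and tell the planner that its pages are more expensive to fetch:
ALTER TABLESPACE slow_disks SET (seq_page_cost = 2.0, random_page_cost = 10.0);
-- and cheaper-than-default random I/O for the fast-disk/SSD tablespace:
ALTER TABLESPACE fast_disks SET (random_page_cost = 2.0);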
2011/2/25 Robert Haas <robertmhaas@gmail.com>: > 2011/2/25 Cédric Villemain <cedric.villemain.debian@gmail.com>: >>> All that having been said, I think that while Josh is thinking fuzzily >>> about the mathematics of his proposal, the basic idea is pretty >>> sensible. It is not easy - likely not possible - for the system to >>> have a good idea which things will be in some kind of cache at the >>> time the query is executed; it could even change mid-query. The >>> execution of one part of the query could evict from the cache data >>> which some other part of the plan assumed would be cached. But DBAs >>> frequently have a very good idea of which stuff is in cache - they can >>> make observations over a period of time and then adjust settings and >>> then observe some more and adjust some more. >> >> I believe we can maintain a small map of area of a relation which are >> in the OS buffer cache (shared buffers move more), or at least a >> percentage of the relation in OS cache. Getting autovacuum daemon >> being able to update those maps/counters might be enought and easy to >> do, it is really near what auto-analyze do. My observation is that >> the percentage in cache is stable on a production workload after some >> tens of minutes needed to warm the server. > > I don't think we can assume that will be true in all workloads. > Imagine a server doing batch processing. People submit large batches > of work that take, say, an hour to complete. Not all batches use the > same set of tables - maybe they even run in different databases. > After a big batch process finishes crunching numbers in database A, > very little of database B will be cached. But it's not necessarily > right to assume that when we start queries for a new batch in database > B, although it's more likely to be right for large tables (which will > take a long time to get cached meaningfully, if they ever do) than > small ones. Also, it could lead to strange issues where batches run > much faster or slower depending on which batch immediately proceeded > them. If we're going to do something a lot of times, it'd be better > to bite the bullet and read it all in rather than going to more work > elsewhere, but if we're only going to touch it once, then not so much. > > You might also have this issue on systems that run OLTP workloads all > day and then do some batch processing at night to get ready for the > next business day. Kevin Grittner wrote previously about those jobs > needing some different settings in his environment (I'm not > remembering which settings at the moment). Suppose that the batch > process is going to issue a query that can be planned in one of two > possible ways. One way involves reading 10% of a relation, and the > other way involves reading the whole thing. The first plan takes 200 > s to execute if the relation is not cached, and 180 s if the relevant > portion is cached. The second plan takes 300 s to execute if the > relation is not cached, and 100 s if it is cached. At the start of > the batch run, the relation won't be cached, because it's used *only* > by the overnight job and not by the daily OLTP traffic. Which way > should we execute the query? > > The answer is that if the batch job only needs to execute that query > *once*, we should do it the first way. But if it needs to execute it > three or more times, the second way is better, but only if we use the > second plan every time. 
> If we start out with the first plan, we're > always better off sticking with it *unless* we know that we're going > to repeat the query at least twice more after the iteration we're > currently planning. To make the right decision, the query planner > needs a crystal ball. Or, a little help from the DBA. Yes, we are talking about improving some part of the model. Some workloads are dramatic and need special customization. This is true. Still there is a path of improvement, and probably it will remain a path of improvement after the current model is updated. I am not proposing something that solves all the issues, but it is way more interesting IMHO than just letting the DBA say: 'this table is XX% in cache'. Btw, pgfincore already solves the use case you describe by helping the DBA prepare the batch processing, so in some sense I am familiar with what you describe (take the second plan, let pgfincore preload in the background, and your query will be done in 100 s from the first run). > >> What should really help here is to have hooks in the cost functions to >> test those ideas without the need to patch postgresql-core a lot. Will >> it be ok to have hooks or will it add too much CPU consumption in a >> sensitive part of the code? > > Depends on where you put them, I guess. Hooks are pretty cheap, but > they're also pretty hard to use. Yes, it will be easier to make an extension, have people test it, and validate (or not) the 'new' model. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
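For readers unfamiliar with it, the pgfincore workflow Cédric is alluding to looks roughly like this (the functions come from the pgfincore extension, the table name is made up, and exact signatures should be checked against its documentation):

    -- warm the OS cache before the nightly batch starts, so the 'cached' plan is safe to pick:
    SELECT * FROM pgfadvise_willneed('big_batch_table');
    -- inspect how much of the relation the OS currently has in cache:
    SELECT * FROM pgfincore('big_batch_table');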
> 4. Even if we could accurately estimate the percentage of the table > that is cached, what then? For example, suppose that a user issues a > query which retrieves 1% of a table, and we know that 1% of that table > is cached. How much of the data that the user asked for is cache? FWIW, for a manual override setting, I was thinking that the % would convert to a probability. In that way, it wouldn't be different from the existing RPC calculation; we're just estimating how *likely* it is that the data the user wants is cached. > One idea Tom and I kicked around previously is to set an assumed > caching percentage for each table based on its size relative to > effective_cache_size - in other words, assume that the smaller a table > is, the more of it will be cached. Consider a system with 8GB of RAM, > and a table which is 64kB. It is probably unwise to make any plan > based on the assumption that that table is less than fully cached. If > it isn't before the query executes, it soon will be. Going to any > amount of work elsewhere in the plan to avoid the work of reading that > table in from disk is probably a dumb idea. Of course, one downside > of this approach is that it doesn't know which tables are hot and > which tables are cold, but it would probably still be an improvement > over the status quo. Actually, we *do* have some idea which tables are hot. Or at least, we could. Currently, pg_stats for tables are "timeless"; they just accumulate from the last reset, which has always been a problem in general for monitoring. If we could make top-level table and index stats time-based, even in some crude way, we would know which tables were currently hot. That would also have the benefit of making server performance analysis and autotuning easier. > But DBAs > frequently have a very good idea of which stuff is in cache - they can > make observations over a period of time and then adjust settings and > then observe some more and adjust some more. Agreed. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
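One way to read the manual-override idea above, spelled out as a formula (both page-cost parameters here are hypothetical knobs, not existing GUCs): treat the percentage as a probability and blend the cached and uncached page costs,

    estimated_page_cost = p_cached * cached_page_cost + (1 - p_cached) * uncached_page_cost

which is what makes it fit alongside the existing random_page_cost arithmetic rather than requiring a new kind of estimate.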
On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > >> How practical would it be for analyze to keep a record of response times for >> given sections of a table as it randomly accesses them and generate some >> kind of a map for expected response times for the pieces of data it is >> analysing? > > I think what you want is random_page_cost that can be tailored per > tablespace. We have that. But it's not the same as tracking *sections of a table*. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 25, 2011 at 6:41 PM, Josh Berkus <josh@agliodbs.com> wrote: >> One idea Tom and I kicked around previously is to set an assumed >> caching percentage for each table based on its size relative to >> effective_cache_size - in other words, assume that the smaller a table >> is, the more of it will be cached. Consider a system with 8GB of RAM, >> and a table which is 64kB. It is probably unwise to make any plan >> based on the assumption that that table is less than fully cached. If >> it isn't before the query executes, it soon will be. Going to any >> amount of work elsewhere in the plan to avoid the work of reading that >> table in from disk is probably a dumb idea. Of course, one downside >> of this approach is that it doesn't know which tables are hot and >> which tables are cold, but it would probably still be an improvement >> over the status quo. > > Actually, we *do* have some idea which tables are hot. Or at least, we > could. Currently, pg_stats for tables are "timeless"; they just > accumulate from the last reset, which has always been a problem in > general for monitoring. If we could make top-level table and index > stats time-based, even in some crude way, we would know which tables > were currently hot. That would also have the benefit of making server > performance analysis and autotuning easier. I think there would be value in giving the DBA an easier way to see which tables are hot, but I am really leery about the idea of trying to feed that directly into the query planner. I think this is one of those cases where we let people tune it manually for starters, and then wait for feedback. Eventually someone will say "oh, I never tune that by hand any more, ever since I wrote this script which does the following computation... and I just run it out of cron". And then we will get out the party hats. But we will never get the experience we need to say what that auto-tuning algorithm will be unless we first provide the knob for someone to fiddle with manually. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > > Actually, we *do* have some idea which tables are hot. Or at least, we > > could. Currently, pg_stats for tables are "timeless"; they just > > accumulate from the last reset, which has always been a problem in > > general for monitoring. If we could make top-level table and index > > stats time-based, even in some crude way, we would know which tables > > were currently hot. That would also have the benefit of making server > > performance analysis and autotuning easier. > > I think there would be value in giving the DBA an easier way to see > which tables are hot, but I am really leery about the idea of trying > to feed that directly into the query planner. I think this is one of > those cases where we let people tune it manually for starters, and > then wait for feedback. Eventually someone will say "oh, I never tune > that by hand any more, ever since I wrote this script which does the > following computation... and I just run it out cron". And then we > will get out the party hats. But we will never get the experience we > need to say what that auto-tuning algorithm will be unless we first > provide the knob for someone to fiddle with manually. It is also possible we will implement a manual way and never get around to automating it. :-( -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 25 Feb 2011, at 13:18, Robert Haas wrote: > People coming from Oracle are not favorably > impressed either by the amount of monitoring data PostgreSQL can > gather or by the number of knobs that are available to fix problems > when they occur. We don't need to have as many knobs as Oracle and we > probably don't want to, and for that matter we probably couldn't if we > did want to for lack of manpower, but that doesn't mean we should have > none. Still, having more data a user can probe would be nice. I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction. And tests we did a while ago, involving asking a team from Microsoft and a team from Oracle to optimise a set of queries for the same set of data (bookies data, loads of it), showed that the auto-tuning Microsoft has in their SQL Server performed much better than a team of over-sweating Oracle DBAs. In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that varies so much that queries need to be rather generic. PostgreSQL shows its strength with the planner doing a good job for different variants of data, however we do very little tweaking to the configuration parameters, just because it is too hard to keep track of all of them. I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different types of data and 50 different sets of hardware. If there was even a tiny amount of automation provided in PostgreSQL, I would welcome it with open arms.
On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > > How practical would it be for analyze to keep a record of response times for > > given sections of a table as it randomly accesses them and generate some > > kind of a map for expected response times for the pieces of data it is > > analysing? > > I think what you want is random_page_cost that can be tailored per > tablespace. Yes, that can certainly help but does nothing to help with finding typical hot-spots or cached sections of the table and sending that information to the planner. Between ANALYZE's random sampling and perhaps some metric gathered during the actual IO of random queries, we should be able to determine and record which pieces of data tend to be hot/in cache or readily available, and what data tends not to be. If the planner knew that the value "1" tends to have a much lower cost to fetch than any other value in the table (it is cached or otherwise readily available), it can choose a plan better suited toward that.
On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz <gj@pointblue.com.pl> wrote: > > On 25 Feb 2011, at 13:18, Robert Haas wrote: > >> People coming from Oracle are not favorably >> impressed either by the amount of monitoring data PostgreSQL can >> gather or by the number of knobs that are available to fix problems >> when they occur. We don't need to have as many knobs as Oracle and we >> probably don't want to, and for that matter we probably couldn't if we >> did want to for lack of manpower, but that doesn't mean we should have >> none. > > Still, having more data a user can probe would be nice. > > I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction. > And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their > sql server performed much better than a team of over-sweating oracle dba's. I don't think *anyone* is avoiding that approach. There is almost universal consensus here that auto-tuning is better than manual tuning, even to the extent of being unwilling to add knobs to allow manual tuning of settings we have no idea how to auto-tune and no plans to auto-tune. > In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic. > Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them. > I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware. > If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms. What do you have in mind? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> > Actually, we *do* have some idea which tables are hot. Or at least, we >> > could. Currently, pg_stats for tables are "timeless"; they just >> > accumulate from the last reset, which has always been a problem in >> > general for monitoring. If we could make top-level table and index >> > stats time-based, even in some crude way, we would know which tables >> > were currently hot. That would also have the benefit of making server >> > performance analysis and autotuning easier. >> >> I think there would be value in giving the DBA an easier way to see >> which tables are hot, but I am really leery about the idea of trying >> to feed that directly into the query planner. I think this is one of >> those cases where we let people tune it manually for starters, and >> then wait for feedback. Eventually someone will say "oh, I never tune >> that by hand any more, ever since I wrote this script which does the >> following computation... and I just run it out of cron". And then we >> will get out the party hats. But we will never get the experience we >> need to say what that auto-tuning algorithm will be unless we first >> provide the knob for someone to fiddle with manually. > > It is also possible we will implement a manual way and never get around > to automating it. :-( You make it sound as if we know how but are just too lazy to write the code. That is not one of the weaknesses that this community has. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>> >> >> Still, having more data a user can probe would be nice. >> >> I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction. >> And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their >> sql server performed much better than a team of over-sweating oracle dba's. > > I don't think *anyone* is avoiding that approach. There is almost > universal consensus here that auto-tuning is better than manual > tuning, even to the extent of being unwilling to add knobs to allow > manual tuning of settings we have no idea how to auto-tune and no > plans to auto-tune. > >> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic. >> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them. >> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware. >> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms. > > What do you have in mind? what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on. the planner code then goes through its restrictions in the query and finds the best / longest combination it can find and which has some statistics defined. it seems we can also do this for join selectivity and expressions. the planner code for "raw column correlation" without expressions (cos(id) or so) and joins is there (WIP, no ANALYZE support and so on so far). i think auto tuning is a good thing to have and the door to actually do it is wide open with our approach. all it takes is a mechanism to see which "conditions" are used how often, and somebody could write a job which automatically tells the system which stats to collect / sample. i think for an "average" user this is the simplest approach. but, to get there we have to get the bloody sampling and the rest of the planner code right in the first place. auto tuning in this area is still something which is far in the future - but at least the road to it is clear. some people suggested some approach dealing with effective_cache_size and so on ... there are many good approaches here but they don't address the actual problem of wrong size-estimates. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>: > what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on. > The problem is that we haven't figured out how to usefully store a histogram for <a,b>. Consider the oft-quoted example of a <city,postal-code> -- or <city,zip code> for Americans. A histogram of the tuple is just the same as a histogram on the city. It doesn't tell you how much extra selectivity the postal code or zip code gives you. And if you happen to store a histogram of <postal code, city> by mistake then it doesn't tell you anything at all. We need a data structure that lets us answer the bayesian question "given a city of New York how selective is zip-code = 02139". I don't know what that data structure would be. Heikki and I had a wacky hand-crafted 2D histogram data structure that I suspect doesn't actually work. And someone else did some research on list and came up with a fancy sounding name of a statistics concept that might be what we want. -- greg
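A minimal way to see the misestimate everyone is circling around (table, values, and figures invented for illustration): build a table in which the city functionally determines the zip code, ANALYZE it, and look at the row estimate for the combined predicate. Because the two per-column selectivities are multiplied, the estimate comes out at half the true value here, and the error grows with the number of correlated values:

    CREATE TABLE addr (city text, zip text);
    INSERT INTO addr SELECT 'Cambridge', '02139' FROM generate_series(1, 10000);
    INSERT INTO addr SELECT 'New York',  '10001' FROM generate_series(1, 10000);
    ANALYZE addr;
    EXPLAIN SELECT * FROM addr WHERE city = 'Cambridge' AND zip = '02139';
    -- estimated rows are about 20000 * 0.5 * 0.5 = 5000, while the true answer is 10000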
On Sat, Feb 26, 2011 at 06:44:52PM +0000, Greg Stark wrote: > 2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>: > > what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on. > > The problem is that we haven't figured out how to usefully store a > histogram for <a,b>. Consider the oft-quoted example of a > <city,postal-code> -- or <city,zip code> for Americans. A histogram > of the tuple is just the same as a histogram on the city. But there are cases where it can work. Frankly the example you mention is odd because we can't even build useful 1D histograms for <city> and <zip code>, so the fact that 2D is hard is not surprising. The histograms we do build work fine for > and <, just not for equality. A 2D histogram will behave the same. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
On 26 Feb 2011, at 14:45, Robert Haas wrote: > On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz >> > > I don't think *anyone* is avoiding that approach. There is almost > universal consensus here that auto-tuning is better than manual > tuning, even to the extent of being unwilling to add knobs to allow > manual tuning of settings we have no idea how to auto-tune and no > plans to auto-tune. > Perhaps one step further is required: to change some settings so that they can be auto-tuned better. There are some even more drastic steps that would have to be taken, and I believe that Microsoft's engineers had to take them. Steps back. For instance, if there is an issue with the inability to find out how much of a table is in the cache, perhaps PostgreSQL should have an option to turn off cached reads/writes completely and thus allow the DBA to regulate that using the shared_buffers setting. It doesn't sound great, but if you think about it I'm sure there are people willing to use it, if it adds a bit more auto-tuning to the server. I would even go a step further, and say that I believe some people will embrace it on the basis that they can constrain the amount of memory PostgreSQL uses on their server as a whole, and that includes caches. >> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic. >> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them. >> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware. >> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms. > > What do you have in mind? All I'm trying to say is that whilst you guys focus mostly on single database server installations, PostgreSQL also has a great user base that uses it as part of a product that is deployed on different sized machines, and with the same model but different data variation. We don't sell the product to people and let them take care of it, but rather sell the service, you could say. But we also don't have a DBA per customer who would look solely at the knob-tweaking side of things. So my argument here is that there isn't always a person who knows the tables and databases by their characteristics and is thus able to tweak settings manually. That is probably just one of many examples where it makes sense, and probably their primary property is that there's no DBA overseeing the whole database and thus being able to tune it.
> Grzegorz Jaskiewicz wrote: > I guess that the systems could behave much better, but no one is > going to tweak settings for 50 different installations over 50 > different type of data and 50 different sets of hardware. > If there was even a tiny amount of automation provided in the > postgresql, I would welcome it with open arms. Hmmm... Well, we have about 100 pieces of hardware with about 200 databases, and we *do* tune them individually, but it's not as onerous as it might seem. For our 72 production circuit court servers, for example, we have one standard configuration which has as its last line an include file for overrides. For some counties that override file is empty. For many we override effective_cache_size based on the RAM installed in the machine. Since most of these servers have the database fully cached, the "standard" file uses equal, low settings for seq_page_cost and random_page_cost, but we override that where necessary. We don't generally tune anything else differently among these servers. (Maybe work_mem, I'd have to check.) Which leads me to think that these might be the key items to autotune. It's not actually that hard for me to imagine timing a small percentage of randomly selected page accesses and developing costing factors for the page costs on the fly. It might be a bit trickier to autotune effective_cache_size, but I can think of two or three heuristics which might work. Automatically generating sane values for these three things would eliminate a significant fraction of problems posted to the performance list. -Kevin
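For anyone wanting to copy the scheme, the handful of overrides Kevin describes could also be expressed directly in SQL as database-level settings rather than in the include file, which may be easier to script across many servers (database name and values are invented):

    ALTER DATABASE circuit_court SET effective_cache_size = '6GB';  -- sized to this machine's RAM
    -- where the database does not stay fully cached, back off the standard low, equal page costs:
    ALTER DATABASE circuit_court SET seq_page_cost = 1.0;
    ALTER DATABASE circuit_court SET random_page_cost = 2.0;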
Greg Stark wrote: > Consider the oft-quoted example of a <city,postal-code> -- or > <city,zip code> for Americans. I'm not sure everyone realizes just how complicated this particular issue is. If we can do a good job with U.S. city, state, zip code we will have something which will handle a lot of cases. Consider: (1) Municipality name isn't unique in the U.S. Many states besides Wisconsin have a municipality called Madison (I seem to remember there were over 20 of them). So city without state doesn't necessarily get you anywhere near having a unique zip code or range. (2) A large city has a set of zip codes, all starting with the same first three digits. So identifying the municipality doesn't always identify the zip code, although for small cities it often does. Madison, Wisconsin has thirty-some zip codes, some of which are rather specialized and don't see much use. (3) Small municipalities surrounded by or adjacent to a large city may not get their own zip code. 53704 not only covers a large swath of the northern end of the City of Madison, but is also the zip code for the Village of Maple Bluff and at least parts of the Township of Westport. I guess what I'm saying is that this use case has enough complexity to make an interesting problem to solve. It may even be more challenging than you would want for an initial trial of a technique. -Kevin
Rod Taylor wrote: > On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > > > > > How practical would it be for analyze to keep a record of response times > > for > > > given sections of a table as it randomly accesses them and generate some > > > kind of a map for expected response times for the pieces of data it is > > > analysing? > > > > I think what you want is random_page_cost that can be tailored per > > tablespace. > > > > > Yes, that can certainly help but does nothing to help with finding typical > hot-spots or cached sections of the table and sending that information to > the planner. > > Between Analyze random sampling and perhaps some metric during actual IO of > random of queries we should be able to determine and record which pieces of > data tend to be hot/in cache, or readily available and what data tends not > to be. > > > If the planner knew that the value "1" tends to have a much lower cost to > fetch than any other value in the table (it is cached or otherwise readily > available), it can choose a plan better suited toward that. Well, one idea I have always had is feeding things the executor finds back to the optimizer for use in planning future queries. One argument against that is that a planned query might run with different data behavior than seen by the executor in the past, but we know if the optimizer is planning something for immediate execution or later execution, so we could use executor stats only when planning for immediate execution. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Grzegorz Jaskiewicz wrote: > > On 25 Feb 2011, at 13:18, Robert Haas wrote: > > > People coming from Oracle are not favorably > > impressed either by the amount of monitoring data PostgreSQL can > > gather or by the number of knobs that are available to fix problems > > when they occur. We don't need to have as many knobs as Oracle and we > > probably don't want to, and for that matter we probably couldn't if we > > did want to for lack of manpower, but that doesn't mean we should have > > none. > > Still, having more data a user can probe would be nice. > > I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction. > And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their > sql server performed much better than a team of over-sweating oracle dba's. > > In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic. > Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them. > I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware. > If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms. I totally agree. If we add a tuning parameter that does 10x better than automatic, but only 1% of our users use it, we would be better off, overall, with the automatic tuning. See my blog post which talks about the same tradeoff when adding configuration variables: http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009 -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Robert Haas wrote: > On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian <bruce@momjian.us> wrote: > > Robert Haas wrote: > >> > Actually, we *do* have some idea which tables are hot. Or at least, we > >> > could. Currently, pg_stats for tables are "timeless"; they just > >> > accumulate from the last reset, which has always been a problem in > >> > general for monitoring. If we could make top-level table and index > >> > stats time-based, even in some crude way, we would know which tables > >> > were currently hot. That would also have the benefit of making server > >> > performance analysis and autotuning easier. > >> > >> I think there would be value in giving the DBA an easier way to see > >> which tables are hot, but I am really leery about the idea of trying > >> to feed that directly into the query planner. I think this is one of > >> those cases where we let people tune it manually for starters, and > >> then wait for feedback. Eventually someone will say "oh, I never tune > >> that by hand any more, ever since I wrote this script which does the > >> following computation... and I just run it out of cron". And then we > >> will get out the party hats. But we will never get the experience we > >> need to say what that auto-tuning algorithm will be unless we first > >> provide the knob for someone to fiddle with manually. > > > > It is also possible we will implement a manual way and never get around > > to automating it. :-( > > You make it sound as if we know how but are just too lazy to write the > code. That is not one of the weaknesses that this community has. Well, several automatic ideas have been floated, but rejected because they don't work well for queries that are planned and executed later. Perhaps we should consider auto-tuning of queries that are planned for immediate execution. I just posed that idea in an email to this thread. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Sun, Feb 27, 2011 at 3:01 AM, Bruce Momjian <bruce@momjian.us> wrote: > Grzegorz Jaskiewicz wrote: >> >> On 25 Feb 2011, at 13:18, Robert Haas wrote: >> >> > People coming from Oracle are not favorably >> > impressed either by the amount of monitoring data PostgreSQL can >> > gather or by the number of knobs that are available to fix problems >> > when they occur. We don't need to have as many knobs as Oracle and we >> > probably don't want to, and for that matter we probably couldn't if we >> > did want to for lack of manpower, but that doesn't mean we should have >> > none. >> >> Still, having more data a user can probe would be nice. >> >> I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction. >> And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their >> sql server performed much better than a team of over-sweating oracle dba's. >> >> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic. >> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them. >> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware. >> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms. > > I totally agree. If we add a tuning parameter that does 10x better than > automatic, but only 1% of our users use it, we would be better off, > overall, with the automatic tuning. It's not an either/or proposition. There is no reason why we can't let things be tuned automatically, but provide overrides for cases where the automatic tuning does not work well, of which there will always be some. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Feb 27, 2011 at 3:03 AM, Bruce Momjian <bruce@momjian.us> wrote: >> You make it sound as if we know how but are just too lazy to right the >> code. That is not one of the weaknesses that this community has. > > Well, several automatic idea have been floated, but rejected because > they don't work well for queries that are planned and executed later. > Perhaps we should consider auto-tuning of queries that are planned for > immediate execution. I just posed that idea in an email to this thread. Which ideas were rejected for that reason? If we're talking about the idea of using the current contents of the buffer cache and perhaps the OS cache to plan queries, I think that's not likely to work well even if we do restrict it to queries that we're going to execute immediately. Upthread I listed four problems with the idea of planning queries based on the current contents of shared_buffers, and this certainly doesn't address all four. http://archives.postgresql.org/pgsql-hackers/2011-02/msg02206.php To reiterate my basic theme here one more time, we have a very good query planner, but it can fall on its face very badly when it is unable to correctly estimate selectivity, or due to caching effects, and we have very little to recommend to people who run afoul of those problems right now. The problems are real, significant, and affect a large number of users, some of whom give up on PostgreSQL as a direct result. I am glad that we are committed to having a system that is auto-tuning to the greatest degree possible, but I think it is very short-sighted of us not to provide workarounds for the cases where they are legitimately needed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I think there would be value in giving the DBA an easier way to see > which tables are hot, but I am really leery about the idea of trying > to feed that directly into the query planner. I think this is one of > those cases where we let people tune it manually for starters, and > then wait for feedback. Eventually someone will say "oh, I never tune > that by hand any more, ever since I wrote this script which does the > following computation... and I just run it out cron". And then we > will get out the party hats. But we will never get the experience we > need to say what that auto-tuning algorithm will be unless we first > provide the knob for someone to fiddle with manually. I'm not disagreeing with that. I'm saying "first, we give DBAs a way to see which tables are currently hot". Such a feature has multiple benefits, making it worth the overhead and/or coding effort. Whether we're shooting for autotuning or manual tuning, it starts with having the data. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Sun, Feb 27, 2011 at 5:17 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> I think there would be value in giving the DBA an easier way to see >> which tables are hot, but I am really leery about the idea of trying >> to feed that directly into the query planner. I think this is one of >> those cases where we let people tune it manually for starters, and >> then wait for feedback. Eventually someone will say "oh, I never tune >> that by hand any more, ever since I wrote this script which does the >> following computation... and I just run it out of cron". And then we >> will get out the party hats. But we will never get the experience we >> need to say what that auto-tuning algorithm will be unless we first >> provide the knob for someone to fiddle with manually. > > I'm not disagreeing with that. I'm saying "first, we give DBAs a way to > see which tables are currently hot". Such a feature has multiple > benefits, making it worth the overhead and/or coding effort. > > Whether we're shooting for autotuning or manual tuning, it starts with > having the data. Well, what we have now is a bunch of counters in pg_stat_all_tables and pg_statio_all_tables. Making that easier for the DBA almost seems like more of a job for a third-party tool that, say, graphs it, than a job for PG itself. But if you have an idea I'm all ears. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
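As a starting point with what is already there, the existing counters can at least be ranked to get a crude "hot tables" list; a sketch over the stock views (with no time basis, which is exactly the limitation discussed next):

    SELECT s.relname,
           s.seq_scan + COALESCE(s.idx_scan, 0)              AS scans,
           io.heap_blks_hit + COALESCE(io.idx_blks_hit, 0)   AS blocks_hit,
           io.heap_blks_read + COALESCE(io.idx_blks_read, 0) AS blocks_read
    FROM pg_stat_user_tables s
    JOIN pg_statio_user_tables io USING (relid)
    ORDER BY blocks_hit DESC
    LIMIT 20;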
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Josh Berkus)
> Well, what we have now is a bunch of counters in pg_stat_all_tables > and pg_statio_all_tables. Right. What I'm saying is those aren't good enough, and have never been good enough. Counters without a time basis are pretty much useless for performance monitoring/management (Baron Schwartz has a blog post talking about this, but I can't find it right now). Take, for example, a problem I was recently grappling with for Nagios. I'd like to do a check as to whether or not tables are getting autoanalyzed often enough. After all, autovac can fall behind, and we'd want to be alerted of that. The problem is, in order to measure whether or not autoanalyze is behind, you need to count how many inserts, updates, deletes have happened since the last autoanalyze. pg_stat_user_tables just gives us the counters since the last reset ... and the reset time isn't even stored in PostgreSQL. This means that, without adding external tools like pg_statsinfo, we can't autotune autoanalyze at all. There are quite a few other examples where the counters could contribute to autotuning and DBA performance monitoring if only they were time-based. As it is, they're useful for finding unused indexes and that's about it. One possibility, of course, would be to take pg_statsinfo and make it part of core. There are a couple of disadvantages to that: (1) is the storage and extra objects required, which would then require us to add extra management routines as well. (2) is that pg_statsinfo only stores top-level view history, meaning that it wouldn't be very adaptable to improvements we make in system views in the future. On the other hand, anything which increases the size of pg_statistic would be a nightmare. One possible compromise solution might be to implement code for the stats collector to automatically reset the stats at a given clock interval. If we combined this with keeping the reset time, and keeping a snapshot of the stats from the last clock tick (and their reset time), that would be "good enough" for most monitoring. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
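A crude version of the external workaround implied above, until something time-based exists in core: snapshot the counters into an ordinary table from cron and diff the live view against the snapshot (the snapshot table name is invented):

    -- refresh on whatever schedule the check needs:
    DROP TABLE IF EXISTS stat_snapshot;
    CREATE TABLE stat_snapshot AS
      SELECT now() AS taken_at, relid, relname, n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_user_tables;

    -- the Nagios-style check: changes since the snapshot, next to the time of last autoanalyze
    SELECT c.relname,
           (c.n_tup_ins - s.n_tup_ins) + (c.n_tup_upd - s.n_tup_upd)
             + (c.n_tup_del - s.n_tup_del) AS changes_since_snapshot,
           c.last_autoanalyze
    FROM pg_stat_user_tables c
    JOIN stat_snapshot s USING (relid)
    ORDER BY changes_since_snapshot DESC;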
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Martijn van Oosterhout)
On Mon, Feb 28, 2011 at 10:04:54AM -0800, Josh Berkus wrote: > Take, for example, a problem I was recently grappling with for Nagios. > I'd like to do a check as to whether or not tables are getting > autoanalyzed often enough. After all, autovac can fall behind, and we'd > want to be alerted of that. > > The problem is, in order to measure whether or not autoanalyze is > behind, you need to count how many inserts, updates, deletes have happened > since the last autoanalyze. pg_stat_user_tables just gives us the > counters since the last reset ... and the reset time isn't even stored > in PostgreSQL. The solution I use for that is to use munin to monitor everything and let it generate alerts based on the levels. It's not great, but better than nothing. The problem, as you say, is that you want to know the rates rather than the absolute values. The problem with rates is that you can get wildly different results depending on the time interval you're looking at. For the concrete example above, autoanalyse has to be able to determine if there is work to do, so the information must be somewhere. I'm guessing it's not easily available? If you had a function is_autovacuumcandidate you'd be done of course. But there are of course lots of stats people want; it's just not clear how to get them. What you really need is to store the stats every few minutes, but that's what munin does. I doubt it's worth building RRD-like capabilities into postgres. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Robert Haas)
On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh@agliodbs.com> wrote: > On the other hand, anything which increases the size of pg_statistic > would be a nightmare. Hmm? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Josh Berkus)
On 2/28/11 10:24 AM, Robert Haas wrote: > On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh@agliodbs.com> wrote: >> On the other hand, anything which increases the size of pg_statistic >> would be a nightmare. > > Hmm? Like replacing each statistic with a series of time-based buckets, which would then increase the size of the table by 5X to 10X. That was the first solution I thought of, and rejected. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Excerpts from Robert Haas's message of sáb feb 26 02:24:26 -0300 2011: > On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: > > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > > > >> How practical would it be for analyze to keep a record of response times for > >> given sections of a table as it randomly accesses them and generate some > >> kind of a map for expected response times for the pieces of data it is > >> analysing? > > > > I think what you want is random_page_cost that can be tailored per > > tablespace. > > We have that. Oh, right. > But it's not the same as tracking *sections of a table*. I dunno. I imagine if you have a "section" of a table in different storage than other sections, you created a tablespace and moved the partition holding that section there. Otherwise, how do you prevent the tuples from moving to other "sections"? (We don't really have a concept of "sections" of a table.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
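For what it's worth, the mechanics Alvaro is describing are just the ordinary tablespace commands (object names invented):

    CREATE TABLESPACE fast_disk LOCATION '/mnt/ssd/pgdata';
    ALTER TABLE measurements_2011_02 SET TABLESPACE fast_disk;

which is why, short of partitioning, there is no handle for a "section" of a heap.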
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Tom Lane)
Josh Berkus <josh@agliodbs.com> writes: > On 2/28/11 10:24 AM, Robert Haas wrote: >> On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> On the other hand, anything which increases the size of pg_statistic >>> would be a nightmare. >> Hmm? > Like replacing each statistic with a series of time-based buckets, which > would then increase the size of the table by 5X to 10X. That was the > first solution I thought of, and rejected. I think Josh is thinking of the stats collector's dump file, not pg_statistic. Ultimately we need to think of a reporting mechanism that's a bit smarter than "rewrite the whole file for any update" ... regards, tom lane
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Robert Haas)
On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >> On 2/28/11 10:24 AM, Robert Haas wrote: >>> On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh@agliodbs.com> wrote: >>>> On the other hand, anything which increases the size of pg_statistic >>>> would be a nightmare. > >>> Hmm? > >> Like replacing each statistic with a series of time-based buckets, which >> would then increase the size of the table by 5X to 10X. That was the >> first solution I thought of, and rejected. > > I think Josh is thinking of the stats collector's dump file, not > pg_statistic. Yeah. > Ultimately we need to think of a reporting mechanism that's a bit > smarter than "rewrite the whole file for any update" ... Well, we have these things called "tables". Any chance of using those? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Tom Lane)
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Ultimately we need to think of a reporting mechanism that's a bit >> smarter than "rewrite the whole file for any update" ... > Well, we have these things called "tables". Any chance of using those? Having the stats collector write tables would violate the classical form of the heisenberg principle (thou shalt avoid having thy measurement tools affect that which is measured), not to mention assorted practical problems like not wanting the stats collector to take locks or run transactions. The ideal solution would likely be for the stats collector to expose its data structures as shared memory, but I don't think we get to do that under SysV shmem --- it doesn't like variable-size shmem much. Maybe that's another argument for looking harder into mmap or POSIX shmem, although it's not clear to me how well either of those fixes that. regards, tom lane
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Michael Glaesemann)
On Feb 28, 2011, at 14:31, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Ultimately we need to think of a reporting mechanism that's a bit >>> smarter than "rewrite the whole file for any update" ... > >> Well, we have these things called "tables". Any chance of using those? > > Having the stats collector write tables would violate the classical form > of the heisenberg principle (thou shalt avoid having thy measurement > tools affect that which is measured), not to mention assorted practical > problems like not wanting the stats collector to take locks or run > transactions. > > The ideal solution would likely be for the stats collector to expose its > data structures as shared memory, but I don't think we get to do that > under SysV shmem --- it doesn't like variable-size shmem much. Maybe > that's another argument for looking harder into mmap or POSIX shmem, > although it's not clear to me how well either of those fixes that. Spitballing here, but could sqlite be an intermediate, compromise solution? Michael Glaesemann grzm seespotcode net
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Josh Berkus)
> Spitballing here, but could sqlite be an intermediate, compromise solution? For a core PostgreSQL component ?!?!? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: "Joshua D. Drake")
On Mon, 2011-02-28 at 11:39 -0800, Josh Berkus wrote: > > Spitballing here, but could sqlite be an intermediate, compromise solution? > > For a core PostgreSQL component ?!?!? Sure, why not? It is ACID compliant, has the right kind of license, and has a standard API that we are all used to. It seems like a pretty decent solution worth considering. We don't need MVCC for this problem. JD > > -- > -- Josh Berkus > PostgreSQL Experts Inc. > http://www.pgexperts.com > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Euler Taveira de Oliveira)
Em 28-02-2011 15:50, Tom Lane escreveu: > Ultimately we need to think of a reporting mechanism that's a bit > smarter than "rewrite the whole file for any update" ... > What about splitting the statistics file per database? -- Euler Taveira de Oliveira http://www.timbira.com/
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Tom Lane)
"Joshua D. Drake" <jd@commandprompt.com> writes: > On Mon, 2011-02-28 at 11:39 -0800, Josh Berkus wrote: > Spitballing here, but could sqlite be an intermediate, compromise solution? >> >> For a core PostgreSQL component ?!?!? > Sure, why not? Because it's fifty times more mechanism than we need here? We don't want a SQL interface (not even a lightweight one) and it's unclear that we ever want the data to go to disk at all. regards, tom lane
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Tom Lane)
Euler Taveira de Oliveira <euler@timbira.com> writes: > Em 28-02-2011 15:50, Tom Lane escreveu: >> Ultimately we need to think of a reporting mechanism that's a bit >> smarter than "rewrite the whole file for any update" ... > What about splitting statistic file per database? That would improve matters for some usage patterns, but I'm afraid only a minority. regards, tom lane
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Robert Haas)
On Mon, Feb 28, 2011 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Ultimately we need to think of a reporting mechanism that's a bit >>> smarter than "rewrite the whole file for any update" ... > >> Well, we have these things called "tables". Any chance of using those? > > Having the stats collector write tables would violate the classical form > of the heisenberg principle (thou shalt avoid having thy measurement > tools affect that which is measured), not to mention assorted practical > problems like not wanting the stats collector to take locks or run > transactions. > > The ideal solution would likely be for the stats collector to expose its > data structures as shared memory, but I don't think we get to do that > under SysV shmem --- it doesn't like variable-size shmem much. Maybe > that's another argument for looking harder into mmap or POSIX shmem, > although it's not clear to me how well either of those fixes that. Well, certainly, you could make it work with mmap() - you could arrange a mechanism whereby anyone who tries to reference off the end of the portion they've mapped calls stat() on the file and remaps it at its now-increased size. But you'd need to think carefully about locking and free-space management, which is where it starts to sound an awful lot like you're reinventing the idea of a heap. Maybe there's a way to design some kind of lighter weight mechanism, but the complexity of the problem is not obviously a lot less than the general problem of storing frequently updated tabular data. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ... (From: Greg Stark)
On Mon, Feb 28, 2011 at 6:31 PM, Josh Berkus <josh@agliodbs.com> wrote: > Like replacing each statistic with a series of time-based buckets, which > would then increase the size of the table by 5X to 10X. That was the > first solution I thought of, and rejected. > I don't understand what you're talking about at all here. I think there are a lot of unsolved problems in monitoring but the one thing I think everyone is pretty clear on is that the right way to export metrics like these is to export a counter and then have some external component periodically copy the counter into some history table and calculate the derivative, second derivative, running average of the first derivative, etc. What's needed here is for someone to write a good mrtg/rrd/whatever replacement using postgres as its data store. If you're monitoring something sensitive then you would store the data in a *different* postgres server to avoid Tom's complaint. There may be aspects of the job that Postgres does poorly but we can focus on improving those parts of Postgres rather than looking for another database. And frankly Postgres isn't that bad a tool for it -- when I did some performance analysis recently I actually ended up loading the data into Postgres so I could do some of the aggregations using window functions anyways. -- greg
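As a sketch of the derivative calculation Greg describes, assume an external job copies the counters into a history table every few minutes; the rate then falls out of a window function (the table and its columns are assumptions, not anything that ships with PostgreSQL):

    -- stat_history is created once with (taken_at timestamptz, relid oid, relname name, n_tup_ins bigint)
    -- and the external job, run from cron, does:
    --   INSERT INTO stat_history
    --     SELECT now(), relid, relname, n_tup_ins FROM pg_stat_user_tables;
    SELECT taken_at,
           relname,
           (n_tup_ins - lag(n_tup_ins) OVER w)
             / EXTRACT(EPOCH FROM taken_at - lag(taken_at) OVER w) AS inserts_per_second
    FROM stat_history
    WINDOW w AS (PARTITION BY relid ORDER BY taken_at)
    ORDER BY relname, taken_at;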
Re: Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
From
Josh Berkus
Date:
> I don't understand what you're talking about at all here. I think > there are a lot of unsolved problems in monitoring but the one thing I > think everyone is pretty clear on is that the right way to export > metrics like these is to export a counter and then have some external > component periodically copy the counter into some history table and > calculate the derivative, second derivative, running average of the > first derivative, etc. You missed the original point of the discussion, which was to have stats we could use for auto-tuning internally. Not to export them. For example, there are optimizations we could make with the query planner if we knew which tables and indexes were "hot" in general. That's how we started this discussion, and it's not solved by storing the stats history on another server. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
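For what it's worth, the nearest thing available today is to eyeball the cumulative counters directly. A rough sketch (the view and columns are the stock pg_statio_user_tables ones; treating raw hit ratios as "hot" is purely illustrative):

    -- Rank tables by how often their heap blocks are touched and how many of
    -- those touches were served from shared buffers.  The counters are
    -- cumulative since the last stats reset, so "hot last year" and "hot
    -- right now" look the same, which is exactly the time-based problem.
    SELECT relname,
           heap_blks_hit + heap_blks_read AS heap_blks_touched,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_touched DESC
    LIMIT 20;

That is usable by an external tool, but it is not something the planner can consult at plan time to decide whether a relation is likely to be in cache, which is the auto-tuning gap being described here.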
Re: Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
From
Robert Treat
Date:
On Mon, Feb 28, 2011 at 4:13 PM, Greg Stark <gsstark@mit.edu> wrote: > On Mon, Feb 28, 2011 at 6:31 PM, Josh Berkus <josh@agliodbs.com> wrote: > What's needed here is for someone to write a good mrtg/rrd/whatever > replacement using postgres as its data store. If you're monitoring > something sensitive then you would store the data in a *different* > postgres server to avoid Tom's complaint. There may be aspects of the > job that Postgres does poorly but we can focus on improving those > parts of Postgres rather than looking for another database. And > frankly Postgres isn't that bad a tool for it -- when I did some > performance analysis recently I actually ended up loading the data > into Postgres so I could do some of the aggregations using window > functions anyways. > Greg, see https://labs.omniti.com/labs/reconnoiter, but also see Josh's nearby email about how he's trying to solve this internal to the database. Robert Treat play: xzilla.net work: omniti.com hiring: l42.org/Lg
Re: Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
From
Chris Browne
Date:
josh@agliodbs.com (Josh Berkus) writes: >> I don't understand what you're talking about at all here. I think >> there are a lot of unsolved problems in monitoring but the one thing >> I think everyone is pretty clear on is that the right way to export >> metrics like these is to export a counter and then have some external >> component periodically copy the counter into some history table and >> calculate the derivative, second derivative, running average of the >> first derivative, etc. > > You missed the original point of the discussion, which was to have > stats we could use for auto-tuning internally. Not to export them. > > For example, there are optimizations we could make with the query > planner if we knew which tables and indexes were "hot" in general. > That's how we started this discussion, and it's not solved by storing > the stats history on another server. There's value to both, and there's no dearth of monitoring frameworks that people keep on replacing with successors, so there's certainly room for both ;-). Recent stuff about such... <https://lopsa.org/content/philosophy-monitoring> <https://labs.omniti.com/labs/reconnoiter> I'm not quite sure what ought to be in PostgreSQL as a "built-in;" I suspect that what's eventually needed is to be able to correlate things across database instances, so that when Tom says, "I need to know what data the planner's working on," the answer can be "OK, got that..." This data is surely useful to get out of the system, so I'd bias towards something sorta like what Greg suggests. And the closed-ended answer may prevent us from asking more sophisticated questions, also not a notably good thing... -- (reverse (concatenate 'string "moc.liamg" "@" "enworbbc")) "If tautologies do not convey information, mathematicians would not be surprised by them." -- Mark Miller
>> But it's not the same as tracking *sections of a table*.
> I dunno. I imagine if you have a "section" of a table in different
> storage than other sections, you created a tablespace and moved the
> partition holding that section there. Otherwise, how do you prevent the
> tuples from moving to other "sections"? (We don't really have a concept
> of "sections" of a table.)
A section could be as simple as being on the inner or outer part of a single disk, or as complicated as being in the SSD cache of a spinning disk, or in the multi-gigabyte cache on the RAID card or SAN due to being consistently accessed.
Section is the wrong word. If primary key values under 10 million are consistently accessed, they will be cached even if they do get moved around within the structure. Values over 10 million may be fast if they happen to share a page with a hot value, but probably aren't.
This is very evident when dealing with time-based data in what can be a very large structure: 1% may be very hot and in memory while the other 99% is not.
Partitioning only helps if you can predict what will be hot in the future (see the sketch below for what that looks like today); sometimes an outside source (world events) determines which section of the structure is hot.
regards,
Rod
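For context on the partitioning point, a sketch of what the "predictable" case looks like with the current inheritance-based approach (table names and date ranges are made up for illustration):

    -- Time-range partitioning via inheritance: a parent table plus children
    -- carrying CHECK constraints, with constraint_exclusion = partition so
    -- the planner can skip children whose range cannot match.
    CREATE TABLE events (
        event_time timestamptz NOT NULL,
        payload    text
    );

    CREATE TABLE events_2011_02 (
        CHECK (event_time >= '2011-02-01' AND event_time < '2011-03-01')
    ) INHERITS (events);

    CREATE TABLE events_2011_03 (
        CHECK (event_time >= '2011-03-01' AND event_time < '2011-04-01')
    ) INHERITS (events);

    -- A query on the predictably hot (recent) range touches only one child,
    -- whose pages then tend to stay in cache.
    SELECT count(*) FROM events
    WHERE event_time >= '2011-03-01' AND event_time < '2011-03-08';

When a world event suddenly makes February interesting again, the cold child gets hammered just as an unpartitioned table would be; the layout only encodes yesterday's guess about what will be hot, which is exactly the limitation Rod is pointing at.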
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > The ideal solution would likely be for the stats collector to expose its > data structures as shared memory, but I don't think we get to do that > under SysV shmem --- it doesn't like variable-size shmem much. Maybe > that's another argument for looking harder into mmap or POSIX shmem, > although it's not clear to me how well either of those fixes that. We could certainly use a message-passing style atop pgpipe.c here, right? After all, we already have a protocol and know how to represent complex data structures in it, and all components of PostgreSQL should be able to make use of that, I'd think. Or this fever ain't really gone yet :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
From
Bernd Helmle
Date:
--On 28 February 2011 15:02:30 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Because it's fifty times more mechanism than we need here? We don't > want a SQL interface (not even a lightweight one) and it's unclear that > we ever want the data to go to disk at all. I wonder whether a library like librrd would be a solution for this. -- Thanks Bernd