Thread: constraint exclusion analysis caching
Yesterday a client and I were sad to discover that the overhead of constraint exclusion is apparently O(n) in the number of partitions, and that where we had ~180 partitions each with a simple constraint (check (field = nnn)) the overhead appeared to amount to about 0.25s on some quite performant hardware, which is way too high for our application. Actual execution of the query in question was taking one tenth of that time. For now we're going to work around this by directing the queries directly to the child tables, although this does involve fairly large application changes. However, I wondered if we couldn't mitigate this by caching the results of constraint exclusion analysis for a particular table + condition. I have no idea how hard this would be, but in principle it seems silly to keep paying the same penalty over and over again. Thoughts? cheers andrew
On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote: > However, I wondered if we couldn't mitigate this by caching the results > of constraint exclusion analysis for a particular table + condition. I > have no idea how hard this would be, but in principle it seems silly to > keep paying the same penalty over and over again. This would be a perfect candidate for the plan-branch based on actual parameters capability, in association with globally cached plans mentioned here: http://archives.postgresql.org/pgsql-hackers/2008-04/msg00920.php Cheers, Csaba.
Andrew Dunstan <andrew@dunslane.net> writes: > Yesterday a client and I were sad to discover that the overhead of > constraint exclusion is apparently O(n) in the number of partitions, and > that where we had ~180 partitions each with a simple constraint (check > (field = nnn)) the overhead appeared to amount to about 0.25s on some > quite performant hardware, which is way too high for our application. I would think that any sort of formal partitioning feature would fix the problem, because the planner would understand directly about partitioning instead of having to prove the correctness of not scanning each one of the other 179 partitions. The existing feature is cool in the sense of obtaining useful behavior from generalized spare parts, but it was never designed or expected to give great planning speed with large numbers of partitions. TFM points out that constraint exclusion cannot scale beyond perhaps a hundred partitions ... regards, tom lane
"Andrew Dunstan" <andrew@dunslane.net> writes: > Actual execution of the query in question was talking one tenth of that > time. >... > but in principle it seems silly to keep paying the same penalty over and > over again. I would think constraint_exclusion only really makes sense if you're spending a lot more time executing than planning queries. Either that means you're preparing queries once and then executing them many many times or you're planning much slower queries where planning time is insignificant compared to the time to execute them. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!
On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote: > Yesterday a client and I were sad to discover that the overhead of > constraint exclusion is apparently O(n) in the number of partitions, and > that where we had ~180 partitions each with a simple constraint (check > (field = nnn)) the overhead appeared to amount to about 0.25s on some > quite performant hardware, which is way too high for our application. > Actual execution of the query in question was talking one tenth of that > time. > > For now we're going to work around this by directing the queries > directly to the child tables, although this does involve fairly large > application changes. > > However, I wondered if we couldn't mitigate this by caching the results > of constraint exclusion analysis for a particular table + condition. I > have no idea how hard this would be, but in principle it seems silly to > keep paying the same penalty over and over again. I think the only way forward is to put an index across the constraints, to allow the exclusion time to be O(logN). Currently the constraints are all independent of each other and can even overlap. So we would need a way of * confirming that the partitions are non-overlapping * defining some structure to them, to allow them to be organised in a sequence that allows either a bsearch or an index to exist The latter requires some kind of top-down definition, which hopefully is on the way from Gavin. This can then allow exclusion to take place dynamically within the executor, to allow a form of nested join. My other requirements are noted here... http://wiki.postgresql.org/wiki/Image:Partitioning_Requirements.pdf I'm not working on this at all at the moment. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
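Editorial sketch of the two requirements listed above, assuming range partitions described as half-open intervals: first confirm that the ranges do not overlap, then order them so that a binary search finds the one candidate partition in O(log N). This is a plain Python illustration, not a proposal for the planner; RANGES, check_non_overlapping and find_partition are hypothetical names.

```python
from bisect import bisect_right

# Hypothetical partitions as half-open ranges [lower, upper), one per child table.
RANGES = [(n * 10, (n + 1) * 10, f"child_{n}") for n in range(180)]


def check_non_overlapping(ranges):
    """Sort by lower bound and confirm no range overlaps its successor."""
    ordered = sorted(ranges)
    for (_, upper, _), (next_lower, _, _) in zip(ordered, ordered[1:]):
        if upper > next_lower:
            raise ValueError("partitions overlap; binary search is not valid")
    return ordered


ORDERED = check_non_overlapping(RANGES)
LOWER_BOUNDS = [lower for lower, _, _ in ORDERED]


def find_partition(value):
    """Locate the single partition that can hold `value` in O(log n), or None."""
    i = bisect_right(LOWER_BOUNDS, value) - 1
    if i < 0:
        return None  # value lies below the first partition
    lower, upper, name = ORDERED[i]
    return name if lower <= value < upper else None
```

The overlap check runs once, when the structure is built; each lookup then touches only the sorted list of lower bounds, which is what moves the per-query cost from linear to logarithmic.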
* Gregory Stark (stark@enterprisedb.com) wrote: > "Andrew Dunstan" <andrew@dunslane.net> writes: > > > Actual execution of the query in question was talking one tenth of that > > time. > >... > > but in principle it seems silly to keep paying the same penalty over and > > over again. > > I would think constraint_exclusion only really makes sense if you're spending > a lot more time executing than planning queries. Either that means you're > preparing queries once and then executing them many many times or you're > planning much slower queries where planning time is insignificant compared to > the time to execute them. Would it be possible to change the application to use prepared queries? Seems like that'd make more sense than changing it to use the child tables directly.. Just my 2c. Thanks, Stephen
Stephen Frost wrote: > * Gregory Stark (stark@enterprisedb.com) wrote: > >> "Andrew Dunstan" <andrew@dunslane.net> writes: >> >> >>> Actual execution of the query in question was talking one tenth of that >>> time. >>> ... >>> but in principle it seems silly to keep paying the same penalty over and >>> over again. >>> >> I would think constraint_exclusion only really makes sense if you're spending >> a lot more time executing than planning queries. Either that means you're >> preparing queries once and then executing them many many times or you're >> planning much slower queries where planning time is insignificant compared to >> the time to execute them. >> > > Would it be possible to change the application to use prepared queries? > Seems like that'd make more sense the changing it to use the child > tables directly.. Just my 2c. > > > This is actually a technique already used elsewhere in the app, so it will fit quite well. Thanks for the suggestion, though. (BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's more) ?) cheers andrew
* Andrew Dunstan (andrew@dunslane.net) wrote: >> Seems like that'd make more sense the changing it to use the child >> tables directly.. Just my 2c. > > This is actually a technique already used elsewhere in the app, so it > will fit quite well. Thanks for the suggestion, though. Sure. > (BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's > more) ?) I'm amazed at the number of people who ask me this.. Guess it's just different for different communities. Basically, I like to keep my mail in the different folders it belongs in, so I'd rather get responses to my emails through the list than directly to me. Additionally, I don't really need to get two copies of every email sent to me on a mailing list. It's actually really frowned upon in the Debian community to not respect MFT and it's common to have it set to just the mailing list. More information about it: http://cr.yp.to/proto/replyto.html Enjoy, Stephen
"Stephen Frost" <sfrost@snowman.net> writes: > I'd rather get responses to my emails through the list than directly to me. > Additionally, I don't really need to get two copies of every email sent to > me on a mailing list. Then doesn't setting it to: Andrew Dunstan <andrew@dunslane.net>,PostgreSQL-development <pgsql-hackers@postgresql.org> do precisely the opposite of what you would want? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Stephen Frost wrote: > >> (BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's >> more) ?) >> > > I'm amazed at the number of people who ask me this.. Guess it's just > different for different communities. Basically, I like to keep my mail > in the different folders it belongs in, so I'd rather get responses to > my emails through the list than directly to me. Additionally, I don't > really need to get two copies of every email sent to me on a mailing > list. > I am amazed that you don't see that what your MUA is doing is actually both wrong and inconvenient for people. For example, because it put *my* address in the list for your message above, it caused my MUA quite correctly to add a To: line to myself, which I certainly didn't want to do. And it's completely unnecessary. For example, I have set my majordomo preferences for the postgresql.org lists not to send me copies of emails where I am also in the To: or Cc: lines. After doing that I get no duplicates. And I don't cause anyone else to have to edit the addresses when they reply to my mail. If you want to ensure that you reply to a list, use an MUA that has a reply-to-list command - I see you use mutt, which has such a command IIRC. cheers andrew
Andrew, * Andrew Dunstan (andrew@dunslane.net) wrote: > For example, because it put *my* address in the list for your message > above, it caused my MUA quite correctly to add a To: line to myself, > which I certainly didn't want to do. Honestly, I suspect thunderbird just doesn't know your addresses if it's adding your address back in. Adding your address isn't for you- it's for other people. The, completely reasonable, assumption is that if your address was included in a To or Cc that you're not on the list and stripping that out would mean you'd be left out. > And it's completely unnecessary. For example, I have set my majordomo > preferences for the postgresql.org lists not to send me copies of emails > where I am also in the To: or Cc: lines. After doing that I get no > duplicates. This doesn't help at all, actually. As I pointed out previously, I *want* the mail through the list, what I *don't* want is people sending list mail directly to me. > And I don't casue anyone else to have to edit the addresses when they > reply to my mail. Are you sure thunderbird recognizes the email address you use for posting as a local identity/account? Mutt has a specific 'alternates' configuration to let it know what addresses are local. > If you want to ensure that you reply to a list, use an MUA that has a > reply-to-list command - I see you use mutt, which has such a command > IIRC. Indeed, and it's exactly what I use when replying to list mail. The issue isn't making sure that *I* reply to a list, it's asking other people to reply through the list rather than to me. Thanks, Stephen
Stephen Frost wrote: > Andrew, > > * Andrew Dunstan (andrew@dunslane.net) wrote: > >> For example, because it put *my* address in the list for your message >> above, it caused my MUA quite correctly to add a To: line to myself, >> which I certainly didn't want to do. >> > > Honestly, I suspect thunderbird just doesn't know your addresses if > it's adding your address back in. Adding your address isn't for you- > it's for other people. The, completely reasonable, assumption is that > if your address was included in a To or Cc that you're not on the list > and stripping that out would mean you'd be left out. > > >> And it's completely unnecessary. For example, I have set my majordomo >> preferences for the postgresql.org lists not to send me copies of emails >> where I am also in the To: or Cc: lines. After doing that I get no >> duplicates. >> > > This doesn't help at all, actually. As I pointed out previously, I > *want* the mail through the list, what I *don't* want is people sending > list mail directly to me. > > >> And I don't casue anyone else to have to edit the addresses when they >> reply to my mail. >> > > Are you sure thunderbird recognizes the email address you use for > posting as a local identity/account? Mutt has a specific 'alternates' > configuration to let it know what addresses are local. > > >> If you want to ensure that you reply to a list, use an MUA that has a >> reply-to-list command - I see you use mutt, which has such a command >> IIRC. >> > > Indeed, and it's exactly what I use when replying to list mail. The > issue isn't making sure that *I* reply to a list, it's asking other > people to reply through the list rather than to me. > > > a. I don't use Thunderbird. b. Of course the MUA knows what my address is. c. Yours are pretty much the *only* settings of all the users of this list that cause me issues.
Judging by your own words I am not alone in being thus inconvenienced (otherwise, why would "an amazing number" of people ask you about it?). If you don't care about that then there's nothing much I can do. Alvaro used to have a similar setup. When I complained he very kindly fixed it. d. Your "completely reasonable" assumption above is, of course, bogus. Most people when replying to a list reply to all addresses. Assuming that the non-list addresses are for people not on the list is nonsense. cheers andrew
Stephen Frost wrote: > > And it's completely unnecessary. For example, I have set my majordomo > > preferences for the postgresql.org lists not to send me copies of emails > > where I am also in the To: or Cc: lines. After doing that I get no > > duplicates. > > This doesn't help at all, actually. As I pointed out previously, I > *want* the mail through the list, what I *don't* want is people sending > list mail directly to me. Wouldn't it make sense, then, to filter any email which is Cc'ed to a list, into that list's folder? Add to that a bit of duplicate removal (say, procmail's, or whatever you use) and you're set. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
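Editorial sketch of the filtering suggested above, written with Python's standard email module rather than procmail: drop a message whose Message-ID has already been seen, and file anything addressed or Cc'ed to the list into the list folder. The folder labels and the route function are hypothetical; the list address is the one that appears earlier in this thread.

```python
from email import message_from_string
from email.utils import getaddresses

LIST_ADDRESS = "pgsql-hackers@postgresql.org"  # list address as it appears in this thread
seen_ids = set()  # Message-IDs already delivered (kept in memory only in this sketch)


def route(raw_message: str) -> str:
    """Return 'duplicate', 'list-folder', or 'inbox' for one raw message."""
    msg = message_from_string(raw_message)
    msg_id = msg.get("Message-ID", "").strip()
    if msg_id and msg_id in seen_ids:
        return "duplicate"  # second copy of a message we already filed
    if msg_id:
        seen_ids.add(msg_id)
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    addresses = {addr.lower() for _, addr in recipients}
    return "list-folder" if LIST_ADDRESS in addresses else "inbox"
```

With this arrangement the direct copy and the list copy of a reply share a Message-ID, so whichever arrives first is filed under the list and the other is discarded.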