Thread: constraint exclusion analysis caching

constraint exclusion analysis caching

From
Andrew Dunstan
Date:
Yesterday a client and I were sad to discover that the overhead of 
constraint exclusion is apparently O(n) in the number of partitions, and 
that where we had ~180 partitions each with a simple constraint (check 
(field = nnn)) the overhead appeared to amount to about 0.25s on some 
quite performant hardware, which is way too high for our application. 
Actual execution of the query in question was taking one tenth of that 
time.

For now we're going to work around this by directing the queries 
directly to the child tables, although this does involve fairly large 
application changes.

However, I wondered if we couldn't mitigate this by caching the results 
of constraint exclusion analysis for a particular table + condition. I 
have no idea how hard this would be, but in principle it seems silly to 
keep paying the same penalty over and over again.
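
A minimal sketch of the caching idea, not of anything the planner actually does. It assumes each child table's constraint can be modelled as a single integer (the nnn in check (field = nnn)); the names PARTITIONS, exclude_naive and exclude_cached are hypothetical.

```python
from functools import lru_cache

# Hypothetical model: 180 child tables, each with CHECK (field = nnn),
# represented here only by the integer nnn.
PARTITIONS = {f"child_{n}": n for n in range(180)}


def exclude_naive(value):
    """Test every partition's constraint against field = value: O(n)."""
    return tuple(name for name, nnn in PARTITIONS.items() if nnn == value)


@lru_cache(maxsize=1024)
def exclude_cached(table, value):
    """Remember the surviving partitions for a (table, condition) pair."""
    return exclude_naive(value)
```

The first call for a given pair still pays the full O(n) cost; only repeats are cheap. Any change to the constraints would have to invalidate the cache (here, exclude_cached.cache_clear()), which is likely where the real difficulty lies.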

Thoughts?

cheers

andrew




Re: constraint exclusion analysis caching

From
Csaba Nagy
Date:
On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote:
> However, I wondered if we couldn't mitigate this by caching the results 
> of constraint exclusion analysis for a particular table + condition. I 
> have no idea how hard this would be, but in principle it seems silly to 
> keep paying the same penalty over and over again.

This would be a perfect candidate for the plan-branch based on actual
parameters capability, in association with globally cached plans
mentioned here:

http://archives.postgresql.org/pgsql-hackers/2008-04/msg00920.php

Cheers,
Csaba.




Re: constraint exclusion analysis caching

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Yesterday a client and I were sad to discover that the overhead of 
> constraint exclusion is apparently O(n) in the number of partitions, and 
> that where we had ~180 partitions each with a simple constraint (check 
> (field = nnn)) the overhead appeared to amount to about 0.25s on some 
> quite performant hardware, which is way too high for our application. 

I would think that any sort of formal partitioning feature would fix the
problem, because the planner would understand directly about
partitioning instead of having to prove the correctness of not scanning
each one of the other 179 partitions.  The existing feature is cool in
the sense of obtaining useful behavior from generalized spare parts,
but it was never designed or expected to give great planning speed
with large numbers of partitions.  TFM points out that constraint
exclusion cannot scale beyond perhaps a hundred partitions ...
        regards, tom lane


Re: constraint exclusion analysis caching

From
Gregory Stark
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:

> Actual execution of the query in question was taking one tenth of that
> time.
>...
> but in principle it seems silly to keep paying the same penalty over and
> over again.

I would think constraint_exclusion only really makes sense if you're spending
a lot more time executing than planning queries. Either that means you're
preparing queries once and then executing them many many times or you're
planning much slower queries where planning time is insignificant compared to
the time to execute them.
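
As a rough illustration of that trade-off, here is the arithmetic using the figures from the original report (about 0.25s of planning overhead, execution about one tenth of that). It assumes a prepared query is planned exactly once and that the plan can be reused unchanged; whether exclusion still applies when the partition key is passed as a parameter is not addressed here. The function names are hypothetical.

```python
PLAN_S = 0.25    # planning overhead reported in the thread, in seconds
EXEC_S = 0.025   # execution time, roughly one tenth of planning


def total_unprepared(n):
    """Plan and execute on every one of n calls."""
    return n * (PLAN_S + EXEC_S)


def total_prepared(n):
    """Plan once, then execute n times (assumes the plan is reusable)."""
    return PLAN_S + n * EXEC_S


for n in (1, 10, 1000):
    print(n, round(total_unprepared(n), 3), round(total_prepared(n), 3))
```

With these numbers planning dominates unless it is amortized over many executions.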

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
 


Re: constraint exclusion analysis caching

From
Simon Riggs
Date:
On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote:

> Yesterday a client and I were sad to discover that the overhead of 
> constraint exclusion is apparently O(n) in the number of partitions, and 
> that where we had ~180 partitions each with a simple constraint (check 
> (field = nnn)) the overhead appeared to amount to about 0.25s on some 
> quite performant hardware, which is way too high for our application. 
> Actual execution of the query in question was taking one tenth of that 
> time.
> 
> For now we're going to work around this by directing the queries 
> directly to the child tables, although this does involve fairly large 
> application changes.
> 
> However, I wondered if we couldn't mitigate this by caching the results 
> of constraint exclusion analysis for a particular table + condition. I 
> have no idea how hard this would be, but in principle it seems silly to 
> keep paying the same penalty over and over again.

I think the only way forward is to put an index across the constraints,
to allow the exclusion time to be O(logN).

Currently the constraints are all independent of each other and can even
overlap. So we would need a way of

* confirming that the partitions are non-overlapping
* defining some structure to them, to allow them to be organised in a
sequence that allows either a bsearch or an index to exist

The latter requires some kind of top-down definition, which hopefully is
on the way from Gavin.

This can then allow exclusion to take place dynamically within the
executor, to allow a form of nested join.
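
The two requirements above can be sketched as follows. This is only an illustration under stated assumptions: partitions are modelled as half-open integer ranges [lo, hi) with a name, and RANGES, check_non_overlapping and find_partition are hypothetical names, not an existing or proposed interface.

```python
from bisect import bisect_right

# Hypothetical partitions: (lo, hi, name), covering lo <= value < hi.
RANGES = [(200, 300, "p2"), (0, 100, "p0"), (100, 200, "p1")]


def check_non_overlapping(ranges):
    """Sort by lower bound and confirm no range overlaps its successor."""
    ordered = sorted(ranges)
    for (lo1, hi1, name1), (lo2, hi2, name2) in zip(ordered, ordered[1:]):
        if hi1 > lo2:
            raise ValueError(f"partitions {name1} and {name2} overlap")
    return ordered


def find_partition(ordered, lows, value):
    """Binary search over the sorted lower bounds: O(log N) per lookup."""
    i = bisect_right(lows, value) - 1
    if i >= 0 and ordered[i][0] <= value < ordered[i][1]:
        return ordered[i][2]
    return None  # value falls outside every partition


ordered = check_non_overlapping(RANGES)
lows = [lo for lo, _, _ in ordered]
print(find_partition(ordered, lows, 150))
```

The overlap check and the sort are done once; each lookup afterwards is cheap enough to repeat at execution time for every outer value.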

My other requirements are noted here...
http://wiki.postgresql.org/wiki/Image:Partitioning_Requirements.pdf

I'm not working on this at all at the moment.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com



Re: constraint exclusion analysis caching

From
Stephen Frost
Date:
* Gregory Stark (stark@enterprisedb.com) wrote:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>
> > Actual execution of the query in question was taking one tenth of that
> > time.
> >...
> > but in principle it seems silly to keep paying the same penalty over and
> > over again.
>
> I would think constraint_exclusion only really makes sense if you're spending
> a lot more time executing than planning queries. Either that means you're
> preparing queries once and then executing them many many times or you're
> planning much slower queries where planning time is insignificant compared to
> the time to execute them.

Would it be possible to change the application to use prepared queries?
Seems like that'd make more sense than changing it to use the child
tables directly..  Just my 2c.
Thanks,
    Stephen

Re: constraint exclusion analysis caching

From
Andrew Dunstan
Date:

Stephen Frost wrote:
> * Gregory Stark (stark@enterprisedb.com) wrote:
>   
>> "Andrew Dunstan" <andrew@dunslane.net> writes:
>>
>>     
>>> Actual execution of the query in question was taking one tenth of that
>>> time.
>>> ...
>>> but in principle it seems silly to keep paying the same penalty over and
>>> over again.
>>>       
>> I would think constraint_exclusion only really makes sense if you're spending
>> a lot more time executing than planning queries. Either that means you're
>> preparing queries once and then executing them many many times or you're
>> planning much slower queries where planning time is insignificant compared to
>> the time to execute them.
>>     
>
> Would it be possible to change the application to use prepared queries?
> Seems like that'd make more sense than changing it to use the child
> tables directly..  Just my 2c.
>
>     
>   

This is actually a technique already used elsewhere in the app, so it 
will fit quite well. Thanks for the suggestion, though.

(BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's 
more) ?)

cheers

andrew


Re: constraint exclusion analysis caching

From
Stephen Frost
Date:
* Andrew Dunstan (andrew@dunslane.net) wrote:
>> Seems like that'd make more sense than changing it to use the child
>> tables directly..  Just my 2c.
>
> This is actually a technique already used elsewhere in the app, so it
> will fit quite well. Thanks for the suggestion, though.

Sure.

> (BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's
> more) ?)

I'm amazed at the number of people who ask me this..  Guess it's just
different for different communities.  Basically, I like to keep my mail
in the different folders it belongs in, so I'd rather get responses to
my emails through the list than directly to me.  Additionally, I don't
really need to get two copies of every email sent to me on a mailing
list.

It's actually really frowned upon in the Debian community to not respect
MFT and it's common to have it set to just the mailing list.

More information about it: http://cr.yp.to/proto/replyto.html
Enjoy,
    Stephen

Re: constraint exclusion analysis caching

From
Gregory Stark
Date:
"Stephen Frost" <sfrost@snowman.net> writes:

> I'd rather get responses to my emails through the list than directly to me.
> Additionally, I don't really need to get two copies of every email sent to
> me on a mailing list.

Then doesn't setting it to: Andrew Dunstan <andrew@dunslane.net>,PostgreSQL-development <pgsql-hackers@postgresql.org>

do precisely the opposite of what you would want?

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!


Re: constraint exclusion analysis caching

From
Andrew Dunstan
Date:

Stephen Frost wrote:
>   
>> (BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's  
>> more) ?)
>>     
>
> I'm amazed at the number of people who ask me this..  Guess it's just
> different for different communities.  Basically, I like to keep my mail
> in the different folders it belongs in, so I'd rather get responses to
> my emails through the list than directly to me.  Additionally, I don't
> really need to get two copies of every email sent to me on a mailing
> list.
>   

I am amazed that you don't see that what your MUA is doing is both 
wrong and an inconvenience to other people.

For example, because it put *my* address in the list for your message 
above, it caused my MUA quite correctly to add a To: line to myself, 
which I certainly didn't want to do.

And it's completely unnecessary. For example, I have set my majordomo 
preferences for the postgresql.org lists not to send me copies of emails 
where I am also in the To: or Cc: lines. After doing that I get no 
duplicates.

And I don't cause anyone else to have to edit the addresses when they 
reply to my mail.

If you want to ensure that you reply to a list, use an MUA that has a 
reply-to-list command - I see you use mutt, which has such a command IIRC.

cheers

andrew



Re: constraint exclusion analysis caching

From
Stephen Frost
Date:
Andrew,

* Andrew Dunstan (andrew@dunslane.net) wrote:
> For example, because it put *my* address in the list for your message
> above, it caused my MUA quite correctly to add a To: line to myself,
> which I certainly didn't want to do.

Honestly, I suspect thunderbird just doesn't know your addresses if
it's adding your address back in.  Adding your address isn't for you-
it's for other people.  The, completely reasonable, assumption is that
if your address was included in a To or Cc that you're not on the list
and stripping that out would mean you'd be left out.

> And it's completely unnecessary. For example, I have set my majordomo
> preferences for the postgresql.org lists not to send me copies of emails
> where I am also in the To: or Cc: lines. After doing that I get no
> duplicates.

This doesn't help at all, actually.  As I pointed out previously, I
*want* the mail through the list, what I *don't* want is people sending
list mail directly to me.

> And I don't cause anyone else to have to edit the addresses when they
> reply to my mail.

Are you sure thunderbird recognizes the email address you use for
posting as a local identity/account?  Mutt has a specific 'alternates'
configuration to let it know what addresses are local.

> If you want to ensure that you reply to a list, use an MUA that has a
> reply-to-list command - I see you use mutt, which has such a command
> IIRC.

Indeed, and it's exactly what I use when replying to list mail.  The
issue isn't making sure that *I* reply to a list, it's asking other
people to reply through the list rather than to me.
Thanks,
    Stephen

Re: constraint exclusion analysis caching

From
Andrew Dunstan
Date:

Stephen Frost wrote:
> Andrew,
>
> * Andrew Dunstan (andrew@dunslane.net) wrote:
>   
>> For example, because it put *my* address in the list for your message  
>> above, it caused my MUA quite correctly to add a To: line to myself,  
>> which I certainly didn't want to do.
>>     
>
> Honestly, I suspect thunderbird just doesn't know your addresses if
> it's adding your address back in.  Adding your address isn't for you-
> it's for other people.  The, completely reasonable, assumption is that
> if your address was included in a To or Cc that you're not on the list
> and stripping that out would mean you'd be left out.
>
>   
>> And it's completely unnecessary. For example, I have set my majordomo  
>> preferences for the postgresql.org lists not to send me copies of emails  
>> where I am also in the To: or Cc: lines. After doing that I get no  
>> duplicates.
>>     
>
> This doesn't help at all, actually.  As I pointed out previously, I
> *want* the mail through the list, what I *don't* want is people sending
> list mail directly to me.
>
>   
>> And I don't cause anyone else to have to edit the addresses when they  
>> reply to my mail.
>>     
>
> Are you sure thunderbird recognizes the email address you use for
> posting as a local identity/account?  Mutt has a specific 'alternates'
> configuration to let it know what addresses are local.
>
>   
>> If you want to ensure that you reply to a list, use an MUA that has a  
>> reply-to-list command - I see you use mutt, which has such a command 
>> IIRC.
>>     
>
> Indeed, and it's exactly what I use when replying to list mail.  The
> issue isn't making sure that *I* reply to a list, it's asking other
> people to reply through the list rather than to me.
>
>     
>   

a. I don't use Thunderbird.
b. Of course the MUA knows what my address is.
c. Yours are pretty much the *only* settings of all the users of this 
list that cause me issues. Judging by your own words I am not alone in 
being thus inconvenienced (otherwise, why would "an amazing number" of 
people ask you about it?). If you don't care about that then there's 
nothing much I can do.  Alvaro used to have a similar setup. When I 
complained he very kindly fixed it.
d. Your "completely reasonable" assumption above is, of course, bogus. 
Most people when replying to a list reply to all addresses. Assuming that 
the non-list addresses are for people not on the list is nonsense.

cheers

andrew


Re: constraint exclusion analysis caching

From
Alvaro Herrera
Date:
Stephen Frost wrote:

> > And it's completely unnecessary. For example, I have set my majordomo  
> > preferences for the postgresql.org lists not to send me copies of emails  
> > where I am also in the To: or Cc: lines. After doing that I get no  
> > duplicates.
> 
> This doesn't help at all, actually.  As I pointed out previously, I
> *want* the mail through the list, what I *don't* want is people sending
> list mail directly to me.

Wouldn't it make sense, then, to filter any email which is Cc'ed to a
list, into that list's folder?  Add to that a bit of duplicate removal
(say, procmail's, or whatever you use) and you're set.
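
A minimal sketch of the duplicate-removal step, assuming that the direct copy and the list copy of a message carry the same Message-ID header. It is not how procmail does it; dedupe is a hypothetical helper built on Python's standard email module.

```python
from email import message_from_string


def dedupe(raw_messages):
    """Keep the first copy of each Message-ID and drop later copies."""
    seen = set()
    kept = []
    for raw in raw_messages:
        msg_id = message_from_string(raw)["Message-ID"]
        if msg_id is not None:
            if msg_id in seen:
                continue  # already delivered once, e.g. via the list
            seen.add(msg_id)
        kept.append(raw)  # messages without a Message-ID are always kept
    return kept
```

Filtering the surviving copy into the list's folder would then be a separate rule keyed on the To: and Cc: headers.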

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.