Re: [HACKERS] Runtime Partition Pruning - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: [HACKERS] Runtime Partition Pruning
Msg-id: CA+TgmoZHYoAL4HYwnGO25B8CxCB+vNMdf+7rbUzYykR4sU9yUA@mail.gmail.com
In response to: Re: [HACKERS] Runtime Partition Pruning (David Rowley <david.rowley@2ndquadrant.com>)
Responses: Re: [HACKERS] Runtime Partition Pruning
List: pgsql-hackers
On Thu, Dec 21, 2017 at 8:37 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> No, I don't think comparing to previous custom plans is a workable
>> approach.  I was thinking, rather, that if we know for example that
>> we're doing pruning on partition_column = $1, then we know that only
>> one partition will match.  That's probably a common case.  If we've
>> got partition_column > $1, we could assume that, say, 75% of the
>> partitions would match.  partition_column BETWEEN $1 and $2 is
>> probably a bit more selective, so maybe we assume 50% of the
>> partitions would match.
>
> Okay. Do you think this is something we need to solve for this patch?
> When I complained originally I didn't quite see any way to even test
> the majority of this patch with the regression tests, but Beena has
> since proven me wrong about that.

Although I have done one round of review of this patch, I haven't really got my head around it completely yet, and I haven't spent a lot of time on it, so my opinions are not as well-formed as maybe they should be. I'm glad, by the way, that you are putting some effort into it, as I think that will help move this forward more quickly.

At a high level, I want to avoid trying to solve too many problems in one patch (which was the motivation behind my comment near the top of the thread), but I also want to end up with something useful (which I think is your concern). Leaving aside the difficulty of implementation, I have some questions about what the right thing to do actually is. In a simple case, I'm guessing that the cost of creating a custom plan will exceed the amount that the plan saves, but in more complex cases, I'm not sure that will be true. For instance, if we know the particular parameter value at plan time, we can make a more accurate estimate of how many times that value appears, which can then feed into our choice of what plan shape to use.
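The heuristic quoted above (equality prunes to one partition, open-ended ranges keep roughly 75%, BETWEEN roughly 50%) can be sketched as a small function. This is purely illustrative: the function name and the exact fractions are assumptions from the discussion, not PostgreSQL source code.

```python
def estimated_matching_partitions(op: str, n_parts: int) -> float:
    """Assumed number of partitions surviving run-time pruning for a
    clause of the form `partition_column <op> $1` (illustrative only)."""
    if op == "=":
        # Equality against a parameter: exactly one partition can match.
        return float(min(1, n_parts))
    if op in (">", "<", ">=", "<="):
        # Open-ended range: assume 75% of partitions survive pruning.
        return 0.75 * n_parts
    if op == "BETWEEN":
        # Bounded range: a bit more selective; assume 50% survive.
        return 0.50 * n_parts
    # Unknown clause shape: assume no pruning at all.
    return float(n_parts)
```

A fraction like this could then scale the Append path's cost estimate at plan time, rather than costing the generic plan as if every partition were scanned.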
That is, for a query like SELECT * FROM a JOIN b ON a.x = b.x WHERE a.y = $1, the generic plan might choose, say, a nested loop with b on the inner side, but if we know that a particular value for $1 will match a lot of rows in a, we might prefer a hash or merge join for that specific case. Run-time pruning doesn't give us that flexibility. My intuition is that the more complex we make the query, the more point there will be to making custom plans, and the simpler the query, the more likely it is that a generic plan will be good enough that it's not worth replanning every time.

Now, in my experience, the current system for custom plans vs. generic plans doesn't approach the problem in this way at all, and that results in some pretty terrible behavior. It will do things like form a custom plan every time because the estimated cost of the custom plan is lower than the estimated cost of the generic plan, even though the two plans are structurally identical and only the estimates differ. It will waste gobs of CPU cycles by replanning a primary key lookup 5 times just on the off chance that a lookup on the primary key index isn't the best option. But this patch isn't going to fix any of that. The best we can probably do is try to adjust the costing for Append paths in some way that reflects the costs and benefits of pruning. I'm tentatively in favor of trying to do something modest in that area, but I don't have a detailed proposal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
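[Editor's note] The generic-vs-custom plan behavior criticized above can be sketched roughly as follows. This is a loose model of the decision PostgreSQL's plan cache makes for prepared statements (replan with actual parameter values for the first few executions, then compare average custom-plan cost against the generic plan's cost); the threshold constant and exact shape here are assumptions, not the actual plancache implementation.

```python
def should_replan(num_custom_plans: int,
                  avg_custom_cost: float,
                  generic_cost: float) -> bool:
    """True means: build a fresh custom plan using the actual parameter
    values; False means: reuse the cached generic plan. (Sketch only.)"""
    if num_custom_plans < 5:
        # The first several executions are always custom-planned,
        # even for something as simple as a primary key lookup.
        return True
    # Afterwards, keep custom-planning only if the generic plan looks
    # noticeably worse than the average custom plan (fudge factor assumed).
    return generic_cost > avg_custom_cost * 1.1
```

Note that this comparison is purely cost-based: it never asks whether the custom and generic plans are structurally different, which is exactly the behavior the email complains about.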