Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20200328185406.vizs5jndlhgxbh26@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote: >On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> ... >> >> As a side note here, I'm wondering if this (determining useful pathkeys) >> >> can be made a bit smarter by looking both at query_pathkeys and pathkeys >> >> useful for merging, similarly to what truncate_useless_pathkeys() does. >> >> But that can be seen as an improvement of what we do now. >> > >> >Unless your comment below about looking at truncate_useless_pathkeys >> >is implying you're considering aiming to get this in now, I wonder if >> >we should just expand the comment to reference pathkeys useful for >> >merging as a possible future extension. >> > >> >> Maybe. I've been thinking about how to generate those path keys, but >> it's a bit tricky, because pathkeys_useful_for_merging() - the thing >> that backs truncate_useless_pathkeys() actually gets pathkeys and merely >> verifies if those are useful for merging. But this code needs to do the >> opposite - generate those pathkeys. >> >> But let's say we know we have a join on columns (a,b,c). For each of >> those PathKey values we know it's useful for merging, but should we >> generate pathkeys (a,b,c), (b,c,a), ... or any other permutation? I >> suppose we can look at pathkeys for existing paths of the relation to >> prune the possibilities a bit, but what then? > >I'm not convinced it's worth it this time around. Both because of the >late hour in the CF etc., but also because it seems likely to become >pretty complex quickly, and also far more likely to raise performance >questions in planning if there's not a lot of work done to keep it >limited. > Agreed. There'll be time to add more optimizations in the future. >> BTW I wonder if we actually need the ec_has_volatile check in >> get_useful_pathkeys_for_relation. The comment says we can't but is it >> really true? pathkeys_useful_for_ordering doesn't do any such checks, so >> I'd bet this is merely an unnecessary copy-paste from postgres_fdw. >> Opinions? > >I hadn't really looked at that part in depth before, but after reading >it over, re-reading the definition of volatility, and running a few >basic queries, I agree. > >For example: we already do allow volatile pathkeys in a simple query like: >-- t(a, b) with index on (a) >select * from t order by a, random(); > >It makes sense that you couldn't push down such a sort to a foreign >server, given there's no constraints said function isn't operating >directly on the primary database (in the fdw case). But that obviously >wouldn't apply here. > Thanks for confirming my reasoning. >> >> >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation: >> >> >* Considering query_pathkeys is always worth it, because it might let us >> >> >* avoid a local sort. >> >> > >> >> >That originally was a copy from the fdw code, but since the two >> >> >functions have diverged (Is that concerning? I could be confusing, but >> >> >isn't a compilation problem) I didn't move the function. >> >> > >> >> >> >> I think it's OK the two functions diverged, it's simply because the FDW >> >> one needs to check other things too. But I might rework this once I look >> >> closer at truncate_useless_pathkeys. >> > >> >Agreed, for now at least. It's tempting to think they should always be >> >shared, but I'm not convinced (without a lot more digging) that this >> >represents structural rather than incidental duplication. >> > >> >> The more I look at pathkeys_useful_for_ordering() the more I think the >> get_useful_pathkeys_for_relation() function should look more like it >> than the postgres_fdw one ... > >If we go down that path (and indeed this is actually implied by >removing the volatile check too), doesn't that really just mean (at >least for now) that get_useful_pathkeys_for_relation goes away >entirely and we effectively use root->query_pathkeys directly? Yes, basically. >The only thing your lose them is the check that each eclass has a >member in the rel. But that check probably wasn't quite right anyway: >at least for incremental sort (maybe not regular sort), I think we >probably don't necessarily care that the entire list has members in the >rel, but rather that some prefix does, right? Probably, although I always forget how exactly this EC business works. My reasoning is more "If pathkeys_useful_for_ordering does not need that, why should we need it here?" >The idea there would be that we could create a gather merge here that >supplies a partial ordering (that prefix of query_pathkeys) and then >above it the planner might place another incremental sort (say above a >join), or perhaps even a join above that would actually be sufficient >(since many joins are capable of providing an ordering anyway). > Not sure. Isn't the idea that we do the Incremental Sort below the Gather Merge, because then it's actually done in parallel? >I've attached (added prefix .txt to avoid the cfbot assuming this is a >full series) an idea in that direction to see if we're thinking along >the same route. You'll want to apply and view with `git diff -w` >probably. > >The attached also adds a few XXX comments. In particular, I wonder if >we should verify that some prefix of the root->query_pathkeys is >actually made up of eclass members for the current rel, because >otherwise I think we can skip the loop on the subpaths entirely. > Hmmm, that's an interesting possible optimization. I wonder if that actually saves anything, though, because the number of paths in the loop is likely fairly low. >> >> >I did notice though that find_em_expr_for_rel() is wholesale >> >> >copied (and unchanged) from the fdw code, so I moved it to >> >> >equivclass.c so both places can share it. >> >> > >> >> >> >> +1 >> >> >> >> ... which would also get rid of find_em_expr_for_rel(). > >... which, I think, would retain the need for find_em_expr_for_rel(). > >Note: The attached applied to the previous series compiles and runs >make check...but I haven't really tested it; it's meant more for "is >this the direction we want to go". > Thanks, I'll take a look. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: