Re: index scan with index cond on first column doesn't recognize sort order of second column - Mailing list pgsql-general

From Greg Stark
Subject Re: index scan with index cond on first column doesn't recognize sort order of second column
Date
Msg-id 877kc3jxsb.fsf@stark.dyndns.tv
Whole thread Raw
In response to Re: index scan with index cond on first column doesn't recognize sort order of second column  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: index scan with index cond on first column doesn't recognize sort order of second column  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: index scan with index cond on first column doesn't recognize sort order of second column  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > But it might be better ordered. I think you may have missed the original
> > context. In a query like:
> >   SELECT * FROM table_a join table_b using (col2) WHERE a.col1 = $0
> > Where there's an index on table_a(col1,col2) could use a merge join without a
> > sort.
>
> Yes.  The appropriate way to recognize that (without breaking things)
> is to consider that the indexscan generates pathkeys (col2) because col1
> can be assumed constant.  This is not truncate_useless_pathkeys's
> business, however.  It would have to be done in the code that creates
> the indexscan path to begin with.

Er, right, I think that's the code I originally quoted from indxpath.c:

        /*
         * 2. Match the index against non-'or' restriction clauses.
         */
        restrictclauses = group_clauses_by_indexkey(rel, index);

        /*
         * 3. Compute pathkeys describing index's ordering, if any, then
         * see how many of them are actually useful for this query.
         */
        index_pathkeys = build_index_pathkeys(root, rel, index,
                                              ForwardScanDirection);
        index_is_ordered = (index_pathkeys != NIL);
        useful_pathkeys = truncate_useless_pathkeys(root, rel,
                                                    index_pathkeys);


> (Ideally, we'd mark the path to show that it could be considered to have
> either sort ordering (col1, col2) or sort ordering (col2), so that it could
> match to either requirement. I'm not sure if that's feasible with the
> current datastructures though. Might have to make two copies of the path
> :-()

I don't think that's a problem because the optimizer seems to be doing a
pretty good job of constant propagation and it tracks down the transitive
closure of all the equijoin clauses. So if we have a col1=$0 clause for one
table then any other tables that are equijoined on col1 will also have a
col1=$0 clause. I'm hoping that's been done already, I'm not sure.

> Actually you are barking up the wrong tree entirely; I'm pretty certain
> that truncate_useless_pathkeys *doesn't* remove this pathkey, because it
> should notice that it is relevant to the mergejoin condition.

<tries gdb out on postgres> Hm... <rebuilds postgres with -O0> Hm...

No, truncate_useless_pathkeys actually returns NULL. It appears it decides the
pathkey is actually completely useless for the mergejoin and both
pathkeys_useful_for_merging and pathkeys_useful_for_ordering return 0. I guess
ltruncate returns NULL if the first argument is 0.

I think indxpath.c has to try truncate_useless_pathkeys and if it returns null
try removing any leading elements that it has an = clause for. It should try
truncate_useless_pathkeys after removing each element. I think it can just
stop after finding the first match.

Basic postgres mm question, is there a cdr function, and is it safe to leave
references to the cdr of this list around? It all gets cleaned up with a
ripcord deallocator?

--
greg

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: index scan with index cond on first column doesn't recognize sort order of second column
Next
From: Tom Lane
Date:
Subject: Re: index scan with index cond on first column doesn't recognize sort order of second column