Re: sqlsmith crash incremental sort - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: sqlsmith crash incremental sort
Date
Msg-id 20200416184416.vcmmy22zfmov54kl@development
Whole thread Raw
In response to Re: sqlsmith crash incremental sort  (James Coleman <jtc331@gmail.com>)
Responses Re: sqlsmith crash incremental sort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: sqlsmith crash incremental sort  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On Thu, Apr 16, 2020 at 12:04:03PM -0400, James Coleman wrote:
>On Thu, Apr 16, 2020 at 8:22 AM Richard Guo <guofenglinux@gmail.com> wrote:
>>
>>
>> On Thu, Apr 16, 2020 at 6:35 PM Richard Guo <guofenglinux@gmail.com> wrote:
>>>
>>>
>>> On Wed, Apr 15, 2020 at 10:47 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>>
>>>>
>>>> Well, yeah. The problem is the Unique simply compares the columns in the
>>>> order it sees them, and it does not match the column order desired by
>>>> incremental sort. But we don't push down this information at all :-(
>>>
>>>
>>> This is a nice optimization better to have. Since the 'Sort and Unique'
>>> would unique-ify the result of a UNION by sorting on all columns, why
>>> not we adjust the sort order trying to match parse->sortClause so that
>>> we can avoid the final sort node?
>>>
>>> Doing that we can transform plan from:
>>>
>>> # explain (costs off) select * from foo union select * from foo order by 1,3;
>>>                   QUERY PLAN
>>> -----------------------------------------------
>>>  Incremental Sort
>>>    Sort Key: foo.a, foo.c
>>>    Presorted Key: foo.a
>>>    ->  Unique
>>>          ->  Sort
>>>                Sort Key: foo.a, foo.b, foo.c
>>>                ->  Append
>>>                      ->  Seq Scan on foo
>>>                      ->  Seq Scan on foo foo_1
>>> (9 rows)
>>>
>>> To:
>>>
>>> # explain (costs off) select * from foo union select * from foo order by 1,3;
>>>                QUERY PLAN
>>> -----------------------------------------
>>>  Unique
>>>    ->  Sort
>>>          Sort Key: foo.a, foo.c, foo.b
>>>          ->  Append
>>>                ->  Seq Scan on foo
>>>                ->  Seq Scan on foo foo_1
>>> (6 rows)
>>>
>>
>> Attached is what I'm thinking about this optimization. Does it make any
>> sense?
>
>Shouldn't this go one either a new thread or on the thread for the
>patch Tomas was referencing (by Teodor I believe)?
>

FWIW the optimization I had in mind is this:

   https://commitfest.postgresql.org/21/1651/

I now realize that was about GROUP BY, but it's not all that different
and the concerns will / should be fairly similar, I think.

IMO simply tweaking the sort keys to match the upper parts of the plan
is probably way too simplistic, I'm afraid. For example, if the Unique
significantly reduces cardinality, then the cost of the additional sort
is much less important. It may be much better to optimize the "large"
sort of the whole data set, either by reordering the columns as proposed
by Teodor in his patch (by number of distinct values and/or cost of the
comparison function function).

Furthermore, this is one of the places that is not using incremental
sort yet - I can easily imagine doing something like this:


    Sort
      -> Unique
         -> Incremenal Sort
       -> ...

could be a massive win. So I think we can't just rejigger the sort keys
abitrarily, we should / need to consider those alternatives.

>Or are you saying you believe this patch guarantees we never see this
>problem in incremental sort costing?
>

Yeah, that's not entirely close to me. But maybe it shows us where we to
get the unprocessed target list?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Next
From: Tom Lane
Date:
Subject: Re: "cache reference leak" issue happened when using sepgsql module