Home > mailing lists

Re: sqlsmith crash incremental sort - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: sqlsmith crash incremental sort
Date	April 16, 2020 18:44:16
Msg-id	20200416184416.vcmmy22zfmov54kl@development Whole thread
In response to	Re: sqlsmith crash incremental sort (James Coleman <jtc331@gmail.com>)
Responses	Re: sqlsmith crash incremental sort Re: sqlsmith crash incremental sort
List	pgsql-hackers

Tree view

On Thu, Apr 16, 2020 at 12:04:03PM -0400, James Coleman wrote:
>On Thu, Apr 16, 2020 at 8:22 AM Richard Guo <guofenglinux@gmail.com> wrote:
>>
>>
>> On Thu, Apr 16, 2020 at 6:35 PM Richard Guo <guofenglinux@gmail.com> wrote:
>>>
>>>
>>> On Wed, Apr 15, 2020 at 10:47 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>>
>>>>
>>>> Well, yeah. The problem is the Unique simply compares the columns in the
>>>> order it sees them, and it does not match the column order desired by
>>>> incremental sort. But we don't push down this information at all :-(
>>>
>>>
>>> This is a nice optimization better to have. Since the 'Sort and Unique'
>>> would unique-ify the result of a UNION by sorting on all columns, why
>>> not we adjust the sort order trying to match parse->sortClause so that
>>> we can avoid the final sort node?
>>>
>>> Doing that we can transform plan from:
>>>
>>> # explain (costs off) select * from foo union select * from foo order by 1,3;
>>>                   QUERY PLAN
>>> -----------------------------------------------
>>>  Incremental Sort
>>>    Sort Key: foo.a, foo.c
>>>    Presorted Key: foo.a
>>>    ->  Unique
>>>          ->  Sort
>>>                Sort Key: foo.a, foo.b, foo.c
>>>                ->  Append
>>>                      ->  Seq Scan on foo
>>>                      ->  Seq Scan on foo foo_1
>>> (9 rows)
>>>
>>> To:
>>>
>>> # explain (costs off) select * from foo union select * from foo order by 1,3;
>>>                QUERY PLAN
>>> -----------------------------------------
>>>  Unique
>>>    ->  Sort
>>>          Sort Key: foo.a, foo.c, foo.b
>>>          ->  Append
>>>                ->  Seq Scan on foo
>>>                ->  Seq Scan on foo foo_1
>>> (6 rows)
>>>
>>
>> Attached is what I'm thinking about this optimization. Does it make any
>> sense?
>
>Shouldn't this go one either a new thread or on the thread for the
>patch Tomas was referencing (by Teodor I believe)?
>

FWIW the optimization I had in mind is this:

   https://commitfest.postgresql.org/21/1651/

I now realize that was about GROUP BY, but it's not all that different
and the concerns will / should be fairly similar, I think.

IMO simply tweaking the sort keys to match the upper parts of the plan
is probably way too simplistic, I'm afraid. For example, if the Unique
significantly reduces cardinality, then the cost of the additional sort
is much less important. It may be much better to optimize the "large"
sort of the whole data set, either by reordering the columns as proposed
by Teodor in his patch (by number of distinct values and/or cost of the
comparison function function).

Furthermore, this is one of the places that is not using incremental
sort yet - I can easily imagine doing something like this:

    Sort
      -> Unique
         -> Incremenal Sort
       -> ...

could be a massive win. So I think we can't just rejigger the sort keys
abitrarily, we should / need to consider those alternatives.

>Or are you saying you believe this patch guarantees we never see this
>problem in incremental sort costing?
>

Yeah, that's not entirely close to me. But maybe it shows us where we to
get the unprocessed target list?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Andres Freund
Date: 16 April 2020, 18:27:14
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false

From: Tom Lane
Date: 16 April 2020, 18:47:00
Subject: Re: "cache reference leak" issue happened when using sepgsql module

Re: sqlsmith crash incremental sort - Mailing list pgsql-hackers

Previous

Next