Re: sqlsmith crash incremental sort - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: sqlsmith crash incremental sort
Date
Msg-id 20200416231300.5mkhjzs6ki4yntto@development
Whole thread Raw
In response to Re: sqlsmith crash incremental sort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: sqlsmith crash incremental sort
List pgsql-hackers
On Thu, Apr 16, 2020 at 08:44:16PM +0200, Tomas Vondra wrote:
>On Thu, Apr 16, 2020 at 12:04:03PM -0400, James Coleman wrote:
>>On Thu, Apr 16, 2020 at 8:22 AM Richard Guo <guofenglinux@gmail.com> wrote:
>>>
>>>
>>>On Thu, Apr 16, 2020 at 6:35 PM Richard Guo <guofenglinux@gmail.com> wrote:
>>>>
>>>>
>>>>On Wed, Apr 15, 2020 at 10:47 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>>>
>>>>>
>>>>>Well, yeah. The problem is the Unique simply compares the columns in the
>>>>>order it sees them, and it does not match the column order desired by
>>>>>incremental sort. But we don't push down this information at all :-(
>>>>
>>>>
>>>>This is a nice optimization better to have. Since the 'Sort and Unique'
>>>>would unique-ify the result of a UNION by sorting on all columns, why
>>>>not we adjust the sort order trying to match parse->sortClause so that
>>>>we can avoid the final sort node?
>>>>
>>>>Doing that we can transform plan from:
>>>>
>>>># explain (costs off) select * from foo union select * from foo order by 1,3;
>>>>                  QUERY PLAN
>>>>-----------------------------------------------
>>>> Incremental Sort
>>>>   Sort Key: foo.a, foo.c
>>>>   Presorted Key: foo.a
>>>>   ->  Unique
>>>>         ->  Sort
>>>>               Sort Key: foo.a, foo.b, foo.c
>>>>               ->  Append
>>>>                     ->  Seq Scan on foo
>>>>                     ->  Seq Scan on foo foo_1
>>>>(9 rows)
>>>>
>>>>To:
>>>>
>>>># explain (costs off) select * from foo union select * from foo order by 1,3;
>>>>               QUERY PLAN
>>>>-----------------------------------------
>>>> Unique
>>>>   ->  Sort
>>>>         Sort Key: foo.a, foo.c, foo.b
>>>>         ->  Append
>>>>               ->  Seq Scan on foo
>>>>               ->  Seq Scan on foo foo_1
>>>>(6 rows)
>>>>
>>>
>>>Attached is what I'm thinking about this optimization. Does it make any
>>>sense?
>>
>>Shouldn't this go one either a new thread or on the thread for the
>>patch Tomas was referencing (by Teodor I believe)?
>>
>
>FWIW the optimization I had in mind is this:
>
>  https://commitfest.postgresql.org/21/1651/
>
>I now realize that was about GROUP BY, but it's not all that different
>and the concerns will / should be fairly similar, I think.
>
>IMO simply tweaking the sort keys to match the upper parts of the plan
>is probably way too simplistic, I'm afraid. For example, if the Unique
>significantly reduces cardinality, then the cost of the additional sort
>is much less important. It may be much better to optimize the "large"
>sort of the whole data set, either by reordering the columns as proposed
>by Teodor in his patch (by number of distinct values and/or cost of the
>comparison function function).
>
>Furthermore, this is one of the places that is not using incremental
>sort yet - I can easily imagine doing something like this:
>
>
>   Sort
>     -> Unique
>        -> Incremenal Sort
>       -> ...
>
>could be a massive win. So I think we can't just rejigger the sort keys
>abitrarily, we should / need to consider those alternatives.
>
>>Or are you saying you believe this patch guarantees we never see this
>>problem in incremental sort costing?
>>
>
>Yeah, that's not entirely close to me. But maybe it shows us where we to
>get the unprocessed target list?
>

I think at the very least this needs to apply the same change also to
generate_nonunion_paths, because otherwise this fails because of the
same issue:

   set enable_hashagg = off;
   explain select * from foo except select * from foo order by 1, 3;

I'm still of the opinion that this is really an optimization and
behavior change, and I feel rather uneasy about pushing it post feature
freeze without appropriate review. I also think we really ought to
consider how would this work with the other optimizations I outlined
elsewhere in this thread (comparison costs, ...).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Poll: are people okay with function/operator table redesign?
Next
From: Ranier Vilela
Date:
Subject: [PATCH] Fix possible overflow on tuplesort.c