Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200408155703.evmgiyjeffyhklui@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Wed, Apr 08, 2020 at 11:13:26AM -0400, James Coleman wrote:
>On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
>> >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
>> >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
>> >><tomas.vondra@2ndquadrant.com> wrote:
>> >>>
>> >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
>> >>>>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>> >>>>>hyrax is not too happy with this test:
>> >>>>>
>> >>>>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
>> >>>>>
>> >>>>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>> >>>>>EXPLAIN output, but it evidently is.
>> >>>>>
>> >>>>
>> >>>>Thanks, I'll investigate. It's not clear to me either what might be
>> >>>>causing this, but I guess something must have gone wrong in
>> >>>>estimation/planning.
>> >>>>
>> >>>
>> >>>OK, I know what's going on - it's a rather embarassing issue in the
>> >>>regression test. There's no analyze on the test tables, so it uses
>> >>>default estimates for number of groups etc. But with clobber cache the
>> >>>test runs long enough for autoanalyze to kick in and collect stats, so
>> >>>we generate better estimates which changes the plan.
>> >>>
>> >>>I'll get this fixed - explicit analyze and tweaking the data a bit
>> >>>should do the trick.
>> >>
>> >>Looking at the tests that failed, I think we should consider just adding:
>> >>set enable_sort = off;
>> >>because several of those tests have very specific amounts of data to
>> >>ensure we test the transition points around the different modes in the
>> >>incremental sort node.
>> >>
>> >
>> >Maybe, but I'd much rather tweak the data so that we test both the
>> >costing and execution part.
>> >
>>
>> I do think this does the trick by increasing the number of rows a bit
>> (from 100 to 1000) to make the Sort more expensive than Incremental
>> Sort, while still testing the transition points.
>>
>> James, can you verify it that's still true?
>
>Those changes all look good to me from a "testing correctness" POV.
>Also I like that we now test multiple sort methods in the explain
>output, like: "Sort Methods: top-N heapsort, quicksort".
>

OK, good. I'll push the fix.

>I personally find the `i/100` notation harder to read than a case, but
>that's just an opinion...
>

Yeah, but with 1000 rows we'd need a more complex CASE statement (I
don't think simply having two groups - small+large would work).

>Should we change `analyze` to `analyze t` to avoid unnecessarily
>re-analyzing all other tables in the regression db?
>

Ah, definitely. That was a mistake. Thanks for noticing.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: Function to track shmem reinit time
Next
From: Robert Haas
Date:
Subject: Re: Vacuum o/p with (full 1, parallel 0) option throwing an error