Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id CAAaqYe-cn3iBMX+6WD_fFOwxaZYCa7vuKpi8aqddOmoB54EBHw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Incremental sort
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
> >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
> >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
> >><tomas.vondra@2ndquadrant.com> wrote:
> >>>
> >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> >>>>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >>>>>hyrax is not too happy with this test:
> >>>>>
> >>>>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
> >>>>>
> >>>>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >>>>>EXPLAIN output, but it evidently is.
> >>>>>
> >>>>
> >>>>Thanks, I'll investigate. It's not clear to me either what might be
> >>>>causing this, but I guess something must have gone wrong in
> >>>>estimation/planning.
> >>>>
> >>>
> >>>OK, I know what's going on - it's a rather embarassing issue in the
> >>>regression test. There's no analyze on the test tables, so it uses
> >>>default estimates for number of groups etc. But with clobber cache the
> >>>test runs long enough for autoanalyze to kick in and collect stats, so
> >>>we generate better estimates which changes the plan.
> >>>
> >>>I'll get this fixed - explicit analyze and tweaking the data a bit
> >>>should do the trick.
> >>
> >>Looking at the tests that failed, I think we should consider just adding:
> >>set enable_sort = off;
> >>because several of those tests have very specific amounts of data to
> >>ensure we test the transition points around the different modes in the
> >>incremental sort node.
> >>
> >
> >Maybe, but I'd much rather tweak the data so that we test both the
> >costing and execution part.
> >
>
> I do think this does the trick by increasing the number of rows a bit
> (from 100 to 1000) to make the Sort more expensive than Incremental
> Sort, while still testing the transition points.
>
> James, can you verify it that's still true?

Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".

I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...

Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?

James



pgsql-hackers by date:

Previous
From: Kuntal Ghosh
Date:
Subject: Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
Next
From: David Steele
Date:
Subject: Re: Conflict handling for COPY FROM