Re: Top-N sorts verses parallelism - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Top-N sorts verses parallelism
Date
Msg-id CAMkU=1xS2ucXoFcHsw8=joUi5wz+07AarR8ScdtQNw+yeBHi4g@mail.gmail.com
Whole thread Raw
In response to Re: Top-N sorts verses parallelism  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Top-N sorts verses parallelism  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Dec 14, 2017 at 5:12 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> This looks like a costing bug.  The estimated cost of sorting 416,667
> estimated tuples in one parallel worker is almost identical to the estimated
> cost of sorting 1,000,000 tuples when parallelism is turned off.  Adding
> some logging to the cost_sort function, it looks like the limit does not get
> sent down for the parallel estimate:
>
> NOTICE:  JJ cost_sort tuples 1000000.000000, limit 61.000000, sort_mem 65536
> NOTICE:  JJ cost_sort tuples 416667.000000, limit -1.000000, sort_mem 65536
>
> So the LIMIT gets passed down for the execution step, but not for the
> planning step.

Oh, well spotted.  I was looking in the wrong place.  Maybe the fix is
as simple as the attached?  It certainly helps in the cases I tested,
including with wider tables.

I had hit on the same change.  And was also surprised that it was located where it was.  With the change, it uses the parallel plan all the way down to LIMIT 1.

With the patch, it still satisfies make check, so if it introduces errors they are subtle ones.  If we can't actually do this and it needs to stay -1, then I think we need a comment to explain why.

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: Re: Reproducible builds: genbki.pl vs schemapg.h
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] replace GrantObjectType with ObjectType