Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant - Mailing list pgsql-hackers

From David Rowley
Subject Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
Date
Msg-id CAApHDvpZRWiWE8N8BW+cYndYuWTZ15wrg941mRYNnSkMBWBu7Q@mail.gmail.com
Whole thread Raw
In response to Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On Thu, 13 Oct 2022 at 00:30, Richard Guo <guofenglinux@gmail.com> wrote:
> I also have concerns about the 2 Limit nodes pointed by the comment
> inside the patch. Maybe we can check with limit_needed() and manually
> add the limit node only if there is no LIMIT clause in the origin query?

I wasn't hugely concerned about this. I think we're a little limited
to what we can actually do about it too.

It seems easy enough to skip adding the LimitPath in
create_final_distinct_paths() if the existing query already has
exactly LIMIT 1. However, if they've written LIMIT 10 or LIMIT
random()*1234 then we must add the LimitPath to ensure we only get 1
row rather than 10 or some random number.

As for getting rid of the LIMIT 10 / LIMIT random()*1234, we store the
LIMIT clause information in the parse and currently that's what the
planner uses when creating the LimitPath for the LIMIT clause.  I'd
quite like to avoid making any adjustments to the parse fields here.
(There's a general project desire to move away from the planner
modifying the parse. If we didn't do that we could do things like
re-plan queries with stronger optimization levels when they come out
too costly.)

We could do something like set some bool flag in PlannerInfo to tell
the planner not to bother adding the final LimitPath as we've already
added another which does the job, but is it really worth adding that
complexity for this patch? You already mentioned that "this patch is
very straightforward". I don't think it would be if we added code to
avoid the LimitPath duplication.

David



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: problems with making relfilenodes 56-bits
Next
From: David Rowley
Date:
Subject: Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant