Thread: FETCH FIRST clause WITH TIES option

FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:

hello ,

The WITH TIES keyword is sql standard that specifies any peers of retained rows

to retained in the result set too .which means according to ordering key the result set can includes additional rows which have ties on the last position, if there are any and It work with ORDER BY query

The attach patch is a finished implementation of it except it did not have a regression test because am not sure of changing the test data set for it and I can’t find fetch first clause regression test too.

Regards

Surafel

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
Hello Surafel,

On 10/26/2018 12:28 PM, Surafel Temesgen wrote:
> hello ,
> 
> The WITH TIES keyword is sql standard that specifies any peers of 
> retained rows to retained in the result set too .which means 
> according to ordering key the result set can includes additional rows
> which have ties on the last position, if there are any and It work
> with ORDER BY query.
>

Thanks for the patch. I've looked at it today, and it seems mostly OK,
with a couple of minor issues. Most of it is code formatting and comment
wording issues, so I'm not going to go through them here - see the
attached 0002 patch (0001 is your patch, rebased to current master).

There's one place that I think is incorrect, and that's this bit from
ExecLimit:

    }else
    /*
     * Get next tuple from subplan, if any.
     */
    slot = ExecProcNode(outerPlan);
    if (TupIsNull(slot))
    {
        node->lstate = LIMIT_SUBPLANEOF;
        return NULL;
    }
    if (node->limitOption == WITH_TIES)
    {
        ExecCopySlot(node->last_slot, slot);
    }
    node->subSlot = slot;
    node->position++;

Note that the "else" guards only the very first line, not the whole
block. This seems wrong, i.e. there should be {} around the whole block.

I'm also suggesting to slightly change the create_limit_plan(), to keep
a single make_limit call. IMHO that makes it easier to understand,
although I admit it's fairly subjective.

One question is whether to make this work for LIMIT too, not just for
FETCH FIRST. I'm not sure how difficult would that be (it should be a
grammar-only tweak I guess), or if it's a good idea in general. But I
find it quite confusing that various comments say things like this:

  LimitOption  limitOption; /* LIMIT with ties or  exact number */

while in fact it does not work with LIMIT.

> 
> The attach patch is a finished implementation of it except it did not
> have a regression test because am not sure of changing the test data set
> for it and I can’t find fetch first clause regression test too.
> 

Well, that's not a very good reason not to have tests for this new
improvement. FWIW there are a couple of places in regression tests where
FETCH FIRST is used, see this:

  $ grep -i 'fetch first' src/test/regress/sql/*
  src/test/regress/sql/hs_standby_allowed.sql:fetch first from hsc;
  src/test/regress/sql/tablesample.sql:FETCH FIRST FROM tablesample_cur;
  src/test/regress/sql/tablesample.sql:FETCH FIRST FROM tablesample_cur;
  src/test/regress/sql/tidscan.sql:FETCH FIRST FROM c;

But those places don't seem like very good match for testing the new
stuff, because those are primarily testing something else. I suggest we
add the new tests into limit.sql.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Andrew Gierth
Date:
>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

 > On 10/26/2018 12:28 PM, Surafel Temesgen wrote:
 >> hello ,
 >> 
 >> The WITH TIES keyword is sql standard that specifies any peers of 
 >> retained rows to retained in the result set too .which means 
 >> according to ordering key the result set can includes additional rows
 >> which have ties on the last position, if there are any and It work
 >> with ORDER BY query.

 Tomas> Thanks for the patch. I've looked at it today, and it seems
 Tomas> mostly OK, with a couple of minor issues. Most of it is code
 Tomas> formatting and comment wording issues, so I'm not going to go
 Tomas> through them here - see the attached 0002 patch (0001 is your
 Tomas> patch, rebased to current master).

I still think that this is the wrong approach. Implementing WITH TIES
and PERCENT together using an implicit window function call kills two
birds with one very small stone (the only executor change needed would
be teaching LIMIT to be able to stop on a boolean condition), with
maximum reuse of existing facilities.

-- 
Andrew (irc:RhodiumToad)


Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On 10/29/2018 04:17 PM, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> 
>   > On 10/26/2018 12:28 PM, Surafel Temesgen wrote:
>   >> hello ,
>   >>
>   >> The WITH TIES keyword is sql standard that specifies any peers of
>   >> retained rows to retained in the result set too .which means
>   >> according to ordering key the result set can includes additional rows
>   >> which have ties on the last position, if there are any and It work
>   >> with ORDER BY query.
> 
>   Tomas> Thanks for the patch. I've looked at it today, and it seems
>   Tomas> mostly OK, with a couple of minor issues. Most of it is code
>   Tomas> formatting and comment wording issues, so I'm not going to go
>   Tomas> through them here - see the attached 0002 patch (0001 is your
>   Tomas> patch, rebased to current master).
> 
> I still think that this is the wrong approach. Implementing WITH TIES
> and PERCENT together using an implicit window function call kills two
> birds with one very small stone (the only executor change needed would
> be teaching LIMIT to be able to stop on a boolean condition), with
> maximum reuse of existing facilities.
> 

Hmmm, maybe. How would that work, exactly? Wouldn't that mean extra 
overhead (the window functions are hardly free) and limitations? Perhaps 
that was discussed in some other thread in the past?

FWIW, I doubt the patch can be much smaller/simpler - a significant part 
of the new stuff is in gram.y and node read/out infrastructure, the 
changes to LIMIT node are fairly minimal.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: FETCH FIRST clause WITH TIES option

From
Andrew Gierth
Date:
>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

 >> I still think that this is the wrong approach. Implementing WITH
 >> TIES and PERCENT together using an implicit window function call
 >> kills two birds with one very small stone (the only executor change
 >> needed would be teaching LIMIT to be able to stop on a boolean
 >> condition), with maximum reuse of existing facilities.

 Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean
 Tomas> extra overhead (the window functions are hardly free) and
 Tomas> limitations?

They're not free, but then this case probably shouldn't be treated as a
particularly hot code path.

The basic idea is this: extend the Limit node to be able to stop on a
boolean expression, such that the first false value ends the result.

Then FETCH FIRST N WITH TIES becomes "stop when the expression
  rank() over (order by ...) <= N
is no longer true" (where the ... is the existing top level order by)

and FETCH FIRST N PERCENT is the same but with percent_rank (or
cume_dist, I'd have to check the exact semantics)

rank() doesn't need to read ahead in the input. percent_rank of course
does, but it's clearly impossible to implement PERCENT without doing
that.

Also, one could consider extending LIMIT to support arbitrary
expressions, with some syntax like LIMIT WHEN (condition) which would be
the general form.

 Tomas> FWIW, I doubt the patch can be much smaller/simpler - a
 Tomas> significant part of the new stuff is in gram.y and node read/out
 Tomas> infrastructure, the changes to LIMIT node are fairly minimal.

It's not a matter of patch size as such, but reuse of mechanisms rather
than adding new ones. Also doing WITH TIES as a special case in the
Limit node doesn't make adding PERCENT any easier at all, whereas the
above gets it basically for free.

-- 
Andrew (irc:RhodiumToad)


Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On 10/29/2018 05:47 PM, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> 
>   >> I still think that this is the wrong approach. Implementing WITH
>   >> TIES and PERCENT together using an implicit window function call
>   >> kills two birds with one very small stone (the only executor change
>   >> needed would be teaching LIMIT to be able to stop on a boolean
>   >> condition), with maximum reuse of existing facilities.
> 
>   Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean
>   Tomas> extra overhead (the window functions are hardly free) and
>   Tomas> limitations?
> 
> They're not free, but then this case probably shouldn't be treated as a
> particularly hot code path.
> 
> The basic idea is this: extend the Limit node to be able to stop on a
> boolean expression, such that the first false value ends the result.
> 
> Then FETCH FIRST N WITH TIES becomes "stop when the expression
>    rank() over (order by ...) <= N
> is no longer true" (where the ... is the existing top level order by)
> 
> and FETCH FIRST N PERCENT is the same but with percent_rank (or
> cume_dist, I'd have to check the exact semantics)
> 
> rank() doesn't need to read ahead in the input. percent_rank of course
> does, but it's clearly impossible to implement PERCENT without doing
> that.
> 

OK. I was under the impression the window function would need to see all 
the input rows, but that seems not to be the case in general.

> Also, one could consider extending LIMIT to support arbitrary
> expressions, with some syntax like LIMIT WHEN (condition) which would be
> the general form.
> 

Hmmm. I can't really recall needing such thing, but of course - that 
does not prove it'd not be a good thing.

>   Tomas> FWIW, I doubt the patch can be much smaller/simpler - a
>   Tomas> significant part of the new stuff is in gram.y and node read/out
>   Tomas> infrastructure, the changes to LIMIT node are fairly minimal.
> 
> It's not a matter of patch size as such, but reuse of mechanisms rather
> than adding new ones. Also doing WITH TIES as a special case in the
> Limit node doesn't make adding PERCENT any easier at all, whereas the
> above gets it basically for free.
> 

Makes sense, I guess. At first I was thinking that this certainly does 
not add more new mechanisms than allowing LIMIT to terminate on boolean 
expression. But you're right about the WITH PERCENT part - using window 
functions would make adding this much simpler.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: FETCH FIRST clause WITH TIES option

From
Robert Haas
Date:
On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> Then FETCH FIRST N WITH TIES becomes "stop when the expression
>   rank() over (order by ...) <= N
> is no longer true" (where the ... is the existing top level order by)

Wouldn't that be wicked expensive compared to something hard-coded
that does the same thing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On 10/31/2018 06:17 PM, Robert Haas wrote:
> On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
>> Then FETCH FIRST N WITH TIES becomes "stop when the expression
>>    rank() over (order by ...) <= N
>> is no longer true" (where the ... is the existing top level order by)
> 
> Wouldn't that be wicked expensive compared to something hard-coded
> that does the same thing?
> 

Not sure, but that was one of my concerns too, particularly for the 
simple FETCH FIRST N WITH TIES case. But I think Andrew has a point it 
would make it much easier to implement the PERCENT case.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:
hi,

The attached patch include all the comment given by Tomas and i check sql standard about LIMIT and this feature

it did not say anything about it but I think its good idea to include it to LIMIT too and I will add it if we have consensus on it.

regards

surafel

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:
Attach is rebased patch against the current master
regards
Surafel

On Thu, Nov 1, 2018 at 2:28 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
hi,

The attached patch include all the comment given by Tomas and i check sql standard about LIMIT and this feature

it did not say anything about it but I think its good idea to include it to LIMIT too and I will add it if we have consensus on it.

regards

surafel

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
Hi,

On 11/24/18 10:28 AM, Surafel Temesgen wrote:
> Attach is rebased patch against the current master
> regards
> Surafel
> 
> On Thu, Nov 1, 2018 at 2:28 PM Surafel Temesgen <surafel3000@gmail.com
> <mailto:surafel3000@gmail.com>> wrote:
> 
>     hi,
> 
>     The attached patch include all the comment given by Tomas and i
>     check sql standard about LIMIT and this feature
> 

Unfortunately, it seems the "limit" regression tests fail for some
reason - the output mismatches the expected results for some reason. It
seems as if the WITH TIES code affects ordering of the results within
the group. See the attached file.

>     it did not say anything about it but I think its good idea to
>     include it to LIMIT too and I will add it if we have consensus on it.
> 

Hmm, I'm not sure that's needed. I don't see an urgent need to do that
in v1 of the patch.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>     The attached patch include all the comment given by Tomas and i
>     check sql standard about LIMIT and this feature
>

Unfortunately, it seems the "limit" regression tests fail for some
reason - the output mismatches the expected results for some reason. It
seems as if the WITH TIES code affects ordering of the results within
the group. See the attached file.


Yes the reason is the order of returned row is not always the same. I remove other columns from the result set to get constant result

Regards

Surafel

 
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:

On 1/2/19 11:51 AM, Surafel Temesgen wrote:
> 
> 
> On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     >
>     >     The attached patch include all the comment given by Tomas and i
>     >     check sql standard about LIMIT and this feature
>     >
> 
>     Unfortunately, it seems the "limit" regression tests fail for some
>     reason - the output mismatches the expected results for some reason. It
>     seems as if the WITH TIES code affects ordering of the results within
>     the group. See the attached file.
> 
> 
> Yes the reason is the order of returned row is not always the same. I
> remove other columns from the result set to get constant result
> 

Thanks, that seems reasonable.

After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause). Consider a
query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate
will be 10, but in fact we know that it's likely to produce ~1000 rows
(because that's the average group size).

So I think the patch should tweak the estimate to be

  limitCount + (avgGroupSize/2);

or perhaps

  Max(avgGroupSize, limitCount + (avgGroupSize/2))

The 1/2 is there because we don't know where the group starts.

Of course, using average group size like this is rather crude, but it's
about the best thing we can do. In principle, increasing the cardinality
estimate is the right thing to do.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Wed, Jan 2, 2019 at 6:19 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause).

 

can we use ORDER BY column raw statistic in limit node reliably? because it seems to me it can be affected by other operation in a subplan like filter condition

regards

Surafel

Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:

On 1/15/19 11:07 AM, Surafel Temesgen wrote:
> 
> 
> On Wed, Jan 2, 2019 at 6:19 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
>     patch should tweak estimates in some way. Currently, the cardinality
>     estimate is the same as for plain LIMIT, using the requested number of
>     rows. But let's say there are very few large groups - that will
>     naturally increase the number of rows produced.
> 
>     As an example, let's say the subplan produces 1M rows, and there are
>     1000 groups (when split according to the ORDER BY clause).
> 
> 
>  
> 
> can we use ORDER BY column raw statistic in limit node reliably?
> because it seems to me it can be affected by other operation in a
> subplan like filter condition
> 

What do you mean by "raw statistic"? Using stats from the underlying
table is not quite possible, because you might be operating on top of
join or something like that.

IMHO the best thing you can do is call estimate_num_groups() and combine
that with the number of input rows. That shall benefit from ndistinct
coefficients when available, etc. I've been thinking that considering
the unreliability of grouping estimates we should use a multiple of the
average size (because there may be much larger groups), but I think
that's quite unprecipled and I'd much rather try without it first.

But maybe we can do better when there really is a single table to
consider, in which case we might look at MCV lists and estimate the
largest group. That would give us a much better idea of the worst case.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Tue, Jan 15, 2019 at 2:51 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
 
What do you mean by "raw statistic"?

I mean without further calculation to consider other operation 


IMHO the best thing you can do is call estimate_num_groups() and combine
that with the number of input rows. That shall benefit from ndistinct
coefficients when available, etc. I've been thinking that considering
the unreliability of grouping estimates we should use a multiple of the
average size (because there may be much larger groups), but I think
that's quite unprecipled and I'd much rather try without it first.


thank you for clarifying.

The attache patch use your suggestion uptread

regards

Surafel

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Michael Paquier
Date:
On Wed, Jan 16, 2019 at 11:45:46AM +0300, Surafel Temesgen wrote:
> The attached patch use your suggestion uptread

This patch needs a rebase because of conflicts done recently for
pluggable storage, so moved to next CF, waiting on author.
--
Michael

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Mon, Feb 4, 2019 at 8:29 AM Michael Paquier <michael@paquier.xyz> wrote:

This patch needs a rebase because of conflicts done recently for
pluggable storage, so moved to next CF, waiting on author.
--

Thank you . rebased against current master

regards
Surafel
Attachment

Re: Re: FETCH FIRST clause WITH TIES option

From
David Steele
Date:
On 2/4/19 2:37 PM, Surafel Temesgen wrote:
> 
> 
> On Mon, Feb 4, 2019 at 8:29 AM Michael Paquier <michael@paquier.xyz 
> <mailto:michael@paquier.xyz>> wrote:
> 
> 
>     This patch needs a rebase because of conflicts done recently for
>     pluggable storage, so moved to next CF, waiting on author.
>     --
> 
> 
> Thank you . rebased against current master

This patch no longer passes testing so marked Waiting on Author.

Regards,
-- 
-David
david@pgmasters.net


Re: Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Mon, Mar 25, 2019 at 11:56 AM David Steele <david@pgmasters.net> wrote:
This patch no longer passes testing so marked Waiting on Author.


Thank  you for informing. Fixed
Attachment

Re: Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:
>On Mon, Mar 25, 2019 at 11:56 AM David Steele <david@pgmasters.net> wrote:
>
>> This patch no longer passes testing so marked Waiting on Author.
>>
>>
>Thank  you for informing. Fixed

Thanks for the updated patch.  I do have this on my list of patches that
I'd like to commit in this CF - likely tomorrow after one more round of
review, or so.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On Fri, Mar 29, 2019 at 01:56:48AM +0100, Tomas Vondra wrote:
>On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:
>>On Mon, Mar 25, 2019 at 11:56 AM David Steele <david@pgmasters.net> wrote:
>>
>>>This patch no longer passes testing so marked Waiting on Author.
>>>
>>>
>>Thank  you for informing. Fixed
>
>Thanks for the updated patch.  I do have this on my list of patches that
>I'd like to commit in this CF - likely tomorrow after one more round of
>review, or so.
>

Hi,

I got to look at the patch today, with the intent to commit, but sadly I
ran into a couple of minor issues that I don't feel comfortable fixing
on my own. Attached is a patch highlighling some of the places (0001 is
your v7 patch, to keep the cfbot happy).


1) the docs documented this as

   ... [ ONLY | WITH TIES ]

but that's wrong, because it implies those options are optional (i.e.
the user may not specify anything). That's not the case, exactly one
of those options needs to be specified, so it should have been

   ... { ONLY | WITH TIES }


2) The comment in ExecLimit() needs to be updated to explain that WITH
TIES changes the behavior.


3) Minor code style issues (no space before * on comment lines, {}
around single-line if statements, ...).


4) The ExecLimit() does this

    if (node->limitOption == WITH_TIES)
        ExecCopySlot(node->last_slot, slot);

but I think we only really need to do that for the last tuple in the
window, no? Would it be a useful optimization?


5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
code actually prints uniqOperators. Secondly, why does the code use
these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
arrays? I haven't tested this, but looking at the READ_ counterparts, I
don't see why that would be the case.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On Sun, Mar 31, 2019 at 01:14:46AM +0100, Tomas Vondra wrote:
>On Fri, Mar 29, 2019 at 01:56:48AM +0100, Tomas Vondra wrote:
>>On Tue, Mar 26, 2019 at 10:46:00AM +0300, Surafel Temesgen wrote:
>>>On Mon, Mar 25, 2019 at 11:56 AM David Steele <david@pgmasters.net> wrote:
>>>
>>>>This patch no longer passes testing so marked Waiting on Author.
>>>>
>>>>
>>>Thank  you for informing. Fixed
>>
>>Thanks for the updated patch.  I do have this on my list of patches that
>>I'd like to commit in this CF - likely tomorrow after one more round of
>>review, or so.
>>
>
>Hi,
>
>I got to look at the patch today, with the intent to commit, but sadly I
>ran into a couple of minor issues that I don't feel comfortable fixing
>on my own. Attached is a patch highlighling some of the places (0001 is
>your v7 patch, to keep the cfbot happy).
>
>
>1) the docs documented this as
>
>  ... [ ONLY | WITH TIES ]
>
>but that's wrong, because it implies those options are optional (i.e.
>the user may not specify anything). That's not the case, exactly one
>of those options needs to be specified, so it should have been
>
>  ... { ONLY | WITH TIES }
>
>
>2) The comment in ExecLimit() needs to be updated to explain that WITH
>TIES changes the behavior.
>
>
>3) Minor code style issues (no space before * on comment lines, {}
>around single-line if statements, ...).
>
>
>4) The ExecLimit() does this
>
>   if (node->limitOption == WITH_TIES)
>       ExecCopySlot(node->last_slot, slot);
>
>but I think we only really need to do that for the last tuple in the
>window, no? Would it be a useful optimization?
>
>
>5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
>code actually prints uniqOperators. Secondly, why does the code use
>these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
>WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
>arrays? I haven't tested this, but looking at the READ_ counterparts, I
>don't see why that would be the case.
>

Actually, two more minor comments:

6) There's some confusing naming - in plannodes.h the fields added to
the Limit node are called uniqSomething, but in other places the patch
uses sortSomething, ordSomething. I suggest more consistent naming.

7) The LimitOption enum has two items - WITH_ONLY and WITH_TIES. That's
a bit strange, because there's nothing like "WITH ONLY".


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Sun, Mar 31, 2019 at 3:14 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


Hi,

I got to look at the patch today, with the intent to commit, but sadly I
ran into a couple of minor issues that I don't feel comfortable fixing
on my own. Attached is a patch highlighling some of the places (0001 is
your v7 patch, to keep the cfbot happy).


Thank you 

1) the docs documented this as

   ... [ ONLY | WITH TIES ]

but that's wrong, because it implies those options are optional (i.e.
the user may not specify anything). That's not the case, exactly one
of those options needs to be specified, so it should have been

   ... { ONLY | WITH TIES }


2) The comment in ExecLimit() needs to be updated to explain that WITH
TIES changes the behavior.


3) Minor code style issues (no space before * on comment lines, {}
around single-line if statements, ...).


4) The ExecLimit() does this

    if (node->limitOption == WITH_TIES)
        ExecCopySlot(node->last_slot, slot);

but I think we only really need to do that for the last tuple in the
window, no? Would it be a useful optimization?



I think it is good optimization .Fixed

5) Two issues in _outLimit(). Firstly, when printing uniqCollations the
code actually prints uniqOperators. Secondly, why does the code use
these loops at all, instead of using WRITE_ATTRNUMBER_ARRAY and
WRITE_OID_ARRAY, like other places? Perhaps there's an issue with empty
arrays? I haven't tested this, but looking at the READ_ counterparts, I
don't see why that would be the case.



Fixed

regards
Surafel
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
Hi,

Unfortunately this got broken again, this time by aef65db676 :-(

I've tried to fix the merge conflict (essentially by moving some of the
code to adjust_limit_rows_costs(), but I'm wondering if the code added to
create_limit_path is actually correct

    if (count_est != 0)
    {
        double        count_rows;

        if (count_est > 0)
            count_rows = (double) count_est;
        else
            count_rows = clamp_row_est(subpath->rows * 0.10);

        if (limitOption == WITH_TIES)
        {
            ...
            count_rows = Max(avgGroupSize, count_est + (...));
        }
        ...
    }

Firstly, this seriously needs some comment explaining why we do this. But
more importantly - shouldn't it really be

    count_rows = Max(avgGroupSize, count_rows + (...));

instead of using count_est again (which might easily be -1 anyway)?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I've tried to fix the merge conflict (essentially by moving some of the
> code to adjust_limit_rows_costs(), but I'm wondering if the code added to
> create_limit_path is actually correct
> ...
> Firstly, this seriously needs some comment explaining why we do this.

I've not looked at this patch, but TBH I wonder why it is touching
planner rowcount estimation at all.  I find it doubtful either that
a correction for WITH TIES would be significant in most use-cases,
or that we could estimate it accurately if it was significant.
It certainly doesn't seem like something that needs to be messed
with in v1 of the feature.

            regards, tom lane



Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On Wed, Apr 03, 2019 at 03:08:05PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I've tried to fix the merge conflict (essentially by moving some of the
>> code to adjust_limit_rows_costs(), but I'm wondering if the code added to
>> create_limit_path is actually correct
>> ...
>> Firstly, this seriously needs some comment explaining why we do this.
>
>I've not looked at this patch, but TBH I wonder why it is touching
>planner rowcount estimation at all.  I find it doubtful either that
>a correction for WITH TIES would be significant in most use-cases,
>or that we could estimate it accurately if it was significant.
>It certainly doesn't seem like something that needs to be messed
>with in v1 of the feature.
>

FWIW it was me who suggested to tweak the cardinality estimation this way,
but if we want to leave it out from v1, I'm OK with that.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
Hi Surafel,

I've been looking at the patch with the intent to commit it, but once
again I ran into stuff that seems suspicious to me but am not familiar
with enough to say if it's OK. I'm sorry about that :-(

First, a couple of tweaks in the attached v9 of the patch:


1) I've removed the costing changes. As Tom mentions elsewhere in this
thread, that's probably not needed for v1 and it's true those estimates
are probably somewhat sketchy anyway.


2) v8 did this (per my comment):

-       ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+       if(node->limitOption == EXACT_NUMBER)
+               ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));

but I noticed the comment immediately above that says

  * Notify child node about limit.  Note: think not to "optimize" by
  * skipping ExecSetTupleBound if compute_tuples_needed returns < 0.  We
  * must update the child node anyway, in case this is a rescan and the
  * previous time we got a different result.

which applies to WITH_TIES too, no? So I've modified compute_tuples_needed
to return -1, and reverted this change. Not sure, though.


3) I'm a bit confused by the initialization added to ExecInitLimit. It
first gets the tuple descriptor from the limitstate (it should not do so
directly but use ExecGetResultType). But when it creates the extra slot,
it uses ops extracted from the outer plan. That's strange, I guess ...

And then it extracts the descriptor from the outer plan and uses it when
calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
the last_slot, which is using a descriptor from the limitstate.

IMHO all of this should use descriptor/ops from the outer plan, no? It
probably does not change anything because limit does not project, but it
seems confusing.


cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Sun, Apr 7, 2019 at 1:39 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

1) I've removed the costing changes. As Tom mentions elsewhere in this
thread, that's probably not needed for v1 and it's true those estimates
are probably somewhat sketchy anyway.


2) v8 did this (per my comment):

-       ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+       if(node->limitOption == EXACT_NUMBER)
+               ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));

but I noticed the comment immediately above that says

  * Notify child node about limit.  Note: think not to "optimize" by
  * skipping ExecSetTupleBound if compute_tuples_needed returns < 0.  We
  * must update the child node anyway, in case this is a rescan and the
  * previous time we got a different result.

which applies to WITH_TIES too, no? So I've modified compute_tuples_needed
to return -1, and reverted this change. Not sure, though.



I agree that passing -1 to  ExecSetTupleBound is more appropriate implementation


3) I'm a bit confused by the initialization added to ExecInitLimit. It
first gets the tuple descriptor from the limitstate (it should not do so
directly but use ExecGetResultType). But when it creates the extra slot,
it uses ops extracted from the outer plan. That's strange, I guess ...

And then it extracts the descriptor from the outer plan and uses it when
calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
the last_slot, which is using a descriptor from the limitstate.

IMHO all of this should use descriptor/ops from the outer plan, no? It
probably does not change anything because limit does not project, but it
seems confusing.
  

agree

regards
Surafel

Re: FETCH FIRST clause WITH TIES option

From
Thomas Munro
Date:
On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
> agree

Hi Surafel,

A new Commitfest is starting.  Can you please post a fresh rebase of this patch?

Thanks,

-- 
Thomas Munro
https://enterprisedb.com



Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:

Hi Thomas
On Mon, Jul 1, 2019 at 1:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
> agree

Hi Surafel,

A new Commitfest is starting.  Can you please post a fresh rebase of this patch?


Thank you for informing. attach is a rebased patch against current master

regards
Surafel
 
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Erik Rijkers
Date:
On 2019-07-01 19:38, Surafel Temesgen wrote:
> Thank you for informing. attach is a rebased patch against current 
> master
> [...]
> [fetch_first_with_ties_v10.patch]

Hi Surafel,

The patch applies OK, make check is OK, compiles OK.

But I get:

TRAP: FailedAssertion("!(!(((slot)->tts_flags & (1 << 1)) != 0))", File: 
"execTuples.c", Line: 491)

when running a variant ('fetch 1' instead of 'fetch 2') of the test SQL 
against src/test/regress/data/onek.data:

(in the script below the location of the file 'onek.data' will have to 
be changed)

--------------------- 8< ---------------------
#!/bin/bash

echo "
drop   table if exists onek ;
create table onek (
unique1      int4,
unique2      int4,
two          int4,
four         int4,
ten          int4,
twenty       int4,
hundred      int4,
thousand     int4,
twothousand  int4,
fivethous    int4,
tenthous     int4,
odd          int4,
even         int4,
stringu1     name,
stringu2     name,
string4      name
);

copy onek from 
'/home/aardvark/pg_stuff/pg_sandbox/pgsql.fetch_first_with_ties/src/test/regress/data/onek.data';

create index               onek_unique1 on onek using btree(unique1 
int4_ops);
create index onek_unique2  on onek using btree(unique2 int4_ops);
create index onek_hundred  on onek using btree(hundred int4_ops);
create index onek_stringu1 on onek using btree(stringu1 name_ops);

-- OK:
select  * from onek
where thousand < 5 order by thousand
fetch first 1 rows only
;

-- crashes:
select  * from onek
where thousand < 5 order by thousand
fetch first 1 rows with ties
;

" | psql -qXa
--------------------- 8< ---------------------

Can you have a look?


thanks,

Erik Rijkers





Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:

Hi Erik,
Thank you for looking at it.

Can you have a look?
 

It is because of the patch didn't handle that edge case.
attache patch contain a fix for it

regards
Surafel      
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera from 2ndQuadrant
Date:
As Tom just said in the thread for PERCENT, the gram.y changes need a
better representation.  Also, rename EXACT_NUMBER, per that thread.

As far as I can tell, this concerns feature F867.  I think we should
mark that as supported after this patch -- please edit
src/backend/catalog/sql_features.txt.

Earlier in the thread, Tomas Vondra said:

> 3) I'm a bit confused by the initialization added to ExecInitLimit. It
> first gets the tuple descriptor from the limitstate (it should not do so
                                                                     
 
> directly but use ExecGetResultType). But when it creates the extra slot,
                                                                     
 
> it uses ops extracted from the outer plan. That's strange, I guess ...
                                                                     
 
>
                                                                     
 
> And then it extracts the descriptor from the outer plan and uses it when
                                                                     
 
> calling execTuplesMatchPrepare. But AFAIK it's going to be compared to
                                                                     
 
> the last_slot, which is using a descriptor from the limitstate.
                                                                     
 
>
                                                                     
 
> IMHO all of this should use descriptor/ops from the outer plan, no? It
                                                                     
 
> probably does not change anything because limit does not project, but it
                                                                     
 
> seems confusing.
                                                                     
 

and you replied:

> agree

... yet this doesn't appear to have resulted in any change in the code,
or I just missed it.  Are you going to update the patch per that?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:

Hi Alvaro,
On Fri, Sep 6, 2019 at 1:52 AM Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> wrote:
As Tom just said in the thread for PERCENT, the gram.y changes need a
better representation.  Also, rename EXACT_NUMBER, per that thread.

As far as I can tell, this concerns feature F867.  I think we should
mark that as supported after this patch -- please edit
src/backend/catalog/sql_features.txt.
 
ok
 

Earlier in the thread, Tomas Vondra said:

> 3) I'm a bit confused by the initialization added to ExecInitLimit. It
> first gets the tuple descriptor from the limitstate (it should not do so                                                                                                                   
> directly but use ExecGetResultType). But when it creates the extra slot,                                                                                                                   
> it uses ops extracted from the outer plan. That's strange, I guess ...                                                                                                                     
>                                                                                                                                                                                             
> And then it extracts the descriptor from the outer plan and uses it when                                                                                                                   
> calling execTuplesMatchPrepare. But AFAIK it's going to be compared to                                                                                                                     
> the last_slot, which is using a descriptor from the limitstate.                                                                                                                             
>                                                                                                                                                                                             
> IMHO all of this should use descriptor/ops from the outer plan, no? It                                                                                                                     
> probably does not change anything because limit does not project, but it                                                                                                                   
> seems confusing.                                                                                                                                                                           

and you replied:

> agree

... yet this doesn't appear to have resulted in any change in the code,
or I just missed it.  Are you going to update the patch per that?


Its already done in v9 of the patch attached by Tomas

regards
Surafel

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera from 2ndQuadrant
Date:
On 2019-Sep-06, Surafel Temesgen wrote:

> > ... yet this doesn't appear to have resulted in any change in the code,
> > or I just missed it.  Are you going to update the patch per that?
>
> Its already done in v9 of the patch attached by Tomas

Oh, I missed that.  Great, so we're expecting some small updates from
you then and I think you can mark ready for committer when you post it.
Thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Fri, Sep 6, 2019 at 5:07 PM Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org> wrote:
On 2019-Sep-06, Surafel Temesgen wrote:

> > ... yet this doesn't appear to have resulted in any change in the code,
> > or I just missed it.  Are you going to update the patch per that?
>
> Its already done in v9 of the patch attached by Tomas

Oh, I missed that.  Great, so we're expecting some small updates from
you then and I think you can mark ready for committer when you post it.
Thanks.


Done
regards
Surafel
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
Hmm,

create table w (a point);

# select * from w order by a fetch first 2 rows with ties;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a fetch first 2 rows with ties;
                                  ^
HINT:  Use an explicit ordering operator or modify the query.

I'm not sure that the HINT is useful here.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> create table w (a point);

> # select * from w order by a fetch first 2 rows with ties;
> ERROR:  could not identify an ordering operator for type point
> LINE 1: select * from w order by a fetch first 2 rows with ties;
>                                   ^
> HINT:  Use an explicit ordering operator or modify the query.

> I'm not sure that the HINT is useful here.

That's not new to this patch, HEAD does the same:

regression=# create table w (a point);
CREATE TABLE
regression=# select * from w order by a ;
ERROR:  could not identify an ordering operator for type point
LINE 1: select * from w order by a ;
                                 ^
HINT:  Use an explicit ordering operator or modify the query.

It is a meaningful hint IMO, since (in theory) you could add
something like "USING <<" to the ORDER BY to specify a
particular ordering operator.  The fact that no suitable
operator is actually available in core doesn't seem like
a reason not to give the hint.

            regards, tom lane



Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
Looking at this patch again.  (Attached v13 fixes a typo in ruleutils,
fixes a bogus edit in the SGML docs plus minor rewording, and is rebased
to current master).

First, I noticed that there's a significant unanswered issue from Andrew
Gierth about this using a completely different mechanism, namely an
implicit window function.  I see that Robert was concerned about the
performance of Andrew's proposed approach, but I don't see any reply
from Surafel on the whole issue.
Andrew: Would you rather not have this feature in this form at all?

Tomas, are you still intending to be committer for this?

There's also this point that was not discussed very much:

On 2018-Oct-28, Tomas Vondra wrote:

> One question is whether to make this work for LIMIT too, not just for
> FETCH FIRST. I'm not sure how difficult would that be (it should be a
> grammar-only tweak I guess), or if it's a good idea in general.

I was also thinking that it would be good to support the new clause
under the LIMIT spelling, but it turns out that there are shift/reduce
conflicts if done the naive way(*); fixing those looks to be rather major
surgery to the way we represent the LIMIT/OFFSET clauses.

I'm not sure the proposed changes to gram.y are all that great, though.
Right now we have separate productions offset_clause and limit_clause,
and a combination mechanism called select_limit; after the patch the
whole thing is a little bit too messy.  Would it be sensible to unify/
rationalize these?


(*) this:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 72ad633102..01d337c30e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11672,6 +11672,13 @@ limit_clause:
                     n->limitOption = LIMIT_OPTION_COUNT;
                     $$ = n;
                 }
+            | LIMIT select_limit_value WITH TIES
+                {
+                    LimitClause *n = (LimitClause *) palloc(sizeof(LimitClause));
+                    n->limitCount = $2;
+                    n->limitOption = LIMIT_OPTION_WITH_TIES;
+                    $$ = n;
+                }
             | LIMIT select_limit_value ',' select_offset_value
                 {
                     /* Disabled because it was too confusing, bjm 2002-02-18 */

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
(BTW another small issue in the patch is the comments in gram.y that
talk about ONLY being a reserved word about the SQL:2008 syntax; ISTM
that that comment should now mention both ONLY as well as WITH being
fully reserved words.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Mon, Nov 11, 2019 at 5:56 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
First, I noticed that there's a significant unanswered issue from Andrew
Gierth about this using a completely different mechanism, namely an
implicit window function.  I see that Robert was concerned about the
performance of Andrew's proposed approach, but I don't see any reply
from Surafel on the whole issue.


i don't know much about window function implementation but am sure teaching window
function about limit and implementing limit variant on top of it will not be much simpler
and performant than the current implementation. i also think more appropriate place to
include limit variant is limitNode not other place which can case redundancy  and
maintainability issue

regards
Surafel

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
On 2019-Nov-11, Alvaro Herrera wrote:

> I'm not sure the proposed changes to gram.y are all that great, though.

Here's a proposed simplification of the gram.y changes.  There are two
things here:

1. cosmetic: we don't need the LimitClause struct; we can use just
   SelectLimit, and return that from limit_clause; that can be
   complemented using the offset_clause if there's any at select_limit
   level.

2. there's a gratuituous palloc() in opt_select_limit when there's no
   clause, that seems to be there just so that NULLs can be returned.
   That's one extra palloc for SELECTs parsed using one the affected
   productions ... it's not every single select, but it seems bad enough
   it's worth fixing.

I fixed #2 by just checking whether the return from opt_select_limit is
NULL.  ISTM it'd be better to pass the SelectLimit pointer to
insertSelectOptions instead (which is a static function in gram.y anyway
so there's no difficulty there).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
(Prior to posting this delta patch, the CF bot appeared happy with this
patch.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
I rebased this patch, and my proposed changes are in 0002.

Looking at the changes in ExecLimit, I'm wondering if it would be better
to add a new state to the state machine there -- instead of doing all
the work in duplicative code in the LIMIT_INWINDOW case, have that one
only save the current end-of-window tuple and jump to
LIMIT_WINDOWEND_WITH_TIE (or something) which then returns all tuples that
match the current one.  I think that would end up being cleaner.  Please
research.

Marking Waiting on Author.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Tue, Nov 26, 2019 at 6:09 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I rebased this patch, and my proposed changes are in 0002.

Thank you 

Looking at the changes in ExecLimit, I'm wondering if it would be better
to add a new state to the state machine there -- instead of doing all
the work in duplicative code in the LIMIT_INWINDOW case, have that one
only save the current end-of-window tuple and jump to
LIMIT_WINDOWEND_WITH_TIE (or something) which then returns all tuples that
match the current one.  I think that would end up being cleaner.  Please
research.
 
Done
regards
Surafel
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
Thanks.

(I would suggest renaming the new state LIMIT_WINDOWEND_TIES, because it
seems to convey what it does a little better.)

I think you should add a /* fall-though */ comment after changing state.
Like this (this flow seems clearer; also DRY):

                if (!node->noCount &&
                    node->position - node->offset >= node->count)
                {
                    if (node->limitOption == LIMIT_OPTION_COUNT)
                    {
                        node->lstate = LIMIT_WINDOWEND;
                        return NULL;
                    }
                    else
                    {
                        node->lstate = LIMIT_WINDOWEND_TIES;
                        /* fall-through */
                    }
                }
                else
                    ...

I've been playing with this a little more, and I think you've overlooked
a few places in ExecLimit that need to be changed.  In the regression
database, I tried this:

55432 13devel 17282=# begin;
BEGIN
55432 13devel 17282=# declare c1 cursor for select * from int8_tbl order by q1 fetch first 2 rows with ties;
DECLARE CURSOR
55432 13devel 17282=# fetch all in c1;
 q1  │        q2        
─────┼──────────────────
 123 │              456
 123 │ 4567890123456789
(2 filas)

55432 13devel 17282=# fetch all in c1;
 q1 │ q2 
────┼────
(0 filas)

55432 13devel 17282=# fetch forward all in c1;
 q1 │ q2 
────┼────
(0 filas)

So far so good .. things look normal.  But here's where things get
crazy:

55432 13devel 17282=# fetch backward all in c1;
        q1        │        q2        
──────────────────┼──────────────────
 4567890123456789 │              123
              123 │ 4567890123456789
(2 filas)

(huh???)

55432 13devel 17282=# fetch backward all in c1;
 q1 │ q2 
────┼────
(0 filas)

Okay -- ignoring the fact that we got a row that should not be in the
result, we've exhausted the cursor going back.  Let's scan it again from
the start:

55432 13devel 17282=# fetch forward all in c1;
        q1        │        q2         
──────────────────┼───────────────────
              123 │  4567890123456789
 4567890123456789 │               123
 4567890123456789 │  4567890123456789
 4567890123456789 │ -4567890123456789
(4 filas)

This is just crazy.

I think you need to stare a that thing a little harder.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Thanks.

(I would suggest renaming the new state LIMIT_WINDOWEND_TIES, because it
seems to convey what it does a little better.)

I think you should add a /* fall-though */ comment after changing state.
Like this (this flow seems clearer; also DRY):

                if (!node->noCount &&
                    node->position - node->offset >= node->count)
                {
                    if (node->limitOption == LIMIT_OPTION_COUNT)
                    {
                        node->lstate = LIMIT_WINDOWEND;
                        return NULL;
                    }
                    else
                    {
                        node->lstate = LIMIT_WINDOWEND_TIES;
                        /* fall-through */
                    }
                }
                else
                    ...

 
changed
 
I've been playing with this a little more, and I think you've overlooked
a few places in ExecLimit that need to be changed.  In the regression
database, I tried this:

55432 13devel 17282=# begin;
BEGIN
55432 13devel 17282=# declare c1 cursor for select * from int8_tbl order by q1 fetch first 2 rows with ties;
DECLARE CURSOR
55432 13devel 17282=# fetch all in c1;
 q1  │        q2       
─────┼──────────────────
 123 │              456
 123 │ 4567890123456789
(2 filas)

55432 13devel 17282=# fetch all in c1;
 q1 │ q2
────┼────
(0 filas)

55432 13devel 17282=# fetch forward all in c1;
 q1 │ q2
────┼────
(0 filas)

So far so good .. things look normal.  But here's where things get
crazy:

55432 13devel 17282=# fetch backward all in c1;
        q1        │        q2       
──────────────────┼──────────────────
 4567890123456789 │              123
              123 │ 4567890123456789
(2 filas)

(huh???)

55432 13devel 17282=# fetch backward all in c1;
 q1 │ q2
────┼────
(0 filas)

Okay -- ignoring the fact that we got a row that should not be in the
result, we've exhausted the cursor going back.  Let's scan it again from
the start:

55432 13devel 17282=# fetch forward all in c1;
        q1        │        q2         
──────────────────┼───────────────────
              123 │  4567890123456789
 4567890123456789 │               123
 4567890123456789 │  4567890123456789
 4567890123456789 │ -4567890123456789
(4 filas)

This is just crazy.

I think you need to stare a that thing a little harder.


This is because the new state didn't know about backward scan
and there was off by one error it scan one position past to detect
ties. The attached patch fix both
regards
Surafel  
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
On 2019-Nov-28, Surafel Temesgen wrote:

> On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> 
> > I think you should add a /* fall-though */ comment after changing state.
> > Like this (this flow seems clearer; also DRY):
> >
> >                 if (!node->noCount &&
> >                     node->position - node->offset >= node->count)
> >                 {
> >                     if (node->limitOption == LIMIT_OPTION_COUNT)
> >                     {
> >                         node->lstate = LIMIT_WINDOWEND;
> >                         return NULL;
> >                     }
> >                     else
> >                     {
> >                         node->lstate = LIMIT_WINDOWEND_TIES;
> >                         /* fall-through */
> >                     }
> >                 }
> >                 else
> >                     ...
>
> changed

But you did not read my code snippet, did you ...?

> > I think you need to stare a that thing a little harder.
>
> This is because the new state didn't know about backward scan
> and there was off by one error it scan one position past to detect
> ties. The attached patch fix both

Oh, thanks, glad it was that easy.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Andrew Gierth
Date:
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

 Alvaro> First, I noticed that there's a significant unanswered issue
 Alvaro> from Andrew Gierth about this using a completely different
 Alvaro> mechanism, namely an implicit window function. Robert was
 Alvaro> concerned about the performance of Andrew's proposed approach,
 Alvaro> but I don't see any reply from Surafel on the whole issue.
 
 Alvaro> Andrew: Would you rather not have this feature in this form at
 Alvaro> all?

I have done a proof-of-concept hack for my alternative approach, which I
will post in another thread so as not to confuse the cfbot.

The main advantage, as I see it, of a window function approach is that
it can also work for PERCENT with only a change to the generated
expression; the executor part of the code can handle any stopping
condition. It can also be generalized (though the POC does not do this)
to allow an SQL-level extension, something like "LIMIT WHEN condition"
to indicate that it should stop just before the first row for which the
condition is true. Whether this is a desirable feature or not is another
question.

-- 
Andrew.



Re: FETCH FIRST clause WITH TIES option

From
Tomas Vondra
Date:
On Fri, Nov 29, 2019 at 05:19:00AM +0000, Andrew Gierth wrote:
>>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>
> Alvaro> First, I noticed that there's a significant unanswered issue
> Alvaro> from Andrew Gierth about this using a completely different
> Alvaro> mechanism, namely an implicit window function. Robert was
> Alvaro> concerned about the performance of Andrew's proposed approach,
> Alvaro> but I don't see any reply from Surafel on the whole issue.
>
> Alvaro> Andrew: Would you rather not have this feature in this form at
> Alvaro> all?
>
>I have done a proof-of-concept hack for my alternative approach, which I
>will post in another thread so as not to confuse the cfbot.
>
>The main advantage, as I see it, of a window function approach is that
>it can also work for PERCENT with only a change to the generated
>expression; the executor part of the code can handle any stopping
>condition. It can also be generalized (though the POC does not do this)
>to allow an SQL-level extension, something like "LIMIT WHEN condition"
>to indicate that it should stop just before the first row for which the
>condition is true. Whether this is a desirable feature or not is another
>question.
>

For the record, the alternative approach was posted here:

https://www.postgresql.org/message-id/flat/87o8wvz253.fsf@news-spur.riddles.org.uk

I think we need to make a decision about which road to take - shall we
continue with what Surafel originally proposed, or should we instead
adopt Andrew's patch?

Andrew's patch looks significantly simpler, although it probably will
need more work to get it committable. My main concern about using window
functions was performance, so I think we should benchmark. For example,
considering window functions are not parallel safe, would this have
negative impact on parallel queries?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Surafel Temesgen
Date:


On Thu, Nov 28, 2019 at 5:49 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Nov-28, Surafel Temesgen wrote:

> On Thu, Nov 28, 2019 at 12:36 AM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>
> > I think you should add a /* fall-though */ comment after changing state.
> > Like this (this flow seems clearer; also DRY):
> >
> >                 if (!node->noCount &&
> >                     node->position - node->offset >= node->count)
> >                 {
> >                     if (node->limitOption == LIMIT_OPTION_COUNT)
> >                     {
> >                         node->lstate = LIMIT_WINDOWEND;
> >                         return NULL;
> >                     }
> >                     else
> >                     {
> >                         node->lstate = LIMIT_WINDOWEND_TIES;
> >                         /* fall-through */
> >                     }
> >                 }
> >                 else
> >                     ...
>
> changed

But you did not read my code snippet, did you ...?

I don't see it. changed to it and rebased to current master

regards
Surafel
Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
There was a minor conflict in planmain.h.  Here's a refreshed version.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
Some ruleutils.c code added by this patch is not covered by tests:

    5246             :     /* Add the LIMIT clause if given */
    5247        1115 :     if (query->limitOffset != NULL)
    5248             :     {
    5249           0 :         appendContextKeyword(context, " OFFSET ",
    5250             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5251           0 :         get_rule_expr(query->limitOffset, context, false);
    5252             :     }
    5253        1115 :     if (query->limitOption == LIMIT_OPTION_WITH_TIES)
    5254             :     {
    5255           0 :         appendContextKeyword(context, " FETCH FIRST ",
    5256             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5257           0 :         get_rule_expr(query->limitCount, context, false);
    5258           0 :         appendContextKeyword(context, " ROWS WITH TIES ",
    5259             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5260             :     }
    5261        1115 :     if (query->limitCount != NULL && query->limitOption != LIMIT_OPTION_WITH_TIES)
    5262             :     {
    5263           2 :         appendContextKeyword(context, " LIMIT ",
    5264             :                              -PRETTYINDENT_STD, PRETTYINDENT_STD, 0);
    5265           2 :         if (IsA(query->limitCount, Const) &&
    5266           0 :             ((Const *) query->limitCount)->constisnull)
    5267           0 :             appendStringInfoString(buf, "ALL");
    5268             :         else
    5269           2 :             get_rule_expr(query->limitCount, context, false);
    5270             :     }

Other than that, the patch seems good to go to me, so unless there are
objections, I intend to get this pushed tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
Pushed, with some additional changes.

So *of course* when I add tests to verify that ruleutils, I find a case
that does not work properly -- meaning, you get a view that pg_dump
emits in a way that won't be accepted:

CREATE VIEW limit_thousand_v_3 AS SELECT thousand FROM onek WHERE thousand < 995
        ORDER BY thousand FETCH FIRST NULL ROWS WITH TIES;

note the "NULL" there.  ruleutils would gladly print this out as:

View definition:
 SELECT onek.thousand
   FROM onek
  WHERE onek.thousand < 995
  ORDER BY onek.thousand
 FETCH FIRST NULL::integer ROWS WITH TIES;

which is then not accepted.

The best fix I could come up for this was to reject a bare NULL in the
limit clause.  It's a very stupid fix, because you can still give it a
NULL, using
CREATE VIEW limit_thousand_v_3 AS SELECT thousand FROM onek WHERE thousand < 995
        ORDER BY thousand FETCH FIRST (NULL+1) ROWS WITH TIES;
and the like.  But when ruleutils get this, it will add the parens,
which will magically make it work.

It turns out that the SQL standard is much more limited in what it will
accept there.  But our grammar (what we'll accept for the ancient LIMIT
clause) is very lenient -- it'll take just any expression.  I thought
about reducing that to NumericOnly for FETCH FIRST .. WITH TIES, but
then I have to pick: 1) gram.y fails to compile because of a
reduce/reduce conflict, or 2) also restricting FETCH FIRST .. ONLY to
NumericOnly.  Neither of those seemed very palatable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Andres Freund
Date:
Hi,

On 2020-04-07 16:36:54 -0400, Alvaro Herrera wrote:
> Pushed, with some additional changes.

This triggers a new warning for me (gcc-10):
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c: In function ‘ExecLimit’:
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c:136:7: warning: this statement may fall through
[-Wimplicit-fallthrough=]
  136 |    if (ScanDirectionIsForward(direction))
      |       ^
/home/andres/src/postgresql/src/backend/executor/nodeLimit.c:216:3: note: here
  216 |   case LIMIT_WINDOWEND_TIES:
      |   ^~~~

I've not looked at it in any sort of detail, but it looks like it might
be a false positive, with the "fall-through" comment not being
sufficient to quiesce the compiler?

Cosmetically I would agree that falling through to the next case" a few
blocks deep inside a case: isn't the prettiest...

- Andres



Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
Hello

On 2020-Apr-07, Andres Freund wrote:

> On 2020-04-07 16:36:54 -0400, Alvaro Herrera wrote:
> > Pushed, with some additional changes.
> 
> This triggers a new warning for me (gcc-10):
> /home/andres/src/postgresql/src/backend/executor/nodeLimit.c: In function ‘ExecLimit’:
> /home/andres/src/postgresql/src/backend/executor/nodeLimit.c:136:7: warning: this statement may fall through
[-Wimplicit-fallthrough=]
>   136 |    if (ScanDirectionIsForward(direction))
>       |       ^
> /home/andres/src/postgresql/src/backend/executor/nodeLimit.c:216:3: note: here
>   216 |   case LIMIT_WINDOWEND_TIES:
>       |   ^~~~
> 
> I've not looked at it in any sort of detail, but it looks like it might
> be a false positive, with the "fall-through" comment not being
> sufficient to quiesce the compiler?

It's on purpose, yeah, but I can understand the compiler not getting it.

> Cosmetically I would agree that falling through to the next case" a few
> blocks deep inside a case: isn't the prettiest...

That's true ... maybe a fix would be to split that stuff to a
subroutine?

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Andrew Gierth
Date:
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

 Alvaro> It turns out that the SQL standard is much more limited in what
 Alvaro> it will accept there. But our grammar (what we'll accept for
 Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just
 Alvaro> any expression. I thought about reducing that to NumericOnly
 Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1)
 Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or
 Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither
 Alvaro> of those seemed very palatable.

FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it
didn't allow parameters (which are allowed by the spec); see 1da162e1f.

(That change didn't present a problem for ruleutils, because FETCH FIRST
... ONLY is output as a LIMIT clause instead.)

This needs to be fixed in ruleutils, IMO, not by changing what the
grammar accepts.

-- 
Andrew.



Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
On 2020-Apr-08, Andrew Gierth wrote:

> >>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> 
>  Alvaro> It turns out that the SQL standard is much more limited in what
>  Alvaro> it will accept there. But our grammar (what we'll accept for
>  Alvaro> the ancient LIMIT clause) is very lenient -- it'll take just
>  Alvaro> any expression. I thought about reducing that to NumericOnly
>  Alvaro> for FETCH FIRST .. WITH TIES, but then I have to pick: 1)
>  Alvaro> gram.y fails to compile because of a reduce/reduce conflict, or
>  Alvaro> 2) also restricting FETCH FIRST .. ONLY to NumericOnly. Neither
>  Alvaro> of those seemed very palatable.
> 
> FETCH FIRST ... ONLY was made _too_ restrictive initially, such that it
> didn't allow parameters (which are allowed by the spec); see 1da162e1f.

Hmm, oh, I see.

> (That change didn't present a problem for ruleutils, because FETCH FIRST
> ... ONLY is output as a LIMIT clause instead.)

Right, I noticed that and kept it unchanged.

> This needs to be fixed in ruleutils, IMO, not by changing what the
> grammar accepts.

Fair.  I didn't change what the grammar accepts.  I ended up only
throwing an error in parse analysis when a bare NULL const is seen.
I guess we could fix ruleutils to add parens when NULL is seen, but
I'm not sure it's necessary or useful.  (LIMIT uses a null to represent
the LIMIT ALL case.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause WITH TIES option

From
Andrew Gierth
Date:
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:

 >> This needs to be fixed in ruleutils, IMO, not by changing what the
 >> grammar accepts.

 Alvaro> Fair. I didn't change what the grammar accepts. I ended up only
 Alvaro> throwing an error in parse analysis when a bare NULL const is
 Alvaro> seen.

This seems far too arbitrary to me.

-- 
Andrew (irc:RhodiumToad)



Re: FETCH FIRST clause WITH TIES option

From
Alvaro Herrera
Date:
On 2020-Apr-08, Andrew Gierth wrote:

> >>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> 
>  >> This needs to be fixed in ruleutils, IMO, not by changing what the
>  >> grammar accepts.
> 
>  Alvaro> Fair. I didn't change what the grammar accepts. I ended up only
>  Alvaro> throwing an error in parse analysis when a bare NULL const is
>  Alvaro> seen.
> 
> This seems far too arbitrary to me.

Oops, I saw this comment only now.

We're still on time to fix this, if there's something that needs fixing.
Let's discuss what changes are needed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services