Thread: BUG #18467: postgres_fdw (deparser) ignores LimitOption

BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18467
Logged by:          Onder Kalacı
Email address:      onderkalaci@gmail.com
PostgreSQL version: 16.2
Operating system:   MacOs
Description:

Hi, it seems the same query with `LimitOption`  on a heap table and on an
postgres_fdw table pointing to the same heap table is returning different
results.

Steps to reproduce:

-- create heap table, and insert 2 rows
CREATE TABLE heap_table (a int);
INSERT INTO heap_table VALUES (1), (1), (1);

-- create a foreign table, pointing to the same heap_table
CREATE FOREIGN TABLE ft1 (
  a int
) SERVER loopback OPTIONS (table_name 'heap_table');

-- same query returning different results
 SELECT * FROM heap_table ORDER BY 1 FETCH FIRST 2 ROWS WITH TIES ;
 a 
---
 1
 1
 1
(3 rows)

 SELECT * FROM ft1 ORDER BY 1 FETCH FIRST 2 ROWS WITH TIES ;
 a 
---
 1
 1
(2 rows)

-- seems like the deparser doesn't properly handle LimitOption
explain (verbose) SELECT * FROM ft1 ORDER BY 1 FETCH FIRST 2 ROWS WITH TIES
;
                                       QUERY PLAN
            
-----------------------------------------------------------------------------------------
 Foreign Scan on public.ft1  (cost=100.00..100.07 rows=2 width=4)
   Output: a
   Remote SQL: SELECT a FROM public.heap_table ORDER BY a ASC NULLS LAST
LIMIT 2::bigint
(3 rows)


fdw-setup steps I used:

-- setup
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
DO $d$
    BEGIN
        EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (dbname '$$||current_database()||$$',
                     port '$$||current_setting('port')||$$'
            )$$;
    END;
$d$;
CREATE USER MAPPING FOR public SERVER testserver1 OPTIONS (user 'value',
password 'value');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;


Thanks,
Onder


Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
Hi Onder,

On Wed, May 15, 2024 at 6:07 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
> Hi, it seems the same query with `LimitOption`  on a heap table and on an
> postgres_fdw table pointing to the same heap table is returning different
> results.

Thanks for the report!  Will look into this next week, because I am
busy with something else this week.

Best regards,
Etsuro Fujita



On Wed, 15 May 2024 at 17:41, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> Hi Onder,
>
> On Wed, May 15, 2024 at 6:07 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> Hi, it seems the same query with `LimitOption`  on a heap table and on an
>> postgres_fdw table pointing to the same heap table is returning different
>> results.
>
> Thanks for the report!  Will look into this next week, because I am
> busy with something else this week.
>

AppendLimitClause() does not check the limitOption, which may be
LIMIT_OPTION_WITH_TIES. It simply uses the LIMIT clause.

Here is a poc patch to verify the above.

--
Regards,
Japin Li


Attachment

Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Önder Kalacı
Date:
Hi Japin,


Here is a poc patch to verify the above.


The patch looks reasonable to me, it is in line with `get_select_query_def()` .

Though, probably requires a regression test.

Thanks,
Onder
 

Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
Hi Japin,

On Thu, May 16, 2024 at 1:02 PM Japin Li <japinli@hotmail.com> wrote:
> AppendLimitClause() does not check the limitOption, which may be
> LIMIT_OPTION_WITH_TIES. It simply uses the LIMIT clause.
>
> Here is a poc patch to verify the above.

Thanks for the patch!  Will review.

Best regards,
Etsuro Fujita



On Thu, 16 May 2024 at 17:11, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> Hi Japin,
>
> On Thu, May 16, 2024 at 1:02 PM Japin Li <japinli@hotmail.com> wrote:
>> AppendLimitClause() does not check the limitOption, which may be
>> LIMIT_OPTION_WITH_TIES. It simply uses the LIMIT clause.
>>
>> Here is a poc patch to verify the above.
>
> Thanks for the patch!  Will review.
>

I find that the FETCH FIRST ... ROWS WITH TIES does not support type casting.
For example:

postgres=# SELECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint ROWS WITH TIES;
ERROR:  syntax error at or near "::"
LINE 1: ...ECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint R...
                                                             ^

I've made a change to support type casting in the select_fetch_first_value.

Please consider reviewing the v2 patch.

--
Regards,
Japin Li


Attachment
On Thu, 16 May 2024 at 17:02, Önder Kalacı <onderkalaci@gmail.com> wrote:
> Hi Japin,
>
>>
>>
>> Here is a poc patch to verify the above.
>>
>>
> The patch looks reasonable to me, it is in line with `get_select_query_def()
>
<https://github.com/postgres/postgres/blob/8aee330af55d8a759b2b73f5a771d9d34a7b887f/src/backend/utils/adt/ruleutils.c#L5745-L5752>`
> .
>
> Though, probably requires a regression test.
>

Yeah, add the regression test in v2 patch [1].

[1]
https://www.postgresql.org/message-id/ME3P282MB31666446BF8A0C0F7A02D24CB6ED2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM

--
Regards,
Japin Li



Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Alvaro Herrera
Date:
On 2024-May-16, Japin Li wrote:

> I find that the FETCH FIRST ... ROWS WITH TIES does not support type casting.
> For example:
> 
> postgres=# SELECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint ROWS WITH TIES;
> ERROR:  syntax error at or near "::"
> LINE 1: ...ECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint R...
>                                                              ^

Why do you need this?  The standard says

<fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } { ONLY | WITH TIES }
<fetch first quantity> ::= <fetch first row count> | <fetch first percentage>
<offset row count> ::= <simple value specification>
<fetch first row count> ::= <simple value specification>

which doesn't seem to leave room for a cast.

I didn't try super extensively, but this works:
  select 1 from pg_class fetch first 281474976710656 rows only;
so the count is already not restricted to be an int32 value.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



On Thu, 16 May 2024 at 23:27, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2024-May-16, Japin Li wrote:
>
>> I find that the FETCH FIRST ... ROWS WITH TIES does not support type casting.
>> For example:
>>
>> postgres=# SELECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint ROWS WITH TIES;
>> ERROR:  syntax error at or near "::"
>> LINE 1: ...ECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint R...
>>                                                              ^
>
> Why do you need this?  The standard says
>

The limitCount is stored as bigint and deparseConst() will automatically append
'::bigint'.  So I do the typecast.

> <fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } { ONLY | WITH TIES }
> <fetch first quantity> ::= <fetch first row count> | <fetch first percentage>
> <offset row count> ::= <simple value specification>
> <fetch first row count> ::= <simple value specification>
>
> which doesn't seem to leave room for a cast.
>
> I didn't try super extensively, but this works:
>   select 1 from pg_class fetch first 281474976710656 rows only;
> so the count is already not restricted to be an int32 value.

Since it is of type int, why should it be stored as bigint?

--
Regards,
Japin Li



On Thu, 16 May 2024 at 23:27, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2024-May-16, Japin Li wrote:
>
>> I find that the FETCH FIRST ... ROWS WITH TIES does not support type casting.
>> For example:
>>
>> postgres=# SELECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint ROWS WITH TIES;
>> ERROR:  syntax error at or near "::"
>> LINE 1: ...ECT * FROM pg_class ORDER BY relname FETCH FIRST 2::bigint R...
>>                                                              ^
>
> Why do you need this?  The standard says
>
> <fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } { ONLY | WITH TIES }
> <fetch first quantity> ::= <fetch first row count> | <fetch first percentage>
> <offset row count> ::= <simple value specification>
> <fetch first row count> ::= <simple value specification>
>
> which doesn't seem to leave room for a cast.
>
> I didn't try super extensively, but this works:
>   select 1 from pg_class fetch first 281474976710656 rows only;
> so the count is already not restricted to be an int32 value.

After some dig in, I find transformLimitClause() has the following comments:

 * Note: as of Postgres 8.2, LIMIT expressions are expected to yield int8,
 * rather than int4 as before.

So the deparseConst() append '::bigint' typecast, same as get_rule_expr().

--
Regards,
Japin Li



Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Alvaro Herrera
Date:
On 2024-May-17, Japin Li wrote:

> After some dig in, I find transformLimitClause() has the following comments:
> 
>  * Note: as of Postgres 8.2, LIMIT expressions are expected to yield int8,
>  * rather than int4 as before.
> 
> So the deparseConst() append '::bigint' typecast, same as get_rule_expr().

Hmm, probably the answer is not to use deparseExpr in
appendLimitClause() then.  I mean, AFAICS this is a postgres_fdw
problem, not a core parser problem.

Not sure if this is workable, but maybe test if the value is a constant,
and use FETCH FIRST if so, otherwise fall back to using LIMIT.  (It's
not clear if this would handle FETCH FIRST +123123123123 ROWS WITH TIES
correctly though -- worth checking).


Oh, I see we never got back to adding FETCH FIRST / PERCENT.  We had a
patch seemingly almost ready for that ...

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2024-May-16, Japin Li wrote:
>> I find that the FETCH FIRST ... ROWS WITH TIES does not support type casting.

> Why do you need this?  The standard says

> <fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } { ONLY | WITH TIES }
> <fetch first quantity> ::= <fetch first row count> | <fetch first percentage>
> <offset row count> ::= <simple value specification>
> <fetch first row count> ::= <simple value specification>
> which doesn't seem to leave room for a cast.

Exactly.  "LIMIT 42::int8" is accepted, but the equivalent
with FETCH is not, so that there's a hazard of the remote
rejecting the query depending on how deparseExpr chooses
to print the limit expression.  Ordinarily it hasn't the
slightest hesitation about affixing casts to constants,
so I suspect there's a live problem there.


I think that this approach to a fix might be the wrong thing
altogether, and that it might be better to refuse to send WITH_TIES
clauses to the remote at all.  There are two things that are scaring
me about that:

1. The remote might have a different idea of equality than we do,
leading to different results.  Admittedly that's a little bit
far-fetched, but with things like nondeterministic collations floating
around, it's by no means impossible.

2. If the remote is older than v13 the query will fail entirely
for lack of support for WITH TIES.

At the very least we need a remote-version check before deciding
that it'll be OK to ship such a clause.  But if there are other
gotchas, as it seems there are, let's just cut our losses and
not do it.

            regards, tom lane



On Fri, 17 May 2024 at 00:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> On 2024-May-16, Japin Li wrote:
>>> I find that the FETCH FIRST ... ROWS WITH TIES does not support type casting.
>
>> Why do you need this?  The standard says
>
>> <fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } { ONLY | WITH TIES }
>> <fetch first quantity> ::= <fetch first row count> | <fetch first percentage>
>> <offset row count> ::= <simple value specification>
>> <fetch first row count> ::= <simple value specification>
>> which doesn't seem to leave room for a cast.
>
> Exactly.  "LIMIT 42::int8" is accepted, but the equivalent
> with FETCH is not, so that there's a hazard of the remote
> rejecting the query depending on how deparseExpr chooses
> to print the limit expression.  Ordinarily it hasn't the
> slightest hesitation about affixing casts to constants,
> so I suspect there's a live problem there.
>
> I think that this approach to a fix might be the wrong thing
> altogether, and that it might be better to refuse to send WITH_TIES
> clauses to the remote at all.  There are two things that are scaring
> me about that:
>
> 1. The remote might have a different idea of equality than we do,
> leading to different results.  Admittedly that's a little bit
> far-fetched, but with things like nondeterministic collations floating
> around, it's by no means impossible.
>
> 2. If the remote is older than v13 the query will fail entirely
> for lack of support for WITH TIES.
>

I didn't take this into consideration previously.

> At the very least we need a remote-version check before deciding
> that it'll be OK to ship such a clause.  But if there are other
> gotchas, as it seems there are, let's just cut our losses and
> not do it.
>

Try to disable the push-down FETCH FIRST clause as you suggested.


--
Regards,
Japin Li


Attachment

Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Önder Kalacı
Date:
Hi,

altogether, and that it might be better to refuse to send WITH_TIES
clauses to the remote at all.  I think that this approach to a fix might be the wrong thing
 
I didn't take this into consideration previously.
 
I think I confused many people with my first bug-report, where I mentioned
about deparser. Sorry about that.

Reading Tom's response, I think he is right, it sounds simpler & better to
refuse to send WITH_TIES at all. Especially if we consider this as a candidate
for backport to earlier versions. I think supporting pushdown of WITH_TIES
can probably be considered as a new feature, and deserves its own patch & 
discussion.

+SELECT * FROM ft1 t1 ORDER BY t1.c2 FETCH FIRST 2 ROWS WITH TIES;

I think it is excessive that the query returns 100 rows. Can we add few filters, like
a filter (c5 = `Thu Jan 01 00:00:00 1970` AND c3 > '00500') or such that we only have
few rows to show in the output?

+ /*
+ * Also, the FETCH FIRST/NEXT ... ROW/ROWS WITH TIES cannot be pushed down
+ * due to potential lack of support for this clause on the remote.
+ */
 
Would be nice to mention potential risks due to collation-incompatibilities in this comment.
I feel like that is probably more important than the lack of WITH TIES support. For example, 
in a few years, someone might decide to remove this check by assuming that it is only 
added as PG 13 would not be officially supported by the community (e.g., end of life).

+ if (parse->limitOption == LIMIT_OPTION_WITH_TIES)
+ return;
+

Other than that, the check you added looks like a reasonable place to add.

Thanks,
Onder
On Mon, 20 May 2024 at 17:08, Önder Kalacı <onderkalaci@gmail.com> wrote:
> Hi,
>
>>
>> altogether, and that it might be better to refuse to send WITH_TIES
>>> clauses to the remote at all.  I think that this approach to a fix might
>>> be the wrong thing
>>
>>
>>
> I didn't take this into consideration previously.
>
>
> I think I confused many people with my first bug-report, where I mentioned
> about deparser. Sorry about that.
>
> Reading Tom's response, I think he is right, it sounds simpler & better to
> refuse to send WITH_TIES at all. Especially if we consider this as a
> candidate
> for backport to earlier versions. I think supporting pushdown of WITH_TIES
> can probably be considered as a new feature, and deserves its own patch &
> discussion.
>
>> +SELECT * FROM ft1 t1 ORDER BY t1.c2 FETCH FIRST 2 ROWS WITH TIES;
>
>
> I think it is excessive that the query returns 100 rows. Can we add few
> filters, like
> a filter (c5 = `Thu Jan 01 00:00:00 1970` AND c3 > '00500') or such that we
> only have
> few rows to show in the output?

Fixed.

>
> + /*
>> + * Also, the FETCH FIRST/NEXT ... ROW/ROWS WITH TIES cannot be pushed down
>> + * due to potential lack of support for this clause on the remote.
>> + */
>
>
> Would be nice to mention potential risks due to collation-incompatibilities
> in this comment.
> I feel like that is probably more important than the lack of WITH TIES
> support. For example,
> in a few years, someone might decide to remove this check by assuming that
> it is only
> added as PG 13 would not be officially supported by the community (e.g.,
> end of life).
>
> + if (parse->limitOption == LIMIT_OPTION_WITH_TIES)
>> + return;
>> +
>

Sorry, I forgot the first reason that Tom mentioned. Fixed.
However, as a non-native English speaker, it might not be accurate.
Feel free to modify it.

>
> Other than that, the check you added looks like a reasonable place to add.

--
Regards,
Japin Li


Attachment

Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Önder Kalacı
Date:
Hi Japin,

Thanks, the patch looks good to me. 

However, as a non-native English speaker, it might not be accurate.
Feel free to modify it.


I'm also not good at wording, but I have a minor suggestions like the following :

/*
 * Also, the FETCH FIRST/NEXT ... ROW/ROWS WITH TIES clause cannot be pushed down
 * because:
 * a) The remote system may have a different understanding of equality, which can
 *    result in varying results, such as non-deterministic collations.
 * b) We do not have knowledge of the remote server's version
 *    as this clause is only supported for PG13 and above.
 */
 

Thanks,
Onder
On Thu, 23 May 2024 at 14:54, Önder Kalacı <onderkalaci@gmail.com> wrote:
> Hi Japin,
>
> Thanks, the patch looks good to me.
>
> However, as a non-native English speaker, it might not be accurate.
>> Feel free to modify it.
>>
>>
> I'm also not good at wording, but I have a minor suggestions like the
> following :
>
> /*
>  * Also, the FETCH FIRST/NEXT ... ROW/ROWS WITH TIES clause cannot be
> pushed down
>  * because:
>  * a) The remote system may have a different understanding of equality,
> which can
>  *    result in varying results, such as non-deterministic collations.
>  * b) We do not have knowledge of the remote server's version
>  *    as this clause is only supported for PG13 and above.
>  */
>

Thanks for your review!  Fixed in v5 patch.

--
Regards,
Japin Li


Attachment

Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
Hi,

On Thu, May 23, 2024 at 11:30 PM Japin Li <japinli@hotmail.com> wrote:
> On Thu, 23 May 2024 at 14:54, Önder Kalacı <onderkalaci@gmail.com> wrote:
> > I'm also not good at wording, but I have a minor suggestions like the
> > following :
> >
> > /*
> >  * Also, the FETCH FIRST/NEXT ... ROW/ROWS WITH TIES clause cannot be
> > pushed down
> >  * because:
> >  * a) The remote system may have a different understanding of equality,
> > which can
> >  *    result in varying results, such as non-deterministic collations.
> >  * b) We do not have knowledge of the remote server's version
> >  *    as this clause is only supported for PG13 and above.
> >  */

> Thanks for your review!  Fixed in v5 patch.

I think it is reasonable to refuse to send WITH TIES, but I am
confused about the comments above.  Do we really need to care about a)
here in add_foreign_final_paths()?  If the query has WITH TIES, 1) it
must have ORDER BY as well, which determines what additional rows tie
for the last place in the result set, and 2) ORDER BY must already
have been determined to be safe to push down before we get here.  So
in that case, if getting here, we can consider that WITH TIES is also
safe to push down (if the remote is v13 or later).  No?

Anyway, thank you for working on this issue!

Best regards,
Etsuro Fujita



On Fri, 24 May 2024 at 16:32, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> Hi,
>
> On Thu, May 23, 2024 at 11:30 PM Japin Li <japinli@hotmail.com> wrote:
>> On Thu, 23 May 2024 at 14:54, Önder Kalacı <onderkalaci@gmail.com> wrote:
>> > I'm also not good at wording, but I have a minor suggestions like the
>> > following :
>> >
>> > /*
>> >  * Also, the FETCH FIRST/NEXT ... ROW/ROWS WITH TIES clause cannot be
>> > pushed down
>> >  * because:
>> >  * a) The remote system may have a different understanding of equality,
>> > which can
>> >  *    result in varying results, such as non-deterministic collations.
>> >  * b) We do not have knowledge of the remote server's version
>> >  *    as this clause is only supported for PG13 and above.
>> >  */
>
>> Thanks for your review!  Fixed in v5 patch.
>
> I think it is reasonable to refuse to send WITH TIES, but I am
> confused about the comments above.  Do we really need to care about a)
> here in add_foreign_final_paths()?  If the query has WITH TIES, 1) it
> must have ORDER BY as well, which determines what additional rows tie
> for the last place in the result set, and 2) ORDER BY must already
> have been determined to be safe to push down before we get here.  So
> in that case, if getting here, we can consider that WITH TIES is also
> safe to push down (if the remote is v13 or later).  No?
>
> Anyway, thank you for working on this issue!
>

Sorry for the late reply. I'm not familiar with this. However, after some
tests, the COLLATE may influence the result; see the example below.

[local]:535513 postgres=# CREATE TABLE t01 (a text);
CREATE TABLE
[local]:535513 postgres=# CREATE FOREIGN TABLE ft01 (a text) SERVER loopback OPTIONS (table_name 't01');
CREATE FOREIGN TABLE
[local]:535513 postgres=# SELECT * FROM ft01 ORDER BY a COLLATE "en_US" FETCH FIRST 2 ROWS WITH TIES;
   a
-------
 hello
 hello
 hello
(3 rows)

[local]:535513 postgres=# EXPLAIN (verbose) SELECT * FROM ft01 ORDER BY a COLLATE "en_US" FETCH FIRST 2 ROWS WITH TIES;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=446.26..446.27 rows=2 width=64)
   Output: a, ((a)::text)
   ->  Sort  (cost=446.26..449.92 rows=1462 width=64)
         Output: a, ((a)::text)
         Sort Key: ft01.a COLLATE "en_US"
         ->  Foreign Scan on public.ft01  (cost=100.00..431.64 rows=1462 width=64)
               Output: a, a
               Remote SQL: SELECT a FROM public.t01
(8 rows)

[local]:535513 postgres=# SELECT * FROM ft01 ORDER BY a FETCH FIRST 2 ROWS WITH TIES;
   a
-------
 hello
 hello
(2 rows)

[local]:535513 postgres=# EXPLAIN (verbose) SELECT * FROM ft01 ORDER BY a FETCH FIRST 2 ROWS WITH TIES;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Foreign Scan on public.ft01  (cost=100.00..100.44 rows=2 width=32)
   Output: a
   Remote SQL: SELECT a FROM public.t01 ORDER BY a ASC NULLS LAST LIMIT 2::bigint
(3 rows)

--
Regards,
Japin Li



Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Önder Kalacı
Date:
Hi,

and 2) ORDER BY must already
> have been determined to be safe to push down before we get here.  

When I read the code, the decision for that seems to happen in the next line where this patch proposes to modify:

 /*
* Also, the LIMIT/OFFSET cannot be pushed down, if their expressions are
* not safe to remote.
*/
if (!is_foreign_expr(root, input_rel, (Expr *) parse->limitOffset) ||
!is_foreign_expr(root, input_rel, (Expr *) parse->limitCount))
return;
 
Hence, I thought it is OK to add this new check to this place.


 So
> in that case, if getting here, we can consider that WITH TIES is also
> safe to push down (if the remote is v13 or later). 

 See Tom's response here on why it might not be a good idea to pushdown

I think, at least not with this patch, this patch is like a bug-fix. If intended, it should be possible to
pushdown WITH TIES with a follow-up patch?

Thanks,
Onder


Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
Hi Japin,

On Tue, May 28, 2024 at 11:20 PM Japin Li <japinli@hotmail.com> wrote:
> Sorry for the late reply. I'm not familiar with this. However, after some
> tests, the COLLATE may influence the result; see the example below.

That is true, but my point is that we do not need to worry about
things like that, in *add_foreign_final_paths()*.  I will explain the
reason why below.

> [local]:535513 postgres=# EXPLAIN (verbose) SELECT * FROM ft01 ORDER BY a COLLATE "en_US" FETCH FIRST 2 ROWS WITH
TIES;
>                                     QUERY PLAN
> -----------------------------------------------------------------------------------
>  Limit  (cost=446.26..446.27 rows=2 width=64)
>    Output: a, ((a)::text)
>    ->  Sort  (cost=446.26..449.92 rows=1462 width=64)
>          Output: a, ((a)::text)
>          Sort Key: ft01.a COLLATE "en_US"
>          ->  Foreign Scan on public.ft01  (cost=100.00..431.64 rows=1462 width=64)
>                Output: a, a
>                Remote SQL: SELECT a FROM public.t01
> (8 rows)

> [local]:535513 postgres=# EXPLAIN (verbose) SELECT * FROM ft01 ORDER BY a FETCH FIRST 2 ROWS WITH TIES;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Foreign Scan on public.ft01  (cost=100.00..100.44 rows=2 width=32)
>    Output: a
>    Remote SQL: SELECT a FROM public.t01 ORDER BY a ASC NULLS LAST LIMIT 2::bigint
> (3 rows)

First of all let me briefly explain about how postgres_fdw considers
pushing down the operations.  The core allows it to do so
step-by-step: first ORDER BY and then LIMIT (FETCH in this case).
First, when called for ORDER BY, it executes
add_foreign_ordered_paths() to consider the pushability of ORDER BY.
Then, when called for FETCH, 1) if ORDER BY had been determined to be
safe to push down in the first step, it executes
add_foreign_final_paths() to consider the pushability of LIMIT; 2) if
not, it just gives up on pushing down LIMIT (without executing that
function), because if we can't push ORDER BY, we can't LIMIT either!
I think while the former example would correspond to #2, the latter
example would correspond to #1.

The reason is: if getting to add_foreign_final_paths(), it means that
postgres_fdw determined in the first step that ORDER BY is safe to
push down, so we no longer need to worry that the clause might produce
a different sort order and/or a different set of ties in the remote
side.

Thanks!

Best regards,
Etsuro Fujita



Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
Hi Onder,

On Thu, May 30, 2024 at 4:15 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

>> and 2) ORDER BY must already
>> > have been determined to be safe to push down before we get here.

> When I read the code, the decision for that seems to happen in the next line where this patch proposes to modify:

That is not correct; see my previous email.

>>  So
>> > in that case, if getting here, we can consider that WITH TIES is also
>> > safe to push down (if the remote is v13 or later).

>  See Tom's response here on why it might not be a good idea to pushdown
> WITH TIES: https://www.postgresql.org/message-id/2114796.1715878709%40sss.pgh.pa.us
>
> I think, at least not with this patch, this patch is like a bug-fix. If intended, it should be possible to
> pushdown WITH TIES with a follow-up patch?

With all due respect to him, I think he is missing that the set of
ties is determined according to ORDER BY; as explained in that email,
in that case it is guaranteed that ORDER BY is pushable, so WITH TIES
is also pushable with ORDER BY, I think.  We do not currently have a
way to do a remote-version check (without accessing the remote
server), so I agree with you that we should just diable pushing WITH
TIES fow now, though.

Thanks!

Best regards,
Etsuro Fujita



On Thu, 30 May 2024 at 20:56, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> Hi Japin,
>
> On Tue, May 28, 2024 at 11:20 PM Japin Li <japinli@hotmail.com> wrote:
>> Sorry for the late reply. I'm not familiar with this. However, after some
>> tests, the COLLATE may influence the result; see the example below.
>
> That is true, but my point is that we do not need to worry about
> things like that, in *add_foreign_final_paths()*.  I will explain the
> reason why below.
>
>> [local]:535513 postgres=# EXPLAIN (verbose) SELECT * FROM ft01 ORDER BY a COLLATE "en_US" FETCH FIRST 2 ROWS WITH
TIES;
>>                                     QUERY PLAN
>> -----------------------------------------------------------------------------------
>>  Limit  (cost=446.26..446.27 rows=2 width=64)
>>    Output: a, ((a)::text)
>>    ->  Sort  (cost=446.26..449.92 rows=1462 width=64)
>>          Output: a, ((a)::text)
>>          Sort Key: ft01.a COLLATE "en_US"
>>          ->  Foreign Scan on public.ft01  (cost=100.00..431.64 rows=1462 width=64)
>>                Output: a, a
>>                Remote SQL: SELECT a FROM public.t01
>> (8 rows)
>
>> [local]:535513 postgres=# EXPLAIN (verbose) SELECT * FROM ft01 ORDER BY a FETCH FIRST 2 ROWS WITH TIES;
>>                                     QUERY PLAN
>> ----------------------------------------------------------------------------------
>>  Foreign Scan on public.ft01  (cost=100.00..100.44 rows=2 width=32)
>>    Output: a
>>    Remote SQL: SELECT a FROM public.t01 ORDER BY a ASC NULLS LAST LIMIT 2::bigint
>> (3 rows)
>
> First of all let me briefly explain about how postgres_fdw considers
> pushing down the operations.  The core allows it to do so
> step-by-step: first ORDER BY and then LIMIT (FETCH in this case).
> First, when called for ORDER BY, it executes
> add_foreign_ordered_paths() to consider the pushability of ORDER BY.
> Then, when called for FETCH, 1) if ORDER BY had been determined to be
> safe to push down in the first step, it executes
> add_foreign_final_paths() to consider the pushability of LIMIT; 2) if
> not, it just gives up on pushing down LIMIT (without executing that
> function), because if we can't push ORDER BY, we can't LIMIT either!
> I think while the former example would correspond to #2, the latter
> example would correspond to #1.
>
> The reason is: if getting to add_foreign_final_paths(), it means that
> postgres_fdw determined in the first step that ORDER BY is safe to
> push down, so we no longer need to worry that the clause might produce
> a different sort order and/or a different set of ties in the remote
> side.
>

Thanks for the explanation!

I think I understand what you mean. We can ensure that the ORDER BY can be
safely pushed down if we are in add_foreign_final_paths().  The reason the
FETCH clause cannot be pushed down is only because the remote may not
support it, right?

--
Regards,
Japin Li



Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
On Fri, May 31, 2024 at 10:12 AM Japin Li <japinli@hotmail.com> wrote:
> I think I understand what you mean. We can ensure that the ORDER BY can be
> safely pushed down if we are in add_foreign_final_paths().  The reason the
> FETCH clause cannot be pushed down is only because the remote may not
> support it, right?

Yeah, I think so; for the next person, I would like to propose to
update the comments proposed upthread to something like this:

    /*
     * If the query uses FETCH FIRST .. WITH TIES, 1) it must have ORDER BY as
     * well, which is used to determine which additional rows tie for the last
     * place in the result set, and 2) ORDER BY must already have been
     * determined to be safe to push down before we get here.  So in that case
     * the FETCH clause is safe to push down with ORDER BY if the remote
     * server is v13 or later; but if not, the remote query will fail entirely
     * for lack of support for it.  Since we do not currently have a way to do
     * a remote-version check (without accessing the remote server), disable
     * pushing it for now.
     */

Comments are welcome!

Best regards,
Etsuro Fujita



On Fri, 31 May 2024 at 16:22, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Fri, May 31, 2024 at 10:12 AM Japin Li <japinli@hotmail.com> wrote:
>> I think I understand what you mean. We can ensure that the ORDER BY can be
>> safely pushed down if we are in add_foreign_final_paths().  The reason the
>> FETCH clause cannot be pushed down is only because the remote may not
>> support it, right?
>
> Yeah, I think so; for the next person, I would like to propose to
> update the comments proposed upthread to something like this:
>
>     /*
>      * If the query uses FETCH FIRST .. WITH TIES, 1) it must have ORDER BY as
>      * well, which is used to determine which additional rows tie for the last
>      * place in the result set, and 2) ORDER BY must already have been
>      * determined to be safe to push down before we get here.  So in that case
>      * the FETCH clause is safe to push down with ORDER BY if the remote
>      * server is v13 or later; but if not, the remote query will fail entirely
>      * for lack of support for it.  Since we do not currently have a way to do
>      * a remote-version check (without accessing the remote server), disable
>      * pushing it for now.
>      */
>
> Comments are welcome!
>

Thanks for the rewording!  LGTM.

--
Regrads,
Japin Li



Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
On Fri, May 31, 2024 at 6:51 PM Japin Li <japinli@hotmail.com> wrote:
> On Fri, 31 May 2024 at 16:22, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > On Fri, May 31, 2024 at 10:12 AM Japin Li <japinli@hotmail.com> wrote:
> >> I think I understand what you mean. We can ensure that the ORDER BY can be
> >> safely pushed down if we are in add_foreign_final_paths().  The reason the
> >> FETCH clause cannot be pushed down is only because the remote may not
> >> support it, right?
> >
> > Yeah, I think so;

One thing I forgot to mention is the parsing/deparsing issue in
core/postgres_fdw discussed upthread.  From the postgres_fdw side, I
think a simple workaround for that is to enclose
constants-with-the-cast in parentheses.

> > for the next person, I would like to propose to
> > update the comments proposed upthread to something like this:
> >
> >     /*
> >      * If the query uses FETCH FIRST .. WITH TIES, 1) it must have ORDER BY as
> >      * well, which is used to determine which additional rows tie for the last
> >      * place in the result set, and 2) ORDER BY must already have been
> >      * determined to be safe to push down before we get here.  So in that case
> >      * the FETCH clause is safe to push down with ORDER BY if the remote
> >      * server is v13 or later; but if not, the remote query will fail entirely
> >      * for lack of support for it.  Since we do not currently have a way to do
> >      * a remote-version check (without accessing the remote server), disable
> >      * pushing it for now.
> >      */

> Thanks for the rewording!  LGTM.

Done in the attached.

Another thing I noticed is this:

+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c3 >
'00960' ORDER BY t1.c2 FETCH FIRST 2 ROWS WITH TIES;
+                                                           QUERY PLAN

+---------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: c1, c2, c3, c4, c5, c6, c7, c8
+   ->  Foreign Scan on public.ft1 t1
+         Output: c1, c2, c3, c4, c5, c6, c7, c8
+         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S
1"."T 1" WHERE ((c3 > '00960')) ORDER BY c2 ASC NULLS LAST
+(5 rows)
+
+SELECT * FROM ft1 t1 WHERE t1.c3 > '00960' ORDER BY t1.c2 FETCH FIRST
2 ROWS WITH TIES;
+  c1  | c2 |  c3   |              c4              |            c5
       | c6 |     c7     | c8
+------+----+-------+------------------------------+--------------------------+----+------------+-----
+  970 |  0 | 00970 | Thu Mar 12 00:00:00 1970 PST | Thu Mar 12
00:00:00 1970 | 0  | 0          | foo
+ 1000 |  0 | 01000 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01
00:00:00 1970 | 0  | 0          | foo
+  990 |  0 | 00990 | Wed Apr 01 00:00:00 1970 PST | Wed Apr 01
00:00:00 1970 | 0  | 0          | foo
+  980 |  0 | 00980 | Sun Mar 22 00:00:00 1970 PST | Sun Mar 22
00:00:00 1970 | 0  | 0          | foo
+(4 rows)

* The filter uses column c3, which is of type text, so I think the
test can produce different results when running it against an
already-installed server with non-C locale settings such as the
--icu-locale and --icu-rules options in initdb [1].
* IIUC, the result ordering depends on the implementation of tuple
sorting; if we made some changes to the implementation, the result
ordering might change as well.

Admittedly, these are far-fetched, but better safe than sorry; I
slightly modified the test for better stability.

If there are no objections, I will push the patch.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/docs/current/regress-evaluation.html#REGRESS-EVALUATION-ORDERING-DIFFERENCES



Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
On Wed, Jun 5, 2024 at 10:08 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> If there are no objections, I will push the patch.

Sorry, I forgot to attach the patch.

Best regards,
Etsuro Fujita

Attachment

Re: BUG #18467: postgres_fdw (deparser) ignores LimitOption

From
Etsuro Fujita
Date:
On Wed, Jun 5, 2024 at 10:21 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Wed, Jun 5, 2024 at 10:08 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > If there are no objections, I will push the patch.
>
> Sorry, I forgot to attach the patch.

Pushed and back-patched.

Thanks again Onder for the report and Japin for the patch!

Best regards,
Etsuro Fujita