Thread: FDW does not push down LIMIT & ORDER BY with sharding (partitions)

FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From
Gert van Dijk
Date:
Hi,

First of all I want to thank Etsuro Fujita for implementing the exact
feature I was missing in
FDW 11.4, but now available in 12: pushing down of LIMIT & ORDER BY to
foreign tables
(commit d50d172e51). Now that I'm using PostgreSQL 12-beta2 from the
official Docker
image I noticed an omission that I wanted to report here which may be
relevant for those
like me, using FDW in a typical sharding setup.

By querying purely foreign tables, I can confirm pushing down LIMIT &
ORDER BY is
working as expected on my installation.
However, when I use a typical sharding setup where the main table is
located on the FDW
node, with partitions of foreign tables, this seems not to activate
the new code path. I can
understand that pushing this down is not possible in cases where
*multiple* foreign tables
are to be scanned. However, it also does not work in the case where my
WHERE clause
condition causes to only connect to a *single* foreign table.

Short version of my situation below.

Table definition, typical 'shard by user':
CREATE TABLE my_big_table (
    user_id bigint NOT NULL,
    [ omitted other columns for brevity ]
) PARTITION BY HASH (user_id) ;

create foreign table my_big_table_mod4_s0 partition of my_big_table
    FOR VALUES WITH (MODULUS 4, REMAINDER 0) server shardA
    OPTIONS (table_name 'my_big_table_mod4_s0');

Running

EXPLAIN VERBOSE
SELECT * from my_big_table
WHERE
  user_id = 12345     -- only 1 user --> single foreign table.
ORDER BY serial DESC
LIMIT 10;

yields

 Limit  (cost=927393.08..927395.58 rows=1000 width=32)
   Output: [...]
   ->  Sort  (cost=927393.08..931177.06 rows=1513592 width=32)
         Output: [...]
         Sort Key: my_big_table_mod4_s0.serial DESC
         ->  Foreign Scan on public.my_big_table_mod4_s0
(cost=5318.35..844404.46 rows=1513592 width=32)
               Output: [...]
               Remote SQL: SELECT [...] FROM
public.my_big_table_mod4_s0 WHERE ((user_id = 4560084))

As you can see this is sub-optimal compared to the case where I
directly query the foreign
table.

This started as a Question on DBA.SE, some more information included there:
https://dba.stackexchange.com/q/242358/13155

Full version string used:
PostgreSQL 12beta2 (Debian 12~beta2-1.pgdg100+1) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

Please let me know if I should provide more information or in what
other way I could
contribute. I'm very much willing to test patches.

(If this is already being worked on or discussed elsewhere on this or
another list, please
excuse me, it seems a bit hard to find relevant results searching the
mailing list archives,
and I'm fairly new to PostgreSQL in general too.)

Thanks,

Gert van Dijk



Re: FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From
Amit Langote
Date:
(I've added Fujita-san and David Rowley to this discussion as they
will have a better clue about some things I write below.)

On Tue, Jul 9, 2019 at 9:12 AM Gert van Dijk <gertvdijk@gmail.com> wrote:
> First of all I want to thank Etsuro Fujita for implementing the exact
> feature I was missing in
> FDW 11.4, but now available in 12: pushing down of LIMIT & ORDER BY to
> foreign tables
> (commit d50d172e51). Now that I'm using PostgreSQL 12-beta2 from the
> official Docker
> image I noticed an omission that I wanted to report here which may be
> relevant for those
> like me, using FDW in a typical sharding setup.
>
> By querying purely foreign tables, I can confirm pushing down LIMIT &
> ORDER BY is
> working as expected on my installation.
> However, when I use a typical sharding setup where the main table is
> located on the FDW
> node, with partitions of foreign tables, this seems not to activate
> the new code path. I can
> understand that pushing this down is not possible in cases where
> *multiple* foreign tables
> are to be scanned. However, it also does not work in the case where my
> WHERE clause
> condition causes to only connect to a *single* foreign table.

As far as I can tell, LIMIT cannot be pushed below an Append or
MergeAppend that's used to combine the outputs of individual
partitions, which if I read correctly, you already know.  It's true
that there's no Append/MergeAppend node in the *final* plan of your
example query, because there's only partition to be scanned after
pruning, but the Append/MergeAppend node remains in the plan through
the planning stage where LIMIT is added to the plan and only removed
in the final stage of planning.  The final stage that removes the
Append/MergeAppend doesn't reassess whether the LIMIT on top (if any)
should be applied to the partition directly, which means the
partition's FDW never gets to see the LIMIT.

> (If this is already being worked on or discussed elsewhere on this or
> another list, please
> excuse me, it seems a bit hard to find relevant results searching the
> mailing list archives,
> and I'm fairly new to PostgreSQL in general too.)

I don't know of any ongoing work to address this either.  The ability
to remove unnecessary Append/MergeAppend was also added in PG 12
(thanks to David and Tom Lane), along with LIMIT push down to foreign
server (thanks to Fujita-san), but the way former is implemented
prevents the two from working together.  Maybe, there's a plan to fix
that in the future.

Thanks,
Amit



Re: FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From
David Rowley
Date:
On Tue, 9 Jul 2019 at 16:56, Amit Langote <amitlangote09@gmail.com> wrote:
>
> (I've added Fujita-san and David Rowley to this discussion as they
> will have a better clue about some things I write below.)
>
> On Tue, Jul 9, 2019 at 9:12 AM Gert van Dijk <gertvdijk@gmail.com> wrote:
> > First of all I want to thank Etsuro Fujita for implementing the exact
> > feature I was missing in
> > FDW 11.4, but now available in 12: pushing down of LIMIT & ORDER BY to
> > foreign tables
> > (commit d50d172e51). Now that I'm using PostgreSQL 12-beta2 from the
> > official Docker
> > image I noticed an omission that I wanted to report here which may be
> > relevant for those
> > like me, using FDW in a typical sharding setup.
> >
> > By querying purely foreign tables, I can confirm pushing down LIMIT &
> > ORDER BY is
> > working as expected on my installation.
> > However, when I use a typical sharding setup where the main table is
> > located on the FDW
> > node, with partitions of foreign tables, this seems not to activate
> > the new code path. I can
> > understand that pushing this down is not possible in cases where
> > *multiple* foreign tables
> > are to be scanned. However, it also does not work in the case where my
> > WHERE clause
> > condition causes to only connect to a *single* foreign table.
>
> As far as I can tell, LIMIT cannot be pushed below an Append or
> MergeAppend that's used to combine the outputs of individual
> partitions, which if I read correctly, you already know.  It's true
> that there's no Append/MergeAppend node in the *final* plan of your
> example query, because there's only partition to be scanned after
> pruning, but the Append/MergeAppend node remains in the plan through
> the planning stage where LIMIT is added to the plan and only removed
> in the final stage of planning.  The final stage that removes the
> Append/MergeAppend doesn't reassess whether the LIMIT on top (if any)
> should be applied to the partition directly, which means the
> partition's FDW never gets to see the LIMIT.

I'm not so sure it's true that it's not possible to push the LIMIT
below an Append/MergeAppend node. It seems perfectly fine to me,
However, if there is more than 1 subnode to the Append/MergeAppend,
then we'd need to keep the top-level LIMIT in place to ensure we don't
output too many rows.

In any case, this is not a bug, so we really shouldn't discuss on
-bugs. It just seems like a limitation of d50d172e51 to me. The
setrefs.c code added in 8edd0e79 always gets rid of the
Append/MergeAppend when there's just 1 subnode, so it does not seem
that unreasonable that planner code that's called before that could
assume that such an Append/MergeAppend path would not make it into the
final plan. It could do whatever work that it needs to on the single
subpath instead.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From
Amit Langote
Date:
On Wed, Jul 10, 2019 at 9:38 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On Tue, 9 Jul 2019 at 16:56, Amit Langote <amitlangote09@gmail.com> wrote:
> > As far as I can tell, LIMIT cannot be pushed below an Append or
> > MergeAppend that's used to combine the outputs of individual
> > partitions, which if I read correctly, you already know.  It's true
> > that there's no Append/MergeAppend node in the *final* plan of your
> > example query, because there's only partition to be scanned after
> > pruning, but the Append/MergeAppend node remains in the plan through
> > the planning stage where LIMIT is added to the plan and only removed
> > in the final stage of planning.  The final stage that removes the
> > Append/MergeAppend doesn't reassess whether the LIMIT on top (if any)
> > should be applied to the partition directly, which means the
> > partition's FDW never gets to see the LIMIT.
>
> I'm not so sure it's true that it's not possible to push the LIMIT
> below an Append/MergeAppend node.  It seems perfectly fine to me,
> However, if there is more than 1 subnode to the Append/MergeAppend,
> then we'd need to keep the top-level LIMIT in place to ensure we don't
> output too many rows.

Yeah, I really meant to say that Postgres *currently doesn't* push
LIMIT under Append/MergeAppend.

> In any case, this is not a bug, so we really shouldn't discuss on
> -bugs. It just seems like a limitation of d50d172e51 to me. The
> setrefs.c code added in 8edd0e79 always gets rid of the
> Append/MergeAppend when there's just 1 subnode, so it does not seem
> that unreasonable that planner code that's called before that could
> assume that such an Append/MergeAppend path would not make it into the
> final plan. It could do whatever work that it needs to on the single
> subpath instead.

I see.  Agree that this is not a bug of either of the commits I mentioned.

However, rather than calling this a limitation of d50d172e51, which
IIUC was an FDW-specific effort, I'd say that we lack the feature to
push LIMIT under Append/MergeAppend.  If we had that feature, then
much like in the case of grouping that can be pushed under
Append/MergeAppend (at least for partitioned tables), we wouldn't need
do anything special for the single-child cases.

Thanks,
Amit



Re: FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From
Etsuro Fujita
Date:
On Wed, Jul 10, 2019 at 5:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jul 10, 2019 at 9:38 AM David Rowley
> <david.rowley@2ndquadrant.com> wrote:
> > In any case, this is not a bug, so we really shouldn't discuss on
> > -bugs. It just seems like a limitation of d50d172e51 to me. The
> > setrefs.c code added in 8edd0e79 always gets rid of the
> > Append/MergeAppend when there's just 1 subnode, so it does not seem
> > that unreasonable that planner code that's called before that could
> > assume that such an Append/MergeAppend path would not make it into the
> > final plan. It could do whatever work that it needs to on the single
> > subpath instead.
>
> I see.  Agree that this is not a bug of either of the commits I mentioned.

I think so too.

> However, rather than calling this a limitation of d50d172e51, which
> IIUC was an FDW-specific effort, I'd say that we lack the feature to
> push LIMIT under Append/MergeAppend.  If we had that feature, then
> much like in the case of grouping that can be pushed under
> Append/MergeAppend (at least for partitioned tables), we wouldn't need
> do anything special for the single-child cases.

Yeah, that's really what I'm thinking: in other words, partitionwise
limit restriction.  I'd like to work on it for PG13 if I have time.

Best regards,
Etsuro Fujita

PS: in the StackExchange site, Gert van Dijk stated this:

Running ANALYZE against the foreign table (on FDW instance). Takes a
huge amount of time; looks like it's full tablescanning the remote
table?

Yeah, that's right; when analyzing a remote table, postgres_fdw
retrieves all rows form the remote table, which would take a long time
if the remote table is large.  So I'm planning to work on this issue
for PG13.

[1]
https://dba.stackexchange.com/questions/242358/how-do-i-get-postgresql-fdw-to-push-down-the-limit-to-the-single-backend-serve



Re: FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From
Gert van Dijk
Date:
On Wed, Jul 10, 2019 at 11:35 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
>
> On Wed, Jul 10, 2019 at 5:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Jul 10, 2019 at 9:38 AM David Rowley
> > <david.rowley@2ndquadrant.com> wrote:
> > > In any case, this is not a bug, so we really shouldn't discuss on
> > > -bugs. It just seems like a limitation of d50d172e51 to me. The
> > > setrefs.c code added in 8edd0e79 always gets rid of the
> > > Append/MergeAppend when there's just 1 subnode, so it does not seem
> > > that unreasonable that planner code that's called before that could
> > > assume that such an Append/MergeAppend path would not make it into the
> > > final plan. It could do whatever work that it needs to on the single
> > > subpath instead.
> >
> > I see.  Agree that this is not a bug of either of the commits I mentioned.
>
> I think so too.
>
> > However, rather than calling this a limitation of d50d172e51, which
> > IIUC was an FDW-specific effort, I'd say that we lack the feature to
> > push LIMIT under Append/MergeAppend.  If we had that feature, then
> > much like in the case of grouping that can be pushed under
> > Append/MergeAppend (at least for partitioned tables), we wouldn't need
> > do anything special for the single-child cases.

Thanks for all of your explanatory words on this. I also now see how
this is more a feature request, sorry about that. As I mentioned, I am
still quite new to PostgreSQL, and I thought I had identified a
'simple' omission in a recently merged feature for which I expected
too much. I now learned things are a bit more complex in that regard;
I wrongly assumed that FDW can be a query router for my case at this
point in time at least.

Would you like me to report it elsewhere now? And what would have been
the right place? -general, -performance, -hackers, ...?

> Yeah, that's really what I'm thinking: in other words, partitionwise
> limit restriction.  I'd like to work on it for PG13 if I have time.
>
> Best regards,
> Etsuro Fujita
>
> PS: in the StackExchange site, Gert van Dijk stated this:
>
> Running ANALYZE against the foreign table (on FDW instance). Takes a
> huge amount of time; looks like it's full tablescanning the remote
> table?
>
> Yeah, that's right; when analyzing a remote table, postgres_fdw
> retrieves all rows form the remote table, which would take a long time
> if the remote table is large.  So I'm planning to work on this issue
> for PG13.
>
> [1]
https://dba.stackexchange.com/questions/242358/how-do-i-get-postgresql-fdw-to-push-down-the-limit-to-the-single-backend-serve

Cool, thanks again!

Regards,

Gert



Re: FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From
David Rowley
Date:
On Wed, 10 Jul 2019 at 21:39, Gert van Dijk <gertvdijk@gmail.com> wrote:
> Would you like me to report it elsewhere now? And what would have been
> the right place? -general, -performance, -hackers, ...?

I think you've done the right thing so far. If you thought it was a
bug then this is the right place to report that.

-hackers would be the right place for you to start a thread if you
were planning on working on the feature yourself. It's not really a
place to go to make feature requests. In any case, Fujita-san already
mentioned that he'd like to work on it for v13, so another thread
anywhere is unlikely to be useful until there's something further to
discuss.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services