Thread: FDW does not push down LIMIT & ORDER BY with sharding (partitions)
Hi, First of all I want to thank Etsuro Fujita for implementing the exact feature I was missing in FDW 11.4, but now available in 12: pushing down of LIMIT & ORDER BY to foreign tables (commit d50d172e51). Now that I'm using PostgreSQL 12-beta2 from the official Docker image I noticed an omission that I wanted to report here which may be relevant for those like me, using FDW in a typical sharding setup. By querying purely foreign tables, I can confirm pushing down LIMIT & ORDER BY is working as expected on my installation. However, when I use a typical sharding setup where the main table is located on the FDW node, with partitions of foreign tables, this seems not to activate the new code path. I can understand that pushing this down is not possible in cases where *multiple* foreign tables are to be scanned. However, it also does not work in the case where my WHERE clause condition causes to only connect to a *single* foreign table. Short version of my situation below. Table definition, typical 'shard by user': CREATE TABLE my_big_table ( user_id bigint NOT NULL, [ omitted other columns for brevity ] ) PARTITION BY HASH (user_id) ; create foreign table my_big_table_mod4_s0 partition of my_big_table FOR VALUES WITH (MODULUS 4, REMAINDER 0) server shardA OPTIONS (table_name 'my_big_table_mod4_s0'); Running EXPLAIN VERBOSE SELECT * from my_big_table WHERE user_id = 12345 -- only 1 user --> single foreign table. ORDER BY serial DESC LIMIT 10; yields Limit (cost=927393.08..927395.58 rows=1000 width=32) Output: [...] -> Sort (cost=927393.08..931177.06 rows=1513592 width=32) Output: [...] Sort Key: my_big_table_mod4_s0.serial DESC -> Foreign Scan on public.my_big_table_mod4_s0 (cost=5318.35..844404.46 rows=1513592 width=32) Output: [...] Remote SQL: SELECT [...] FROM public.my_big_table_mod4_s0 WHERE ((user_id = 4560084)) As you can see this is sub-optimal compared to the case where I directly query the foreign table. This started as a Question on DBA.SE, some more information included there: https://dba.stackexchange.com/q/242358/13155 Full version string used: PostgreSQL 12beta2 (Debian 12~beta2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit Please let me know if I should provide more information or in what other way I could contribute. I'm very much willing to test patches. (If this is already being worked on or discussed elsewhere on this or another list, please excuse me, it seems a bit hard to find relevant results searching the mailing list archives, and I'm fairly new to PostgreSQL in general too.) Thanks, Gert van Dijk
(I've added Fujita-san and David Rowley to this discussion as they will have a better clue about some things I write below.) On Tue, Jul 9, 2019 at 9:12 AM Gert van Dijk <gertvdijk@gmail.com> wrote: > First of all I want to thank Etsuro Fujita for implementing the exact > feature I was missing in > FDW 11.4, but now available in 12: pushing down of LIMIT & ORDER BY to > foreign tables > (commit d50d172e51). Now that I'm using PostgreSQL 12-beta2 from the > official Docker > image I noticed an omission that I wanted to report here which may be > relevant for those > like me, using FDW in a typical sharding setup. > > By querying purely foreign tables, I can confirm pushing down LIMIT & > ORDER BY is > working as expected on my installation. > However, when I use a typical sharding setup where the main table is > located on the FDW > node, with partitions of foreign tables, this seems not to activate > the new code path. I can > understand that pushing this down is not possible in cases where > *multiple* foreign tables > are to be scanned. However, it also does not work in the case where my > WHERE clause > condition causes to only connect to a *single* foreign table. As far as I can tell, LIMIT cannot be pushed below an Append or MergeAppend that's used to combine the outputs of individual partitions, which if I read correctly, you already know. It's true that there's no Append/MergeAppend node in the *final* plan of your example query, because there's only partition to be scanned after pruning, but the Append/MergeAppend node remains in the plan through the planning stage where LIMIT is added to the plan and only removed in the final stage of planning. The final stage that removes the Append/MergeAppend doesn't reassess whether the LIMIT on top (if any) should be applied to the partition directly, which means the partition's FDW never gets to see the LIMIT. > (If this is already being worked on or discussed elsewhere on this or > another list, please > excuse me, it seems a bit hard to find relevant results searching the > mailing list archives, > and I'm fairly new to PostgreSQL in general too.) I don't know of any ongoing work to address this either. The ability to remove unnecessary Append/MergeAppend was also added in PG 12 (thanks to David and Tom Lane), along with LIMIT push down to foreign server (thanks to Fujita-san), but the way former is implemented prevents the two from working together. Maybe, there's a plan to fix that in the future. Thanks, Amit
On Tue, 9 Jul 2019 at 16:56, Amit Langote <amitlangote09@gmail.com> wrote: > > (I've added Fujita-san and David Rowley to this discussion as they > will have a better clue about some things I write below.) > > On Tue, Jul 9, 2019 at 9:12 AM Gert van Dijk <gertvdijk@gmail.com> wrote: > > First of all I want to thank Etsuro Fujita for implementing the exact > > feature I was missing in > > FDW 11.4, but now available in 12: pushing down of LIMIT & ORDER BY to > > foreign tables > > (commit d50d172e51). Now that I'm using PostgreSQL 12-beta2 from the > > official Docker > > image I noticed an omission that I wanted to report here which may be > > relevant for those > > like me, using FDW in a typical sharding setup. > > > > By querying purely foreign tables, I can confirm pushing down LIMIT & > > ORDER BY is > > working as expected on my installation. > > However, when I use a typical sharding setup where the main table is > > located on the FDW > > node, with partitions of foreign tables, this seems not to activate > > the new code path. I can > > understand that pushing this down is not possible in cases where > > *multiple* foreign tables > > are to be scanned. However, it also does not work in the case where my > > WHERE clause > > condition causes to only connect to a *single* foreign table. > > As far as I can tell, LIMIT cannot be pushed below an Append or > MergeAppend that's used to combine the outputs of individual > partitions, which if I read correctly, you already know. It's true > that there's no Append/MergeAppend node in the *final* plan of your > example query, because there's only partition to be scanned after > pruning, but the Append/MergeAppend node remains in the plan through > the planning stage where LIMIT is added to the plan and only removed > in the final stage of planning. The final stage that removes the > Append/MergeAppend doesn't reassess whether the LIMIT on top (if any) > should be applied to the partition directly, which means the > partition's FDW never gets to see the LIMIT. I'm not so sure it's true that it's not possible to push the LIMIT below an Append/MergeAppend node. It seems perfectly fine to me, However, if there is more than 1 subnode to the Append/MergeAppend, then we'd need to keep the top-level LIMIT in place to ensure we don't output too many rows. In any case, this is not a bug, so we really shouldn't discuss on -bugs. It just seems like a limitation of d50d172e51 to me. The setrefs.c code added in 8edd0e79 always gets rid of the Append/MergeAppend when there's just 1 subnode, so it does not seem that unreasonable that planner code that's called before that could assume that such an Append/MergeAppend path would not make it into the final plan. It could do whatever work that it needs to on the single subpath instead. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jul 10, 2019 at 9:38 AM David Rowley <david.rowley@2ndquadrant.com> wrote: > On Tue, 9 Jul 2019 at 16:56, Amit Langote <amitlangote09@gmail.com> wrote: > > As far as I can tell, LIMIT cannot be pushed below an Append or > > MergeAppend that's used to combine the outputs of individual > > partitions, which if I read correctly, you already know. It's true > > that there's no Append/MergeAppend node in the *final* plan of your > > example query, because there's only partition to be scanned after > > pruning, but the Append/MergeAppend node remains in the plan through > > the planning stage where LIMIT is added to the plan and only removed > > in the final stage of planning. The final stage that removes the > > Append/MergeAppend doesn't reassess whether the LIMIT on top (if any) > > should be applied to the partition directly, which means the > > partition's FDW never gets to see the LIMIT. > > I'm not so sure it's true that it's not possible to push the LIMIT > below an Append/MergeAppend node. It seems perfectly fine to me, > However, if there is more than 1 subnode to the Append/MergeAppend, > then we'd need to keep the top-level LIMIT in place to ensure we don't > output too many rows. Yeah, I really meant to say that Postgres *currently doesn't* push LIMIT under Append/MergeAppend. > In any case, this is not a bug, so we really shouldn't discuss on > -bugs. It just seems like a limitation of d50d172e51 to me. The > setrefs.c code added in 8edd0e79 always gets rid of the > Append/MergeAppend when there's just 1 subnode, so it does not seem > that unreasonable that planner code that's called before that could > assume that such an Append/MergeAppend path would not make it into the > final plan. It could do whatever work that it needs to on the single > subpath instead. I see. Agree that this is not a bug of either of the commits I mentioned. However, rather than calling this a limitation of d50d172e51, which IIUC was an FDW-specific effort, I'd say that we lack the feature to push LIMIT under Append/MergeAppend. If we had that feature, then much like in the case of grouping that can be pushed under Append/MergeAppend (at least for partitioned tables), we wouldn't need do anything special for the single-child cases. Thanks, Amit
On Wed, Jul 10, 2019 at 5:48 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Jul 10, 2019 at 9:38 AM David Rowley > <david.rowley@2ndquadrant.com> wrote: > > In any case, this is not a bug, so we really shouldn't discuss on > > -bugs. It just seems like a limitation of d50d172e51 to me. The > > setrefs.c code added in 8edd0e79 always gets rid of the > > Append/MergeAppend when there's just 1 subnode, so it does not seem > > that unreasonable that planner code that's called before that could > > assume that such an Append/MergeAppend path would not make it into the > > final plan. It could do whatever work that it needs to on the single > > subpath instead. > > I see. Agree that this is not a bug of either of the commits I mentioned. I think so too. > However, rather than calling this a limitation of d50d172e51, which > IIUC was an FDW-specific effort, I'd say that we lack the feature to > push LIMIT under Append/MergeAppend. If we had that feature, then > much like in the case of grouping that can be pushed under > Append/MergeAppend (at least for partitioned tables), we wouldn't need > do anything special for the single-child cases. Yeah, that's really what I'm thinking: in other words, partitionwise limit restriction. I'd like to work on it for PG13 if I have time. Best regards, Etsuro Fujita PS: in the StackExchange site, Gert van Dijk stated this: Running ANALYZE against the foreign table (on FDW instance). Takes a huge amount of time; looks like it's full tablescanning the remote table? Yeah, that's right; when analyzing a remote table, postgres_fdw retrieves all rows form the remote table, which would take a long time if the remote table is large. So I'm planning to work on this issue for PG13. [1] https://dba.stackexchange.com/questions/242358/how-do-i-get-postgresql-fdw-to-push-down-the-limit-to-the-single-backend-serve
On Wed, Jul 10, 2019 at 11:35 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Jul 10, 2019 at 5:48 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Jul 10, 2019 at 9:38 AM David Rowley > > <david.rowley@2ndquadrant.com> wrote: > > > In any case, this is not a bug, so we really shouldn't discuss on > > > -bugs. It just seems like a limitation of d50d172e51 to me. The > > > setrefs.c code added in 8edd0e79 always gets rid of the > > > Append/MergeAppend when there's just 1 subnode, so it does not seem > > > that unreasonable that planner code that's called before that could > > > assume that such an Append/MergeAppend path would not make it into the > > > final plan. It could do whatever work that it needs to on the single > > > subpath instead. > > > > I see. Agree that this is not a bug of either of the commits I mentioned. > > I think so too. > > > However, rather than calling this a limitation of d50d172e51, which > > IIUC was an FDW-specific effort, I'd say that we lack the feature to > > push LIMIT under Append/MergeAppend. If we had that feature, then > > much like in the case of grouping that can be pushed under > > Append/MergeAppend (at least for partitioned tables), we wouldn't need > > do anything special for the single-child cases. Thanks for all of your explanatory words on this. I also now see how this is more a feature request, sorry about that. As I mentioned, I am still quite new to PostgreSQL, and I thought I had identified a 'simple' omission in a recently merged feature for which I expected too much. I now learned things are a bit more complex in that regard; I wrongly assumed that FDW can be a query router for my case at this point in time at least. Would you like me to report it elsewhere now? And what would have been the right place? -general, -performance, -hackers, ...? > Yeah, that's really what I'm thinking: in other words, partitionwise > limit restriction. I'd like to work on it for PG13 if I have time. > > Best regards, > Etsuro Fujita > > PS: in the StackExchange site, Gert van Dijk stated this: > > Running ANALYZE against the foreign table (on FDW instance). Takes a > huge amount of time; looks like it's full tablescanning the remote > table? > > Yeah, that's right; when analyzing a remote table, postgres_fdw > retrieves all rows form the remote table, which would take a long time > if the remote table is large. So I'm planning to work on this issue > for PG13. > > [1] https://dba.stackexchange.com/questions/242358/how-do-i-get-postgresql-fdw-to-push-down-the-limit-to-the-single-backend-serve Cool, thanks again! Regards, Gert
On Wed, 10 Jul 2019 at 21:39, Gert van Dijk <gertvdijk@gmail.com> wrote: > Would you like me to report it elsewhere now? And what would have been > the right place? -general, -performance, -hackers, ...? I think you've done the right thing so far. If you thought it was a bug then this is the right place to report that. -hackers would be the right place for you to start a thread if you were planning on working on the feature yourself. It's not really a place to go to make feature requests. In any case, Fujita-san already mentioned that he'd like to work on it for v13, so another thread anywhere is unlikely to be useful until there's something further to discuss. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services