Thread: Same query 10000x More Time
Hi
I have a postgres_fdw foreign table called tbl_link. The source table is 2.5 GB in size with 122 rows (some rows have a 70MB bytea column, but not the ones I select in the example).
I noticed that when I put the specific ids in the list, "where id in (140,144,148)", it runs fast (a few ms), but when I express the same list as a select, "where id in (select 140 as id union select 144 union select 148)", it takes 50 seconds. This select-union is just for the example; I obviously have a different select (which by itself takes a few ms but causes the whole insert query to take 10000x more time).
Why is that? How can I still use a regular select and get a reasonable response time?
Thanks
FAST:
select lnk.*
into local_1
from tbl_link lnk
where id in (140,144,148)
"Foreign Scan on tbl_link lnk (cost=100.00..111.61 rows=3 width=700) (actual time=4.161..4.167 rows=3 loops=1)"
"Planning Time: 0.213 ms"
"Execution Time: 16.251 ms"
SLOW:
select lnk.*
into local_1
from tbl_link lnk
where id in (select 140 as id union select 144 union select 148)
"Hash Join (cost=100.18..113.88 rows=3 width=700) (actual time=45398.721..46812.100 rows=3 loops=1)"
" Hash Cond: (lnk.id = (140))"
" -> Foreign Scan on tbl_link lnk (cost=100.00..113.39 rows=113 width=700) (actual time=45398.680..46812.026 rows=112 loops=1)"
" -> Hash (cost=0.14..0.14 rows=3 width=4) (actual time=0.023..0.026 rows=3 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> HashAggregate (cost=0.08..0.11 rows=3 width=4) (actual time=0.017..0.021 rows=3 loops=1)"
" Group Key: (140)"
" Batches: 1 Memory Usage: 24kB"
" -> Append (cost=0.00..0.07 rows=3 width=4) (actual time=0.005..0.009 rows=3 loops=1)"
" -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=1)"
" -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)"
" -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.001 rows=1 loops=1)"
"Planning Time: 0.541 ms"
"Execution Time: 46827.945 ms"
At Thu, 6 Jan 2022 13:50:55 +0530, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote in
> On Thu, 6 Jan 2022 at 13:13, Avi Weinberg <AviW@gilat.com> wrote:
> > [...]
>
> couple of things:
> PostgreSQL: Documentation: 14: F.35. postgres_fdw
> <https://www.postgresql.org/docs/current/postgres-fdw.html>
> When you set up your foreign server, what are your
> use_remote_estimate
> fetch_size
> params for the foreign server?
>
> You need to know there are certain restrictions on what gets pushed down
> to the remote server.
> I generally use postgres/postgres_fdw.sql at master · postgres/postgres (github.com)
> <https://github.com/postgres/postgres/blob/master/contrib/postgres_fdw/sql/postgres_fdw.sql>
> as a reference.
> If your predicates are not pushed down, it will bring all the rows from the
> foreign server to your local server (and the fetch_size value and network IO
> will add to the delay), and given you used select *, it will be a lot of IO,
> so restricting to only the columns needed after filtering might help.
>
> You can try running explain (verbose, analyze) on the query, and also
> enabling log_statement = 'all' / log_min_duration_statement = 0 on the
> foreign server, to see the actual plan for the foreign scan.
>
> That might help in troubleshooting.
>
> As always, I have little production exposure. If I am wrong, I can be corrected.

In this specific case, the FAST query doesn't contain a join and its predicate can be pushed down to the remote. On the other hand, the SLOW one contains a join. The planner considers a remote join only when both sides of the join are on the same foreign server. This is not the case here, since the inner subquery is not even a foreign scan. The planner doesn't consider the possibility that a subquery is executable anywhere.

As a result, the local server inevitably draws all rows from the remote table to join with the result of the subquery locally, which is quite slow.

It could be improved, but I don't think we are going to consider that case, because the SLOW query seems like a kind of bad query, which can be improved by rewriting it as the FAST one.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
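For reference, the use_remote_estimate and fetch_size options mentioned above are per-server settings; a sketch of how to inspect and change them (the server name fdw_server is a hypothetical placeholder):

```sql
-- Inspect the options currently set on each foreign server.
SELECT srvname, srvoptions FROM pg_foreign_server;

-- Ask the remote server for row estimates instead of guessing locally,
-- and fetch more rows per round trip (defaults: off / 100).
ALTER SERVER fdw_server
    OPTIONS (ADD use_remote_estimate 'true', ADD fetch_size '1000');
```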
Hi Kyotaro Horiguchi and Vijaykumar Jain,
Thanks for your quick reply!
I understand that the join in the slow query caused this problem. However, why can't Postgres evaluate the table of the "IN" clause (select 140 as id union select 144 union select 148) and, based on its size, decide which is more optimal:
either push the local table to the linked server and perform the join on the linked server,
or pull the linked-server table to the local server and perform the join locally.
In my case the local table is a million times smaller than the remote table.
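One way to see which side of this trade-off the planner actually took is EXPLAIN (VERBOSE): each Foreign Scan then prints a Remote SQL line showing exactly what was shipped to the linked server. A sketch using the query from the example:

```sql
-- The Remote SQL line reveals whether the predicate was shipped to the
-- remote server or whether the whole table is being fetched and
-- filtered locally.
EXPLAIN (VERBOSE, ANALYZE)
SELECT lnk.*
FROM tbl_link lnk
WHERE id IN (SELECT 140 AS id UNION SELECT 144 UNION SELECT 148);
```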
Thanks for the input
postgres_fdw seems to bring the entire table even if all I use in the join is just the id from the remote table. I know it is possible to query for the missing ids and then perform the delete, but I wonder why all types of joins are so inefficient.
DELETE FROM tbl_local lcl
WHERE NOT EXISTS (
    SELECT id FROM tbl_link lnk
    WHERE lnk.id = lcl.id);
"Delete on tbl_local lcl (cost=114.59..122.14 rows=3 width=730) (actual time=62153.636..62153.639 rows=0 loops=1)"
" -> Hash Anti Join (cost=114.59..122.14 rows=3 width=730) (actual time=62153.633..62153.636 rows=0 loops=1)"
" Hash Cond: (lcl.id = lnk.id)"
" -> Seq Scan on tbl_local lcl (cost=0.00..7.11 rows=111 width=14) (actual time=0.022..0.062 rows=111 loops=1)"
" -> Hash (cost=113.24..113.24 rows=108 width=732) (actual time=55984.489..55984.490 rows=112 loops=1)"
" Buckets: 1024 (originally 1024) Batches: 32 (originally 1) Memory Usage: 240024kB"
" -> Foreign Scan on tbl_link lnk (cost=100.00..113.24 rows=108 width=732) (actual time=48505.926..51893.668 rows=112 loops=1)"
"Planning Time: 0.237 ms"
"Execution Time: 62184.253 ms"
From: Vijaykumar Jain [mailto:vijaykumarjain.github@gmail.com]
Sent: Thursday, January 6, 2022 2:53 PM
To: Avi Weinberg <AviW@gilat.com>
Cc: Kyotaro Horiguchi <horikyota.ntt@gmail.com>; pgsql-performa. <pgsql-performance@postgresql.org>
Subject: Re: Same query 10000x More Time
On Thu, Jan 6, 2022, 3:50 PM Avi Weinberg <AviW@gilat.com> wrote:
> Hi Kyotaro Horiguchi and Vijaykumar Jain,
> Thanks for your quick reply!
> I understand that the fact the slow query has a join caused this problem. However, why can't Postgres evaluate the table of the "IN" clause (select 140 as id union select 144 union select 148) and based on its size decide what is more optimal.
> Push the local table to the linked server to perform the join on the linked server
> Pull the linked server table to local to perform the join on the local.
> In my case the table size of the local is million times smaller than the table size of the remote.
I understand that when the optimizer makes a decision it uses stats to choose the least expensive plan.
I could answer, but I am pretty sure a local setup with a big and a small table is not analogous to a small local table and a big remote table.
I would leave it to the experts here, unless you are open to reading the source of the postgres_fdw extension.
There must be a reason, beyond cost calculation, why this happens.
Otherwise, if this is all just cost based, you can try tweaking the cost params and see if you get a better plan.
For example, if you force the parallel cost to 0 on the foreign server, it may use parallel workers and get some speed-up, but in my experience fighting the optimizer is mostly asking for trouble :)
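The cost parameters involved are also per-server postgres_fdw options; a sketch of tweaking them (the server name fdw_server and the values are illustrative assumptions):

```sql
-- Per-server costs the planner adds for foreign scans (these are
-- postgres_fdw server options, not GUCs). Lowering them makes remote
-- plans look cheaper; raising them discourages round trips.
ALTER SERVER fdw_server
    OPTIONS (ADD fdw_startup_cost '10', ADD fdw_tuple_cost '0.01');
```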