Thread: Issue executing query from container
Hello!

I'm running a Docker container that executes a PHP script running a sequence of queries. One of the queries gets stuck: it takes more than 2 hours to execute and shows as active in pg_stat_activity. The query is a JOIN between a FOREIGN TABLE and a local table.

Executing this query from the psql console takes less than 5 seconds. If I execute the query manually from inside the container, it also completes within 5 seconds.

Can anyone guide me in troubleshooting what is causing the query to get stuck?

Thanks!
Eudald
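[For readers following along: the "active in pg_stat_activity" check mentioned above can be done from another session with a query like this. A minimal sketch; the WHERE clause and ordering are illustrative.]

    -- Show active queries and how long each has been running
    SELECT pid, state, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY runtime DESC;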
Eudald Valcàrcel Lacasa <eudald.valcarcel@gmail.com> writes:
> I'm running a Docker container that executes a PHP script running a
> sequence of queries. One of the queries gets stuck: it takes more than
> 2 hours to execute and shows as active in pg_stat_activity. The query
> is a JOIN between a FOREIGN TABLE and a local table.
> Executing this query from the psql console takes less than 5 seconds.
> If I execute the query manually from inside the container, it also
> completes within 5 seconds.
> Can anyone guide me in troubleshooting what is causing the query to
> get stuck?

It's a very good bet that it's something about the query being
parameterized or not.  You could try installing auto_explain to
compare the plans that are generated.

			regards, tom lane
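[A minimal sketch of enabling auto_explain for a single session, without a server restart; the 100 ms threshold is illustrative. To capture plans from other sessions, such as the PHP script's, it can instead be set server-wide in postgresql.conf via session_preload_libraries.]

    -- Load the module into the current session (requires superuser)
    LOAD 'auto_explain';
    -- Log the plan of any statement that runs longer than 100 ms
    SET auto_explain.log_min_duration = '100ms';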
Hello Tom,
Thanks for your answer! I didn't know about this module and have configured PostgreSQL with it. After running the query both manually and from the script, I have the following logs:

MANUALLY:

2020-07-15 00:56:08.735 CEST [20457] cefron@kontriki LOG:  statement: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
2020-07-15 00:56:09.495 CEST [20457] cefron@kontriki LOG:  duration: 759.102 ms  plan:
    Query Text: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
    Update on import_temp_2 tmp  (cost=116.73..17352.10 rows=5557 width=293)
      ->  Hash Join  (cost=116.73..17352.10 rows=5557 width=293)
            Hash Cond: (lower((tmp.email)::text) = lower((bl.value)::text))
            ->  Seq Scan on import_temp_2 tmp  (cost=0.00..14864.20 rows=370496 width=193)
                  Filter: (status = 1)
            ->  Hash  (cost=116.70..116.70 rows=3 width=130)
                  Buckets: 32768 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3841kB
                  ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

AUTOMATED:

2020-07-15 01:01:27.336 CEST [22783] cefron@kontriki LOG:  duration: 0.049 ms
2020-07-15 01:01:27.337 CEST [22783] cefron@kontriki LOG:  statement: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
2020-07-15 03:22:01.398 CEST [22783] cefron@kontriki LOG:  duration: 8434060.530 ms  plan:
    Query Text: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
    Update on import_temp_2 tmp  (cost=100.00..13295.86 rows=15 width=500)
      ->  Nested Loop  (cost=100.00..13295.86 rows=15 width=500)
            Join Filter: (lower((tmp.email)::text) = lower((bl.value)::text))
            ->  Seq Scan on import_temp_2 tmp  (cost=0.00..13118.74 rows=1007 width=400)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

Honestly, I see some differences, but I don't see a real cause that could make the execution take 2 hours instead of a few seconds. Maybe with these results you or someone on the list can find something.

Thanks for your help,
Eudald
Eudald Valcàrcel Lacasa <eudald.valcarcel@gmail.com> writes:
> After running the query both manually and from the script, I have the
> following logs:

> MANUALLY:
> Update on import_temp_2 tmp  (cost=116.73..17352.10 rows=5557 width=293)
>   ->  Hash Join  (cost=116.73..17352.10 rows=5557 width=293)
>         Hash Cond: (lower((tmp.email)::text) = lower((bl.value)::text))
>         ->  Seq Scan on import_temp_2 tmp  (cost=0.00..14864.20 rows=370496 width=193)
>               Filter: (status = 1)
>         ->  Hash  (cost=116.70..116.70 rows=3 width=130)
>               Buckets: 32768 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3841kB
>               ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

> AUTOMATED:
> Update on import_temp_2 tmp  (cost=100.00..13295.86 rows=15 width=500)
>   ->  Nested Loop  (cost=100.00..13295.86 rows=15 width=500)
>         Join Filter: (lower((tmp.email)::text) = lower((bl.value)::text))
>         ->  Seq Scan on import_temp_2 tmp  (cost=0.00..13118.74 rows=1007 width=400)
>               Filter: (status = 1)
>         ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
>               ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

So the question is why you are getting an estimate of 370496 import_temp_2
rows with status = 1 in the first case, and only 1007 rows in the second.

I suspect that the true number of rows is quite large, causing the
nested-loop plan to run slowly.  (Is the row estimate of 3 for the
foreign scan anywhere near reality, either?)

You may need to insert a manual ANALYZE in your automated process to
ensure that import_temp_2 has up-to-date stats before you try to do
this step.  It seems somewhat likely that autovacuum takes care of
that for you in the "manual" case, but its reaction time is too slow
to fill the gap for the automated process.

			regards, tom lane
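[A sketch of what that suggestion looks like in the automated process, using the table names from the thread: run ANALYZE right after the temp table is populated and before the UPDATE, so the planner sees current row counts.]

    -- Refresh planner statistics so the estimate for status = 1 is current
    ANALYZE import_temp_2;

    -- Then run the problematic step with up-to-date stats
    UPDATE import_temp_2 AS tmp
    SET status = 3
    FROM blacklist_central bl
    WHERE tmp.status = 1
      AND lower(tmp.email) = lower(bl.value)
      AND bl.type = 1;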
Hello Tom,
I've noticed something that may be a factor in this problem.

In the automated script, the foreign table is created 5 or 6 seconds before this query runs. If I create the foreign table manually, skip its creation in the automated script, and then run the automated script, the query takes 5-6 seconds to execute instead of 2 hours.

Hope this info is helpful to dig further.
Thanks,
Eudald
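[This fits the statistics theory: a freshly created foreign table has no statistics yet, so the planner falls back on default estimates (the rows=3 above). Assuming the foreign table uses postgres_fdw, two options are to ANALYZE it right after creation, or to let the remote server supply row estimates at plan time. A sketch, not a confirmed fix for this setup:]

    -- Option 1: sample the remote table and store local statistics
    ANALYZE blacklist_central;

    -- Option 2: ask the remote server for row estimates during planning
    ALTER FOREIGN TABLE blacklist_central
      OPTIONS (ADD use_remote_estimate 'true');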
Hello Tom,
Sorry to bump this conversation, but I'm still dealing with this issue. I come back because I've found something that throws all the previous hypotheses into question (or at least that's what I think).

I've run the automated script several times, and the duration of the query is widely variable. With the same query plan, I have:

2020-10-27 11:08:39.027 CET [18671] cefron@kontriki LOG:  duration: 5.735 ms  plan:
    Query Text: UPDATE import_temp_256 AS tmp SET status = 11 FROM blacklist_central blc WHERE lower(tmp.email) = blc.value AND tmp.status = 1 AND blc.reason = 'hardbounce'
    Update on import_temp_256 tmp  (cost=100.00..2878.74 rows=3 width=501)
      ->  Nested Loop  (cost=100.00..2878.74 rows=3 width=501)
            Join Filter: (lower((tmp.email)::text) = (blc.value)::text)
            ->  Seq Scan on import_temp_256 tmp  (cost=0.00..2750.96 rows=211 width=401)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central blc  (cost=100.00..116.70 rows=3 width=130)
2020-10-27 11:08:39.027 CET [18671] cefron@kontriki LOG:  duration: 6.122 ms

And, started at 11:20:54 but not completed until 11:46:32, the following:

2020-10-27 11:46:32.714 CET [30340] cefron@kontriki LOG:  duration: 1538286.304 ms  plan:
    Query Text: UPDATE import_temp_257 AS tmp SET status = 11 FROM blacklist_central blc WHERE lower(tmp.email) = blc.value AND tmp.status = 1 AND blc.reason = 'hardbounce'
    Update on import_temp_257 tmp  (cost=100.00..2878.74 rows=3 width=501)
      ->  Nested Loop  (cost=100.00..2878.74 rows=3 width=501)
            Join Filter: (lower((tmp.email)::text) = (blc.value)::text)
            ->  Seq Scan on import_temp_257 tmp  (cost=0.00..2750.96 rows=211 width=401)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central blc  (cost=100.00..116.70 rows=3 width=130)
2020-10-27 11:46:32.715 CET [30340] cefron@kontriki LOG:  duration: 1538287.737 ms

As you can see, the plans don't differ from each other, but the durations are abysmally different. I have the database logging everything; what should I look for to help me understand this strange behavior?

Thank you,
Eudald
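[Since the plan is identical in both runs, the extra time is likely spent waiting rather than planning. While a slow run is in progress, checking the backend's wait state and any ungranted locks from another session may show where it is stalled. A sketch; the wait_event columns require PostgreSQL 9.6 or later.]

    -- Inspect what active backends are waiting on, plus any ungranted locks
    SELECT a.pid, a.wait_event_type, a.wait_event,
           now() - a.query_start AS runtime,
           l.locktype, l.granted
    FROM pg_stat_activity a
    LEFT JOIN pg_locks l ON l.pid = a.pid AND NOT l.granted
    WHERE a.state = 'active';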
Eudald Valcàrcel Lacasa <eudald.valcarcel@gmail.com> writes:
> I've run the automated script several times, and the duration of the
> query is widely variable.

Interesting.  You might try using auto_explain to log the query plans
and see if the plan is changing.

			regards, tom lane
On 2020-10-27 17:46:59 +0100, Eudald Valcàrcel Lacasa wrote:
> I've run the automated script several times, and the duration of the
> query is widely variable.
> [0.005 s vs. 1500 s for the same plan]
> As you can see, the plans don't differ from each other, but the
> durations are abysmally different. I have the database logging
> everything; what should I look for to help me understand this strange
> behavior?

You might want to turn auto_explain.log_analyze and
auto_explain.log_buffers on. You probably should do this only on a
test system, not the production system, as the manual states that
auto_explain.log_analyze "can have an extremely negative impact on
performance".

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
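[A sketch of the settings Peter describes, enabled for a single test session; as he warns, log_analyze times every plan node and can slow execution considerably, so avoid it on production.]

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- log every statement's plan
    SET auto_explain.log_analyze = on;      -- include actual times and row counts
    SET auto_explain.log_buffers = on;      -- include buffer usage (needs log_analyze)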