Thread: Issue executing query from container
Hello!

I'm running a Docker container that executes a PHP script running a sequence of queries. One of the queries gets stuck: it takes more than 2 hours to execute and shows as active in pg_stat_activity. The query is a JOIN between a FOREIGN TABLE and a local table.

Executing this query from the psql console takes less than 5 seconds. If I execute the query manually from inside the container, it also completes within 5 seconds.

Can anyone guide me in troubleshooting what is causing the query to get stuck?

Thanks!
Eudald
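[For readers following along: the "active in pg_stat_activity" check mentioned above can be done from another session with a query like this. A minimal sketch; the WHERE clause and ordering are illustrative.]

    -- Show active queries and how long each has been running
    SELECT pid, state, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY runtime DESC;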
Eudald Valcàrcel Lacasa <eudald.valcarcel@gmail.com> writes:
> I'm running a Docker container that executes a PHP script running a
> sequence of queries. One of the queries gets stuck: it takes more than
> 2 hours to execute and shows as active in pg_stat_activity. The query
> is a JOIN between a FOREIGN TABLE and a local table.
> Executing this query from the psql console takes less than 5 seconds.
> If I execute the query manually from inside the container, it also
> completes within 5 seconds.
> Can anyone guide me in troubleshooting what is causing the query to
> get stuck?

It's a very good bet that it's something about the query being
parameterized or not.  You could try installing auto_explain to
compare the plans that are generated.

			regards, tom lane
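[A minimal sketch of enabling auto_explain for a single session, without a server restart; the 100 ms threshold is illustrative. To capture plans from other sessions, such as the PHP script's, it can instead be set server-wide in postgresql.conf via session_preload_libraries.]

    -- Load the module into the current session (requires superuser)
    LOAD 'auto_explain';
    -- Log the plan of any statement that runs longer than 100 ms
    SET auto_explain.log_min_duration = '100ms';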
Hello Tom,
Thanks for your answer! I didn't know about this module and have configured PostgreSQL with it. After running the query both manually and from the script, I have the following logs:

MANUALLY:

2020-07-15 00:56:08.735 CEST [20457] cefron@kontriki LOG:  statement: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
2020-07-15 00:56:09.495 CEST [20457] cefron@kontriki LOG:  duration: 759.102 ms  plan:
    Query Text: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
    Update on import_temp_2 tmp  (cost=116.73..17352.10 rows=5557 width=293)
      ->  Hash Join  (cost=116.73..17352.10 rows=5557 width=293)
            Hash Cond: (lower((tmp.email)::text) = lower((bl.value)::text))
            ->  Seq Scan on import_temp_2 tmp  (cost=0.00..14864.20 rows=370496 width=193)
                  Filter: (status = 1)
            ->  Hash  (cost=116.70..116.70 rows=3 width=130)
                  Buckets: 32768 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3841kB
                  ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

AUTOMATED:

2020-07-15 01:01:27.336 CEST [22783] cefron@kontriki LOG:  duration: 0.049 ms
2020-07-15 01:01:27.337 CEST [22783] cefron@kontriki LOG:  statement: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
2020-07-15 03:22:01.398 CEST [22783] cefron@kontriki LOG:  duration: 8434060.530 ms  plan:
    Query Text: UPDATE import_temp_2 AS tmp SET status = 3 FROM blacklist_central bl WHERE tmp.status = 1 AND lower(tmp.email) = lower(bl.value) AND bl.type = 1
    Update on import_temp_2 tmp  (cost=100.00..13295.86 rows=15 width=500)
      ->  Nested Loop  (cost=100.00..13295.86 rows=15 width=500)
            Join Filter: (lower((tmp.email)::text) = lower((bl.value)::text))
            ->  Seq Scan on import_temp_2 tmp  (cost=0.00..13118.74 rows=1007 width=400)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

Honestly, I see some differences, but I don't see a real cause that could make the execution take 2 hours instead of a few seconds. Maybe with these results you or someone on the list can find something.

Thanks for your help,
Eudald
Eudald Valcàrcel Lacasa <eudald.valcarcel@gmail.com> writes:
> After running the query both manually and from the script, I have the
> following logs:

> MANUALLY:
> Update on import_temp_2 tmp  (cost=116.73..17352.10 rows=5557 width=293)
>   ->  Hash Join  (cost=116.73..17352.10 rows=5557 width=293)
>         Hash Cond: (lower((tmp.email)::text) = lower((bl.value)::text))
>         ->  Seq Scan on import_temp_2 tmp  (cost=0.00..14864.20 rows=370496 width=193)
>               Filter: (status = 1)
>         ->  Hash  (cost=116.70..116.70 rows=3 width=130)
>               Buckets: 32768 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3841kB
>               ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

> AUTOMATED:
> Update on import_temp_2 tmp  (cost=100.00..13295.86 rows=15 width=500)
>   ->  Nested Loop  (cost=100.00..13295.86 rows=15 width=500)
>         Join Filter: (lower((tmp.email)::text) = lower((bl.value)::text))
>         ->  Seq Scan on import_temp_2 tmp  (cost=0.00..13118.74 rows=1007 width=400)
>               Filter: (status = 1)
>         ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
>               ->  Foreign Scan on blacklist_central bl  (cost=100.00..116.70 rows=3 width=130)

So the question is why you are getting an estimate of 370496 import_temp_2
rows with status = 1 in the first case, and only 1007 rows in the second.

I suspect that the true number of rows is quite large, causing the
nested-loop plan to run slowly.  (Is the row estimate of 3 for the
foreign scan anywhere near reality, either?)

You may need to insert a manual ANALYZE in your automated process to
ensure that import_temp_2 has up-to-date stats before you try to do
this step.  It seems somewhat likely that autovacuum takes care of
that for you in the "manual" case, but its reaction time is too slow
to fill the gap for the automated process.

			regards, tom lane
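[A sketch of what that suggestion looks like in the automated process, using the table names from the thread: run ANALYZE right after the temp table is populated and before the UPDATE, so the planner sees current row counts.]

    -- Refresh planner statistics so the estimate for status = 1 is current
    ANALYZE import_temp_2;

    -- Then run the problematic step with up-to-date stats
    UPDATE import_temp_2 AS tmp
    SET status = 3
    FROM blacklist_central bl
    WHERE tmp.status = 1
      AND lower(tmp.email) = lower(bl.value)
      AND bl.type = 1;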
Hello Tom,
I've noticed something that may be a factor in this problem.

In the automated script, the foreign table is created 5 or 6 seconds before this query runs. If I create the foreign table manually, skip its creation in the automated script, and then run the automated script, the query takes 5-6 seconds to execute instead of 2 hours.

Hope this info is helpful to dig further.
Thanks,
Eudald
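[This fits the statistics theory: a freshly created foreign table has no statistics yet, so the planner falls back on default estimates (the rows=3 above). Assuming the foreign table uses postgres_fdw, two options are to ANALYZE it right after creation, or to let the remote server supply row estimates at plan time. A sketch, not a confirmed fix for this setup:]

    -- Option 1: sample the remote table and store local statistics
    ANALYZE blacklist_central;

    -- Option 2: ask the remote server for row estimates during planning
    ALTER FOREIGN TABLE blacklist_central
      OPTIONS (ADD use_remote_estimate 'true');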
Hello Tom,
Sorry to bump this conversation, but I'm still dealing with this issue. I come back because I've found something that throws all the previous hypotheses into question (or at least that's what I think).

I've run the automated script several times, and the duration of the query is widely variable. With the same query plan, I have:

2020-10-27 11:08:39.027 CET [18671] cefron@kontriki LOG:  duration: 5.735 ms  plan:
    Query Text: UPDATE import_temp_256 AS tmp SET status = 11 FROM blacklist_central blc WHERE lower(tmp.email) = blc.value AND tmp.status = 1 AND blc.reason = 'hardbounce'
    Update on import_temp_256 tmp  (cost=100.00..2878.74 rows=3 width=501)
      ->  Nested Loop  (cost=100.00..2878.74 rows=3 width=501)
            Join Filter: (lower((tmp.email)::text) = (blc.value)::text)
            ->  Seq Scan on import_temp_256 tmp  (cost=0.00..2750.96 rows=211 width=401)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central blc  (cost=100.00..116.70 rows=3 width=130)
2020-10-27 11:08:39.027 CET [18671] cefron@kontriki LOG:  duration: 6.122 ms

And, started at 11:20:54 but not completed until 11:46:32, the following:

2020-10-27 11:46:32.714 CET [30340] cefron@kontriki LOG:  duration: 1538286.304 ms  plan:
    Query Text: UPDATE import_temp_257 AS tmp SET status = 11 FROM blacklist_central blc WHERE lower(tmp.email) = blc.value AND tmp.status = 1 AND blc.reason = 'hardbounce'
    Update on import_temp_257 tmp  (cost=100.00..2878.74 rows=3 width=501)
      ->  Nested Loop  (cost=100.00..2878.74 rows=3 width=501)
            Join Filter: (lower((tmp.email)::text) = (blc.value)::text)
            ->  Seq Scan on import_temp_257 tmp  (cost=0.00..2750.96 rows=211 width=401)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central blc  (cost=100.00..116.70 rows=3 width=130)
2020-10-27 11:46:32.715 CET [30340] cefron@kontriki LOG:  duration: 1538287.737 ms

As you can see, the plans don't differ from each other, but the durations are abysmally different. I have the database logging everything; what should I look for to help me understand this strange behavior?

Thank you,
Eudald
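[Since the plan is identical in both runs, the extra time is likely spent waiting rather than planning. While a slow run is in progress, checking the backend's wait state and any ungranted locks from another session may show where it is stalled. A sketch; the wait_event columns require PostgreSQL 9.6 or later.]

    -- Inspect what active backends are waiting on, plus any ungranted locks
    SELECT a.pid, a.wait_event_type, a.wait_event,
           now() - a.query_start AS runtime,
           l.locktype, l.granted
    FROM pg_stat_activity a
    LEFT JOIN pg_locks l ON l.pid = a.pid AND NOT l.granted
    WHERE a.state = 'active';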
Eudald Valcàrcel Lacasa <eudald.valcarcel@gmail.com> writes:
> I've run the automated script several times, and the duration of the
> query is widely variable.

Interesting.  You might try using auto_explain to log the query plans
and see if the plan is changing.

			regards, tom lane
On 2020-10-27 17:46:59 +0100, Eudald Valcàrcel Lacasa wrote:
> I've run the automated script several times, and the duration of the
> query is widely variable.
> [0.005 s vs. 1500 s for the same plan]
> As you can see, the plans don't differ from each other, but the
> durations are abysmally different. I have the database logging
> everything; what should I look for to help me understand this strange
> behavior?

You might want to turn auto_explain.log_analyze and
auto_explain.log_buffers on. You probably should do this only on a
test system, not the production system, as the manual states that
auto_explain.log_analyze "can have an extremely negative impact on
performance".

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
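[A sketch of the settings Peter describes, enabled for a single test session; as he warns, log_analyze times every plan node and can slow execution considerably, so avoid it on production.]

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- log every statement's plan
    SET auto_explain.log_analyze = on;      -- include actual times and row counts
    SET auto_explain.log_buffers = on;      -- include buffer usage (needs log_analyze)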