Re: Issue executing query from container - Mailing list pgsql-general

From Eudald Valcàrcel Lacasa
Subject Re: Issue executing query from container
Date
Msg-id CANEx+AXSzuf4wPeNHTcVg29EFhHarwMEnMgYHbrbJgdmHTENoA@mail.gmail.com
Whole thread Raw
In response to Re: Issue executing query from container  (Eudald Valcàrcel Lacasa <eudald.valcarcel@gmail.com>)
Responses Re: Issue executing query from container
Re: Issue executing query from container
List pgsql-general
Hello Tom,
Sorry to bump this conversation, but I'm still dealing with this issue.
I come back because I've found something different that throws away
all the hypotheses (or at least it's what I think).

I've run the automated script several times, and the duration of the
query is widely variable.

Using the same query plan, I've:
2020-10-27 11:08:39.027 CET [18671] cefron@kontriki LOG:  duration:
5.735 ms  plan:
    Query Text: UPDATE import_temp_256 AS tmp SET status = 11 FROM
blacklist_central blc
                   WHERE lower(tmp.email) = blc.value AND tmp.status =
1 AND blc.reason = 'hardbounce'
    Update on import_temp_256 tmp  (cost=100.00..2878.74 rows=3 width=501)
      ->  Nested Loop  (cost=100.00..2878.74 rows=3 width=501)
            Join Filter: (lower((tmp.email)::text) = (blc.value)::text)
            ->  Seq Scan on import_temp_256 tmp  (cost=0.00..2750.96
rows=211 width=401)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central blc
(cost=100.00..116.70 rows=3 width=130)

2020-10-27 11:08:39.027 CET [18671] cefron@kontriki LOG:  duration: 6.122 ms

And being executed at 11:20:54 (but completed at 11:46:32), the following:

2020-10-27 11:46:32.714 CET [30340] cefron@kontriki LOG:  duration:
1538286.304 ms  plan:
    Query Text: UPDATE import_temp_257 AS tmp SET status = 11 FROM
blacklist_central blc
                   WHERE lower(tmp.email) = blc.value AND tmp.status =
1 AND blc.reason = 'hardbounce'
    Update on import_temp_257 tmp  (cost=100.00..2878.74 rows=3 width=501)
      ->  Nested Loop  (cost=100.00..2878.74 rows=3 width=501)
            Join Filter: (lower((tmp.email)::text) = (blc.value)::text)
            ->  Seq Scan on import_temp_257 tmp  (cost=0.00..2750.96
rows=211 width=401)
                  Filter: (status = 1)
            ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
                  ->  Foreign Scan on blacklist_central blc
(cost=100.00..116.70 rows=3 width=130)

2020-10-27 11:46:32.715 CET [30340] cefron@kontriki LOG:  duration:
1538287.737 ms

As you can see, they don't seem to differ between each other, but the
duration is abismaly different.
I have the database logging everything, what should I look for so that
it can help me understand this strange behavior?

Thank you,
Eudald

El mié., 15 jul. 2020 a las 18:37, Eudald Valcàrcel Lacasa
(<eudald.valcarcel@gmail.com>) escribió:
>
> Hello Tom,
> I've noticed something that may be a factor on this problem.
>
> In the automated script, 5 or 6 seconds before this query happens, the
> foreign table is created.
> If I create the foreign table manually and skip its creation in the
> automated script, and then I run the automated script,
> the query takes 5-6 seconds to be executed instead of 2 hours.
>
> Hope this info is helpful to dig further.
> Thanks,
> Eudald
>
> El mié., 15 jul. 2020 a las 16:42, Tom Lane (<tgl@sss.pgh.pa.us>) escribió:
> >
> > =?UTF-8?Q?Eudald_Valc=C3=A0rcel_Lacasa?= <eudald.valcarcel@gmail.com> writes:
> > > After running the query both manually and with the script, I've the
> > > following logs:
> >
> > > MANUALLY:
> > >     Update on import_temp_2 tmp  (cost=116.73..17352.10 rows=5557 width=293)
> > >       ->  Hash Join  (cost=116.73..17352.10 rows=5557 width=293)
> > >             Hash Cond: (lower((tmp.email)::text) = lower((bl.value)::text))
> > >             ->  Seq Scan on import_temp_2 tmp  (cost=0.00..14864.20
> > > rows=370496 width=193)
> > >                   Filter: (status = 1)
> > >             ->  Hash  (cost=116.70..116.70 rows=3 width=130)
> > >                   Buckets: 32768 (originally 1024)  Batches: 2
> > > (originally 1)  Memory Usage: 3841kB
> > >                   ->  Foreign Scan on blacklist_central bl
> > > (cost=100.00..116.70 rows=3 width=130)
> >
> > > AUTOMATED:
> > >     Update on import_temp_2 tmp  (cost=100.00..13295.86 rows=15 width=500)
> > >       ->  Nested Loop  (cost=100.00..13295.86 rows=15 width=500)
> > >             Join Filter: (lower((tmp.email)::text) = lower((bl.value)::text))
> > >             ->  Seq Scan on import_temp_2 tmp  (cost=0.00..13118.74
> > > rows=1007 width=400)
> > >                   Filter: (status = 1)
> > >             ->  Materialize  (cost=100.00..116.71 rows=3 width=130)
> > >                   ->  Foreign Scan on blacklist_central bl
> > > (cost=100.00..116.70 rows=3 width=130)
> >
> > So the question is why you are getting an estimate of 370496 import_temp_2
> > rows with status = 1 in the first case, and only 1007 rows in the second.
> >
> > I suspect that the true number of rows is quite large, causing the
> > nested-loop plan to run slowly.  (Is the row estimate of 3 for the
> > foreign scan anywhere near reality, either?)
> >
> > You may need to insert a manual ANALYZE in your automated process to
> > ensure that import_temp_2 has up-to-date stats before you try to do
> > this step.  It seems somewhat likely that autovacuum takes care of
> > that for you in the "manual" case, but its reaction time is too slow
> > to fill the gap for the automated process.
> >
> >                         regards, tom lane



pgsql-general by date:

Previous
From: Susan Joseph
Date:
Subject: Re: 答复: Security issues concerning pgsql replication
Next
From: Sean McDaniel
Date:
Subject: GSSAPI Authentication for pgadmin4 macOS client