Thread: Adding nextval() to a select caused hang/very slow execution

Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

04 November 2020, 17:10:21

Hello (TL;DR):

Noob here, so please bear with me. The SQL I'm presenting is part of a larger PL/PGSQL script that generates generic "counts" from tables in our database. This is code converted from an Oracle database that we recently migrated from.

I have a strange situation where a base query completes in about 30 seconds but if I add a nextval() call to the select it never completes. There are other processes running that are accessing the same sequence, but I thought that concurrency was not an issue for sequences (other than skipped values).

We are running on Google Cloud SQL v12 (I believe it is currently 12.3). We are configured with a failover replica. The VM configured is 8 vCPUs and 16GB of RAM. PgAdmin shows that our cache hit rates are around 99%, as does this SQL:

SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
pg_statio_user_tables;

heap_read | heap_hit | ratio
------------+--------------+------------------------
1558247211 | 156357754256 | 0.99013242992145017164
(1 row)

We run autovacuum. work_mem = 131072.

The base SQL without the nextval() call and plan are at: https://explain.depesz.com/s/T3Gn

While the performance is not the fastest, 30 seconds for the execution is acceptable in our application. We only run this once a month. I am not looking to optimize the query as it stands (yet). The only change that causes it to be extremely slow or hang (can't tell which) is that I changed the select from:

select unnest(....

select nextval('sbowner.idgen'), unnest(....

Here are all the tables/views involved, as requested in the "Slow Query Questions" FAQ.

I am aware that the structure of these views leaves A LOT to be desired, but I don't think they have a bearing on this issue, since the addition of nextval() is the problem. We are going to restructure all of this and remove many layers eventually. Before subjecting the reader to this long list of views, here's the theory. We have customers and orders for various products. Those products are grouped together into "lists" that can be selected. Some products could be in more than one list. At the same time, some products are "pre-release", so they are only reported internally and omitted from these counts. Next, some of the orders are "autoship", meaning that the customer has subscribed to receive the product automatically. Any customer with an autoship in the same category of product is to be omitted from these counts.

As there are many of these "lists", there was a naming scheme created in the Oracle database we converted from. Oracle allows synonyms so that we could create one view and rename it to match the naming scheme. PostgreSQL does not allow that, so instead we had to create views on views to keep the names intact.

The views/tables involved are:

This table holds generated date ranges for the counts to be generated by the main query. Records are inserted once at the start of execution of the script, followed by creating an index:

create index count_tempcols_ndx on count_tempcols(mindate, maxdate, fmtdate, disporder, typecode);

analyze count_tempcols;

It is actually created as a temporary table, but that makes it hard to present here. ;-)

Here is fortherb_indcounts and the entire view chain:

fortherb_indcounts - an extract of the data that we actually generate the counts from

This next table is a list of "rentable" transactions - those transactions that we want to actually count (omitting pre-release and autoships).

This is the same table as the previous, just with a different name (Oracle synonym simulation):

This is a view of all transactions with the "list" they belong to appended, as well as a pre-release code.

This is the base table of transactions:

This is an old version of our table of products sold - it should be replaced by our newer "glmproducts" table.

This is a view that identifies customers with autoships and the category the product belongs to. Note that subscriptions can end, hence the canceldate. We only want to omit customers with active subscription in products of the same category.

This is the base table of the actual autoship subscriptions.

This is the new table of products.

I believe that's the entire list of tables/views involved. Again, my apologies for the long post. I presented all of this for completeness, although I don't believe it has anything to do with the actual problem. :-/

Thanks in advance for any advice!

Eric Raskin

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com

RE: Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

04 November 2020, 17:19:51

SORRY! Here's a link that should show the plan:

https://explain.depesz.com/s/SCdY

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

From

Tom Lane

Date:

04 November 2020, 18:04:18

Eric Raskin <eraskin@paslists.com> writes:
> I have a strange situation where a base query completes in about 30 seconds
> but if I add a nextval() call to the select it never completes.  There are
> other processes running that are accessing the same sequence, but I thought
> that concurrency was not an issue for sequences (other than skipped
> values).

Shouldn't be, probably ... but did you check to see if the query is
blocked on a lock?  (See pg_stat_activity or pg_locks views.)

> The only change that
> causes it to be extremely slow or hang (can't tell which) is that I changed
> the select from:
> select unnest(....
> to
> select nextval('sbowner.idgen'), unnest(....

Without seeing the complete query it's hard to say much.  But if
this isn't the topmost select list, maybe what's happening is that
the presence of a volatile function in a sub-select is defeating
some key plan optimization.  Did you compare plain EXPLAIN (w/out
ANALYZE) output for the two cases, to see if the plan shape changes?

            regards, tom lane

Re: Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

04 November 2020, 18:22:33

Thanks for the reply. I see that the explain.depesz.com did not show you the query. My apologies:

select unnest(array[273941676,273941677,273941678,273941679,273941680]) countrow_id,

disporder, fmtdate, typecode,

unnest(array[count_273941676,count_273941677,count_273941678,count_273941679,count_273941680]) countval

from (select coalesce(count(distinct id_273941676),0) count_273941676,

coalesce(count(distinct id_273941677),0) count_273941677,

coalesce(count(distinct id_273941678),0) count_273941678,

coalesce(count(distinct id_273941679),0) count_273941679,

coalesce(count(distinct id_273941680),0) count_273941680,

disporder, fmtdate, typecode

from (select case when sexcode = 'M' then id else null end id_273941676,

case when sexcode = 'F' then id else null end id_273941677,

case when sexcode = 'A' then id else null end id_273941678,

case when sexcode = 'C' then id else null end id_273941679,

case when sexcode not in ('M','F','A','C') then id else null end id_273941680,

hotline cnt_hotline

from lruser.fortherb_indcounts c

where ( (rtype = '2')

and ((sexcode = 'M') or (sexcode = 'F') or (sexcode = 'A') or (sexcode = 'C') or (sexcode not in ('M','F','A','C'))

)

) as x

right outer join count_tempcols t on (x.cnt_hotline between t.mindate and t.maxdate) group by disporder, fmtdate, typecode ) as y

I know it seems overly complicated, but it is auto-generated by our code. The conditions and fields are variable based on what the user wants to generate.

This is the topmost select. The only difference that causes the hang is adding nextval('sbowner.idgen') to the start of the select right before the first unnest().

In the real application, this code feeds an insert statement with a trigger that accesses the sequence where we store the results of the query. I "simplified" it and discovered that the nextval() was the difference that caused the performance hit.

Eric

On Wed, Nov 4, 2020 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Eric Raskin <eraskin@paslists.com> writes:
> I have a strange situation where a base query completes in about 30 seconds
> but if I add a nextval() call to the select it never completes. There are
> other processes running that are accessing the same sequence, but I thought
> that concurrency was not an issue for sequences (other than skipped
> values).

Shouldn't be, probably ... but did you check to see if the query is
blocked on a lock? (See pg_stat_activity or pg_locks views.)

> The only change that
> causes it to be extremely slow or hang (can't tell which) is that I changed
> the select from:
> select unnest(....
> to
> select nextval('sbowner.idgen'), unnest(....

Without seeing the complete query it's hard to say much. But if
this isn't the topmost select list, maybe what's happening is that
the presence of a volatile function in a sub-select is defeating
some key plan optimization. Did you compare plain EXPLAIN (w/out
ANALYZE) output for the two cases, to see if the plan shape changes?

regards, tom lane

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

04 November 2020, 18:25:42

And, to follow up on your question, the plan shape DOES change when I add/remove the nextval() on a plain explain.

Without nextval(): https://explain.depesz.com/s/SCdY

With nextval(): https://explain.depesz.com/s/oLPn

On Wed, Nov 4, 2020 at 1:22 PM Eric Raskin <eraskin@paslists.com> wrote:

Thanks for the reply. I see that the explain.depesz.com did not show you the query. My apologies:

select unnest(array[273941676,273941677,273941678,273941679,273941680]) countrow_id,
disporder, fmtdate, typecode,
unnest(array[count_273941676,count_273941677,count_273941678,count_273941679,count_273941680]) countval
from (select coalesce(count(distinct id_273941676),0) count_273941676,
coalesce(count(distinct id_273941677),0) count_273941677,
coalesce(count(distinct id_273941678),0) count_273941678,
coalesce(count(distinct id_273941679),0) count_273941679,
coalesce(count(distinct id_273941680),0) count_273941680,
disporder, fmtdate, typecode
from (select case when sexcode = 'M' then id else null end id_273941676,
case when sexcode = 'F' then id else null end id_273941677,
case when sexcode = 'A' then id else null end id_273941678,
case when sexcode = 'C' then id else null end id_273941679,
case when sexcode not in ('M','F','A','C') then id else null end id_273941680,
hotline cnt_hotline
from lruser.fortherb_indcounts c
where ( (rtype = '2')
and ((sexcode = 'M') or (sexcode = 'F') or (sexcode = 'A') or (sexcode = 'C') or (sexcode not in ('M','F','A','C'))
)
)
) as x
right outer join count_tempcols t on (x.cnt_hotline between t.mindate and t.maxdate) group by disporder, fmtdate, typecode ) as y

I know it seems overly complicated, but it is auto-generated by our code. The conditions and fields are variable based on what the user wants to generate.

This is the topmost select. The only difference that causes the hang is adding nextval('sbowner.idgen') to the start of the select right before the first unnest().

In the real application, this code feeds an insert statement with a trigger that accesses the sequence where we store the results of the query. I "simplified" it and discovered that the nextval() was the difference that caused the performance hit.

Eric

On Wed, Nov 4, 2020 at 1:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Eric Raskin <eraskin@paslists.com> writes:
> I have a strange situation where a base query completes in about 30 seconds
> but if I add a nextval() call to the select it never completes. There are
> other processes running that are accessing the same sequence, but I thought
> that concurrency was not an issue for sequences (other than skipped
> values).

Shouldn't be, probably ... but did you check to see if the query is
blocked on a lock? (See pg_stat_activity or pg_locks views.)

> The only change that
> causes it to be extremely slow or hang (can't tell which) is that I changed
> the select from:
> select unnest(....
> to
> select nextval('sbowner.idgen'), unnest(....

Without seeing the complete query it's hard to say much. But if
this isn't the topmost select list, maybe what's happening is that
the presence of a volatile function in a sub-select is defeating
some key plan optimization. Did you compare plain EXPLAIN (w/out
ANALYZE) output for the two cases, to see if the plan shape changes?

regards, tom lane

--
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)
Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)
3 Morgan Drive #310 eraskin@paslists.com
Mt Kisco, NY 10549 http://www.paslists.com

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

From

Tom Lane

Date:

04 November 2020, 19:01:51

Eric Raskin <eraskin@paslists.com> writes:
> And, to follow up on your question, the plan shape DOES change when I
> add/remove the nextval() on a plain explain.
> Without nextval():  https://explain.depesz.com/s/SCdY
> With nextval():  https://explain.depesz.com/s/oLPn

Ah, there's your problem, I think: the plan without nextval() is
parallelized while the plan with nextval() is not, because nextval() is
marked as parallel-unsafe.  It's not immediately clear why that would
result in more than about a 4X speed difference, given that the parallel
plan is using 4 workers.  But some of the rowcount estimates seem fairly
far off, so I'm betting that the planner is just accidentally lighting on
a decent plan when it's using parallelism while making some poor choices
when it isn't.

The reason for the original form of your problem is likely that we don't
use parallelism at all in non-SELECT queries, so you ended up with a bad
plan even though the nextval() was hidden in a trigger.

What you need to do is get the rowcount estimates nearer to reality
--- those places where you've got estimated rowcount 1 while reality
is tens or hundreds of thousands of rows are just disasters waiting
to bite.  I suspect most of the problem is join conditions like

Join Filter: (CASE WHEN (c.rtype = ANY ('{0,1,7,9}'::bpchar[])) THEN c.rtype ELSE x.rtype END = '2'::bpchar)

The planner just isn't going to have any credible idea how selective
that is.  I wonder to what extent you could fix this by storing
generated columns that represent the derived conditions you want to
filter on.

            regards, tom lane

Re: Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

04 November 2020, 19:12:07

OK - I see. And to add insult to injury, I tried creating a temporary table to store the intermediate results. Then I was going to just do an insert... select... to insert the rows. That would de-couple the nextval() from the query.

Strangely, the first query I tried it on worked great. But, when I tried to add a second set of data with a similar query to the same temporary table, it slowed right down again. And, of course, when I remove the insert, it's fine.

And, of course, your explanation that inserts will not be parallelized must be the reason. I will certainly re-vacuum the tables. I wonder why auto-vacuum didn't collect better stats. vacuum analyze <table> is all I need, right?

As a last resort, what about a PL/PGSQL procedure loop on the query result? Since the insert is very few rows relative to the work the select has to do, I could just turn the insert.. select.. into a for loop. Then the select could be parallel?

What do you think?

On Wed, Nov 4, 2020 at 2:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Eric Raskin <eraskin@paslists.com> writes:
> And, to follow up on your question, the plan shape DOES change when I
> add/remove the nextval() on a plain explain.
> Without nextval(): https://explain.depesz.com/s/SCdY
> With nextval(): https://explain.depesz.com/s/oLPn

Ah, there's your problem, I think: the plan without nextval() is
parallelized while the plan with nextval() is not, because nextval() is
marked as parallel-unsafe. It's not immediately clear why that would
result in more than about a 4X speed difference, given that the parallel
plan is using 4 workers. But some of the rowcount estimates seem fairly
far off, so I'm betting that the planner is just accidentally lighting on
a decent plan when it's using parallelism while making some poor choices
when it isn't.

The reason for the original form of your problem is likely that we don't
use parallelism at all in non-SELECT queries, so you ended up with a bad
plan even though the nextval() was hidden in a trigger.

What you need to do is get the rowcount estimates nearer to reality
--- those places where you've got estimated rowcount 1 while reality
is tens or hundreds of thousands of rows are just disasters waiting
to bite. I suspect most of the problem is join conditions like

Join Filter: (CASE WHEN (c.rtype = ANY ('{0,1,7,9}'::bpchar[])) THEN c.rtype ELSE x.rtype END = '2'::bpchar)

The planner just isn't going to have any credible idea how selective
that is. I wonder to what extent you could fix this by storing
generated columns that represent the derived conditions you want to
filter on.

regards, tom lane

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

From

Tom Lane

Date:

04 November 2020, 19:23:28

Eric Raskin <eraskin@paslists.com> writes:
> And, of course, your explanation that inserts will not be parallelized must
> be the reason.  I will certainly re-vacuum the tables.  I wonder why
> auto-vacuum didn't collect better stats.  vacuum  analyze <table> is all I
> need, right?

Plain ANALYZE is enough to collect stats; but I doubt that'll improve
matters for you.  The problem is basically that the planner can't do
anything with a CASE construct, so you end up with default selectivity
estimates for anything involving a CASE, statistics or no statistics.
You need to try to reformulate the query with simpler join conditions.

> As a last resort, what about a PL/PGSQL procedure loop on the query
> result?  Since the insert is very few rows relative to the work the select
> has to do, I could just turn the insert.. select.. into a for loop.  Then
> the select could be parallel?

Maybe, but you're still skating on a cliff edge.  I think it's pure chance
that the parallelized query is working acceptably well; next month with
slightly different conditions, it might not.

            regards, tom lane

Re: Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

04 November 2020, 19:25:00

OK -- got it. Thanks very much for your help. I'll see what I can do to denormalize the case statements into actual columns to support the queries.

On Wed, Nov 4, 2020 at 2:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Eric Raskin <eraskin@paslists.com> writes:
> And, of course, your explanation that inserts will not be parallelized must
> be the reason. I will certainly re-vacuum the tables. I wonder why
> auto-vacuum didn't collect better stats. vacuum analyze <table> is all I
> need, right?

Plain ANALYZE is enough to collect stats; but I doubt that'll improve
matters for you. The problem is basically that the planner can't do
anything with a CASE construct, so you end up with default selectivity
estimates for anything involving a CASE, statistics or no statistics.
You need to try to reformulate the query with simpler join conditions.

> As a last resort, what about a PL/PGSQL procedure loop on the query
> result? Since the insert is very few rows relative to the work the select
> has to do, I could just turn the insert.. select.. into a for loop. Then
> the select could be parallel?

Maybe, but you're still skating on a cliff edge. I think it's pure chance
that the parallelized query is working acceptably well; next month with
slightly different conditions, it might not.

regards, tom lane

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

From

Tom Lane

Date:

04 November 2020, 19:35:44

... btw, it occurs to me that at least as a stopgap,
"set enable_nestloop = off" would be worth trying.
The killer problem with rowcount-1 estimates is that they
encourage the planner to use nestloops when it shouldn't.

            regards, tom lane

Re: Adding nextval() to a select caused hang/very slow execution

From

Michael Lewis

Date:

04 November 2020, 20:19:53

On Wed, Nov 4, 2020 at 12:12 PM Eric Raskin <eraskin@paslists.com> wrote:

OK - I see. And to add insult to injury, I tried creating a temporary table to store the intermediate results. Then I was going to just do an insert... select... to insert the rows. That would de-couple the nextval() from the query.

Strangely, the first query I tried it on worked great. But, when I tried to add a second set of data with a similar query to the same temporary table, it slowed right down again. And, of course, when I remove the insert, it's fine.

I am not entirely sure I am understanding your process properly, but just a note- If you are getting acceptable results creating the temp table, and the issue is just that you get very bad plans when using it in some query that follows, then it is worth noting that autovacuum does nothing on temp tables and for me it is nearly always worth the small cost to perform an analyze (at least on key fields) after creating a temp table, or rather after inserting/updating/deleting records in a significant way.

Re: Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

04 November 2020, 20:39:41

So, things get even weirder. When I execute each individual select statement I am generating from a psql prompt, they all finish very quickly.

If I execute them inside a pl/pgsql block, the second one hangs.

Is there something about execution inside a pl/pgsql block that is different from the psql command line?

On Wed, Nov 4, 2020 at 3:20 PM Michael Lewis <mlewis@entrata.com> wrote:

On Wed, Nov 4, 2020 at 12:12 PM Eric Raskin <eraskin@paslists.com> wrote:
OK - I see. And to add insult to injury, I tried creating a temporary table to store the intermediate results. Then I was going to just do an insert... select... to insert the rows. That would de-couple the nextval() from the query.

Strangely, the first query I tried it on worked great. But, when I tried to add a second set of data with a similar query to the same temporary table, it slowed right down again. And, of course, when I remove the insert, it's fine.

I am not entirely sure I am understanding your process properly, but just a note- If you are getting acceptable results creating the temp table, and the issue is just that you get very bad plans when using it in some query that follows, then it is worth noting that autovacuum does nothing on temp tables and for me it is nearly always worth the small cost to perform an analyze (at least on key fields) after creating a temp table, or rather after inserting/updating/deleting records in a significant way.

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com

Re: Adding nextval() to a select caused hang/very slow execution

From

Tom Lane

Date:

04 November 2020, 21:16:29

Eric Raskin <eraskin@paslists.com> writes:
> So, things get even weirder.   When I execute each individual select
> statement I am generating from a psql prompt, they all finish very
> quickly.
> If I execute them inside a pl/pgsql block, the second one hangs.
> Is there something about execution inside a pl/pgsql block that is
> different from the psql command line?

Generic vs specific plan, perhaps?  Are you passing any parameter
values in from plpgql variables?

IIRC, you could force the matter by using EXECUTE, though it's
somewhat more notationally tedious.  In late-model PG versions,
plan_cache_mode could help you too.

            regards, tom lane

Re: Adding nextval() to a select caused hang/very slow execution

From

Eric Raskin

Date:

05 November 2020, 04:10:00

set enable_nestloop=off did the trick. Execution time when down to seconds per query.

Thanks very much for your help.

On Wed, Nov 4, 2020 at 4:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Eric Raskin <eraskin@paslists.com> writes:
> So, things get even weirder. When I execute each individual select
> statement I am generating from a psql prompt, they all finish very
> quickly.
> If I execute them inside a pl/pgsql block, the second one hangs.
> Is there something about execution inside a pl/pgsql block that is
> different from the psql command line?

Generic vs specific plan, perhaps? Are you passing any parameter
values in from plpgql variables?

IIRC, you could force the matter by using EXECUTE, though it's
somewhat more notationally tedious. In late-model PG versions,
plan_cache_mode could help you too.

regards, tom lane

Eric H. Raskin 914-765-0500 x120 or 315-338-4461 (direct)

Professional Advertising Systems Inc. fax: 914-765-0500 or 315-338-4461 (direct)

3 Morgan Drive #310 eraskin@paslists.com

Mt Kisco, NY 10549 http://www.paslists.com