Thread: much slower query in production
Dear all,

I am facing a much, much slower query in production than on my
development computer using a restored production backup, and I don't
understand why, nor do I see what I could do to speed up the query in
production :/

Short data model explanation: one table stores tickets, one table
stores multicards; tickets can belong to a multicard through a
many-to-one relationship, or be independent of multicards. The query I
am interested in will be used to populate a new column in the
multicards table, containing the count of related tickets.

On production:

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.178..1694987.355 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=8.283..8.283 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=1.350..8.280 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 1174940
 Planning Time: 1.220 ms
 Execution Time: 1695029.673 ms

vmstat in the middle of this execution:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1 479548 143652  79412 5609292    0    0  1986   116   999  1115  2  2 71 25
 3  1 479548 134312  79412 5614564    0    0  2386   128  2832  3973 20  2 49 28
 0  1 479548 142520  79428 5617708    0    0  1584   232  3440  3716 11  3 58 29
 0  1 479548 161184  79024 5597756    0    0  1922   144  1249  1562  1  2 70 27
 0  1 479548 161244  79048 5600804    0    0  1556   117  2138  3035  6  2 68 25
 0  2 479548 158384  79048 5604008    0    0  1388   402  2970  4320  6  2 66 27

On my development computer:

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..538974.69 rows=204548 width=12) (actual time=0.055..451.691 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=2.57..2.58 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..2.56 rows=7 width=0) (actual time=0.001..0.001 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 0
 Planning Time: 0.296 ms
 Execution Time: 456.677 ms

The execution time ratio is a huge 3700x. I guess the Heap Fetches
difference is the most meaningful here; my understanding would be that
the index should easily fit in shared_buffers after a few subselects,
as shared_buffers is configured at 2GB and here's the on-disk size of
the index:

# SELECT relname AS index, reltuples AS "rows estimate",
         pg_size_pretty(pg_table_size(quote_ident(relname))) AS "on disk size"
  FROM pg_class, pg_namespace
  WHERE pg_namespace.oid = pg_class.relnamespace
    AND relkind = 'i' AND nspname = 'public'
    AND relname = 'tickets_multicard_uid';
         index         | rows estimate | on disk size
-----------------------+---------------+--------------
 tickets_multicard_uid |    7.2136e+06 | 161 MB

Though it's not entirely clear to me what "heap fetches" are. It seems
they might be fetches of actual table data (i.e. not index fetches),
but I don't really know why so many of them are needed here, nor how
to reduce that (if that's the explanation for the longer timing).

It's true that production has constant activity, but not that much of
it: load average is typically about 0.5, most queries run very
quickly, and pg_stat_activity frequently reports no currently running
query. Here's vmstat during normal production activity, for reference:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 480712 151112  10144 5664784    1    1   310    69     0     0  9  1 88  3
 0  0 480712 151312  10152 5664876    0    0     0   248  4911  7072  9  3 87  0
 0  0 480720 165488  10072 5642564    0    4    76   210  2019  2685  6  1 92  1
 0  0 480720 165332  10088 5642508    0    0     0   221  3535 17545 25  4 70  1
 0  0 480720 144772  10108 5643324    0    0    84   378  3833  5096 11  2 80  7
 0  0 480720 143300  10116 5644144    0    0    42   298  3446  4784  6  1 92  1
 0  0 480720 143300  10136 5644256    0    0    10  2340  1073  1496  1  1 96  2

Here's also a second, subsequent run on production, for comparison:

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.101..834176.389 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=4.075..4.075 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=0.624..4.072 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 1174941
 Planning Time: 0.273 ms
 Execution Time: 834209.323 ms

Heap fetches are still almost the same, albeit with the timing divided
by two.

Both environments are running PostgreSQL 11.5 with 2GB shared_buffers.
Differences I can think of: production uses ext4 on DRBD on SATA disks
with Linux 3.2; dev uses ext4 (no DRBD) on SSD with Linux 4.15. I
can't believe the SSD alone would explain the difference? If it would,
then I know what we should do on production..

Thanks for any hints/help!

-- 
Guillaume Cottenceau
On Wed, Feb 26, 2020 at 05:17:21PM +0100, Guillaume Cottenceau wrote:
> On production:
>
> # EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
>  Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.178..1694987.355 rows=204548 loops=1)
>    SubPlan 1
>      ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=8.283..8.283 rows=1 loops=204548)
>            ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=1.350..8.280 rows=6 loops=204548)
>                  Index Cond: (multicard_uid = multicards.uid)
>                  Heap Fetches: 1174940
>  Planning Time: 1.220 ms
>  Execution Time: 1695029.673 ms
>
> The execution time ratio is a huge 3700x. I guess the Heap Fetches
> difference is the most meaningful here;

Yes, it's doing an "index only" scan, but not very effectively. Vacuum
the tickets table to set relallvisible and see if that helps.

If so, try to keep it better vacuumed with something like:

ALTER TABLE tickets SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);

-- 
Justin
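For reference, a quick way to see how much of the tickets table the
visibility map already considers all-visible (which is what
relallvisible tracks) is a sketch like this against pg_class; the
table name is taken from the thread:

SELECT relname, relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'tickets';

A low percentage means most index-only-scan probes will fall back to
heap fetches.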
By the way, I expect the time is cut in half while heap fetches stay similar because the index is now in the OS cache on the second run and didn't need to be fetched from disk. Definitely check on vacuuming as Justin says. If you have a fairly active system, you would need to run this query many times in order to push other stuff out of shared_buffers and get this query to perform more like it does on dev.
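If you want to check cache residency directly, the pg_buffercache
extension (assuming it is available on the server) can show how much
of the index currently sits in shared_buffers; note it cannot see the
OS page cache. A sketch, assuming the default 8kB block size:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS bytes_cached
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('tickets_multicard_uid')
  AND reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database());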
Do you have the option to re-write the query, or is this generated by an ORM? As I read this query, you are forcing the looping. If you aggregate before you join, then the system should be able to do a single scan of the index, aggregate, then join those relatively few rows to the multicards table records:
SELECT transaction_uid, COALESCE( sub.count, 0 ) AS count
FROM multicards
LEFT JOIN ( SELECT multicard_uid, COUNT(*) AS count
            FROM tickets
            GROUP BY multicard_uid ) AS sub
  ON sub.multicard_uid = multicards.uid;
Justin Pryzby <pryzby 'at' telsasoft.com> writes:

> On Wed, Feb 26, 2020 at 05:17:21PM +0100, Guillaume Cottenceau wrote:
>> On production:
>>
>> # EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
>>  Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.178..1694987.355 rows=204548 loops=1)
>>    SubPlan 1
>>      ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=8.283..8.283 rows=1 loops=204548)
>>            ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=1.350..8.280 rows=6 loops=204548)
>>                  Index Cond: (multicard_uid = multicards.uid)
>>                  Heap Fetches: 1174940
>>  Planning Time: 1.220 ms
>>  Execution Time: 1695029.673 ms
>
>> The execution time ratio is a huge 3700x. I guess the Heap Fetches
>> difference is the most meaningful here;
>
> Yes, it's doing an "index only" scan, but not very effectively.
> Vacuum the tickets table to set relallvisible and see if that helps.
>
> If so, try to keep it better vacuumed with something like
> ALTER TABLE tickets SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);

Thanks for your reply! The effect is huge:

# vacuum analyze tickets;
VACUUM
Time: 182850.756 ms (03:02.851)

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..947739.65 rows=204548 width=12) (actual time=15.579..5210.869 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=4.59..4.60 rows=1 width=8) (actual time=0.025..0.025 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..4.57 rows=8 width=0) (actual time=0.022..0.024 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 8
 Planning Time: 71.469 ms
 Execution Time: 5223.408 ms
(8 rows)

Time: 5332.361 ms (00:05.332)

(and subsequent executions are below 1 second)

It is actually consistent with using a restored backup on the dev
computer, as my understanding is this comes out without any garbage
and like a perfectly vacuumed database. Btw do you have any hint as to
how to perform timings using production data which are consistent with
production? Backup/restore is maybe not the way to go, but rather a
block device level copy?

Since PostgreSQL 8, I have to say I rely entirely on autovacuum, and
did not notice it could really run too infrequently for the workload
and create such a difference. I see in the documentation a default
autovacuum_vacuum_scale_factor = 0.2; is that something that is
typically lowered globally, e.g. on a fairly active system? I am
worried that changing that configuration for that table to 0.005 would
fix this query and similar ones, but later I might face the same
situation on other tables. Or how would you select tables for a
lowered setting?

Thanks!

-- 
Guillaume Cottenceau
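One rough way to shortlist tables that might deserve a lowered
per-table setting is to order pg_stat_user_tables by dead tuples; a
sketch only, with no authoritative thresholds implied:

SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / greatest(n_live_tup, 1), 1) AS pct_dead,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;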
Michael Lewis <mlewis 'at' entrata.com> writes:

> By the way, I expect the time is cut in half while heap fetches stay
> similar because the index is now in the OS cache on the second run
> and didn't need to be fetched from disk. Definitely check on
> vacuuming as Justin says. If you have a fairly active system, you
> would need to run this query many times in order to push other stuff
> out of shared_buffers and get this query to perform more like it
> does on dev.
>
> Do you have the option to re-write the query, or is this generated
> by an ORM? As I read this query, you are forcing the looping. If you
> aggregate before you join, then the system should be able to do a
> single scan of the index, aggregate, then join those relatively few
> rows to the multicards table records:
>
> SELECT transaction_uid, COALESCE( sub.count, 0 ) AS count
> FROM multicards
> LEFT JOIN ( SELECT multicard_uid, COUNT(*) AS count
>             FROM tickets
>             GROUP BY multicard_uid ) AS sub
>   ON sub.multicard_uid = multicards.uid;

Thanks for this hint! I keep hitting the fact that I don't write good
queries using explicit joins :/ Execution time (before vacuuming the
table as advised by Justin) is down 38x, to 44509 ms, using this
query :)

The real query is an UPDATE of the multicards table to set the count
value. I rewrote it using your approach, but I think I lose what
COALESCE did in your query: this obviously updates only the rows where
count >= 1:

UPDATE multicards
SET defacements = count
FROM ( SELECT multicard_uid, COUNT(*) AS count
       FROM tickets
       GROUP BY multicard_uid ) AS sub
WHERE uid = multicard_uid;

Any hinted solution to do that in one pass? I could do a first pass
setting defacements = 0, but that would produce more garbage :/

Thanks!

-- 
Guillaume Cottenceau
UPDATE multicards
SET defacements = COALESCE( count, 0 )
FROM ( SELECT multicard_uid, COUNT(*) AS count FROM tickets GROUP BY multicard_uid ) AS sub
WHERE uid = multicard_uid OR multicard_uid is null;
I expect this should work. Not sure of performance of course.
Vacuum everything that you restored
Michael Lewis <mlewis 'at' entrata.com> writes:

> UPDATE multicards
> SET defacements = COALESCE( count, 0 )
> FROM ( SELECT multicard_uid, COUNT(*) AS count FROM tickets GROUP BY multicard_uid ) AS sub
> WHERE uid = multicard_uid OR multicard_uid is null;
>
> I expect this should work. Not sure of performance of course.

This looked great, but as it seems you suspected, it's very slow :/ I
interrupted it after a 5-minute run on my dev computer.

-- 
Guillaume Cottenceau
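A pattern that is sometimes used to keep the pre-aggregation while
still touching every multicards row in a single statement is a
self-join in the FROM clause; a sketch only, untested against this
schema (table and column names taken from the thread):

UPDATE multicards m
SET defacements = COALESCE(sub.count, 0)
FROM multicards m2
LEFT JOIN ( SELECT multicard_uid, COUNT(*) AS count
            FROM tickets
            GROUP BY multicard_uid ) AS sub
  ON sub.multicard_uid = m2.uid
WHERE m.uid = m2.uid;

Note this still rewrites every row, including those that end up at 0,
so it trades the OR-condition's bad plan for extra write volume.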
On Wed, Feb 26, 2020 at 11:17 AM Guillaume Cottenceau <gc@mnc.ch> wrote:
> Dear all,
>
> I am facing a much, much slower query in production than on my
> development computer using a restored production backup, and I
> don't understand why, nor do I see what I could do to speed up the
> query in production :/
You've already seen that a VACUUM fixed it.

This must have been a logical restore (from pg_dump), not a physical restore (from pg_basebackup, for example), correct?
A physical restore should have resulted in a database in the same state of vacuuming as its source was. A logical restore will not, and since it is an append-only process it will not trigger autovacuum to run, either.
If you do a logical restore, probably the first thing you should do afterwards is a VACUUM ANALYZE.
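For example, from the shell (database name hypothetical; --jobs
parallelizes across tables and requires PostgreSQL 9.5 or later):

vacuumdb --analyze --jobs=4 mydb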
Cheers,
Jeff
On Wed, Feb 26, 2020 at 1:02 PM Guillaume Cottenceau <gc@mnc.ch> wrote:
> It is actually consistent with using a restored backup on the dev
> computer, as my understanding is this comes out without any
> garbage and like a perfectly vacuumed database.
I think I got that backwards in my previous email. It is the dev that was restored, not the prod? But unless you went out of your way to vacuum dev, it would not be perfectly vacuumed. If it were a logical restore, it would be perfectly unvacuumed, and if a physical restore, it would be in the same state of vacuuming as the database it was cloned from.
> Btw do you have any hint as to how to perform timings using
> production data which are consistent with production? Backup/restore
> is maybe not the way to go, but rather a block device level copy?
Block device copy seems like overkill; just using pg_basebackup should be good enough.
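For instance, something like this (host, user, and target directory
hypothetical; the user needs replication privileges, and -X stream
includes the WAL needed for a consistent copy):

pg_basebackup -h prod.example.com -U replicator -D /var/lib/postgresql/11/devclone -P -X stream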
> Since PostgreSQL 8, I have to say I rely entirely on autovacuum, and
> did not notice it could really run too infrequently for the workload
> and create such a difference. I see in the documentation a default
> autovacuum_vacuum_scale_factor = 0.2; is that something that is
> typically lowered globally, e.g. on a fairly active system? I am
> worried that changing that configuration for that table to 0.005
> would fix this query and similar ones, but later I might face the
> same situation on other tables. Or how would you select tables for a
> lowered setting?
The autovacuum system has never been redesigned with the needs of index-only scans in mind. If I have a table for which index-only scans are important, I'd set autovacuum_vacuum_scale_factor = 0 and set autovacuum_vacuum_threshold to about 5% of the number of blocks in the table. There is no syntax to say '5% of the number of blocks in the table', so you have to compute it yourself and hardcode the result, which makes it unsuitable for a global setting. And this still only addresses UPDATE and DELETE operations, not INSERTs. If you have an INSERT-only or INSERT-mostly table for which index-only scans are important, you might need to set up cron jobs to do vacuuming.
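A sketch of that computation (the hardcoded number below is an example
value, not from the thread; it must be recomputed as the table grows):

-- find the current block count and 5% of it
SELECT relpages, (relpages * 0.05)::int AS suggested_threshold
FROM pg_class
WHERE relname = 'tickets';

-- hardcode the result per table
ALTER TABLE tickets SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold    = 1000
);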
Cheers,
Jeff
Jeff Janes <jeff.janes 'at' gmail.com> writes:

> On Wed, Feb 26, 2020 at 1:02 PM Guillaume Cottenceau <gc@mnc.ch> wrote:
>
>> It is actually consistent with using a restored backup on the dev
>> computer, as my understanding is this comes out without any
>> garbage and like a perfectly vacuumed database.
>
> I think I got that backwards in my previous email. It is the dev
> that was restored, not the prod?

Yes (prod was also restored not so long ago though, when updating to
PostgreSQL 11.5).

> But unless you went out of your way to vacuum dev, it would not be
> perfectly vacuumed. If it were a logical restore, it would be
> perfectly unvacuumed, and if a physical restore, it would be in the
> same state of vacuuming as the database it was cloned from.
>
>> Btw do you have any hint as to how to perform timings using
>> production data which are consistent with production?
>> Backup/restore is maybe not the way to go, but rather a block
>> device level copy?
>
> Block device copy seems like overkill; just using pg_basebackup
> should be good enough.
>
>> Since PostgreSQL 8, I have to say I rely entirely on autovacuum,
>> and did not notice it could really run too infrequently for the
>> workload and create such a difference. I see in the documentation a
>> default autovacuum_vacuum_scale_factor = 0.2; is that something
>> that is typically lowered globally, e.g. on a fairly active system?
>> I am worried that changing that configuration for that table to
>> 0.005 would fix this query and similar ones, but later I might face
>> the same situation on other tables. Or how would you select tables
>> for a lowered setting?
>
> The autovacuum system has never been redesigned with the needs of
> index-only scans in mind. If I have a table for which index-only
> scans are important, I'd set autovacuum_vacuum_scale_factor = 0 and
> set autovacuum_vacuum_threshold to about 5% of the number of blocks
> in the table. There is no syntax to say '5% of the number of blocks
> in the table', so you have to compute it yourself and hardcode the
> result, which makes it unsuitable for a global setting. And this
> still only addresses UPDATE and DELETE operations, not INSERTs.

It seems also difficult for us, as this table grows over time (and is
trimmed only infrequently).

> If you have an INSERT-only or INSERT-mostly table for which
> index-only scans are important, you might need to set up cron jobs
> to do vacuuming.

Thanks!

-- 
Guillaume Cottenceau
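For the cron-job approach Jeff mentions, a minimal sketch (path,
schedule, and database name hypothetical):

# /etc/cron.d/pg-vacuum-tickets: nightly vacuum of the insert-mostly table
# m  h dom mon dow user     command
 30  3  *   *   *  postgres vacuumdb --analyze --table=tickets mydb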