Thread: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

We have a cloud service that uses PostgreSQL to temporarily store binary
content. We're using PostgreSQL's Large Objects to store the binary
content. Each large object lives anywhere from a few hundred milliseconds
to 5-10 minutes, after which it is deleted.
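
For context, the lifecycle at the SQL level is roughly the following; our
service does the equivalent through the JDBC large object API, and the OID
shown is only illustrative:

  SELECT lo_from_bytea(0, '\xdeadbeef');  -- create a large object; the server assigns an OID, e.g. 16401
  SELECT lo_get(16401);                   -- read the content back while it is live
  SELECT lo_unlink(16401);                -- delete it once it expires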

Normally, this works just fine and we have no issues. However, roughly
every 3 months or so, disk usage on our PostgreSQL database starts to
increase at an alarming rate. More specifically, the pg_largeobject
table's disk usage steadily increases even though we've deleted most of
the large objects. This will continue until the database server runs out
of disk space (and then all sorts of bad things happen of course).

It looks to us like autovacuum against the pg_largeobject table stops
working: its pg_stat_all_tables.last_autovacuum column stops updating, or
is updated only infrequently, while it is still updated regularly for
other tables:


 schemaname |         relname         | n_live_tup | n_dead_tup |        last_autovacuum
------------+-------------------------+------------+------------+-------------------------------
 pg_catalog | pg_largeobject          |      37205 |   92614852 | 2020-06-15 01:55:09.037101+00
 pg_toast   | pg_toast_973434724      |        281 |       3701 | 2020-06-15 17:22:54.746452+00
 pg_catalog | pg_largeobject_metadata |        320 |       1236 | 2020-06-15 17:23:49.765878+00
 pg_catalog | pg_shdepend             |        344 |        933 | 2020-06-15 17:23:54.448855+00
 pg_toast   | pg_toast_2619           |         52 |         85 | 2020-06-15 17:22:01.387933+00
 pg_catalog | pg_statistic            |        413 |        100 | 2020-06-15 17:15:52.656791+00
 pg_catalog | pg_class                |        349 |          2 | 2020-06-12 17:58:13.147931+00
 pg_catalog | pg_attribute            |       2633 |          1 | 2020-06-12 17:58:13.000114+00


Looking at our PostgreSQL logs, it looks like the autovacuum task takes
increasingly more time to run (against the pg_largeobject table):

2020-06-12T19:41:58.335931494Z stderr F  system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 1.77 s
2020-06-12T19:42:59.704884752Z stderr F  system usage: CPU: user: 0.02 s, system: 0.02 s, elapsed: 3.06 s
2020-06-12T19:44:01.928300512Z stderr F  system usage: CPU: user: 0.06 s, system: 0.01 s, elapsed: 5.44 s
2020-06-12T19:45:14.124209167Z stderr F  system usage: CPU: user: 0.11 s, system: 0.05 s, elapsed: 17.13 s
2020-06-12T19:46:16.28758936Z stderr F  system usage: CPU: user: 0.13 s, system: 0.08 s, elapsed: 19.04 s
2020-06-12T19:47:34.264882626Z stderr F  system usage: CPU: user: 0.20 s, system: 0.19 s, elapsed: 36.22 s
2020-06-12T19:49:15.383436343Z stderr F  system usage: CPU: user: 0.28 s, system: 0.38 s, elapsed: 74.06 s
2020-06-12T19:53:47.229361981Z stderr F  system usage: CPU: user: 0.66 s, system: 1.06 s, elapsed: 214.12 s
2020-06-12T20:19:39.619748109Z stderr F  system usage: CPU: user: 4.13 s, system: 5.30 s, elapsed: 1461.16 s
2020-06-12T21:30:31.634637945Z stderr F  system usage: CPU: user: 13.73 s, system: 19.59 s, elapsed: 4225.61 s
2020-06-12T23:54:32.511015886Z stderr F  system usage: CPU: user: 33.57 s, system: 41.90 s, elapsed: 8514.23 s
2020-06-13T04:23:32.230960572Z stderr F  system usage: CPU: user: 66.09 s, system: 82.95 s, elapsed: 16011.25 s
2020-06-13T12:00:37.43434175Z stderr F  system usage: CPU: user: 99.42 s, system: 130.48 s, elapsed: 27296.71 s
2020-06-14T02:40:25.19122979Z stderr F  system usage: CPU: user: 202.96 s, system: 263.66 s, elapsed: 52653.66 s
2020-06-15T01:55:09.03766272Z stderr F  system usage: CPU: user: 317.54 s, system: 544.48 s, elapsed: 83550.21 s

We have several instances of our cloud service, and each instance has its
own database. On the other instances, the autovacuum elapsed time is
consistently less than 25 seconds on every run.

Other than the increasing elapsed times for the autovacuum, we don't see
any other indication in the logs of a problem (no error messages, etc).

We're currently using PostgreSQL version 10.10. Our service is JVM-based
and we're using the PostgreSQL JDBC driver version 42.2.5.

Have we stumbled upon a potential bug here, or do we need to tweak some
autovacuum settings?  What should we look at next or what should we try
next to further troubleshoot this?

Regards,

Jim Hurne





On Tue, Jun 16, 2020 at 10:01 AM Jim Hurne <jhurne@us.ibm.com> wrote:
Other than the increasing elapsed times for the autovacuum, we don't see
any other indication in the logs of a problem (no error messages, etc).

We're currently using PostgreSQL version 10.10. Our service is JVM-based
and we're using the PostgreSQL JDBC driver version 42.2.5.

Have we stumbled upon a potential bug here, or do we need to tweak some
autovacuum settings?  What should we look at next or what should we try
next to further troubleshoot this?


What are your current autovacuum settings? Do you have long running transactions with any frequency? Decreasing autovacuum_vacuum_cost_delay to 1 or 2ms may be prudent (default changes from 20ms down to 2ms with PG 12). Is this a destructive queue of sorts with no rows permanently stored? If so, I would expect that a daily scheduled vacuum analyze may be the best course of action.
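
For example, a query along these lines will show the oldest open
transactions (the limit is only illustrative):

  SELECT pid, usename, state, xact_start, now() - xact_start AS xact_age
  FROM pg_stat_activity
  WHERE xact_start IS NOT NULL
  ORDER BY xact_start
  LIMIT 10;
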
Thanks Michael,

Here are our current autovacuum settings:

 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 1min
 autovacuum_vacuum_cost_delay        | 20ms
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | 1s

>  Is this a destructive queue of sorts with no rows permanently stored?

Essentially, yes. The system can create and delete objects at a high rate.
Here are some recent stats on the pg_largeobject table:

# select * from pg_stat_all_tables where relname = 'pg_largeobject';
-[ RECORD 1 ]-------+---------------
relid               | 2613
schemaname          | pg_catalog
relname             | pg_largeobject
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 66619475
idx_tup_fetch       | 126816721
n_tup_ins           | 57580140
n_tup_upd           | 188474
n_tup_del           | 57640395
n_tup_hot_upd       | 108845
n_live_tup          | 47771
n_dead_tup          | 57787135
n_mod_since_analyze | 115409009
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

# SELECT pg_size_pretty( pg_total_relation_size('pg_largeobject') );
-[ RECORD 1 ]--+-------
pg_size_pretty | 350 GB

The total size of the actual undeleted large objects is about 60 MB.
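
That figure can be estimated with something like:

  SELECT pg_size_pretty(sum(octet_length(data))) AS live_lo_data
  FROM pg_largeobject;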

Regards,

Jim Hurne




From:   Michael Lewis <mlewis@entrata.com>
To:     Jim Hurne <jhurne@us.ibm.com>
Cc:     PostgreSQL General <pgsql-general@lists.postgresql.org>
Date:   06/16/2020 01:06 PM
Subject:        [EXTERNAL] Re: autovacuum failing on pg_largeobject and
disk usage of the pg_largeobject growing unchecked



On Tue, Jun 16, 2020 at 10:01 AM Jim Hurne <jhurne@us.ibm.com> wrote:
Other than the increasing elapsed times for the autovacuum, we don't see
any other indication in the logs of a problem (no error messages, etc).

We're currently using PostgreSQL version 10.10. Our service is JVM-based
and we're using the PostgreSQL JDBC driver version 42.2.5.

Have we stumbled upon a potential bug here, or do we need to tweak some
autovacuum settings?  What should we look at next or what should we try
next to further troubleshoot this?


What are your current autovacuum settings? Do you have long
running transactions with any frequency? Decreasing
autovacuum_vacuum_cost_delay to 1 or 2ms may be prudent (default changes
from 20ms down to 2ms with PG 12). Is this a destructive queue of sorts
with no rows permanently stored? If so, I would expect that a daily
scheduled vacuum analyze may be the best course of action.






On Tue, Jun 16, 2020 at 1:45 PM Jim Hurne <jhurne@us.ibm.com> wrote:
Thanks Michael,

Here are our current autovacuum settings:

 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 1min
 autovacuum_vacuum_cost_delay        | 20ms
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | 1s

With no autovacuum_vacuum_cost_limit and autovacuum_work_mem set the same as maintenance_work_mem, I wouldn't expect any difference between the performance of manual vs auto vacuum. Still, if you run a manual vacuum analyze verbose, what sort of output do you get? What is maintenance_work_mem set to? Are there indexes on this table that perhaps are very large and needing to be rescanned many times because maintenance_work_mem isn't high enough to handle in a single pass? You might try "create index concurrently, drop index concurrently, & rename index" (reindex concurrently if you were on PG 12) as a sort of online 'vacuum full' on the index(es).
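
A sketch of that swap pattern, with purely illustrative names, on an
ordinary table would be:

  CREATE INDEX CONCURRENTLY my_index_new ON my_table (some_col);
  DROP INDEX CONCURRENTLY my_index;
  ALTER INDEX my_index_new RENAME TO my_index;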

By the way, the best practices for these mailing lists suggest partial quoting and responding in-line or below, not "top posting" with the entire conversation below.
On Tuesday, 16 June 2020 at 17:59:37, Jim Hurne <jhurne@us.ibm.com> wrote:
We have a cloud service that uses PostgreSQL to temporarily store binary
content. We're using PostgreSQL's Large Objects to store the binary
content. Each large object lives anywhere from a few hundred milliseconds
to 5-10 minutes, after which it is deleted.
[...]
 
In my experience, vacuumlo (https://www.postgresql.org/docs/12/vacuumlo.html) is needed to remove large objects before vacuum can remove them from pg_largeobject.
 
--
Andreas Joseph Krogh
> In my experience, vacuumlo (https://www.postgresql.org/docs/12/vacuumlo.html)
> is needed to remove large objects before vacuum can remove them from
> pg_largeobject.

Thanks for the suggestion! vacuumlo deletes any orphaned large objects,
presumably by using lo_unlink. While we don't use vacuumlo, our service
does clean up orphaned large objects in a similar way. And indeed,
inspecting the database reveals that there are no orphaned large objects.
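
Conceptually, our orphan check looks like the query below (app_blobs and
content_oid are hypothetical stand-ins for our actual schema):

  SELECT m.oid
  FROM pg_largeobject_metadata m
  WHERE NOT EXISTS (
      SELECT 1 FROM app_blobs b WHERE b.content_oid = m.oid
  );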

Regards,

Jim Hurne




Michael Lewis <mlewis@entrata.com> wrote on 06/16/2020 04:41:16 PM:
> Still, if you run a manual vacuum analyze verbose, what sort of output
> do you get?

On one of the instances that is exhibiting the "disk leak" behavior, the
VACUUM ANALYZE VERBOSE command doesn't generate any output or complete
before I lose the connection to the database (presumably because I hit a
connection read timeout). Is it possible to configure the read timeout for
psql?

On some of our healthy instances, we were able to run VACUUM ANALYZE
VERBOSE:

=> vacuum verbose analyze pg_largeobject;
INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  scanned index "pg_largeobject_loid_pn_index" to remove 630 row
versions
DETAIL:  CPU: user: 0.06 s, system: 0.10 s, elapsed: 0.17 s
INFO:  "pg_largeobject": removed 630 row versions in 190 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  index "pg_largeobject_loid_pn_index" now contains 533 row versions
in 18346 pages
DETAIL:  630 index row versions were removed.
18340 index pages have been deleted, 18339 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_largeobject": found 577 removable, 533 nonremovable row
versions in 399 out of 399 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 69015245
There were 550 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.06 s, system: 0.11 s, elapsed: 0.17 s.
INFO:  "pg_largeobject": truncated 399 to 305 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  analyzing "pg_catalog.pg_largeobject"
INFO:  "pg_largeobject": scanned 305 of 305 pages, containing 533 live
rows and 0 dead rows; 533 rows in sample, 533 estimated total rows
VACUUM


> Are there indexes on this table that perhaps are very large and
> needing to be rescanned many times because maintenance_work_mem
> isn't high enough to handle in a single pass?

On the same healthy instance where we were able to run the VACUUM ANALYZE
VERBOSE, the index size does seem reasonably small:

=> SELECT pg_size_pretty (pg_indexes_size('pg_largeobject'));
-[ RECORD 1 ]--+-------
pg_size_pretty | 143 MB

But on the unhealthy instance, it is much larger:

=> SELECT pg_size_pretty (pg_indexes_size('pg_largeobject'));
 pg_size_pretty
----------------
 7241 MB
(1 row)

But it isn't clear to me if the index size is a symptom or if it's the
actual root cause.

> You might try "create index concurrently, drop index concurrently,
> & rename index" (reindex concurrently if you were on
> PG 12) as a sort of online 'vacuum full' on the index(es).

Unfortunately, since pg_largeobject is a system table, the user we use to
connect to the database doesn't have permissions to do this. We get a "
must be owner of relation pg_largeobject" error when we try to create the
replacement index (using CREATE INDEX CONCURRENTLY).

> By the way, the best practices for these mailing list suggest
> partial quoting and responding in-line or below, not "top posting"
> with the entire conversation below.

My mistake! Sorry about that.


Regards,

Jim Hurne





"Jim Hurne" <jhurne@us.ibm.com> writes:
> On one of the instances that is exhibiting the "disk leak" behavior, the
> VACUUM ANALYZE VERBOSE command doesn't generate any output or complete
> before I lose the connection to the database (presumably because I hit a
> connection read timeout). Is it possible to configure the read timeout for
> psql?

You could fool with your tcp timeout settings:

https://www.postgresql.org/docs/current/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS
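
For instance, the server-side keepalive settings can be tightened with
something like the following (the values are only illustrative):

  ALTER SYSTEM SET tcp_keepalives_idle = 60;
  ALTER SYSTEM SET tcp_keepalives_interval = 10;
  ALTER SYSTEM SET tcp_keepalives_count = 6;
  SELECT pg_reload_conf();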

            regards, tom lane



On Wed, 2020-06-17 at 14:26 -0400, Jim Hurne wrote:
> On one of the instances that is exhibiting the "disk leak" behavior, the
> VACUUM ANALYZE VERBOSE command doesn't generate any output or complete
> before I lose the connection to the database (presumably because I hit a
> connection read timeout). Is it possible to configure the read timeout for
> psql?

I have never heard about a connection read timeout for "psql".

I'd look into the server log; perhaps there is an error that indicates
data corruption that crashes the server?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




We're still struggling to figure out why autovacuum seems to be failing or
is ineffective sometimes against the pg_largeobject table.

We decided to try a VACUUM FULL VERBOSE on one of our problematic
databases. It did complete, but it took a surprisingly long time:

  INFO:  vacuuming "pg_catalog.pg_largeobject"
  INFO:  "pg_largeobject": found 97326130 removable, 22230 nonremovable
row versions in 207508868 pages
  DETAIL:  0 dead row versions cannot be removed yet.
  CPU: user: 1085.87 s, system: 3803.66 s, elapsed: 10883.94 s.
  VACUUM

So it took about 3 hours.  It's surprising because there was only 58 MB of
undeleted data in about 65 large objects. When we ran this particular
vacuum, we made sure there were no other processes connected to the DB and
that no other statements were running that would have locked the table.
It's my understanding that a VACUUM FULL works by creating a brand new
table and copying the "non-deleted" data into the new table. If that's the
case, I would have expected it to complete very quickly, given how little
"non-deleted" data existed at the time.

We are of course going to continue to try different things, but does
anyone have any other suggestions on what we should be looking at or what
settings we might want to adjust?

For reference, here was my original problem description:

> We have a cloud service that uses PostgreSQL to temporarily store binary
> content. We're using PostgreSQL's Large Objects to store the binary
> content. Each large object lives anywhere from a few hundred milliseconds
> to 5-10 minutes, after which it is deleted.
>
> Normally, this works just fine and we have no issues. However, roughly
> every 3 months or so, disk usage on our PostgreSQL database starts to
> increase at an alarming rate. More specifically, the pg_largeobject
> table's disk usage steadily increases even though we've deleted most of
> the large objects. This will continue until the database server runs out
> of disk space (and then all sorts of bad things happen of course).
>
> It looks to us like autovacuum against the pg_largeobject table stops
> working: its pg_stat_all_tables.last_autovacuum column stops updating, or
> is updated only infrequently, while it is still updated regularly for
> other tables.
>
> We're currently using PostgreSQL version 10.10. Our service is JVM-based
> and we're using the PostgreSQL JDBC driver version 42.2.5.


Regards,

Jim Hurne




On Mon, Jun 22, 2020 at 10:01 PM Jim Hurne <jhurne@us.ibm.com> wrote:
We're still struggling to figure out why autovacuum seems to be failing or
is ineffective sometimes against the pg_largeobject table.

We decided to try a VACUUM FULL VERBOSE on one of our problematic
databases. It did complete, but it took a surprisingly long time:

  INFO:  vacuuming "pg_catalog.pg_largeobject"
  INFO:  "pg_largeobject": found 97326130 removable, 22230 nonremovable
row versions in 207508868 pages
  DETAIL:  0 dead row versions cannot be removed yet.
  CPU: user: 1085.87 s, system: 3803.66 s, elapsed: 10883.94 s.
  VACUUM

So it took about 3 hours.  It's surprising because there was only 58 MB of
undeleted data in about 65 large objects. When we ran this particular
vacuum, we made sure there were no other processes connected to the DB and
that no other statements were running that would have locked the table.
It's my understanding that a VACUUM FULL works by creating a brand new
table and copying the "non-deleted" data into the new table. If that's the
case, I would have expected it to complete very quickly, given how little
"non-deleted" data existed at the time.

Yes, that's how VACUUM FULL works (more or less). 

It had to read 207508868 pages, which is about 1.5TB worth of data (207508868 pages of 8k), in order to know it didn't need it. That's probably what took time. Depending on the performance of the machine, it does not seem unreasonable. (The problem, of course, being that it got to this big size with that little actual useful data in the table)

And for autovacuum, with a cost_delay of 20ms and a cost_limit of 200, autovacuum would spend about 55 hours just on vacuum delay for the reads (assuming all are cache failures and thus cost 10 "units", but that's probably close enough to give you an idea) if my math isn't off (reading 207508868 pages would then trigger the cost limit 207508868/20 times, and sleep 20ms each of those times).
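
Spelled out, assuming every one of those pages is a cache miss costing 10
units against the 200-unit limit:

  SELECT 207508868 / (200 / 10)          AS cost_limit_hits,     -- one 20 ms sleep per hit
         207508868 / (200 / 10) * 0.020  AS total_sleep_seconds; -- ~207,500 s, i.e. in the 55-60 hour range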


We are of course going to continue to try different things, but does
anyone have any other suggestions on what we should be looking at or what
settings we might want to adjust?

The logs you posted originally seem to be excluding the actual autovacuum details -- can you include those? That is, you are only including the very last row of the log message, but the interesting parts are at the beginning.

I assume you've also looked for other autovacuum messages in the log -- such as it being canceled by concurrent activity?

--
On Tue, Jun 16, 2020 at 2:41 PM Michael Lewis <mlewis@entrata.com> wrote:
On Tue, Jun 16, 2020 at 1:45 PM Jim Hurne <jhurne@us.ibm.com> wrote:
Thanks Michael,

Here are our current autovacuum settings:

 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 1min
 autovacuum_vacuum_cost_delay        | 20ms
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | 1s

With no autovacuum_vacuum_cost_limit and autovacuum_work_mem set the same as maintenance_work_mem, I wouldn't expect any difference between the performance of manual vs auto vacuum. 

This was flawed. -1 for autovacuum_vacuum_cost_limit doesn't mean unlimited; it means it inherits the manual vacuum_cost_limit, which is likely still the default of 200 units. Now that you have the manual vacuum done, it might be prudent to set autovacuum_vacuum_cost_delay to 1 or 2ms globally, and watch to see if it stays caught up.
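
For instance, something along these lines, then keep an eye on
pg_stat_all_tables.last_autovacuum and n_dead_tup to see whether it keeps
up (the exact value is a judgment call):

  ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
  SELECT pg_reload_conf();
  SHOW autovacuum_vacuum_cost_delay;
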
    Jim Hurne wrote:

> We are of course going to continue to try different things, but does
> anyone have any other suggestions on what we should be looking at or what
> settings we might want to adjust?

If you can arrange a maintenance window, a faster way to rebuild
pg_largeobject when it contains mostly empty pages can be to:

- export the contents into files:

  DO $$
  DECLARE id oid;
  BEGIN
    FOR id IN SELECT oid FROM pg_largeobject_metadata LOOP
      PERFORM lo_export(id, '/tmp-path/' || id::text);
    END LOOP;
  END $$;

- SET allow_system_table_mods to ON; (needs a restart)

- truncate table pg_largeobject, pg_largeobject_metadata;

- reimport the files with the same OIDs:

  DO $$
  DECLARE fname text;
  BEGIN
    FOR fname IN SELECT pg_ls_dir('/tmp-path/') LOOP
      PERFORM lo_import('/tmp-path/' || fname, fname::oid);
    END LOOP;
  END $$;

- remove the files in /tmp-path

- Set allow_system_table_mods back to OFF and restart again, unless
 you don't need that safety check and prefer to leave it permanently to ON
 to avoid the restarts.

With less than 60MB of actual contents, all this might take no more
than a few minutes, as these operations don't need to fully scan
pg_largeobject, which is what is problematic with vacuum.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Magnus Hagander <magnus@hagander.net> wrote on 06/22/2020 04:44:33 PM:
> Yes, that's how VACUUM FULL works (more or less).

Thanks for the confirmation and further explanation Magnus! I'm definitely
getting a better idea of how the vacuum and autovacuum facilities work.


> And for autovacuum, with a cost_delay of 20ms and a cost_limit of
> 200, autovacuum would spend about 55 hours just on vacuum delay for
> the reads...

Ah, very interesting. That explains (at least in part) why the autovacuums
seem to be taking a long time. We'll consider adjusting the cost_delay and
cost_limit settings. Are there any downsides to setting a very low
cost_delay (maybe even 0ms), or to setting a large cost_limit (such as
100,000)?


> The logs you posted originally seem to be excluding the actual
> autovacuum details -- can you include those?

Sure! Below are more of the details from the same set of logs. Looking at
them myself, I see that there is always some percentage of tuples that are
dead but are not yet removable. And that number increases on every vacuum,
which might explain in part why autovacuum elapsed times keep increasing.

What causes a dead tuple to be unremovable?

-- LOGS --
2020-06-12T19:41:58.335881879Z stderr F 2020-06-12 19:41:58 UTC [528141]:
[2-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:41:58.335899718Z stderr F         pages: 0 removed, 8649
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:41:58.335908658Z stderr F         tuples: 0 removed, 30445
remain, 17410 are dead but not yet removable, oldest xmin: 1942750826
2020-06-12T19:41:58.335917078Z stderr F         buffer usage: 17444 hits,
0 misses, 0 dirtied
2020-06-12T19:41:58.335925217Z stderr F         avg read rate: 0.000 MB/s,
avg write rate: 0.000 MB/s
2020-06-12T19:41:58.335931494Z stderr F         system usage: CPU: user:
0.02 s, system: 0.00 s, elapsed: 1.77 s
2020-06-12T19:42:59.704833148Z stderr F 2020-06-12 19:42:59 UTC [528231]:
[2-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T19:42:59.704852755Z stderr F         pages: 0 removed, 14951
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:42:59.704861911Z stderr F         tuples: 2 removed, 53054
remain, 27127 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:42:59.704870553Z stderr F         buffer usage: 30144 hits,
0 misses, 3 dirtied
2020-06-12T19:42:59.704878195Z stderr F         avg read rate: 0.000 MB/s,
avg write rate: 0.008 MB/s
2020-06-12T19:42:59.704884752Z stderr F         system usage: CPU: user:
0.02 s, system: 0.02 s, elapsed: 3.06 s
2020-06-12T19:44:01.928225496Z stderr F 2020-06-12 19:44:01 UTC [528326]:
[2-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:44:01.928265589Z stderr F         pages: 0 removed, 22395
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:44:01.928276366Z stderr F         tuples: 0 removed, 76562
remain, 37235 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:44:01.9282856Z stderr F   buffer usage: 45131 hits, 0
misses, 433 dirtied
2020-06-12T19:44:01.928293976Z stderr F         avg read rate: 0.000 MB/s,
avg write rate: 0.621 MB/s
2020-06-12T19:44:01.928300512Z stderr F         system usage: CPU: user:
0.06 s, system: 0.01 s, elapsed: 5.44 s
2020-06-12T19:45:14.124140716Z stderr F 2020-06-12 19:45:14 UTC [528433]:
[2-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:45:14.124178864Z stderr F         pages: 0 removed, 31029
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:45:14.124187376Z stderr F         tuples: 0 removed, 104441
remain, 65915 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:45:14.124196608Z stderr F         buffer usage: 57692 hits,
4834 misses, 2095 dirtied
2020-06-12T19:45:14.124203046Z stderr F         avg read rate: 2.204 MB/s,
avg write rate: 0.955 MB/s
2020-06-12T19:45:14.124209167Z stderr F         system usage: CPU: user:
0.11 s, system: 0.05 s, elapsed: 17.13 s
2020-06-12T19:46:16.287517553Z stderr F 2020-06-12 19:46:16 UTC [528529]:
[2-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:46:16.287558376Z stderr F         pages: 0 removed, 36257
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:46:16.287567454Z stderr F         tuples: 0 removed, 122273
remain, 81080 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:46:16.287575752Z stderr F         buffer usage: 62700 hits,
10366 misses, 76 dirtied
2020-06-12T19:46:16.287582866Z stderr F         avg read rate: 4.252 MB/s,
avg write rate: 0.031 MB/s
2020-06-12T19:46:16.28758936Z stderr F  system usage: CPU: user: 0.13 s,
system: 0.08 s, elapsed: 19.04 s
2020-06-12T19:47:34.264816546Z stderr F 2020-06-12 19:47:34 UTC [528615]:
[2-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:47:34.264850177Z stderr F         pages: 0 removed, 46373
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:47:34.264860322Z stderr F         tuples: 0 removed, 154605
remain, 107134 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:47:34.264869028Z stderr F         buffer usage: 67990 hits,
25507 misses, 51 dirtied
2020-06-12T19:47:34.264876424Z stderr F         avg read rate: 5.501 MB/s,
avg write rate: 0.011 MB/s
2020-06-12T19:47:34.264882626Z stderr F         system usage: CPU: user:
0.20 s, system: 0.19 s, elapsed: 36.22 s
2020-06-12T19:49:15.383346478Z stderr F 2020-06-12 19:49:15 UTC [528730]:
[4-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T19:49:15.383401164Z stderr F         pages: 0 removed, 59435
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:49:15.383411744Z stderr F         tuples: 27 removed, 196652
remain, 147872 are dead but not yet removable, oldest xmin: 1942751033
2020-06-12T19:49:15.383422329Z stderr F         buffer usage: 75546 hits,
44396 misses, 3587 dirtied
2020-06-12T19:49:15.3834291Z stderr F   avg read rate: 4.683 MB/s, avg
write rate: 0.378 MB/s
2020-06-12T19:49:15.383436343Z stderr F         system usage: CPU: user:
0.28 s, system: 0.38 s, elapsed: 74.06 s
2020-06-12T19:53:47.229279118Z stderr F 2020-06-12 19:53:47 UTC [529009]:
[2-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T19:53:47.229328267Z stderr F         pages: 0 removed, 83155
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:53:47.229337568Z stderr F         tuples: 370 removed,
277040 remain, 235763 are dead but not yet removable, oldest xmin:
1942751716
2020-06-12T19:53:47.229346494Z stderr F         buffer usage: 85923 hits,
82515 misses, 23329 dirtied
2020-06-12T19:53:47.229354327Z stderr F         avg read rate: 3.011 MB/s,
avg write rate: 0.851 MB/s
2020-06-12T19:53:47.229361981Z stderr F         system usage: CPU: user:
0.66 s, system: 1.06 s, elapsed: 214.12 s
2020-06-12T20:19:39.619667488Z stderr F 2020-06-12 20:19:39 UTC [529747]:
[4-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T20:19:39.619715627Z stderr F         pages: 0 removed, 163425
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T20:19:39.619725381Z stderr F         tuples: 269281 removed,
264998 remain, 264357 are dead but not yet removable, oldest xmin:
1942829786
2020-06-12T20:19:39.619733911Z stderr F         buffer usage: 168166 hits,
251776 misses, 238753 dirtied
2020-06-12T20:19:39.619740781Z stderr F         avg read rate: 1.346 MB/s,
avg write rate: 1.277 MB/s
2020-06-12T20:19:39.619748109Z stderr F         system usage: CPU: user:
4.13 s, system: 5.30 s, elapsed: 1461.16 s
2020-06-12T21:30:31.634549669Z stderr F 2020-06-12 21:30:31 UTC [532994]:
[5-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T21:30:31.634595582Z stderr F         pages: 0 removed, 538082
remain, 2 skipped due to pins, 36681 skipped frozen
2020-06-12T21:30:31.634606752Z stderr F         tuples: 1531965 removed,
153799 remain, 80305 are dead but not yet removable, oldest xmin:
1943217115
2020-06-12T21:30:31.634616083Z stderr F         buffer usage: 580519 hits,
953897 misses, 934221 dirtied
2020-06-12T21:30:31.634627566Z stderr F         avg read rate: 1.764 MB/s,
avg write rate: 1.727 MB/s
2020-06-12T21:30:31.634637945Z stderr F         system usage: CPU: user:
13.73 s, system: 19.59 s, elapsed: 4225.61 s
2020-06-12T23:54:32.51093334Z stderr F 2020-06-12 23:54:32 UTC [542595]:
[5-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T23:54:32.510977347Z stderr F         pages: 0 removed, 1389179
remain, 0 skipped due to pins, 267855 skipped frozen
2020-06-12T23:54:32.510988778Z stderr F         tuples: 3292911 removed,
515165 remain, 505391 are dead but not yet removable, oldest xmin:
1944778837
2020-06-12T23:54:32.510998568Z stderr F         buffer usage: 1258882
hits, 2096188 misses, 2108207 dirtied
2020-06-12T23:54:32.51100648Z stderr F  avg read rate: 1.923 MB/s, avg
write rate: 1.934 MB/s
2020-06-12T23:54:32.511015886Z stderr F         system usage: CPU: user:
33.57 s, system: 41.90 s, elapsed: 8514.23 s
2020-06-13T04:23:32.230860795Z stderr F 2020-06-13 04:23:32 UTC [559963]:
[4-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-13T04:23:32.230910901Z stderr F         pages: 0 removed, 2465148
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-13T04:23:32.230925263Z stderr F         tuples: 5248384 removed,
2733516 remain, 2705020 are dead but not yet removable, oldest xmin:
1946587052
2020-06-13T04:23:32.230936603Z stderr F         buffer usage: 2644322
hits, 4162765 misses, 4133675 dirtied
2020-06-13T04:23:32.230949056Z stderr F         avg read rate: 2.031 MB/s,
avg write rate: 2.017 MB/s
2020-06-13T04:23:32.230960572Z stderr F         system usage: CPU: user:
66.09 s, system: 82.95 s, elapsed: 16011.25 s
2020-06-13T12:00:37.434270463Z stderr F 2020-06-13 12:00:37 UTC [595111]:
[4-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-13T12:00:37.434307457Z stderr F         pages: 0 removed, 4131164
remain, 0 skipped due to pins, 207424 skipped frozen
2020-06-13T12:00:37.434317487Z stderr F         tuples: 8179570 removed,
4165421 remain, 4086602 are dead but not yet removable, oldest xmin:
1948480803
2020-06-13T12:00:37.43432695Z stderr F  buffer usage: 4243615 hits,
6654785 misses, 6619302 dirtied
2020-06-13T12:00:37.434334494Z stderr F         avg read rate: 1.905 MB/s,
avg write rate: 1.894 MB/s
2020-06-13T12:00:37.43434175Z stderr F  system usage: CPU: user: 99.42 s,
system: 130.48 s, elapsed: 27296.71 s
2020-06-14T02:40:25.191125478Z stderr F 2020-06-14 02:40:25 UTC [654970]:
[4-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-14T02:40:25.191163972Z stderr F         pages: 0 removed, 7782633
remain, 0 skipped due to pins, 5975 skipped frozen
2020-06-14T02:40:25.191204642Z stderr F         tuples: 17589483 removed,
8269831 remain, 8231465 are dead but not yet removable, oldest xmin:
1956222447
2020-06-14T02:40:25.191214917Z stderr F         buffer usage: 8456598
hits, 13571633 misses, 13500595 dirtied
2020-06-14T02:40:25.19122207Z stderr F  avg read rate: 2.014 MB/s, avg
write rate: 2.003 MB/s
2020-06-14T02:40:25.19122979Z stderr F  system usage: CPU: user: 202.96 s,
system: 263.66 s, elapsed: 52653.66 s
2020-06-15T01:55:09.037552909Z stderr F 2020-06-15 01:55:09 UTC [767824]:
[4-1] user=,db=,client= LOG:  automatic vacuum of table
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-15T01:55:09.037624876Z stderr F         pages: 0 removed, 13496086
remain, 0 skipped due to pins, 1212853 skipped frozen
2020-06-15T01:55:09.037635971Z stderr F         tuples: 28059058 removed,
12115915 remain, 12074335 are dead but not yet removable, oldest xmin:
1973686598
2020-06-15T01:55:09.037647754Z stderr F         buffer usage: 13308114
hits, 21188624 misses, 20841540 dirtied
2020-06-15T01:55:09.03765574Z stderr F  avg read rate: 1.981 MB/s, avg
write rate: 1.949 MB/s
2020-06-15T01:55:09.03766272Z stderr F  system usage: CPU: user: 317.54 s,
system: 544.48 s, elapsed: 83550.21 s






On Tue, Jun 23, 2020 at 2:34 PM Jim Hurne <jhurne@us.ibm.com> wrote:
Sure! Below are more of the details from the same set of logs. Looking at
them myself, I see that there is always some percentage of tuples that are
dead but are not yet removable. And that number increases on every vacuum,
which might explain in part why autovacuum elapsed times keep increasing.

What causes a dead tuple to be unremovable?

Here are a couple of good write-ups.

Long running transactions are the common one that I see. You might be dealing with replication slots or prepared transactions. Basically, if some process might see that "old truth", then it can't be vacuumed away yet.
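
For reference, the other two sources can be checked with queries along
these lines:

  -- replication slots holding back the xmin horizon
  SELECT slot_name, active, xmin, catalog_xmin FROM pg_replication_slots;

  -- prepared transactions that were never committed or rolled back
  SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
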
"Daniel Verite" <daniel@manitou-mail.org> wrote on 06/22/2020 06:00:37 PM:
> If you can arrange a maintenance window, a faster way to rebuild
> pg_largeobject when it contains mostly empty pages can be to:

Thanks Daniel. If we cannot arrange for a maintenance window, is this the
sort of thing that pg_repack[1] could be used to do?

Would pg_repack require setting allow_system_table_mods to on?

[1] https://github.com/reorg/pg_repack

Regards,

Jim Hurne




    Jim Hurne wrote:

> "Daniel Verite" <daniel@manitou-mail.org> wrote on 06/22/2020 06:00:37 PM:
> > If you can arrange a maintenance window, a faster way to rebuild
> > pg_largeobject when it contains mostly empty pages can be to:
>
> Thanks Daniel. If we cannot arrange for a maintenance window, is this the
> sort of thing that pg_repack[1] could be used to do?

Unfortunately it can't work with pg_largeobject:

https://github.com/reorg/pg_repack/issues/163


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



"Daniel Verite" <daniel@manitou-mail.org> wrote on 06/24/2020 10:08:27 AM:
> Unfortunately it [pg_repack] can't work with pg_largeobject

That is unfortunate, and potentially leaves us in a difficult spot.

Is it possible to configure PostgreSQL to use a user table for large
objects instead of a system table?

We're finding it especially difficult to maintain the pg_largeobject
system table (tweak autovacuum settings, manually clean it up with
something like pg_repack, etc.), particularly since our PostgreSQL
instances are hosted and controlled by another team.

Regards,

Jim Hurne






On Wed, Jun 24, 2020 at 4:24 PM, Jim Hurne <jhurne@us.ibm.com> wrote:
"Daniel Verite" <daniel@manitou-mail.org> wrote on 06/24/2020 10:08:27 AM:
> Unfortunately it [pg_repack] can't work with pg_largeobject

That is unfortunate, and potentially leaves us in a difficult spot.

Is it possible to configure PostgreSQL to use a user table for large
objects instead of a system table?

We're finding it especially difficult to maintain the pg_largeobject
system table (tweak autovacuum settings, manually clean it up with
something like pg_repack, etc.), particularly since our PostgreSQL
instances are hosted and controlled by another team.

You can consider moving your large objects to BYTEA storage. Data will
then be stored in ordinary user tables (and can be split across multiple
tables to ease functional splits and maintenance tasks, if needed).
Data access is slightly different.
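
A minimal sketch of such a table (names purely illustrative):

  CREATE TABLE binary_content (
      id          bigserial PRIMARY KEY,
      content     bytea NOT NULL,
      created_at  timestamptz NOT NULL DEFAULT now()
  );

Since this is an ordinary user table, autovacuum can be tuned per table
and tools such as pg_repack can work on it.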

In the following link, you have a comparison grid:

And you also have this article from Daniel Vérité, in French:
https://blog-postgresql.verite.pro/2017/11/15/large-objects-bytea.html





Regards,
Thomas 
Michael Lewis <mlewis@entrata.com> wrote on 06/23/2020 04:44:51 PM:
> Long running transactions are the common one that I see. You might
> be dealing with replication slots or prepared transactions.
> Basically, if some process might see that "old truth", then it can't
> be vacuumed away yet.

Thanks, those links you provided were informative.

Our application doesn't use prepared transactions, so that one is easy to
eliminate.

Our database does use replication, but there's only one replication slot
and the xmin and catalog_xmin columns are blank. I presume the only
replication slot that exists is the one that the other replica is in fact
using. I *think* this means replication isn't the issue, but it's hard to
say since the xmin column is blank (and all the reading I've done doesn't
even mention that the xmin on replication slots can be null).

That brings us to transactions. I looked for any long-running idle
transactions (transactions that have been idle for more than 5 or 15
minutes), but found none.

I tried:

SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

But the backend_xmin for all of the rows returned is exactly the same, and
that xmin is greater than the oldest xmin reported in the autovacuum logs.
It does seem odd that the backend_xmin value isn't changing though. Is
that normal?

So, for replication slots, I'm seeing a null xmin value, which the
articles do not comment on how that should be interpreted. And for
transactions, all of the transaction xmins are the same, which also seems
odd and not what the articles suggested. I know the transactions
themselves are coming and going because I can see the pids changing, but
the xmins are always the same. Strange.

Having to check the logs to find the oldest xmin is painful, and it
requires that a vacuum or autovacuum has already run. Is there another
way to check what the oldest xmin is for the dead tuples of a table?

Regards,

Jim Hurne




"Jim Hurne" <jhurne@us.ibm.com> wrote on 06/24/2020 03:44:26 PM:
> That brings us to transactions. I looked for any long-running idle
> transaction (transactions that have been idle for more than 15 or 5
> minutes), but found none.

It turns out I didn't see any long running transactions because I was
using a user who didn't have permissions to view most of the data in the
pg_stat_activity view.  When we looked using a more powerful user, we
found four transactions that had been idle for a long time. Killing those
transactions allowed vacuums to clean up the dead tuples.
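
For the record, sessions like that can be spotted and, when it is safe,
terminated with something along these lines (the age threshold is only
illustrative):

  SELECT pid, usename, xact_start, state
  FROM pg_stat_activity
  WHERE state LIKE 'idle in transaction%'
    AND xact_start < now() - interval '15 minutes';

  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE state LIKE 'idle in transaction%'
    AND xact_start < now() - interval '15 minutes';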

This is most likely the root of all of our problems, so we can essentially
consider this mystery solved for now.

Thanks to all who gave us advice and information. It's been highly
educational. I'm just sorry it turned out to be a stupid "long running
transaction" problem. We'll know to look for that first in the future.