Re: vacuum question - Mailing list pgsql-general

From Stephen Frost
Subject Re: vacuum question
Msg-id 20180930115320.GT4184@tamriel.snowman.net
In response to vacuum question  (Torsten Förtsch <tfoertsch123@gmail.com>)
List pgsql-general
Greetings,

* Torsten Förtsch (tfoertsch123@gmail.com) wrote:
> I have a table with a really small number of rows, usually about 1500,
> sometimes may be up to 5000. The usage pattern of that table is such that
> rows are inserted and kept for a while, mostly seconds or minutes but
> theoretically up to 1 year. After that they are deleted. No updates, just
> insert, delete. The table is pretty actively written, sometimes >100
> transactions per second.

The number of live tuples is never over 5000...?  Or 5000 per
transaction, meaning the upper bound is more like 500,000?

> Although the table is frequently auto-vacuumed, its size is growing over
> time. The size of the table corresponds to the remaining number of pages in
> the autovacuum log.
>
>  pages: 0 removed, 30069 remain, 0 skipped due to pins, 29985 skipped
> frozen
>  tuples: 136 removed, 1643 remain, 15 are dead but not yet removable
>  buffer usage: 44327 hits, 0 misses, 21 dirtied
>  avg read rate: 0.000 MB/s, avg write rate: 0.036 MB/s
>  system usage: CPU 0.06s/0.05u sec elapsed 4.59 sec
>
> What does "29985 skipped frozen" mean? I presume these skipped buffers is
> the table bloat.

No, pages being skipped because they're frozen just means that there's
nothing for vacuum to do on those pages.

To look at bloat, use pgstattuple.
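
Something along these lines will show the dead-tuple and free-space
percentages for the table (a sketch only: it assumes the pgstattuple
extension is installed and "my_table" stands in for your table name):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT * FROM pgstattuple('my_table');

The dead_tuple_percent and free_percent columns are the ones to watch
when judging bloat.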

> My way to get rid of the bloat is to cluster the table. That takes less
> than half a second and the next autovacuum log then looks like this:
>
>  pages: 0 removed, 3493 remain, 0 skipped due to pins, 0 skipped frozen
>  tuples: 0 removed, 58873 remain, 57470 are dead but not yet removable
>  buffer usage: 9425 hits, 0 misses, 0 dirtied
>  avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>  system usage: CPU 0.02s/0.03u sec elapsed 1.01 sec
>
> For a number of autovacuum cycles "skipped frozen" remains 0 until
> eventually:
>
>  pages: 0 removed, 4822 remain, 0 skipped due to pins, 4433 skipped frozen
>  tuples: 227 removed, 2609 remain, 13 are dead but not yet removable
>  buffer usage: 5269 hits, 0 misses, 315 dirtied
>  avg read rate: 0.000 MB/s, avg write rate: 2.065 MB/s
>  system usage: CPU 0.03s/0.01u sec elapsed 1.19 sec
>
> Is there anything I can do to prevent that bloat from accumulating in the
> first place?

What problem is the bloat causing?  If you can keep all of your
transactions short and autovacuum the table fast enough then the table
size should stabilize.
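
If autovacuum isn't keeping up, one thing to try is making it trigger
more aggressively for just this table, along these lines (the numbers
are only illustrative, and "my_table" is again a placeholder):

    ALTER TABLE my_table SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_threshold    = 100
    );

With settings like that, autovacuum looks at the table after roughly
100 dead tuples plus 1% of the table, instead of the default 20%.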

> Another thing that seems odd is the number of dead but not removable tuples
> in the record in the middle. Sifting through the log, I can see numbers up
> to 80000. One of our replicas is configured with hot_standby_feedback. Can
> that be the culprit?

Yes, if you have hot_standby_feedback on then a long-running transaction
on the replica has the same effect as one on the primary: PG is unable
to mark rows as reusable until the oldest transaction in the system
began after the commit that deleted the rows.
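
You can usually spot the offending transactions by looking for old
backend_xmin values in pg_stat_activity, something like this (just a
sketch):

    SELECT pid, state, xact_start, backend_xmin
      FROM pg_stat_activity
     WHERE backend_xmin IS NOT NULL
     ORDER BY xact_start;

With hot_standby_feedback on, the standby's oldest xmin also shows up
on the primary in pg_stat_replication.backend_xmin.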

If you need to be able to run long-running transactions on the primary,
or on a replica with hot_standby_feedback enabled, then there are a few
other approaches which can be used, but they're more complicated and
involve things which aren't MVCC-safe (which is why they avoid bloat,
but it does mean that those long-running transactions won't see rows
that maybe they should have if they look at this table).  Consider
having two tables that you flip between periodically: empty the prior
one, TRUNCATE it, then flip again, etc.  The TRUNCATE will clear out
all of the bloat, but it will remove rows that a long-running
transaction maybe should have been able to see (and which a routine
VACUUM would have kept, just in case).
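
A very rough sketch of that flip-and-truncate idea (all of the names
here are made up, and the actual switching of writers between the two
tables would live in your application or in a view/function layer):

    -- two identical tables; writers insert into whichever is "active"
    CREATE TABLE queue_a (id bigint, payload text, inserted_at timestamptz);
    CREATE TABLE queue_b (LIKE queue_a INCLUDING ALL);

    -- on each flip: point writers at queue_a, carry over any rows from
    -- queue_b that still need to live, then empty it
    INSERT INTO queue_a SELECT * FROM queue_b;
    TRUNCATE queue_b;   -- reclaims the space immediately, but not MVCC-safe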

Thanks!

Stephen
