Thread: When do vacuumed pages/tuples become available for reuse?
If an autovacuum job on a huge table runs for 5-6 hours, do its freed pages/tuples become available for reuse immediately when they are marked as free, or only at the end of the multi-hour vacuum operation?
rihad <rihad@mail.ru> writes:
> If an autovacuum job on a huge table runs for 5-6 hours, do its freed
> pages/tuples become available for reuse immediately when they are marked
> as free, or only at the end of the multi-hour vacuum operation?

They'll be freed in batches, where the size of a batch depends on the
autovacuum_work_mem or maintenance_work_mem setting.  The basic
work cycle is

* scan table to find dead tuples, save their TIDs in working memory;
  continue until end of table or working memory full
* scan indexes to find index entries matching those TIDs, remove 'em
* go back to table and remove the previously-found tuples
* if not end of table, repeat

So a larger work-mem setting means fewer passes over the indexes,
but a longer time until space is reclaimed.

            regards, tom lane
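A minimal sketch of how to watch this from SQL, using only stock settings
and the standard pg_stat_user_tables view:

    SHOW maintenance_work_mem;
    SHOW autovacuum_work_mem;   -- -1 means it falls back to maintenance_work_mem

    -- Which tables currently carry the most dead tuples awaiting vacuum:
    SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;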
On 04/11/2019 05:48 PM, Tom Lane wrote:
> They'll be freed in batches, where the size of a batch depends on the
> autovacuum_work_mem or maintenance_work_mem setting.
> [...]
> So a larger work-mem setting means fewer passes over the indexes,
> but a longer time until space is reclaimed.

Thanks! Our autovacuum_work_mem = 1GB, so this probably means any space
would be available for reuse only at the end of the vacuum? Are there any
downsides in decreasing it to, say, 64MB? I see only pluses )
On 2019-Apr-11, rihad wrote:
> Thanks! Our autovacuum_work_mem = 1GB, so this probably means any space
> would be available for reuse only at the end of the vacuum? Are there any
> downsides in decreasing it to, say, 64MB? I see only pluses )

Yes, each vacuum will take longer and will use much more I/O.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/11/2019 06:09 PM, Alvaro Herrera wrote:
> Yes, each vacuum will take longer and will use much more I/O.

I see, thanks.
rihad <rihad@mail.ru> writes:
> Thanks! Our autovacuum_work_mem = 1GB, so this probably means any space
> would be available for reuse only at the end of the vacuum?

It's six bytes per dead tuple, last I checked ... you do the math.

> Are there any downsides in decreasing it to, say, 64MB? I see only pluses )

Well, usually people prefer to minimize the number of passes over
the indexes.

            regards, tom lane
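To make the math concrete, a rough sketch assuming the ~6 bytes per
dead-tuple TID that Tom mentions:

    -- Approximate dead tuples that fit in working memory before an index pass:
    SELECT (1024::bigint * 1024 * 1024) / 6 AS tuples_per_1gb,   -- ~179 million
           (64::bigint * 1024 * 1024) / 6   AS tuples_per_64mb;  -- ~11 million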
On 04/11/2019 06:20 PM, Tom Lane wrote:
> It's six bytes per dead tuple, last I checked ... you do the math.
>
> Well, usually people prefer to minimize the number of passes over
> the indexes.

Yup, it's just that n_dead_tuples grows by several hundred thousand (the
table sees many more updates than inserts) and disk usage grows constantly
between the several-hour-long vacuum runs. Running vacuum full isn't an
option.
On 2019-Apr-11, rihad wrote:
> Yup, it's just that n_dead_tuples grows by several hundred thousand (the
> table sees many more updates than inserts) and disk usage grows constantly
> between the several-hour-long vacuum runs. Running vacuum full isn't an
> option.

Perhaps it'd be better to vacuum this table much more often.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/11/2019 06:41 PM, Alvaro Herrera wrote:
> Perhaps it'd be better to vacuum this table much more often.

Each run took 5-6 hours, now it takes 2-3 hours after I've tweaked some
cost-based vacuum knobs.
On 2019-Apr-11, rihad wrote:
> Each run took 5-6 hours, now it takes 2-3 hours after I've tweaked some
> cost-based vacuum knobs.

But how often does it run?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/11/2019 07:04 PM, Alvaro Herrera wrote:
> But how often does it run?

One after another. Enough n_dead_tuples accumulate in between runs to
easily trigger that:

autovacuum_vacuum_scale_factor = 0.01
autovacuum_vacuum_threshold = 50

which means autovac runs when 1% of table size + 50 rows have been updated
or deleted. But we can't make each autovacuum run go any faster )

Currently I lowered this from 20ms:

autovacuum_vacuum_cost_delay = 10ms

And increased this from 200:

autovacuum_vacuum_cost_limit = 400

to make it finish in 1 hour rather than 3 hours. Anything more than that
and we risk impacting the performance of user queries.
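One way to vacuum just this table more aggressively, without changing the
cluster-wide settings, is a per-table override; a sketch with a
hypothetical table name:

    -- Hypothetical per-table override: trigger autovacuum earlier and
    -- throttle it less on the hot table only.
    ALTER TABLE bar SET (
        autovacuum_vacuum_scale_factor = 0.002,
        autovacuum_vacuum_threshold    = 1000,
        autovacuum_vacuum_cost_delay   = 5
    );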
On Thu, Apr 11, 2019 at 10:28 AM rihad <rihad@mail.ru> wrote:
> Yup, it's just that n_dead_tuples grows by several hundred thousand (the
> table sees many more updates than inserts) and disk usage grows constantly
> between the several-hour-long vacuum runs. Running vacuum full isn't an
> option.

The disk usage doesn't reach a steady state after one or two autovacs? Or
it does, but you are just unhappy about the ratio between the steady state
size and the theoretical fully packed size?

Cheers,

Jeff
Since we dump & restore the production DB into a staging environment daily,
the difference in size (as reported by psql's \l+) is 11GB in a freshly
restored DB as opposed to 70GB in production.
On Thu, Apr 11, 2019 at 11:44 AM rihad <rihad@mail.ru> wrote:
> Since we dump & restore the production DB into a staging environment daily,
> the difference in size (as reported by psql's \l+) is 11GB in a freshly
> restored DB as opposed to 70GB in production.

Yeah, that seems like a problem.  Do you have long-lived
transactions/snapshots that are preventing vacuuming from removing dead
tuples?  You can run a manual "vacuum verbose" and see how many dead but
nonremovable tuples there were, or set log_autovacuum_min_duration to some
non-negative value less than the time the autovac takes, and do the same.
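A hedged sketch of those two checks (the table name is hypothetical;
log_autovacuum_min_duration takes milliseconds or a unit suffix):

    -- Log every autovacuum that runs longer than a minute:
    ALTER SYSTEM SET log_autovacuum_min_duration = '60s';
    SELECT pg_reload_conf();

    -- Or run one by hand and read the "dead ... not yet removable" line:
    VACUUM (VERBOSE) public.bar;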
> (Indeed, those dumps you take daily might be the source of those
> long-lived snapshots.  How long does a dump take?)

The daily dumps are taken from the slave server as part of the stock
FreeBSD postgres port's periodic activity. I don't think it impacts the
master server.

> Also, what does pg_freespace
> (https://www.postgresql.org/docs/current/pgfreespacemap.html) show about
> the availability of free space in the table? How about pgstattuple
> (https://www.postgresql.org/docs/current/pgstattuple.html)?

Thanks, I'll try those. But as I said, the freshly restored DB is only
11GB in size, not 70 (only the public schema is used).
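For reference, example queries against the two contrib modules mentioned
above (the extension names are real; the table name is hypothetical):

    CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- How much space the free space map already knows about in the table:
    SELECT pg_size_pretty(sum(avail)) AS fsm_free
    FROM pg_freespace('public.bar');

    -- Live vs. dead vs. free space inside the relation:
    SELECT * FROM pgstattuple('public.bar');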
Vacuum frees tuples just fine. It's just that by the time each run
finishes, many more accumulate due to table update activity, ad nauseam.
So this unused space constantly grows. Here's a sample autovacuum run:
2019-04-11 19:39:44.450841500 [] LOG: automatic vacuum of table "foo.public.bar": index scans: 1
2019-04-11 19:39:44.450843500 pages: 0 removed, 472095 remain, 4 skipped due to pins, 39075 skipped frozen
2019-04-11 19:39:44.450844500 tuples: 19150 removed, 2725811 remain, 465 are dead but not yet removable
2019-04-11 19:39:44.450845500 buffer usage: 62407557 hits, 6984769 misses, 116409 dirtied
2019-04-11 19:39:44.450846500 avg read rate: 16.263 MB/s, avg write rate: 0.271 MB/s
2019-04-11 19:39:44.450847500 system usage: CPU 59.05s/115.26u sec elapsed 3355.28 sec
On 2019-Apr-11, rihad wrote:
> 2019-04-11 19:39:44.450844500     tuples: 19150 removed, 2725811 remain, 465 are dead but not yet removable

What Jeff said.  This vacuum spent a lot of time, only to remove a miserly
19k tuples, but 2.7M dead tuples remained ... probably because you have
long-running transactions preventing vacuum from removing them.  Maybe
that was pg_dump.

I suspect what you really need is to keep an eye on the age of your oldest
transactions.  Don't leave anything "idle in transaction".

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
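A sketch of the kind of check Álvaro suggests, using pg_stat_activity
(column names as in recent releases):

    -- Oldest open transactions; long-lived or "idle in transaction" sessions
    -- hold back the horizon up to which vacuum can clean.
    SELECT pid, state, xact_start, now() - xact_start AS xact_age,
           left(query, 60) AS query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start
    LIMIT 10;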
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> What Jeff said.  This vacuum spent a lot of time, only to remove a miserly
> 19k tuples, but 2.7M dead tuples remained ... probably because you have
> long-running transactions preventing vacuum from removing them.

I think you misread it --- I'm pretty sure "N remain" is referring
to live tuples.  Maybe we should adjust the wording to make that
clearer?

            regards, tom lane
On 4/11/19 12:24 PM, Tom Lane wrote:
> I think you misread it --- I'm pretty sure "N remain" is referring
> to live tuples.  Maybe we should adjust the wording to make that
> clearer?

Because "465 are dead but not yet removable" are the dead tuples that
remain, instead of 2.7M?

--
Angular momentum makes the world go 'round.
On 2019-Apr-11, Tom Lane wrote:
> I think you misread it --- I'm pretty sure "N remain" is referring
> to live tuples.  Maybe we should adjust the wording to make that
> clearer?

Oh, I've been confused by that many times.  Not good trying to decode
confusing messages while simultaneously trying to figure out logical
decoding bugs that have already been fixed :-(

Yeah, let's reword that.  I've had to read the source half a dozen times
because I always forget what each number means.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
If we just want to do a slight re-wording, I don't know what it would need
to look like. "remain" includes live, recently dead, uncommitted new, and
uncommitted old (I think), so we can't just change "remain" to "live".
On Thu, Apr 11, 2019 at 12:18 PM rihad <rihad@mail.ru> wrote:
> Vacuum frees tuples just fine. It's just that by the time each run
> finishes, many more accumulate due to table update activity, ad nauseam.
> So this unused space constantly grows. Here's a sample autovacuum run:
>
> 2019-04-11 19:39:44.450841500 [] LOG: automatic vacuum of table "foo.public.bar": index scans: 1
> 2019-04-11 19:39:44.450843500     pages: 0 removed, 472095 remain, 4 skipped due to pins, 39075 skipped frozen
> 2019-04-11 19:39:44.450844500     tuples: 19150 removed, 2725811 remain, 465 are dead but not yet removable
> 2019-04-11 19:39:44.450845500     buffer usage: 62407557 hits, 6984769 misses, 116409 dirtied
> 2019-04-11 19:39:44.450846500     avg read rate: 16.263 MB/s, avg write rate: 0.271 MB/s
> 2019-04-11 19:39:44.450847500     system usage: CPU 59.05s/115.26u sec elapsed 3355.28 sec

This data doesn't seem to support either one of our theories. "Dead but
not yet removable" is low. But "removed" also seems pretty low. Is 19,150
really the number of updates you think occur over the course of an hour,
causing the problem you are seeing? Updates that happened during one
vacuum should be cleanly caught by the next one, so you should only see a
steady state of bloat, not an unbounded increase.

But your buffer usage being 132 times the number of pages in the table
suggests it is your indexes, not your table, which are bloated.

How many indexes do you have, and of what type? Index pages can only get
reused when they become completely empty, or when a new indexed value fits
into (or near) the key-space that that page already covers. So if the key
space for new tuples is constantly migrating around and your pages never
become absolutely empty, you can get unbounded bloat in the indexes.
Way too many indexes. I'm going to have a hard time convincing our
programmers to get rid of any of them )
Can you compare the sizes object by object between the live and staging
databases, taking care not to include index (or toast) size in the size of
their parent table?
You're right, it's mostly indexes that are bloated.
Staging:
# select pg_size_pretty(pg_relation_size('foo'));
pg_size_pretty
----------------
2924 MB
(1 row)
# select pg_size_pretty(pg_indexes_size('foo'));
pg_size_pretty
----------------
1958 MB
(1 row)
Prod:
# select pg_size_pretty(pg_relation_size('foo'));
pg_size_pretty
----------------
3688 MB
(1 row)
# select pg_size_pretty(pg_indexes_size('foo'));
pg_size_pretty
----------------
60 GB
(1 row)
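To break that 60GB down per index (and answer the earlier question about
index types), something like the following should work; the table name
here follows the queries above:

    SELECT i.relname AS index_name, am.amname AS index_type,
           pg_size_pretty(pg_relation_size(i.oid)) AS size
    FROM pg_index x
    JOIN pg_class i ON i.oid = x.indexrelid
    JOIN pg_am am   ON am.oid = i.relam
    WHERE x.indrelid = 'foo'::regclass
    ORDER BY pg_relation_size(i.oid) DESC;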
> Thanks, I'll try those. But as I said, the freshly restored DB is only
> 11GB in size, not 70 (only the public schema is used).

Yeah, but we need to know **why** that extra 59GB is not being reused, not
simply the fact that it isn't being reused. If it isn't listed as free in
the freespace map, then PostgreSQL might not know how to find it in order
to reuse it, for example. But now that I think it is the indexes, not the
table, that are bloated, I would chase that part down first. No point
checking the freespace of the table proper if the problem is with the
indexes.

Cheers,

Jeff
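For the index side specifically, pgstattuple's pgstatindex() gives a quick
bloat signal for btree indexes; a sketch with a hypothetical index name:

    -- Low avg_leaf_density across a large btree is the classic bloat signature.
    SELECT index_size, avg_leaf_density, leaf_fragmentation
    FROM pgstatindex('foo_some_idx');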
On Thu, Apr 11, 2019 at 11:13:17AM -0600, Michael Lewis wrote:
> Wouldn't "dead but not yet removable" be high if there were long running
> transactions holding onto old row versions?

You got it right.  You need to look at the number of tuples dead but not
removable, which is usually a pattern caused by long-running transactions
holding snapshot references, preventing autovacuum and/or vacuum from
doing their homework.
--
Michael
On Thu, Apr 11, 2019 at 10:39:12PM -0600, Michael Lewis wrote:
> You can create (concurrently) an identical index with a new name, then
> drop the old version concurrently, and repeat for each. It doesn't help
> you figure out the root cause and how to prevent it from happening again,
> but it gets you to a fresh start at least.

Please note that Postgres 12 has added support for REINDEX CONCURRENTLY.
A huge advantage of that is the possibility to work on indexes with
constraints.
--
Michael
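A sketch of what that looks like on Postgres 12 or later (index and table
names hypothetical):

    -- Rebuild one bloated index, or all of a table's indexes, without the
    -- long exclusive lock a plain REINDEX would take.
    REINDEX INDEX CONCURRENTLY foo_some_idx;
    REINDEX TABLE CONCURRENTLY foo;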
> Way too many indexes. I'm going to have a hard time convincing our
> programmers to get rid of any of them )

You can create (concurrently) an identical index with a new name, then
drop the old version concurrently, and repeat for each. It doesn't help
you figure out the root cause and how to prevent it from happening again,
but it gets you to a fresh start at least.
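On pre-12 releases the dance described above might look like this (index,
table, and column names hypothetical):

    CREATE INDEX CONCURRENTLY foo_col_idx_new ON foo (col);
    DROP INDEX CONCURRENTLY foo_col_idx;                 -- waits for in-flight queries using it
    ALTER INDEX foo_col_idx_new RENAME TO foo_col_idx;   -- quick catalog-only rename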
Thanks for the tip. I believe that, other than temporarily increasing
load, creating a new index, dropping the old one, and renaming the new one
to the old name are transparent and (almost) lock-less operations? What
would happen to user queries during DROP INDEX? Would they block on it,
or, being unable to read it without blocking, would they pick the new one
based on the same column(s)? And, likewise, is ALTER INDEX ... RENAME ...
an instant operation, I hope?