Thread: Berserk Autovacuum (let's save next Mandrill)
Hi hackers,
Attached is a sketch of a small patch that fixes several edge cases with autovacuum. Long story short: autovacuum never comes to append-only tables, killing large production systems.
First case, mine.
https://www.postgresql.org/message-id/CAC8Q8tLBeAxR%2BBXWuKK%2BHP5m8tEVYn270CVrDvKXt%3D0PkJTY9g%40mail.gmail.com
We had a table we were appending to and wanted Index Only Scan to work. For it to work, you need to call VACUUM manually, since VACUUM is the only way to mark pages all-visible, and autovacuum never comes to append-only tables. We were clever enough to invent a workflow without dead tuples, and it painfully bit us.
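For illustration, our workaround boiled down to watching the visibility map coverage ourselves and running VACUUM by hand - a rough sketch only, with a made-up table name:

    -- how much of the table the visibility map already covers;
    -- index-only scans fall back to heap fetches for the rest
    SELECT relpages, relallvisible,
           round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
    FROM pg_class
    WHERE relname = 'track_points';

    -- today the only thing that sets those bits on an append-only table
    VACUUM (VERBOSE) track_points;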
Second case, just read in the news.
https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/
Mandrill has a 6 TB append-only table that autovacuum probably never vacuumed. Then the anti-wraparound vacuum came and production went down. If autovacuum had done its job before that last moment, they would probably have been okay.
Idea: look not at dead tuples, but at changes, just like ANALYZE does.
It's my first patch on Postgres, it's probably all wrong, but I hope it helps you get the idea.
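To show the shape of the idea (not the patch itself): this is roughly the formula autoanalyze is already scheduled by, readable from the statistics views; the patch gives vacuum the same kind of trigger based on changed tuples rather than only dead ones. A sketch you can run anywhere:

    -- autoanalyze fires when changes since the last analyze exceed
    -- base threshold + scale factor * table size; the patch applies a
    -- comparable trigger to vacuum so plain inserts count too
    SELECT s.relname,
           s.n_mod_since_analyze,
           current_setting('autovacuum_analyze_threshold')::bigint
             + current_setting('autovacuum_analyze_scale_factor')::float8 * c.reltuples
             AS analyze_trigger_point
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid;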
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
> Attached is sketch of small patch that fixes several edge cases with autovacuum. Long story short autovacuum never comes to append only tables, killing large productions.
Yeah, autovac is not coping with these scenarios (and probably others). However, rather than taking your patch's idea verbatim, I think we should have autovacuum use separate actions for those two (wildly different) scenarios. For example:
* certain tables would have some sort of partial scan that sets the visibility map. There's no reason to invoke the whole vacuuming machinery. I don't think this is limited to append-only tables, but rather those are just the ones that are affected the most.
* tables nearing wraparound danger should use the (yet to be committed) option to skip index cleaning, which makes the cleanup action faster. Again, no need for complete vacuuming.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
> Attached is sketch of small patch that fixes several edge cases with
> autovacuum. Long story short autovacuum never comes to append only tables,
> killing large productions.
Yeah, autovac is not coping with these scenarios (and probably others).
However, rather than taking your patch's idea verbatim, I think we
should have autovacuum use separate actions for those two (wildly
different) scenarios. For example:
* certain tables would have some sort of partial scan that sets the
visibility map. There's no reason to invoke the whole vacuuming
machinery. I don't think this is limited to append-only tables, but
rather those are just the ones that are affected the most.
What other machinery runs on VACUUM invocation that is not wanted there?
Since Postgres 11 index cleanup is already skipped on append-only tables.
* tables nearing wraparound danger should use the (yet to be committed)
option to skip index cleaning, which makes the cleanup action faster.
Again, no need for complete vacuuming.
"Nearing wraparound" is too late already. In Amazon, reading table from gp2 after you exhausted your IOPS burst budget is like reading a floppy drive, you have to freeze a lot earlier than you hit several terabytes of unfrozen data, or you're dead like Mandrill's Search and Url tables from the link I shared.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
> On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> > On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
> > * certain tables would have some sort of partial scan that sets the visibility map. There's no reason to invoke the whole vacuuming machinery. I don't think this is limited to append-only tables, but rather those are just the ones that are affected the most.
>
> What other machinery runs on VACUUM invocation that is not wanted there?
> Since Postgres 11 index cleanup is already skipped on append-only tables.
Well, I think it would be useful to set all-visible earlier than waiting for a vacuum to be necessary, even for tables that are not append-only. So if you think about this just for the append-only table, you leave money on the table.
> > * tables nearing wraparound danger should use the (yet to be committed) option to skip index cleaning, which makes the cleanup action faster. Again, no need for complete vacuuming.
>
> "Nearing wraparound" is too late already. In Amazon, reading table from gp2 after you exhausted your IOPS burst budget is like reading a floppy drive, you have to freeze a lot earlier than you hit several terabytes of unfrozen data, or you're dead like Mandrill's Search and Url tables from the link I shared.
OK, then start freezing tuples in the cheap mode (skip index updates) earlier than that. I suppose a good question is when to start.
I wonder if Mandrill's problem is related to Mailchimp raising the freeze_max_age to a point where autovac did not have enough time to react with an emergency vacuum. If you keep raising that value because the vacuums cause problems for you (they block DDL), there's something wrong.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
> > "Nearing wraparound" is too late already. In Amazon, reading table from gp2 after you exhausted your IOPS burst budget is like reading a floppy drive, you have to freeze a lot earlier than you hit several terabytes of unfrozen data, or you're dead like Mandrill's Search and Url tables from the link I shared.
>
> OK, then start freezing tuples in the cheap mode (skip index updates) earlier than that. I suppose a good question is when to start.
I thought recently that it would be good to have some sort of pro-active auto-vacuum mode that made use of idle workers. Probably there would need to be some mode flag that mentioned which workers were in proactive mode so that these could be cancelled when more pressing work came in. I don't have an idea exactly of what "pro-active" would actually be defined as, but I know that when the single transaction ID is consumed that causes terabytes of tables to suddenly need an anti-wraparound vacuum, then it's not a good situation to be in. Perhaps getting to some percentage of autovacuum_freeze_max_age could be classed as pro-active.
> I wonder if Mandrill's problem is related to Mailchimp raising the freeze_max_age to a point where autovac did not have enough time to react with an emergency vacuum. If you keep raising that value because the vacuums cause problems for you (they block DDL), there's something wrong.
I have seen some very high autovacuum_freeze_max_age settings recently. It would be interesting to know what they had theirs set to.
I see they mentioned "Search and Url tables". I can imagine "search" never needs any UPDATEs, so quite possibly those were append-only, in which case the anti-wraparound vacuum would have had quite a lot of work on its hands since possibly every page needed frozen. A table receiving regular auto-vacuums from dead tuples would likely get some pages frozen during those.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, 28 Mar 2019 at 01:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
> On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> > On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:
> > * certain tables would have some sort of partial scan that sets the
> > visibility map. There's no reason to invoke the whole vacuuming
> > machinery. I don't think this is limited to append-only tables, but
> > rather those are just the ones that are affected the most.
>
> What other machinery runs on VACUUM invocation that is not wanted there?
> Since Postgres 11 index cleanup is already skipped on append-only tables.
Well, I think it would be useful to set all-visible earlier than waiting
for a vacuum to be necessary, even for tables that are not append-only.
So if you think about this just for the append-only table, you leave
money on the table.
Thing is, the problem does not exist for non-append-only tables: they are going to be vacuumed automatically after 50 rows get updated.
> > * tables nearing wraparound danger should use the (yet to be committed)
> > option to skip index cleaning, which makes the cleanup action faster.
> > Again, no need for complete vacuuming.
>
> "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> after you exhausted your IOPS burst budget is like reading a floppy drive,
> you have to freeze a lot earlier than you hit several terabytes of unfrozen
> data, or you're dead like Mandrill's Search and Url tables from the link I
> shared.
OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that. I suppose a good question is when to start.
Attached (autovacuum_berserk_v1.patch) is code that achieves that. For append-only tables, since https://commitfest.postgresql.org/16/952/, vacuum already skips index cleanup if no updates happened. You just need to trigger it, and it will already be "cheap".
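That is, on 11 a manually triggered vacuum on a pure append-only table is already close to the cheap mode - roughly (GUC name from the committed patch above; table name made up):

    -- btree cleanup is skipped as long as insertions since the last cleanup
    -- stay under this fraction of the table; 0.1 is the default
    SET vacuum_cleanup_index_scale_factor = 0.1;
    VACUUM (VERBOSE) track_points;   -- the remaining cost is the heap pass itself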
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
> > "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> > after you exhausted your IOPS burst budget is like reading a floppy drive,
> > you have to freeze a lot earlier than you hit several terabytes of unfrozen
> > data, or you're dead like Mandrill's Search and Url tables from the link I
> > shared.
>
> OK, then start freezing tuples in the cheap mode (skip index updates)
> earlier than that. I suppose a good question is when to start.
I thought recently that it would be good to have some sort of
pro-active auto-vacuum mode that made use of idle workers.
Problem with "idle" is that it never happens on system that are going to wraparound on their lifetime. This has to be a part of normal database functioning.
Why not select a table that has inserts, updates and deletes for autovacuum, just like we do for autoanalyze, instead of only deletes and updates like we do now?
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
>
> On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I thought recently that it would be good to have some sort of pro-active auto-vacuum mode that made use of idle workers.
>
> Problem with "idle" is that it never happens on systems that are going to wraparound on their lifetime. This has to be a part of normal database functioning.
I'd say auto-vacuum is configured to run too slowly if you never have an idle worker. The chances that it happens to be running at exactly the right speed to keep up with demand must be about close to nil.
> Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
Sounds like a good idea, although I do agree with Alvaro when he mentions that it would be good to only invoke a worker that was only going to freeze tuples and not look at the indexes. I've not looked at it, but there's a patch [1] in the current CF for that. I'd say a good course of action would be to review that then write a patch with a new bool flag in relation_needs_vacanalyze for "freezeonly" and have auto-vacuum invoke vacuum in this new freeze only mode if freezeonly is set and dovacuum is not.
Any patch not in the current CF is already PG13 or beyond. Having at least a freeze only vacuum mode may ease some pain, even if it still needs to be done manually for anyone finding themselves in a similar situation as Mailchimp.
The idea I was mentioning was more targeted to ease the sudden rush of auto-vacuum activity when suddenly a bunch of large tables require an anti-wraparound vacuum all at once.
[1] https://commitfest.postgresql.org/22/1817/
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 28, 2019 at 6:32 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
> >
> > On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
> >> I thought recently that it would be good to have some sort of pro-active auto-vacuum mode that made use of idle workers.
> >
> > Problem with "idle" is that it never happens on systems that are going to wraparound on their lifetime. This has to be a part of normal database functioning.
>
> I'd say auto-vacuum is configured to run too slowly if you never have an idle worker. The chances that it happens to be running at exactly the right speed to keep up with demand must be about close to nil.
>
> > Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
>
> Sounds like a good idea, although I do agree with Alvaro when he mentions that it would be good to only invoke a worker that was only going to freeze tuples and not look at the indexes.
Invoking autovacuum on a table based on inserts, not only deletes and updates, seems like a good idea to me. But in this case, I think that we can not only freeze tuples but also update the visibility map when setting all-visible. Roughly speaking I think vacuum does the following operations.
1. heap vacuum
2. HOT pruning
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
5. index vacuum/cleanup
6. truncation
With the proposed patch[1] we can control whether to do 5 or not. In addition to that, another proposed patch[2] allows us to control 6. For append-only tables (and similar tables), what we periodically want to do would be 3 and 4 (possibly we can do 2 as well). So maybe we need to have both an option of (auto)vacuum to control whether to do 1 and something like a new autovacuum threshold (or an option) to invoke the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3 and 4 would be much cheaper than today's vacuum and the anti-wraparound vacuum would be able to skip almost all pages.
[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Mar 28, 2019 at 12:32 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:
>
> On Thu, Mar 28, 2019 at 2:36 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I thought recently that it would be good to have some sort of
>> pro-active auto-vacuum mode that made use of idle workers.
>
> Problem with "idle" is that it never happens on system that are going to wraparound on their lifetime. This has to be a part of normal database functioning.
I'd say auto-vacuum is configured to run too slowly if you never have
an idle worker. The chances that it happens to be running at exactly
the right speed to keep up with demand must be about close to nil.
> Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
Sounds like a good idea, although I do agree with Alvaro when he
mentions that it would be good to only invoke a worker that was only
going to freeze tuples and not look at the indexes.
This is already the current behavior of VACUUM on tables without dead tuples. The issue is that nothing triggers this VACUUM apart from the user performing VACUUM manually, or the super-late anti-wraparound vacuum.
Any patch not in the current CF is already PG13 or beyond. Having at
least a freeze only vacuum mode may ease some pain, even if it still
needs to be done manually for anyone finding themselves in a similar
situation as mailchimp.
If you're in a wraparound halt with a super-large table on Amazon gp2, nothing will help you - the issue is that there's no option to "rewrite all of it quickly". The burst limit lets the shared drive feel like an SSD for most of your load, but reading and re-writing all of the storage gets throttled, and there's no option to escape this quickly.
The process that freezes and marks pages all-visible has to run in parallel and at the speed of your backend pushing pages to disk, maybe lagging behind a bit - but not to the point of "we need to rescan the whole table".
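Right now the only way to even see whether you're keeping pace is to poll the catalogs, something like this (a sketch; table name made up):

    -- how close this table is to the forced anti-wraparound run, and how much
    -- of it the visibility map already covers
    SELECT c.relname,
           age(c.relfrozenxid) AS xid_age,
           current_setting('autovacuum_freeze_max_age')::bigint - age(c.relfrozenxid) AS xids_left,
           pg_size_pretty(pg_table_size(c.oid)) AS table_size,
           round(100.0 * c.relallvisible / greatest(c.relpages, 1), 1) AS pct_all_visible
    FROM pg_class c
    WHERE c.relname = 'track_points';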
The idea I was mentioning was more targeted to ease the sudden rush of
auto-vacuum activity when suddenly a bunch of large tables require an
anti-wraparound vacuum all at once.
[1] https://commitfest.postgresql.org/22/1817/
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Hi,
> > Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
>
> Sounds like a good idea, although I do agree with Alvaro when he
> mentions that it would be good to only invoke a worker that was only
> going to freeze tuples and not look at the indexes.
The invoking autovacuum on table based on inserts, not only deletes
and updates, seems good idea to me. But in this case, I think that we
can not only freeze tuples but also update visibility map even when
setting all-visible. Roughly speaking I think vacuum does the
following operations.
1. heap vacuum
2. HOT pruning
Is it worth skipping it if we're writing the page anyway for the sake of hint bits and new xids? It will all be a no-op on append-only tables anyway and will happen only when we actually need it.
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
These two are needed, and the current autovacuum launch process does not take into account that they are also needed for non-dead tuples.
5. index vacuum/cleanup
There is a separate patch for that. But since https://commitfest.postgresql.org/16/952/, for almost a year already, Postgres skips index cleanup on tables without new dead tuples, so this case is taken care of already?
6. truncation
This shouldn't be a heavy operation?
With the proposed patch[1] we can control to do 5 or not. In addition
to that, another proposed patch[2] allows us to control 6.
For append-only tables (and similar tables), what we periodically want
to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
need to have both an option of (auto)vacuum to control whether to do 1
and something like a new autovacuum threshold (or an option) to invoke
the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3
and 4 would be much cheaper than today's vacuum and anti-wraparound
vacuum would be able to skip almost pages.
Why would we want to get rid of 1? It's a no-op from the write perspective, and doing it in the same pass saves a scan if it turns out not to be a no-op.
Why make it faster in emergency situations, when the situation can be made non-emergency from the very beginning instead?
[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Thu, Mar 28, 2019 at 8:58 PM Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
>
> Hi,
>
> > > Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
> >
> > Sounds like a good idea, although I do agree with Alvaro when he mentions that it would be good to only invoke a worker that was only going to freeze tuples and not look at the indexes.
> >
> > The invoking autovacuum on table based on inserts, not only deletes and updates, seems good idea to me. But in this case, I think that we can not only freeze tuples but also update visibility map even when setting all-visible. Roughly speaking I think vacuum does the following operations.
> >
> > 1. heap vacuum
> > 2. HOT pruning
>
> Is it worth skipping it if we're writing a page anyway for the sake of hint bits and new xids? This will all be no-op anyway on append-only tables and happen only when we actually need something?
Yeah, these operations are required only when the table has actual garbage. IOW, append-only tables never require them.
> > 3. freezing tuples
> > 4. updating visibility map (all-visible and all-frozen)
>
> These two are needed, and current autovacuum launch process does not take into account that this is also needed for non-dead tuples.
>
> > 5. index vacuum/cleanup
>
> There is a separate patch for that. But, since https://commitfest.postgresql.org/16/952/ for almost a year already Postgres skips index cleanup on tables without new dead tuples, so this case is taken care of already?
I think that's not enough. The feature "GUC for cleanup index threshold" allows us to skip only index cleanup when there are less insertion than the fraction of the total number of heap tuples since last index cleanup. Therefore it helps only append-only tables (and supporting only btree index for now). We still have to do index vacuuming even if the table has just a few dead tuples. The proposed patch[1] helps this situation; vacuum can run while skipping index vacuuming and index cleanup.
> > 6. truncation
>
> This shouldn't be a heavy operation?
I don't think so. This could take AccessExclusiveLock on the table and take a long time with a large shared buffer, as reported on that thread[2].
> > With the proposed patch[1] we can control to do 5 or not. In addition to that, another proposed patch[2] allows us to control 6.
> >
> > For append-only tables (and similar tables), what we periodically want to do would be 3 and 4 (possibly we can do 2 as well). So maybe we need to have both an option of (auto)vacuum to control whether to do 1 and something like a new autovacuum threshold (or an option) to invoke the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3 and 4 would be much cheaper than today's vacuum and anti-wraparound vacuum would be able to skip almost pages.
>
> Why will we want to get rid of 1? It's a noop from write perspective and saves a scan to do it if it's not noop.
Because that's for tables that have many inserts but have some updates/deletes. I think that this strategy would help not only append-only tables but also such tables.
> Why make it faster in emergency situations when situation can be made non-emergency from the very beginning instead?
I don't understand the meaning of "situation can be made non-emergency from the very beginning". Could you please elaborate on that?
[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,
On Thu, Mar 28, 2019 at 12:36:24PM +1300, David Rowley wrote:
> On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > I wonder if Mandrill's problem is related to Mailchimp raising the freeze_max_age to a point where autovac did not have enough time to react with an emergency vacuum. If you keep raising that value because the vacuums cause problems for you (they block DDL), there's something wrong.
>
> I have seen some very high autovacuum_freeze_max_age settings recently. It would be interesting to know what they had theirs set to.
> I see they mentioned "Search and Url tables". I can imagine "search" never needs any UPDATEs, so quite possibly those were append-only, in which case the anti-wraparound vacuum would have had quite a lot of work on its hands since possibly every page needed frozen. A table receiving regular auto-vacuums from dead tuples would likely get some pages frozen during those.
By the way, the Routine Vacuuming chapter of the documentation says:
"The sole disadvantage of increasing autovacuum_freeze_max_age (and vacuum_freeze_table_age along with it) is that the pg_xact and pg_commit_ts subdirectories of the database cluster will take more space [...] If [pg_xact and pg_commit_ts taking 0.5 and 20 GB, respectively] is trivial compared to your total database size, setting autovacuum_freeze_max_age to its maximum allowed value is recommended."
Maybe this should be qualified with "unless you have trouble with your autovacuum keeping up" or so; or generally reworded?
Michael
On Wed, Mar 27, 2019 at 5:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> * certain tables would have some sort of partial scan that sets the visibility map. There's no reason to invoke the whole vacuuming machinery. I don't think this is limited to append-only tables, but rather those are just the ones that are affected the most.
I think this is a really good idea, but in order for it to work well I think we would need to have some kind of estimate of autovacuum pressure.
If we know that we're currently fairly on top of things, and there is not much for autovacuum to do, periodically vacuuming a chunk of some table that has a lot of unset visibility-map bits is probably a good idea. However, we can't possibly guess how aggressively to do this if we have no idea how long it's going to be before we need to vacuum that table for real. If the number of XIDs remaining until the table gets a wraparound vacuum is X, and the number of XIDs being consumed per day is Y, we can estimate that in roughly X/Y days, we're going to need to do a wraparound vacuum. That value might be in the range of months, or in the range of hours.
If it's months, we probably want to limit vacuum to working at a pretty slow rate, say 1% of the table size per hour or something. If it's in hours, we need to be a lot more aggressive. Right now we have no information to tell us which of those things is the case, so we'd just be shooting in the dark.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Mar 31, 2019 at 1:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 27, 2019 at 5:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > * certain tables would have some sort of partial scan that sets the visibility map. There's no reason to invoke the whole vacuuming machinery. I don't think this is limited to append-only tables, but rather those are just the ones that are affected the most.
>
> I think this is a really good idea, but in order for it to work well I think we would need to have some kind of estimate of autovacuum pressure.
>
> If we know that we're currently fairly on top of things, and there is not much for autovacuum to do, periodically vacuuming a chunk of some table that has a lot of unset visibility-map bits is probably a good idea. However, we can't possibly guess how aggressively to do this if we have no idea how long it's going to be before we need to vacuum that table for real. If the number of XIDs remaining until the table gets a wraparound vacuum is X, and the number of XIDs being consumed per day is Y, we can estimate that in roughly X/Y days, we're going to need to do a wraparound vacuum. That value might be in the range of months, or in the range of hours.
>
> If it's months, we probably want limit vacuum to working at a pretty slow rate, say 1% of the table size per hour or something. If it's in hours, we need to be a lot more aggressive. Right now we have no information to tell us which of those things is the case, so we'd just be shooting in the dark.
Sawada-san presented some ideas in his PGCon 2018 talk that may be related.
https://www.pgcon.org/2018/schedule/attachments/488_Vacuum_More_Efficient_Than_Ever
(slide 32~)
Thanks,
Amit
On 27/03/2019 21:54, Darafei "Komяpa" Praliaskouski wrote:
> Hi hackers,
>
> Attached is sketch of small patch that fixes several edge cases with autovacuum. Long story short autovacuum never comes to append only tables, killing large productions.
>
> First case, mine.
> https://www.postgresql.org/message-id/CAC8Q8tLBeAxR%2BBXWuKK%2BHP5m8tEVYn270CVrDvKXt%3D0PkJTY9g%40mail.gmail.com
>
> We had a table we were appending and wanted Index Only Scan to work. For it to work, you need to call VACUUM manually, since VACUUM is the only way to mark pages all visible, and autovacuum never comes to append only tables. We were clever to invent a workflow without dead tuples and it painfully bit us.
>
> Second case, just read in the news.
> https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/
>
> Mandrill has 6TB append only table that autovacuum probably never vacuumed. Then anti-wraparound came and production went down. If autovacuum did its job before that last moment, it would probably be okay.
>
> Idea: look not on dead tuples, but on changes, just like ANALYZE does.
> It's my first patch on Postgres, it's probably all wrong but I hope it helps you get the idea.
This was suggested and rejected years ago:
https://www.postgresql.org/message-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb@2ndquadrant.fr
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Mar 28, 2019 at 6:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> 1. heap vacuum
>>
>> 2. HOT pruning
>
> Is it worth skipping it if we're writing a page anyway for the sake of hint bits and new xids? This will all be no-op anyway on append-only tables and happen only when we actually need something?
>
Yeah, these operations are required only when the table has actual
garbage. IOW, append-only tables never require them.
>>
>> 3. freezing tuples
>> 4. updating visibility map (all-visible and all-frozen)
>
> These two are needed, and current autovacuum launch process does not take into account that this is also needed for non-dead tuples.
>
>>
>> 5. index vacuum/cleanup
>
> There is a separate patch for that. But, since https://commitfest.postgresql.org/16/952/ for almost a year already Postgres skips index cleanup on tables without new dead tuples, so this case is taken care of already?
I think that's not enough. The feature "GUC for cleanup index
threshold" allows us to skip only index cleanup when there are less
insertion than the fraction of the total number of heap tuples since
last index cleanup. Therefore it helps only append-only tables (and
supporting only btree index for now). We still have to do index
vacuuming even if the table has just a few dead tuple. The proposed
patch[1] helps this situation; vacuum can run while skipping index
vacuuming and index cleanup.
So, the patch I posted can be technically applied after https://commitfest.postgresql.org/22/1817/ gets merged?
The change with my patch is that a table with 49 insertions and one delete:
- previously would wait for 49 more deletes by default (and ignore the insertions), and only then clean up both the table and the indexes;
- with the patch will freeze/update the VM for the insertions, and scan the index.
In my experience only a btree index requires a slow full index scan; that's why only it was covered in the "GUC for cleanup index threshold" patch. Is that wrong - do more index types do a full index scan on vacuum after deletion of a single tuple?
>> 6. truncation
>
> This shouldn't be a heavy operation?
>
I don't think so. This could take AccessExclusiveLock on the table and
take a long time with large shared buffer as per reported on that
thread[2].
While this can be a useful optimization, I believe it is out of scope for this patch. I want to fix vacuum never coming to append-only tables without breaking other behaviors. Truncation is likely a case where there are enough dead tuples to trigger a vacuum via the currently existing mechanisms.
>>
>>
>> With the proposed patch[1] we can control to do 5 or not. In addition
>> to that, another proposed patch[2] allows us to control 6.
>>
>> For append-only tables (and similar tables), what we periodically want
>> to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
>> need to have both an option of (auto)vacuum to control whether to do 1
>> and something like a new autovacuum threshold (or an option) to invoke
>> the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3
>> and 4 would be much cheaper than today's vacuum and anti-wraparound
>> vacuum would be able to skip almost pages.
>
>
> Why will we want to get rid of 1? It's a noop from write perspective and saves a scan to do it if it's not noop.
>
Because that's for tables that have many inserts but have some
updates/deletes. I think that this strategy would help not only
append-only tables but also such tables.
How much do we save by skipping a heap vacuum on an almost-append-only table, where the number of updates is below 50, the current threshold?
> Why make it faster in emergency situations when situation can be made non-emergency from the very beginning instead?
>
I don't understand the meaning of "situation can be made non-emergency
from the very beginning". Could you please elaborate on that?
Let's imagine a simple append-only workflow on Postgres with current default settings. You create a table and start inserting tuples, one per transaction. Let's imagine a page fits 50 tuples (my case for taxi movement data), and Amazon gp2 storage, which caps you at, say, 1000 IOPS in non-burst mode.
Anti-wraparound autovacuum (we need a drawing of this term being misread, with a crossed-out car bent in space) will be triggered after autovacuum_freeze_max_age inserts, 200000000 by default. That converts into 4000000 pages, or around 32 GB. It will be the first vacuum ever on that table, since no other mechanism triggers it, and if it steals all the available IOPS, it will finish in 200000000 / 50 / 1000 = 4000 seconds, killing prod for over an hour.
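(The same arithmetic in psql, so anyone can plug in their own page density and IOPS numbers - these are just the assumptions stated above:)

    SELECT 200000000 / 50        AS pages_to_scan,          -- 4,000,000 pages ~= 32 GB at 8 kB per page
           200000000 / 50 / 1000 AS seconds_at_1000_iops;   -- 4000 s, a bit over an hour of dead prod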
Telemetry workloads can easily generate 32 GB of data a day (I've seen more, but let's stick to that number). Production going down for an hour a day isn't good, and I consider it an emergency.
Now, the two ways to fix it that reading the documentation leads you to while you're sleepy and trying to get prod back:
- raise autovacuum_freeze_max_age so VACUUM keeps sleeping;
- rewrite code to use batching to insert more tuples at once.
We don't have a better recommendation mechanism for settings, and experience in tuning autovacuum in the right direction comes at the cost of a job or a company for some people :)
Both ways do not fix the problem but just delay the inevitable. The ratio of "one hour of vacuum per day of operation" holds; you just postpone it.
Let's say we had the same thing with 1000-record batched inserts, and moved autovacuum_freeze_max_age to the highest possible value. How long will the downtime last?
2**31 (max xid) * 1000 (tuples per transaction) / 50 (tuples per page) / 1000 (pages per second) / 86400 (seconds per day) = 49 days.
This matches the highest estimation in Mandrill's report, so that might be what happened to them.
None of this would be needed if autovacuum came after 50 inserted tuples. It would just mark the page all-visible and all-frozen and be gone, while the page is still in memory. That would get rid of the emergency altogether.
Is this an elaborate enough disaster scenario? :)
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
By the way, the Routine Vacuuming chapter of the documentation says:
"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more space
[...]
If [pg_xact and pg_commit_ts taking 0.5 and 20 GB, respectively]
is trivial compared to your total database size, setting
autovacuum_freeze_max_age to its maximum allowed value is recommended."
Maybe this should be qualified with "unless you have trouble with your
autovacuum keeping up" or so; or generally reworded?
This recommendation is in the mindset of "wraparound never happens".
If your database is large, you have more chances to hit it painfully, and if it's append-only even more so.
Alternative point of "if your database is super large and actively written, you may want to set autovacuum_freeze_max_age to even smaller values so that autovacuum load is more evenly spread over time" may be needed.
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
If it's months, we probably want to limit vacuum to working at a pretty
slow rate, say 1% of the table size per hour or something. If it's in
hours, we need to be a lot more aggressive. Right now we have no
information to tell us which of those things is the case, so we'd just
be shooting in the dark.
Thing is, you don't need to spread out your vacuum over time if the rate of vacuuming matches the rate of table growth. Can we mark tuples/pages as all-visible and all-frozen, say, the moment they're pushed out of shared_buffers?
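(For the X/Y estimate quoted above: X is already readable from the catalogs; Y needs two samples of txid_current() some time apart, divided by the elapsed time. A rough sketch of the X part only:)

    -- transactions left before each table is forced into a wraparound vacuum
    SELECT oid::regclass AS table_name,
           current_setting('autovacuum_freeze_max_age')::bigint - age(relfrozenxid) AS xids_until_forced_vacuum
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY xids_until_forced_vacuum
    LIMIT 10;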
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
> Idea: look not on dead tuples, but on changes, just like ANALYZE does.
> It's my first patch on Postgres, it's probably all wrong but I hope it
> helps you get the idea.
This was suggested and rejected years ago:
https://www.postgresql.org/message-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb@2ndquadrant.fr
Thank you for sharing the link. I've read through the thread and see you posted two patches, the first being similar to but different from mine, and the second being about a different matter.
I don't see "rejected" there, just the common distraction of "you should also consider this" and a time-out leading to "returned with feedback" at the end.
Thing is, we now have dead large productions and post-mortems because your patch wasn't pushed back in 2016, so the situation is different. Let's push at least the first of your two patches, or mine.
Which one is better and why?
I believe mine, as it just follows a pattern already established and proven in autoanalyze. If vacuum comes and is unable to harvest some dead tuples, it will come over again in your case, while in mine it will just sleep until it gets new dead tuples, which looks better to me - there's no dead loop in case some dead tuples are stuck forever.
If someone thinks yours is better we may also consider it for autoanalyze?
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
The invoking autovacuum on table based on inserts, not only deletes
and updates, seems good idea to me. But in this case, I think that we
can not only freeze tuples but also update visibility map even when
setting all-visible. Roughly speaking I think vacuum does the
following operations.
1. heap vacuum
2. HOT pruning
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
5. index vacuum/cleanup
6. truncation
With the proposed patch[1] we can control to do 5 or not. In addition
to that, another proposed patch[2] allows us to control 6.
[1] is committed, [2] nears commit. It seems we now have all the infrastructure to teach autovacuum to run itself based on inserts and not hurt anybody?
...
[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/
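In other words, the cheap flavour of vacuum this thread wants autovacuum to issue can already be spelled by hand, assuming [2] lands with the TRUNCATE option as proposed (table name made up):

    -- prune, freeze and set VM bits only; skip index vacuuming and the
    -- end-of-table truncation attempt
    VACUUM (FREEZE, INDEX_CLEANUP FALSE, TRUNCATE FALSE) track_points;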
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Sat, Apr 6, 2019 at 9:56 AM Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
The invoking autovacuum on table based on inserts, not only deletes
and updates, seems good idea to me. But in this case, I think that we
can not only freeze tuples but also update visibility map even when
setting all-visible. Roughly speaking I think vacuum does the
following operations.
1. heap vacuum
2. HOT pruning
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
5. index vacuum/cleanup
6. truncation
With the proposed patch[1] we can control to do 5 or not. In addition
to that, another proposed patch[2] allows us to control 6.
[1] is committed, [2] nears commit. Seems we have now all the infra to teach autovacuum to run itself based on inserts and not hurt anybody?
...
[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/
Reading the thread and the patch, I generally agree that:
1. With the current infrastructure having auto vacuum periodically scan append-only tables for freezing would be good, and
2. I can't think of any cases where this would be a bad thing.
Also I am not 100% convinced that the problems are avoidable by setting the wraparound prevention thresholds low enough. In cases where one is doing large bulk inserts all the time, vacuum freeze could have a lot of work to do, and in some cases I could imagine IO storms making that difficult.
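(In those cases the only mitigation I'm aware of today is per-table cost-based throttling of the freeze runs, along these lines - the table name is hypothetical and the numbers would need tuning per system:)

    ALTER TABLE bulk_ingest SET (
        autovacuum_vacuum_cost_delay = 10,    -- sleep 10 ms each time the cost budget is used up
        autovacuum_vacuum_cost_limit = 1000   -- cost budget accumulated between sleeps
    );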
I plan to run some benchmarks on this to try to assess the performance impact of this patch in standard pgbench scenarios. I will also try to come up with some other benchmarks for append-only workloads.
Best Regards,
Chris Travers
Head of Database
Saarbrücker Straße 37a, 10405 Berlin
On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
> Alternative point of "if your database is super large and actively written, you may want to set autovacuum_freeze_max_age to even smaller values so that autovacuum load is more evenly spread over time" may be needed.
I don't think it's helpful to force emergency vacuuming more frequently; quite the contrary, it's likely to cause even more issues. We should tweak autovacuum to perform freezing more preemptively instead.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On April 10, 2019 8:13:06 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
>
>> Alternative point of "if your database is super large and actively written, you may want to set autovacuum_freeze_max_age to even smaller values so that autovacuum load is more evenly spread over time" may be needed.
>
>I don't think it's helpful to force emergency vacuuming more frequently; quite the contrary, it's likely to cause even more issues. We should tweak autovacuum to perform freezing more preemptively instead.
I still think the fundamental issue with making vacuum less painful is that all indexes have to be read entirely. Even if there's not much work (say millions of rows frozen, hundreds removed). Without that issue we could vacuum much more frequently. And do it properly in insert-only workloads.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 10, 2019 at 5:21 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On April 10, 2019 8:13:06 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
>
>> Alternative point of "if your database is super large and actively
>written,
>> you may want to set autovacuum_freeze_max_age to even smaller values
>so
>> that autovacuum load is more evenly spread over time" may be needed.
>
>I don't think it's helpful to force emergency vacuuming more
>frequently;
>quite the contrary, it's likely to cause even more issues. We should
>tweak autovacuum to perform freezing more preemptively instead.
I still think the fundamental issue with making vacuum less painful is that the all indexes have to be read entirely. Even if there's not much work (say millions of rows frozen, hundreds removed). Without that issue we could vacuum much more frequently. And do it properly in insert only workloads.
So I see a couple of issues here, and I'm wondering what the best approach is.
The first is to just skip lazy_cleanup_index if no rows were removed. Is this the approach you have in mind? Or is that insufficient?
The second approach would be to replace the whole idea of this patch with a lazy freeze worker which would basically periodically do a vacuum freeze on relations matching certain criteria. This could have a lower max workers setting than autovacuum and therefore be less of a threat in terms of total IO usage.
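The selection criteria could be as simple as ordering by wraparound proximity and skipping tables the visibility map already covers - a sketch of the selection only, not of the worker itself:

    SELECT c.oid::regclass              AS candidate,
           age(c.relfrozenxid)          AS xid_age,
           c.relpages - c.relallvisible AS pages_not_all_visible
    FROM pg_class c
    WHERE c.relkind = 'r'
      AND c.relpages > c.relallvisible   -- something left for a freeze pass to do
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 5;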
Thoughts?
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Best Regards,
Chris Travers
Head of Database
Saarbrücker Straße 37a, 10405 Berlin
On Thu, Apr 11, 2019 at 11:25:29AM +0200, Chris Travers wrote:
> On Wed, Apr 10, 2019 at 5:21 PM Andres Freund <andres@anarazel.de> wrote:
> > On April 10, 2019 8:13:06 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
> > > > Alternative point of "if your database is super large and actively written, you may want to set autovacuum_freeze_max_age to even smaller values so that autovacuum load is more evenly spread over time" may be needed.
> > >
> > > I don't think it's helpful to force emergency vacuuming more frequently; quite the contrary, it's likely to cause even more issues. We should tweak autovacuum to perform freezing more preemptively instead.
> >
> > I still think the fundamental issue with making vacuum less painful is that all indexes have to be read entirely. Even if there's not much work (say millions of rows frozen, hundreds removed). Without that issue we could vacuum much more frequently. And do it properly in insert only workloads.
>
> So I see a couple of issues here and wondering what the best approach is.
> The first is to just skip lazy_cleanup_index if no rows were removed. Is this the approach you have in mind? Or is that insufficient?
I don't think that's what Andres had in mind, as he explicitly mentioned removed rows. So just skipping lazy_cleanup_index when there were no deleted rows would not help in that case.
What I think we could do is simply leave the tuple pointers in the table (and indexes) when there are only very few of them, and only do the expensive table/index cleanup once there's enough of them.
> The second approach would be to replace the whole idea of this patch with a lazy freeze worker which would basically periodically do a vacuum freeze on relations matching certain criteria. This could have a lower max workers than autovacuum and therefore less of a threat in terms of total IO usage.
> Thoughts?
Not sure. I find it rather difficult to manage more and more different types of cleanup workers.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 10, 2019 at 6:13 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
> Alternative point of "if your database is super large and actively written,
> you may want to set autovacuum_freeze_max_age to even smaller values so
> that autovacuum load is more evenly spread over time" may be needed.
I don't think it's helpful to force emergency vacuuming more frequently;
quite the contrary, it's likely to cause even more issues. We should
tweak autovacuum to perform freezing more preemptively instead.
Okay. What would be your recommendation for the case of Mandrill running current Postgres 11? Which parameters shall they tune and to which values?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
>I don't think it's helpful to force emergency vacuuming more
>frequently;
>quite the contrary, it's likely to cause even more issues. We should
>tweak autovacuum to perform freezing more preemptively instead.
I still think the fundamental issue with making vacuum less painful is that the all indexes have to be read entirely. Even if there's not much work (say millions of rows frozen, hundreds removed). Without that issue we could vacuum much more frequently. And do it properly in insert only workloads.
Deletion of hundreds of rows on default settings will cause the same behavior now.
If there were 0 updates, the index cleanup is currently skipped already.
https://commitfest.postgresql.org/22/1817/ got merged. This means autovacuum can have two separate thresholds: the current one, on dead tuples, triggering VACUUM the same way it triggers it now, and a new one, on inserted tuples only, triggering VACUUM (INDEX_CLEANUP FALSE)?
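As a sketch of that decision, using the counters exposed in pg_stat_user_tables (note they count since the last stats reset rather than since the last vacuum, and the insert-side numbers are invented for illustration, not existing GUCs):

    SELECT s.relname,
           s.n_dead_tup > 50 + 0.2 * c.reltuples    AS wants_regular_vacuum,      -- today's default thresholds
           s.n_tup_ins  > 1000 + 0.1 * c.reltuples  AS wants_insert_only_vacuum   -- hypothetical new thresholds
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid;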
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Sun, Apr 14, 2019 at 4:51 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Thu, Apr 11, 2019 at 11:25:29AM +0200, Chris Travers wrote:
> > On Wed, Apr 10, 2019 at 5:21 PM Andres Freund <andres@anarazel.de> wrote:
> > > On April 10, 2019 8:13:06 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > > On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
> > > > > Alternative point of "if your database is super large and actively written, you may want to set autovacuum_freeze_max_age to even smaller values so that autovacuum load is more evenly spread over time" may be needed.
> > > >
> > > > I don't think it's helpful to force emergency vacuuming more frequently; quite the contrary, it's likely to cause even more issues. We should tweak autovacuum to perform freezing more preemptively instead.
> > >
> > > I still think the fundamental issue with making vacuum less painful is that all indexes have to be read entirely. Even if there's not much work (say millions of rows frozen, hundreds removed). Without that issue we could vacuum much more frequently. And do it properly in insert only workloads.
> >
> > So I see a couple of issues here and wondering what the best approach is.
> > The first is to just skip lazy_cleanup_index if no rows were removed. Is this the approach you have in mind? Or is that insufficient?
>
> I don't think that's what Andres had in mind, as he explicitly mentioned removed rows. So just skipping lazy_cleanup_index when there were no deleted rows would not help in that case.
>
> What I think we could do is simply leave the tuple pointers in the table (and indexes) when there are only very few of them, and only do the expensive table/index cleanup once there's enough of them.
Yeah, we now have an infrastructure that skips index vacuuming by leaving the tuple pointers. So we then can have a threshold for autovacuum to invoke index vacuuming. Or another idea is to delete index entries more actively by index lookup instead of scanning the whole index. It's proposed[1].
[1] I couldn't get the URL of the thread right now for some reason, but the thread subject is "[WIP] [B-Tree] Retail IndexTuple deletion".
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Apr 15, 2019 at 10:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sun, Apr 14, 2019 at 4:51 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Thu, Apr 11, 2019 at 11:25:29AM +0200, Chris Travers wrote:
> > > On Wed, Apr 10, 2019 at 5:21 PM Andres Freund <andres@anarazel.de> wrote:
> > > > On April 10, 2019 8:13:06 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > > > On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
> > > > > > Alternative point of "if your database is super large and actively written, you may want to set autovacuum_freeze_max_age to even smaller values so that autovacuum load is more evenly spread over time" may be needed.
> > > > >
> > > > > I don't think it's helpful to force emergency vacuuming more frequently; quite the contrary, it's likely to cause even more issues. We should tweak autovacuum to perform freezing more preemptively instead.
> > > >
> > > > I still think the fundamental issue with making vacuum less painful is that all indexes have to be read entirely. Even if there's not much work (say millions of rows frozen, hundreds removed). Without that issue we could vacuum much more frequently. And do it properly in insert only workloads.
> > >
> > > So I see a couple of issues here and wondering what the best approach is.
> > > The first is to just skip lazy_cleanup_index if no rows were removed. Is this the approach you have in mind? Or is that insufficient?
> >
> > I don't think that's what Andres had in mind, as he explicitly mentioned removed rows. So just skipping lazy_cleanup_index when there were no deleted rows would not help in that case.
> >
> > What I think we could do is simply leave the tuple pointers in the table (and indexes) when there are only very few of them, and only do the expensive table/index cleanup once there's enough of them.
>
> Yeah, we now have an infrastructure that skips index vacuuming by leaving the tuple pointers. So we then can have a threshold for autovacuum to invoke index vacuuming. Or another idea is to delete index entries more actively by index lookup instead of scanning the whole index. It's proposed[1].
>
> [1] I couldn't get the URL of the thread right now for some reason, but the thread subject is "[WIP] [B-Tree] Retail IndexTuple deletion".
Now I got https://www.postgresql.org/message-id/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Sun, Apr 14, 2019 at 9:59 PM Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
>
> > > I don't think it's helpful to force emergency vacuuming more frequently; quite the contrary, it's likely to cause even more issues. We should tweak autovacuum to perform freezing more preemptively instead.
> >
> > I still think the fundamental issue with making vacuum less painful is that all indexes have to be read entirely. Even if there's not much work (say millions of rows frozen, hundreds removed). Without that issue we could vacuum much more frequently. And do it properly in insert only workloads.
>
> Deletion of hundreds of rows on default settings will cause the same behavior now.
> If there was 0 updates currently the index cleanup will be skipped.
>
> https://commitfest.postgresql.org/22/1817/ got merged. This means Autovacuum can have two separate thresholds - the current, on dead tuples, triggering the VACUUM same way it triggers it now, and a new one, on inserted tuples only, triggering VACUUM (INDEX_CLEANUP FALSE)?
Agreed. To invoke autovacuum even on insert-only tables we would need to check the number of inserted tuples since the last vacuum. I think we can keep track of the number of inserted tuples since the last vacuum in the stats collector and add a threshold to invoke vacuum with INDEX_CLEANUP = false. If an autovacuum worker confirms that the number of inserted tuples exceeds the threshold, it invokes vacuum with INDEX_CLEANUP = false. However, if the number of dead tuples also exceeds the autovacuum thresholds (autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor), it should invoke vacuum with INDEX_CLEANUP = true. Therefore the new threshold makes sense only when it's lower than the autovacuum thresholds.
I guess we can have one new GUC parameter to control the scale factor. Since only relatively large tables will require this feature, we might not need a threshold based on the number of tuples.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Jul 23, 2019 at 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > To invoke autovacuum even on insert-only tables we would need check > the number of inserted tuples since last vacuum. I think we can keep > track of the number of inserted tuples since last vacuum to the stats > collector and add the threshold to invoke vacuum with INDEX_CLEANUP = > false. If an autovacuum worker confirms that the number of inserted > tuples exceeds the threshold it invokes vacuum with INDEX_CLEANUP = > false. However if the number of dead tuples also exceeds the > autovacuum thresholds (autovacuum_vacuum_threshold and > autovacuum_vacuum_scale_factor) it should invoke vacuum with > INDEX_CLEANUP = true. Therefore new threshold makes sense only when > it's lower than the autovacuum thresholds. > > I guess we can have one new GUC parameter to control scale factor. > Since only relatively large tables will require this feature we might > not need the threshold based the number of tuples. > Generally speaking, having more guc's for autovacuum and that too which are in some way dependent on existing guc's sounds bit scary, but OTOH whatever you wrote makes sense and can help the scenarios which this thread is trying to deal with. Have you given any thought to what Alvaro mentioned up-thread "certain tables would have some sort of partial scan that sets the visibility map. There's no reason to invoke the whole vacuuming machinery. I don't think this is limited to append-only tables, but rather those are just the ones that are affected the most."? This thread seems to be stalled for the reason that we don't have a clear consensus on what is the right solution for the problem being discussed. Alvaro, anyone has any thoughts on how we can move forward with this work? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Sep 10, 2019 at 8:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 23, 2019 at 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > To invoke autovacuum even on insert-only tables we would need check > > the number of inserted tuples since last vacuum. I think we can keep > > track of the number of inserted tuples since last vacuum to the stats > > collector and add the threshold to invoke vacuum with INDEX_CLEANUP = > > false. If an autovacuum worker confirms that the number of inserted > > tuples exceeds the threshold it invokes vacuum with INDEX_CLEANUP = > > false. However if the number of dead tuples also exceeds the > > autovacuum thresholds (autovacuum_vacuum_threshold and > > autovacuum_vacuum_scale_factor) it should invoke vacuum with > > INDEX_CLEANUP = true. Therefore new threshold makes sense only when > > it's lower than the autovacuum thresholds. > > > > I guess we can have one new GUC parameter to control scale factor. > > Since only relatively large tables will require this feature we might > > not need the threshold based the number of tuples. > > > > Generally speaking, having more guc's for autovacuum and that too > which are in some way dependent on existing guc's sounds bit scary, > but OTOH whatever you wrote makes sense and can help the scenarios > which this thread is trying to deal with. Have you given any thought > to what Alvaro mentioned up-thread "certain tables would have some > sort of partial scan that sets the visibility map. There's no reason > to invoke the whole vacuuming machinery. I don't think this is > limited to append-only tables, but > rather those are just the ones that are affected the most."? > Speaking of partial scan I've considered before that we could use WAL to find which pages have garbage much and not all-visible pages. We can vacuum only a particular part of table that is most effective of garbage collection instead of whole tables. I've shared some results of that at PGCon and it's still in PoC state. Also, to address the issue of updating VM of mostly-append-only tables I considered some possible solutions: 1. Using INDEX_CLEANUP = false and TRUNCATE = false vacuum does hot pruning, vacuuming table and updating VM. In addition to updating VM we need to do other two operations but since the mostly-insert-only tables would have less garbage the hot pruning and vacuuming table should be light workload. This is what I proposed on up-thread. 2. This may have already been discussed before but we could update VM when hot pruning during SELECT operation. Since this affects SELECT performance it should be enabled on only particular tables by user request. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Sep 13, 2019 at 8:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Sep 10, 2019 at 8:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Generally speaking, having more guc's for autovacuum and that too > > which are in some way dependent on existing guc's sounds bit scary, > > but OTOH whatever you wrote makes sense and can help the scenarios > > which this thread is trying to deal with. Have you given any thought > > to what Alvaro mentioned up-thread "certain tables would have some > > sort of partial scan that sets the visibility map. There's no reason > > to invoke the whole vacuuming machinery. I don't think this is > > limited to append-only tables, but > > rather those are just the ones that are affected the most."? > > > > Speaking of partial scan I've considered before that we could use WAL > to find which pages have garbage much and not all-visible pages. We > can vacuum only a particular part of table that is most effective of > garbage collection instead of whole tables. I've shared some results > of that at PGCon and it's still in PoC state. > > Also, to address the issue of updating VM of mostly-append-only tables > I considered some possible solutions: > > 1. Using INDEX_CLEANUP = false and TRUNCATE = false vacuum does hot > pruning, vacuuming table and updating VM. In addition to updating VM > we need to do other two operations but since the mostly-insert-only > tables would have less garbage the hot pruning and vacuuming table > should be light workload. This is what I proposed on up-thread. > Yes, this is an option, but it might be better if we can somehow avoid triggering the vacuum machinery. > 2. This may have already been discussed before but we could update > VM when hot pruning during SELECT operation. Since this affects SELECT > performance it should be enabled on only particular tables by user > request. > Yeah, doing anything additional in SELECT's can be tricky and think of a case where actually there is nothing to prune on-page, in that case also if we run the visibility checks and then mark the visibility map, then it can be a noticeable overhead. OTOH, I think this will be a one-time overhead because after the first scan the visibility map will be updated and future scans don't need to update visibility map unless someone has updated that page. I was wondering why not do this during write workloads. For example, when Insert operation finds that there is no space in the current page and it has to move to next page, it can check if the page (that doesn't have space to accommodate current tuple) can be marked all-visible. In this case, we would have already done the costly part of an operation which is to Read/Lock the buffer. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, This patch is currently in "needs review" state, but that seems quite wrong - there's been a lot of discussions about how we might improve behavior for append-only-tables, but IMO there's no clear consensus nor a patch that we might review. So I think this should be either "waiting on author" or maybe "rejected with feedback". Is there any chance of getting a reviewable patch in the current commitfest? If not, I propose to mark it as RWF. I still hope we can improve this somehow in time for PG13. The policy is not to allow new non-trivial patches in the last CF, but hopefully this might be considered an old patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 2020-01-07 at 19:05 +0100, Tomas Vondra wrote:
> This patch is currently in "needs review" state, but that seems quite
> wrong - there's been a lot of discussions about how we might improve
> behavior for append-only-tables, but IMO there's no clear consensus nor
> a patch that we might review.
>
> So I think this should be either "waiting on author" or maybe "rejected
> with feedback". Is there any chance of getting a reviewable patch in the
> current commitfest? If not, I propose to mark it as RWF.
>
> I still hope we can improve this somehow in time for PG13. The policy is
> not to allow new non-trivial patches in the last CF, but hopefully this
> might be considered an old patch.

I think that no conclusion was reached because there are *many* things
that could be improved, and *many* interesting and ambitious ideas were
vented.

But I think it would be good to have *something* that addresses the
immediate problem (INSERT-only tables are autovacuumed too late), as long
as that does not have negative side effects or block further improvements.

I am not entirely comfortable with the very simplistic approach of this
patch (using the same metric to trigger autoanalyze and autovacuum), but
what about this:

- a new table storage option autovacuum_vacuum_insert_threshold, perhaps
  a GUC of the same name, deactivated by default.

- if tabentry->tuples_inserted exceeds this threshold, but not one of the
  others, launch autovacuum with index_cleanup=off.

How would you feel about that?

Yours,
Laurenz Albe
On Mon, 2020-03-02 at 14:57 +0100, I wrote:
> But I think it would be good to have *something* that addresses the
> immediate problem (INSERT-only tables are autovacuumed too late), as long
> as that does not have negative side effects or block further improvements.
>
> I am not entirely comfortable with the very simplistic approach of this
> patch (using the same metric to trigger autoanalyze and autovacuum), but
> what about this:
>
> - a new table storage option autovacuum_vacuum_insert_threshold, perhaps
>   a GUC of the same name, deactivated by default.
>
> - if tabentry->tuples_inserted exceeds this threshold, but not one of the
>   others, launch autovacuum with index_cleanup=off.

As a more substantial base for discussion, here is a patch that:

- introduces a GUC and reloption "autovacuum_vacuum_insert_limit",
  default 10000000

- introduces a statistics counter "inserts_since_vacuum" per table
  that gets reset to 0 after vacuum

- causes autovacuum to run without cleaning up indexes if
  inserts_since_vacuum > autovacuum_vacuum_insert_limit
  and there is no other reason for an autovacuum

No doc patch is included yet, and perhaps the new counter should be
shown in "pg_stat_user_tables".

Yours,
Laurenz Albe
Attachment
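To make the proposal above concrete, here is a rough sketch of how the proposed knobs would be exercised, assuming they land under the names used in this version of the patch (autovacuum_vacuum_insert_limit and inserts_since_vacuum; both are renamed later in the thread, and "events" is a hypothetical table):

    -- Server-wide setting (name as proposed in this patch version, not final):
    ALTER SYSTEM SET autovacuum_vacuum_insert_limit = 10000000;
    SELECT pg_reload_conf();

    -- Per-table override via the matching reloption:
    ALTER TABLE events SET (autovacuum_vacuum_insert_limit = 1000000);

    -- If the proposed counter is exposed in pg_stat_user_tables:
    SELECT relname, inserts_since_vacuum
    FROM pg_stat_user_tables
    ORDER BY inserts_since_vacuum DESC;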
On Tue, 2020-03-03 at 16:28 +0100, Laurenz Albe wrote: > As a more substantial base for discussion, here is a patch that: > > - introduces a GUC and reloption "autovacuum_vacuum_insert_limit", > default 10000000 > > - introduces a statistics counter "inserts_since_vacuum" per table > that gets reset to 0 after vacuum > > - causes autovacuum to run without cleaning up indexes if > inserts_since_vacuum > autovacuum_vacuum_insert_limit > and there is no other reason for an autovacuum I just realized that the exercise is pointless unless that autovacuum also runs with FREEZE on. Updated patch attached. Yours, Laurenz Albe
Attachment
On Thu, 5 Mar 2020 at 04:15, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I just realized that the exercise is pointless unless that
> autovacuum also runs with FREEZE on.

I think we need to move forward with doing something to cope with
INSERT-only tables not being auto-vacuumed.

I think the patch you have is something along the lines of what I'd have
imagined we should do. However, there are a few things that I'd do a
different way.

1. I'd go for 2 new GUCs and reloptions.
autovacuum_vacuum_insert_threshold (you're currently calling this
autovacuum_vacuum_insert_limit; I don't see why the word "limit" is
relevant here). The other GUC I think should be named
autovacuum_vacuum_insert_scale_factor, and these should work exactly the
same way as autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor, but only be applied after we've checked
to ensure the table is not otherwise eligible to be vacuumed.

2. I believe you're right in setting the freeze_min_age to 0 rather than
freeze_min_age. My understanding of freeze_min_age is that we don't
freeze tuples too pro-actively because in many workloads a freshly
INSERTed tuple is more likely to receive an UPDATE than an old tuple is,
e.g. something like new orders having their status updated various times
until the order is complete, after which it's never updated again. In
the INSERT-only case, there seems to be not much reason not to just
freeze right away.

3. The name "insert_only" does not seem the best for the new boolean
variable that you're using in various places. That name seems to be too
closely related to our current intended use case. Maybe
skip_index_cleanup is more to the point.

4. Are you sure you mean "Maximum" here? Isn't it the minimum? At least
it will be once you add both options. Otherwise, I think Maximum is not
the correct word. Perhaps "The threshold"

+ {"autovacuum_vacuum_insert_limit", PGC_SIGHUP, AUTOVACUUM,
+ gettext_noop("Maximum number of tuple inserts prior to vacuum."),
+ NULL
+ },

5. I think the new field in this struct should be named
vacuum_insert_threshold

@@ -252,6 +252,7 @@ typedef struct AutoVacOpts
 {
 	bool		enabled;
 	int			vacuum_threshold;
+	int			vacuum_ins_limit;

6. Are you sure you meant to default this to 50?

index e58e4788a8..9d96d58ed2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -598,6 +598,8 @@
 #autovacuum_naptime = 1min		# time between autovacuum runs
 #autovacuum_vacuum_threshold = 50	# min number of row updates before
 					# vacuum
+#autovacuum_vacuum_insert_limit = 50	# max number of row inserts before
+					# vacuum

Seems excessive given there's no scale factor in the current patch.

7. I know you know.... missing docs... would be good to get those.

8. Should we care when setting the insert counter back to 0 if
auto-vacuum has skipped pages?

9. You should add a new column to the pg_stat_all_tables view to allow
visibility of the inserts since the last vacuum. The column should be
named n_ins_since_vacuum. This seems like the best combination of
n_mod_since_analyze and n_tup_ins.

10. I'm slightly worried about the case where we don't quite trigger a
normal vacuum but trigger a vacuum due to INSERTs, then skip cleaning up
the indexes but proceed to leave dead index entries, causing indexes to
become bloated. It does not seem impossible that given the right balance
of INSERTs and UPDATE/DELETEs this could happen every time, and the
indexes would just become larger and larger.
It's pretty easy to see this in action with:

create extension if not exists pgstattuple;
create table t0 (a int primary key);
alter table t0 set (autovacuum_enabled=off);
insert into t0 select generate_series(1,1000000);
delete from t0 where a&1=0;
vacuum (index_cleanup off) t0;
insert into t0 select generate_series(2,1000000,2);
select * from pgstattuple('t0'), pg_relation_size('t0') as t0_size;
select n_dead_tup from pg_stat_all_tables where relid = 't0'::regclass;
-- repeat this a few times and watch the indexes bloat

11. We probably do also need to debate if we want this on or off by
default. I'd have leaned towards enabling by default if I'd not
personally witnessed the fact that people rarely* increase auto-vacuum to
run faster than the standard cost settings. I've seen hundreds of servers
over the years with all workers busy for days on something they'll never
finish quickly enough. We increased those settings 10x in PG12, so there
will be fewer people around suffering from that now, but even after
having reduced the vacuum_cost_delay x10 over the PG11 settings, it's by
no means fast enough for everyone. I have mixed feelings about giving
auto-vacuum more work to do for those people, so perhaps the best option
is to keep this off by default so as not to affect the people who don't
tune auto-vacuum. They'll just suffer the pain all at once when they hit
max freeze age instead of more gradually with the additional load on the
workers. At least adding this feature gives the people who do tune
auto-vacuum some ability to handle read-only tables in some sane way.

An alternative way of doing it would be to set the threshold to some
number of million tuples and set the scale_factor to 0.2 so that it only
has an effect on larger tables, of which people generally only have a
smallish number.

* My opinion may be biased as the sample of people did arrive asking for
help
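As a reference point for the threshold/scale-factor discussion: the condition being asked for mirrors the existing dead-tuple rule, i.e. vacuum when the counter exceeds threshold + scale_factor * reltuples. A sketch of that check as a catalog query, under the assumption that the proposed n_ins_since_vacuum column and insert settings exist (the 10000000 / 0.0 values are the ones discussed in the thread, not shipped defaults):

    -- Tables whose inserts since the last vacuum exceed the proposed insert
    -- threshold; n_ins_since_vacuum and the constants are assumptions based
    -- on the proposal above.
    SELECT c.relname,
           s.n_ins_since_vacuum,
           10000000 + 0.0 * c.reltuples AS insert_vacuum_threshold
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid
    WHERE s.n_ins_since_vacuum > 10000000 + 0.0 * c.reltuples;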
Hi, Thanks Laurenz for taking action on this and writing a better patch than my initial. This will help avoid both Mandrill-like downtimes and get Index Only Scan just work on large telemetry databases like the one I was responsible for back when I was in Juno. On Thu, Mar 5, 2020 at 9:40 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 5 Mar 2020 at 04:15, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > I just realized that the exercise is pointless unless that > > autovacuum also runs with FREEZE on. > 8. Should we care when setting the insert counter back to 0 if > auto-vacuum has skipped pages? I believe it would be enough just to leave a comment about this in code. > 10. I'm slightly worried about the case where we don't quite trigger a > normal vacuum but trigger a vacuum due to INSERTs then skip cleaning > up the indexes but proceed to leave dead index entries causing indexes > to become bloated. It does not seem impossible that given the right > balance of INSERTs and UPDATE/DELETEs that this could happen every > time and the indexes would just become larger and larger. Can we not reset statistics about dead tuples upon index-skipping vacuum, since we didn't really take care of them? > 11. We probably do also need to debate if we want this on or off by > default. I'd have leaned towards enabling by default if I'd not > personally witnessed the fact that people rarely* increase auto-vacuum > to run faster than the standard cost settings. I've seen hundreds of > servers over the years with all workers busy for days on something > they'll never finish quickly enough. We increased those settings 10x > in PG12, so there will be fewer people around suffering from that now, > but even after having reduced the vacuum_cost_delay x10 over the PG11 > settings, it's by no means fast enough for everyone. I've mixed > feelings about giving auto-vacuum more work to do for those people, so > perhaps the best option is to keep this off by default so as not to > affect the people who don't tune auto-vacuum. They'll just suffer the > pain all at once when they hit max freeze age instead of more > gradually with the additional load on the workers. At least adding > this feature gives the people who do tune auto-vacuum some ability to > handle read-only tables in some sane way. That's exactly the situation we're trying to avoid with this patch. Suffering all at once takes large production deployments down for weeks, and that gets into news. In current cloud setups it's plain impossible to read the whole database at all, let alone rewrite, with IO budgets. I say we should enable this setting by default. If my calculations are correct, that will freeze the table once each new gigabyte of data is written, which is usually fitting into burst read thresholds. -- Darafei Praliaskouski Support me: http://patreon.com/komzpa
On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote: > I think we need to move forward with doing something to cope with > INSERT-only tables not being auto-vacuumed. > > I think the patch you have is something along the lines to what I'd > have imagined we should do. However, there are a few things that I'd > do a different way. Thanks for the review, and that's good news. > 1. I'd go for 2 new GUCs and reloptions. > autovacuum_vacuum_insert_threshold (you're currently calling this > autovacuum_vacuum_insert_limit. I don't see why the word "limit" is > relevant here). The other GUC I think should be named > autovacuum_vacuum_insert_scale_factor and these should work exactly > the same way as autovacuum_vacuum_threshold and > autovacuum_vacuum_scale_factor, but be applied in a similar way to the > vacuum settings, but only be applied after we've checked to ensure the > table is not otherwise eligible to be vacuumed. Yes, "threshold" is better than "limit" I have renamed the GUC and the reloption. I disagree about the scale_factor (and have not added it to the updated version of the patch). If we have a scale_factor, then the time between successive autovacuum runs would increase as the table gets bigger, which defeats the purpose of reducing the impact of each autovacuum run. Since autovacuum skips pages where it has nothing to do, we can expect that runs on a large table won't be much more expensive than runs on a smaller table, right? > 3. The name "insert_only" does not seem the best for the new boolean > variable that you're using in various places. That name seems to be > too closely related to our current intended use case. Maybe > skip_index_cleanup is more to the point. I originally called the variable "skip_indexes", but when I decided that such vacuum runs also aggressively freeze the table, I thought that the name was misleading and renamed it. I won't put up a fight about this, though. > 4. Are you sure you mean "Maximum" here? Isn't it the minimum? At > least it will be once you add both options. Otherwise, I think Maximum > is not the correct word. Perhaps "The threshold" > > + {"autovacuum_vacuum_insert_limit", PGC_SIGHUP, AUTOVACUUM, > + gettext_noop("Maximum number of tuple inserts prior to vacuum."), > + NULL > + }, I had actually been debating whether to use "maximum" or "minimum". I realize now that this strange uncertainty stems from the fact that there is (yet) only a single parameter to govern this. The updated patch desctibes the GUC as "Number of tuple inserts prior to vacuum." > 5. I think the new field in this struct should be named vacuum_insert_threshold > > @@ -252,6 +252,7 @@ typedef struct AutoVacOpts > { > bool enabled; > int vacuum_threshold; > + int vacuum_ins_limit; I agree as above, renamed. > 6. Are you sure you meant to default this to 50? > > index e58e4788a8..9d96d58ed2 100644 > --- a/src/backend/utils/misc/postgresql.conf.sample > +++ b/src/backend/utils/misc/postgresql.conf.sample > @@ -598,6 +598,8 @@ > #autovacuum_naptime = 1min # time between autovacuum runs > #autovacuum_vacuum_threshold = 50 # min number of row updates before > # vacuum > +#autovacuum_vacuum_insert_limit = 50 # max number of row inserts before > + # vacuum > > Seems excessive given there's no scale factor in the current patch. That was a mistake. I chose 10000000 as the actual default value, but forgot to put the same value into "postgresql.conf". > 7. I know you know.... missing docs... would be good to get those. The updated version of the patch has documentation. 
I just wanted to get a feeling if my patch would be killed cold before I went to the effort of writing documentation. > 8. Should we care when setting the insert counter back to 0 if > auto-vacuum has skipped pages? Since this is only an approximate value anyway, I decided not to care. I don't know if that is acceptable. > 9. You should add a new column to the pg_stat_all_tables view to allow > visibility of the insert since the last vacuum. The column should be > named n_ins_since_vacuum. This seems like the best combination of > n_mod_since_analyze and n_tup_ins. Done. > 10. I'm slightly worried about the case where we don't quite trigger a > normal vacuum but trigger a vacuum due to INSERTs then skip cleaning > up the indexes but proceed to leave dead index entries causing indexes > to become bloated. It does not seem impossible that given the right > balance of INSERTs and UPDATE/DELETEs that this could happen every > time and the indexes would just become larger and larger. I understand. This might particularly be a problem with larger tables, where a normal autovacuum is rare because of the scale_factor. Perhaps we can take care of the problem by *not* skipping index cleanup if "changes_since_analyze" is substantially greater than 0. What do you think? > 11. We probably do also need to debate if we want this on or off by > default. I'd have leaned towards enabling by default if I'd not > personally witnessed the fact that people rarely* increase auto-vacuum > to run faster than the standard cost settings. I've seen hundreds of > servers over the years with all workers busy for days on something > they'll never finish quickly enough. We increased those settings 10x > in PG12, so there will be fewer people around suffering from that now, > but even after having reduced the vacuum_cost_delay x10 over the PG11 > settings, it's by no means fast enough for everyone. I've mixed > feelings about giving auto-vacuum more work to do for those people, so > perhaps the best option is to keep this off by default so as not to > affect the people who don't tune auto-vacuum. They'll just suffer the > pain all at once when they hit max freeze age instead of more > gradually with the additional load on the workers. At least adding > this feature gives the people who do tune auto-vacuum some ability to > handle read-only tables in some sane way. > > An alternative way of doing it would be to set the threshold to some > number of million tuples and set the scale_factor to 0.2 so that it > only has an effect on larger tables, of which generally people only > have a smallish number of. Yes, I think that disabling this by default defeats the purpose. Knowledgeable people can avoid the problem today by manually scheduling VACUUM runs on insert-only tables, and the functionality proposed here is specifically to improve the lives of people who don't know enough to tune autovacuum. My original idea was to set the threshold to 10 million and have no scale factor. Updated patch attached. Yours, Laurenz Albe
Attachment
On Thu, Mar 05, 2020 at 03:27:31PM +0100, Laurenz Albe wrote:
> On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote:
> > 1. I'd go for 2 new GUCs and reloptions.
> > autovacuum_vacuum_insert_scale_factor and these should work exactly
>
> I disagree about the scale_factor (and have not added it to the
> updated version of the patch). If we have a scale_factor, then the
> time between successive autovacuum runs would increase as the table
> gets bigger, which defeats the purpose of reducing the impact of each
> autovacuum run.

I would vote to include a scale factor. You're right that a nonzero scale
factor would cause vacuum to run with geometrically decreasing frequency.
The same thing currently happens with autoanalyze as a table grows in
size. I found that our monthly-partitioned tables were being analyzed too
infrequently towards the end of the month. (At the beginning of the
month, 10% is 2.4 hours worth of timeseries data, but at the end of the
month 10% is 3 days, which was an issue since queries against the
previous day could get rowcount estimates near zero.) If someone wanted
to avoid that, they'd set scale_factor=0.

I think this patch should parallel what's already in place, and we can
add documentation for the behavior if need be. Possibly scale_factor
should default to zero, which I think might make sense since insert-only
tables seem to be the main target of this patch.

> +++ b/doc/src/sgml/maintenance.sgml
> +   <para>
> +    Tables that have received more than
> +    <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>
> +    inserts since they were last vacuumed and are not eligible for vacuuming
> +    based on the above criteria will be vacuumed to reduce the impact of a future
> +    anti-wraparound vacuum run.
> +    Such a vacuum will aggressively freeze tuples, and it will not clean up dead
> +    index tuples.

"BUT will not clean .."

> +++ b/src/backend/postmaster/autovacuum.c
> +	/*
> +	 * If the number of inserted tuples exceeds the limit

I would say "exceeds the threshold"

Thanks for working on this; we would use this feature on our insert-only
tables.

--
Justin
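The per-table workaround Justin alludes to for the analyze case already exists today. A minimal sketch using current reloptions, with "measurements_2020_03" standing in for a hypothetical monthly partition:

    -- Drop the analyze scale factor to zero and rely on a fixed threshold,
    -- so late-month analyzes on a growing partition don't become too infrequent.
    ALTER TABLE measurements_2020_03
        SET (autovacuum_analyze_scale_factor = 0,
             autovacuum_analyze_threshold = 100000);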
On Fri, 6 Mar 2020 at 03:27, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote: > > 1. I'd go for 2 new GUCs and reloptions. > > autovacuum_vacuum_insert_threshold (you're currently calling this > > autovacuum_vacuum_insert_limit. I don't see why the word "limit" is > > relevant here). The other GUC I think should be named > > autovacuum_vacuum_insert_scale_factor and these should work exactly > > the same way as autovacuum_vacuum_threshold and > > autovacuum_vacuum_scale_factor, but be applied in a similar way to the > > vacuum settings, but only be applied after we've checked to ensure the > > table is not otherwise eligible to be vacuumed. > > I disagree about the scale_factor (and have not added it to the > updated version of the patch). If we have a scale_factor, then the > time between successive autovacuum runs would increase as the table > gets bigger, which defeats the purpose of reducing the impact of each > autovacuum run. My view here is not really to debate what logically makes the most sense. I don't really think for a minute that the current auto-vacuums scale_factor and thresholds are perfect for the job. It's true that the larger a table becomes, the less often it'll be vacuumed, but these are control knobs that people have become accustomed to and I don't really think that making an exception for this is warranted. Perhaps we can zero out the scale factor by default and set the threshold into the millions of tuples. We can have people chime in on what they think about that and why once the code is written and even perhaps committed. Lack of a scale_factor does leave people who regularly truncate their "append-only" tables out in the cold a bit. Perhaps they'd like index-only scans to kick in soon after they truncate without having to wait for 10 million tuples, or so. > > 10. I'm slightly worried about the case where we don't quite trigger a > > normal vacuum but trigger a vacuum due to INSERTs then skip cleaning > > up the indexes but proceed to leave dead index entries causing indexes > > to become bloated. It does not seem impossible that given the right > > balance of INSERTs and UPDATE/DELETEs that this could happen every > > time and the indexes would just become larger and larger. > > I understand. > > This might particularly be a problem with larger tables, where > a normal autovacuum is rare because of the scale_factor. > > Perhaps we can take care of the problem by *not* skipping index > cleanup if "changes_since_analyze" is substantially greater than 0. > > What do you think? Well, there is code that skips the index scans when there are 0 dead tuples found in the heap. If the table is truly INSERT-only then it won't do any harm since we'll skip the index scan anyway. I think it's less risky to clean the indexes. If we skip that then there will be a group of people will suffer from index bloat due to this, no matter if they realise it or not. > > 11. We probably do also need to debate if we want this on or off by > > default. I'd have leaned towards enabling by default if I'd not > > personally witnessed the fact that people rarely* increase auto-vacuum > > to run faster than the standard cost settings. I've seen hundreds of > > servers over the years with all workers busy for days on something > > they'll never finish quickly enough. 
We increased those settings 10x > > in PG12, so there will be fewer people around suffering from that now, > > but even after having reduced the vacuum_cost_delay x10 over the PG11 > > settings, it's by no means fast enough for everyone. I've mixed > > feelings about giving auto-vacuum more work to do for those people, so > > perhaps the best option is to keep this off by default so as not to > > affect the people who don't tune auto-vacuum. They'll just suffer the > > pain all at once when they hit max freeze age instead of more > > gradually with the additional load on the workers. At least adding > > this feature gives the people who do tune auto-vacuum some ability to > > handle read-only tables in some sane way. > > > > An alternative way of doing it would be to set the threshold to some > > number of million tuples and set the scale_factor to 0.2 so that it > > only has an effect on larger tables, of which generally people only > > have a smallish number of. > > Yes, I think that disabling this by default defeats the purpose. Perhaps the solution to that is somewhere else then. I can picture some sort of load average counters for auto-vacuum and spamming the logs with WARNINGs if we maintain high enough load for long enough, but we'd likely be better just completely overhauling the vacuum cost settings to be a percentage of total effort rather than some fixed speed. That would allow more powerful servers to run vacuum more quickly and it would also run more quickly during low load periods. We'd just need to sample now and again how long vacuuming a series of page takes then sleep for a time based on how long that took. That's not for this patch though. > Updated patch attached. Thanks. I've not looked yet as I really think we need a scale_factor for this. I'm interested to hear what others think. So far both Justin and I think it's a good idea.
Thanks, Justin, for the review. I have applied the changes where still applicable. On Fri, 2020-03-06 at 10:52 +1300, David Rowley wrote: > On Fri, 6 Mar 2020 at 03:27, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote: > > > 1. I'd go for 2 new GUCs and reloptions. > > > autovacuum_vacuum_insert_threshold (you're currently calling this > > > autovacuum_vacuum_insert_limit. I don't see why the word "limit" is > > > relevant here). The other GUC I think should be named > > > autovacuum_vacuum_insert_scale_factor and these should work exactly > > > the same way as autovacuum_vacuum_threshold and > > > autovacuum_vacuum_scale_factor, but be applied in a similar way to the > > > vacuum settings, but only be applied after we've checked to ensure the > > > table is not otherwise eligible to be vacuumed. > > > > I disagree about the scale_factor (and have not added it to the > > updated version of the patch). If we have a scale_factor, then the > > time between successive autovacuum runs would increase as the table > > gets bigger, which defeats the purpose of reducing the impact of each > > autovacuum run. > > My view here is not really to debate what logically makes the most > sense. I don't really think for a minute that the current > auto-vacuums scale_factor and thresholds are perfect for the job. It's > true that the larger a table becomes, the less often it'll be > vacuumed, but these are control knobs that people have become > accustomed to and I don't really think that making an exception for > this is warranted. Perhaps we can zero out the scale factor by > default and set the threshold into the millions of tuples. We can have > people chime in on what they think about that and why once the code is > written and even perhaps committed. Ok, I submit. My main desire was to keep the number of new GUCs as low as reasonably possible, but making the feature tunable along the known and "trusted" lines may be a good thing. The new parameter is called "autovacuum_vacuum_insert_scale_factor". > Lack of a scale_factor does leave people who regularly truncate their > "append-only" tables out in the cold a bit. Perhaps they'd like > index-only scans to kick in soon after they truncate without having to > wait for 10 million tuples, or so. That point I don't see. Truncating a table resets the counters to 0. > > > 10. I'm slightly worried about the case where we don't quite trigger a > > > normal vacuum but trigger a vacuum due to INSERTs then skip cleaning > > > up the indexes but proceed to leave dead index entries causing indexes > > > to become bloated. It does not seem impossible that given the right > > > balance of INSERTs and UPDATE/DELETEs that this could happen every > > > time and the indexes would just become larger and larger. > > > > Perhaps we can take care of the problem by *not* skipping index > > cleanup if "changes_since_analyze" is substantially greater than 0. > > > > What do you think? > > Well, there is code that skips the index scans when there are 0 dead > tuples found in the heap. If the table is truly INSERT-only then it > won't do any harm since we'll skip the index scan anyway. I think > it's less risky to clean the indexes. If we skip that then there will > be a group of people will suffer from index bloat due to this, no > matter if they realise it or not. Oh I didn't know that. In that case it is better to have this vacuum process indexes as well. 
I have changed the patch so that it freezes tuples, but does not skip index cleanup. Better err on the side of caution. > > Yes, I think that disabling this by default defeats the purpose. > > Perhaps the solution to that is somewhere else then. I can picture > some sort of load average counters for auto-vacuum and spamming the > logs with WARNINGs if we maintain high enough load for long enough, > but we'd likely be better just completely overhauling the vacuum cost > settings to be a percentage of total effort rather than some fixed > speed. That would allow more powerful servers to run vacuum more > quickly and it would also run more quickly during low load periods. > We'd just need to sample now and again how long vacuuming a series of > page takes then sleep for a time based on how long that took. That's > not for this patch though. Right. Updated patch attached. Yours, Laurenz Albe
Attachment
On Sat, 7 Mar 2020 at 03:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > Thanks, Justin, for the review. > I have applied the changes where still applicable. > > On Fri, 2020-03-06 at 10:52 +1300, David Rowley wrote: > > Lack of a scale_factor does leave people who regularly truncate their > > "append-only" tables out in the cold a bit. Perhaps they'd like > > index-only scans to kick in soon after they truncate without having to > > wait for 10 million tuples, or so. > > That point I don't see. > Truncating a table resets the counters to 0. The scenario there is that if we don't have any autovacuum_vacuum_insert_scale_factor and we set the threshold to 10 million tuples. The user truncates the table on a monthly basis and nearer to the end of the month the tuples accumulates around 100 million tuples, roughly 3.2 million are inserted per day, so auto-vacuum kicks in for this table around once every 3 days. At the start of the month, the table is truncated and it begins refilling. The n_ins_since_vacuum is reset to 0 during the truncate. Meanwhile, the table is being queried constantly and it takes 3 days for us to vacuum the table again. Queries hitting the table are unable to use Index Only Scans for 3 days. The DBAs don't have a lot of control over this. I think we can help users with that by giving them a bit more control over when auto-vacuum will run for the table. scale_factor and threshold. > Updated patch attached. Great. I'll have a look.
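For reference, the arithmetic behind the "once every 3 days" figure in the scenario above (all numbers are David's illustrative assumptions, not measurements):

    -- ~100 million rows per month is roughly 3.2 million inserts per day;
    -- a fixed 10-million-row threshold is then crossed about every 3 days.
    SELECT 100000000 / 31 AS inserts_per_day,
           10000000 / (100000000 / 31.0) AS days_between_insert_vacuums;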
On Tue, 10 Mar 2020 at 09:56, David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, 7 Mar 2020 at 03:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > Updated patch attached.
>
> Great. I'll have a look.

I don't really have many complaints about the v4 patch. However, during
my pass of it, I did note down a few things that you might want to have a
look at.

1. Do we need to change documentation on freeze_min_age to mention that
it does not apply in all cases? I'm leaning towards not changing this as
`VACUUM FREEZE` is also an exception to this, which I don't see
mentioned.

2. Perhaps the documentation in maintenance.sgml should mention that the
table will be vacuumed with the equivalent of having
vacuum_freeze_min_age = 0, instead of:

"Such a vacuum will aggressively freeze tuples."

aggressive is the wrong word here. We call it an aggressive vacuum if we
disable page skipping, not for setting the vacuum_freeze_min_age to 0.
See heap_vacuum_rel()

	/*
	 * We request an aggressive scan if the table's frozen Xid is now older
	 * than or equal to the requested Xid full-table scan limit; or if the
	 * table's minimum MultiXactId is older than or equal to the requested
	 * mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was specified.
	 */
	aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
											   xidFullScanLimit);
	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
											  mxactFullScanLimit);
	if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
		aggressive = true;

3. The following DEBUG3 elog should be updated to include the new values:

	elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
		 NameStr(classForm->relname),
		 vactuples, vacthresh, anltuples, anlthresh);

Someone might be confused at why auto-vacuum is running if you don't put
those in.

4. This would be nicer if you swapped the order of the operands to the <
condition and replaced the operator with >. That'll match the way it is
done above.

	/*
	 * If the number of inserted tuples exceeds the threshold and no
	 * vacuum is necessary for other reasons, run an "insert-only" vacuum
	 * that freezes aggressively.
	 */
	if (!(*dovacuum) && vacinsthresh < tabentry->inserts_since_vacuum)
	{
		*dovacuum = true;
		*freeze_all = true;
	}

It would also be nicer if you assigned the value of
tabentry->inserts_since_vacuum to a variable, so as to match what the
other code there is doing. That'll also make the change for #3 neater.

5. The following text:

    A threshold similar to the above is calculated from
    <xref linkend="guc-autovacuum-vacuum-insert-threshold"/> and
    <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>.
    Tables that have received more inserts than the calculated threshold
    since they were last vacuumed (and are not eligible for vacuuming for
    other reasons) will be vacuumed to reduce the impact of a future
    anti-wraparound vacuum run.

I think "... will be vacuumed with the equivalent of having <xref
linkend="guc-vacuum-freeze-min-age"/> set to <literal>0</literal>". I'm
not sure we need to mention the reduction of impact to anti-wraparound
vacuums.

6. Please run the regression tests and make sure they pass. The "rules"
test is currently failing due to the new column in "pg_stat_all_tables".

Apart from the above, does anyone else have objections or concerns with
the patch? I'd like to take a serious look at pushing it once the above
points are resolved.
On Tue, 2020-03-10 at 09:56 +1300, David Rowley wrote: > > > Lack of a scale_factor does leave people who regularly truncate their > > > "append-only" tables out in the cold a bit. Perhaps they'd like > > > index-only scans to kick in soon after they truncate without having to > > > wait for 10 million tuples, or so. > > > > That point I don't see. > > Truncating a table resets the counters to 0. > > The scenario there is that if we don't have any > autovacuum_vacuum_insert_scale_factor and we set the threshold to 10 > million tuples. The user truncates the table on a monthly basis and > nearer to the end of the month the tuples accumulates around 100 > million tuples, roughly 3.2 million are inserted per day, so > auto-vacuum kicks in for this table around once every 3 days. At the > start of the month, the table is truncated and it begins refilling. > The n_ins_since_vacuum is reset to 0 during the truncate. Meanwhile, > the table is being queried constantly and it takes 3 days for us to > vacuum the table again. Queries hitting the table are unable to use > Index Only Scans for 3 days. The DBAs don't have a lot of control > over this. > > I think we can help users with that by giving them a bit more control > over when auto-vacuum will run for the table. scale_factor and > threshold. Oh, that's a good point. I only thought about anti-wraparound vacuum, but the feature might be useful for index-only scans as well. Yours, Laurenz Albe
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> +#autovacuum_vacuum_insert_threshold = 10000000	# min number of row inserts
> +						# before vacuum

Similar to a previous comment [0] about reloptions or GUC:

Can we say "threshold number of insertions before vacuum"?
..or "maximum number of insertions before triggering autovacuum"

--
Justin

[0] https://www.postgresql.org/message-id/602873766faa0e9200a60dcc26dc10c636761d5d.camel%40cybertec.at
On Fri, 6 Mar 2020 at 23:46, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > Thanks, Justin, for the review. > I have applied the changes where still applicable. > > On Fri, 2020-03-06 at 10:52 +1300, David Rowley wrote: > > On Fri, 6 Mar 2020 at 03:27, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > > On Thu, 2020-03-05 at 19:40 +1300, David Rowley wrote: > > > > 1. I'd go for 2 new GUCs and reloptions. > > > > autovacuum_vacuum_insert_threshold (you're currently calling this > > > > autovacuum_vacuum_insert_limit. I don't see why the word "limit" is > > > > relevant here). The other GUC I think should be named > > > > autovacuum_vacuum_insert_scale_factor and these should work exactly > > > > the same way as autovacuum_vacuum_threshold and > > > > autovacuum_vacuum_scale_factor, but be applied in a similar way to the > > > > vacuum settings, but only be applied after we've checked to ensure the > > > > table is not otherwise eligible to be vacuumed. > > > > > > I disagree about the scale_factor (and have not added it to the > > > updated version of the patch). If we have a scale_factor, then the > > > time between successive autovacuum runs would increase as the table > > > gets bigger, which defeats the purpose of reducing the impact of each > > > autovacuum run. > > > > My view here is not really to debate what logically makes the most > > sense. I don't really think for a minute that the current > > auto-vacuums scale_factor and thresholds are perfect for the job. It's > > true that the larger a table becomes, the less often it'll be > > vacuumed, but these are control knobs that people have become > > accustomed to and I don't really think that making an exception for > > this is warranted. Perhaps we can zero out the scale factor by > > default and set the threshold into the millions of tuples. We can have > > people chime in on what they think about that and why once the code is > > written and even perhaps committed. > > Ok, I submit. My main desire was to keep the number of new GUCs as > low as reasonably possible, but making the feature tunable along the > known and "trusted" lines may be a good thing. > > The new parameter is called "autovacuum_vacuum_insert_scale_factor". > > > Lack of a scale_factor does leave people who regularly truncate their > > "append-only" tables out in the cold a bit. Perhaps they'd like > > index-only scans to kick in soon after they truncate without having to > > wait for 10 million tuples, or so. > > That point I don't see. > Truncating a table resets the counters to 0. > > > > > 10. I'm slightly worried about the case where we don't quite trigger a > > > > normal vacuum but trigger a vacuum due to INSERTs then skip cleaning > > > > up the indexes but proceed to leave dead index entries causing indexes > > > > to become bloated. It does not seem impossible that given the right > > > > balance of INSERTs and UPDATE/DELETEs that this could happen every > > > > time and the indexes would just become larger and larger. > > > > > > Perhaps we can take care of the problem by *not* skipping index > > > cleanup if "changes_since_analyze" is substantially greater than 0. > > > > > > What do you think? > > > > Well, there is code that skips the index scans when there are 0 dead > > tuples found in the heap. If the table is truly INSERT-only then it > > won't do any harm since we'll skip the index scan anyway. I think > > it's less risky to clean the indexes. 
If we skip that then there will
> > be a group of people will suffer from index bloat due to this, no
> > matter if they realise it or not.

+1

FYI, vacuum can actually perform the index cleanup phase (i.e. the
PROGRESS_VACUUM_PHASE_INDEX_CLEANUP phase) on a table even if it's a
truly INSERT-only table, depending on
vacuum_cleanup_index_scale_factor. Anyway, I also agree with not
disabling index cleanup in the insert-only vacuum case, because it could
become not only a cause of index bloat but also a big performance issue.
For example, if autovacuum on a table always runs without index cleanup,
a GIN index on that table will accumulate inserted tuples in its pending
list, and the list will be cleaned up by a backend process while
inserting a new tuple, not by an autovacuum process. We can disable index
vacuuming per table with the index_cleanup storage parameter, so it would
be better to leave these settings to users.

I have one question about this patch from an architectural perspective:
have you considered using autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor for this purpose as well? That is, we
compare the threshold computed from these values not only to the number
of dead tuples but also to the number of inserted tuples. If the number
of dead tuples exceeds the threshold, we trigger autovacuum as usual. On
the other hand, if the number of inserted tuples exceeds it, we trigger
autovacuum with vacuum_freeze_min_age = 0. I'm concerned about how users
will reason about the settings of the two newly added parameters. We will
have 4 parameters in total. Amit was also concerned about that[1].

I think this idea also works fine. In the insert-only table case, since
only the number of inserted tuples gets increased, one threshold (that
is, the threshold computed from autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And in
the mostly-insert table case, we can already trigger autovacuum in
current PostgreSQL, since we have some dead tuples. But if we want to
trigger autovacuum more frequently based on the number of newly inserted
tuples, we can set that threshold lower while considering only the number
of inserted tuples.

And I briefly looked at this patch:

@@ -2889,7 +2898,7 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
 	/* As of now, we don't support parallel vacuum for autovacuum */
 	tab->at_params.nworkers = -1;
-	tab->at_params.freeze_min_age = freeze_min_age;
+	tab->at_params.freeze_min_age = freeze_all ? 0 : freeze_min_age;
 	tab->at_params.freeze_table_age = freeze_table_age;
 	tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
 	tab->at_params.multixact_freeze_table_age = multixact_freeze_table_age;

I think we can set multixact_freeze_min_age to 0 as well.

Regards,

[1] https://www.postgresql.org/message-id/CAA4eK1%2BrCxS_Pg4GdSa6G8ESOTHK%2BjDVgqYd_dnO07rGNaewKA%40mail.gmail.com

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
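The per-table escape hatch Masahiko mentions is the existing vacuum_index_cleanup storage parameter (available since PostgreSQL 12). A small sketch, with "events" as a hypothetical table:

    -- Make plain VACUUM (and autovacuum) skip index cleanup for this table:
    ALTER TABLE events SET (vacuum_index_cleanup = false);
    -- and to return to the default behaviour:
    ALTER TABLE events RESET (vacuum_index_cleanup);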
On Tue, 2020-03-10 at 13:53 +1300, David Rowley wrote: > 1. Do we need to change documentation on freeze_min_age to mention > that it does not apply in all cases? I'm leaning towards not changing > this as `VACUUM FREEZE` is also an exception to this, which I don't > see mentioned. I agree with that. Too little documentation is bad, but too much of it can also confuse and make it hard to find the needle in the haystack. > 2. Perhaps the documentation in maintenance.sgml should mention that > the table will be vacuumed with the equivalent of having > vacuum_freeze_min_age = 0, instead of: > > "Such a vacuum will aggressively freeze tuples." > > aggressive is the wrong word here. We call it an aggressive vacuum if > we disable page skipping, not for setting the vacuum_freeze_min_age to > 0. Agreed, see below. > 3. The following DEBUG3 elog should be updated to include the new values: > > elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)", > NameStr(classForm->relname), > vactuples, vacthresh, anltuples, anlthresh); Done. > Someone might be confused at why auto-vacuum is running if you don't > put those in. > > 4. This would be nicer if you swapped the order of the operands to the > < condition and replaced the operator with >. That'll match the way it > is done above. > > /* > * If the number of inserted tuples exceeds the threshold and no > * vacuum is necessary for other reasons, run an "insert-only" vacuum > * that freezes aggressively. > */ > if (!(*dovacuum) && vacinsthresh < tabentry->inserts_since_vacuum) > { > *dovacuum = true; > *freeze_all = true; > } > > It would also be nicer if you assigned the value of > tabentry->inserts_since_vacuum to a variable, so as to match what the > other code there is doing. That'll also make the change for #3 neater. Changed that way. > 5. The following text: > > A threshold similar to the above is calculated from > <xref linkend="guc-autovacuum-vacuum-insert-threshold"/> and > <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>. > Tables that have received more inserts than the calculated threshold > since they were last vacuumed (and are not eligible for vacuuming for > other reasons) will be vacuumed to reduce the impact of a future > anti-wraparound vacuum run. > > I think "... will be vacuumed with the equivalent of having <xref > linkend="guc-vacuum-freeze-min-age"/> set to <literal>0</literal>". > I'm not sure we need to mention the reduction of impact to > anti-wraparound vacuums. Done like that. I left in the explanation of the purpose of this setting. Understanding the purpose of the GUCs will make it easier to tune them correctly. > 6. Please run the regression tests and make sure they pass. The > "rules" test is currently failing due to the new column in > "pg_stat_all_tables" Oops, sorry. I ran pgindent, but forgot to re-run the regression tests. Done. Attached is V5, which also fixes the bug discovered my Masahiko Sawada. He made an interesting suggestion which we should consider before committing. Yours, Laurenz Albe
Attachment
On Tue, 2020-03-10 at 00:00 -0500, Justin Pryzby wrote: > > +++ b/src/backend/utils/misc/postgresql.conf.sample > > +#autovacuum_vacuum_insert_threshold = 10000000 # min number of row inserts > > + # before vacuum > > Similar to a previous comment [0] about reloptions or GUC: > > Can we say "threshold number of insertions before vacuum" ? > ..or "maximum number of insertions before triggering autovacuum" Hmm. I copied the wording from "autovacuum_vacuum_threshold". Since the parameters have similar semantics, a different wording would confuse. Yours, Laurenz Albe
On Tue, 2020-03-10 at 18:14 +0900, Masahiko Sawada wrote: Thanks for the review and your thoughts! > FYI actually vacuum could perform index cleanup phase (i.g. > PROGRESS_VACUUM_PHASE_INDEX_CLEANUP phase) on a table even if it's a > truly INSERT-only table, depending on > vacuum_cleanup_index_scale_factor. Anyway, I also agree with not > disabling index cleanup in insert-only vacuum case, because it could > become not only a cause of index bloat but also a big performance > issue. For example, if autovacuum on a table always run without index > cleanup, gin index on that table will accumulate insertion tuples in > its pending list and will be cleaned up by a backend process while > inserting new tuple, not by a autovacuum process. We can disable index > vacuum by index_cleanup storage parameter per tables, so it would be > better to defer these settings to users. Thanks for the confirmation. > I have one question about this patch from architectural perspective: > have you considered to use autovacuum_vacuum_threshold and > autovacuum_vacuum_scale_factor also for this purpose? That is, we > compare the threshold computed by these values to not only the number > of dead tuples but also the number of inserted tuples. If the number > of dead tuples exceeds the threshold, we trigger autovacuum as usual. > On the other hand if the number of inserted tuples exceeds, we trigger > autovacuum with vacuum_freeze_min_age = 0. I'm concerned that how user > consider the settings of newly added two parameters. We will have in > total 4 parameters. Amit also was concerned about that[1]. > > I think this idea also works fine. In insert-only table case, since > only the number of inserted tuples gets increased, only one threshold > (that is, threshold computed by autovacuum_vacuum_threshold and > autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And > in mostly-insert table case, in the first place, we can trigger > autovacuum even in current PostgreSQL, since we have some dead tuples. > But if we want to trigger autovacuum more frequently by the number of > newly inserted tuples, we can set that threshold lower while > considering only the number of inserted tuples. I am torn. On the one hand it would be wonderful not to have to add yet more GUCs to the already complicated autovacuum configuration. It already confuses too many users. On the other hand that will lead to unnecessary vacuums for small tables. Worse, the progression caused by the comparatively large scale factor may make it vacuum large tables too seldom. I'd be grateful if somebody knowledgeable could throw his or her opinion into the scales. > And I briefly looked at this patch: > > @@ -2889,7 +2898,7 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, > tab->at_params.truncate = VACOPT_TERNARY_DEFAULT; > /* As of now, we don't support parallel vacuum for autovacuum */ > tab->at_params.nworkers = -1; > - tab->at_params.freeze_min_age = freeze_min_age; > + tab->at_params.freeze_min_age = freeze_all ? 0 : freeze_min_age; > tab->at_params.freeze_table_age = freeze_table_age; > tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age; > tab->at_params.multixact_freeze_table_age = multixact_freeze_table_age; > > I think we can set multixact_freeze_min_age to 0 as well. Ugh, yes, that is a clear oversight. I have fixed it in the latest version. Yours, Laurenz Albe
On Wed, 11 Mar 2020 at 08:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Tue, 2020-03-10 at 18:14 +0900, Masahiko Sawada wrote: > > I have one question about this patch from architectural perspective: > > have you considered to use autovacuum_vacuum_threshold and > > autovacuum_vacuum_scale_factor also for this purpose? That is, we > > compare the threshold computed by these values to not only the number > > of dead tuples but also the number of inserted tuples. If the number > > of dead tuples exceeds the threshold, we trigger autovacuum as usual. > > On the other hand if the number of inserted tuples exceeds, we trigger > > autovacuum with vacuum_freeze_min_age = 0. I'm concerned that how user > > consider the settings of newly added two parameters. We will have in > > total 4 parameters. Amit also was concerned about that[1]. > > > > I think this idea also works fine. In insert-only table case, since > > only the number of inserted tuples gets increased, only one threshold > > (that is, threshold computed by autovacuum_vacuum_threshold and > > autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And > > in mostly-insert table case, in the first place, we can trigger > > autovacuum even in current PostgreSQL, since we have some dead tuples. > > But if we want to trigger autovacuum more frequently by the number of > > newly inserted tuples, we can set that threshold lower while > > considering only the number of inserted tuples. > > I am torn. > > On the one hand it would be wonderful not to have to add yet more GUCs > to the already complicated autovacuum configuration. It already confuses > too many users. > > On the other hand that will lead to unnecessary vacuums for small > tables. > Worse, the progression caused by the comparatively large scale > factor may make it vacuum large tables too seldom. I think we really need to discuss what the default values for these INSERT-only vacuums should be before we can decide if we need 2 further GUCc to control the feature. Right now the default is 0.0 on the scale factor and 10 million tuples threshold. I'm not saying those are good or bad values, but if they are good, then they're pretty different from the normal threshold of 50 and the normal scale factor of 0.2, therefore (assuming the delete/update thresholds are also good), then we need the additional GUCs. If someone wants to put forward a case for making the defaults more similar, then perhaps we can consider merging the options. One case might be the fact that we want INSERT-only tables to benefit from Index Only Scans more often than after 10 million inserts. As for pros and cons. Feel free to add to the following list: For new GUCs/reloptions: 1. Gives users more control over this new auto-vacuum behaviour 2. The new feature can be completely disabled. This might be very useful for people who suffer from auto-vacuum starvation. Against new GUCs/reloptions: 1. Adds more code, documentation and maintenance. 2. Adds more complexity to auto-vacuum configuration. As for my opinion, I'm leaning towards keeping the additional options. I think if we were just adding auto-vacuum to core code now, then I'd be voting to keep the configuration as simple as possible. However, that's far from the case, and we do have over a decade of people that have gotten used to how auto-vacuum currently behaves. 
Many people are unlikely to even notice the change, but some will, and then there will be another group of people who want to turn it off. That group might be upset when we tell them that they can't, at least not without flipping the big red "autovacuum" switch into the off position (which I'm pretty hesitant to ever recommend that anyone does).
On Wed, 11 Mar 2020 at 04:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Tue, 2020-03-10 at 18:14 +0900, Masahiko Sawada wrote: > > Thanks for the review and your thoughts! > > > FYI actually vacuum could perform index cleanup phase (i.g. > > PROGRESS_VACUUM_PHASE_INDEX_CLEANUP phase) on a table even if it's a > > truly INSERT-only table, depending on > > vacuum_cleanup_index_scale_factor. Anyway, I also agree with not > > disabling index cleanup in insert-only vacuum case, because it could > > become not only a cause of index bloat but also a big performance > > issue. For example, if autovacuum on a table always run without index > > cleanup, gin index on that table will accumulate insertion tuples in > > its pending list and will be cleaned up by a backend process while > > inserting new tuple, not by a autovacuum process. We can disable index > > vacuum by index_cleanup storage parameter per tables, so it would be > > better to defer these settings to users. > > Thanks for the confirmation. > > > I have one question about this patch from architectural perspective: > > have you considered to use autovacuum_vacuum_threshold and > > autovacuum_vacuum_scale_factor also for this purpose? That is, we > > compare the threshold computed by these values to not only the number > > of dead tuples but also the number of inserted tuples. If the number > > of dead tuples exceeds the threshold, we trigger autovacuum as usual. > > On the other hand if the number of inserted tuples exceeds, we trigger > > autovacuum with vacuum_freeze_min_age = 0. I'm concerned that how user > > consider the settings of newly added two parameters. We will have in > > total 4 parameters. Amit also was concerned about that[1]. > > > > I think this idea also works fine. In insert-only table case, since > > only the number of inserted tuples gets increased, only one threshold > > (that is, threshold computed by autovacuum_vacuum_threshold and > > autovacuum_vacuum_scale_factor) is enough to trigger autovacuum. And > > in mostly-insert table case, in the first place, we can trigger > > autovacuum even in current PostgreSQL, since we have some dead tuples. > > But if we want to trigger autovacuum more frequently by the number of > > newly inserted tuples, we can set that threshold lower while > > considering only the number of inserted tuples. > > I am torn. > > On the one hand it would be wonderful not to have to add yet more GUCs > to the already complicated autovacuum configuration. It already confuses > too many users. > > On the other hand that will lead to unnecessary vacuums for small > tables. > Worse, the progression caused by the comparatively large scale > factor may make it vacuum large tables too seldom. > I might be missing your point but could you elaborate on that in what kind of case you think this lead to unnecessary vacuums? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 2020-03-11 at 12:00 +0900, Masahiko Sawada wrote: > > > I have one question about this patch from architectural perspective: > > > have you considered to use autovacuum_vacuum_threshold and > > > autovacuum_vacuum_scale_factor also for this purpose? > > > > I am torn. > > > > On the one hand it would be wonderful not to have to add yet more GUCs > > to the already complicated autovacuum configuration. It already confuses > > too many users. > > > > On the other hand that will lead to unnecessary vacuums for small > > tables. > > Worse, the progression caused by the comparatively large scale > > factor may make it vacuum large tables too seldom. > > I might be missing your point but could you elaborate on that in what > kind of case you think this lead to unnecessary vacuums? If you have an insert-only table that has 100000 entries, it will get vacuumed roughly every 20000 new entries. The impact is probably too little to care, but it will increase the contention for the three autovacuum workers available by default. Yours, Laurenz Albe
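Spelling out that arithmetic with the stock defaults (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2), as a sketch:

    SELECT 50 + 0.2 * 100000 AS inserts_between_vacuums;   -- about 20,050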
On Wed, 11 Mar 2020 at 13:24, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Wed, 2020-03-11 at 12:00 +0900, Masahiko Sawada wrote: > > > > I have one question about this patch from architectural perspective: > > > > have you considered to use autovacuum_vacuum_threshold and > > > > autovacuum_vacuum_scale_factor also for this purpose? > > > > > > I am torn. > > > > > > On the one hand it would be wonderful not to have to add yet more GUCs > > > to the already complicated autovacuum configuration. It already confuses > > > too many users. > > > > > > On the other hand that will lead to unnecessary vacuums for small > > > tables. > > > Worse, the progression caused by the comparatively large scale > > > factor may make it vacuum large tables too seldom. > > > > I might be missing your point but could you elaborate on that in what > > kind of case you think this lead to unnecessary vacuums? > > If you have an insert-only table that has 100000 entries, it will get > vacuumed roughly every 20000 new entries. The impact is probably too > little to care, but it will increase the contention for the three > autovacuum workers available by default. The same is true for read-write table, right? If that becomes a problem, it's a mis-configuration and user should increase these values just like when we set these values for read-write tables. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 11 Mar 2020 at 17:24, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Wed, 2020-03-11 at 12:00 +0900, Masahiko Sawada wrote: > > I might be missing your point but could you elaborate on that in what > > kind of case you think this lead to unnecessary vacuums? > > If you have an insert-only table that has 100000 entries, it will get > vacuumed roughly every 20000 new entries. The impact is probably too > little to care, but it will increase the contention for the three > autovacuum workers available by default. I guess that depends on your definition of unnecessary. If you want Index Only Scans, then those settings don't seem unreasonable. If you want it just to reduce the chances or impact of an anti-wraparound vacuum then likely it's a bit too often. I understand this patch was born due to the anti-wraparound case, but should we really just ignore the Index Only Scan case?
On Wed, 11 Mar 2020 at 19:00, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> On Wed, 11 Mar 2020 at 13:24, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > If you have an insert-only table that has 100000 entries, it will get
> > vacuumed roughly every 20000 new entries. The impact is probably too
> > little to care, but it will increase the contention for the three
> > autovacuum workers available by default.
>
> The same is true for read-write table, right? If that becomes a
> problem, it's a mis-configuration and user should increase these
> values just like when we set these values for read-write tables.

It is true that if vacuum has more to do than it can do, then something
is not configured correctly. I imagine Laurenz set the scale factor to
0.0 and the threshold to 10 million to reduce the chances that someone
will encounter that problem.

I mentioned somewhere upthread that I commonly used to see production
servers running with the standard vacuum_cost_limit of 200 and the
(pre-PG12) autovacuum_vacuum_cost_delay of 20. Generally, it didn't go
well for them. autovacuum_vacuum_cost_delay is now 2 by default, so
auto-vacuum in PG12 and beyond runs 10x faster, but it's still pretty
conservative and it'll still need another bump in several years when
hardware is faster than it is today. So, by no means did that 10x
increase mean that nobody will suffer from auto-vacuum starvation ever
again.

Now, perhaps it remains to be seen if adding additional work onto
auto-vacuum will help or hinder those people. If their auto-vacuum can
just keep up until the cluster is old enough to need anti-wraparound
vacuums and then falls massively behind, then perhaps this is a good
thing, as they might notice at some point before their server explodes
in the middle of the night. By that time they might have become
complacent.

Additionally, I think this is pretty well aligned to the case mentioned
in the subject line of this email. We now have a freeze map, so
performing vacuums to freeze tuples twice as often is not really much
more expensive in total than doing that vacuuming half as often. Even
tables (e.g. log tables) that are never queried won't become much more
costly to maintain. In the meantime, for tables that do receive
queries, we're more likely to get an index-only scan.

Perhaps a good way to decide what the scale_factor should be set to is
to look at the run-time of an Index Only Scan vs an Index Scan.

create table ios (a int, b text);
insert into ios select x,x::text from generate_series(1,1000000)x;
create index on ios (a);
vacuum analyze ios;

explain (analyze, buffers) select a from ios order by a; -- on 2nd exec
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ios_a_idx on ios  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.035..212.602 rows=1000000 loops=1)
   Heap Fetches: 0
   Buffers: shared hit=2736
 Planning Time: 0.095 ms
 Execution Time: 246.864 ms
(5 rows)

set enable_indexonlyscan=0;
explain (analyze, buffers) select a from ios order by a;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ios_a_idx on ios  (cost=0.42..31388.42 rows=1000000 width=4) (actual time=0.036..451.381 rows=1000000 loops=1)
   Buffers: shared hit=8140
 Planning Time: 0.089 ms
 Execution Time: 486.582 ms
(4 rows)

So about twice as fast with the IOS.
When it becomes beneficial to perform the vacuum will depend on the read-to-insert ratio. I'm starting to think that we should set the scale_factor to something like 0.3 and the threshold to 50. Is anyone strongly against that? Or Laurenz, are you really set on the 10 million threshold?
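For anyone reproducing this, the visibility-map coverage that makes the Index Only Scan cheap can be checked directly (a sketch, run against the example table above):

    -- fraction of pages marked all-visible after the VACUUM ANALYZE above
    SELECT relpages, relallvisible,
           round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_allvisible
    FROM pg_class
    WHERE relname = 'ios';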
On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote: > I'm starting to think that we should set the scale_factor to something > like 0.3 and the threshold to 50. Is anyone strongly against that? Or > Laurenz, are you really set on the 10 million threshold? These values are almost the same as "autovacuum_vacuum_scale_factor" and "autovacuum_vacuum_threshold", so you actually agree with Masahiko with the exception that you want it tunable separately. I don't like the high scale factor. If your insert-only table was last vacuumed when it had 500 million rows, the next autovacuum will freeze 150 million tuples, which is a lot. The impact will be less than that of an anti-wraparound vacuum because it is not as persistent, but if our 150 million tuple autovacuum backs down because it hits a lock or gets killed by the DBA, that is also not good, since it will just come again. And the bigger the vacuum run is, the more likely it is to meet an obstacle. So I think that large insert-only tables should be vacuumed more often than that. If the number of tuples that have to be frozen is small, the vacuum run will be short and is less likely to cause problems. That is why I chose a scale factor of 0 here. But I totally see your point about index-only scans. I think the problem is that this insert-only autovacuum serves two masters: 1. preventing massive anti-wraparound vacuum that severely impacts the system 2. maintaining the visibility map for index-only scans I thought of the first case when I chose the parameter values. I am afraid that we cannot come up with one setting that fits all, so I advocate a setting that targets the first problem, which I think is more important (and was the motivation for this thread). I could add a paragraph to the documentation that tells people how to configure the parameters if they want to use it to get index-only scans. Yours, Laurenz Albe
On Thu, 12 Mar 2020 at 18:38, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote: > > Laurenz, are you really set on the 10 million threshold? > > These values are almost the same as "autovacuum_vacuum_scale_factor" > and "autovacuum_vacuum_threshold", so you actually agree with Masahiko > with the exception that you want it tunable separately. > > I don't like the high scale factor. > > If your insert-only table was last vacuumed when it had 500 million rows, > the next autovacuum will freeze 150 million tuples, which is a lot. > The impact will be less than that of an anti-wraparound vacuum because > it is not as persistent, but if our 150 million tuple autovacuum backs > down because it hits a lock or gets killed by the DBA, that is also not > good, since it will just come again. > And the bigger the vacuum run is, the more likely it is to meet an obstacle. > > So I think that large insert-only tables should be vacuumed more often > than that. If the number of tuples that have to be frozen is small, > the vacuum run will be short and is less likely to cause problems. > That is why I chose a scale factor of 0 here. That's a good point. If those 150 million inserts were done one per transaction, then it wouldn't take many more tuples before wraparound vacuums occur more often than insert vacuums. The only way I see around that is to a) configure it the way you'd like, or; b) add yet another GUC and reloption to represent how close to autovacuum_freeze_max_age / autovacuum_multixact_freeze_max_age the table is. I'm not very excited about adding yet another GUC, plus anti-wraparound vacuums already occur 10 times more often than they need to. If we added such a GUC and set it to, say, 0.1, then they'd happen 100 times more often than needed before actual wraparound occurs. I'm starting to see now why you were opposed to the scale_factor in the first place. I really think that this is really a problem with the design of the threshold and scale_factor system. I used to commonly see people with larger tables zeroing out the scale_factor and setting a reasonable threshold or dropping the scale_factor down to some fraction of a percent. I don't really have any better design in mind though, at least not one that does not require adding new vacuum options.
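The per-table workaround described above typically looks like this (a sketch; the table name and threshold are only illustrative):

    ALTER TABLE big_history SET (autovacuum_vacuum_scale_factor = 0.0,
                                 autovacuum_vacuum_threshold = 1000000);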
On Thu, 12 Mar 2020 at 14:38, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote: > > I'm starting to think that we should set the scale_factor to something > > like 0.3 and the threshold to 50. Is anyone strongly against that? Or > > Laurenz, are you really set on the 10 million threshold? > > These values are almost the same as "autovacuum_vacuum_scale_factor" > and "autovacuum_vacuum_threshold", so you actually agree with Masahiko > with the exception that you want it tunable separately. > > I don't like the high scale factor. > > If your insert-only table was last vacuumed when it had 500 million rows, > the next autovacuum will freeze 150 million tuples, which is a lot. > The impact will be less than that of an anti-wraparound vacuum because > it is not as persistent, but if our 150 million tuple autovacuum backs > down because it hits a lock or gets killed by the DBA, that is also not > good, since it will just come again. > And the bigger the vacuum run is, the more likely it is to meet an obstacle. > > So I think that large insert-only tables should be vacuumed more often > than that. If the number of tuples that have to be frozen is small, > the vacuum run will be short and is less likely to cause problems. > That is why I chose a scale factor of 0 here. The reason why you want to add new GUC parameters is to use different default values for insert-update table case and insert-only table case? I think I understand the pros and cons of adding separate parameters, but I still cannot understand use cases where we cannot handle without separate parameters. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 12 Mar 2020 at 19:50, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > The reason why you want to add new GUC parameters is to use different > default values for insert-update table case and insert-only table > case? Yes, but in particular so it can be completely disabled easily. > I think I understand the pros and cons of adding separate > parameters, but I still cannot understand use cases where we cannot > handle without separate parameters. That's a lot of negatives. I think I understand that you don't feel that additional GUCs are worth it? Laurenz highlighted a seemingly very valid reason that the current GUCs cannot be reused. Namely, say the table has 1 billion rows, if we use the current scale factor of 0.2, then we'll run an insert-only vacuum every 200 million rows. If those INSERTs are one per transaction then the new feature does nothing as the wraparound vacuum will run instead. Since this feature was born due to large insert-only tables, this concern seems very valid to me.
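To make the arithmetic behind that explicit (a sketch, using the stock autovacuum_freeze_max_age default of 200 million):

    -- inserts needed to trigger an insert-driven vacuum on a 1-billion-row
    -- table at scale_factor = 0.2, versus the XID age at which an
    -- anti-wraparound vacuum fires anyway
    SELECT 50 + 0.2 * 1000000000 AS insert_trigger,          -- ~200 million inserts
           200000000             AS freeze_max_age_default;  -- 200 million XIDs

With one insert per transaction, the anti-wraparound vacuum arrives at about the same time, so the new trigger would never fire first.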
On Thu, 12 Mar 2020 at 16:28, David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 12 Mar 2020 at 19:50, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > The reason why you want to add new GUC parameters is to use different > > default values for insert-update table case and insert-only table > > case? > > Yes, but in particular so it can be completely disabled easily. > > > I think I understand the pros and cons of adding separate > > parameters, but I still cannot understand use cases where we cannot > > handle without separate parameters. > > That's a lot of negatives. I think I understand that you don't feel > that additional GUCs are worth it? > > Laurenz highlighted a seemingly very valid reason that the current > GUCs cannot be reused. Namely, say the table has 1 billion rows, if we > use the current scale factor of 0.2, then we'll run an insert-only > vacuum every 200 million rows. If those INSERTs are one per > transaction then the new feature does nothing as the wraparound vacuum > will run instead. Since this feature was born due to large insert-only > tables, this concern seems very valid to me. Yeah, I understand and agree that since most people would use default values we can reduce mis-configuration cases by adding separate GUCs that have appropriate default values for that purpose but on the other hand I'm not sure it's worth that we cover the large insert-only table case by adding separate GUCs in spite of being able to cover it even by existing two GUCs. If we want to disable this feature on the particular table, we can have a storage parameter that means not to consider the number of inserted tuples rather than having multiple GUCs that allows us to fine tuning. And IIUC even in the above case, I think that if we trigger insert-only vacuum by comparing the number of inserted tuples to the threshold computed by existing threshold and scale factor, we can cover it. But since you and Laurenz already agreed to adding two GUCs I'm not going to insist on that. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 13 Mar 2020 at 01:43, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 12 Mar 2020 at 16:28, David Rowley <dgrowleyml@gmail.com> wrote: > > Laurenz highlighted a seemingly very valid reason that the current > > GUCs cannot be reused. Namely, say the table has 1 billion rows, if we > > use the current scale factor of 0.2, then we'll run an insert-only > > vacuum every 200 million rows. If those INSERTs are one per > > transaction then the new feature does nothing as the wraparound vacuum > > will run instead. Since this feature was born due to large insert-only > > tables, this concern seems very valid to me. > > Yeah, I understand and agree that since most people would use default > values we can reduce mis-configuration cases by adding separate GUCs > that have appropriate default values for that purpose but on the other > hand I'm not sure it's worth that we cover the large insert-only table > case by adding separate GUCs in spite of being able to cover it even > by existing two GUCs. In light of the case above, do you have an alternative suggestion? > If we want to disable this feature on the > particular table, we can have a storage parameter that means not to > consider the number of inserted tuples rather than having multiple > GUCs that allows us to fine tuning. And IIUC even in the above case, I > think that if we trigger insert-only vacuum by comparing the number of > inserted tuples to the threshold computed by existing threshold and > scale factor, we can cover it. So you're suggesting we drive the insert-vacuums from existing scale_factor and threshold? What about the 1 billion row table example above?
On Fri, 2020-03-13 at 09:10 +1300, David Rowley wrote:
> So you're suggesting we drive the insert-vacuums from existing
> scale_factor and threshold? What about the 1 billion row table
> example above?

I am still not 100% certain if that is really realistic. Transactions
that insert only a single row are probably the exception in large
insert-only tables.

But I think that we can probably always find a case where any given
parameter setting is not so great, so in order to get ahead let's
decide on something that is not outright stupid. Changing the defaults
later is always an option.

So the three options are:

1. introduce no new parameters and trigger autovacuum if the number
   of inserts exceeds the regular vacuum threshold.

2. introduce the new parameters with high base threshold and zero scale factor.

3. introduce the new parameters with low base threshold and high scale factor.

I think all three are viable.
If nobody else wants to weigh in, throw a coin.

Yours,
Laurenz Albe
On Fri, Mar 13, 2020 at 3:19 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Fri, 2020-03-13 at 09:10 +1300, David Rowley wrote: > > So you're suggesting we drive the insert-vacuums from existing > > scale_factor and threshold? What about the 1 billion row table > > example above? > > I am still not 100% certain if that is really realistic. > Transactions that insert only a single row are probably the > exception in large insert-only tables. > > But I think that we probably always can find a case where any given > parameter setting is not so great, so in order to get ahead > let's decide on something that is not right out stupid. > Changing the defaults later is always an option. > > So the three options are: > > 1. introduce no new parameters and trigger autovacuum if the number > of inserts exceeds the regular vacuum threshold. > > 2. introduce the new parameters with high base threshold and zero scale factor. Both of these look good to me. 1 is approach in my initial patch sketch, 2 is approach taken by Laurenz. Values I think in when considering vacuum is "how many megabytes of table aren't frozen/visible" (since that's what translates into processing time knowing io limits of storage), and "how many pages aren't yet vacuumed". Threshold in Laurenz's patch was good enough for my taste - it's basically "vacuum after every gigabyte", and that's exactly what we implemented when working around this issue manually. There's enough chance that latest gigabyte is in RAM and vacuum will be super fast on it; reading a gigabyte of data is not a showstopper for most contemporary physical and cloud environments I can think of. If reading a gigabyte is a problem already then wraparound is a guaranteed disaster. About index only scan, this threshold seems good enough too. There's a good chance last gig is already in RAM, and previous data was processed with previous vacuum. Anyway - with this patch Index Only Scan starts actually working :) I'd vote for 2 with a note "rip it off all together later and redesign scale factors and thresholds system to something more easily graspable". Whoever needs to cancel the new behavior for some reason will have a knob then, and patch is laid out already. > 3. introduce the new parameters with low base threshold and high scale factor. This looks bad to me. "the bigger the table, the longer we wait" does not look good for me for something designed as a measure preventing issues with big tables. > I think all three are viable. > If nobody else wants to weigh in, throw a coin. > > Yours, > Laurenz Albe > -- Darafei Praliaskouski Support me: http://patreon.com/komzpa
On Wed, Mar 11, 2020 at 10:32:47AM +1300, David Rowley wrote:
> 2. The new feature can be completely disabled. This might be very
> useful for people who suffer from auto-vacuum starvation.

On Thu, Mar 12, 2020 at 08:28:05PM +1300, David Rowley wrote:
> Yes, but in particular so it can be completely disabled easily.

How is it disabled ?  By setting scale_factor=100 ?

+	{
+		"autovacuum_vacuum_insert_scale_factor",
+		"Number of tuple inserts prior to vacuum as a fraction of reltuples",
+		RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+		ShareUpdateExclusiveLock
+	},
+	-1, 0.0, 100.0

Note, vacuum_cleanup_index_scale_factor uses max: 1e10
See 4d54543efa5eb074ead4d0fadb2af4161c943044

--
Justin
On Fri, 2020-03-13 at 12:05 +0300, Darafei "Komяpa" Praliaskouski wrote: > 1. introduce no new parameters and trigger autovacuum if the number > > of inserts exceeds the regular vacuum threshold. > > > > 2. introduce the new parameters with high base threshold and zero scale factor. > > Both of these look good to me. 1 is approach in my initial patch > sketch, 2 is approach taken by Laurenz. > Values I think in when considering vacuum is "how many megabytes of > table aren't frozen/visible" (since that's what translates into > processing time knowing io limits of storage), and "how many pages > aren't yet vacuumed". > > Threshold in Laurenz's patch was good enough for my taste - it's > basically "vacuum after every gigabyte", and that's exactly what we > implemented when working around this issue manually. There's enough > chance that latest gigabyte is in RAM and vacuum will be super fast on > it; reading a gigabyte of data is not a showstopper for most > contemporary physical and cloud environments I can think of. If > reading a gigabyte is a problem already then wraparound is a > guaranteed disaster. > > About index only scan, this threshold seems good enough too. There's a > good chance last gig is already in RAM, and previous data was > processed with previous vacuum. Anyway - with this patch Index Only > Scan starts actually working :) > > I'd vote for 2 with a note "rip it off all together later and redesign > scale factors and thresholds system to something more easily > graspable". Whoever needs to cancel the new behavior for some reason > will have a knob then, and patch is laid out already. > > > 3. introduce the new parameters with low base threshold and high scale factor. > > This looks bad to me. "the bigger the table, the longer we wait" does > not look good for me for something designed as a measure preventing > issues with big tables. Thanks for the feedback. It looks like we have a loose consensus on #2, i.e. my patch. Yours, Laurenz Albe
On Fri, 2020-03-13 at 07:00 -0500, Justin Pryzby wrote: > > 2. The new feature can be completely disabled. This might be very > > useful for people who suffer from auto-vacuum starvation. > > > Yes, but in particular so it can be completely disabled easily. > > How is it disabled ? By setting scale_factor=100 ? > > + { > + "autovacuum_vacuum_insert_scale_factor", > + "Number of tuple inserts prior to vacuum as a fraction of reltuples", > + RELOPT_KIND_HEAP | RELOPT_KIND_TOAST, > + ShareUpdateExclusiveLock > + }, > + -1, 0.0, 100.0 > > Note, vacuum_cleanup_index_scale_factor uses max: 1e10 > See 4d54543efa5eb074ead4d0fadb2af4161c943044 By setting the threshold very high, or by setting the scale factor to 100. Yours, Laurenz Albe
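Spelled out per table, that would be something like the following (a sketch; the reloption name and its 100 upper bound are taken from the patch excerpt above, and "logdata" is a placeholder):

    -- effectively never trigger an insert-driven autovacuum on this table
    ALTER TABLE logdata SET (autovacuum_vacuum_insert_scale_factor = 100);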
On Tue, Mar 10, 2020 at 01:53:42PM +1300, David Rowley wrote: > 2. Perhaps the documentation in maintenance.sgml should mention that > the table will be vacuumed with the equivalent of having > vacuum_freeze_min_age = 0, instead of: > > "Such a vacuum will aggressively freeze tuples." > > aggressive is the wrong word here. We call it an aggressive vacuum if > we disable page skipping, not for setting the vacuum_freeze_min_age to > 0. Possible it would be better to run VACUUM *without* freeze_min_age=0 ? (I get confused and have to spend 20min re-reading the vacuum GUC docs every time I deal with this stuff, so maybe I'm off). As I understand, the initial motivation of this patch was to avoid disruptive anti-wraparound vacuums on insert-only table. But if vacuum were triggered at all, it would freeze the oldest tuples, which is all that's needed; especially since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need to be vacuumed again. Recently written tuples wouldn't be frozen, which is ok, they're handled next time. Another motivation of the patch is to allow indexonly scan, for which the planner looks at pages' "relallvisible" fraction (and at execution if a page isn't allvisible, visits the heap). Again, that happens if vacuum were run at all. Again, some pages won't be marked allvisible, which is fine, they're handled next time. I think freeze_min_age=0 could negatively affect people who have insert-mostly tables (I'm not concerned, but that includes us). If they consistently hit the autovacuum insert threshold before the cleanup threshold for updated/deleted tuples, any updated/deleted tuples would be frozen, which would be wasteful: |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause |VACUUM to do useless work: freezing a row version is a waste of time if the row |is modified soon thereafter (causing it to acquire a new XID). So the setting |should be large enough that rows are not frozen until they are unlikely to |change any more. So my question is if autovacuum triggered by insert threshold should trigger VACUUM with the same settings as a vacuum due to deleted tuples. I realize the DBA could just configure the thresholds so they'd hit vacuum for cleaning dead tuples, so my suggestion maybe just improves the case with the default settings. It's possible to set the reloption autovacuum_freeze_min_age, which I think supports the idea of running a vacuum normally and letting it (and the DBA) decide what do with existing logic. Also, there was a discussion about index cleanup with the conclusion that it was safer not to skip it, since otherwise indexes might bloat. I think that's right, since vacuum for cleanup is triggered by the number of dead heap tuples. To skip index cleanup, I think you'd want a metric for n_dead_since_index_cleanup. (Or maybe analyze could track dead index tuples and trigger vacuum of each index separately). Having now played with the patch, I'll suggest that 10000000 is too high a threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC. -- Justin
Hi, On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote: > As I understand, the initial motivation of this patch was to avoid disruptive > anti-wraparound vacuums on insert-only table. But if vacuum were triggered at > all, it would freeze the oldest tuples, which is all that's needed; especially > since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need > to be vacuumed again. Recently written tuples wouldn't be frozen, which is ok, > they're handled next time. > > Another motivation of the patch is to allow indexonly scan, for which the > planner looks at pages' "relallvisible" fraction (and at execution if a page > isn't allvisible, visits the heap). Again, that happens if vacuum were run at > all. Again, some pages won't be marked allvisible, which is fine, they're > handled next time. > > I think freeze_min_age=0 could negatively affect people who have insert-mostly > tables (I'm not concerned, but that includes us). If they consistently hit the > autovacuum insert threshold before the cleanup threshold for updated/deleted > tuples, any updated/deleted tuples would be frozen, which would be > wasteful: I think that's a valid concern. > |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause > |VACUUM to do useless work: freezing a row version is a waste of time if the row > |is modified soon thereafter (causing it to acquire a new XID). So the setting > |should be large enough that rows are not frozen until they are unlikely to > |change any more. I think the overhead here might be a bit overstated. Once a page is dirtied (or already dirty) during vacuum, and we freeze a single row (necessating WAL logging), there's not really a good reason to not also freeze the rest of the row on that page. The added cost for freezing another row is miniscule compared to the "constant" cost of freezing anything on the page. It's of course different if there are otherwise no tuples worth freezing on the page (not uncommon). But there's really no reason for that to be the case: Afaict the only problem with more aggressively freezing when we touch (beyond hint bits) the page anyway is that we commonly end up with multiple WAL records for the same page: 1) lazy_scan_heap()->heap_page_prune() will log a XLOG_HEAP2_CLEAN record, but leave itemids in place most of the time 2) lazy_scan_heap()->log_heap_freeze() will log a XLOG_HEAP2_FREEZE_PAGE record 3a) if no indexes exist/index cleanup is disabled: lazy_vacuum_page()->lazy_vacuum_page() will log a XLOG_HEAP2_CLEAN record, removing dead tuples (including itemids) 3b) if indexes need to be cleaned up, lazy_vacuum_heap()->lazy_vacuum_page() will log a XLOG_HEAP2_CLEAN which is not nice. It likely is worth merging xl_heap_freeze_page into xl_heap_clean, and having heap pruning always freeze once it decides to dirty a page. We could probably always prune dead tuples as part of heap_prune_chain() if there's no indexes - but I'm doubtful it's worth it, since there'll be few tables with lots of dead tuples that don't have indexes. Merging 3b's WAL record would be harder, I think. There's also a significant source of additional WAL records here, one that I think should really not have been introduced: 4) HeapTupleSatisfiesVacuum() called both by heap_prune_chain(), and lazy_scan_heap() will often trigger a WAL record when the checksums or wal_log_hint_bits are enabled. 
If the page hasn't been modified in the current checkpoint window (extremely common for VACUUM, reasonably common for opportunistic pruning), we will log a full page write. Imo this really should have been avoided when checksums were added, that's a pretty substantial and unnecessary increase in overhead. It's probably overkill to tie fixing the 'insert only' case to improving the WAL logging for vacuuming / pruning. But it'd certainly would largely remove the tradeoff discussed here, by removing additional overhead of freezing in tables that are also updated. > Also, there was a discussion about index cleanup with the conclusion that it > was safer not to skip it, since otherwise indexes might bloat. I think that's > right, since vacuum for cleanup is triggered by the number of dead heap tuples. > To skip index cleanup, I think you'd want a metric for > n_dead_since_index_cleanup. (Or maybe analyze could track dead index tuples > and trigger vacuum of each index separately). > > Having now played with the patch, I'll suggest that 10000000 is too high a > threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be > much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC. ISTM that the danger of regressing workloads due to suddenly repeatedly scanning huge indexes that previously were never / rarely scanned is significant (if there's a few dead tuples, otherwise most indexes will be able to skip the scan since the vacuum_cleanup_index_scale_factor introduction)). Greetings, Andres Freund
On Fri, 2020-03-13 at 13:44 -0500, Justin Pryzby wrote: > Possible it would be better to run VACUUM *without* freeze_min_age=0 ? (I get > confused and have to spend 20min re-reading the vacuum GUC docs every time I > deal with this stuff, so maybe I'm off). > > As I understand, the initial motivation of this patch was to avoid disruptive > anti-wraparound vacuums on insert-only table. But if vacuum were triggered at > all, it would freeze the oldest tuples, which is all that's needed; especially > since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need > to be vacuumed again. Recently written tuples wouldn't be frozen, which is ok, > they're handled next time. Freezing tuples too early is wasteful if the tuples get updated or deleted soon after, but based on the assumption that an autovacuum triggered by insert is dealing with an insert-mostly table, it is not that wasteful. If we didn't freeze all tuples, it is easy to envision a situation where bulk data loads load several million rows in a few transactions, which would trigger a vacuum. With the normal vacuum_freeze_min_age, that vacuum would do nothing at all. It is better if each vacuum freezes some rows, in other words, if it does some of the anti-wraparound work. > Another motivation of the patch is to allow indexonly scan, for which the > planner looks at pages' "relallvisible" fraction (and at execution if a page > isn't allvisible, visits the heap). Again, that happens if vacuum were run at > all. Again, some pages won't be marked allvisible, which is fine, they're > handled next time. Yes, freezing is irrelevant with respect to index only scans, but it helps with mitigating the impact of anti-wraparound vacuum runs. > I think freeze_min_age=0 could negatively affect people who have insert-mostly > tables (I'm not concerned, but that includes us). If they consistently hit the > autovacuum insert threshold before the cleanup threshold for updated/deleted > tuples, any updated/deleted tuples would be frozen, which would be > wasteful: I don't get that. Surely tuples whose xmax is committed won't be frozen. > So my question is if autovacuum triggered by insert threshold should trigger > VACUUM with the same settings as a vacuum due to deleted tuples. I realize the > DBA could just configure the thresholds so they'd hit vacuum for cleaning dead > tuples, so my suggestion maybe just improves the case with the default > settings. It's possible to set the reloption autovacuum_freeze_min_age, which > I think supports the idea of running a vacuum normally and letting it (and the > DBA) decide what do with existing logic. Yes, the DBA can explicitly set vacuum_freeze_min_age to 0. But for one DBA who understands his or her workload well enough, and who knows the workings of autovacuum well enough to do that kind of tuning, there are 99 DBAs who don't, and it is the goal of the patch (expressed in the subject) to make things work for those people who go with the default. And I believe that is better achieved with freezing as many tuples as possible. > Also, there was a discussion about index cleanup with the conclusion that it > was safer not to skip it, since otherwise indexes might bloat. I think that's > right, since vacuum for cleanup is triggered by the number of dead heap tuples. > To skip index cleanup, I think you'd want a metric for > n_dead_since_index_cleanup. (Or maybe analyze could track dead index tuples > and trigger vacuum of each index separately). Yes, I think we pretty much all agree on that. 
> Having now played with the patch, I'll suggest that 10000000 is too high a
> threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be
> much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.

There is the concern that that might vacuum large tables too seldom.

I am curious - what were the findings that led you to think that
10000000 is too high?

Yours,
Laurenz Albe
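For completeness, the existing per-table knob being referred to is simply (a sketch; the table name is a placeholder):

    -- have every (auto)vacuum of this table freeze whatever it touches
    ALTER TABLE append_only SET (autovacuum_freeze_min_age = 0);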
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote: > > |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause > > |VACUUM to do useless work: freezing a row version is a waste of time if the row > > |is modified soon thereafter (causing it to acquire a new XID). So the setting > > |should be large enough that rows are not frozen until they are unlikely to > > |change any more. > > I think the overhead here might be a bit overstated. Once a page is Could you clarify if you mean the language in docs in general or specifically in the context of this patch ? -- Justin
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > Having now played with the patch, I'll suggest that 10000000 is too high a
> > threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be
> > much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.
>
> ISTM that the danger of regressing workloads due to suddenly repeatedly
> scanning huge indexes that previously were never / rarely scanned is
> significant

You're right - at one point, I was going to argue to skip index cleanup,
and I think I wrote that before I finished convincing myself why it
wasn't ok to skip.

--
Justin
On Fri, 13 Mar 2020 at 05:11, David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 13 Mar 2020 at 01:43, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 12 Mar 2020 at 16:28, David Rowley <dgrowleyml@gmail.com> wrote: > > > Laurenz highlighted a seemingly very valid reason that the current > > > GUCs cannot be reused. Namely, say the table has 1 billion rows, if we > > > use the current scale factor of 0.2, then we'll run an insert-only > > > vacuum every 200 million rows. If those INSERTs are one per > > > transaction then the new feature does nothing as the wraparound vacuum > > > will run instead. Since this feature was born due to large insert-only > > > tables, this concern seems very valid to me. > > > > Yeah, I understand and agree that since most people would use default > > values we can reduce mis-configuration cases by adding separate GUCs > > that have appropriate default values for that purpose but on the other > > hand I'm not sure it's worth that we cover the large insert-only table > > case by adding separate GUCs in spite of being able to cover it even > > by existing two GUCs. > > In light of the case above, do you have an alternative suggestion? > > > If we want to disable this feature on the > > particular table, we can have a storage parameter that means not to > > consider the number of inserted tuples rather than having multiple > > GUCs that allows us to fine tuning. And IIUC even in the above case, I > > think that if we trigger insert-only vacuum by comparing the number of > > inserted tuples to the threshold computed by existing threshold and > > scale factor, we can cover it. > > So you're suggesting we drive the insert-vacuums from existing > scale_factor and threshold? What about the 1 billion row table > example above? My suggestion is the initial approach proposed by Justin; comparing the number of inserted tuples to the threshold computed by autovacuum_vacum_threshold and autovacuum_vacuum_scale_factor in order to trigger autovacuum. But as discussed, there is a downside; if the number of inserted tuples are almost the same as, but a little larger than, the number of dead tuples, we will trigger insert-only vacuum but it's wasteful. There is already a consensus on introducing new 2 parameters, but as the second idea I'd like to add one (or two) GUC(s) to my suggestion, say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio of the number of inserted tuples for total number of tuples modified and inserted, in order to trigger insert-only vacuum. For example, suppose the table has 1,000,000 tuples and we set threshold = 0, scale_factor = 0.2 and freeze_insert_ratio = 0.9, we will trigger normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but we will instead trigger insert-only autovacuum, which is a vacuum with vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (=200,000 * 0.9). IOW if 90% of modified tuples are insertions, we freeze tuples aggressively. If we want to trigger insert-only vacuum only on insert-only table we can set freeze_insert_ratio = 1.0. The down side of this idea is that we cannot disable autovacuum triggered by the number of inserted, although we might be able to introduce more one GUC that controls whether to include the number of inserted tuples for triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert = on|off). The pros of this idea would be that we can ensure that insert-only vacuum will run only in the case where the ratio of insertion is large enough. 
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 13, 2020 at 10:48:27PM +0100, Laurenz Albe wrote: > On Fri, 2020-03-13 at 13:44 -0500, Justin Pryzby wrote: > > Possible it would be better to run VACUUM *without* freeze_min_age=0 ? (I get > > confused and have to spend 20min re-reading the vacuum GUC docs every time I > > deal with this stuff, so maybe I'm off). > > > > As I understand, the initial motivation of this patch was to avoid disruptive > > anti-wraparound vacuums on insert-only table. But if vacuum were triggered at > > all, it would freeze the oldest tuples, which is all that's needed; especially > > since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need > > to be vacuumed again. Recently written tuples wouldn't be frozen, which is ok, > > they're handled next time. > > Freezing tuples too early is wasteful if the tuples get updated or deleted > soon after, but based on the assumption that an autovacuum triggered by insert > is dealing with an insert-mostly table, it is not that wasteful. You're right that it's not *that* wasteful. If it's a table that gets 90% inserts/10% updates, then only 10% of its tuples will be frozen. In the worst case, it's the same tuples every time, and that's somewhat wasteful. In the best case, those tuples are clustered on a small number of pages. -- Justin
On Mon, 2020-03-16 at 12:53 +0900, Masahiko Sawada wrote: > There is already a consensus on introducing new 2 parameters, but as > the second idea I'd like to add one (or two) GUC(s) to my suggestion, > say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio > of the number of inserted tuples for total number of tuples modified > and inserted, in order to trigger insert-only vacuum. For example, > suppose the table has 1,000,000 tuples and we set threshold = 0, > scale_factor = 0.2 and freeze_insert_ratio = 0.9, we will trigger > normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but > we will instead trigger insert-only autovacuum, which is a vacuum with > vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (=200,000 > * 0.9). IOW if 90% of modified tuples are insertions, we freeze tuples > aggressively. If we want to trigger insert-only vacuum only on > insert-only table we can set freeze_insert_ratio = 1.0. The down side > of this idea is that we cannot disable autovacuum triggered by the > number of inserted, although we might be able to introduce more one > GUC that controls whether to include the number of inserted tuples for > triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert = > on|off). The pros of this idea would be that we can ensure that > insert-only vacuum will run only in the case where the ratio of > insertion is large enough. Two more parameters :^( But your reasoning is good. How about we go with what we have now and leave that for future discussion and patches? Yours, Laurenz Albe
On Mon, Mar 16, 2020 at 12:53:43PM +0900, Masahiko Sawada wrote:
> There is already a consensus on introducing new 2 parameters, but as
> the second idea I'd like to add one (or two) GUC(s) to my suggestion,
> say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio
> of the number of inserted tuples for total number of tuples modified
> and inserted, in order to trigger insert-only vacuum. For example,
> suppose the table has 1,000,000 tuples and we set threshold = 0,
> scale_factor = 0.2 and freeze_insert_ratio = 0.9, we will trigger
> normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but
> we will instead trigger insert-only autovacuum, which is a vacuum with
> vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (=200,000
> * 0.9). IOW if 90% of modified tuples are insertions, we freeze tuples
> aggressively. If we want to trigger insert-only vacuum only on
> insert-only table we can set freeze_insert_ratio = 1.0. The down side
> of this idea is that we cannot disable autovacuum triggered by the
> number of inserted, although we might be able to introduce more one
> GUC that controls whether to include the number of inserted tuples for
> triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert =
> on|off). The pros of this idea would be that we can ensure that
> insert-only vacuum will run only in the case where the ratio of
> insertion is large enough.

I was thinking about something like this myself. I would appreciate
keeping separate the thresholds for 1) triggering vacuum; and, 2) the
options autovacuum uses when it runs (in this case, FREEZE). Someone
might want autovacuum to run with FREEZE on a table vacuumed due to
dead tuples (say, on a partitioned table), or might *not* want to run
FREEZE on a table vacuumed due to insertions (maybe because index scans
are too expensive or FREEZE makes it too slow).

Normally, when someone complains about a bad plan related to no
index-only scan, we tell them to run vacuum, and if that helps, then
ALTER TABLE .. SET (autovacuum_vacuum_scale_factor=0.005). If there's
two thresholds (4 GUCs and 4 relopts) for autovacuum, then do we have
to help determine which one was being hit, and which relopt to set?

I wonder if the new insert GUCs should default to -1 (disabled)? And
the insert thresholds should be set by the new insert relopt (if set),
or by the new insert GUC (default -1), else the normal relopt, or the
normal GUC. The defaults would give 50 + 0.20*n. When someone asks
about IOS, we'd tell them to set autovacuum_vacuum_scale_factor=0.005,
same as now.

vac_ins_scale_factor =
    (relopts && relopts->vacuum_ins_scale_factor >= 0) ? relopts->vacuum_ins_scale_factor :
    autovacuum_vac_ins_scale >= 0 ? autovacuum_vac_ins_scale :
    (relopts && relopts->vacuum_scale_factor >= 0) ? relopts->vacuum_scale_factor :
    autovacuum_vac_scale;

One would disable autovacuum triggered by insertions by setting
autovacuum_vacuum_insert_scale_factor=1e10 (which I think should also
be the max for this patch).

It seems to me that the easy thing to do is to implement this initially
without FREEZE (which is controlled by vacuum_freeze_table_age), and
defer until July/v14 further discussion and implementation of another
GUC/relopt for autovacuum freezing to be controlled by insert
thresholds (or ratio).

--
Justin
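The "same as now" advice at the end translates to simply (a sketch; the table name is a placeholder):

    -- what we already tell people who want index-only scans on a large table
    ALTER TABLE read_mostly SET (autovacuum_vacuum_scale_factor = 0.005);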
Hi, On 2020-03-13 19:10:00 -0500, Justin Pryzby wrote: > On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote: > > > |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause > > > |VACUUM to do useless work: freezing a row version is a waste of time if the row > > > |is modified soon thereafter (causing it to acquire a new XID). So the setting > > > |should be large enough that rows are not frozen until they are unlikely to > > > |change any more. > > > > I think the overhead here might be a bit overstated. Once a page is > > Could you clarify if you mean the language in docs in general or specifically > in the context of this patch ? In the docs. Greetings, Andres Freund
On Mon, 2020-03-16 at 07:47 -0500, Justin Pryzby wrote:
> It seems to me that the easy thing to do is to implement this initially without
> FREEZE (which is controlled by vacuum_freeze_table_age), and defer until
> July/v14 further discussion and implementation of another GUC/relopt for
> autovacuum freezing to be controlled by insert thresholds (or ratio).

Freezing tuples is the point of this patch.

As I have said, if you have a table where you insert many rows in few
transactions, you would trigger an autovacuum that then ends up doing
nothing because none of the rows have reached vacuum_freeze_table_age yet.
Then some time later you will get a really large vacuum run.

It seems to me that if we keep trying to find the formula that will
vacuum every table just right and never do the wrong thing, we will
never get anywhere.

Yours,
Laurenz Albe
Hi,

On 2020-03-16 20:49:43 +0100, Laurenz Albe wrote:
> On Mon, 2020-03-16 at 07:47 -0500, Justin Pryzby wrote:
> > It seems to me that the easy thing to do is to implement this initially without
> > FREEZE (which is controlled by vacuum_freeze_table_age), and defer until
> > July/v14 further discussion and implementation of another GUC/relopt for
> > autovacuum freezing to be controlled by insert thresholds (or ratio).
>
> Freezing tuples is the point of this patch.

Sure. But not hurting existing installations is also a goal of the
patch. Since this is introducing potentially significant performance
downsides, I think it's good to be a bit conservative with the default
configuration.

I'm getting a bit more bullish on implementing some of what I discussed in
https://www.postgresql.org/message-id/20200313213851.ejrk5gptnmp65uoo%40alap3.anarazel.de
at the same time as this patch.

In particular, I think it'd make sense to *not* have a lower freezing
horizon for insert vacuums (because it *will* cause problems), but if
the page is dirty anyway, then do the freezing even if freeze_min_age
etc would otherwise prevent us from doing so? It'd probably be ok to
incur the WAL logging overhead unconditionally, but I'm not sure about it.

> As I have said, if you have a table where you insert many rows in few
> transactions, you would trigger an autovacuum that then ends up doing nothing
> because none of the rows have reached vacuum_freeze_table_age yet.
> Then some time later you will get a really large vacuum run.

Well, only if you don't further insert into the table. Which isn't that
common a case for a table having a "really large vacuum run".

Greetings,

Andres Freund
On Mon, Mar 16, 2020 at 08:49:43PM +0100, Laurenz Albe wrote: > On Mon, 2020-03-16 at 07:47 -0500, Justin Pryzby wrote: > > It seems to me that the easy thing to do is to implement this initially without > > FREEZE (which is controlled by vacuum_freeze_table_age), and defer until > > July/v14 further discussion and implementation of another GUC/relopt for > > autovacuum freezing to be controlled by insert thresholds (or ratio). > > Freezing tuples is the point of this patch. > As I have said, if you have a table where you insert many rows in few > transactions, you would trigger an autovacuum that then ends up doing nothing > because none of the rows have reached vacuum_freeze_table_age yet. > > Then some time later you will get a really large vacuum run. Best practice is to vacuum following a bulk load. I don't think this patch is going to change that. Bulk-loaded tuples will be autovacuumed, which is nice, but I don't think it'll be ideal if large bulk loads trigger an autovacuum with cost delays, which ISTM will take even longer if it runs with FREEZE. If it's a bulk load, then I think it's okay to assume it was vacuumed, or otherwise that it'll eventually be hit by autovac at some later date. If it's not a "bulk load" but a normal runtime, and the table continues to receive inserts/deletes, then eventually it'll hit a vacuum threshold and tuples can be frozen. If it receives a bunch of activity, which then stops (like a partition of a table of timeseries data), then maybe it doesn't hit a vacuum threshold, until wraparound vacuum. I think in that case it's not catastrophic, since then it wasn't big enough to hit any threshold (it's partitioned). If every day, autovacuum kicks in and does wraparound vacuum on a table with data from (say) 100 days ago, I think that's reasonable. One case which would suck is if the insert_threshold were 1e6, and you restore a DB with 1000 tables of historic data (which are no longer being inserted into) which have 9e5 rows each (just below the threshold). Then autovacuum will hit them all at once. The solution to that is to manually vacuum after a bulk load, same as today. As a practical matter, some of the tables are likely to hit the autovacuum insert threshold, and some are likely to be pruned (or updated) before wraparound vacuum, so the patch usually does improve that case. -- Justin
On Mon, 2020-03-16 at 13:13 -0700, Andres Freund wrote: > > Freezing tuples is the point of this patch. > > Sure. But not hurting existing installation is also a goal of the > patch. Since this is introducing potentially significant performance > downsides, I think it's good to be a bit conservative with the default > configuration. > > I'm gettin a bit more bullish on implementing some of what what I > discussed in > https://www.postgresql.org/message-id/20200313213851.ejrk5gptnmp65uoo%40alap3.anarazel.de > at the same time as this patch. > > In particularl, I think it'd make sense to *not* have a lower freezing > horizon for insert vacuums (because it *will* cause problems), but if > the page is dirty anyway, then do the freezing even if freeze_min_age > etc would otherwise prevent us from doing so? I don't quite see why freezing tuples in insert-only tables will cause problems - are you saying that more WAL will be written compared to freezing with a higher freeze_min_age? > > As I have said, if you have a table where you insert many rows in few > > transactions, you would trigger an autovacuum that then ends up doing nothing > > because none of the rows have reached vacuum_freeze_table_age yet. > > Then some time later you will get a really large vacuum run. > > Well, only if you don't further insert into the table. Which isn't that > common a case for a table having a "really large vacuum run". Ah, yes, you are right. So it actually would not be worse if we use the normal freeze_min_age for insert-only vacuums. So do you think the patch would be ok as it is if we change only that? Yours, Laurenz Albe
On Mon, 2020-03-16 at 16:07 -0500, Justin Pryzby wrote: > Best practice is to vacuum following bulk load. Yes. > If it's a bulk load, then I think it's okay to assume it was vacuumed, No. This patch is there precisely because too many people don't know that they should vacuum their table after a bulk insert. The idea of autovacuum is to do these things for you automatically. Yours, Laurenz Albe
Hi, On 2020-03-16 22:25:11 +0100, Laurenz Albe wrote: > On Mon, 2020-03-16 at 13:13 -0700, Andres Freund wrote: > > > Freezing tuples is the point of this patch. > > > > Sure. But not hurting existing installation is also a goal of the > > patch. Since this is introducing potentially significant performance > > downsides, I think it's good to be a bit conservative with the default > > configuration. > > > > I'm gettin a bit more bullish on implementing some of what what I > > discussed in > > https://www.postgresql.org/message-id/20200313213851.ejrk5gptnmp65uoo%40alap3.anarazel.de > > at the same time as this patch. > > > > In particularl, I think it'd make sense to *not* have a lower freezing > > horizon for insert vacuums (because it *will* cause problems), but if > > the page is dirty anyway, then do the freezing even if freeze_min_age > > etc would otherwise prevent us from doing so? > > I don't quite see why freezing tuples in insert-only tables will cause > problems - are you saying that more WAL will be written compared to > freezing with a higher freeze_min_age? As far as I understand the patch may trigger additional vacuums e.g. for tables that have some heavily updated parts / key ranges, and otherwise are largely insert only (as long as there are in total considerably more inserts than updates). That's not at all uncommon. And for the heavily updated regions the additional vacuums with a 0 min age could prove to be costly. I've not looked at the new code, but it'd be particularly bad if the changes were to trigger the lazy_check_needs_freeze() check in lazy_scan_heap() - it'd have the potential for a lot more contention. > > > As I have said, if you have a table where you insert many rows in few > > > transactions, you would trigger an autovacuum that then ends up doing nothing > > > because none of the rows have reached vacuum_freeze_table_age yet. > > > Then some time later you will get a really large vacuum run. > > > > Well, only if you don't further insert into the table. Which isn't that > > common a case for a table having a "really large vacuum run". > > Ah, yes, you are right. > So it actually would not be worse if we use the normal freeze_min_age > for insert-only vacuums. Well, it'd still be worse, because it'd likely trigger more writes of the same pages. Once for setting hint bits during the first vacuum, and then later a second time for freezing. Which is why I was pondering using the logic described above. > So do you think the patch would be ok as it is if we change only that? I've not looked at it in enough detail so far to say either way, sorry. Greetings, Andres Freund
On Mon, 2020-03-16 at 14:34 -0700, Andres Freund wrote: > > > In particularl, I think it'd make sense to *not* have a lower freezing > > > horizon for insert vacuums (because it *will* cause problems), but if > > > the page is dirty anyway, then do the freezing even if freeze_min_age > > > etc would otherwise prevent us from doing so? > > > > I don't quite see why freezing tuples in insert-only tables will cause > > problems - are you saying that more WAL will be written compared to > > freezing with a higher freeze_min_age? > > As far as I understand the patch may trigger additional vacuums e.g. for > tables that have some heavily updated parts / key ranges, and otherwise > are largely insert only (as long as there are in total considerably more > inserts than updates). That's not at all uncommon. > > And for the heavily updated regions the additional vacuums with a 0 min > age could prove to be costly. I've not looked at the new code, but it'd > be particularly bad if the changes were to trigger the > lazy_check_needs_freeze() check in lazy_scan_heap() - it'd have the > potential for a lot more contention. I think I got it. Here is a version of the patch that does *not* freeze more tuples than normal, except if a prior tuple on the same page is already eligible for freezing. lazy_check_needs_freeze() is only called for an aggressive vacuum, which this isn't. Does that look sane? Yours, Laurenz Albe
Attachment
On Tue, Mar 17, 2020 at 01:14:02AM +0100, Laurenz Albe wrote: > lazy_check_needs_freeze() is only called for an aggressive vacuum, which > this isn't. > --- a/src/backend/access/heap/vacuumlazy.c > +++ b/src/backend/access/heap/vacuumlazy.c > @@ -1388,17 +1388,26 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, > else > { > bool tuple_totally_frozen; > + bool freeze_all; > > num_tuples += 1; > hastup = true; > > + /* > + * If any tuple was already frozen in the block and this is > + * an insert-only vacuum, we might as well freeze all other > + * tuples in that block. > + */ > + freeze_all = params->is_insert_only && has_dead_tuples; > + You're checking if any (previously-scanned) tuple was *dead*, but I think you need to check nfrozen > 0. Also, this will fail to freeze tuples on a page which *could* be opportunistically-frozen, but *follow* the first tuple which *needs* to be frozen. I think Andres was thinking this would maybe be an optimization independent of is_insert_only (?) > /* > * Each non-removable tuple must be checked to see if it needs > * freezing. Note we already have exclusive buffer lock. > */ > if (heap_prepare_freeze_tuple(tuple.t_data, > relfrozenxid, relminmxid, > - FreezeLimit, MultiXactCutoff, > + freeze_all ? 0 : FreezeLimit, > + freeze_all ? 0 : MultiXactCutoff, > &frozen[nfrozen], > &tuple_totally_frozen)) > + /* normal autovacuum shouldn't freeze aggressively */ > + *insert_only = false; Aggressively is a bad choice of words. In the context of vacuum, it usually means "visit all pages, even those which are all-visible". -- Justin
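One way around the ordering problem Justin points out is to decide the cutoff per page rather than per tuple; a minimal standalone sketch of that idea follows (plain C stand-ins, ignoring xid wraparound arithmetic -- real code would use TransactionIdPrecedes() inside lazy_scan_heap(), and this is not the patch's code):

    #include <stdint.h>

    typedef uint32_t Xid;           /* stand-in for TransactionId */

    /*
     * If any tuple on the page must be frozen anyway (xmin older than
     * freeze_limit), return the newest safe cutoff so that every freezable
     * tuple on that page gets frozen in the same pass; otherwise keep the
     * normal cutoff.  Illustrative only.
     */
    static Xid
    page_freeze_cutoff(const Xid *xmins, int ntuples,
                       Xid freeze_limit, Xid oldest_xmin)
    {
        for (int i = 0; i < ntuples; i++)
        {
            if (xmins[i] < freeze_limit)    /* "needs freezing" anyway */
                return oldest_xmin;         /* so freeze the whole page */
        }
        return freeze_limit;                /* default behaviour */
    }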
On Tue, 2020-03-17 at 10:24 -0500, Justin Pryzby wrote: > > --- a/src/backend/access/heap/vacuumlazy.c > > +++ b/src/backend/access/heap/vacuumlazy.c > > @@ -1388,17 +1388,26 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, > > else > > { > > bool tuple_totally_frozen; > > + bool freeze_all; > > > > num_tuples += 1; > > hastup = true; > > > > + /* > > + * If any tuple was already frozen in the block and this is > > + * an insert-only vacuum, we might as well freeze all other > > + * tuples in that block. > > + */ > > + freeze_all = params->is_insert_only && has_dead_tuples; > > + > > You're checking if any (previously-scanned) tuple was *dead*, but I think you > need to check nfrozen>=0. Yes, that was a silly typo. > Also, this will fail to freeze tuples on a page which *could* be > oppotunistically-frozen, but *follow* the first tuple which *needs* to be > frozen. I am aware of that. I was trying to see if that went in the direction that Andres intends before trying more invasive modifications. > I think Andres was thinking this would maybe be an optimization independent of > is_insert_only (?) I wasn't sure. In the light of that, I have ripped out that code again. Also, since aggressive^H^H^H^H^H^H^H^H^H^Hproactive freezing seems to be a performance problem in some cases (pages with UPDATEs and DELETEs in otherwise INSERT-mostly tables), I have done away with the whole freezing thing, which made the whole patch much smaller and simpler. Now all that is introduced are the threshold and scale factor and the new statistics counter to track the number of inserts since the last VACUUM. > > + /* normal autovacuum shouldn't freeze aggressively */ > > + *insert_only = false; > > Aggressively is a bad choice of words. In the context of vacuum, it usually > means "visit all pages, even those which are allvisible". This is gone in the latest patch. Updated patch attached. Perhaps we can reach a consensus on this reduced functionality. Yours, Laurenz Albe
Attachment
On Tue, Mar 17, 2020 at 08:42:07PM +0100, Laurenz Albe wrote: > Also, since aggressive^H^H^H^H^H^H^H^H^H^Hproactive freezing seems to be a > performance problem in some cases (pages with UPDATEs and DELETEs in otherwise > INSERT-mostly tables), I have done away with the whole freezing thing, > which made the whole patch much smaller and simpler. > > Now all that is introduced are the threshold and scale factor and > the new statistics counter to track the number of inserts since the last > VACUUM. > > Updated patch attached. > > Perhaps we can reach a consensus on this reduced functionality. +1 I still suggest a scale_factor maximum of 1e10, like 4d54543efa5eb074ead4d0fadb2af4161c943044, which allows disabling it more effectively than a factor of 100, which would progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, .. I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be. With 1e10, it's first vacuumed when there's 10 billion inserts, if we didn't previously hit the n_dead threshold. I think that's ok? If one wanted to disable it up to 1e11 tuples, I think they'd disable autovacuum, or preferably just implement a vacuum job. The commit message says: |The scale factor defaults to 0, which means that it is |effectively disabled, but it offers some flexibility ..but "it" is ambiguous, so should say something like: "the table size does not contribute to the autovacuum threshold". -- Justin
On Tue, 2020-03-17 at 14:56 -0500, Justin Pryzby wrote: > I still suggest scale_factor maximum of 1e10, like > 4d54543efa5eb074ead4d0fadb2af4161c943044 > > Which alows more effectively disabling it than a factor of 100, which would > progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, .. > > I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be. With > 1e10, it's first vacuumed when there's 10billion inserts, if we didn't previous > hit the n_dead threshold. > > I think that's ok? If one wanted to disable it up to 1e11 tuples, I think > they'd disable autovacuum, or preferably just implement an vacuum job. Assume a scale factor >= 1, for example 2, and n live tuples. The table has just been vacuumed. Now we insert m tuples (which are live). Then the condition threshold + scale_factor * live_tuples < newly_inserted_tuples becomes 10000000 + 2 * (n + m) < m which can never be true for non-negative n and m. So a scale factor >= 1 disables the feature. Yours, Laurenz Albe
On Tue, Mar 17, 2020 at 10:01:15PM +0100, Laurenz Albe wrote: > On Tue, 2020-03-17 at 14:56 -0500, Justin Pryzby wrote: > > I still suggest scale_factor maximum of 1e10, like > > 4d54543efa5eb074ead4d0fadb2af4161c943044 > > > > Which alows more effectively disabling it than a factor of 100, which would > > progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, .. > > > > I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be. With > > 1e10, it's first vacuumed when there's 10billion inserts, if we didn't previous > > hit the n_dead threshold. > > > > I think that's ok? If one wanted to disable it up to 1e11 tuples, I think > > they'd disable autovacuum, or preferably just implement an vacuum job. > > Assume a scale factor >= 1, for example 2, and n live tuples. > The table has just been vacuumed. > > Now we insert m number tuples (which are live). > > Then the condition > > threshold + scale_factor * live_tuples < newly_inserted_tuples > > becomes > > 10000000 + 2 * (n + m) < m > > which can never be true for non-negative n and m. > > So a scale factor >= 1 disables the feature. No, this is what we mailed about privately yesterday, and I demonstrated that autovac can still run with factor=100. I said: |It's a multiplier, not a percent out of 100 (fraction is not a great choice of |words). | | &autovacuum_vac_scale, | 0.2, 0.0, 100.0, | |The default is 0.2 (20%), so 100 means after updating/deleting 100*reltuples. live tuples is an estimate, from the most recent vacuum OR analyze. If 1.0 disabled the feature, it wouldn't make much sense to allow factor up to 100. + { + {"autovacuum_vacuum_insert_scale_factor", PGC_SIGHUP, AUTOVACUUM, + gettext_noop("Number of tuple inserts prior to vacuum as a fraction of reltuples."), + NULL + }, + &autovacuum_vac_ins_scale, + 0.0, 0.0, 100.0, + NULL, NULL, NULL + }, -- Justin
On Tue, 2020-03-17 at 16:07 -0500, Justin Pryzby wrote: > > Assume a scale factor >= 1, for example 2, and n live tuples. > > The table has just been vacuumed. > > > > Now we insert m number tuples (which are live). > > > > Then the condition > > > > threshold + scale_factor * live_tuples < newly_inserted_tuples > > > > becomes > > > > 10000000 + 2 * (n + m) < m > > > > which can never be true for non-negative n and m. > > > > So a scale factor >= 1 disables the feature. > > No, this is what we mailed about privately yesterday, and I demonstrated that > autovac can still run with factor=100. I said: I remember. Can you point out where exactly the flaw in my reasoning is? > > It's a multiplier, not a percent out of 100 (fraction is not a great choice of > > words). > > > > &autovacuum_vac_scale, > > 0.2, 0.0, 100.0, > > > > The default is 0.2 (20%), so 100 means after updating/deleting 100*reltuples. Yes, exactly. > If 1.0 disabled the feature, it wouldn't make much sense to allow factor up to > 100. True, we could set the upper limit to 2, but it doesn't matter much. Note that this is different from autovacuum_vacuum_scale_factor, because inserted tuples are live, while dead tuples are not. Yours, Laurenz Albe
On Tue, Mar 17, 2020 at 10:22:44PM +0100, Laurenz Albe wrote: > On Tue, 2020-03-17 at 16:07 -0500, Justin Pryzby wrote: > > > Assume a scale factor >= 1, for example 2, and n live tuples. > > > The table has just been vacuumed. > > > > > > Now we insert m number tuples (which are live). .. but not yet counted in reltuples. On Tue, Mar 17, 2020 at 10:22:44PM +0100, Laurenz Albe wrote: > Note that this is different from autovacuum_vacuum_scale_factor, > because inserted tuples are live, while dead tuples are not. But they're not counted in reltuples until after the next vacuum (or analyze), which is circular, since it's exactly what we're trying to schedule. reltuples = classForm->reltuples; vactuples = tabentry->n_dead_tuples; + instuples = tabentry->inserts_since_vacuum; anltuples = tabentry->changes_since_analyze; vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples; + vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples; -- Justin
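For readers following along, the comparison those patch lines feed into looks roughly like this -- a standalone sketch of the arithmetic, not the patch itself; note that reltuples comes from pg_class and therefore does not yet include the tuples inserted since the last vacuum/analyze, which is exactly Justin's point:

    #include <stdbool.h>

    /* Sketch of the insert-trigger test the patch adds to autovacuum.c. */
    static bool
    needs_insert_vacuum(double reltuples,            /* pg_class.reltuples */
                        double inserts_since_vacuum, /* n_ins_since_vacuum */
                        double ins_base_thresh,      /* insert threshold */
                        double ins_scale_factor)     /* insert scale factor */
    {
        double vacinsthresh = ins_base_thresh + ins_scale_factor * reltuples;

        return inserts_since_vacuum > vacinsthresh;
    }

With the defaults being discussed here (threshold 10,000,000 and scale factor 0), this fires after every 10 million inserts regardless of table size.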
On Tue, 2020-03-17 at 16:34 -0500, Justin Pryzby wrote: > > > > Now we insert m number tuples (which are live). > > .. but not yet counted in reltuples. Thanks for pointing out my mistake. Here is another patch, no changes except setting the upper limit for autovacuum_vacuum_insert_scale_factor to 1e10. Yours, Laurenz Albe
Attachment
On Mon, Mar 16, 2020 at 07:47:13AM -0500, Justin Pryzby wrote: > Normally, when someone complains about bad plan related to no index-onlyscan, > we tell them to run vacuum, and if that helps, then ALTER TABLE .. SET > (autovacuum_vacuum_scale_factor=0.005). > > If there's two thresholds (4 GUCs and 4 relopts) for autovacuum, then do we > have to help determine which one was being hit, and which relopt to set? I don't think we came to any resolution on this. Right now, to encourage IOS, we'd tell someone to set autovacuum_vacuum_scale_factor=0.005. That wouldn't work for an insert-only table, but I've never heard back from someone that it didn't work. So with this patch, we'd maybe tell them to do this, to also get IOS on insert-only tables ? |ALTER TABLE .. SET (autovacuum_vacuum_scale_factor=0.005, autovacuum_vacuum_insert_threshold=50000); > I wonder if the new insert GUCs should default to -1 (disabled)? And the > insert thresholds should be set by new insert relopt (if set), or by new insert > GUC (default -1), else normal relopt, or normal GUC. The defaults would give > 50 + 0.20*n. When someone asks about IOS, we'd tell them to set > autovacuum_vacuum_scale_factor=0.005, same as now. > > vac_ins_scale_factor = > (relopts && relopts->vacuum_ins_scale_factor >= 0) ? relopts->vacuum_ins_scale_factor : > autovacuum_vac_ins_scale >= 0 ? autovacuum_vac_ins_scale : > (relopts && relopts->vacuum_scale_factor >= 0) ? relopts->vacuum_scale_factor : > autovacuum_vac_scale; -- Justin
Hi, On 2020-03-17 01:14:02 +0100, Laurenz Albe wrote: > lazy_check_needs_freeze() is only called for an aggressive vacuum, which > this isn't. Hm? I mean some of these will be aggressive vacuums, because it's older than vacuum_freeze_table_age? And the lower age limit would make that potentially more painful, no? Greetings, Andres Freund
Hi, On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > I think Andres was thinking this would maybe be an optimization independent of > > is_insert_only (?) > > I wasn't sure. I'm not sure myself - but I'm doubtful that using a 0 min age by default will be ok. I was trying to say (in a later email) that I think it might be a good compromise to opportunistically freeze if we're dirtying the page anyway, but not optimize WAL emission etc. That's a pretty simple change, and it'd address a lot of the potential performance regressions, while still freezing for the "first" vacuum in insert only workloads. > Add "autovacuum_vacuum_insert_threshold" and > "autovacuum_vacuum_insert_scale_factor" GUC and reloption. > The default value for the threshold is 10000000. > The scale factor defaults to 0, which means that it is > effectively disabled, but it offers some flexibility > to tune the feature similar to other autovacuum knobs. I don't think a default scale factor of 0 is going to be ok. For large-ish tables this will basically cause permanent vacuums. And it'll sometimes trigger for tables that actually coped well so far. 10 million rows could be a few seconds, not more. I don't think that the argument that otherwise a table might not get vacuumed before autovacuum_freeze_max_age is convincing enough. a) if that's indeed the argument, we should increase the default autovacuum_freeze_max_age - now that there's insert-triggered vacuums, the main argument against that from before isn't valid anymore. b) there's not really a good argument for vacuuming more often than autovacuum_freeze_max_age for such tables. It'll not be frequent enough to allow IOS for new data, and you're not preventing anti-wraparound vacuums from happening. Greetings, Andres Freund
On Tue, Mar 17, 2020 at 9:03 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > > I think Andres was thinking this would maybe be an optimization independent of > > > is_insert_only (?) > > > > I wasn't sure. > > I'm not sure myself - but I'm doubtful that using a 0 min age by default > will be ok. > > I was trying to say (in a later email) that I think it might be a good > compromise to opportunistically freeze if we're dirtying the page > anyway, but not optimize WAL emission etc. That's a pretty simple > change, and it'd address a lot of the potential performance regressions, > while still freezing for the "first" vacuum in insert only workloads. If we have truly insert-only tables, then doesn't vacuuming with freezing every tuple actually decrease total vacuum cost (perhaps significantly) since otherwise every vacuum keeps having to scan the heap for dead tuples on pages where we know there are none? Those pages could conceptually be frozen and ignored, but are not frozen because of the default behavior, correct? We have tables that log each change to a business object (as I suspect many transactional workloads do), and I've often thought that immediately freezing every page as soon as it fills up would be a real win for us. If that's all true, it seems to me that removing that part of the patch significantly lowers its value. If we opportunistically freeze only if we're already dirtying a page, would that help a truly insert-only workload? E.g., are there hint bits on the page that would need to change the first time we vacuum a full page with no dead tuples? I would have assumed the answer was "no" (since if so I think it would follow that _all_ pages need to be updated the first time they're vacuumed?). But if that's the case, then this kind of opportunistic freezing wouldn't help this kind of workload. Maybe there's something I'm misunderstanding about how vacuum works though. Thanks, James
On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote: > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > > > I think Andres was thinking this would maybe be an optimization independent of > > > > is_insert_only (?) > > > > > > I wasn't sure. > > > > I'm not sure myself - but I'm doubtful that using a 0 min age by default > > will be ok. > > > > I was trying to say (in a later email) that I think it might be a good > > compromise to opportunistically freeze if we're dirtying the page > > anyway, but not optimize WAL emission etc. That's a pretty simple > > change, and it'd address a lot of the potential performance regressions, > > while still freezing for the "first" vacuum in insert only workloads. > > If we have truly insert-only tables, then doesn't vacuuming with > freezing every tuple actually decrease total vacuum cost (perhaps > significantly) since otherwise every vacuum keeps having to scan the > heap for dead tuples on pages where we know there are none? Those > pages could conceptually be frozen and ignored, but are not frozen > because of the default behavior, correct? The essential part of this patch is to trigger vacuum *at all* on an insert-only table. Before today's updated patch, it also used FREEZE on any table which hit the new insert threshold. The concern I raised is for insert-MOSTLY tables. I thought it might be an issue if repeatedly freezing updated tuples caused vacuum to be too slow, especially if they're distributed in pages all across the table rather than clustered. And I asked that the behavior (FREEZE) be configurable by a separate setting than the one that triggers autovacuum to run. FREEZE is already controlled by the vacuum_freeze_table_age param. I think you're right that VACUUM FREEZE on an insert-only table would be less expensive than vacuum once without freeze and vacuum again later, which uses freeze. To me, that suggests setting vacuum_freeze_table_age to a low value on those tables. Regular vacuum avoids scanning all-visible pages, so for an insert-only table pages should only be vacuumed once (if frozen the 1st time) or twice (if not). * Except when aggressive is set, we want to skip pages that are * all-visible according to the visibility map, but only when we can skip postgres=# CREATE TABLE t (i int) ; INSERT INTO t SELECT generate_series(1,999999); VACUUM VERBOSE t; VACUUM VERBOSE t; ... INFO: "t": found 0 removable, 999999 nonremovable row versions in 4425 out of 4425 pages ... VACUUM Time: 106.038 ms INFO: "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 pages VACUUM Time: 1.828 ms => That's its not very clear way of saying that it only scanned 1 page the 2nd time around. > We have tables that log each change to a business object (as I suspect > many transactional workloads do), and I've often thought that > immediately freeze every page as soon as it fills up would be a real > win for us. > > If that's all true, it seems to me that removing that part of the > patch significantly lowers its value. > If we opportunistically freeze only if we're already dirtying a page, > would that help a truly insert-only workload? E.g., are there hint > bits on the page that would need to change the first time we vacuum a > full page with no dead tuples? I would have assumed the answer was > "no" (since if so I think it would follow that _all_ pages need > updated the first time they're vacuumed?). 
You probably know that hint bits are written by the first process to access the tuple after it was written. I think you're asking if the first *vacuum* requires additional writes beyond that. And I think vacuum wouldn't touch the page until it decides to freeze tuples. I do have a patch to display the number of hint bits written and pages frozen. https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com > But if that's the case, then this kind of opportunistic freezing wouldn't > help this kind of workload. Maybe there's something I'm misunderstanding > about how vacuum works though. I am reminding myself about vacuum with increasing frequency and usually still learn something new. -- Justin
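To restate the skipping rule Justin's VACUUM VERBOSE demo relies on, in very rough terms (a toy model only; the real logic in vacuumlazy.c also batches skips into runs of blocks before acting on them):

    #include <stdbool.h>

    /*
     * Toy model of which heap blocks a vacuum may skip: a non-aggressive
     * vacuum can skip blocks the visibility map reports as all-visible,
     * while an aggressive (anti-wraparound) vacuum may only skip blocks
     * that are already all-frozen.  Hence an insert-only table's pages are
     * scanned once (or twice, if not frozen the first time) and then left
     * alone until an aggressive vacuum comes along.
     */
    static bool
    can_skip_block(bool aggressive, bool all_visible, bool all_frozen)
    {
        return aggressive ? all_frozen : all_visible;
    }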
Hi, On 2020-03-17 21:58:53 -0400, James Coleman wrote: > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > > > I think Andres was thinking this would maybe be an optimization independent of > > > > is_insert_only (?) > > > > > > I wasn't sure. > > > > I'm not sure myself - but I'm doubtful that using a 0 min age by default > > will be ok. > > > > I was trying to say (in a later email) that I think it might be a good > > compromise to opportunistically freeze if we're dirtying the page > > anyway, but not optimize WAL emission etc. That's a pretty simple > > change, and it'd address a lot of the potential performance regressions, > > while still freezing for the "first" vacuum in insert only workloads. > > If we have truly insert-only tables, then doesn't vacuuming with > freezing every tuple actually decrease total vacuum cost (perhaps > significantly) since otherwise every vacuum keeps having to scan the > heap for dead tuples on pages where we know there are none? Those > pages could conceptually be frozen and ignored, but are not frozen > because of the default behavior, correct? Yes. > If that's all true, it seems to me that removing that part of the > patch significantly lowers its value. Well, perfect sometimes is the enemy of the good. We gotta get something in, and having some automated vacuuming for insert mostly/only tables is a huge step forward. And avoiding regressions is an important part of doing so. I outlined the steps we could take to allow for more aggressive vacuuming upthread. > If we opportunistically freeze only if we're already dirtying a page, > would that help a truly insert-only workload? Yes. > E.g., are there hint bits on the page that would need to change the > first time we vacuum a full page with no dead tuples? Yes. HEAP_XMIN_COMMITTED. > I would have assumed the answer was "no" (since if so I think it would > follow that _all_ pages need updated the first time they're > vacuumed?). That is the case. Although they might already be set when the tuples are accessed for other reasons. Greetings, Andres Freund
On Tue, Mar 17, 2020 at 11:37 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote: > > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund <andres@anarazel.de> wrote: > > > > > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > > > > I think Andres was thinking this would maybe be an optimization independent of > > > > > is_insert_only (?) > > > > > > > > I wasn't sure. > > > > > > I'm not sure myself - but I'm doubtful that using a 0 min age by default > > > will be ok. > > > > > > I was trying to say (in a later email) that I think it might be a good > > > compromise to opportunistically freeze if we're dirtying the page > > > anyway, but not optimize WAL emission etc. That's a pretty simple > > > change, and it'd address a lot of the potential performance regressions, > > > while still freezing for the "first" vacuum in insert only workloads. > > > > If we have truly insert-only tables, then doesn't vacuuming with > > freezing every tuple actually decrease total vacuum cost (perhaps > > significantly) since otherwise every vacuum keeps having to scan the > > heap for dead tuples on pages where we know there are none? Those > > pages could conceptually be frozen and ignored, but are not frozen > > because of the default behavior, correct? > > The essential part of this patch is to trigger vacuum *at all* on an > insert-only table. Before today's updated patch, it also used FREEZE on any > table which hit the new insert threshold. The concern I raised is for > insert-MOSTLY tables. I thought it might be an issue if repeatedly freezing > updated tuples caused vacuum to be too slow, especially if they're distributed > in pages all across the table rather than clustered. Yeah, for some reason I'd completely forgotten (caught up in thinking about the best possible outcome re: freezing insert only tables) that the bigger problem was just triggering vacuum at all on those tables. > And I asked that the behavior (FREEZE) be configurable by a separate setting > than the one that triggers autovacuum to run. FREEZE is already controlled by > the vacuum_freeze_table_age param. > > I think you're right that VACUUM FREEZE on an insert-only table would be less > expensive than vacuum once without freeze and vacuum again later, which uses > freeze. To me, that suggests setting vacuum_freeze_table_age to a low value on > those tables. > > Regular vacuum avoids scanning all-visible pages, so for an insert-only table > pages should only be vacuumed once (if frozen the 1st time) or twice (if not). > > * Except when aggressive is set, we want to skip pages that are > * all-visible according to the visibility map, but only when we can skip > > postgres=# CREATE TABLE t (i int) ; INSERT INTO t SELECT generate_series(1,999999); VACUUM VERBOSE t; VACUUM VERBOSE t; > ... > INFO: "t": found 0 removable, 999999 nonremovable row versions in 4425 out of 4425 pages > ... > VACUUM > Time: 106.038 ms > INFO: "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 pages > VACUUM > Time: 1.828 ms > > => That's its not very clear way of saying that it only scanned 1 page the 2nd > time around. I didn't realize that about the visibility map being taken into account. > > We have tables that log each change to a business object (as I suspect > > many transactional workloads do), and I've often thought that > > immediately freeze every page as soon as it fills up would be a real > > win for us. 
> > > > If that's all true, it seems to me that removing that part of the > > patch significantly lowers its value. > > > If we opportunistically freeze only if we're already dirtying a page, > > would that help a truly insert-only workload? E.g., are there hint > > bits on the page that would need to change the first time we vacuum a > > full page with no dead tuples? I would have assumed the answer was > > "no" (since if so I think it would follow that _all_ pages need > > updated the first time they're vacuumed?). > > You probably know that hint bits are written by the first process to access the > tuple after it was written. I think you're asking if the first *vacuum* > requires additional writes beyond that. And I think vacuum wouldn't touch the > page until it decides to freeze tuples. I think my assumption is that (at least in our case), the first process to access will definitely not be vacuum on any regular basis. > I do have a patch to display the number of hint bits written and pages frozen. > https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com I'll take a look at that too. > > But if that's the case, then this kind of opportunistic freezing wouldn't > > help this kind of workload. Maybe there's something I'm misunderstanding > > about how vacuum works though. > > I am reminding myself about vacuum with increasing frequency and usually still > learn something new. For sure. Thanks, James
On Wed, Mar 18, 2020 at 1:08 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-03-17 21:58:53 -0400, James Coleman wrote: > > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund <andres@anarazel.de> wrote: > > > > > > Hi, > > > > > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > > > > I think Andres was thinking this would maybe be an optimization independent of > > > > > is_insert_only (?) > > > > > > > > I wasn't sure. > > > > > > I'm not sure myself - but I'm doubtful that using a 0 min age by default > > > will be ok. > > > > > > I was trying to say (in a later email) that I think it might be a good > > > compromise to opportunistically freeze if we're dirtying the page > > > anyway, but not optimize WAL emission etc. That's a pretty simple > > > change, and it'd address a lot of the potential performance regressions, > > > while still freezing for the "first" vacuum in insert only workloads. > > > > If we have truly insert-only tables, then doesn't vacuuming with > > freezing every tuple actually decrease total vacuum cost (perhaps > > significantly) since otherwise every vacuum keeps having to scan the > > heap for dead tuples on pages where we know there are none? Those > > pages could conceptually be frozen and ignored, but are not frozen > > because of the default behavior, correct? > > Yes. > > > > If that's all true, it seems to me that removing that part of the > > patch significantly lowers its value. > > Well, perfect sometimes is the enemy of the good. We gotta get something > in, and having some automated vacuuming for insert mostly/only tables is > a huge step forward. And avoiding regressions is an important part of > doing so. Yep, as I responded to Justin, in thinking about the details I'd lost sight of the biggest issue. So I withdraw that concern in favor of getting something out that improves things now. ... > > If we opportunistically freeze only if we're already dirtying a page, > > would that help a truly insert-only workload? > > Yes. Only if some other process hasn't already read and caused hint bits to be written, correct? Or am I missing something there too? > > E.g., are there hint bits on the page that would need to change the > > first time we vacuum a full page with no dead tuples? > > Yes. HEAP_XMIN_COMMITTED. This can be set opportunistically by other non-vacuum processes though? > > I would have assumed the answer was "no" (since if so I think it would > > follow that _all_ pages need updated the first time they're > > vacuumed?). > > That is the case. Although they might already be set when the tuples are > accessed for other reasons. Ah, I think this is answering what I'd asked above. I'm very excited to see improvements in flight on this use case. Thanks, James
On Tue, 2020-03-17 at 17:26 -0700, Andres Freund wrote: > On 2020-03-17 01:14:02 +0100, Laurenz Albe wrote: > > lazy_check_needs_freeze() is only called for an aggressive vacuum, which > > this isn't. > > Hm? I mean some of these will be aggressive vacuums, because it's older > than vacuum_freeze_table_age? And the lower age limit would make that > potentially more painful, no? You are right. I thought of autovacuum_freeze_max_age, but not of vacuum_freeze_table_age. Autovacuum configuration is so woefully complicated that it makes me feel bad to propose two more parameters :^( Yours, Laurenz Albe
On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote: > I don't think a default scale factor of 0 is going to be ok. For > large-ish tables this will basically cause permanent vacuums. And it'll > sometimes trigger for tables that actually coped well so far. 10 million > rows could be a few seconds, not more. > > I don't think that the argument that otherwise a table might not get > vacuumed before autovacuum_freeze_max_age is convincing enough. > > a) if that's indeed the argument, we should increase the default > autovacuum_freeze_max_age - now that there's insert triggered vacuums, > the main argument against that from before isn't valid anymore. > > b) there's not really a good arguments for vacuuming more often than > autovacuum_freeze_max_age for such tables. It'll not be not frequent > enough to allow IOS for new data, and you're not preventing > anti-wraparound vacuums from happening. According to my reckoning, that is the remaining objection to the patch as it is (with ordinary freezing behavior). How about a scale_factor of 0.005? That will be high enough for large tables, which seem to be the main concern here. I fully agree with your point a) - should that be part of the patch? I am not sure about b). In my mind, the objective is not to prevent anti-wraparound vacuums, but to see that they have less work to do, because previous autovacuum runs already have frozen anything older than vacuum_freeze_min_age. So, assuming linear growth, the number of tuples to freeze during any run would be at most one fourth of today's number when we hit autovacuum_freeze_max_age. I am still sorry to see more proactive freezing go, which would reduce the impact for truly insert-only tables. After sleeping on it, here is one last idea. Granted, freezing with vacuum_freeze_min_age = 0 poses a problem for those parts of the table that will receive updates or deletes. But what if insert-triggered vacuum operates with - say - one tenth of vacuum_freeze_min_age (unless explicitly overridden for the table)? That might still be high enough not to needlessly freeze too many tuples that will still be modified, but it will reduce the impact on insert-only tables. Yours, Laurenz Albe
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote: > On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote: > > Having now played with the patch, I'll suggest that 10000000 is too high a > > threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be > > much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC. > > ISTM that the danger of regressing workloads due to suddenly repeatedly > scanning huge indexes that previously were never / rarely scanned is > significant (if there's a few dead tuples, otherwise most indexes will > be able to skip the scan since the vacuum_cleanup_index_scale_factor > introduction)). We could try to avoid that issue here: | /* If any tuples need to be deleted, perform final vacuum cycle */ | /* XXX put a threshold on min number of tuples here? */ | if (dead_tuples->num_tuples > 0) | { | /* Work on all the indexes, and then the heap */ | lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, | lps, nindexes); | | /* Remove tuples from heap */ | lazy_vacuum_heap(onerel, vacrelstats); | } As you said, an insert-only table can skip scanning indexes, but an insert-mostly table currently cannot. Maybe we could skip the final index scan if we hit the autovacuum insert threshold? I still don't like mixing the thresholds with the behavior they imply, but maybe what's needed is better docs describing all of vacuum's roles and its procedure and priority in executing them. The dead tuples would just be cleaned up during a future vacuum, right ? So that would be less efficient, but (no surprise) there's a balance to strike and that can be tuned. I think that wouldn't be an issue for most people; the worst case would be if you set high maint_work_mem, and low insert threshold, and you got increased bloat. But faster vacuum if we avoided idx scans. That might allow more flexibility in our discussion around default values for thresholds for insert-triggered vacuum. -- Justin
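A standalone sketch of the kind of guard Justin is floating here; the "insert_triggered" flag and the minimum-dead-tuples knob are hypothetical -- today the quoted code vacuums the indexes whenever any dead tuples were collected, unless index cleanup is disabled:

    #include <stdbool.h>

    /*
     * Hypothetical: skip the index-scan + heap-vacuum cycle when the vacuum
     * was triggered by the insert threshold and only a handful of dead
     * tuples were found; they stay behind for a later vacuum to clean up.
     */
    static bool
    should_vacuum_indexes(long dead_tuples, bool insert_triggered,
                          long min_dead_for_index_vacuum)
    {
        if (dead_tuples == 0)
            return false;   /* nothing to remove */
        if (insert_triggered && dead_tuples < min_dead_for_index_vacuum)
            return false;   /* defer cleanup, trading some bloat for speed */
        return true;
    }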
On Thu, 19 Mar 2020 at 18:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote: > > I don't think a default scale factor of 0 is going to be ok. For > > large-ish tables this will basically cause permanent vacuums. And it'll > > sometimes trigger for tables that actually coped well so far. 10 million > > rows could be a few seconds, not more. > > > > I don't think that the argument that otherwise a table might not get > > vacuumed before autovacuum_freeze_max_age is convincing enough. > > > > a) if that's indeed the argument, we should increase the default > > autovacuum_freeze_max_age - now that there's insert triggered vacuums, > > the main argument against that from before isn't valid anymore. > > > > b) there's not really a good arguments for vacuuming more often than > > autovacuum_freeze_max_age for such tables. It'll not be not frequent > > enough to allow IOS for new data, and you're not preventing > > anti-wraparound vacuums from happening. > > According to my reckoning, that is the remaining objection to the patch > as it is (with ordinary freezing behavior). > > How about a scale_factor od 0.005? That will be high enough for large > tables, which seem to be the main concern here. I agree with that, however, I'd thought 0.01, just so we're still close to having about 100 times less work to do for huge insert-only tables when it comes to having to perform an anti-wraparound vacuum. > I fully agree with your point a) - should that be part of the patch? I think it will be a good idea to increase this, but I really don't think this patch should be touching it. It's something to put on the issues list for after the CF so more people have the bandwidth to chip in their thoughts. > I am not sure about b). In my mind, the objective is not to prevent > anti-wraparound vacuums, but to see that they have less work to do, > because previous autovacuum runs already have frozen anything older than > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples > to freeze during any run would be at most one fourth of today's number > when we hit autovacuum_freeze_max_age. I hear what Andres is saying about proactive freezing for already dirty pages. I think that's worth looking into, but don't feel like we need to do it for this patch. The patch is worthy without it and such a change affects more than insert-vacuums, so should be a separate commit. If people really do have an insert-only table then we can recommend that they set the table's autovacuum_freeze_min_age to 0. > I am still sorry to see more proactive freezing go, which would > reduce the impact for truly insert-only tables. > After sleeping on it, here is one last idea. > > Granted, freezing with vacuum_freeze_min_age = 0 poses a problem > for those parts of the table that will receive updates or deletes. > But what if insert-triggered vacuum operates with - say - > one tenth of vacuum_freeze_min_age (unless explicitly overridden > for the table)? That might still be high enough not to needlessly > freeze too many tuples that will still be modified, but it will > reduce the impact on insert-only tables. I think that might be a bit too magical and may not be what some people want. I know that most people won't set autovacuum_freeze_min_age to 0 for insert-only tables, but we can at least throw something in the documents to mention it's a good idea, however, looking over the docs I'm not too sure the best place to note that down. 
I've attached a small fix which I'd like to apply to your v8 patch. With that, and pending one final look, I'd like to push this during my Monday (New Zealand time). So if anyone strongly objects to that, please state their case before then. David
Attachment
On Thu, 19 Mar 2020 at 19:07, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote: > > On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote: > > > Having now played with the patch, I'll suggest that 10000000 is too high a > > > threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be > > > much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC. > > > > ISTM that the danger of regressing workloads due to suddenly repeatedly > > scanning huge indexes that previously were never / rarely scanned is > > significant (if there's a few dead tuples, otherwise most indexes will > > be able to skip the scan since the vacuum_cleanup_index_scale_factor > > introduction)). > > We could try to avoid that issue here: > > | /* If any tuples need to be deleted, perform final vacuum cycle */ > | /* XXX put a threshold on min number of tuples here? */ > | if (dead_tuples->num_tuples > 0) > | { > | /* Work on all the indexes, and then the heap */ > | lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, > | lps, nindexes); > | > | /* Remove tuples from heap */ > | lazy_vacuum_heap(onerel, vacrelstats); > | } > > As you said, an insert-only table can skip scanning indexes, but an > insert-mostly table currently cannot. > > Maybe we could skip the final index scan if we hit the autovacuum insert > threshold? > > I still don't like mixing the thresholds with the behavior they imply, but > maybe what's needed is better docs describing all of vacuum's roles and its > procedure and priority in executing them. > > The dead tuples would just be cleaned up during a future vacuum, right ? So > that would be less efficient, but (no surprise) there's a balance to strike and > that can be tuned. I think that wouldn't be an issue for most people; the > worst case would be if you set high maint_work_mem, and low insert threshold, > and you got increased bloat. But faster vacuum if we avoided idx scans. > > That might allow more flexibility in our discussion around default values for > thresholds for insert-triggered vacuum. We went over this a bit already. The risk is that if you have an insert-mostly table and always trigger an auto-vacuum for inserts and never due to dead tuples, then you'll forego the index cleanup every time causing the indexes to bloat over time. I think any considerations to add some sort of threshold on dead tuples before cleaning the index should be considered independently. Trying to get everyone to agree to what's happening here is hard enough without adding more options to the list. I understand that there may be small issues with insert-only tables with a tiny number of dead tuples, perhaps due to aborts could cause some issues while scanning the index, but that's really one of the big reasons why the 10 million insert threshold has been added. Just in the past few hours we've talked about having a very small scale factor to protect from over-vacuum on huge tables that see 10 million tuples inserted in short spaces of time. I think that's a good compromise, however, certainly not perfect. David
On Thu, Mar 19, 2020 at 09:52:11PM +1300, David Rowley wrote: > On Thu, 19 Mar 2020 at 19:07, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote: > > > On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote: > > > > Having now played with the patch, I'll suggest that 10000000 is too high a > > > > threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be > > > > much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC. > > > > > > ISTM that the danger of regressing workloads due to suddenly repeatedly > > > scanning huge indexes that previously were never / rarely scanned is > > > significant (if there's a few dead tuples, otherwise most indexes will > > > be able to skip the scan since the vacuum_cleanup_index_scale_factor > > > introduction)). > > > > We could try to avoid that issue here: > > > > | /* If any tuples need to be deleted, perform final vacuum cycle */ > > | /* XXX put a threshold on min number of tuples here? */ > > | if (dead_tuples->num_tuples > 0) > > | { > > | /* Work on all the indexes, and then the heap */ > > | lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, > > | lps, nindexes); > > | > > | /* Remove tuples from heap */ > > | lazy_vacuum_heap(onerel, vacrelstats); > > | } > > > > As you said, an insert-only table can skip scanning indexes, but an > > insert-mostly table currently cannot. > > > > Maybe we could skip the final index scan if we hit the autovacuum insert > > threshold? > > > > I still don't like mixing the thresholds with the behavior they imply, but > > maybe what's needed is better docs describing all of vacuum's roles and its > > procedure and priority in executing them. > > > > The dead tuples would just be cleaned up during a future vacuum, right ? So > > that would be less efficient, but (no surprise) there's a balance to strike and > > that can be tuned. I think that wouldn't be an issue for most people; the > > worst case would be if you set high maint_work_mem, and low insert threshold, > > and you got increased bloat. But faster vacuum if we avoided idx scans. > > > > That might allow more flexibility in our discussion around default values for > > thresholds for insert-triggered vacuum. > > We went over this a bit already. The risk is that if you have an > insert-mostly table and always trigger an auto-vacuum for inserts and > never due to dead tuples, then you'll forego the index cleanup every > time causing the indexes to bloat over time. At the time, we were talking about skipping index *cleanup* phase. Which also incurs an index scan. >+ tab->at_params.index_cleanup = insert_only ? VACOPT_TERNARY_DISABLED : VACOPT_TERNARY_DEFAULT; We decided not to skip this, since it would allow index bloat, if vacuum were only ever run due to inserts, and cleanup never happened. I'm suggesting the possibility of skipping not index *cleanup* but index (and heap) *vacuum*. So that saves an index scan itself, and I think implies later skipping cleanup (since no index tuples were removed). But more importantly, I think if we skip that during an insert-triggered vacuum, the dead heap tuples are still there during the next vacuum instance. So unlike index cleanup (where we don't keep track of the total number of dead index tuples), this can accumulate over time, and eventually trigger index+heap vacuum, and cleanup. > I think any considerations to add some sort of threshold on dead > tuples before cleaning the index should be considered independently. 
+1, yes. I'm hoping to anticipate and mitigate any objections and regressions more than raise them. -- Justin
On Thu, 2020-03-19 at 21:39 +1300, David Rowley wrote: > > According to my reckoning, that is the remaining objection to the patch > > as it is (with ordinary freezing behavior). > > > > How about a scale_factor od 0.005? That will be high enough for large > > tables, which seem to be the main concern here. > > I agree with that, however, I'd thought 0.01, just so we're still > close to having about 100 times less work to do for huge insert-only > tables when it comes to having to perform an anti-wraparound vacuum. Fine with me. > > I am still sorry to see more proactive freezing go, which would > > reduce the impact for truly insert-only tables. > > After sleeping on it, here is one last idea. > > > > Granted, freezing with vacuum_freeze_min_age = 0 poses a problem > > for those parts of the table that will receive updates or deletes. > > But what if insert-triggered vacuum operates with - say - > > one tenth of vacuum_freeze_min_age (unless explicitly overridden > > for the table)? That might still be high enough not to needlessly > > freeze too many tuples that will still be modified, but it will > > reduce the impact on insert-only tables. > > I think that might be a bit too magical and may not be what some > people want. I know that most people won't set > autovacuum_freeze_min_age to 0 for insert-only tables, but we can at > least throw something in the documents to mention it's a good idea, > however, looking over the docs I'm not too sure the best place to note > that down. I was afraid that idea would be too cute to appeal. > I've attached a small fix which I'd like to apply to your v8 patch. > With that, and pending one final look, I'd like to push this during my > Monday (New Zealand time). So if anyone strongly objects to that, > please state their case before then. Thanks! I have rolled your edits into the attached patch v9, rebased against current master. Yours, Laurenz Albe
Attachment
Hi, On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote: > On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote: > > I don't think a default scale factor of 0 is going to be ok. For > > large-ish tables this will basically cause permanent vacuums. And it'll > > sometimes trigger for tables that actually coped well so far. 10 million > > rows could be a few seconds, not more. > > > > I don't think that the argument that otherwise a table might not get > > vacuumed before autovacuum_freeze_max_age is convincing enough. > > > > a) if that's indeed the argument, we should increase the default > > autovacuum_freeze_max_age - now that there's insert triggered vacuums, > > the main argument against that from before isn't valid anymore. > > > > b) there's not really a good arguments for vacuuming more often than > > autovacuum_freeze_max_age for such tables. It'll not be not frequent > > enough to allow IOS for new data, and you're not preventing > > anti-wraparound vacuums from happening. > > According to my reckoning, that is the remaining objection to the patch > as it is (with ordinary freezing behavior). > > How about a scale_factor od 0.005? That will be high enough for large > tables, which seem to be the main concern here. Seems low at first blush. On a large-ish table with 1 billion tuples, we'd vacuum every 5 million inserts. For many ETL workloads this will result in a vacuum after every bulk operation. Potentially with an index scan associated (even if there are no errors, a lot of bulk loads use ON CONFLICT INSERT leading to the occasional update). Personally I think we should be considerably more conservative in the first release or two. Exposing a lot of people that previously didn't have a lot of problems to vacuuming being *massively* more aggressive, basically permanently running on an insert-only table, will be bad. > I fully agree with your point a) - should that be part of the patch? We can just make it a separate patch committed shortly afterwards. > I am not sure about b). In my mind, the objective is not to prevent > anti-wraparound vacuums, but to see that they have less work to do, > because previous autovacuum runs already have frozen anything older than > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples > to freeze during any run would be at most one fourth of today's number > when we hit autovacuum_freeze_max_age. This whole chain of arguments seems like it actually has little to do with vacuuming insert only/mostly tables. The same problem exists for tables that aren't insert only/mostly. Instead it IMO is an argument for a general change in logic about when to freeze. What exactly is it that you want to achieve by having anti-wrap vacuums be quicker? If the goal is to reduce the window in which autovacuums aren't automatically cancelled when there's a conflicting lock request, or in which autovacuum just schedules based on xid age, then you can't have wraparound vacuums needing to do a substantial amount of work. Except for not auto-cancelling, and the autovac scheduling issue, there's really nothing magic about anti-wrap vacuums. If the goal is to avoid redundant writes, then it's largely unrelated to anti-wrap vacuums, and can to a large degree be addressed by opportunistically freezing (best even during hot pruning!). I am more and more convinced that it's a seriously bad idea to tie committing "autovacuum after inserts" to also committing a change in logic around freezing.
That's not to say we shouldn't try to address both this cycle, but discussing them as if they really are one item makes it both more likely that we get nothing in, and more likely that we miss the larger picture.

> I am still sorry to see more proactive freezing go, which would > reduce the impact for truly insert-only tables. > After sleeping on it, here is one last idea. > > Granted, freezing with vacuum_freeze_min_age = 0 poses a problem > for those parts of the table that will receive updates or deletes.

IMO it's not at all just those regions that are potentially negatively affected: If there are no other modifications to the page, more aggressive freezing can lead to seriously increased write volume. It's quite normal to have databases where data in insert-only tables *never* gets old enough to need to be frozen (either because xid usage is low, or because older partitions are dropped). If data in an insert-only table isn't write-only, the hint bits are likely to already be set, which means that vacuum will just cause the entire table to be written another time, without a reason.

I don't see how it's ok to substantially regress this very common workload. IMO this basically means that more aggressive and non-opportunistic freezing simply is a no-go (be it for insert or other causes for vacuuming).

What am I missing?

Greetings, Andres Freund
> > According to my reckoning, that is the remaining objection to the patch > > as it is (with ordinary freezing behavior). > > > > How about a scale_factor od 0.005? That will be high enough for large > > tables, which seem to be the main concern here. > > Seems low on a first blush. On a large-ish table with 1 billion tuples, > we'd vacuum every 5 million inserts. For many ETL workloads this will > result in a vacuum after every bulk operation. Potentially with an index > scan associated (even if there's no errors, a lot of bulk loads use ON > CONFLICT INSERT leading to the occasional update). This is a good and wanted thing. Upthread it was already suggested that "everyone knows to vacuum after bulk operations". This will go and vacuum the data while it's hot and in caches, not afterwards, reading from disk. > > I am not sure about b). In my mind, the objective is not to prevent > > anti-wraparound vacuums, but to see that they have less work to do, > > because previous autovacuum runs already have frozen anything older than > > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples > > to freeze during any run would be at most one fourth of today's number > > when we hit autovacuum_freeze_max_age. > > This whole chain of arguments seems like it actually has little to do > with vacuuming insert only/mostly tables. The same problem exists for > tables that aren't insert only/mostly. Instead it IMO is an argument for > a general change in logic about when to freeze. > > What exactly is it that you want to achieve by having anti-wrap vacuums > be quicker? If the goal is to reduce the window in which autovacuums > aren't automatically cancelled when there's a conflicting lock request, > or in which autovacuum just schedules based on xid age, then you can't > have wraparound vacuums needing to do substantial amount of work. The problem hit by Mandrill is simple: in modern cloud environments it's sometimes simply impossible to read all the data on disk because of different kinds of throttling. At some point your production database just shuts down and asks to VACUUM in single user mode for 40 days. You want vacuum to happen long before that, preferably when the data is still in RAM, or, at least, fits your cloud provider's disk burst performance budget, where performance of block device resembles that of an SSD and not of a Floppy Disk. Some more reading on how that works: https://aws.amazon.com/ru/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/ -- Darafei Praliaskouski Support me: http://patreon.com/komzpa
Hi, On 2020-03-19 20:47:40 +0100, Laurenz Albe wrote: > On Thu, 2020-03-19 at 21:39 +1300, David Rowley wrote: > > I've attached a small fix which I'd like to apply to your v8 patch. > > With that, and pending one final look, I'd like to push this during my > > Monday (New Zealand time). So if anyone strongly objects to that, > > please state their case before then. I am doubtful it should be committed with the current settings. See below. > From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001 > From: Laurenz Albe <laurenz.albe@cybertec.at> > Date: Thu, 19 Mar 2020 20:26:43 +0100 > Subject: [PATCH] Autovacuum tables that have received only inserts > > Add "autovacuum_vacuum_insert_threshold" and > "autovacuum_vacuum_insert_scale_factor" GUC and reloption. > The default value for the threshold is 10000000; > the scale factor defaults to 0.01. > > Any table that has received more inserts since it was > last vacuumed (and that is not vacuumed for another > reason) will be autovacuumed. > > This avoids the known problem that insert-only tables > are never autovacuumed until they need to have their > anti-wraparound autovacuum, which then can be massive > and disruptive. Shouldn't this also mention index only scans? IMO that's at least as big a problem as the "large vacuum" problem. > + <varlistentry id="guc-autovacuum-vacuum-insert-threshold" xreflabel="autovacuum_vacuum_insert_threshold"> > + <term><varname>autovacuum_vacuum_insert_threshold</varname> (<type>integer</type>) > + <indexterm> > + <primary><varname>autovacuum_vacuum_insert_threshold</varname></primary> > + <secondary>configuration parameter</secondary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Specifies the number of inserted tuples needed to trigger a > + <command>VACUUM</command> in any one table. > + The default is 10000000 tuples. > + This parameter can only be set in the <filename>postgresql.conf</filename> > + file or on the server command line; > + but the setting can be overridden for individual tables by > + changing table storage parameters. > + </para> > + </listitem> > + </varlistentry> > + > <varlistentry id="guc-autovacuum-analyze-threshold" xreflabel="autovacuum_analyze_threshold"> > <term><varname>autovacuum_analyze_threshold</varname> (<type>integer</type>) > <indexterm> > @@ -7342,6 +7362,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; > </listitem> > </varlistentry> > > + <varlistentry id="guc-autovacuum-vacuum-insert-scale-factor" xreflabel="autovacuum_vacuum_insert_scale_factor"> > + <term><varname>autovacuum_vacuum_insert_scale_factor</varname> (<type>floating point</type>) > + <indexterm> > + <primary><varname>autovacuum_vacuum_insert_scale_factor</varname></primary> > + <secondary>configuration parameter</secondary> > + </indexterm> > + </term> > + <listitem> > + <para> > + Specifies a fraction of the table size to add to > + <varname>autovacuum_vacuum_insert_threshold</varname> > + when deciding whether to trigger a <command>VACUUM</command>. > + The default is 0.01 (1% of table size). > + This parameter can only be set in the <filename>postgresql.conf</filename> > + file or on the server command line; > + but the setting can be overridden for individual tables by > + changing table storage parameters. > + </para> > + </listitem> > + </varlistentry> > + I am *VERY* doubtful that the attempt of using a large threshold, and a tiny scale factor, is going to work out well. 
I'm not confident enough in my gut feeling to full-throatedly object, but confident enough that I'd immediately change it on any important database I operated.

Independent of how large a constant you set the threshold to, for databases with substantially bigger tables this will lead to [near] constant vacuuming. As soon as you hit 1 billion rows - which isn't actually that much - this is equivalent to setting autovacuum_{vacuum,analyze}_scale_factor to 0.01. There are cases where that can be a sensible setting, but I don't think anybody would suggest it as a default.

After thinking about it for a while, I think it's fundamentally flawed to use large constant thresholds to avoid unnecessary vacuums. It's easy to see cases where it's bad for common databases of today, but it'll be much worse a few years down the line when common table sizes have grown by an order of magnitude or two. Nor do they address the difference between tables of a certain size with e.g. 2 kB wide rows, and a same-sized table with 28 byte wide rows. The point of constant thresholds imo can only be to avoid unnecessary work at the *small* (even tiny) end, not the opposite.

I think there's too much "reinventing" autovacuum scheduling in a "local" insert-only manner happening in this thread. And as far as I can tell additionally only looking at a somewhat narrow slice of insert only workloads.

I, again, strongly suggest using much more conservative values here. And then try to address the shortcomings - like not freezing aggressively enough - in separate patches (and by now separate releases, in all likelihood).

This will have a huge impact on a lot of postgres installations. Autovacuum already is perceived as one of the biggest issues around postgres. If the ratio of cases where these changes improve things to the cases it regresses isn't huge, it'll be painful (silent improvements are obviously less noticed than breakages).

Greetings, Andres Freund
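For readers following along, the per-table override mentioned in the proposed documentation above would look roughly like this once the patch is in - a sketch only, assuming the option names from the patch and a hypothetical table name:

ALTER TABLE page_views
    SET (autovacuum_vacuum_insert_threshold = 1000000,
         autovacuum_vacuum_insert_scale_factor = 0.05);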
Hi, On 2020-03-20 01:11:23 +0300, Darafei "Komяpa" Praliaskouski wrote: > > > According to my reckoning, that is the remaining objection to the patch > > > as it is (with ordinary freezing behavior). > > > > > > How about a scale_factor od 0.005? That will be high enough for large > > > tables, which seem to be the main concern here. > > > > Seems low on a first blush. On a large-ish table with 1 billion tuples, > > we'd vacuum every 5 million inserts. For many ETL workloads this will > > result in a vacuum after every bulk operation. Potentially with an index > > scan associated (even if there's no errors, a lot of bulk loads use ON > > CONFLICT INSERT leading to the occasional update). > > This is a good and wanted thing. I don't think that's true in general. As proposed this can increase the overall amount of IO (both reads and writes) due to vacuum by a *LOT*. > Upthread it was already suggested that "everyone knows to vacuum after > bulk operations". This will go and vacuum the data while it's hot and > in caches, not afterwards, reading from disk. For many bulk load cases the data will not be in cache, in particular not when individual bulk inserts are more than a few gigabytes. > The problem hit by Mandrill is simple: in modern cloud environments > it's sometimes simply impossible to read all the data on disk because > of different kinds of throttling. Yes. Which is one of the reasons why this has the potential to cause serious issues. The proposed changes very often will *increase* the total amount of IO. A good fraction of the time that will be "hidden" by caching, but it'll be far from all the time. > At some point your production database just shuts down and asks to > VACUUM in single user mode for 40 days. That basically has nothing to do with what we're talking about here. The default wraparound trigger is 200 million xids, and shutdowns start at more than 2 billion xids. If an anti-wrap autovacuum can't finish within 2 billion rows, then this won't be addressed by vacuuming more frequently (including more frequent index scans, causing a lot more IO!). Greetings, Andres Freund
On Fri, 20 Mar 2020 at 11:17, Andres Freund <andres@anarazel.de> wrote: > I think there's too much "reinventing" autovacuum scheduling in a > "local" insert-only manner happening in this thread. And as far as I can > tell additionally only looking at a somewhat narrow slice of insert only > workloads.

I understand your concern and you might be right. However, I think the main reason that the default settings for the new threshold and scale factor have deviated this far from the existing settings is the example of a large insert-only table that receives inserts of 1 row per xact. If we were to copy the existing settings then when that table gets to 1 billion rows, it would be eligible for an insert-vacuum after 200 million tuples/xacts, which does not help the situation since an anti-wraparound vacuum would be triggering then anyway.

I'm unsure if it will help with the discussion, but I put together a quick and dirty C program to show when a table will be eligible for an auto-vacuum with the given scale_factor and threshold:

$ gcc -O2 vacuum.c -o vacuum
$ ./vacuum
Syntax ./vacuum <scale_factor> <threshold> <maximum table size in rows>
$ ./vacuum 0.01 10000000 100000000000 | tail -n 1
Vacuum 463 at 99183465731 reltuples, 991915456 inserts
$ ./vacuum 0.2 50 100000000000 | tail -n 1
Vacuum 108 at 90395206733 reltuples, 15065868288 inserts

So, yeah, certainly, there are more than four times as many vacuums with an insert-only table of 100 billion rows using the proposed settings vs the defaults for the existing scale_factor and threshold. However, at the tail end of the first run there, we were close to a billion rows (991,915,456) between vacuums. Is that excessive?

I'm sharing this in the hope that it'll make it easy to experiment with settings which we can all agree on. For a 1 billion row table, the proposed settings give us 69 vacuums and the standard settings 83.
Attachment
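For readers without the attachment, the arithmetic the program iterates (a vacuum fires once inserts exceed threshold + scale_factor * reltuples) can be sketched directly as a recursive query; this is a hedged re-creation of the simulation, not the attached vacuum.c itself:

WITH RECURSIVE vac(n, reltuples, inserts) AS (
    SELECT 0, 0::numeric, 0::numeric
    UNION ALL
    SELECT n + 1,
           reltuples + (10000000 + 0.01 * reltuples),   -- new reltuples after this vacuum
           10000000 + 0.01 * reltuples                   -- inserts needed to trigger: threshold + scale_factor * reltuples
    FROM vac
    WHERE reltuples + (10000000 + 0.01 * reltuples) <= 100000000000   -- stop at the maximum table size
)
SELECT n AS vacuum_no, reltuples::bigint AS reltuples, inserts::bigint AS inserts
FROM vac
WHERE n > 0
ORDER BY n DESC
LIMIT 1;   -- same idea as "| tail -n 1"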
Hi, On 2020-03-20 15:05:03 +1300, David Rowley wrote: > On Fri, 20 Mar 2020 at 11:17, Andres Freund <andres@anarazel.de> wrote: > > I think there's too much "reinventing" autovacuum scheduling in a > > "local" insert-only manner happening in this thread. And as far as I can > > tell additionally only looking at a somewhat narrow slice of insert only > > workloads. > > I understand your concern and you might be right. However, I think the > main reason that the default settings for the new threshold and scale > factor has deviated this far from the existing settings is regarding > the example of a large insert-only table that receives inserts of 1 > row per xact. If we were to copy the existing settings then when that > table gets to 1 billion rows, it would be eligible for an > insert-vacuum after 200 million tuples/xacts, which does not help the > situation since an anti-wraparound vacuum would be triggering then > anyway. Sure, that'd happen for inserts that happen after that threshold. I'm just not convinced that this is as huge a problem as presented in this thread. And I'm fairly convinced the proposed solution is the wrong direction to go into. It's not like that's not an issue for updates? If you update one row per transaction, then you run into exactly the same issue for a table of the same size? You maybe could argue that it's more common to insert 1 billion tuples in individual transaction, than it is to update 1 billion tuples in individual transactions, but I don't think it's a huge difference if it even exist. In fact the problem is worse for the update case, because that tends to generate a lot more random looking IO during vacuum (both because only parts of the table are updated causing small block reads/writes, and because it will need [multiple] index scans/vacuum, and because the vacuum is a lot more expensive CPU time wise). Imo this line of reasoning is about adding autovacuum scheduling based on xid age, not about insert only workloads. Greetings, Andres Freund
On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote: > I am doubtful it should be committed with the current settings. See below. > > > From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001 > > From: Laurenz Albe <laurenz.albe@cybertec.at> > > Date: Thu, 19 Mar 2020 20:26:43 +0100 > > Subject: [PATCH] Autovacuum tables that have received only inserts > > > > This avoids the known problem that insert-only tables > > are never autovacuumed until they need to have their > > anti-wraparound autovacuum, which then can be massive > > and disruptive. > > Shouldn't this also mention index only scans? IMO that's at least as big > a problem as the "large vacuum" problem. Yes, that would be good. > I am *VERY* doubtful that the attempt of using a large threshold, and a > tiny scale factor, is going to work out well. I'm not confident enough > in my gut feeling to full throatedly object, but confident enough that > I'd immediately change it on any important database I operated. > > Independent of how large a constant you set the threshold to, for > databases with substantially bigger tables this will lead to [near] > constant vacuuming. As soon as you hit 1 billion rows - which isn't > actually that much - this is equivalent to setting > autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where > that can be a sensible setting, but I don't think anybody would suggest > it as a default. In that, you are assuming that the bigger a table is, the more data modifications it will get, so that making the scale factor the dominant element will work out better. My experience is that it is more likely for the change rate (inserts, I am less certain about updates and deletes) to be independent of the table size. (Too) many large databases are so large not because the data influx grows linearly over time, but because people don't want to get rid of old data (or would very much like to do so, but never planned for it). This second scenario would be much better served by a high threshold and a low scale factor. > After thinking about it for a while, I think it's fundamentally flawed > to use large constant thresholds to avoid unnecessary vacuums. It's easy > to see cases where it's bad for common databases of today, but it'll be > much worse a few years down the line where common table sizes have grown > by a magnitude or two. Nor do they address the difference between tables > of a certain size with e.g. 2kb wide rows, and a same sized table with > 28 byte wide rows. The point of constant thresholds imo can only be to > avoid unnecessary work at the *small* (even tiny) end, not the opposite. > > > I think there's too much "reinventing" autovacuum scheduling in a > "local" insert-only manner happening in this thread. And as far as I can > tell additionally only looking at a somewhat narrow slice of insert only > workloads. Perhaps. The traditional "high scale factor, low threshold" system is (in my perception) mostly based on the objective of cleaning up dead tuples. When autovacuum was introduced, index only scans were only a dream. With the objective of getting rid of dead tuples, having the scale factor be the dominant part makes sense: it is OK for bloat to be a certain percentage of the table size. Also, as you say, tables were much smaller then, and they will only become bigger in the future. But I find that to be an argument *for* making the threshold the dominant element: otherwise, you vacuum less and less often, and the individual runs become larger and larger. 
Now that vacuum skips pages where it knows it has nothing to do, doesn't that take away much of the pain of vacuuming large tables where nothing much has changed?

> I, again, strongly suggest using much more conservative values here. And > then try to address the shortcomings - like not freezing aggressively > enough - in separate patches (and by now separate releases, in all > likelihood).

There is much to be said for that, I agree.

> This will have a huge impact on a lot of postgres > installations. Autovacuum already is perceived as one of the biggest > issues around postgres. If the ratio of cases where these changes > improve things to the cases it regresses isn't huge, it'll be painful > (silent improvements are obviously less noticed than breakages).

Yes, that makes it scary to mess with autovacuum. One of the problems I see in the course of this discussion is that one can always come up with examples that make any choice look bad. It is impossible to do it right for everybody.

In the light of that, I won't object to a more conservative default value for the parameters, even though my considerations above suggest to me the opposite. But perhaps my conclusions are based on flawed premises.

Yours, Laurenz Albe
On Thu, 2020-03-19 at 14:38 -0700, Andres Freund wrote: > > I am not sure about b). In my mind, the objective is not to prevent > > anti-wraparound vacuums, but to see that they have less work to do, > > because previous autovacuum runs already have frozen anything older than > > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples > > to freeze during any run would be at most one fourth of today's number > > when we hit autovacuum_freeze_max_age. > > This whole chain of arguments seems like it actually has little to do > with vacuuming insert only/mostly tables. The same problem exists for > tables that aren't insert only/mostly. Instead it IMO is an argument for > a general change in logic about when to freeze. My goal was to keep individual vacuum runs from having too much work to do. The freezing was an afterthought. The difference (for me) is that I am more convinced that the insert rate for insert-only table is constant over time than I am of the update rate to be constant. > What exactly is it that you want to achieve by having anti-wrap vacuums > be quicker? If the goal is to reduce the window in which autovacuums > aren't automatically cancelled when there's a conflicting lock request, > or in which autovacuum just schedules based on xid age, then you can't > have wraparound vacuums needing to do substantial amount of work. > > Except for not auto-cancelling, and the autovac scheduling issue, > there's really nothing magic about anti-wrap vacuums. Yes. I am under the impression that it is the duration and amount of work per vacuum run that is the problem here, not the aggressiveness as such. If you are in the habit of frequently locking tables with high lock modes (and I have seen people do that), you are lost anyway: normal autovacuum runs will always die, and anti-wraparound vacuum will kill you. There is nothing we can do about that, except perhaps put a fat warning in the documentation of LOCK. > If the goal is to avoid redundant writes, then it's largely unrelated to > anti-wrap vacuums, and can to a large degree addressed by > opportunistically freezing (best even during hot pruning!). > > > I am more and more convinced that it's a seriously bad idea to tie > committing "autovacuum after inserts" to also committing a change in > logic around freezing. That's not to say we shouldn't try to address > both this cycle, but discussing them as if they really are one item > makes it both more likely that we get nothing in, and more likely that > we miss the larger picture. I hear you, and I agree that we shouldn't do it with this patch. > If there are no other modifications to the page, more aggressively > freezing can lead to seriously increased write volume. Its quite normal > to have databases where data in insert only tables *never* gets old > enough to need to be frozen (either because xid usage is low, or because > older partitions are dropped). If data in an insert-only table isn't > write-only, the hint bits are likely to already be set, which means that > vacuum will just cause the entire table to be written another time, > without a reason. > > > I don't see how it's ok to substantially regress this very common > workload. IMO this basically means that more aggressively and > non-opportunistically freezing simply is a no-go (be it for insert or > other causes for vacuuming). > > What am I missing? Nothing that I can see, and these are good examples why eager freezing may not be such a smart idea after all. 
I think your idea of freezing everything on a page when we know it is going to be dirtied anyway is the smartest way of going about that. My only remaining quibbles are about scale factor and threshold, see my other mail. Yours, Laurenz Albe
Hi, On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote: > On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote: > > I don't think a default scale factor of 0 is going to be ok. For > > large-ish tables this will basically cause permanent vacuums. And it'll > > sometimes trigger for tables that actually coped well so far. 10 million > > rows could be a few seconds, not more. > > > > I don't think that the argument that otherwise a table might not get > > vacuumed before autovacuum_freeze_max_age is convincing enough. > > > > a) if that's indeed the argument, we should increase the default > > autovacuum_freeze_max_age - now that there's insert triggered vacuums, > > the main argument against that from before isn't valid anymore. > > > > b) there's not really a good arguments for vacuuming more often than > > autovacuum_freeze_max_age for such tables. It'll not be not frequent > > enough to allow IOS for new data, and you're not preventing > > anti-wraparound vacuums from happening. > > According to my reckoning, that is the remaining objection to the patch > as it is (with ordinary freezing behavior). > > How about a scale_factor od 0.005? That will be high enough for large > tables, which seem to be the main concern here. > > I fully agree with your point a) - should that be part of the patch? > > I am not sure about b). In my mind, the objective is not to prevent > anti-wraparound vacuums, but to see that they have less work to do, > because previous autovacuum runs already have frozen anything older than > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples > to freeze during any run would be at most one fourth of today's number > when we hit autovacuum_freeze_max_age. Based on two IM conversations I think it might be worth emphasizing how vacuum_cleanup_index_scale_factor works: For btree, even if there is not a single deleted tuple, we can *still* end up doing a full index scans at the end of vacuum. As the docs describe vacuum_cleanup_index_scale_factor: <para> Specifies the fraction of the total number of heap tuples counted in the previous statistics collection that can be inserted without incurring an index scan at the <command>VACUUM</command> cleanup stage. This setting currently applies to B-tree indexes only. </para> I.e. with the default settings we will perform a whole-index scan (without visibility map or such) after every 10% growth of the table. Which means that, even if the visibility map prevents repeated tables accesses, increasing the rate of vacuuming for insert-only tables can cause a lot more whole index scans. Which means that vacuuming an insert-only workload frequently *will* increase the total amount of IO, even if there is not a single dead tuple. Rather than just spreading the same amount of IO over more vacuums. And both gin and gist just always do a full index scan, regardless of vacuum_cleanup_index_scale_factor (either during a bulk delete, or during the cleanup). Thus more frequent vacuuming for insert-only tables can cause a *lot* of pain (even an approx quadratic increase of IO? O(increased_frequency * peak_index_size)?) if you have large indexes - which is very common for gin/gist. Is there something missing in the above description? Greetings, Andres Freund
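For context, the btree cleanup threshold Andres describes can also be raised per index, which is one way to keep more frequent insert-driven vacuums from forcing whole-index scans; a sketch with a hypothetical index name and a deliberately huge value:

ALTER INDEX page_views_pkey
    SET (vacuum_cleanup_index_scale_factor = 10000000000);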
Hi, On 2020-03-20 06:59:57 +0100, Laurenz Albe wrote: > On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote: > > I am *VERY* doubtful that the attempt of using a large threshold, and a > > tiny scale factor, is going to work out well. I'm not confident enough > > in my gut feeling to full throatedly object, but confident enough that > > I'd immediately change it on any important database I operated. > > > > Independent of how large a constant you set the threshold to, for > > databases with substantially bigger tables this will lead to [near] > > constant vacuuming. As soon as you hit 1 billion rows - which isn't > > actually that much - this is equivalent to setting > > autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where > > that can be a sensible setting, but I don't think anybody would suggest > > it as a default. > > In that, you are assuming that the bigger a table is, the more data > modifications it will get, so that making the scale factor the dominant > element will work out better. > My experience is that it is more likely for the change rate (inserts, > I am less certain about updates and deletes) to be independent of the > table size. (Too) many large databases are so large not because the > data influx grows linearly over time, but because people don't want to > get rid of old data (or would very much like to do so, but never planned > for it). I don't think growing ingest rate into insert only tables is exactly rare. Maybe I've been too long in the Bay Area though. > This second scenario would be much better served by a high threshold and > a low scale factor. I don't think that's really true. As soon as there's any gin/gist indexes, a single non-HOT dead tuple, or a btree index grew by more than vacuum_cleanup_index_scale_factor, indexes are scanned as a whole. See the email I just concurrently happened to write: https://postgr.es/m/20200320062031.uwagypenawujwajx%40alap3.anarazel.de Which means that often each additional vacuum causes IO that's proportional to the *total* index size, *not* the table size delta. Which means that the difference in total IO basically is O(increased_frequency * peak_table_size) in the worst case. > > After thinking about it for a while, I think it's fundamentally flawed > > to use large constant thresholds to avoid unnecessary vacuums. It's easy > > to see cases where it's bad for common databases of today, but it'll be > > much worse a few years down the line where common table sizes have grown > > by a magnitude or two. Nor do they address the difference between tables > > of a certain size with e.g. 2kb wide rows, and a same sized table with > > 28 byte wide rows. The point of constant thresholds imo can only be to > > avoid unnecessary work at the *small* (even tiny) end, not the opposite. > > > > > > I think there's too much "reinventing" autovacuum scheduling in a > > "local" insert-only manner happening in this thread. And as far as I can > > tell additionally only looking at a somewhat narrow slice of insert only > > workloads. > > Perhaps. The traditional "high scale factor, low threshold" system > is (in my perception) mostly based on the objective of cleaning up > dead tuples. When autovacuum was introduced, index only scans were > only a dream. > > With the objective of getting rid of dead tuples, having the scale factor > be the dominant part makes sense: it is OK for bloat to be a certain > percentage of the table size. 
As far as I can tell this argument doesn't make sense in light of the fact that many vacuums trigger whole index scans, even if there are no deleted tuples, as described above?

Even disregarding the index issue, I still don't think your argument is very convincing. For one, as I mentioned in another recent email, 10 million rows in a narrow table is something entirely different than 10 million rows in a very wide table. scale_factor doesn't have that problem to the same degree. Also, it's fairly obvious that this argument doesn't hold in the general sense, otherwise we could just set a threshold of, say, 10000.

There's also the issue that frequent vacuums will often not be able to mark most of the new data all-visible, due to concurrent sessions. E.g. concurrent bulk loading sessions, analytics queries actually looking at the data, and replicas can all easily prevent data that was just inserted from being marked 'all-visible' (not to speak of frozen). That's not likely to be a problem in a purely OLTP system that inserts only single rows per xact, and has no long-lived readers (nor replicas with hs_feedback = on), but outside of that...

> Also, as you say, tables were much smaller then, and they will only > become bigger in the future. But I find that to be an argument *for* > making the threshold the dominant element: otherwise, you vacuum less > and less often, and the individual runs become larger and larger.

Which mostly is ok, because there are significant costs that scale with the table size. And in a lot (but far from all!) of cases the benefits of vacuuming scale more with the overall table size than with the delta of the size.

> Now that vacuum skips pages where it knows it has nothing to do, > doesn't take away much of the pain of vacuuming large tables where > nothing much has changed?

Unfortunately not really.

Greetings, Andres Freund
On Fri, 20 Mar 2020 at 15:20, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote: > > On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote: > > > I don't think a default scale factor of 0 is going to be ok. For > > > large-ish tables this will basically cause permanent vacuums. And it'll > > > sometimes trigger for tables that actually coped well so far. 10 million > > > rows could be a few seconds, not more. > > > > > > I don't think that the argument that otherwise a table might not get > > > vacuumed before autovacuum_freeze_max_age is convincing enough. > > > > > > a) if that's indeed the argument, we should increase the default > > > autovacuum_freeze_max_age - now that there's insert triggered vacuums, > > > the main argument against that from before isn't valid anymore. > > > > > > b) there's not really a good arguments for vacuuming more often than > > > autovacuum_freeze_max_age for such tables. It'll not be not frequent > > > enough to allow IOS for new data, and you're not preventing > > > anti-wraparound vacuums from happening. > > > > According to my reckoning, that is the remaining objection to the patch > > as it is (with ordinary freezing behavior). > > > > How about a scale_factor od 0.005? That will be high enough for large > > tables, which seem to be the main concern here. > > > > I fully agree with your point a) - should that be part of the patch? > > > > I am not sure about b). In my mind, the objective is not to prevent > > anti-wraparound vacuums, but to see that they have less work to do, > > because previous autovacuum runs already have frozen anything older than > > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples > > to freeze during any run would be at most one fourth of today's number > > when we hit autovacuum_freeze_max_age. > > Based on two IM conversations I think it might be worth emphasizing how > vacuum_cleanup_index_scale_factor works: > > For btree, even if there is not a single deleted tuple, we can *still* > end up doing a full index scans at the end of vacuum. As the docs describe > vacuum_cleanup_index_scale_factor: > > <para> > Specifies the fraction of the total number of heap tuples counted in > the previous statistics collection that can be inserted without > incurring an index scan at the <command>VACUUM</command> cleanup stage. > This setting currently applies to B-tree indexes only. > </para> > > I.e. with the default settings we will perform a whole-index scan > (without visibility map or such) after every 10% growth of the > table. Which means that, even if the visibility map prevents repeated > tables accesses, increasing the rate of vacuuming for insert-only tables > can cause a lot more whole index scans. Which means that vacuuming an > insert-only workload frequently *will* increase the total amount of IO, > even if there is not a single dead tuple. Rather than just spreading the > same amount of IO over more vacuums. Right. > > And both gin and gist just always do a full index scan, regardless of > vacuum_cleanup_index_scale_factor (either during a bulk delete, or > during the cleanup). Thus more frequent vacuuming for insert-only > tables can cause a *lot* of pain (even an approx quadratic increase of > IO? O(increased_frequency * peak_index_size)?) if you have large > indexes - which is very common for gin/gist. 
That's right, but for gin, more frequent vacuuming of insert-only tables can help to clean up the pending list, which increases search speed and is better than leaving that work to a backend process.

Regards,

-- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
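For reference, the pending list can also be flushed explicitly with the core gin_clean_pending_list() function, rather than waiting for (auto)vacuum or leaving it to an inserting backend; the index name below is hypothetical:

SELECT gin_clean_pending_list('documents_fts_idx'::regclass);  -- returns the number of pending-list pages removed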
On Thu, 2020-03-19 at 23:20 -0700, Andres Freund wrote: > I am not sure about b). In my mind, the objective is not to prevent > > anti-wraparound vacuums, but to see that they have less work to do, > > because previous autovacuum runs already have frozen anything older than > > vacuum_freeze_min_age. So, assuming linear growth, the number of tuples > > to freeze during any run would be at most one fourth of today's number > > when we hit autovacuum_freeze_max_age. > > Based on two IM conversations I think it might be worth emphasizing how > vacuum_cleanup_index_scale_factor works: > > For btree, even if there is not a single deleted tuple, we can *still* > end up doing a full index scans at the end of vacuum. As the docs describe > vacuum_cleanup_index_scale_factor: > > <para> > Specifies the fraction of the total number of heap tuples counted in > the previous statistics collection that can be inserted without > incurring an index scan at the <command>VACUUM</command> cleanup stage. > This setting currently applies to B-tree indexes only. > </para> > > I.e. with the default settings we will perform a whole-index scan > (without visibility map or such) after every 10% growth of the > table. Which means that, even if the visibility map prevents repeated > tables accesses, increasing the rate of vacuuming for insert-only tables > can cause a lot more whole index scans. Which means that vacuuming an > insert-only workload frequently *will* increase the total amount of IO, > even if there is not a single dead tuple. Rather than just spreading the > same amount of IO over more vacuums. > > And both gin and gist just always do a full index scan, regardless of > vacuum_cleanup_index_scale_factor (either during a bulk delete, or > during the cleanup). Thus more frequent vacuuming for insert-only > tables can cause a *lot* of pain (even an approx quadratic increase of > IO? O(increased_frequency * peak_index_size)?) if you have large > indexes - which is very common for gin/gist. Ok, ok. Thanks for the explanation. In the light of that, I agree that we should increase the scale_factor. Yours, Laurenz Albe
On Fri, 2020-03-20 at 14:43 +0100, Laurenz Albe wrote: > I.e. with the default settings we will perform a whole-index scan > > (without visibility map or such) after every 10% growth of the > > table. Which means that, even if the visibility map prevents repeated > > tables accesses, increasing the rate of vacuuming for insert-only tables > > can cause a lot more whole index scans. Which means that vacuuming an > > insert-only workload frequently *will* increase the total amount of IO, > > even if there is not a single dead tuple. Rather than just spreading the > > same amount of IO over more vacuums. > > > > And both gin and gist just always do a full index scan, regardless of > > vacuum_cleanup_index_scale_factor (either during a bulk delete, or > > during the cleanup). Thus more frequent vacuuming for insert-only > > tables can cause a *lot* of pain (even an approx quadratic increase of > > IO? O(increased_frequency * peak_index_size)?) if you have large > > indexes - which is very common for gin/gist. > > In the light of that, I agree that we should increase the scale_factor. Here is version 10 of the patch, which uses a scale factor of 0.2. Yours, Laurenz Albe
Attachment
On Mon, 2020-03-23 at 14:27 +0100, Laurenz Albe wrote: > Here is version 10 of the patch, which uses a scale factor of 0.2. This patch should be what everybody can live with. It would be good if we can get at least that committed before feature freeze. Yours, Laurenz Albe
On Mon, Mar 23, 2020 at 02:27:29PM +0100, Laurenz Albe wrote: > Here is version 10 of the patch, which uses a scale factor of 0.2. Thanks > Any table that has received more inserts since it was > last vacuumed (and that is not vacuumed for another > reason) will be autovacuumed. Since this vacuum doesn't trigger any special behavior (freeze), you can remove the parenthesized part: "(and that is not vacuumed for another reason)". Maybe in the docs you can write this with thousands separators: 10,000,000 It looks like the GUC uses scale factor max=1e10, but the relopt is still max=100, which means it's less possible to disable for a single rel. > +++ b/src/backend/access/common/reloptions.c > @@ -398,6 +407,15 @@ static relopt_real realRelOpts[] = > }, > -1, 0.0, 100.0 > }, > + { > + { > + "autovacuum_vacuum_insert_scale_factor", > + "Number of tuple inserts prior to vacuum as a fraction of reltuples", > + RELOPT_KIND_HEAP | RELOPT_KIND_TOAST, > + ShareUpdateExclusiveLock > + }, > + -1, 0.0, 100.0 > + }, > { > { > "autovacuum_analyze_scale_factor", > +++ b/src/backend/utils/misc/guc.c > @@ -3549,6 +3558,17 @@ static struct config_real ConfigureNamesReal[] = > 0.2, 0.0, 100.0, > NULL, NULL, NULL > }, > + > + { > + {"autovacuum_vacuum_insert_scale_factor", PGC_SIGHUP, AUTOVACUUM, > + gettext_noop("Number of tuple inserts prior to vacuum as a fraction of reltuples."), > + NULL > + }, > + &autovacuum_vac_ins_scale, > + 0.2, 0.0, 1e10, > + NULL, NULL, NULL > + }, > + > { > {"autovacuum_analyze_scale_factor", PGC_SIGHUP, AUTOVACUUM, > gettext_noop("Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples."),
On 2020-Mar-25, Justin Pryzby wrote: > Maybe in the docs you can write this with thousands separators: 10,000,000 > > It looks like the GUC uses scale factor max=1e10, but the relopt is still > max=100, which means it's less possible to disable for a single rel.

I have paid no attention to this thread, but how does it make sense to have a scale factor higher than 100? Surely you mean that it is the threshold value that should be set to ten million, not the scale factor?

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 25, 2020 at 12:46:52PM -0300, Alvaro Herrera wrote: > On 2020-Mar-25, Justin Pryzby wrote: > > > Maybe in the docs you can write this with thousands separators: 10,000,000 > > > > It looks like the GUC uses scale factor max=1e10, but the relopt is still > > max=100, which means it's less possible to disable for a single rel. > > I have paid no attention to this thread, but how does it make sense to > have a scale factor to be higher than 100? Surely you mean the > threshold value that should be set to ten million, not the scale factor?

We went over this here:

https://www.postgresql.org/message-id/20200317195616.GZ26184%40telsasoft.com ... https://www.postgresql.org/message-id/20200317213426.GB26184%40telsasoft.com

The scale factor is relative to the reltuples estimate, which comes from vacuum (which presently doesn't run against insert-only tables, and is what we're trying to schedule), or analyze, which probably runs adequately, but might be disabled or run too infrequently.

Since we talked about how scale_factor can be used to effectively disable this new feature, I thought that scale=100 was too small and suggested 1e10 (same as the max for vacuum_cleanup_index_scale_factor since 4d54543ef). That should allow handling the case that analyze is disabled, or its threshold is high, or it hasn't run yet, or it's running but hasn't finished, or analyze is triggered at the same time as vacuum.

A table with 1e7 tuples (threshold) into which one inserts 1e9 tuples would hit the scale_factor=100 threshold, which means scale_factor failed to "disable" the feature, as claimed. If anything, I think it may need to be larger...

-- Justin
Hi, On 2020-03-25 11:05:21 -0500, Justin Pryzby wrote: > Since we talked about how scale_factor can be used to effectively disable this > new feature, I thought that scale=100 was too small and suggesed 1e10 (same as > max for vacuum_cleanup_index_scale_factor since 4d54543ef). That should allow > handling the case that analyze is disabled, or its threshold is high, or it > hasn't run yet, or it's running but hasn't finished, or analyze is triggered as > same time as vacuum. For disabling we instead should allow -1, and disable the feature if set to < 0. Greetings, Andres Freund
On Wed, Mar 25, 2020 at 10:26 PM Andres Freund <andres@anarazel.de> wrote: > On 2020-03-25 11:05:21 -0500, Justin Pryzby wrote: > > Since we talked about how scale_factor can be used to effectively disable this > > new feature, I thought that scale=100 was too small and suggesed 1e10 (same as > > max for vacuum_cleanup_index_scale_factor since 4d54543ef). That should allow > > handling the case that analyze is disabled, or its threshold is high, or it > > hasn't run yet, or it's running but hasn't finished, or analyze is triggered as > > same time as vacuum. > > For disabling we instead should allow -1, and disable the feature if set > to < 0.

This patch introduces both a GUC and a reloption. In reloptions we typically use -1 for "disable reloption, use GUC value instead" semantics. So it's unclear how we should allow the reloption to both disable the feature and disable the reloption. I think we don't have a precedent in the codebase yet. We could allow -2 (disable reloption) and -1 (disable feature) for the reloption. Opinions?

------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Wed, 2020-03-25 at 23:19 +0300, Alexander Korotkov wrote: > On Wed, Mar 25, 2020 at 10:26 PM Andres Freund <andres@anarazel.de> wrote: > > On 2020-03-25 11:05:21 -0500, Justin Pryzby wrote: > > > Since we talked about how scale_factor can be used to effectively disable this > > > new feature, I thought that scale=100 was too small and suggesed 1e10 (same as > > > max for vacuum_cleanup_index_scale_factor since 4d54543ef). That should allow > > > handling the case that analyze is disabled, or its threshold is high, or it > > > hasn't run yet, or it's running but hasn't finished, or analyze is triggered as > > > same time as vacuum. > > > > For disabling we instead should allow -1, and disable the feature if set > > to < 0. > > This patch introduces both GUC and reloption. In reloptions we > typically use -1 for "disable reloption, use GUC value instead" > semantics. So it's unclear how should we allow reloption to both > disable feature and disable reloption. I think we don't have a > precedent in the codebase yet. We could allow -2 (disable reloption) > and -1 (disable feature) for reloption. Opinions? Here is patch v11, where the reloption has the same upper limit 1e10 as the GUC. There is no good reason to have them different. I am reluctant to introduce new semantics like a reloption value of -2 to disable a feature in this patch right before feature freeze. I believe there are enough options to disable insert-only vacuuming for an individual table: - Set the threshold to 2147483647. True, that will not work for very large tables, but I think that there are few tables that insert that many rows before they hit autovacuum_freeze_max_age anyway. - Set the scale factor to some astronomical value. Yours, Laurenz Albe
Hi, On 2020-03-26 10:12:39 +0100, Laurenz Albe wrote: > On Wed, 2020-03-25 at 23:19 +0300, Alexander Korotkov wrote: > I am reluctant to introduce new semantics like a reloption value of -2 > to disable a feature in this patch right before feature freeze. > > I believe there are enough options to disable insert-only vacuuming for > an individual table: > - Set the threshold to 2147483647. True, that will not work for very > large tables, but I think that there are few tables that insert that > many rows before they hit autovacuum_freeze_max_age anyway. > > - Set the scale factor to some astronomical value. Meh. You *are* adding new semantics with these. And they're terrible. Greetings, Andres Freund
On Fri, 27 Mar 2020 at 07:51, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-03-26 10:12:39 +0100, Laurenz Albe wrote: > > On Wed, 2020-03-25 at 23:19 +0300, Alexander Korotkov wrote: > > I am reluctant to introduce new semantics like a reloption value of -2 > > to disable a feature in this patch right before feature freeze. > > > > I believe there are enough options to disable insert-only vacuuming for > > an individual table: > > > - Set the threshold to 2147483647. True, that will not work for very > > large tables, but I think that there are few tables that insert that > > many rows before they hit autovacuum_freeze_max_age anyway. > > > > - Set the scale factor to some astronomical value. > > Meh. You *are* adding new semantics with these. And they're terrible. I've modified this to allow a proper way to disable the entire feature by allowing the setting to be set to -1 to disable the feature. I feel people are fairly used to using -1 to disable various features (e.g. log_autovacuum_min_duration). I've used the special value of -2 for the reloption to have that cascade to using the GUC instead. The autovacuum_vacuum_insert_threshold reloption may be explicitly set to -1 to disable autovacuums for inserts for the relation. I've also knocked the default threshold down to 1000. I feel this is a better value given that the scale factor is now 0.2. There seemed to be no need to exclude smaller tables from seeing gains such as index-only scans. If nobody objects, I plan to push this one shortly. David
Attachment
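A sketch of the opt-out behaviour David describes, on a hypothetical table: -1 disables insert-driven autovacuum for that relation only, while resetting the option falls back to the GUC:

ALTER TABLE audit_log SET (autovacuum_vacuum_insert_threshold = -1);
ALTER TABLE audit_log RESET (autovacuum_vacuum_insert_threshold);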
On Fri, 2020-03-27 at 10:18 +1300, David Rowley wrote: > > > I believe there are enough options to disable insert-only vacuuming for > > > an individual table: > > > > > - Set the threshold to 2147483647. True, that will not work for very > > > large tables, but I think that there are few tables that insert that > > > many rows before they hit autovacuum_freeze_max_age anyway. > > > > > > - Set the scale factor to some astronomical value. > > > > Meh. You *are* adding new semantics with these. And they're terrible. > > I've modified this to allow a proper way to disable the entire feature > by allowing the setting to be set to -1 to disable the feature. I feel > people are fairly used to using -1 to disable various features (e.g. > log_autovacuum_min_duration). I've used the special value of -2 for > the reloption to have that cascade to using the GUC instead. The > autovacuum_vacuum_insert_threshold reloption may be explicitly set to > -1 to disable autovacuums for inserts for the relation. > > I've also knocked the default threshold down to 1000. I feel this is a > better value given that the scale factor is now 0.2. There seemed to > be no need to exclude smaller tables from seeing gains such as > index-only scans. > > If nobody objects, I plan to push this one shortly. Thanks for the help! The new meaning of -2 should be documented, other than that it looks good to me. I'll accept the new semantics, but they don't make me happy. People are used to -1 meaning "use the GUC value instead". I still don't see why we need that. Contrary to Andres' opinion, I don't think that disabling a parameter by setting it to a value high enough that it does not take effect is a bad thing. I won't put up a fight though. Yours, Laurenz Albe
On Fri, 27 Mar 2020 at 22:40, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > The new meaning of -2 should be documented, other than that it looks > good to me. But the users don't need to know anything about -2. It's not possible to explicitly set the value to -2. This is just the reset value of the reloption which means "use the GUC". > I'll accept the new semantics, but they don't make me happy. People are > used to -1 meaning "use the GUC value instead". The problem with having -1 on the reloption meaning use the GUC, in this case, is that it means the reset value of the reloption must be -1 and we need to allow them to set -2 explicitly, and if we do that, then -1 also becomes a valid value that users can set. Maybe that's not the end of the world, but I'd rather have the reset value be unsettable by users. To me, that's less confusing as there are fewer special values to remember the meaning of. The reason I want a method to explicitly disable the feature is the fact that it's easy to document and it should reduce the number of people who are confused about the best method to disable the feature. I know there's going to be a non-zero number of people who'll want to do that.
On Sat, 2020-03-28 at 11:59 +1300, David Rowley wrote: > On Fri, 27 Mar 2020 at 22:40, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > The new meaning of -2 should be documented, other than that it looks > > good to me. > > But the users don't need to know anything about -2. It's not possible > to explicitly set the value to -2. This is just the reset value of the > reloption which means "use the GUC". I see. > > I'll accept the new semantics, but they don't make me happy. People are > > used to -1 meaning "use the GUC value instead". > > The problem with having -1 on the reloption meaning use the GUC, in > this case, is that it means the reset value of the reloption must be > -1 and we need to allow them to set -2 explicitly, and if we do that, > then -1 also becomes a valid value that users can set. Maybe that's > not the end of the world, but I'd rather have the reset value be > unsettable by users. To me, that's less confusing as there are fewer > special values to remember the meaning of. > > The reason I want a method to explicitly disable the feature is the > fact that it's easy to document and it should reduce the number of > people who are confused about the best method to disable the feature. > I know there's going to be a non-zero number of people who'll want to > do that. In the light of that, I have no objections. Yours, Laurenz Albe
On Sat, 28 Mar 2020 at 17:12, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > In the light of that, I have no objections. Thank you. Pushed. David
On Sat, 28 Mar 2020 at 19:21, David Rowley <dgrowleyml@gmail.com> wrote: > Thank you. Pushed.

I'm unsure yet if this has caused instability in lousyjack's run in [1]. I see that table does have 30,000 rows inserted, so it does seem probable that it may receive an autovacuum now when it didn't before.

I did a quick local test to see if swapping the "ANALYZE pagg_tab_ml;" to "VACUUM ANALYZE pagg_tab_ml;" would do the same on my local machine, but it didn't.

I'll keep an eye on lousyjack's next run. If it passes next run, I may add some SQL to determine if pg_stat_all_tables.autovacuum_count for those tables varies between passing and failing runs.

David

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
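The kind of check David mentions could be as simple as the following (a sketch; autovacuum_count and last_autovacuum are long-standing columns of pg_stat_all_tables):

SELECT relname, autovacuum_count, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'pagg_tab_ml';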
David Rowley <dgrowleyml@gmail.com> writes: > On Sat, 28 Mar 2020 at 17:12, Laurenz Albe <laurenz.albe@cybertec.at> wrote: >> In the light of that, I have no objections. > Thank you. Pushed. It seems like this commit has resulted in some buildfarm instability: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02 apparent change of plan https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05 unstable results in stats_ext test I initially thought that Dean's functional-stats adjustment might be the culprit, but the timestamps on these failures disprove that. regards, tom lane
On Sun, 29 Mar 2020 at 06:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > On Sat, 28 Mar 2020 at 17:12, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > >> In the light of that, I have no objections. > > > Thank you. Pushed. > > It seems like this commit has resulted in some buildfarm instability: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02 > > apparent change of plan > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05 > > unstable results in stats_ext test Yeah, thanks for pointing that out. I'm just doing some tests locally to see if I can recreate those results after vacuuming the mcv_list table, so far I'm unable to.
On Sun, 29 Mar 2020 at 10:30, David Rowley <dgrowleyml@gmail.com> wrote: > > On Sun, 29 Mar 2020 at 06:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > David Rowley <dgrowleyml@gmail.com> writes: > > > On Sat, 28 Mar 2020 at 17:12, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > >> In the light of that, I have no objections. > > > > > Thank you. Pushed. > > > > It seems like this commit has resulted in some buildfarm instability: > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02 > > > > apparent change of plan > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05 > > > > unstable results in stats_ext test > > Yeah, thanks for pointing that out. I'm just doing some tests locally > to see if I can recreate those results after vacuuming the mcv_list > table, so far I'm unable to. I'm considering pushing the attached to try to get some confirmation that additional autovacuums are the issue. However, I'm not too sure it's a wise idea to as I can trigger an additional auto-vacuum and have these new tests fail with make installcheck after setting autovacuum_naptime to 1s, but I'm not getting the other diffs experienced by lousyjack and petalura. The patch may just cause more failures without proving much, especially so with slower machines. The other idea I had was just to change the autovacuum_vacuum_insert_threshold relopt to -1 for the problem tables and see if that stabilises things. Yet another option would be to see if reltuples varies between runs and ditch the autovacuum_count column from the attached. There does not appear to be any part of the tests which would cause any dead tuples in any of the affected relations, so I'm unsure why reltuples would vary between what ANALYZE and VACUUM would set it to. I'm still thinking. Input welcome. David
Attachment
On Sun, 29 Mar 2020 at 15:29, David Rowley <dgrowleyml@gmail.com> wrote: > I'm considering pushing the attached to try to get some confirmation > that additional autovacuums are the issue. However, I'm not too sure > it's a wise idea to as I can trigger an additional auto-vacuum and > have these new tests fail with make installcheck after setting > autovacuum_naptime to 1s, but I'm not getting the other diffs > experienced by lousyjack and petalura. The patch may just cause more > failures without proving much, especially so with slower machines. Instead of the above, I ended up modifying the two intermittently failing tests to change the ANALYZE into a VACUUM ANALYZE. This should prevent autovacuum sneaking in a vacuum at some point in time after the ANALYZE has taken place. I don't believe any of the current buildfarm failures can be attributed to any of the recent changes to autovacuum, but I'll continue to monitor the farm to see if anything is suspect. David
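The shape of that change, sketched here against mcv_lists purely as an example of the pattern:

-- before: ANALYZE alone, which leaves room for autovacuum to come along
-- later and overwrite reltuples with a slightly different estimate
ANALYZE mcv_lists;
-- after: the explicit VACUUM resets the insert counter, so an insert-driven
-- autovacuum has no reason to visit the table mid-test
VACUUM ANALYZE mcv_lists;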
David Rowley <dgrowleyml@gmail.com> writes: > I don't believe any of the current buildfarm failures can be > attributed to any of the recent changes to autovacuum, but I'll > continue to monitor the farm to see if anything is suspect. I agree none of the failures I see right now are related to that (there's some "No space left on device" failures, Windows randomicity, snapper's compiler bug, and don't-know-what on hyrax). But the ones that were seemingly due to that were intermittent, so we'll have to watch for awhile. regards, tom lane
On Mon, Mar 30, 2020 at 7:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I don't believe any of the current buildfarm failures can be > > attributed to any of the recent changes to autovacuum, but I'll > > continue to monitor the farm to see if anything is suspect. > > I agree none of the failures I see right now are related to that > (there's some "No space left on device" failures, Windows randomicity, > snapper's compiler bug, and don't-know-what on hyrax). > > But the ones that were seemingly due to that were intermittent, > so we'll have to watch for awhile. > Today, stats_ext failed on petalura [1]. Could it be due to this? I have also committed a patch, but at first glance it does not appear to be related to my commit. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03 -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Mon, Mar 30, 2020 at 7:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But the ones that were seemingly due to that were intermittent, >> so we'll have to watch for awhile. > Today, stats_ext failed on petalura [1]. Can it be due to this? I > have also committed a patch but immediately I don't see it to be > related to my commit. Yeah, this looks just like petalura's previous failures, so the problem is still there :-( regards, tom lane
On Sat, 2020-03-28 at 19:21 +1300, David Rowley wrote: > Thank you. Pushed. Thanks for your efforts on this, and thanks for working on the fallout. How can it be that even after an explicit VACUUM, this patch can cause unstable regression test results? Yours, Laurenz Albe
On Mon, 30 Mar 2020 at 17:57, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > How can it be that even after an explicit VACUUM, this patch can cause > unstable regression test results? I only added vacuums for mcv_lists. The problem with petalura [1] is with the functional_dependencies table. I'll see if I can come up with a more deterministic way to determine which tables need vacuums added, rather than waiting and reacting after each failure. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03
On Mon, 30 Mar 2020 at 19:49, David Rowley <dgrowleyml@gmail.com> wrote:
> I'll see if I can come up with a more deterministic way to determine
> which tables need vacuums added, rather than waiting and reacting
> after each failure.

I ended up running make installcheck on an instance with autovacuum_naptime set to 1s with a small additional debug line in autovacuum.c, namely:

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 7e97ffab27..ad81e321dc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3099,6 +3099,9 @@ relation_needs_vacanalyze(Oid relid,
 		*dovacuum = force_vacuum || (vactuples > vacthresh) ||
 			(vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
 		*doanalyze = (anltuples > anlthresh);
+
+		if (vac_ins_base_thresh >= 0 && instuples > vacinsthresh)
+			elog(LOG, "******** %s", NameStr(classForm->relname));
 	}
 	else
 	{

I grepped the log after the installcheck to grab the table names that saw an insert vacuum during the test, then grepped the test output to see whether each table appears to pose a risk of test instability. I've classed each table with a risk factor. "VeryLow" means there's almost no risk because the tests never look at EXPLAIN. "Low" risk tables look at EXPLAIN, but I feel they're not looking in enough detail to cause issues. "Medium" risk tables look at EXPLAIN and I feel there's a risk of some change; I think these are all Append nodes, which do order their subnodes based on cost. "High" risk... those are the ones I'm about to look into changing.

The full results of my analysis are:

Table: agg_group_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: agg_hash_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: atest12 privileges.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low
Table: brin_test brin.out. Test already does VACUUM ANALYZE. Risk:VeryLow
Table: bt_f8_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_i4_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_name_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_txt_heap btree_index.out, create_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: dupindexcols create_index.out. Some looking at EXPLAIN plans, but nothing appears to look into row estimates in detail. Risk:Low
Table: fast_emp4000 create_am.out, create_index.out, create_misc.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low
Table: functional_dependencies stats_ext.out. Lots of looking at EXPLAIN output. Test looks at row estimates. Risk:High
Table: gist_tbl gist.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low
Table: hash_f8_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_i4_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_name_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_txt_heap hash_index.out. Rows loaded in copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: kd_point_tbl create_index_spgist.out. Lots of looking at EXPLAIN, but nothing appears to look into row estimates in detail. Risk:Low
Table: mcv_lists stats_ext.out. Lots of looking at EXPLAIN, but tests appear to VACUUM after loading rows. Risk:Low
Table: mcv_lists_arrays stats_ext.out. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: mcv_lists_bool stats_ext.out. Lots of looking at EXPLAIN output. Test looks at row estimates. Risk:High
Table: ndistinct stats_ext.out. Lots of looking at EXPLAIN output. Test looks at row estimates. Only 1000 rows are loaded initially and then 5000 after a truncate. 1000 rows won't trigger the auto-vacuum. Risk:High
Table: onek Lots of files. Sees a VACUUM in sanity_check test, however, some tests run before sanity_check, e.g. create_index, select, copy, none of which appear to pay particular attention to anything vacuum might change. Risk:Low
Table: pagg_tab_ml_p2_s1 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p2_s2 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s1 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s2 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p1 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p2 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_para_p3 partition_aggregate.out Appears to be some risk of Append reordering partitions based on cost. Risk:Medium
Table: pg_attribute Seen in several tests. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: pg_depend Seen in several tests. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: pg_largeobject Seen in several tests. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: quad_box_tbl box.out. Sees some use of EXPLAIN, but nothing looks critical. Risk:Low
Table: quad_box_tbl_ord_seq1 box.out. No EXPLAIN usage. Risk:VeryLow
Table: quad_box_tbl_ord_seq2 box.out. No EXPLAIN usage. Risk:VeryLow
Table: quad_point_tbl create_index_spgist.out Sees some use of EXPLAIN. Index Only Scans are already being used. Risk:Low
Table: quad_poly_tbl polygon.out Some usages of EXPLAIN. Risk:Low
Table: radix_text_tbl create_index_spgist.out Some usages of EXPLAIN. Risk:Low
Table: road various tests. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: slow_emp4000 various tests. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: spgist_box_tbl spgist.out. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: spgist_point_tbl spgist.out. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: spgist_text_tbl spgist.out. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: tenk1 aggregates.out, groupingsets.out, join.out, limit.out, misc_functions.out, rowtypes.out, select_distinct.out, select_parallel.out, subselect.out, tablesample.out, tidscan.out, union.out, window.out and write_parallel.out are after vacuum in sanity_check. EXPLAIN used in create_index.out and inherit.out, which are all run before sanity_check does the vacuum. Risk:Medium
Table: tenk2 Only sees EXPLAIN usages in select_parallel.out, which is after the table is vacuumed in sanity_check. Risk:Low
Table: test_range_gist rangetypes.out. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: test_range_spgist rangetypes.out. Some EXPLAIN usage. Risk:Low
Table: testjsonb jsonb.out. Some EXPLAIN usage. Risk:Low
Table: transition_table_level2 plpgsql.out. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: transition_table_status plpgsql.out. Nothing appears to look at EXPLAIN. Risk:VeryLow

I'd like to wait to see if we get failures for the ones I've classed as medium risk.

David
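A less invasive way to gather a similar list after a run, without patching autovacuum.c, might be to query the cumulative statistics; this is only a sketch, it relies on the n_ins_since_vacuum counter added alongside the insert-vacuum feature, and it cannot distinguish insert-driven vacuums from ordinary ones:

-- run against the regression database after make installcheck
SELECT relname, last_autovacuum, autovacuum_count, n_ins_since_vacuum
FROM pg_stat_user_tables
WHERE last_autovacuum IS NOT NULL
ORDER BY last_autovacuum;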
On Sat, 28 Mar 2020 at 22:22, David Rowley <dgrowleyml@gmail.com> wrote: > I'm unsure yet if this has caused an instability on lousyjack's run in > [1]. pogona has just joined in on the fun [1], so we're not out of the woods on this yet. I'll start having a look at this in more detail. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
On Tue, 31 Mar 2020 at 04:39, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 28 Mar 2020 at 22:22, David Rowley <dgrowleyml@gmail.com> wrote:
> > I'm unsure yet if this has caused an instability on lousyjack's run in
> > [1].
>
> pogona has just joined in on the fun [1], so we're not out of the woods
> on this yet. I'll start having a look at this in more detail.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
>

I had a go at reproducing this. I wasn't able to produce the reported failure, but I can reliably produce an Assert failure that may be related by doing a VACUUM FULL simultaneously with an ANALYZE that is generating extended stats, which produces:

#0  0x00007f28081c9520 in raise () from /lib64/libc.so.6
#1  0x00007f28081cab01 in abort () from /lib64/libc.so.6
#2  0x0000000000aad1ad in ExceptionalCondition (conditionName=0xb2f1a1 "ItemIdIsNormal(lp)", errorType=0xb2e7c9 "FailedAssertion", fileName=0xb2e848 "heapam.c", lineNumber=3016) at assert.c:67
#3  0x00000000004fb79e in heap_update (relation=0x7f27feebeda8, otid=0x2d881fc, newtup=0x2d881f8, cid=0, crosscheck=0x0, wait=true, tmfd=0x7ffc568a5900, lockmode=0x7ffc568a58fc) at heapam.c:3016
#4  0x00000000004fdead in simple_heap_update (relation=0x7f27feebeda8, otid=0x2d881fc, tup=0x2d881f8) at heapam.c:3902
#5  0x00000000005be860 in CatalogTupleUpdate (heapRel=0x7f27feebeda8, otid=0x2d881fc, tup=0x2d881f8) at indexing.c:230
#6  0x00000000008df898 in statext_store (statOid=18964, ndistinct=0x0, dependencies=0x2a85fe0, mcv=0x0, stats=0x2a86570) at extended_stats.c:553
#7  0x00000000008deec0 in BuildRelationExtStatistics (onerel=0x7f27feed9008, totalrows=5000, numrows=5000, rows=0x2ad5a30, natts=7, vacattrstats=0x2a75f40) at extended_stats.c:187
#8  0x000000000065c1b7 in do_analyze_rel (onerel=0x7f27feed9008, params=0x7ffc568a5fc0, va_cols=0x0, acquirefunc=0x65ce37 <acquire_sample_rows>, relpages=31, inh=false, in_outer_xact=false, elevel=13) at analyze.c:606
#9  0x000000000065b532 in analyze_rel (relid=18956, relation=0x29b0bc0, params=0x7ffc568a5fc0, va_cols=0x0, in_outer_xact=false, bstrategy=0x2a7dfa0) at analyze.c:263
#10 0x00000000006fd768 in vacuum (relations=0x2a7e148, params=0x7ffc568a5fc0, bstrategy=0x2a7dfa0, isTopLevel=true) at vacuum.c:468
#11 0x00000000006fd22c in ExecVacuum (pstate=0x2a57a00, vacstmt=0x29b0ca8, isTopLevel=true) at vacuum.c:251

It looks to me as though the problem is that statext_store() needs to take its lock on pg_statistic_ext_data *before* searching for the stats tuple to update.

It's late here, so I haven't worked up a patch yet, but it looks pretty straightforward.

Regards,
Dean
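A rough two-session sketch of that reproduction, assuming a table that already has a CREATE STATISTICS object defined on it (the table name here is illustrative, not taken from the thread):

-- session 1: rewrite the catalog, giving its tuples new TIDs
VACUUM FULL pg_statistic_ext_data;
-- session 2, concurrently: rebuild the extended statistics, which updates a
-- pg_statistic_ext_data tuple that was located before the lock was taken
ANALYZE ext_stats_tbl;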
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > I had a go at reproducing this. I wasn't able to produce the reported > failure, but I can reliably produce an Assert failure that may be > related by doing a VACUUM FULL simultaneously with an ANALYZE that is > generating extended stats, which produces: > ... > It looks to me as though the problem is that statext_store() needs to > take its lock on pg_statistic_ext_data *before* searching for the > stats tuple to update. Hmm, yeah, that seems like clearly a bad idea. > It's late here, so I haven't worked up a patch yet, but it looks > pretty straightforward. I can take care of it. regards, tom lane
I wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> I had a go at reproducing this. I wasn't able to produce the reported >> failure, but I can reliably produce an Assert failure that may be >> related by doing a VACUUM FULL simultaneously with an ANALYZE that is >> generating extended stats, which produces: >> ... >> It looks to me as though the problem is that statext_store() needs to >> take its lock on pg_statistic_ext_data *before* searching for the >> stats tuple to update. > Hmm, yeah, that seems like clearly a bad idea. I pushed a fix for that, but I think it must be unrelated to the buildfarm failures we're seeing. For that coding to be a problem, it would have to run concurrently with a VACUUM FULL or CLUSTER on pg_statistic_ext_data (which would give all the tuples new TIDs). AFAICS that won't happen with the tests that are giving trouble. regards, tom lane
On Tue, 31 Mar 2020 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Dean Rasheed <dean.a.rasheed@gmail.com> writes: > >> ... > >> It looks to me as though the problem is that statext_store() needs to > >> take its lock on pg_statistic_ext_data *before* searching for the > >> stats tuple to update. > > > Hmm, yeah, that seems like clearly a bad idea. > > I pushed a fix for that Thanks for doing that (looks like it was my mistake originally). > but I think it must be unrelated to the > buildfarm failures we're seeing. For that coding to be a problem, > it would have to run concurrently with a VACUUM FULL or CLUSTER > on pg_statistic_ext_data (which would give all the tuples new TIDs). > AFAICS that won't happen with the tests that are giving trouble. > Yeah, that makes sense. I still can't see what might be causing those failures. The tests that were doing an ALTER COLUMN and then expecting to see the results of a non-analysed table ought to be fixed by 0936d1b6f, but that doesn't match the buildfarm failures. Possibly 0936d1b6f will help with those anyway, but if so, it'll be annoying not understanding why. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > Yeah, that makes sense. I still can't see what might be causing those > failures. The tests that were doing an ALTER COLUMN and then expecting > to see the results of a non-analysed table ought to be fixed by > 0936d1b6f, but that doesn't match the buildfarm failures. Possibly > 0936d1b6f will help with those anyway, but if so, it'll be annoying > not understanding why. Quite :-(. While it's too early to declare victory, we've seen no more failures of this ilk since 0936d1b6f, so it's sure looking like autovacuum did have something to do with it. Just to save people repeating the search I did, these are the buildfarm failures of interest so far: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2020%3A03%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-28%2022%3A00%3A19 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-29%2006%3A45%3A02 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2020-03-30%2006%3A00%3A06 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2006%3A10%3A05 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-03-31%2005%3A00%3A35 The first of those is unlike the rest, and I'm not trying to account for it here. In the rest, what we see is that sometimes the estimates are off by a little bit from what's expected, up or down just a percent or two. And those deltas kick at inconsistent spots partway through a series of similar tests, so it's hard to deny that *something* asynchronous to the test script is causing it. After contemplating the failures for awhile, I have a theory that at least partially matches the data. What I think is happening is that autovacuum (NOT auto-analyze) launches on the table, and since it is running concurrently with the foreground test script, it fails to immediately acquire buffer lock on one or more of the table pages. Since this isn't an aggressive vacuum scan, it just politely backs off and doesn't scan those pages. And that translates to not getting a perfectly accurate reltuples estimate at the end of the vacuum. On my x86_64 machine, which matches the buildfarm critters having trouble, the actual contents of both of the troublesome tables will be 5000 tuples in 31 pages --- which comes out to be 30 pages with 162 tuples each and then 140 tuples in the last page. Working through the math in vac_estimate_reltuples (and assuming that the "old" values were accurate numbers from the test script's own ANALYZE), what I find is that autovacuum will conclude there are 4999 tuples if it misses scanning one of the first 30 pages, or 5021 tuples if it misses scanning the last page, because its interpolation from the old tuple density figure will underestimate or overestimate the number of missed tuples accordingly. Once that slightly-off number gets pushed into pg_class, we start to get slightly-off rowcount estimates in the test cases. 
So what I'm hypothesizing is that the pg_statistic data is perfectly fine but pg_class.reltuples goes off a little bit after autovacuum. The percentage changes in reltuples that I predict this way don't quite square with the percentage changes we see in the overall rowcount estimates, which is a problem for this theory. But the test cases are exercising some fairly complex estimation logic, and it wouldn't surprise me much if the estimates aren't linearly affected by reltuples. (Tomas, do you want to comment further on that point?) regards, tom lane
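To make that arithmetic concrete, here it is worked through as a quick SQL calculation using the figures above (5000 tuples in 31 pages: 162 tuples on each of the first 30 pages and 140 on the last):

-- old density = reltuples / relpages = 5000 / 31; each unscanned page is
-- assumed to hold that many tuples, added to the tuples actually counted
SELECT round(4838 + 5000.0 / 31) AS misses_one_162_tuple_page,   -- 5000 - 162 counted, estimate 4999
       round(4860 + 5000.0 / 31) AS misses_the_140_tuple_page;   -- 5000 - 140 counted, estimate 5021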
On Thu, 2 Apr 2020 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > Yeah, that makes sense. I still can't see what might be causing those > > failures. The tests that were doing an ALTER COLUMN and then expecting > > to see the results of a non-analysed table ought to be fixed by > > 0936d1b6f, but that doesn't match the buildfarm failures. Possibly > > 0936d1b6f will help with those anyway, but if so, it'll be annoying > > not understanding why. > > Quite :-(. While it's too early to declare victory, we've seen no > more failures of this ilk since 0936d1b6f, so it's sure looking like > autovacuum did have something to do with it. How about [1]? It looks related to me, and it occurred after 0936d1b6f went in. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-04-01%2017%3A03%3A05
On Wed, Apr 01, 2020 at 11:13:12PM -0400, Tom Lane wrote: >Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> Yeah, that makes sense. I still can't see what might be causing those >> failures. The tests that were doing an ALTER COLUMN and then expecting >> to see the results of a non-analysed table ought to be fixed by >> 0936d1b6f, but that doesn't match the buildfarm failures. Possibly >> 0936d1b6f will help with those anyway, but if so, it'll be annoying >> not understanding why. > >Quite :-(. While it's too early to declare victory, we've seen no >more failures of this ilk since 0936d1b6f, so it's sure looking like >autovacuum did have something to do with it. > >Just to save people repeating the search I did, these are the buildfarm >failures of interest so far: > >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2020%3A03%3A03 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-28%2022%3A00%3A19 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-29%2006%3A45%3A02 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2020-03-30%2006%3A00%3A06 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2006%3A10%3A05 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03 >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-03-31%2005%3A00%3A35 > >The first of those is unlike the rest, and I'm not trying to account for >it here. In the rest, what we see is that sometimes the estimates are off >by a little bit from what's expected, up or down just a percent or two. >And those deltas kick at inconsistent spots partway through a series of >similar tests, so it's hard to deny that *something* asynchronous to the >test script is causing it. > >After contemplating the failures for awhile, I have a theory that >at least partially matches the data. What I think is happening is >that autovacuum (NOT auto-analyze) launches on the table, and since >it is running concurrently with the foreground test script, it fails >to immediately acquire buffer lock on one or more of the table pages. >Since this isn't an aggressive vacuum scan, it just politely backs >off and doesn't scan those pages. And that translates to not getting >a perfectly accurate reltuples estimate at the end of the vacuum. >On my x86_64 machine, which matches the buildfarm critters having >trouble, the actual contents of both of the troublesome tables will >be 5000 tuples in 31 pages --- which comes out to be 30 pages with >162 tuples each and then 140 tuples in the last page. Working through >the math in vac_estimate_reltuples (and assuming that the "old" values >were accurate numbers from the test script's own ANALYZE), what I find >is that autovacuum will conclude there are 4999 tuples if it misses >scanning one of the first 30 pages, or 5021 tuples if it misses scanning >the last page, because its interpolation from the old tuple density >figure will underestimate or overestimate the number of missed tuples >accordingly. 
Once that slightly-off number gets pushed into pg_class, >we start to get slightly-off rowcount estimates in the test cases. > >So what I'm hypothesizing is that the pg_statistic data is perfectly >fine but pg_class.reltuples goes off a little bit after autovacuum. >The percentage changes in reltuples that I predict this way don't >quite square with the percentage changes we see in the overall >rowcount estimates, which is a problem for this theory. But the test >cases are exercising some fairly complex estimation logic, and it >wouldn't surprise me much if the estimates aren't linearly affected by >reltuples. (Tomas, do you want to comment further on that point?) > I think this theory makes perfect sense. I think it's much less likely to see the last page skipped, so we're likely to end up with reltuples lower than 5000 (as opposed to seeing the 5021). That kinda matches the reports, where we generally see estimates reduced by 1 or 2. The -1 change could be explained by rounding errors, I guess - with 5000 we might have produced 139.51, rounded up to 140, a slight drop may get us 139. Not sure about the -2 changes, but I suppose it's possible we might actually skip multiple pages, reducing the reltuples estimate even more? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Rowley <dgrowleyml@gmail.com> writes: > On Thu, 2 Apr 2020 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Quite :-(. While it's too early to declare victory, we've seen no >> more failures of this ilk since 0936d1b6f, so it's sure looking like >> autovacuum did have something to do with it. > How about [1]? It seems related to me and also post 0936d1b6f. That looks much like the first lousyjack failure, which as I said I wasn't trying to account for at that point. After looking at those failures, though, I believe that the root cause may be the same, ie small changes in pg_class.reltuples due to autovacuum not seeing all pages of the tables. The test structure is a bit different, but it is accessing the tables in between EXPLAIN attempts, so it could be preventing a concurrent autovac from seeing all pages. I see your fix at cefb82d49, but it feels a bit brute-force. Unlike stats_ext.sql, we're not (supposed to be) dependent on exact planner estimates in this test. So I think the real problem here is crappy test case design. Namely, that these various sub-tables are exactly the same size, despite which the test is expecting that the planner will order them consistently --- with a planning algorithm that prefers to put larger tables first in parallel appends (cf. create_append_path). It's not surprising that the result is unstable in the face of small variations in the rowcount estimates. I'd be inclined to undo what you did in favor of initializing the test tables to contain significantly different numbers of rows, because that would (a) achieve plan stability more directly, and (b) demonstrate that the planner is actually ordering the tables by cost correctly. Maybe somewhere else we have a test that is verifying (b), but these test cases abysmally fail to check that point. I'm not really on board with disabling autovacuum in the regression tests anywhere we aren't absolutely forced to do so. It's not representative of real world practice (or at least not real world best practice ;-)) and it could help hide actual bugs. We don't seem to have much choice with the stats_ext tests as they are constituted, but those tests look really fragile to me. Let's not adopt that technique where we have other possible ways to stabilize test results. regards, tom lane
I wrote: > I'd be inclined to undo what you did in favor of initializing the > test tables to contain significantly different numbers of rows, > because that would (a) achieve plan stability more directly, > and (b) demonstrate that the planner is actually ordering the > tables by cost correctly. Maybe somewhere else we have a test > that is verifying (b), but these test cases abysmally fail to > check that point. Concretely, I suggest the attached, which replaces the autovac disables with adjusting partition boundaries so that the partitions contain different numbers of rows. I did not touch the partition boundaries for pagg_tab1 and pagg_tab2, because that would have required also changing the associated test queries (which are designed to access only particular partitions). It seemed like too much work to verify that the answers were still right, and it's not really necessary because those tables are so small as to fit in single pages. That means that autovac will either see the whole table or none of it, and in either case it won't change reltuples. This does cause the order of partitions to change in a couple of the pagg_tab_ml EXPLAIN results, but I think that's fine; the ordering does now match the actual sizes of the partitions. regards, tom lane diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out index d8a6836..a4dc12b 100644 --- a/src/test/regress/expected/partition_aggregate.out +++ b/src/test/regress/expected/partition_aggregate.out @@ -2,9 +2,9 @@ -- PARTITION_AGGREGATE -- Test partitionwise aggregation on partitioned tables -- --- Note: Various tests located within are sensitive to tables being --- auto-vacuumed while the tests are running. For this reason we --- run autovacuum_enabled = off for all tables. +-- Note: to ensure plan stability, it's a good idea to make the partitions of +-- any one partitioned table in this test all have different numbers of rows. +-- -- Enable partitionwise aggregate, which by default is disabled. SET enable_partitionwise_aggregate TO true; -- Enable partitionwise join, which by default is disabled. @@ -15,9 +15,9 @@ SET max_parallel_workers_per_gather TO 0; -- Tests for list partitioned tables. -- CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY LIST(c); -CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003') WITH (autovacuum_enabled =off); -CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0004', '0005', '0006', '0007') WITH (autovacuum_enabled =off); -CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0008', '0009', '0010', '0011') WITH (autovacuum_enabled =off); +CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003', '0004'); +CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005', '0006', '0007', '0008'); +CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009', '0010', '0011'); INSERT INTO pagg_tab SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 2999) i; ANALYZE pagg_tab; -- When GROUP BY clause matches; full aggregation is performed for each partition. 
@@ -400,13 +400,13 @@ SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2; -- JOIN query CREATE TABLE pagg_tab1(x int, y int) PARTITION BY RANGE(x); -CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10); +CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20); +CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30); CREATE TABLE pagg_tab2(x int, y int) PARTITION BY RANGE(y); -CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10); +CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20); +CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30); INSERT INTO pagg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i; INSERT INTO pagg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i; ANALYZE pagg_tab1; @@ -820,9 +820,9 @@ SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a -- Partition by multiple columns CREATE TABLE pagg_tab_m (a int, b int, c int) PARTITION BY RANGE(a, ((a+b)/2)); -CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (10, 10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (10, 10) TO (20, 20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (20, 20) TO (30, 30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (12, 12); +CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (12, 12) TO (22, 22); +CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (22, 22) TO (30, 30); INSERT INTO pagg_tab_m SELECT i % 30, i % 40, i % 50 FROM generate_series(0, 2999) i; ANALYZE pagg_tab_m; -- Partial aggregation as GROUP BY clause does not match with PARTITION KEY @@ -926,15 +926,15 @@ SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab_m GROUP BY (a+b)/2, 2, 1 HAV -- Test with multi-level partitioning scheme CREATE TABLE pagg_tab_ml (a int, b int, c text) PARTITION BY RANGE(a); -CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10); CREATE TABLE pagg_tab_ml_p2 PARTITION OF pagg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY LIST (c); -CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001') WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0002', '0003') WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001', '0002'); +CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0003'); -- This level of 
partitioning has different column positions than the parent CREATE TABLE pagg_tab_ml_p3(b int, c text, a int) PARTITION BY RANGE (b); -CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (5) TO (10) WITH (autovacuum_enabled = off); -ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (5); +CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int); +CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (7) TO (10); +ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (7); ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO (30); INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM0000') FROM generate_series(0, 29999) i; ANALYZE pagg_tab_ml; @@ -1253,14 +1253,14 @@ SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b ORDER BY 1, 2, 3; Group Key: pagg_tab_ml_1.b -> Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1 -> Partial HashAggregate - Group Key: pagg_tab_ml_2.b - -> Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2 - -> Partial HashAggregate Group Key: pagg_tab_ml_3.b -> Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3 -> Partial HashAggregate Group Key: pagg_tab_ml_4.b -> Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4 + -> Partial HashAggregate + Group Key: pagg_tab_ml_2.b + -> Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2 (24 rows) SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b HAVING avg(a) < 15 ORDER BY 1, 2, 3; @@ -1292,10 +1292,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O Filter: (avg(pagg_tab_ml_1.b) > '7'::numeric) -> Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1 -> HashAggregate - Group Key: pagg_tab_ml_2.a, pagg_tab_ml_2.b, pagg_tab_ml_2.c - Filter: (avg(pagg_tab_ml_2.b) > '7'::numeric) - -> Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2 - -> HashAggregate Group Key: pagg_tab_ml_3.a, pagg_tab_ml_3.b, pagg_tab_ml_3.c Filter: (avg(pagg_tab_ml_3.b) > '7'::numeric) -> Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3 @@ -1303,6 +1299,10 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O Group Key: pagg_tab_ml_4.a, pagg_tab_ml_4.b, pagg_tab_ml_4.c Filter: (avg(pagg_tab_ml_4.b) > '7'::numeric) -> Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4 + -> HashAggregate + Group Key: pagg_tab_ml_2.a, pagg_tab_ml_2.b, pagg_tab_ml_2.c + Filter: (avg(pagg_tab_ml_2.b) > '7'::numeric) + -> Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2 (25 rows) SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3; @@ -1332,9 +1332,9 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O -- costing such plans. 
SET parallel_setup_cost TO 10; CREATE TABLE pagg_tab_para(x int, y int) PARTITION BY RANGE(x); -CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (12); +CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (12) TO (22); +CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (22) TO (30); INSERT INTO pagg_tab_para SELECT i % 30, i % 20 FROM generate_series(0, 29999) i; ANALYZE pagg_tab_para; -- When GROUP BY clause matches; full aggregation is performed for each partition. diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql index 8350ddc..946197f 100644 --- a/src/test/regress/sql/partition_aggregate.sql +++ b/src/test/regress/sql/partition_aggregate.sql @@ -2,9 +2,9 @@ -- PARTITION_AGGREGATE -- Test partitionwise aggregation on partitioned tables -- --- Note: Various tests located within are sensitive to tables being --- auto-vacuumed while the tests are running. For this reason we --- run autovacuum_enabled = off for all tables. +-- Note: to ensure plan stability, it's a good idea to make the partitions of +-- any one partitioned table in this test all have different numbers of rows. +-- -- Enable partitionwise aggregate, which by default is disabled. SET enable_partitionwise_aggregate TO true; @@ -17,9 +17,9 @@ SET max_parallel_workers_per_gather TO 0; -- Tests for list partitioned tables. 
-- CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY LIST(c); -CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003') WITH (autovacuum_enabled =off); -CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0004', '0005', '0006', '0007') WITH (autovacuum_enabled =off); -CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0008', '0009', '0010', '0011') WITH (autovacuum_enabled =off); +CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003', '0004'); +CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005', '0006', '0007', '0008'); +CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009', '0010', '0011'); INSERT INTO pagg_tab SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 2999) i; ANALYZE pagg_tab; @@ -94,14 +94,14 @@ SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2; -- JOIN query CREATE TABLE pagg_tab1(x int, y int) PARTITION BY RANGE(x); -CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10); +CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20); +CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30); CREATE TABLE pagg_tab2(x int, y int) PARTITION BY RANGE(y); -CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10); +CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20); +CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30); INSERT INTO pagg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i; INSERT INTO pagg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i; @@ -177,9 +177,9 @@ SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a -- Partition by multiple columns CREATE TABLE pagg_tab_m (a int, b int, c int) PARTITION BY RANGE(a, ((a+b)/2)); -CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (10, 10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (10, 10) TO (20, 20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (20, 20) TO (30, 30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_m_p1 PARTITION OF pagg_tab_m FOR VALUES FROM (0, 0) TO (12, 12); +CREATE TABLE pagg_tab_m_p2 PARTITION OF pagg_tab_m FOR VALUES FROM (12, 12) TO (22, 22); +CREATE TABLE pagg_tab_m_p3 PARTITION OF pagg_tab_m FOR VALUES FROM (22, 22) TO (30, 30); INSERT INTO pagg_tab_m SELECT i % 30, i % 40, i % 50 FROM generate_series(0, 2999) i; ANALYZE pagg_tab_m; @@ -202,17 +202,17 @@ SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab_m GROUP BY (a+b)/2, 2, 1 HAV -- Test with multi-level partitioning scheme CREATE TABLE pagg_tab_ml (a int, b int, c text) 
PARTITION BY RANGE(a); -CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_ml_p1 PARTITION OF pagg_tab_ml FOR VALUES FROM (0) TO (10); CREATE TABLE pagg_tab_ml_p2 PARTITION OF pagg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY LIST (c); -CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001') WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0002', '0003') WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_ml_p2_s1 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0000', '0001', '0002'); +CREATE TABLE pagg_tab_ml_p2_s2 PARTITION OF pagg_tab_ml_p2 FOR VALUES IN ('0003'); -- This level of partitioning has different column positions than the parent CREATE TABLE pagg_tab_ml_p3(b int, c text, a int) PARTITION BY RANGE (b); -CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (5) TO (10) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_ml_p3_s1(c text, a int, b int); +CREATE TABLE pagg_tab_ml_p3_s2 PARTITION OF pagg_tab_ml_p3 FOR VALUES FROM (7) TO (10); -ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (5); +ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0) TO (7); ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO (30); INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM0000') FROM generate_series(0, 29999) i; @@ -287,9 +287,9 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O SET parallel_setup_cost TO 10; CREATE TABLE pagg_tab_para(x int, y int) PARTITION BY RANGE(x); -CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off); -CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off); +CREATE TABLE pagg_tab_para_p1 PARTITION OF pagg_tab_para FOR VALUES FROM (0) TO (12); +CREATE TABLE pagg_tab_para_p2 PARTITION OF pagg_tab_para FOR VALUES FROM (12) TO (22); +CREATE TABLE pagg_tab_para_p3 PARTITION OF pagg_tab_para FOR VALUES FROM (22) TO (30); INSERT INTO pagg_tab_para SELECT i % 30, i % 20 FROM generate_series(0, 29999) i;
On Fri, 3 Apr 2020 at 04:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: > > I'd be inclined to undo what you did in favor of initializing the > > test tables to contain significantly different numbers of rows, > > because that would (a) achieve plan stability more directly, > > and (b) demonstrate that the planner is actually ordering the > > tables by cost correctly. Maybe somewhere else we have a test > > that is verifying (b), but these test cases abysmally fail to > > check that point. > > Concretely, I suggest the attached, which replaces the autovac disables > with adjusting partition boundaries so that the partitions contain > different numbers of rows. I've looked over this and I agree that it's a better solution to the problem. I'm happy for you to go ahead on this. David
David Rowley <dgrowleyml@gmail.com> writes: > On Fri, 3 Apr 2020 at 04:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Concretely, I suggest the attached, which replaces the autovac disables >> with adjusting partition boundaries so that the partitions contain >> different numbers of rows. > I've looked over this and I agree that it's a better solution to the problem. > I'm happy for you to go ahead on this. Pushed, thanks for looking at it! regards, tom lane