Thread: Behavior change in PostgreSQL 14Beta3 or bug?
Hi, while playing with PostgreSQL 14 Beta 3 I noticed a change when it comes to the visibility map and vacuum. Test case: gbench=# create table t1 ( a int, b text ) with ( fillfactor = 100 ); CREATE TABLE pgbench=# insert into t1 select a, a::text from generate_series(1,1000000) a; INSERT 0 1000000 pgbench=# create index i1 on t1 (a); CREATE INDEX gbench=# select ctid,* from t1 order by 1 limit 5; ctid | a | b -------+---+--- (0,1) | 1 | 1 (0,2) | 2 | 2 (0,3) | 3 | 3 (0,4) | 4 | 4 (0,5) | 5 | 5 (5 rows) pgbench=# begin; BEGIN pgbench=*# update t1 set b ='xx' where a = 1; UPDATE 1 pgbench=*# select ctid,* from t1 order by 1 limit 5; ctid | a | b -------+---+--- (0,2) | 2 | 2 (0,3) | 3 | 3 (0,4) | 4 | 4 (0,5) | 5 | 5 (0,6) | 6 | 6 (5 rows) pgbench=*# select ctid,* from t1 where a = 1; ctid | a | b ---------+---+---- (4,203) | 1 | xx (1 row) pgbench=*# commit; pgbench=# select pg_visibility_map('t1'::regclass, 0); pg_visibility_map ------------------- (f,f) (1 row) pgbench=# vacuum t1; VACUUM pgbench=# select pg_visibility_map('t1'::regclass, 0); pg_visibility_map ------------------- (f,f) <XXXXXXXXXXXXXXXXXXXXXXXX Why? (1 row) Versions before 14 (I did not test 14Beta1 and 14Beta2) changed the visibility bit to true after the vacuum. Did I miss a"feature" or is this a bug? Thanks in advance Daniel
It's a non-hot update, and so there is a single dead index tuple. You're seeing the new optimization that makes vacuum skip indexes in marginal cases.
Try running vacuum with index cleanup = on.
Peter Geoghegan
(Sent from my phone)
(Sent from my phone)
>It's a non-hot update, and so there is a single dead index tuple. You're seeing the new optimization that makes vacuum skip indexes in >marginal cases.
>Try running vacuum with index cleanup = on.
Thank you, Peter
On Mon, 2021-09-06 at 07:46 -0700, Peter Geoghegan wrote: > It's a non-hot update, and so there is a single dead index tuple. You're seeing > the new optimization that makes vacuum skip indexes in marginal cases. > > Try running vacuum with index cleanup = on. It occurs to me that this new default "auto" setting for "index_cleanup" may cause a performance regression for people who VACUUM tables frequently in order to get fast index-only scans. That is not a bug, but it would be good to alert the users. It is not an incompatibility that warrants a mention in the release notes, but perhaps somthing in https://www.postgresql.org/docs/14/indexes-index-only-scans.html and/or https://www.postgresql.org/docs/14/routine-vacuuming.html#VACUUM-FOR-VISIBILITY-MAP could be added that recommends that people should consider frequent VACUUM with "index_cleanup = on" for best performance with index-only scans. Suggested patch attached, should be backpatched to v14. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Attachment
Laurenz Albe <laurenz.albe@cybertec.at> writes: > It is not an incompatibility that warrants a mention in the release notes, > but perhaps somthing in > https://www.postgresql.org/docs/14/indexes-index-only-scans.html > and/or > https://www.postgresql.org/docs/14/routine-vacuuming.html#VACUUM-FOR-VISIBILITY-MAP > could be added that recommends that people should consider frequent > VACUUM with "index_cleanup = on" for best performance with index-only scans. If enough pages would change their all-visible state to make a significant difference in index-only scan performance, VACUUM should not be skipping the cleanup. If it is, the threshold for that is too aggressive. Assuming that that choice was made appropriately, I think the advice you propose here will just cause people to waste lots of cycles on VACUUM runs that have only marginal effects. regards, tom lane
On Mon, 2021-09-06 at 11:59 -0400, Tom Lane wrote: > Laurenz Albe <laurenz.albe@cybertec.at> writes: > > It is not an incompatibility that warrants a mention in the release notes, > > but perhaps somthing in > > https://www.postgresql.org/docs/14/indexes-index-only-scans.html > > and/or > > https://www.postgresql.org/docs/14/routine-vacuuming.html#VACUUM-FOR-VISIBILITY-MAP > > could be added that recommends that people should consider frequent > > VACUUM with "index_cleanup = on" for best performance with index-only scans. > > If enough pages would change their all-visible state to make a significant > difference in index-only scan performance, VACUUM should not be skipping > the cleanup. If it is, the threshold for that is too aggressive. > > Assuming that that choice was made appropriately, I think the advice you > propose here will just cause people to waste lots of cycles on VACUUM > runs that have only marginal effects. #define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */ So up to an additional 2% of all pages can have the all-visible bit unset with "index_cleanup = auto". That is probably not worth worrying, right? Yours, Laurenz Albe
On Mon, Sep 6, 2021 at 9:21 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > #define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */ > > So up to an additional 2% of all pages can have the all-visible bit > unset with "index_cleanup = auto". > > That is probably not worth worrying, right? I don't think it's worth worrying about. I would say that, since I chose the exact threshold myself. The threshold was a bit arbitrary, of course. Note that Daniel's example had a non-HOT update, even though it's the kind of update that we imagine can use HOT (because it didn't modify an indexed column). He could have ensured a HOT update by lowering heap fill factor, but why should that be necessary if updates are rare anyway? The bypass-index-vacuuming feature may have had a bit of a messaging problem. It was something we usually talked about as being about skipping index vacuuming, because that's what it actually does. However, the feature isn't really about doing less work during VACUUM. It's actually about doing *more* work during VACUUM -- more useful work. Especially setting visibility map bits. But also freezing. Now you can very aggressively tune VACUUM to do these things more often, with no fear of that being way too expensive because of index vacuuming that has only marginal value. The threshold is not so much about any one VACUUM operation -- you have to think about the aggregate effect on the table over time. Most individual tables will never have the new optimization kick in even once, because the workload just couldn't possibly allow it -- the 2% threshold is vastly exceeded every single time. The cases that it actually applies to are pretty much insert-only tables, perhaps with some HOT updates. 100% clean inserts are probably very rare in the real world. I believe that it's *vastly* more likely that such a table will have pages that are ~98%+ free of LP_DEAD line pointers in heap pages (i.e., the thing that BYPASS_THRESHOLD_PAGES applies to). To get to 100% you cannot allow even one single insert transaction to abort since the last VACUUM. If you assume that BYPASS_THRESHOLD_PAGES is actually too low for your workload (which is the opposite problem), then it doesn't matter very much. The feature as coded should still have the desired effect of skipping index vacuuming in *most* cases where it's unnecessary (however you happen to define "unnecessary") -- the number of pages with LP_DEAD items will naturally increase over time without index vacuuming, until the threshold is crossed. Maybe still-unnecessary index vacuuming will still take place in 1 out of 5 cases with the feature. This is still much better than 5 out of 5. More importantly, you can now aggressively tune vacuuming without noticeably increasing the number of individual vacuums that still have the problem of unnecessary index vacuuming. So if you go from 5 vacuums per day to 20 through tuning alone, the number of vacuum operations that do unnecessary index vacuuming doesn't increase at all (except perhaps due to rounding effects). -- Peter Geoghegan
On Mon, Sep 6, 2021 at 8:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Assuming that that choice was made appropriately, I think the advice you > propose here will just cause people to waste lots of cycles on VACUUM > runs that have only marginal effects. Right. The advice that they should receive (if any) is to tune autovacuum aggressively, and enable autovacuum log output. The log output reports on whether or not the implementation applied the optimization in each case. As I pointed out to Laurenz just now, users that care about index-only scans are actually the big beneficiaries here. Now they can set autovacuum_vacuum_insert_threshold very aggressively, without doing a useless round of index vacuuming just because one inserting transaction out of a million aborted. Once indexes are removed from the equation (to the extent that that makes sense), each round of vacuuming by autovacuum only needs to do work that is proportional to the number of unset-in-vm heap pages. I believe that that trade-off makes a lot of sense. Autovacuum has little chance of keeping anything like 100% of all pages set in the VM anyway. But it can get a lot closer to it in some cases now. -- Peter Geoghegan
On Mon, Sep 6, 2021 at 7:52 AM Daniel Westermann (DWE) <daniel.westermann@dbi-services.com> wrote: > >Try running vacuum with index cleanup = on. > > Thank you, Peter Thanks for testing! -- Peter Geoghegan
On Mon, 2021-09-06 at 12:11 -0700, Peter Geoghegan wrote: > On Mon, Sep 6, 2021 at 9:21 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > #define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */ > > > > So up to an additional 2% of all pages can have the all-visible bit > > unset with "index_cleanup = auto". > > > > That is probably not worth worrying, right? > > I don't think it's worth worrying about. > > The bypass-index-vacuuming feature may have had a bit of a messaging > problem. It was something we usually talked about as being about > skipping index vacuuming, because that's what it actually does. > However, the feature isn't really about doing less work during VACUUM. > It's actually about doing *more* work during VACUUM -- more useful > work. Especially setting visibility map bits. But also freezing. Now > you can very aggressively tune VACUUM to do these things more often, > with no fear of that being way too expensive because of index > vacuuming that has only marginal value. That makes sense; thanks for the detailed explanation. Yours, Laurenz Albe