Re: PROC_IN_ANALYZE stillborn 13 years ago - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: PROC_IN_ANALYZE stillborn 13 years ago
Msg-id: CA+TgmoZ9hycF=fu0V+812fOvTsje5bCV6vMdDWphSANOiv-vqw@mail.gmail.com
In response to: Re: PROC_IN_ANALYZE stillborn 13 years ago (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: PROC_IN_ANALYZE stillborn 13 years ago; Re: PROC_IN_ANALYZE stillborn 13 years ago
List: pgsql-hackers
On Thu, Aug 6, 2020 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (1) Without a snapshot it's hard to make any non-bogus decisions about
> which tuples are live and which are dead. Admittedly, with Simon's
> proposal the final totals would be spongy anyhow, but at least the
> individual decisions produce meaningful answers.

I don't think I believe this. It's impossible to make *consistent* decisions, but it's not difficult to make *non-bogus* decisions. HeapTupleSatisfiesVacuum() and HeapTupleSatisfiesUpdate() both make such decisions, and neither takes a snapshot argument.

> (2) I'm pretty sure there are places in the system that assume that any
> reader of a table is using an MVCC snapshot. For instance, didn't you
> introduce some such assumptions along with or just after getting rid of
> SnapshotNow for catalog scans?

SnapshotSelf still exists and is still used, and IIRC, it has very similar semantics to the old SnapshotNow, so I don't think that we introduced any really general assumptions of this sort. I think the important part of those changes was that all the code that had previously used SnapshotNow to examine system catalog tuples for DDL purposes and catcache lookups and so forth started using an MVCC scan, which removed one (of many) impediments to concurrent DDL. I think the fact that we removed SnapshotNow outright, rather than just ceasing to use it for that purpose, was mostly so that nobody would accidentally reintroduce code that used it for the sorts of purposes for which it had been used previously, and secondarily for code cleanliness. There's nothing wrong with it fundamentally AFAIK.

It's worth mentioning, I think, that the main problem with SnapshotNow was that it provided no particular stability. If you did an index scan under SnapshotNow you might find two copies or no copies of a row being concurrently updated, rather than exactly one. And that in turn could cause problems like failure to build a relcache entry.
Now, how important is stability to ANALYZE? If you *either* retake your MVCC snapshots periodically as you re-scan the table *or* use a non-MVCC snapshot for the scan, you can get those same kinds of artifacts: you might see two copies of a just-updated row, or none. Maybe this would actually *break* something - e.g. could there be code that would get confused if we sample multiple rows for the same value in a column that has a UNIQUE index? But I think mostly the consequences would be that you might get somewhat different results from the statistics. It's not clear to me that it would even be correct to categorize those somewhat-different results as "less accurate." Tuples that are invisible to a query often have performance consequences very similar to visible tuples, in terms of the query run time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company