Thread: ANALYZE to be ignored by VACUUM
When there are a heavily updated table and a large table at the same time,
ANALYZE against the large table disturbs other autovacuums and HOT updates.
In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
and 5 min at 100), but the updated table needs to be vacuumed every 30
seconds because seqscans run on the table repeatedly.

ANALYZE is a "transaction". As long as long transactions are alive, VACUUM
cannot remove dead tuples deleted after those transactions began. HOT also
cannot work under long transactions. We would be happy if VACUUM could get
along with long transactions, but that requires some kind of complex
management of the vacuum horizon. I have no idea how to do that...

So, I'm targeting only ANALYZE for now, by changing ANALYZE to be ignored
by VACUUM. It is just the same as VACUUM, which has already been ignored
by other VACUUMs since version 8.2.

My proposal is to split ANALYZEs that run with use_own_xacts (i.e. from
VACUUM ANALYZE, autovacuum, or database-wide ANALYZE) into two
transactions:

  T1: acquire_sample_rows()
  T2: compute_stats() and update_attstats()

and set PROC_IN_VACUUM during T1. T1 takes a long time because it reads
pages to sample rows. T2 is relatively short because the stats are
calculated in memory, without I/O. T2 needs a consistent snapshot because
user-defined functions in expression indexes might rely on it.

Is this an acceptable approach? Or am I missing something?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
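For concreteness, here is a minimal SQL sketch of the failure mode
described above; the table names are made up and the timings are only
those reported for this workload:

    -- Session 1: the long-running ANALYZE of the large table
    SET default_statistics_target = 100;
    ANALYZE big_table;                 -- runs for minutes in this workload

    -- Session 2: meanwhile, cleanup of the hot table is held back
    VACUUM VERBOSE small_table;
    -- The "N dead row versions cannot be removed yet" count keeps growing,
    -- and HOT pruning during the seqscans is held back the same way, until
    -- the ANALYZE in session 1 commits.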
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
> and 5 min at 100), but the updated table needs to be vacuumed every 30
> seconds because seqscans run on the table repeatedly.

There is something *seriously* wrong with that. If vacuum can complete in
under 30 seconds, how can analyze take a minute? (I'm also wondering
whether you'll still need such frantic vacuuming with HOT...)

> So, I'm targeting only ANALYZE for now, by changing ANALYZE to be ignored
> by VACUUM.

I think we need to understand what the real problem is with your test
case. This proposal seems very messy/ugly to me, and I'm unconvinced that
it solves anything.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> > In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
> > and 5 min at 100), but the updated table needs to be vacuumed every 30
> > seconds because seqscans run on the table repeatedly.
>
> There is something *seriously* wrong with that. If vacuum can complete in
> under 30 seconds, how can analyze take a minute? (I'm also wondering
> whether you'll still need such frantic vacuuming with HOT...)

There are two tables here:
  [S] A small table, that is frequently updated and seqscan-ed
  [L] A large table, that takes a long time to be analyzed

The table [S] should be vacuumed every 30 seconds, because dead tuples
seriously affect the performance of the seqscans. HOT and autovacuum are
very useful here *unless* a long transaction begins. Analyzing [L] takes
1-5 minutes, and neither HOT nor vacuums work during it. I would really
like to use statistics_target = 100 for more accurate statistics, but I'm
using 10 instead to avoid the long transactions caused by ANALYZE.

Basically, all of the above comes down to avoiding needless long
transactions. Aside from ANALYZE, pg_start_backup() is also a long
transactional command; it takes checkpoint_timeout *
checkpoint_completion_target (2.5 min. with the default settings) at
worst. Users can avoid their own long transactions, but they have no
choice but to use these built-in maintenance commands.

> > So, I'm targeting only ANALYZE for now, by changing ANALYZE to be ignored
> > by VACUUM.
>
> I think we need to understand what the real problem is with your test
> case. This proposal seems very messy/ugly to me, and I'm unconvinced that
> it solves anything.

I think there are some direct or indirect solutions:

1. VACUUM removes recently dead tuples under some circumstances.
   For example, consider a tuple updated twice after a long transaction
   begins: the oldest version can be seen by the old long transaction and
   the newest can be seen by new transactions, but the intermediate
   version is invisible to all transactions.

2. ANALYZE doesn't disturb vacuuming of other tables (my first proposal).
   We know ANALYZE doesn't touch other tables during its sampling phase,
   so we can treat analyzing transactions the same as PROC_IN_VACUUM
   xacts. The same can be said for pg_start_backup; a non-transactional
   start-backup command might be better.

3. Recover the density of tuples; i.e., auto-CLUSTER.
   If performance recovers after a long transaction ends, the problem will
   not be so serious. It would be better if autovacuum invoked CLUSTER
   when required, and if we could run CLUSTER concurrently.

4. ANALYZE finishes in a short time.
   It is ok that VACUUM takes a long time, because it is not a
   transaction, but ANALYZE should not. This requires a cleverer
   statistics algorithm. A sampling factor of 10 is not enough for
   pg_stats.n_distinct; we seem to estimate n_distinct too low for
   clustered (ordered) tables. The calculation of n_distinct might be a
   research topic of its own. Also, this cannot resolve the problem with
   pg_start_backup.

1 or 3 might be more universal approaches, but I think 2 will be an
independent improvement from them.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
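As a small aside on point 4: the wish for statistics_target = 100 can at
least be scoped with per-column targets, which raise the sample only where
the better estimates pay off instead of raising the cluster-wide default.
This does not shorten the ANALYZE of [L] itself (the sample size follows
the largest per-column target), but it keeps the ANALYZEs of all other
tables cheap. A sketch with made-up names:

    -- default_statistics_target stays at 10 in postgresql.conf
    ALTER TABLE large_table ALTER COLUMN important_col SET STATISTICS 100;
    ANALYZE large_table;   -- still long, but only this table pays for it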
"ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp> writes: > 4. ANALYZE finishes in a short time. > It is ok that VACUUM takes a long time because it is not a transaction, > but ANALYZE should not. It requres cleverer statistics algorithm. > Sampling factor 10 is not enough for pg_stats.n_distinct. We seems to > estimate n_distinct too low for clustered (ordered) tables. Unfortunately no constant size sample is going to be enough for reliable n_distinct estimates. To estimate n_distinct you really have to see a percentage of the table, and to get good estimates that percentage has to be fairly large. There was a paper with a nice algorithm posted a while back which required only constant memory but it depended on scanning the entire table. I think to do n_distinct estimates we'll need some statistics which are either gathered opportunistically whenever a seqscan happens or maintained by an index. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Gregory Stark <stark@enterprisedb.com> wrote:

> > 4. ANALYZE finishes in a short time.
>
> There was a paper with a nice algorithm posted a while back which required
> only constant memory, but it depended on scanning the entire table. I
> think to do n_distinct estimates we'll need some statistics which are
> either gathered opportunistically whenever a seqscan happens or maintained
> by an index.

VACUUM would be another good time to run that algorithm, because it does a
seqscan. If we do so, we need to separate the analyzing transaction into a
sampling transaction and a stats-updating transaction in order to keep
vacuums as ignorable transactions.

However, VACUUM will not do a seqscan once we have the Dead Space Map or
Segment Visibility Map. We will need incremental statistics updating if
reliable n_distinct estimation requires that many samples.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Feb 19, 2008 8:31 AM, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> > > In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
> > > and 5 min at 100), but the updated table needs to be vacuumed every 30
> > > seconds because seqscans run on the table repeatedly.
> >
> > There is something *seriously* wrong with that. If vacuum can complete in
> > under 30 seconds, how can analyze take a minute? (I'm also wondering
> > whether you'll still need such frantic vacuuming with HOT...)
>
> There are two tables here:
>   [S] A small table, that is frequently updated and seqscan-ed
>   [L] A large table, that takes a long time to be analyzed
>
> The table [S] should be vacuumed every 30 seconds, because dead tuples
> seriously affect the performance of the seqscans. HOT and autovacuum are
> very useful here *unless* a long transaction begins. Analyzing [L] takes
> 1-5 minutes, and neither HOT nor vacuums work during it. I would really
> like to use statistics_target = 100 for more accurate statistics, but I'm
> using 10 instead to avoid the long transactions caused by ANALYZE.

I am sure the idea is not original, yet still I would like to know how
hard it would be to support local (per table) oldest visible XIDs.

I mean, when a transaction starts you need to keep all tuples with xmin >=
oldest_xid in all tables, because who knows which tables that transaction
will want to touch. But then again, there is a relatively large list of
cases where we don't need to hold back vacuum on _all_ relations. These
include:

- SELECTs in auto-commit mode -- provided the SELECT is not something
  fancy (no non-immutable PL functions), we just need to keep a snapshot
  of the affected tables.

- DMLs in auto-commit mode -- provided no PL functions or triggers are in
  effect.

- WITH HOLD cursors. Yes, I know, a WITH HOLD cursor will create a copy of
  the rows to be returned at its first COMMIT (which can take a looong
  time in some cases), but perhaps it would be possible to just "lock" the
  table against vacuuming and skip the temporary store.

- And lots of others, when done in auto-commit: ALTER TABLEs, CREATE TABLE
  AS SELECT, COPY, etc...

I am sure that such an idea isn't original. What are the main obstacles to
making it happen, except "time&money"? :)

Regards,
  Dawid
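For reference, the WITH HOLD behaviour mentioned above, with an
illustrative table name; currently the held result set is copied into a
temporary store at COMMIT, which is the step that can take a long time:

    BEGIN;
    DECLARE c CURSOR WITH HOLD FOR SELECT * FROM big_table;
    COMMIT;            -- the whole result set is materialized here
    FETCH 10 FROM c;   -- later fetches read from the stored copy
    CLOSE c;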
"Dawid Kuroczko" <qnex42@gmail.com> wrote: > I am sure the idea is not original, yet still I would like to know how hard > would it be to support local (per table) oldest visible XIDs. > > I mean, when transaction start you need to keep all tuples with xmin >= > oldest_xid in all tables, because who knows what table will that transaction > like to touch. Per-table oldest XID management sounds good! You mean transactions that touch no tables does not affect vacuums at all, right? If so, the solution can resolve pg_start_backup problem, too. I feel it is enough for standard maintenance commands. Another solution might need for user defined long transactions, though. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center