ANALYZE to be ignored by VACUUM - Mailing list pgsql-hackers

From ITAGAKI Takahiro
Subject ANALYZE to be ignored by VACUUM
Date
Msg-id 20080215152826.A861.52131E4D@oss.ntt.co.jp
Whole thread Raw
Responses Re: ANALYZE to be ignored by VACUUM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
When there are a heavily updated table and a large table at the same time,
ANALYZE against the large table disturbs other autovacuums and HOT updates.
In my workload, ANALYZE takes long time (1min at statistics_target = 10,
and 5min at 100), but the updated table needs to be vacuumed every 30 seconds
because seqscans run on the table repeatedly.

ANALYZE is a "transaction". As long as long transactions are alive,
VACUUM cannot remove dead tuples deleted after those transaction began.
HOT also cannot work under long transactions. We will be happy if VACUUM
can get along with long transactions, but it requires some kinds of complex
managements of vacuum horizon. I have no idea for it...


So, I'm targeting only ANALZYE for now by changing ANALYZE to be ignored
by VACUUM. It is just same as VACUUM that has already been ignored by other
VACUUMs since version 8.2.

My proposal is splitting ANALYZEs with use_own_xacts (by VACUUM ANALYZE,
autovacuum or database-wide analyze) to two transactions:  T1: acquire_sample_rows()  T2: compute_stats() and
update_attstats()
and set PROC_IN_VACUUM during T1.
T1 takes long time because read pages to sample rows.
T2 is relatively short because stats are calculated in memory, without i/o.
T2 is needed for consistent snapshot because user-defined functions
in expression indexes might use it.

Is it an acceptable approach? Or am I missing something?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: "Brendan Jurd"
Date:
Subject: Re: Key for grant attributes (was: Re: Show INHERIT in \du)
Next
From: Simon Riggs
Date:
Subject: Re: Reworking WAL locking