Autoanalyze and OldestXmin - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Autoanalyze and OldestXmin
Msg-id BANLkTikHq3mtk2G_LO-Hk33khPVM0q5caA@mail.gmail.com
List pgsql-hackers

Hi All,

I was running some pgbench tests and observed the following behavior. This might be a known issue, but I thought it was worth mentioning nevertheless.

The auto-analyze process grabs an MVCC snapshot. If it runs on a very large table, it may take considerable time and will stop OldestXmin from advancing for that whole period. During that time, any small, heavily updated tables can bloat a lot. For example, in the attached log snippet (HEAD is patched a bit to produce more information than you would otherwise see), for a scale factor of 50 and 50 clients:

The branches and tellers tables, which had stable sizes of around 65 and 90 pages respectively, bloat to 402 and 499 pages respectively while the accounts table is being analyzed. Analyzing the accounts table takes around 5 minutes on my decent server, and the branches and tellers tables keep bloating during that time. If these small tables are very actively accessed, vacuum may not even be able to truncate them later, once OldestXmin advances at the end of the auto-analyze.

I understand that analyze needs a snapshot to run index predicate functions, but is there something we can do? There is a PROC_IN_ANALYZE flag, but we don't seem to be using it anywhere. Since acquire_sample_rows() returns palloc'd tuples, can't we let OldestXmin advance after scanning a page by ignoring procs with the flag set, just like we do for PROC_IN_VACUUM?
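
To make the question concrete, here is a toy Python model of the idea. It is only a sketch and not PostgreSQL code: the Proc class, the oldest_xmin() helper, the flag values, and all the XIDs are made up for illustration, and XID wraparound is ignored. It just shows how skipping procs that carry PROC_IN_ANALYZE, in the same way procs with PROC_IN_VACUUM are skipped, would let the computed horizon move forward while a long analyze is still running.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Flag names follow the ones mentioned above; the numeric values are arbitrary.
PROC_IN_VACUUM = 0x02
PROC_IN_ANALYZE = 0x04


@dataclass
class Proc:
    """Hypothetical stand-in for one backend's entry in the proc array."""
    xmin: Optional[int]  # oldest XID the backend's snapshot still needs, None if no snapshot
    flags: int = 0


def oldest_xmin(procs: Iterable[Proc], next_xid: int, ignore_flags: int) -> int:
    """Smallest xmin among procs that carry none of the ignored flags.

    Plain integer comparison is used, so XID wraparound is not modeled.
    """
    result = next_xid
    for proc in procs:
        if proc.flags & ignore_flags:
            continue  # this backend does not hold back the horizon
        if proc.xmin is not None and proc.xmin < result:
            result = proc.xmin
    return result


# Made-up state: one long-running analyze, one vacuum, two ordinary clients.
procs = [
    Proc(xmin=1000, flags=PROC_IN_ANALYZE),
    Proc(xmin=900, flags=PROC_IN_VACUUM),
    Proc(xmin=5200),
    Proc(xmin=5350),
]

# Only vacuum is ignored: the analyze snapshot pins the horizon.
current = oldest_xmin(procs, next_xid=5400, ignore_flags=PROC_IN_VACUUM)

# Analyze is ignored as well: the horizon follows the ordinary clients.
proposed = oldest_xmin(
    procs, next_xid=5400, ignore_flags=PROC_IN_VACUUM | PROC_IN_ANALYZE
)

print(current, proposed)
```

In this model the dead row versions in branches and tellers created after XID 1000 become removable only in the second case. Whether that is actually safe for analyze depends on the snapshot question raised above, which the sketch does not address.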

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
