Re: Inaccuracy in VACUUM's tuple count estimates - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Inaccuracy in VACUUM's tuple count estimates
Msg-id 20140612114059.GA24710@alap3.anarazel.de
In response to Inaccuracy in VACUUM's tuple count estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Inaccuracy in VACUUM's tuple count estimates  (tim_wilson <tim.wilson@telogis.com>)
Re: Inaccuracy in VACUUM's tuple count estimates  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Hi Tom,

On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> I figured it'd be easy enough to get a better estimate by adding another
> counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
> assuming that in-progress inserts and deletes will both commit).

Did you plan to backpatch that? My inclination would be no...
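
For reference, a minimal sketch of the extra counter described in the quote above, assuming the per-tuple results are already available as an array. The enum repeats the HTSV_Result names so the snippet compiles on its own, and estimate_live_tuples() is a hypothetical helper, not a function in the tree:

```c
#include <stddef.h>

/* Same names as PostgreSQL's HTSV_Result, redeclared so this compiles alone. */
typedef enum
{
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

/*
 * Hypothetical helper: count only LIVE and INSERT_IN_PROGRESS tuples,
 * i.e. assume that in-progress inserts and deletes will both commit.
 */
static double
estimate_live_tuples(const HTSV_Result *states, size_t ntuples)
{
    double      live = 0;

    for (size_t i = 0; i < ntuples; i++)
    {
        switch (states[i])
        {
            case HEAPTUPLE_LIVE:
            case HEAPTUPLE_INSERT_IN_PROGRESS:
                live += 1;
                break;
            default:
                /* DEAD, RECENTLY_DEAD and DELETE_IN_PROGRESS are not counted */
                break;
        }
    }
    return live;
}
```

With this counting rule, a tuple that is reported as INSERT_IN_PROGRESS but already has a delete pending is still counted as live, which is the misreporting described in the quote below.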

>  I did
> that, and found that it helped Tim's test case not at all :-(.  A bit of
> sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
> whether the transaction has since marked it for deletion:
> 
>             /*
>              * It'd be possible to discern between INSERT/DELETE in progress
>              * here by looking at xmax - but that doesn't seem beneficial for
>              * the majority of callers and even detrimental for some. We'd
>              * rather have callers look at/wait for xmin than xmax. It's
>              * always correct to return INSERT_IN_PROGRESS because that's
>              * what's happening from the view of other backends.
>              */
>             return HEAPTUPLE_INSERT_IN_PROGRESS;
> 
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> less laziness + fuzzy thinking here.  Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
> the case that callers other than VACUUM itself are okay with failing
> to make this distinction?  I'm dubious: there are very few if any
> callers that treat the INSERT and DELETE cases exactly alike.

My current position on this is that we should leave the code as is in <9.4
and HEAPTUPLE_INSERT_IN_PROGRESS for 9.4/master. Would you be ok
with that? The second best thing imo would be to discern and return
HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for the
respective cases.
Which way would you like to go?
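
To make the second option concrete, here is a rough sketch of how the two cases could be told apart by looking at xmax, for a tuple whose xmin is not yet committed. ToyTuple and classify_uncommitted_insert() are hypothetical names for illustration only, and treating a lock-only xmax as "no delete pending" is an assumption of this sketch, not a description of the actual code:

```c
#include <stdbool.h>

/* Same names as PostgreSQL's HTSV_Result, redeclared so this compiles alone. */
typedef enum
{
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

/* Toy stand-in for the relevant header bits; not HeapTupleHeaderData. */
typedef struct
{
    bool        xmax_valid;     /* some transaction has set xmax */
    bool        xmax_lock_only; /* xmax is only a row lock (assumption) */
} ToyTuple;

/*
 * Hypothetical: classify a tuple whose xmin is still in progress.
 * A real (non-lock) xmax means the tuple also has a delete pending.
 */
static HTSV_Result
classify_uncommitted_insert(const ToyTuple *tup)
{
    if (tup->xmax_valid && !tup->xmax_lock_only)
        return HEAPTUPLE_DELETE_IN_PROGRESS;

    return HEAPTUPLE_INSERT_IN_PROGRESS;
}
```

Callers that want to wait on xmin rather than xmax would need to be checked against a change like this, which is the concern raised in the code comment quoted above.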

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


