Re: Do we need so many hint bits? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Do we need so many hint bits?
Date
Msg-id 20121116150942.GE6505@awork2.anarazel.de
Whole thread Raw
In response to Do we need so many hint bits?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Do we need so many hint bits?
List pgsql-hackers
On 2012-11-15 16:42:57 -0800, Jeff Davis wrote:
> Related to discussion here:
> http://archives.postgresql.org/message-id/CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com
>
> It occurred to me recently that many of the hint bits aren't terribly
> important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
> has a purpose, and we'd expect it to be used many times following the
> initial CLOG lookup.

> But the other tuple hint bits seem to be there just for symmetry,
> because they shouldn't last long. If HEAP_XMIN_INVALID or
> HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
> soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
> should just be changed to InvalidTransactionId.

Wrt HEAP_XMAX_COMMITTED:
It can take an *awfully* long time till autovacuum crosses the
thresholds the next time for a big table.
I also think we cannot dismiss the case of longrunning transactions
because vacuum won't be able to cleanup those rows in that case.

Wrt HEAP_(XMIN|XMAX)_INVALID: yes, if we are in need of new flag bits
those sound like a good target to me.

> Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
> in the visibility map patch, apparently as a way to know when to clear
> the VM bit when doing an update. It was then also used for scans, which
> showed a significant speedup. But I wonder: why not just use the
> visibilitymap directly from those places? It can be used for the scan
> because it is crash safe now (not possible before). And since it's only
> one lookup per scanned page, then I don't think it would be a measurable
> performance loss there. Inserts/updates/deletes also do a significant
> amount of work, so again, I doubt it's a big drop in performance there
> -- maybe under a lot of concurrency or something.
>
> The benefit of removing PD_ALL_VISIBLE would be significantly higher.
> It's quite common to load a lot of data, and then do some reads for a
> while (setting hint bits and flushing them to disk), and then do a
> VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
> pages again. Also, if I remember correctly, Robert went to significant
> effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
> bits consistent. Maybe this was all discussed before?

As far as I understand the code the crash-safety aspects of the
visibilitymap currently rely on on having the knowledge that ALL_VISIBLE
has been cleared during a heap_(insert|update|delete). That allows
management of the visibilitymap without it being xlogged itself which
seems pretty important to me.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Do we need so many hint bits?
Next
From: Euler Taveira
Date:
Subject: Re: another idea for changing global configuration settings from SQL