Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Heap WARM Tuples - Design Draft
Msg-id 20160808201907.GF16416@momjian.us
In response to Re: Heap WARM Tuples - Design Draft  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Heap WARM Tuples - Design Draft  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On Mon, Aug  8, 2016 at 11:22:49PM +0530, Pavan Deolasee wrote:
> What I am currently trying to do is to reuse at least the BlockNumber field in
> t_ctid. For HOT/WARM chains, that field is really unused (except in the last
> tuple, when a regular update needs to store the block number of the new block). My idea is
> to use one free bit in t_infomask2 to tell us that t_ctid is really not a CTID,
> but contains new information (for pg_upgrade's sake). For example, one bit in
> bi_hi can tell us that this is the last tuple in the chain (information today
> conveyed by t_ctid pointing to self). Another bit can tell us that this tuple
> was WARM updated. We will still have plenty of bits to store additional
> information about WARM chains.

Yes, that works too, though it probably only makes sense if we can make
use of more than one bit in bi_hi.
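As a rough illustration of the bit layout discussed above, here is a minimal C sketch. It assumes a simplified stand-in for PostgreSQL's BlockIdData (two uint16 halves) and uses hypothetical flag names (WARM_LAST_IN_CHAIN, WARM_UPDATED) and bit positions; nothing in this thread fixes which bits would be used. It only makes sense when a (likewise hypothetical) t_infomask2 flag marks t_ctid as holding chain information rather than a real CTID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's BlockIdData (two 16-bit halves). */
typedef struct
{
    uint16_t bi_hi;
    uint16_t bi_lo;
} BlockIdSketch;

/* Hypothetical flag bits carved out of bi_hi.  Only meaningful when a
 * hypothetical t_infomask2 bit says t_ctid is not a real CTID. */
#define WARM_LAST_IN_CHAIN  0x8000  /* replaces "t_ctid points to self" */
#define WARM_UPDATED        0x4000  /* tuple was created by a WARM update */
#define WARM_FREE_BITS_MASK 0x3FFF  /* remaining bits for chain metadata */

static void
warm_set_flag(BlockIdSketch *blk, uint16_t flag)
{
    blk->bi_hi |= flag;
}

static bool
warm_has_flag(const BlockIdSketch *blk, uint16_t flag)
{
    return (blk->bi_hi & flag) != 0;
}

/* Store extra WARM-chain information in the bits not used as flags. */
static void
warm_set_payload(BlockIdSketch *blk, uint16_t payload)
{
    assert((payload & ~WARM_FREE_BITS_MASK) == 0);
    blk->bi_hi = (uint16_t) ((blk->bi_hi & ~WARM_FREE_BITS_MASK) | payload);
}

static uint16_t
warm_get_payload(const BlockIdSketch *blk)
{
    return blk->bi_hi & WARM_FREE_BITS_MASK;
}
```

The point of the sketch is Bruce's caveat: with two flags taken, only 14 bits of bi_hi remain for anything else.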

>     My guess is we would need one bit to mark a WARM chain, and perhaps
>     reuse obsolete pre-9.0 HEAP_MOVED_OFF to indicate increment-only or
>     decrement-only. 
> 
> 
> I am not convinced that checking for increment/decrement adds a lot of
> value. Sure, we might be able to address some typical workload, but is that
> really a common use case? Instead, what I am looking at is storing a bitmap which

I have no idea, but it guarantees that the first WARM update succeeds,
because no direction has been set yet.  Then, if the direction changes,
you create a new chain and hope the changes stay in that direction for a
while.

> shows us which table columns have changed so far in the WARM chain. We only
> have limited bits, so we can track only limited columns. This will help the
> cases where different columns are updated, but not so much if the same column
> is updated repeatedly.

Well, I don't think 15 bits give us enough space to store many column
numbers, let alone column numbers and _values_.  A four-bit column
number (1-16) barely reaches past what a simple bitmap of the first 15
columns already covers, and only three of them fit in 15 bits.  If you
want to extend that range, you can use 8 bits (2^8 = 256) to record one
of the first 256 column numbers.  You could use 7 bits (2^7 = 128) plus
another bit per column to record the increment/decrement direction,
meaning that repeated changes to the same column in the same direction
would be allowed in the same WARM chain.  I think it is more likely that
the same column will be changed repeatedly within a WARM chain than that
different columns will be changed.
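A minimal sketch of the bitmap alternative, assuming a hypothetical 16-bit per-chain summary in which bits 0-14 stand for attnums 1-15 and bit 15 is a catch-all for every later column. Because of the catch-all, the check can report false positives (forcing a new chain) but never false negatives. For comparison, the same 15 bits hold only three 4-bit column numbers (15 / 4 = 3) or a single 8-bit one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-bit summary: bits 0..14 mark attnums 1..15,
 * bit 15 means "some column beyond 15 changed". */
#define WARM_TRACKED_COLS 15
#define WARM_OVERFLOW_BIT ((uint16_t) 1 << 15)

static uint16_t
warm_col_bit(int attnum)
{
    assert(attnum >= 1);
    if (attnum <= WARM_TRACKED_COLS)
        return (uint16_t) (1u << (attnum - 1));
    return WARM_OVERFLOW_BIT;
}

/* Record that attnum changed somewhere in this WARM chain. */
static void
warm_mark_changed(uint16_t *summary, int attnum)
{
    *summary |= warm_col_bit(attnum);
}

/* Could attnum have changed in the chain?  Columns past 15 share one
 * bit, so this may say "yes" for a column that never changed. */
static bool
warm_may_have_changed(uint16_t summary, int attnum)
{
    return (summary & warm_col_bit(attnum)) != 0;
}
```

This helps when different columns change, but, as noted above, it says nothing about repeated changes to the same column.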

Frankly, with only 16 bits, I can't see how recording specific columns
really buys us much, because we have to limit how many column numbers
we can store.  Plus, if a column changes twice, you need to create a new
WARM chain unless you record the increment/decrement direction.

What we could do is record the first two changed columns in the 16-bit
field, each with its direction, and then use one more bit to record a
single shared direction for all other columns that change.  That allows
you to record three sets of directions in the same WARM chain.  It does
not allow you to change the direction of any column previously recorded
in the WARM chain.
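The two-slot scheme might look like the C sketch below. To fit in 16 bits it assumes 6-bit column numbers (attnums 1-63) rather than 7-bit ones, since two 7-bit numbers plus their direction bits already use all 16 bits and leave no room for the shared bit. It also spends one bit on whether the shared direction has been set yet. The layout, names, and the rule that columns beyond 63 fall into the shared bucket are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-bit layout:
 *   bits 0-5   first recorded column (1..63, 0 = slot empty)
 *   bit  6     its direction (0 = increment, 1 = decrement)
 *   bits 7-12  second recorded column
 *   bit  13    its direction
 *   bit  14    shared "other columns" direction has been set
 *   bit  15    direction shared by all other columns
 */
typedef enum { WARM_INC = 0, WARM_DEC = 1 } WarmDir;

#define SLOT_BITS    7          /* 6-bit column number + 1 direction bit */
#define COL_MASK     0x3F
#define MAX_SLOT_COL COL_MASK
#define OTHERS_SET   ((uint16_t) 1 << 14)
#define OTHERS_DIR   ((uint16_t) 1 << 15)

/* Returns true if updating attnum in direction dir can stay in the current
 * WARM chain (recording it in *state); false means a new chain is needed. */
static bool
warm_try_record(uint16_t *state, int attnum, WarmDir dir)
{
    assert(attnum >= 1);
    assert(dir == WARM_INC || dir == WARM_DEC);

    for (int slot = 0; slot < 2; slot++)
    {
        int shift = slot * SLOT_BITS;
        int col = (*state >> shift) & COL_MASK;
        int sdir = (*state >> (shift + 6)) & 1;

        if (col == attnum)
            return sdir == (int) dir;   /* same column: direction must match */
        if (col == 0 && attnum <= MAX_SLOT_COL)
        {
            *state |= (uint16_t) ((attnum | ((int) dir << 6)) << shift);
            return true;
        }
    }

    /* Both slots taken (or attnum too large): use the shared direction. */
    if (*state & OTHERS_SET)
        return ((*state & OTHERS_DIR) != 0) == (dir == WARM_DEC);
    *state |= OTHERS_SET;
    if (dir == WARM_DEC)
        *state |= OTHERS_DIR;
    return true;
}
```

As the text says, a recorded column can never change direction within the chain, and every column outside the two slots must agree on one direction.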

You could say you are going to scan the WARM chain for changes, but that
limits pruning.  You could try storing just the changes for pruned rows,
but then you are going to have a lot of overhead scanning the WARM chain
looking for changes.

It would be interesting to store the change _direction_ for the first 15
columns in the bitmap, and then use the 16th bit for the rest of the
columns, but I can't figure out how to record which bits are set with a
direction and which are the default/unused.  You really need two bits
per column, so 16 bits only cover the first seven or eight columns.
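With two bits per column, a 16-bit field holds eight slots. The sketch below assumes a hypothetical layout in which slots 0-6 track attnums 1-7 and slot 7 is shared by all later columns, with 00 meaning no change yet, 01 increment, and 10 decrement. Once a slot has a direction, an update in the opposite direction would need a new WARM chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: 2 bits per slot, 8 slots in 16 bits.
 * Slots 0..6 belong to attnums 1..7; slot 7 is shared by later columns.
 * Slot values: 00 = no change yet, 01 = increment, 10 = decrement. */
#define DIR_NONE      0u
#define DIR_INC       1u
#define DIR_DEC       2u
#define PER_COL_SLOTS 7

static int
warm_slot_for(int attnum)
{
    assert(attnum >= 1);
    return attnum <= PER_COL_SLOTS ? attnum - 1 : PER_COL_SLOTS;
}

/* Returns false if the update reverses an already-recorded direction,
 * i.e. a new WARM chain would be needed. */
static bool
warm_record_dir(uint16_t *state, int attnum, unsigned dir)
{
    assert(dir == DIR_INC || dir == DIR_DEC);
    int shift = 2 * warm_slot_for(attnum);
    unsigned cur = (*state >> shift) & 3u;

    if (cur == DIR_NONE)
    {
        *state |= (uint16_t) (dir << shift);
        return true;
    }
    return cur == dir;
}
```

The 00 state is what solves the "default/unused" problem, at the cost of halving the number of columns covered.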

> What will help, and something I haven't yet applied any thoughts, is when we
> can turn WARM chains back to HOT by removing stale index entries.

I can't see how we can ever do that because we have multiple indexes
pointing to the chain, and keys that might be duplicated if we switched
to HOT.  Seems only VACUUM can fix that.

> Some heuristics and limits on amount of work done to detect duplicate index
> entries will help too.

Yeah, I have kind of given up on that.

> > We can't use the LP_REDIRECT lp_len bits because we need to create WARM
> > chains before pruning, and I don't think walking the pre-pruned chain is
> > worth it.  (As I understand HOT, LP_REDIRECT is only created during
> > pruning.)
> 
> That's correct. But lp_len provides us some place to stash information from
> heap tuples when they are pruned.

Right.  However, I see storing information at prune time as useful only
if you are willing to scan the chain, and frankly, I have given up on
chain scanning (with column comparisons) as too expensive for its
limited value.  What we could do is use the LP_REDIRECT lp_len to store
two more changed column numbers, with their directions.  However, once
you store one bit that records the direction of all other columns, you
can't go back and record the changes unless you do a chain scan at
prune time.
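Stashing two column numbers in a redirect line pointer could look roughly like the following. The struct mirrors the bit-field layout of PostgreSQL's ItemIdData (lp_off:15, lp_flags:2, lp_len:15) and the value of LP_REDIRECT from itemid.h. The packing of each column as a 6-bit attnum plus a direction bit, and all function names, are hypothetical. Two 7-bit entries use 14 of lp_len's 15 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the bit-field layout of PostgreSQL's ItemIdData (itemid.h). */
typedef struct
{
    unsigned lp_off:15,
             lp_flags:2,
             lp_len:15;
} ItemIdSketch;

#define LP_REDIRECT 2   /* same value as in itemid.h */

/* Hypothetical: pack one changed column into 7 bits,
 * a 6-bit attnum (1..63) plus 1 direction bit (1 = decrement). */
static unsigned
pack_col(int attnum, int decrement)
{
    assert(attnum >= 1 && attnum <= 63);
    return (unsigned) attnum | ((decrement ? 1u : 0u) << 6);
}

/* Hypothetical: redirect pointers normally keep lp_len at 0; here that
 * space holds two packed columns (2 x 7 = 14 of the 15 bits). */
static void
redirect_stash_cols(ItemIdSketch *lp, unsigned col_a, unsigned col_b)
{
    assert(lp->lp_flags == LP_REDIRECT);
    assert(col_a < 128 && col_b < 128);
    lp->lp_len = col_a | (col_b << 7);
}

/* Return packed column 0 or 1 from a redirect pointer. */
static unsigned
redirect_col(const ItemIdSketch *lp, int which)
{
    return (lp->lp_len >> (7 * which)) & 0x7Fu;
}
```

This only captures what is known at prune time, which is why, without a chain scan, it cannot undo a shared "all other columns" direction bit.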

You have to wonder how much complexity is reasonable for this.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


