Thread: Heap WARM Tuples - Design Draft

Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:
Write Amplification Reduction Method (WARM)
====================================

A few years back, we developed HOT to address the problem associated with MVCC and frequent updates and it has served us very well. But in the light of Uber's recent technical blog highlighting some of the problems that are still remaining, especially for workloads where HOT does not help to the full extent, Simon, myself and others at 2ndQuadrant discussed several ideas and what I present below is an outcome of that. This is not to take away credit from anybody else. Others may have thought about similar ideas, but I haven’t seen any concrete proposal so far.

At present, WARM looks to be a better idea than LITE, though we have that as a backstop also (says Simon).

Background
==========

As a background, HOT primarily helps in two ways.

1. It avoids additional index entries when a tuple is updated in a way such that no index columns are changed. Before we implemented HOT, every update would generate a new index entry in every index on the table. This not only resulted in bigger and more bloated indexes, but also made vacuuming difficult because heap tuples can't be removed without first removing the corresponding index entries. HOT made it possible to avoid additional index entries whenever possible and ensured most dead heap tuples can be removed at a page level.

2. HOT also helped other cases such as DELETE and non-HOT updates by reducing such heap tuples to a DEAD line pointer stub, until a regular vacuum happens. But the space consumed by the heap tuple is freed and returned to the FSM for future utilization.

Given that HOT was going to change the very core of MVCC and how tuples are stored and accessed, we made a decision to keep things simple and two important conditions were spelled out for doing HOT update.

1. There must be enough space in the same page to store the new version of the tuple.
2. None of the index columns must be updated.

The first condition is generally easy to satisfy: one can leave some free space in heap pages, and in any case the system tends to reach a stable state after the initial few updates. The second condition is somewhat harder to control. Those who are aware of it carefully design their schema and choose indexes so that the HOT conditions are met. But that may not be possible for every workload, as seen from the Uber example, and many others may have faced similar situations.

This proposal is to relax the second condition, so it’s not quite HOT, just WARM.

Basic Idea
========

The basic idea is to allow updates that change an index column without requiring a new index entry in every index on the table. We still require the first HOT condition, i.e. the heap page must have sufficient free space to store the new version of the tuple.

Indexes whose values do not change do not require new index pointers. Only the index whose key is being changed will need a new index entry. The new index entry will be set to the CTID of the root line pointer.

This method succeeds in reducing the write amplification, but causes other issues which also need to be solved. WARM breaks the invariant that all tuples in a HOT chain have the same index values and so an IndexScan would need to re-check the index scan conditions against the visible tuple returned from heap_hot_search(). We must still check visibility, so we only need to re-check scan conditions on that one tuple version.
  
We don’t want to force a recheck for every index fetch because that will slow everything down. So we would like a simple and efficient way of knowing about the existence of a WARM tuple in a chain and do a recheck in only those cases, ideally O(1). Having a HOT chain contain a WARM tuple is discussed below as being a “WARM chain”, implying it needs re-check.

We could solve this by either 1) marking both old and new index tuples as "recheck-required" or 2) marking the HOT chain on the heap as "recheck-required" such that any tuple returned from the chain requires a recheck. 

The advantage of doing 1) is that the recheck is required only for the compromised index.
Note that in presence of multiple index keys pointing to the same root line pointer, the chain needs re-check for both the old as well as the new index key. The old index key must not fetch the new tuple(s) and the new index key must not fetch the old tuple(s). So irrespective of whether an old or a new tuple is being returned, the caller must know about the WARM tuples in the chains. So marking the index would require us to mark both old and new index tuples as requiring re-check. That requires an additional index scan to locate the old row and then an additional write to force it to re-check, which is algorithmically O(NlogN) on table size.

Marking the heap, (2) is O(1) per updated index, so seems better. But since we don't know with respect to which index or indexes the chain is broken, all index scans must do a recheck for a tuple returned from a WARM chain.
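The O(1) recheck decision under approach 2 can be sketched as follows. This is an illustrative sketch, not PostgreSQL code; all names are invented. The index scan looks at one flag on the chain's root and rechecks the scan key against only the single MVCC-visible tuple, and only when the chain is marked WARM:

```python
def index_fetch(index_key, column, root):
    """Return the visible heap tuple for an index entry, applying the
    WARM recheck only when the chain's root is flagged (approach 2)."""
    tup = root["visible"]          # stands in for heap_hot_search() + MVCC
    if tup is None:
        return None
    if root["recheck_required"] and tup[column] != index_key:
        return None                # stale index entry for this tuple version
    return tup

# A WARM chain forces the recheck; an ordinary HOT chain skips it entirely.
warm = {"recheck_required": True, "visible": {"col1": 2}}
hot = {"recheck_required": False, "visible": {"col1": 5}}
```

Note that regardless of how many tuples the chain holds, the cost added per fetch is one flag test plus at most one key comparison.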

How do we mark a chain as WARM? How do we clear the flag? There are a few possible approaches:

1. Mark the first WARM tuple which causes a HOT chain to become WARM with a special HEAP_RECHECK_REQUIRED flag (this flag must then be carried over to any further additions to the chain). Then whenever a tuple is returned from a chain, we must look forward into the chain for a WARM tuple and let the caller know about potential breakage. This seems simple but would require us to follow the HOT chain every time to check if it’s HOT or WARM.

2. Mark the root line pointer (or the root tuple) with a special HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in the chain. Since all indexes point to the root line pointer, it should be enough to just mark the root line pointer (or tuple) with this flag. 

3. We could also adopt a hybrid approach and mark the new index tuple and new heap tuple with HEAP_RECHECK_REQUIRED flag and presence of either of the flag in either tuple forces recheck.

Approach 2 seems more reasonable and simple. 

There are only 2 bits for lp_flags and all combinations are already used. But for LP_REDIRECT root line pointer, we could use the lp_len field to store this special flag, which is not used for LP_REDIRECT line pointers. So we are able to mark the root line pointer.
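The bit-level trick can be illustrated with a small sketch (Python, for illustration only). It mirrors the ItemIdData layout of lp_off:15, lp_flags:2, lp_len:15, but which lp_len bit carries the flag is an assumption made here:

```python
LP_REDIRECT = 2                 # lp_flags value for a redirect line pointer
WARM_FLAG_BIT = 1 << 0          # hypothetical: lowest bit of the unused lp_len

def make_lp(off, flags, length):
    """Pack a 32-bit line pointer: lp_off:15, lp_flags:2, lp_len:15."""
    return (off & 0x7FFF) | ((flags & 0x3) << 15) | ((length & 0x7FFF) << 17)

def lp_flags(lp):
    return (lp >> 15) & 0x3

def lp_len(lp):
    return (lp >> 17) & 0x7FFF

def set_warm(lp):
    # Only safe for LP_REDIRECT, where lp_len carries no tuple length.
    assert lp_flags(lp) == LP_REDIRECT
    return make_lp(lp & 0x7FFF, LP_REDIRECT, lp_len(lp) | WARM_FLAG_BIT)

def is_warm(lp):
    return lp_flags(lp) == LP_REDIRECT and (lp_len(lp) & WARM_FLAG_BIT) != 0
```

The key point is that setting the flag does not disturb lp_off, so the redirect target stays intact.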

Let me explain the idea in more detail using an example and approach 2.

CREATE TABLE test (col1 int, col2 int, col3 int, col4 text);
CREATE INDEX testindx_col1 ON test (col1);
CREATE INDEX testindx_col2 ON test (col2);
CREATE INDEX testindx_col3 ON test (col3);

INSERT INTO test VALUES (1, 11, 111, 'foo');

Assume a single heap tuple and three indexes on three different columns, and that the heap tuple has CTID (0,1) at this point. Each index then has exactly one index entry pointing to CTID (0,1):

testindx_col1: (1)   ===> (0,1)
testindx_col2: (11)  ===> (0,1)
testindx_col3: (111) ===> (0,1)

Now suppose a transaction T1 updates the table: "UPDATE test SET col1 = 2". This does not satisfy the HOT property since index column 'col1' is being updated. But as per the WARM algorithm, since only testindx_col1's index key has changed, we insert a new index tuple only in testindx_col1. Say the new heap tuple has CTID (0,2). So testindx_col1 will have a new index tuple with key (2), but still pointing to CTID (0,1):

testindx_col1: (1)   ===> (0,1)
               (2)   ===> (0,1)
testindx_col2: (11)  ===> (0,1)
testindx_col3: (111) ===> (0,1)

The heap at this point has two tuples, linked by CTID chain. 
(0,1)* ---> (0,2)

Both these tuples are reachable via all indexes, starting at the root (0,1). A transaction which started before T1 should see (0,1) and those started after T1 should see (0,2). The MVCC visibility will ensure that the correct tuple is returned. But if a new transaction T2 is looking up (col1 = 1), while MVCC will return (0,2), the tuple does not satisfy the index condition. Similarly, for an old transaction T0 looking up (col1 = 2), MVCC will return (0,1) but that tuple does not satisfy the index condition either. IndexScan will ensure that tuples are rechecked in such instances.
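The T0/T2 behavior above can be mimicked with a toy walk-through (plain Python, not PostgreSQL code; visibility is reduced to simple XID comparisons and all names are invented):

```python
# One update chain: T1 (xid 200) updated col1 from 1 to 2.
chain = [
    {"ctid": (0, 1), "xmin": 100, "xmax": 200, "col1": 1},   # old version
    {"ctid": (0, 2), "xmin": 200, "xmax": None, "col1": 2},  # new version
]

def visible(tup, snapshot_xid):
    """Crude stand-in for MVCC: the snapshot sees a version created at or
    before it and not yet deleted as of the snapshot."""
    not_deleted = tup["xmax"] is None or snapshot_xid < tup["xmax"]
    return tup["xmin"] <= snapshot_xid and not_deleted

def warm_fetch(index_key, snapshot_xid):
    """Follow the chain from the root; return the visible tuple only if it
    still matches the index key (the WARM recheck)."""
    for tup in chain:
        if visible(tup, snapshot_xid):
            return tup if tup["col1"] == index_key else None
    return None
```

With a late snapshot, looking up col1 = 1 finds the visible tuple (0,2) but the recheck filters it out; the symmetric case holds for an early snapshot looking up col1 = 2.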

If T3 later updates col2 and T4 further updates col3,
T3: UPDATE test SET col2 = 12;
T4: UPDATE test SET col3 = 112;

The heap will look like:
(0,1)* ---> (0,2) ---> (0,3) ---> (0,4)

And the index pointers will be:

testindx_col1: (1)   ===> (0,1)
               (2)   ===> (0,1)
testindx_col2: (11)  ===> (0,1)
               (12)  ===> (0,1)
testindx_col3: (111) ===> (0,1)
               (112) ===> (0,1)

The (*) here indicates that there may exist a WARM tuple in the chain and an index recheck is required. Our algorithm to recheck a heap tuple when it’s returned from a WARM chain should ensure that only valid and matching tuples are returned by the index scans.

Marking index tuples DEAD
=====================
When a WARM chain becomes DEAD and is replaced by just an LP_DEAD stub, the referencing index tuples can be marked as killed, and removed by VACUUM.

HOT Pruning
==========
No change is required for HOT pruning other than to copy the HEAP_RECHECK_REQUIRED flag to the line pointer when the root heap tuple is pruned and the root line pointer is turned into a redirect line pointer. Other than that, HOT prune can do its job, i.e. remove dead tuples from the HOT or WARM chains and, if necessary, turn root line pointers into redirect or dead line pointers.

Clearing the flags
==============
We can't clear the flag until the stale index pointers are gone, even if the chain is all HOT again. In other words, the flag must remain set as long as there exists a stale index key pointing to the root tuple.

One idea is to do this as part of vacuum: we collect root line pointers of WARM chains during the first phase of the heap scan, check the indexes for all such tuples (maybe combined with index vacuum) and then clear the heap flags during the second pass, unless new tuples were added to the WARM chain in the meantime. We can detect that by checking that all tuples in the WARM chain still have an XID less than the OldestXmin that VACUUM is using.

We don’t really need to count all references to a root of WARM chain. What we need to know is if there exists an index which has more than one pointer to the root tuple. If yes, the flag cannot be cleared. ISTM that even two bits per htid will be enough to track this state. Other ideas welcome. 
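The "two bits per htid" idea could look like the following sketch (assumed design for illustration, not implemented code): while scanning an index, count pointers per WARM root, saturating at two, and allow clearing only where no index holds more than one pointer:

```python
def classify_roots(index_entries, warm_roots):
    """index_entries: iterable of (key, root_ctid) pairs for one index.
    Returns a saturating count per WARM root: 0, 1, or 2 ("two or more")."""
    counts = {ctid: 0 for ctid in warm_roots}
    for _key, ctid in index_entries:
        if ctid in counts and counts[ctid] < 2:
            counts[ctid] += 1
    return counts

def clearable(per_index_counts):
    """A root's flag may be cleared only if every index has at most one
    pointer to it."""
    return {
        ctid for ctid in per_index_counts[0]
        if all(counts[ctid] < 2 for counts in per_index_counts)
    }
```

For the running example, both testindx_col1 and testindx_col2 hold two pointers to root (0,1), so its flag stays set, while a root referenced once per index could be cleared.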

WAL Logging
===========

It’s important to ensure that the flag is set when it is absolutely necessary, while having false positives is not a problem. We might do a little wasteful work if the flag is incorrectly set. Since the flag will be set only during heap_update() and the operation is already WAL logged, this can be piggybacked on the heap_update WAL record. Similarly, when a root tuple is pruned to a redirect line pointer, the operation is already WAL logged and we can piggyback setting of the line pointer flag on that WAL record.

Flag clearing need not be WAL logged, unless we can piggyback that to some existing WAL logging.

Vacuum
========

Vacuum should continue to do what it does today i.e. prune HOT chains and collect LP_DEAD line pointers and remove associated index entries and then mark line pointers as LP_UNUSED. It could be extended to do additional work of clearing the flags to avoid rechecks when they are not any more required as discussed above.

Testing
=====

It will be straightforward to have a developer-level index_recheck GUC parameter.

index_recheck = off (default) | on

For testing, compare results between two queries, one with index_recheck=on and one with index_recheck=off. If this finds any difference we’ll know we have a bug.

We can use this in testing the feature in development and also use it in the field if we suspect problems.

Comments/suggestions?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Heap WARM Tuples - Design Draft

From
Andres Freund
Date:
Hi,

On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
> Indexes whose values do not change do not require new index pointers. Only
> the index whose key is being changed will need a new index entry. The new
> index entry will be set to the CTID of the root line pointer.

That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself.  Or have you found a smart way to
figure that out?

Greetings,

Andres Freund



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
> > Indexes whose values do not change do not require new index pointers. Only
> > the index whose key is being changed will need a new index entry. The new
> > index entry will be set to the CTID of the root line pointer.
> 
> That seems to require tracing all hot-chains in a page, to actually
> figure out what the root line pointer of a warm-updated HOT tuple is,
> provided it's HOT_UPDATED itself.  Or have you found a smart way to
> figure that out?

The index points to the head of the HOT chain, and you just walk the
chain on the page --- no need to look at other chains on the page.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Andres Freund
Date:
On 2016-08-04 12:55:25 -0400, Bruce Momjian wrote:
> On Thu, Aug  4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
> > > Indexes whose values do not change do not require new index pointers. Only
> > > the index whose key is being changed will need a new index entry. The new
> > > index entry will be set to the CTID of the root line pointer.
> > 
> > That seems to require tracing all hot-chains in a page, to actually
> > figure out what the root line pointer of a warm-updated HOT tuple is,
> > provided it's HOT_UPDATED itself.  Or have you found a smart way to
> > figure that out?
> 
> The index points to the head of the HOT chain, and you just walk the
> chain on the page --- on need to look at other chains on the page.

When doing an update, you'll need to re-find the root tuple, to insert
the root ctid for the new index tuples.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 04:29:09PM +0530, Pavan Deolasee wrote:
> Write Amplification Reduction Method (WARM)
> ====================================
> 
> A few years back, we developed HOT to address the problem associated with MVCC
> and frequent updates and it has served us very well. But in the light of Uber's
> recent technical blog highlighting some of the problems that are still
> remaining, especially for workloads where HOT does not help to the full extent,
> Simon, myself and others at 2ndQuadrant discussed several ideas and what I
> present below is an outcome of that. This is not to take away credit from
> anybody else. Others may have thought about similar ideas, but I haven’t seen
> any concrete proposal so far.

HOT was a huge win for Postgres and I am glad you are reviewing
improvements.

> This method succeeds in reducing the write amplification, but causes other
> issues which also need to be solved. WARM breaks the invariant that all tuples
> in a HOT chain have the same index values and so an IndexScan would need to
> re-check the index scan conditions against the visible tuple returned from
> heap_hot_search(). We must still check visibility, so we only need to re-check
> scan conditions on that one tuple version.
>   
> We don’t want to force a recheck for every index fetch because that will slow
> everything down. So we would like a simple and efficient way of knowing about
> the existence of a WARM tuple in a chain and do a recheck in only those cases,
> ideally O(1). Having a HOT chain contain a WARM tuple is discussed below as
> being a “WARM chain”, implying it needs re-check.

In summary, we are already doing visibility checks on the HOT chain, so
a recheck of whether the heap tuple matches the index value is done at
most on the one visible tuple in the chain, not every tuple in the chain.

> 2. Mark the root line pointer (or the root tuple) with a special
> HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in the
> chain. Since all indexes point to the root line pointer, it should be enough to
> just mark the root line pointer (or tuple) with this flag. 

Yes, I think #2 is the easiest.  Also, if we modify the index page, we
would have to WAL the change and possibly WAL log the full page write
of the index page.  :-(

> Approach 2 seems more reasonable and simple. 
> 
> There are only 2 bits for lp_flags and all combinations are already used. But
> for LP_REDIRECT root line pointer, we could use the lp_len field to store this
> special flag, which is not used for LP_REDIRECT line pointers. So we are able
> to mark the root line pointer.

Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.

Also, what is currently in the lp_len field for LP_REDIRECT?  Zeros, or
random data?  I am asking for pg_upgrade purposes.

> One idea is to somehow do this as part of the vacuum where we collect root line
> pointers of  WARM chains during the first phase of heap scan, check the indexes
> for all such tuples (may be combined with index vacuum) and then clear the heap
> flags during the second pass, unless new tuples are added to the WARM chain. We
> can detect that by checking that all tuples in the WARM chain still have XID
> less than the OldestXmin that VACUUM is using.

Yes, it seems natural to clear the ctid HEAP_RECHECK_REQUIRED flag where
you are adjusting the HOT chain anyway.

> It’s important to ensure that the flag is set when it is absolutely necessary,
> while having false positives is not a problem. We might do a little wasteful
> work if the flag is incorrectly set. Since flag will be set only during
> heap_update() and the operation is already WAL logged, this can be piggybacked
> with the heap_update WAL record. Similarly, when a root tuple is pruned to a
> redirect line pointer, the operation is already WAL logged and we can piggyback
> setting of line pointer flag with that WAL record.
> 
> Flag clearing need not be WAL logged, unless we can piggyback that to some
> existing WAL logging.

Agreed, good point.

Very nice!

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Simon Riggs
Date:
On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
>> Indexes whose values do not change do not require new index pointers. Only
>> the index whose key is being changed will need a new index entry. The new
>> index entry will be set to the CTID of the root line pointer.
>
> That seems to require tracing all hot-chains in a page, to actually
> figure out what the root line pointer of a warm-updated HOT tuple is,
> provided it's HOT_UPDATED itself.  Or have you found a smart way to
> figure that out?

Hmm, sorry, I did think of that point and I thought I had added it to the doc.

So, yes, I agree - re-locating the root is the biggest downside to
this idea. Perhaps Pavan has other thoughts?

I think it's doable, but it will be fiddly.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 09:58:29AM -0700, Andres Freund wrote:
> On 2016-08-04 12:55:25 -0400, Bruce Momjian wrote:
> > On Thu, Aug  4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
> > > > Indexes whose values do not change do not require new index pointers. Only
> > > > the index whose key is being changed will need a new index entry. The new
> > > > index entry will be set to the CTID of the root line pointer.
> > > 
> > > That seems to require tracing all hot-chains in a page, to actually
> > > figure out what the root line pointer of a warm-updated HOT tuple is,
> > > provided it's HOT_UPDATED itself.  Or have you found a smart way to
> > > figure that out?
> > 
> > The index points to the head of the HOT chain, and you just walk the
> > chain on the page --- on need to look at other chains on the page.
> 
> When doing an update, you'll need to re-find the root tuple, to insert
> the root ctid for the new index tuples.

Ah, I had not thought of that --- good point.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 4, 2016 at 7:59 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> We don’t want to force a recheck for every index fetch because that will
> slow everything down. So we would like a simple and efficient way of knowing
> about the existence of a WARM tuple in a chain and do a recheck in only
> those cases, ideally O(1). Having a HOT chain contain a WARM tuple is
> discussed below as being a “WARM chain”, implying it needs re-check.
>
> We could solve this by either 1) marking both old and new index tuples as
> "recheck-required" or 2) marking the HOT chain on the heap as
> "recheck-required" such that any tuple returned from the chain requires a
> recheck.
>
> The advantage of doing 1) is that the recheck is required only for the
> compromised index.
> Note that in presence of multiple index keys pointing to the same root line
> pointer, the chain needs re-check for both the old as well as the new index
> key. The old index key must not fetch the new tuple(s) and the new index key
> must not fetch the old tuple(s). So irrespective of whether an old or a new
> tuple is being returned, the caller must know about the WARM tuples in the
> chains. So marking the index would require us to mark both old and new index
> tuples as requiring re-check. That requires an additional index scan to
> locate the old row and then an additional write to force it to re-check,
> which is algorithmically O(NlogN) on table size.
>
> Marking the heap, (2) is O(1) per updated index, so seems better. But since
> we don't know with respect to which index or indexes the chain is broken,
> all index scans must do a recheck for a tuple returned from a WARM chain.
>
> How do we mark a chain as WARM? How do we clear the flag? There are a few
> possible approaches:
>
> 1. Mark the first WARM tuple which causes HOT chain to become WARM with a
> special HEAP_RECHECK_REQUIRED flag (this flag must then be carried over to
> any further additions to the chain. Then whenever a tuple if returned from a
> chain, we must look forward into the chain for WARM tuple and let the caller
> know about potential breakage. This seems simple but would require us to
> follow HOT chain every time to check if it’s HOT or WARM.
>
> 2. Mark the root line pointer (or the root tuple) with a special
> HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in
> the chain. Since all indexes point to the root line pointer, it should be
> enough to just mark the root line pointer (or tuple) with this flag.
>
> 3. We could also adopt a hybrid approach and mark the new index tuple and
> new heap tuple with HEAP_RECHECK_REQUIRED flag and presence of either of the
> flag in either tuple forces recheck.


I've actually been thinking of another possibility in the context of
LITE, and what I was pondering looks eerily similar to WARM, so let me
summarize (I was going to write a lengthier POC but I thought I better
comment now since you're introducing the same concept):

Marks are troublesome because, as you mention, you would need one
"WARM" bit per index to be efficient. We also don't want a recheck on
every index access.

HOT is very close to WARM already, even the code. The only problem
with HOT is that it stops scanning the update chain when a tuple is
not HOT-updated. If heap_hot_search_buffer knew to keep on following
the update chain when there is a WARM update on the index in question,
this could work.

But it won't work if it's implemented naively, because if there were
another item pointer in another page of the index pointing to a heap
tuple that belongs to an already-visited chain, which is possible in
the case below, then the scan would produce duplicate rows:

N : row versions
->: update
* : causes a new index tuple on index I

1 -> 2 -> 3 -> 4 -> 5
*         *    *
a         b    a        <- key

If the scan qual is "key in ('a','b')", then the index scan will visit
versions 1, 3 and 4. Suppose only 5 is visible, and suppose the whole
chain lies in the same page.

There is one update chain, 1..5, but three WARM chains: 1..2, 3, 4..5

When visiting item pointers, 1 will be found, the chain will be
followed and eventually version 5 will pass the visibility check and
the tuple will pass the scan quals, it will even match the key in the
index item, so no key check will fix this. 5 will be yielded when
visiting version 1.

Next, version 3 will be visited. There's no easy way to know that we
already visited 3, so we'll follow the chain all the way to 5, and,
again, it will be yielded. Here, the key won't match, so maybe this
case could be filtered out with a key check.

Finally, version 4 will be visited, the chain will be followed to 5,
and as in the first case, it will be yielded.

So the scan yielded the same heap tuple thrice, which isn't an ideal situation.

It would be totally acceptable for a bitmap index scan, so the
expensive logic I will be outlining below may be skipped when building
a bitmap (lossy or not).

To fix this, what I was pondering, and I believe it would be viable,
is to teach heap_hot_search_buffer about the WARM invariant. It will
make correctness heavily depend on the invariant being true, which
means either flagging item pointers as WARM chains, or requiring a
reindex during upgrade to make sure all indexes verify the invariant -
as old indices won't.

When an item pointer is WARM (either implicit or explicit),
heap_hot_search_buffer will only stop following the update chain if it
reaches not the end of a HOT chain, but rather a key change. For this,
it will need to recheck the key of all but the first TID in the chain,
so singleton chains (i.e. not-updated or frozen tuples) won't incur
this extra cost. When walking the chain, it will need to form an index
tuple for both the old and new versions, and compare them, and follow
the chain *only* if the keys are equal.

As an optimization, if the tuple is marked HOT-updated, the key check
may be skipped.

Vacuum will need to be taught not to break the invariant, but I
believe this is viable and efficient.
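The stop-at-key-change rule above can be sketched as follows (Python, illustrative only; `scan_segment` is an invented name and the update chain is modeled as a simple in-order list):

```python
def scan_segment(versions, start, key_of):
    """Yield chain members from `start` until the index key changes.

    Each index item pointer "owns" exactly the run of versions sharing its
    key, so every version is yielded by exactly one visit and the
    duplicate-row problem described above goes away."""
    segment_key = key_of(versions[start])
    for tup in versions[start:]:
        if key_of(tup) != segment_key:
            break                  # key changed: a new index entry owns it
        yield tup

# The 1..5 example: keys a a b a a give segments 1..2, 3, 4..5.
versions = [{"v": i + 1, "key": k}
            for i, k in enumerate(["a", "a", "b", "a", "a"])]
```

Visiting item pointers at versions 1, 3 and 4 then scans three disjoint segments, so version 5 is reached only from version 4's entry.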



Re: Heap WARM Tuples - Design Draft

From
Simon Riggs
Date:
On 4 August 2016 at 18:05, Bruce Momjian <bruce@momjian.us> wrote:

>> Approach 2 seems more reasonable and simple.
>>
>> There are only 2 bits for lp_flags and all combinations are already used. But
>> for LP_REDIRECT root line pointer, we could use the lp_len field to store this
>> special flag, which is not used for LP_REDIRECT line pointers. So we are able
>> to mark the root line pointer.
>
> Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
> the tuple that the ctid was pointing to, but it seems you would need to
> set HEAP_RECHECK_REQUIRED earlier than that.

Hmm. Mostly there will be one, so this is just for the first update
after any VACUUM.

Adding a new linepointer just to hold this seems kludgy and could mean
we run out of linepointers.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 09:58:29AM -0700, Andres Freund wrote:
> On 2016-08-04 12:55:25 -0400, Bruce Momjian wrote:
> > On Thu, Aug  4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
> > > > Indexes whose values do not change do not require new index pointers. Only
> > > > the index whose key is being changed will need a new index entry. The new
> > > > index entry will be set to the CTID of the root line pointer.
> > > 
> > > That seems to require tracing all hot-chains in a page, to actually
> > > figure out what the root line pointer of a warm-updated HOT tuple is,
> > > provided it's HOT_UPDATED itself.  Or have you found a smart way to
> > > figure that out?
> > 
> > The index points to the head of the HOT chain, and you just walk the
> > chain on the page --- on need to look at other chains on the page.
> 
> When doing an update, you'll need to re-find the root tuple, to insert
> the root ctid for the new index tuples.

Also, I was thinking we could just point into the middle of the HOT
chain to limit the number of rows we have to check, but the problem
there is that HOT pruning could then not remove that middle ctid, which
is bad.

I wonder if walking all HOT chains on a single page to find the root is
cheap enough.  The ctid tells you if it is part of a HOT chain, right?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Andres Freund
Date:
On 2016-08-04 18:11:17 +0100, Simon Riggs wrote:
> On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
> >> Indexes whose values do not change do not require new index pointers. Only
> >> the index whose key is being changed will need a new index entry. The new
> >> index entry will be set to the CTID of the root line pointer.
> >
> > That seems to require tracing all hot-chains in a page, to actually
> > figure out what the root line pointer of a warm-updated HOT tuple is,
> > provided it's HOT_UPDATED itself.  Or have you found a smart way to
> > figure that out?
> 
> Hmm, sorry, I did think of that point and I thought I had added it to the doc.
> 
> So, yes, I agree - re-locating the root is the biggest downside to
> this idea. Perhaps Pavan has other thoughts?
> 
> I think its doable, but it will be fiddly.

I'm less afraid of the fiddlyness of finding the root tuple, than the
cost. It's not cheap to walk through, potentially, all hot chains to
find the root ctid.

Have you considered what it'd take to allow index pointers to point
to "intermediate" root tuples? I.e. have some indexes point into
the middle of an update-chain, whereas others still point to the
beginning? There's certainly some complications involved with that, but
it'd also have the advantage in reducing the amount of rechecking that
needs to be done.



Re: Heap WARM Tuples - Design Draft

From
Simon Riggs
Date:
On 4 August 2016 at 18:17, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-04 18:11:17 +0100, Simon Riggs wrote:
>> On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
>> > Hi,
>> >
>> > On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
>> >> Indexes whose values do not change do not require new index pointers. Only
>> >> the index whose key is being changed will need a new index entry. The new
>> >> index entry will be set to the CTID of the root line pointer.
>> >
>> > That seems to require tracing all hot-chains in a page, to actually
>> > figure out what the root line pointer of a warm-updated HOT tuple is,
>> > provided it's HOT_UPDATED itself.  Or have you found a smart way to
>> > figure that out?
>>
>> Hmm, sorry, I did think of that point and I thought I had added it to the doc.
>>
>> So, yes, I agree - re-locating the root is the biggest downside to
>> this idea. Perhaps Pavan has other thoughts?
>>
>> I think it's doable, but it will be fiddly.
>
> I'm less afraid of the fiddlyness of finding the root tuple, than the
> cost. It's not cheap to walk through, potentially, all hot chains to
> find the root ctid.
>
> Have you considered what it'd take to allow index pointers to allow to
> point to "intermediate" root tuples? I.e. have some indexes point into
> the middle of an update-chain, whereas others still point to the
> beginning? There's certainly some complications involved with that, but
> it'd also have the advantage in reducing the amount of rechecking that
> needs to be done.

"Intermediate root tuples" was my first attempt at this and it didn't
work. I'll dig up the details, some problem in VACUUM, IIRC.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 06:16:02PM +0100, Simon Riggs wrote:
> On 4 August 2016 at 18:05, Bruce Momjian <bruce@momjian.us> wrote:
> 
> >> Approach 2 seems more reasonable and simple.
> >>
> >> There are only 2 bits for lp_flags and all combinations are already used. But
> >> for LP_REDIRECT root line pointer, we could use the lp_len field to store this
> >> special flag, which is not used for LP_REDIRECT line pointers. So we are able
> >> to mark the root line pointer.
> >
> > Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
> > the tuple that the ctid was pointing to, but it seems you would need to
> > set HEAP_RECHECK_REQUIRED earlier than that.
> 
> Hmm. Mostly there will be one, so this is just for the first update
> after any VACUUM.
> 
> Adding a new linepointer just to hold this seems kludgy and could mean
> we run out of linepointers.

Ah, so in cases where there isn't an existing LP_REDIRECT for the chain,
you create one and use the lp_len to identify it as a WARM chain?  Hmm.

You can't update the indexes pointing to the existing ctid, so what you
would really have to do is to write over the existing ctid with
LP_REDIRECT plus WARM marker, and move the old ctid to a new ctid slot?




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 06:18:47PM +0100, Simon Riggs wrote:
> > Have you considered what it'd take to allow index pointers to allow to
> > point to "intermediate" root tuples? I.e. have some indexes point into
> > the middle of an update-chain, whereas others still point to the
> > beginning? There's certainly some complications involved with that, but
> > it'd also have the advantage in reducing the amount of rechecking that
> > needs to be done.
> 
> "Intermediate root tuples" was my first attempt at this and it didn't
> work. I'll dig up the details, some problem in VACUUM, IIRC.

I think the problem is that HOT chain pruning becomes almost impossible
except via VACUUM.




Re: Heap WARM Tuples - Design Draft

From
Andres Freund
Date:
On 2016-08-04 18:18:47 +0100, Simon Riggs wrote:
> On 4 August 2016 at 18:17, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-08-04 18:11:17 +0100, Simon Riggs wrote:
> > I'm less afraid of the fiddlyness of finding the root tuple, than the
> > cost. It's not cheap to walk through, potentially, all hot chains to
> > find the root ctid.
> >
> > Have you considered what it'd take to allow index pointers to allow to
> > point to "intermediate" root tuples? I.e. have some indexes point into
> > the middle of an update-chain, whereas others still point to the
> > beginning? There's certainly some complications involved with that, but
> > it'd also have the advantage in reducing the amount of rechecking that
> > needs to be done.
> 
> "Intermediate root tuples" was my first attempt at this and it didn't
> work. I'll dig up the details, some problem in VACUUM, IIRC.

I think it's doable, but the question is how to do cleanup. It's kind of
easy to end up with a page full of line pointers which are all reachable
from indexes, but otherwise useless.  But I'm not sure that's actually
that inevitable:

a) Given that we fully scan indexes anyway, we could apply redirections
   to the indexes themselves, during vacuum. Then redirections could be
   removed afterwards.

b) Start fixing up indexes, by refinding entries in the index from the
   heap tuple. That requires expressions to actually be immutable, but
   personally I think it's stupid to cater for expressions that are
   not. Then we can update indexes to point to the "final tuple" after
   pruning. It'd also pave the way to avoid the full index scan at the
   end of vacuum (in some cases).


Andres



Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Thu, Aug 4, 2016 at 10:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
>> Indexes whose values do not change do not require new index pointers. Only
>> the index whose key is being changed will need a new index entry. The new
>> index entry will be set to the CTID of the root line pointer.
>
> That seems to require tracing all hot-chains in a page, to actually
> figure out what the root line pointer of a warm-updated HOT tuple is,
> provided it's HOT_UPDATED itself.  Or have you found a smart way to
> figure that out?

Hmm, sorry, I did think of that point and I thought I had added it to the doc.

So, yes, I agree - re-locating the root is the biggest downside to
this idea. Perhaps Pavan has other thoughts?


I clearly missed this problem, so what I say now is not fully thought through. I see a couple of options:

1. Most often, the tuple that's being updated will have been fetched by a recent index scan, so we can potentially cache the last (few) mappings of CTID to their root line pointers. Maybe we can store that in RelationData on a per-relation basis.
2. I also think that most often the tuple that will be updated will have its t_ctid pointing to itself. This sounds more invasive, but can we somehow utilise the t_ctid field to store the OffsetNumber of the root line pointer? The root line pointer only exists when the tuple under consideration has HEAP_ONLY_TUPLE flag set and we know for such tuples, BlockNumber in t_ctid must be same as current block (unless HEAP_UPDATED is also set and the updating transaction eventually aborted).

We may have to fall back to a full page scan to find the root line pointer, but we can limit that for corner cases.
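To make option 2 concrete, here is a toy sketch of stashing the root line pointer's OffsetNumber in a self-pointing t_ctid. All names here are invented for illustration; they are not PostgreSQL's actual structures or API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only.  The idea: for a HEAP_ONLY_TUPLE whose t_ctid
 * points to itself, the block number in t_ctid must equal the tuple's own
 * block, so any *other* block value can act as a flag meaning that
 * ip_posid carries the root line pointer's offset instead.
 */
typedef struct
{
    uint32_t ip_blkid;          /* block number */
    uint16_t ip_posid;          /* offset number within the block */
} CtidSketch;

static void
stash_root_offset(CtidSketch *ctid, uint32_t cur_blk, uint16_t root_off)
{
    ctid->ip_blkid = cur_blk + 1;   /* any value != cur_blk works as a flag */
    ctid->ip_posid = root_off;
}

/*
 * Returns true and sets *root_off when the ctid carries a stashed root
 * offset; returns false for a normal self-pointing ctid, in which case
 * the caller falls back to a full page scan.
 */
static bool
fetch_root_offset(const CtidSketch *ctid, uint32_t cur_blk, uint16_t *root_off)
{
    if (ctid->ip_blkid == cur_blk)
        return false;
    *root_off = ctid->ip_posid;
    return true;
}
```

The fallback branch corresponds to the aborted-update corner case above, where the stash may have been overwritten.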

Thanks,
Pavan


--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Thu, Aug 4, 2016 at 10:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Aug  4, 2016 at 06:16:02PM +0100, Simon Riggs wrote:
> On 4 August 2016 at 18:05, Bruce Momjian <bruce@momjian.us> wrote:
>
> >> Approach 2 seems more reasonable and simple.
> >>
> >> There are only 2 bits for lp_flags and all combinations are already used. But
> >> for LP_REDIRECT root line pointer, we could use the lp_len field to store this
> >> special flag, which is not used for LP_REDIRECT line pointers. So we are able
> >> to mark the root line pointer.
> >
> > Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
> > the tuple that the ctid was pointing to, but it seems you would need to
> > set HEAP_RECHECK_REQUIRED earlier than that.
>
> Hmm. Mostly there will be one, so this is just for the first update
> after any VACUUM.
>
> Adding a new linepointer just to hold this seems kludgy and could mean
> we run out of linepointers.

Ah, so in cases where there isn't an existing LP_REDIRECT for the chain,
you create one and use the lp_len to identify it as a WARM chain?  Hmm.


If the root tuple still exists, we store the WARM flag (or HEAP_RECHECK_REQUIRED as used in the original post) in the tuple header itself. When the root tuple becomes dead and HOT prune decides to replace it with a LP_REDIRECT line pointer, the information is moved to lp_len (which is currently set to 0 for LP_REDIRECT items). Does that answer your question?
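A minimal sketch of that flag placement, using simplified stand-ins (the type layouts and the flag value are invented for illustration, not PostgreSQL's real definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LP_REDIRECT 2
#define HEAP_RECHECK_REQUIRED 0x0800    /* hypothetical infomask bit */

typedef struct
{
    unsigned lp_off : 15;
    unsigned lp_flags : 2;
    unsigned lp_len : 15;   /* always 0 today for LP_REDIRECT items */
} LinePointerSketch;

typedef struct
{
    uint16_t t_infomask;
} TupleHeaderSketch;

/* While the root tuple exists, the WARM marker lives in its header. */
static bool
root_tuple_is_warm(const TupleHeaderSketch *htup)
{
    return (htup->t_infomask & HEAP_RECHECK_REQUIRED) != 0;
}

/* When HOT prune replaces the dead root tuple with an LP_REDIRECT item,
 * the marker is carried over into the otherwise-unused lp_len field. */
static void
redirect_preserving_warm(LinePointerSketch *lp, uint16_t new_off, bool was_warm)
{
    lp->lp_flags = LP_REDIRECT;
    lp->lp_off = new_off;
    lp->lp_len = was_warm ? 1 : 0;
}

static bool
redirect_is_warm(const LinePointerSketch *lp)
{
    return lp->lp_flags == LP_REDIRECT && lp->lp_len != 0;
}
```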
 
You can't update the indexes pointing to the existing ctid, so what you
would really have to do is to write over the existing ctid with
LP_REDIRECT plus WARM marker, and move the old ctid to a new ctid slot?


Not really. I hope the above answers this, but please let me know if you mean something else.
 
Thanks,
Pavan


Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 4, 2016 at 2:14 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> To fix this, what I was pondering, and I believe it would be viable,
> is to teach heap_hot_search_buffer about the WARM invariant. It will
> make correctness heavily depend on the invariant being true, which
> means either flagging item pointers as WARM chains, or requiring a
> reindex during upgrade to make sure all indexes verify the invariant -
> as old indices won't.
>
> When an item pointer is WARM (either implicit or explicit),
> heap_hot_search_buffer will only stop following the update chain if it
> reaches not the end of a HOT chain, but rather a key change. For this,
> it will need to recheck the key of all but the first TID in the chain,
> so singleton chains (ie: not-updated or frozen tuples) won't incur
> this extra cost. When walking the chain, it will need to form an index
> tuple for both the old and new versions, and compare them, and follow
> the chain *only* if the keys are equal.
>
> As an optimization, if the tuple is marked HOT-updated, the key check
> may be skipped.
>
> Vacuum will need to be taught not to break the invariant, but I
> believe this is viable and efficient.


And I didn't mention the invariant.

The invariant would be that all WARM chains:

* contain entire HOT chains (ie: no item pointers pointing to the
  middle of a HOT chain)
* start at the item pointer, and end when the t_ctid of the heap
  tuple either points to another page, or a tuple with a different
  index key
* there is exactly one WARM chain containing any heap tuple,
  including the singleton chain represented by a tuple whose WARM
  chain contains only itself

This is maintained by:
* No item pointer will be created for row versions whose immediately
previous version is already on the index with the same key

Cleanup can only proceed by applying cleanup on HOT chains, by moving
the head of a WARM chain forward, or by removing it entirely. It may
be that VACUUM already works like that, but I'm unsure.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 11:04:49PM +0530, Pavan Deolasee wrote:
> If the root tuple still exists, we store the WARM flag (or
> HEAP_RECHECK_REQUIRED as used in the original post) in the tuple header itself.
> When the root tuple becomes dead and HOT prune decides to replace it with a
> LP_REDIRECT line pointer, the information is moved to lp_len (which is
> currently set to 0 for LP_REDIRECT items). Does that answer your question?

Ah, brilliant!




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 01:05:53PM -0400, Bruce Momjian wrote:
> > Approach 2 seems more reasonable and simple. 
> > 
> > There are only 2 bits for lp_flags and all combinations are already used. But
> > for LP_REDIRECT root line pointer, we could use the lp_len field to store this
> > special flag, which is not used for LP_REDIRECT line pointers. So we are able
> > to mark the root line pointer.
> 
> Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
> the tuple that the ctid was pointing to, but it seems you would need to
> set HEAP_RECHECK_REQUIRED earlier than that.
> 
> Also, what is currently in the lp_len field for LP_REDIRECT?  Zeros, or
> random data?  I am asking for pg_upgrade purposes.

It seems LP_REDIRECT's lp_len is always zero:  :-)

    /*
     * ItemIdSetRedirect
     *      Set the item identifier to be REDIRECT, with the specified link.
     *      Beware of multiple evaluations of itemId!
     */
    #define ItemIdSetRedirect(itemId, link) \
    ( \
        (itemId)->lp_flags = LP_REDIRECT, \
        (itemId)->lp_off = (link), \
        (itemId)->lp_len = 0 \
                  ^^^^^^^^^^
    )




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 02:35:32PM -0300, Claudio Freire wrote:
> > Vacuum will need to be taught not to break the invariant, but I
> > believe this is viable and efficient.
> 
> 
> And I didn't mention the invariant.
> 
> The invariant would be that all WARM chains:
> 
>  * contain entire HOT chains (ie: no item pointers pointing to the
> middle of a HOT chain)

You mean no _index_ item pointers, right?

>  * start at the item pointer, and end when the t_ctid of the heap
> tuple either points to another page, or a tuple with a different index
> key

Uh, there is only one visible tuple in a HOT chain, so I don't see the
different index key as being a significant check.  That is, I think we
should check the visibility first, as we do now, and then, for the
only-visible tuple, check the key.  I don't see a value in (expensive)
checking the key of an invisible tuple in hopes of stopping the HOT chain
traversal.

Also, what happens when a tuple chain goes off-page, then returns to the
original page?  That is a new HOT chain given our current code, but
would the new tuple addition realize it needs to create a new index
tuple?  The new tuple code needs to check the index's root HOT tid for a
non-match.

> This is maintained by:
> 
>  * No item pointer will be created for row versions whose immediately
> previous version is already on the index with the same key

You really only have the ctid HOT head stored in the index, not the
immediate ctid.




Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 4, 2016 at 3:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Aug  4, 2016 at 02:35:32PM -0300, Claudio Freire wrote:
>> > Vacuum will need to be taught not to break the invariant, but I
>> > believe this is viable and efficient.
>>
>>
>> And I didn't mention the invariant.
>>
>> The invariant would be that all WARM chains:
>>
>>  * contain entire HOT chains (ie: no item pointers pointing to the
>> middle of a HOT chain)
>
> You mean no _index_ item pointers, right?

Yes

>>  * start at the item pointer, and end when the t_ctid of the heap
>> tuple either points to another page, or a tuple with a different index
>> key
>
> Uh, there is only one visible tuple in a HOT chain, so I don't see the
> different index key as being a significant check.

For a HOT chain, sure. That's why I mentioned it as an optimization.

But t_ctid, I checked, unless I read the code wrong, is set for
non-HOT updates too.

So following t_ctid regardless of the HOT-update flags should yield
update chains that can walk several buffers, and that's in fact the
mechanism I'm proposing, only just stop following the chain when it
jumps buffers.

>  That is, I think we
> should check the visibility first, as we do now, and then, for the
> only-visible tuple, check the key.

No, because, as I explained in the example, doing that will result in
duplicates being returned.

The whole update chain can span multiple WARM chains, and each WARM
chain can span multiple HOT chains.

Consider, assuming there's an index on id and another on val:

INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla');
UPDATE t SET dat='blabla2' where id = 32;
UPDATE t SET dat='blabla3', id=33 where id = 32;
UPDATE t SET val='b' where id = 33;
UPDATE t SET dat='blabla4' where id = 33;
UPDATE t SET val='a' where id = 33;

This produces a case similar to the one I described. Assuming all
updates happen on the same page, the first will be HOT, so no key
check is needed. The second cannot be HOT, because it updates the id.
But it can be WARM, if it's done on the same page, so it can avoid
updating the index on val. The third will need a new index item
pointer, because it has a new key. The t_ctid chain will walk all the
versions of the row, but the one with 'blabla2' will not have the
HOT-updated bit set. So regular HOT chain walking will stop there, I
propose to not stop there, so the index on val can follow the chain
into blabla3 as if it were a HOT update. Correctness is achieved only
if there is no index item pointer on that index pointing to that index
version, and since there is no efficient way of checking, the
invariants I propose aim to guarantee it so it can be assumed. Since
WARM updates don't create new index item pointers, what needs to be
checked is whether updating from version 2 to version 3 would be a
WARM update on this index or not, and that equals to checking the
index keys for equality.

>  I don't see a value in (expensive)
> checking the key of an invisible tuple in hopes of stopping the HOT chain
> traversal.

Not HOT chain, the point is to guarantee correctness by properly
identifying the end of the WARM (not HOT) chain, which is left
implicit.

> Also, what happens when a tuple chain goes off-page, then returns to the
> original page?

The WARM chain ends on the page hop, and when it returns it's a new
WARM chain. So traversal would not follow t_ctid pointers that hop
pages, which makes it efficient, and also guarantees correctness
(since if there is a page hop, we know the index will contain an item
pointer to the version in that other page, so we'll visit it when we
get there).

> That is a new HOT chain given our current code, but
> would the new tuple addition realize it needs to create a new index
> tuple?  The new tuple code needs to check the index's root HOT tid for a
> non-match.

I'm not proposing to change how HOT works, but adding another very
similar mechanism on top of it called WARM.

So HOT will keep working like that, HOT pruning will happen as usual,
but we'll have the concept (immaterialized in the physical
representation of the data, just implicit) of WARM chains. WARM chains
would span one or several HOT chains, so they're "bigger".

>> This is maintained by:
>>
>>  * No item pointer will be created for row versions whose immediately
>> previous version is already on the index with the same key
>
> You really only have the ctid HOT head stored in the index, not the
> immediate ctid.

I know. I just mentioned it just in case there was a transient time
during cleanup when it isn't true, but thinking it a little bit more,
if it weren't, HOT would also cause duplicates during index scans, so
it's probably not the case (or protected with a lock).



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 4, 2016 at 3:23 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> That is a new HOT chain given our current code, but
>> would the new tuple addition realize it needs to create a new index
>> tuple?  The new tuple code needs to check the index's root HOT tid for a
>> non-match.
>
> I'm not proposing to change how HOT works, but adding another very
> similar mechanism on top of it called WARM.
>
> So HOT will keep working like that, HOT pruning will happen as usual,
> but we'll have the concept (immaterialized in the physical
> representation of the data, just implicit) of WARM chains. WARM chains
> would span one or several HOT chains, so they're "bigger".


Answering your question, there's no need to find the root page on index updates.

When creating the new tuple in nodeModifyTable.c, the code currently
skips updating indexes if it's using HOT.

We would add a check for WARM too. It will use WARM for index X iff both:
* ItemPointerGetBlockNumber(oldtuplectid) ==
ItemPointerGetBlockNumber(newtuplectid)* satisfies HOT for only this index X
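That two-part test can be sketched as follows. The names are illustrative stand-ins, not the actual executor code in nodeModifyTable.c; in particular, the single int key is a stand-in for comparing all columns that index X covers:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one heap tuple version, as seen by one index. */
typedef struct
{
    int block;          /* ItemPointerGetBlockNumber(ctid) */
    int index_x_key;    /* stand-in for the columns index X covers */
} VersionSketch;

/*
 * An update can be WARM with respect to index X iff the new version
 * landed on the same heap page and X's columns are unchanged; in that
 * case the insert into index X is skipped, like HOT skips all indexes.
 */
static bool
warm_for_index_x(const VersionSketch *oldtup, const VersionSketch *newtup)
{
    return oldtup->block == newtup->block &&
           oldtup->index_x_key == newtup->index_x_key;
}
```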



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 03:23:10PM -0300, Claudio Freire wrote:
> INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla');
> UPDATE t SET dat='blabla2' where id = 32;
> UPDATE t SET dat='blabla3', id=33 where id = 32;
> UPDATE t SET val='b' where id = 33;
> UPDATE t SET dat='blabla4' where id = 33;
> UPDATE t SET val='a' where id = 33;
> 
> This produces a case similar to the one I described. Assuming all
> updates happen on the same page, the first will be HOT, so no key
> check is needed. The second cannot be HOT, because it updates the id.
> But it can be WARM, if it's done on the same page, so it can avoid
> updating the index on val. The third will need a new index item
> pointer, because it has a new key. The t_ctid chain will walk all the
> versions of the row, but the one with 'blabla2' will not have the
> HOT-updated bit set. So regular HOT chain walking will stop there, I
> propose to not stop there, so the index on val can follow the chain
> into blabla3 as if it were a HOT update. Correctness is achieved only
> if there is no index item pointer on that index pointing to that index
> version, and since there is no efficient way of checking, the
> invariants I propose aim to guarantee it so it can be assumed. Since
> WARM updates don't create new index item pointers, what needs to be
> checked is whether updating from version 2 to version 3 would be a
> WARM update on this index or not, and that equals to checking the
> index keys for equality.

OK, it took me a while to understand this, so let me see if I can
restate it.  You have an update chain in a single page which is made up
of one or more HOT chains, some of which might be WARM.  A flag
indicates a WARM chain (see below).  A WARM chain cannot go outside of a
HOT chain.

The HOT chains have identical keys for all indexes, and the WARM chains
have identical keys for some indexes.  If an update chain is all on the
same page, it can have multiple HOT chains.  I think this is possible if
an update changes _all_ indexed columns in the middle of an update
chain, turning off HOT and WARM.

I think we agreed that WARM index tuples will point to the head of the
HOT update chain on the same page.  We will store the
HEAP_RECHECK_REQUIRED flag on the update chain head tuple, or on the
LP_REDIRECT ctid if we removed the tuple during HOT cleanup.

So, I think the case you are discussing is walking WARM beyond the end
of the HOT chain so you don't have to create a second index entry for
the second HOT chain in the same update chain for matching values.  

For example, you have a HOT chain, and you set the indexed col1=2, then
later, in the same HOT chain, change it to col1=4, then later, in the
same HOT chain, back to col1=2.  I assume the last update would see
there was already an index entry pointing to the head of the same HOT
chain, and would skip adding to the index.

However, you might be saying that the last update of col1=2 makes a
non-HOT entry in the update chain, and you want to find that without
adding an index entry.

That seems overly complex and not worth it.

Another approach would be to keep updates on the same page in a single
WARM chain, even if all indexes have changed columns.  We will already
have code to walk the HOT chain looking for matches, and at most one
tuple in the update chain is visible.  The downside of this is
unnecessary checking if the tuple matches the index.

> >  I don't see a value in (expensive)
> > checking the key of an invisible tuple in hopes of stopping the HOT chain
> > traversal.
> 
> Not HOT chain, the point is to guarantee correctness by properly
> identifying the end of the WARM (not HOT) chain, which is left
> implicit.

Why should WARM chains span beyond HOT chains?

> > Also, what happens when a tuple chain goes off-page, then returns to the
> > original page?
> 
> The WARM chain ends on the page hop, and when it returns it's a new
> WARM chain. So traversal would not follow t_ctid pointers that hop
> pages, which makes it efficient, and also guarantees correctness
> (since if there is a page hop, we know the index will contain an item
> pointer to the version in that other page, so we'll visit it when we
> get there).
> 
> > That is a new HOT chain given our current code, but
> > would the new tuple addition realize it needs to create a new index
> > tuple?  The new tuple code needs to check the index's root HOT tid for a
> > non-match.
> 
> I'm not proposing to change how HOT works, but adding another very
> similar mechanism on top of it called WARM.
> 
> So HOT will keep working like that, HOT pruning will happen as usual,
> but we'll have the concept (immaterialized in the physical
> representation of the data, just implicit) of WARM chains. WARM chains
> would span one or several HOT chains, so they're "bigger".

Yes, I think I got it, but what is the benefit?  Fewer index entries?

As I see it, the only way with WARM you could have an update chain on
the same page that has multiple HOT chains is if you changed all the
indexed columns.  The complexity and rechecking of handling that doesn't
seem worth trying to avoid index entries for the second HOT chain.

> >> This is maintained by:
> >>
> >>  * No item pointer will be created for row versions whose immediately
> >> previous version is already on the index with the same key
> >
> > You really only have the ctid HOT head stored in the index, not the
> > immediate ctid.
> 
> I know. I just mentioned it just in case there was a transient time
> during cleanup when it isn't true, but thinking it a little bit more,
> if it weren't, HOT would also cause duplicates during index scans, so
> it's probably not the case (or protected with a lock).

I am looking for clarification on this posting.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 03:33:16PM -0300, Claudio Freire wrote:
> On Thu, Aug 4, 2016 at 3:23 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> >> That is a new HOT chain given our current code, but
> >> would the new tuple addition realize it needs to create a new index
> >> tuple?  The new tuple code needs to check the index's root HOT tid for a
> >> non-match.
> >
> > I'm not proposing to change how HOT works, but adding another very
> > similar mechanism on top of it called WARM.
> >
> > So HOT will keep working like that, HOT pruning will happen as usual,
> > but we'll have the concept (immaterialized in the physical
> > representation of the data, just implicit) of WARM chains. WARM chains
> > would span one or several HOT chains, so they're "bigger".
> 
> 
> Answering your question, there's no need to find the root page on index updates.
> 
> When creating the new tuple in nodeModifyTable.c, the code currently
> skips updating indexes if it's using HOT.
> 
> We would add a check for WARM too. It will use WARM for index X iff both:
> 
>  * ItemPointerGetBlockNumber(oldtuplectid) ==
> ItemPointerGetBlockNumber(newtuplectid)
>  * satisfies HOT for only this index X

OK, I see why you like walking the entire update chain instead of just
walking the HOT chain --- you think it avoids us having to find the HOT
chain root.  However, how do you check the index to find out if the
update chain root is already in the index?  I don't think you can just
look at the current tuple to know that since a previous tuple in the
update chain might have put it there even if the current old tuple
wouldn't have.

My point is there can be multiple update chains on the same page for
different rows with identical indexed values, so I don't see how
checking just for the page number helps here.  Are we going to check the
entire page?




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 10:58:33PM +0530, Pavan Deolasee wrote:
> 2. I also think that most often the tuple that will be updated will have its
> t_ctid pointing to itself. This sounds more invasive, but can we somehow

Agreed!  I think it has to.  Great idea.

> utilise the t_ctid field to store the OffsetNumber of the root line pointer?
> The root line pointer only exists when the tuple under consideration has
> HEAP_ONLY_TUPLE flag set and we know for such tuples, BlockNumber in t_ctid
> must be same as current block (unless HEAP_UPDATED is also set and the updating
> transaction eventually aborted).

Oh, aborts, so the tuple we are looking at is the head of the chain and
points to a tuple that was aborted.  So we would still need a page-scan
to handle cases where the t_ctid was over-written by an aborted tuple,
but as you said it would be rare.

So, if the tuple has HEAP_ONLY_TUPLE set, we know the block number
ip_blkid equals our block number, so any ip_blkid value that doesn't
equal our block number is a special flag that indicates that ip_posid
points to the head of our HOT chain.  I guess we could just set it to
(our block number + 1), with rollover possible and desired.




Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 4, 2016 at 7:21 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Aug  4, 2016 at 03:23:10PM -0300, Claudio Freire wrote:
>> INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla');
>> UPDATE t SET dat='blabla2' where id = 32;
>> UPDATE t SET dat='blabla3', id=33 where id = 32;
>> UPDATE t SET val='b' where id = 33;
>> UPDATE t SET dat='blabla4' where id = 33;
>> UPDATE t SET val='a' where id = 33;
>>
>> This produces a case similar to the one I described. Assuming all
>> updates happen on the same page, the first will be HOT, so no key
>> check is needed. The second cannot be HOT, because it updates the id.
>> But it can be WARM, if it's done on the same page, so it can avoid
>> updating the index on val. The third will need a new index item
>> pointer, because it has a new key. The t_ctid chain will walk all the
>> versions of the row, but the one with 'blabla2' will not have the
>> HOT-updated bit set. So regular HOT chain walking will stop there, I
>> propose to not stop there, so the index on val can follow the chain
>> into blabla3 as if it were a HOT update. Correctness is achieved only
>> if there is no index item pointer on that index pointing to that index
>> version, and since there is no efficient way of checking, the
>> invariants I propose aim to guarantee it so it can be assumed. Since
>> WARM updates don't create new index item pointers, what needs to be
>> checked is whether updating from version 2 to version 3 would be a
>> WARM update on this index or not, and that equals to checking the
>> index keys for equality.
>
> OK, it took me a while to understand this, so let me see if I can
> restate it.  You have an update chain in a single page which is made up
> of one or more HOT chains, some of which might be WARM.  A flag
> indicates a WARM chain (see below).  A WARM chain cannot go outside of a
> HOT chain.

It's the other way around. A WARM chain has to span whole HOT chains.
The WARM chain always begins in the head of a HOT chain (or non-HOT
tuple of course), and cannot end before the HOT chain ends.

That is, all within a single page:
  [  v1 .. v2 .. v3  .. v4 .. v5 .. v6  v7 ]
  [ hot chain 1   ][ hot chain 2 ] [NH]
  [         WARM CHAIN          ] [NW]

(NH = non-HOT)
(NW = non-WARM)

The flag could be useful for the upgrade path, but if you ignore
having to live with old-format indexes, it's not necessary.

A WARM chain starts at any index item pointer. It ends when there's a
key change or a page change (ie: the WARM chain cannot span multiple
pages). That cannot happen within a HOT chain, so that point will also
be the end of a HOT chain.

You can think of a WARM chain as a chain of HOT chains.

> The HOT chains have identical keys for all indexes, and the WARM chains
> have identical keys for some indexes.

Let's highlight that the WARM chain concept only exists in the context
of a specific index. WARM chains (the way I describe them) apply to
only one index at a time.

So, different indexes will have different chaining of HOT chains into
WARM chains. While index A may chain HOT chain 1 and HOT chain 2
together, index B may not. In fact, the way WARM works guarantees that
at least one index won't chain them.

So WARM chains have identical keys for a specific index. Different
indexes have different WARM chains.

>  If an update chain is all on the
> same page, it can have multiple HOT chains.  I think this is possible if
> an update changes _all_ indexed columns in the middle of an update
> chain, turning off HOT and WARM.

Yes

> I think we agreed that WARM index tuples will point to the head of the
> HOT update chain on the same page.

The head of the WARM chain, which may not be the HOT update chain.

In the above example, the WARM index tuple would point to the head of
the hot chain 1, not hot chain 2.

> We will store the
> HEAP_RECHECK_REQUIRED flag on the update chain head tuple, or on the
> LP_REDIRECT ctid if we removed the tuple during HOT cleanup.

I wouldn't put it on the update chain head, I'd put it on the
just-inserted heap tuple, and I'd call it HEAP_WARM_TUPLE. I wouldn't
use a corresponding HEAP_WARM_UPDATED, there's no need to update the
old tuple's flags.

The purpose of the flag is to be able to discern new-style index
pointers from old-style ones, since they require different ways of
following the update chain.

In the example above, old indexes will have item pointers to the head
of hot chain 1 and 2. A scan that finds the item pointer for hot chain
1 has to scan the HOT chain only, and not step into hot chain 2,
because there's an item pointer for hot chain 2 already. New indexes
need to follow through the end of HOT chain 1 into HOT chain 2,
because there won't be an item pointer pointing to the head of HOT
chain 2. The flag being set in the tuple would signal that this is a
new-style update, so it needs to follow the WARM chain, whereas if the
flag is 0, it need not. It allows for a painless upgrade, but that's
it.

The flag alone doesn't remove the need to perform the key check.
Because what's a WARM update for one index may be a regular update for
another, and you can't store that level of detail on heap tuple flags.

HOT cleanup, I believe, needs to move the HEAP_WARM_TUPLE flag as it
moves the head. Or HOT updates could copy the flag across all HOT
tuples. The last way would mean HOT cleanup needs no modifications.


> So, I think the case you are discussing is walking WARM beyond the end
> of the HOT chain so you don't have to create a second index entry for
> the second HOT chain in the same update chain for matching values.

Exactly.

> For example, you have a HOT chain, and you set the indexed col1=2, then
> later, in the same HOT chain, change it to col1=4, then later, in the
> same HOT chain, back to col1=2.  I assume the last update would see
> there was already an index entry pointing to the head of the same HOT
> chain, and would skip adding to the index.

If an indexed col1 is updated, the update isn't HOT. May be WARM, but not HOT.

Assume WARM, that is, there is an index on col1 but it's not the index
we're updating. How would you do that efficiently?

For unique indexes maybe, but for non-unique, a key could match an
unbounded number of rows, and scanning all item tuples to find the
right one would be extremely expensive.

The mechanism for deciding not to add an index entry, then, is key
changes. As in the case of HOT, if no indexed (for this index) columns
change, there's no need to add an index entry. So only the initial
insert will create an index entry, and then WARM will know to follow
the update chain even if it crosses HOT chain boundaries, **because
the keys don't change**.
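A minimal sketch of that per-index decision (illustrative Python, not PostgreSQL internals; the dict-based row representation is an assumption):

```python
# Per-index decision at update time: an index entry is needed only when a
# column covered by that index changes between the old and new versions.

def needs_index_entry(index_cols, old_row, new_row):
    """True if this index needs a new entry for the updated tuple."""
    return any(old_row[c] != new_row[c] for c in index_cols)

old = {"id": 32, "val": "a", "dat": "blabla2"}
new = {"id": 32, "val": "b", "dat": "blabla2"}

# The index on val sees a key change and gets a new entry; the index on id
# sees no change, so its existing entry keeps serving the chain via WARM.
```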

> However, you might be saying that the last update of col1=2 makes a
> non-HOT entry in the update chain, and you want to find that without
> adding an index entry.

Yes.

The way I understood the initial proposal (and similarly mine) is that
WARM would avoid adding index entries for some non-HOT updates in some
indexes.

So, suppose you have a row, and update it always in non-HOT (to
simplify things) but WARM ways (including an indexed column, just not
the index in question).

There are no HOT chains, only an update chain. You can have only a
single index entry like this, and it works:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4)

Then you update the first column which is the index key. The index
will need a new index entry or lookups may fail:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5)

And you go back to the old value. You get:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5) -> ?>(a, 6)

Now here you do have a choice, new index entry or not. In order to
decide that, however, you have to find ix1. But ix1 could be anywhere
on the tid list for key 'a', and in some cases that key could match
millions (or more) tids. So I don't believe finding it through the
index would be efficient. The only item you do know exists is ix2.
Maybe you could backtrack the update chain to find the root, but that
would be expensive. How expensive? How does it compare to the options?
Let's see.

So, if you *don't* create an index entry there, you need to find the
root ix1. That's doable with some CPU cost and no I/O (just
backtracking index pointers). That would slow down inserts, I'm not
sure how much relative to other options.
And when you do find it, you need to compare the keys, ix1 could have
pointed to (c, 1) and then an index entry is absolutely necessary. You
also have to compare all the keys for the whole chain, there could be
keys in the middle with index entries, and any one of those could be
'a', so you need to walk the chain from ix1 onward comparing keys as
you see them. If the update chain is long, this would be rather
expensive. Sure, no need to compare keys within a HOT chain, but here
there's no HOT chain, they're all non-HOT updates.

So, what I was proposing is to always add an index entry when there's
a key change:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5) -> ix3>(a, 6)

That's simpler to decide at insert time, just one key comparison (that
was already done to decide HOT, mind you).

But at scan time, we'll find two index entries for key a: ix1 and ix3,
**both of which can reach a visible tuple** (only not consecutively,
so there's no way to know when you find ix1 that you will find ix3).

So the scan code needs a way to decide to stop scanning the update
chain when it reaches ix3, without actually seeing ix3 (it may be
anywhere else on the index). The way I came up with, is to check index
keys between HOT chains. Since we always add an index entry when keys
change, if keys change, we stop scanning because we *will* find
another index entry that will point to that place and continue the
scan. If we kept scanning, we would scan the remainder of the chain
multiple times.

Yes, it's expensive, but you pay a smaller price, roughly that of a
recheck node: only one key recheck when crossing from one HOT chain
to another (vs. analyzing a whole page of them during inserts).

I prefer that tradeoff, but I guess some people might prefer the
other, depending on access patterns.


> That seems overly complex and not worth it.
>
> Another approach would be to keep updates on the same page in a single
> WARM chain, even if all indexes have changed columns.  We will already
> have code to walk the HOT chain looking for matches, and at most one
> tuple in the update chain is visible.  The downside of this is
> unnecessary checking if the tuple matches the index.

The problem with that is, how to avoid visiting the same version twice
when visiting more than one index entry that points to the same chain?

That's unavoidable when indexed keys do change mid-way. You can't add
a flag for that anywhere on the heap, since the value depends on the
point of view (from the index being used).

>> >> This is maintained by:
>> >>
>> >>  * No item pointer will be created for row versions whose immediately
>> >> previous version is already on the index with the same key
>> >
>> > You really only have the ctid HOT head stored in the index, not the
>> > immediate ctid.
>>
>> I know. I just mentioned it just in case there was a transient time
>> during cleanup when it isn't true, but thinking it a little bit more,
>> if it weren't, HOT would also cause duplicates during index scans, so
>> it's probably not the case (or protected with a lock).
>
> I am looking for clarification on this posting.

Just ignore it. I thought it might be a source of trouble but that's
not the case.


So, summarizing:

During update:
* If the tuple satisfies HOT for at least one index, HEAP_WARM_TUPLE is
  set on the new tuple, perhaps HEAP_WARM_UPDATED on the old
* If the tuple satisfies HOT for all indexes, HEAP_ONLY_TUPLE is set,
  HEAP_HOT_UPDATED on the old
* For each index that didn't satisfy HOT (that is, had key changes):
  - insert a new index entry

During scan:
* Set needs_recheck to false
* Follow redirects and update chains as usual
* When a tuple has neither HEAP_HOT_UPDATED nor HEAP_WARM_UPDATED, stop
  following the chain
* If a tuple has HEAP_WARM_UPDATED but not HEAP_HOT_UPDATED:
  - Find the next tuple, compare keys
  - Stop if keys differ
  - Otherwise, set needs_recheck to true and follow the chain
 

HOT cleanup works as usual
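The scan procedure above can be sketched as follows (a toy Python model with an invented tuple representation; the flag names mirror the proposed bits, and the dict/list layout is purely illustrative):

```python
# Walk the update chain from one index entry, stopping either at the end
# of the chain or at a HOT-chain boundary where the key changes (another
# index entry is guaranteed to cover the rest of the chain in that case).

def scan_chain(tuples, start):
    """Return (visited offsets, needs_recheck) for one index entry."""
    needs_recheck = False
    visited = []
    pos = start
    while pos is not None:
        t = tuples[pos]
        visited.append(pos)
        if not (t["hot_updated"] or t["warm_updated"]):
            break  # neither flag set: end of the chain for this entry
        nxt = t["ctid"]
        if t["warm_updated"] and not t["hot_updated"]:
            # Crossing into the next HOT chain: compare keys.
            if tuples[nxt]["key"] != t["key"]:
                break  # key change: a separate index entry takes over
            needs_recheck = True
        pos = nxt
    return visited, needs_recheck
```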

Vacuum I haven't researched yet, it's possible that it will need
changes to not break WARM chains, since dead index pointers could
still be necessary to reach live tuples.


Alternatively, without rechecks on non-visible keys:

During update:
* If the tuple satisfies HOT for at least one index, HEAP_WARM_TUPLE is
  set on the new tuple, perhaps HEAP_WARM_UPDATED on the old
* If the tuple satisfies HOT for all indexes, HEAP_ONLY_TUPLE is set,
  HEAP_HOT_UPDATED on the old
* If HEAP_ONLY_TUPLE is not set, for all indexes:
  - invoke index_insert:
    - Map update chains of the whole heap page, find the root entry for
      the new tuple (checking HEAP_WARM_UPDATED flags)
    - Walk forward from the root, following t_ctid, comparing keys
      against the new key
    - Insert a new entry iff there was no matching key

During scan:
* Set needs_recheck to false
* Follow redirects and update chains as usual
* When a tuple has neither HEAP_HOT_UPDATED nor HEAP_WARM_UPDATED, stop
  following the chain
* If a tuple has HEAP_WARM_UPDATED but not HEAP_HOT_UPDATED:
  - set needs_recheck to true and follow the chain
 

HOT cleanup works as usual

The alternatives differ only on the performance tradeoffs, they both
avoid updating indexes unnecessarily. The second one may avoid more
updates than the first and provide faster scans, but slower updates.

Tough choice.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 06:21:48PM -0400, Bruce Momjian wrote:
> Another approach would be to keep updates on the same page in a single
> WARM chain, even if all indexes have changed columns.  We will already
> have code to walk the HOT chain looking for matches, and at most one
> tuple in the update chain is visible.  The downside of this is
> unnecessary checking if the tuple matches the index.

Forget this idea I had.  If all the indexed keys change, we don't save
anything in fewer index tuples, and we extend the length of the HOT
chain because we always have to point to the root of the HOT chain in
the index.

The only advantage would be allowing reuse of the old index tuple if the
indexed value changes back to an old indexed value in the same chain,
which seems rare.

Let me say I am excited at the progress that has been made on this item
in the past 36 hours and I think we are triangulating on a solution.

I know we are restricted by the existing page format and pg_upgrade in
adding this feature, but frankly that limitation seems minor --- even
if designing a new storage system, figuring out how to do WARM chains
efficiently would be hard.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 4, 2016 at 7:59 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> So marking the index would require us to mark both old and new index tuples
> as requiring re-check. That requires an additional index scan to locate the
> old row and then an additional write to force it to re-check, which is
> algorithmically O(NlogN) on table size.


So, basically, I'm saying this isn't really O(NlogN), it's O(N),
manifested in low-cardinality indexes.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 09:53:31PM -0300, Claudio Freire wrote:
> It's the other way around. A WARM chain has to span whole HOT chains.
> The WARM chain always begins in the head of a HOT chain (or non-HOT
> tuple of course), and cannot end before the HOT chain ends.
> 
> That is, all within a single page:
> 
>   [  v1 .. v2 .. v3  .. v4 .. v5 .. v6  v7 ]
>   [ hot chain 1   ][ hot chain 2 ] [NH]
>   [         WARM CHAIN          ] [NW]
> 
> (NH = non-HOT)
> (NW = non-WARM)
> 
> The flag could be useful for the upgrade path, but if you ignore
> having to live with old-format indexes, it's not necessary.
> 
> A WARM chain starts at any index item pointer. It ends when there's a
> key change or a page change (ie: the WARM chain cannot span multiple
> pages). That cannot happen within a HOT chain, so that point will also
> be the end of a HOT chain.
> 
> You can think of a WARM chain as a chain of HOT chains.

OK, that's a lot of text, and I am confused.  Please tell me the
advantage of having an index point to a string of HOT chains, rather
than a single one?  Is the advantage that the index points into the
middle of the HOT chain rather than at the beginning?  I can see some
value to that, but having more ctid HOT headers that can't be removed
except by VACUUM seems bad, plus the need for index rechecks as you
cross to the next HOT chain seems bad.

The value of WARM is to avoid index bloat --- right now we traverse the
HOT chain on a single page just fine with no complaints about speed so I
don't see a value to optimizing that traversal, and I think the key
rechecks and ctid bloat will make it all much slower.

It also seems much more complicated.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Aug  4, 2016 at 09:53:31PM -0300, Claudio Freire wrote:
>> It's the other way around. A WARM chain has to span whole HOT chains.
>> The WARM chain always begins in the head of a HOT chain (or non-HOT
>> tuple of course), and cannot end before the HOT chain ends.
>>
>> That is, all within a single page:
>>
>>   [  v1 .. v2 .. v3  .. v4 .. v5 .. v6  v7 ]
>>   [ hot chain 1   ][ hot chain 2 ] [NH]
>>   [         WARM CHAIN          ] [NW]
>>
>> (NH = non-HOT)
>> (NW = non-WARM)
>>
>> The flag could be useful for the upgrade path, but if you ignore
>> having to live with old-format indexes, it's not necessary.
>>
>> A WARM chain starts at any index item pointer. It ends when there's a
>> key change or a page change (ie: the WARM chain cannot span multiple
>> pages). That cannot happen within a HOT chain, so that point will also
>> be the end of a HOT chain.
>>
>> You can think of a WARM chain as a chain of HOT chains.
>
> OK, that's a lot of text, and I am confused.  Please tell me the
> advantage of having an index point to a string of HOT chains, rather
> than a single one?  Is the advantage that the index points into the
> middle of the HOT chain rather than at the beginning?  I can see some
> value to that, but having more ctid HOT headers that can't be removed
> except by VACUUM seems bad, plus the need for index rechecks as you
> cross to the next HOT chain seems bad.
>
> The value of WARM is to avoid index bloat --- right now we traverse the
> HOT chain on a single page just fine with no complaints about speed so I
> don't see a value to optimizing that traversal, and I think the key
> rechecks and ctid bloat will make it all much slower.
>
> It also seems much more complicated.

The point is avoiding duplicate rows in the output of index scans.

I don't think you can avoid it simply by applying index condition
rechecks as the original proposal implies, in this case:

CREATE TABLE t (id integer not null primary key, someid integer, dat integer);
CREATE INDEX i1 ON t (someid);

INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);
UPDATE t SET dat = dat + 1 where id = 1;
UPDATE t SET dat = dat + 1, id = 2 where id = 1;
UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
SELECT * FROM t WHERE someid = 2;

That, I believe, will cause the problematic chains where the condition
recheck passes both times the last visible tuple is visited during the
scan. It will return more than one tuple when in reality there is only
one.

Not to mention the cost of scanning the chain of N tuples N times,
making it quadratic - bound by the number of tuples that can fit a
page, but still high.

Solving that, I believe, will remove all the simplicity of pointing to
root tuples.

This is done in my proposals by adding boundaries to WARM chains such
that potential matches are visited only once (first variant) or
guaranteeing that visible tuples with a particular key will be
reachable from only one index entry (second variant). Pointing to the
middle of the chain also allows skipping the recheck for the first HOT
chain, which could improve scan performance a lot, especially in the
second variant that has no condition rechecks on invisible tuples.

I don't really see how you'd do that on yours. You seem to assume
finding a particular key-item pointer pair in an index would be
efficient, but IMHO that's not the case.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Thu, Aug  4, 2016 at 11:53:05PM -0300, Claudio Freire wrote:
> The point is avoiding duplicate rows in the output of index scans.
> 
> I don't think you can avoid it simply by applying index condition
> rechecks as the original proposal implies, in this case:
> 
> CREATE TABLE t (id integer not null primary key, someid integer, dat integer);
> CREATE INDEX i1 ON t (someid);
> 
> INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);

OK, let me run through this and you can tell me where I am wrong.

At this point there are two indexes, one on 'id' and one on 'someid'.

> UPDATE t SET dat = dat + 1 where id = 1;

This is a HOT update because no indexes were changed.

> UPDATE t SET dat = dat + 1, id = 2 where id = 1;

This changes the HOT chain to a WARM chain because one index is changed.
That means that lookups on both indexes recheck the single visible
tuple, if there is one.

> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;

This ends the WARM chain and creates new index entries because all
indexes are changed.

> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;

This does the same thing.

> SELECT * FROM t WHERE someid = 2;

This uses the 'someid' index.  The index contains three entries:
        1. {someid=2} pointing to first WARM chain
        2. {someid=3} pointing to single tuple (no HOT chain)
        3. {someid=2} pointing to single tuple (no HOT chain)

The scan of #1 returns no visible rows.  #2 doesn't match the value in
the WHERE clause, so we don't even check the heap.  The scan of #3
returns one row.

Remember, we don't scan past the end of the HOT chain, which is what we
do now.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Fri, Aug 5, 2016 at 12:44 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
>
> This is ends the WARM chain, and creates new index entries because all
> indexes are changed.
>
>> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
>
> This does the same thing.
>
>> SELECT * FROM t WHERE someid = 2;
>
> This uses the 'someid' index.  The index contains three entries:
>
>         1. {someid=2} pointing to first WARM chain
>         2. {someid=3} pointing to single tuple (no HOT chain)
>         3. {someid=2} pointing to single tuple (no HOT chain)
>
> The scan of #1 returns no visible rows.  #2 doesn't match the value in
> the WHERE clause, so we don't even check the heap.  The scan of #3
> returns one row.
>
> Remember, we don't scan past the end of the HOT chain, which is what we
> do now.

Ok, I botched the example.

I wanted the other updates to all be WARM updates, so imagine there's
another index that is unchanged (say, a someid2 not updated ever).



Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Fri, Aug 5, 2016 at 8:23 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian <bruce@momjian.us> wrote:

>
> OK, that's a lot of text, and I am confused.  Please tell me the
> advantage of having an index point to a string of HOT chains, rather
> than a single one?  Is the advantage that the index points into the
> middle of the HOT chain rather than at the beginning?  I can see some
> value to that, but having more ctid HOT headers that can't be removed
> except by VACUUM seems bad, plus the need for index rechecks as you
> cross to the next HOT chain seems bad.
>
> The value of WARM is to avoid index bloat --- right now we traverse the
> HOT chain on a single page just fine with no complaints about speed so I
> don't see a value to optimizing that traversal, and I think the key
> rechecks and ctid bloat will make it all much slower.
>
> It also seems much more complicated.

The point is avoiding duplicate rows in the output of index scans.

I don't think you can avoid it simply by applying index condition
rechecks as the original proposal implies, in this case:

CREATE TABLE t (id integer not null primary key, someid integer, dat integer);
CREATE INDEX i1 ON t (someid);

INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);
UPDATE t SET dat = dat + 1 where id = 1;
UPDATE t SET dat = dat + 1, id = 2 where id = 1;
UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
SELECT * FROM t WHERE someid = 2;

That, I believe, will cause the problematic chains where the condition
recheck passes both times the last visible tuple is visited during the
scan. It will return more than one tuple when in reality there is only
one.

Hmm. That seems like a real problem. The only way to address that is to ensure that for a given WARM chain and given index key, there must exist only a single index entry. There can and will be multiple entries if the index key changes, but if the index key changes back to the original value (or any other value already in the WARM chain), we must not add another index entry. Now that does not seem like a very common scenario, and skipping a duplicate index entry does have its own benefit, so maybe it's not a terrible idea to try that. You're right, it may be expensive to search for an existing matching index key for the same root line pointer. But if we could somehow short-circuit the more common case, it might still be worth trying.

The other idea you mentioned is to index intermediate tuples but somehow restrict WARM chain following, knowing that the subsequent tuples are reachable via a different index tuple. I see two major problems with that approach:

1. When HOT or WARM chains are pruned and dead tuples removed, we may lose the knowledge about key changes between intermediate updates. Without that, it seems difficult to know when to stop while following the chain starting from the old index tuple.

2. The existence of index pointers to intermediate tuples will lead to index bloat. This is the same problem that we found in Simon's original proposal (discussed internally). None of the intermediate line pointers can be vacuumed until the entire chain becomes DEAD. Even if a duplicate index key is inserted for every index, knowing that and reclaiming the index pointers to the original root line pointer will be difficult.
 

Not to mention the cost of scanning the chain of N tuples N times,
making it quadratic - bound by the number of tuples that can fit a
page, but still high.

Solving that, I believe, will remove all the simplicity of pointing to
root tuples.


You're right. But having all indexes point to the root line pointer makes things simpler to manage and less buggy. So if we can somehow solve the duplicate tuples problem, even at additional cost at update time, it might still be worth it. I would suggest that we think more about this; we may find some solution to the problem.
 

I don't really see how you'd do that on yours. You seem to assume
finding a particular key-item pointer pair in an index would be
efficient, but IMHO that's not the case.

That's true. But we could somehow short-circuit the more common pattern, which might still be worthwhile. For corner cases, we can fall back to a non-HOT update and keep things simple. It will still be a huge improvement over what we have currently.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 09:57:19AM +0530, Pavan Deolasee wrote:
> Hmm. That seems like a real problem. The only way to address that is to ensure
> that for a given WARM chain and given index key, there must exists only a
> single index entry. There can and will be multiple entries if the index key
> changes, but if the index key changes to the original value (or any other value
> already in the WARM chain) again, we must not add another index entry. Now that
> does not seem like a very common scenario and skipping a duplicate index entry
> does have its own benefit, so may be its not a terrible idea to try that.
> You're right, it may be expensive to search for an existing matching index key
> for the same root line pointer. But if we could somehow short-circuit the more
> common case, it might still be worth trying.

I assumed that behavior was part of the original design;  I stated
earlier:
    Also, what happens when a tuple chain goes off-page, then returns to
    the original page?  That is a new HOT chain given our current code,
    but would the new tuple addition realize it needs to create a new
    index tuple?  The new tuple code needs to check the index's root HOT
    tid for a non-match.


We have to store the _head_ of the HOT/WARM chain in the index, not only
for pruning, but so we can know if this HOT/WARM chain is already in the
index and skip the index addition.

I think modifying an index tuple is expensive, but checking an index and
avoiding an addition should be quick.
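A sketch of that check-and-skip step (illustrative Python, not btree code; it assumes locating the (key, root TID) pair is cheap, which holds for this dict model but is exactly the contested point for a real non-unique index):

```python
# Toy model of "check the index and skip the addition": before inserting
# an index entry for a WARM update, see whether an entry with the same key
# already points at this chain's root TID, and skip the insert if so.

def index_insert_if_absent(index, key, root_tid):
    """index maps key -> set of chain-root TIDs; returns True if inserted."""
    tids = index.setdefault(key, set())
    if root_tid in tids:
        return False  # this WARM chain is already reachable under this key
    tids.add(root_tid)
    return True
```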

> The other idea you mentioned is to index intermediate tuples but somehow
> restrict WARM chain following knowing that the subsequent tuples are reachable
> via different index tuple. I see two major problems with that approach:

I don't see much value of doing that, and a lot of complexity.

> 
> 1. When HOT or WARM chains are pruned and dead tuples removed, we may lose the
> knowledge about key changes between intermediate updates. Without that its
> seems difficult to know when to stop while following chain starting from the
> old index tuple.
> 
> 2. The existence of index pointers to intermediate tuples will lead to index
> bloat. This is the same problem that we found in Simon's original proposal
> (discussed internally). None of the intermediate line pointers can be vacuumed
> until the entire chain becomes DEAD. Even if a duplicate index key is
> inserted for every index, knowing that and reclaiming the index pointers to
> the original root line pointer will be difficult.
>  
> 
> 
>     Not to mention the cost of scanning the chain of N tuples N times,
>     making it quadratic - bound by the number of tuples that can fit a
>     page, but still high.
> 
>     Solving that, I believe, will remove all the simplicity of pointing to
>     root tuples.
> 
> 
> 
> You're right. But having all indexes point to the root line pointer makes
> things simpler to manage and less buggy. So if we can somehow solve the
> duplicate tuples problem, even at additional cost at update time, it might
> still be worth. I would suggest that we should think more and we can find some
> solution to the problem.

Why is that hard?  It seems simple to me: we just try to do the index
insert, and skip it if an entry for our key and ctid already exists.

>     I don't really see how you'd do that on yours. You seem to assume
>     finding a particular key-item pointer pair in an index would be
>     efficient, but IMHO that's not the case.
> 
> 
> That's true. But we could somehow short-circuit the more common pattern, that
> might still be worth. For corner cases, we can fall back to non-HOT update and
> keep things simple. It will still be a huge improvement over what we have
> currently.

I don't see why it is hard.  Trying to find the index entry for the old
update row seems odd, and updating an index row seems odd, but skipping
an index addition for the new row seems simple.  Is the problem
concurrent activity?  I assumed we already had the ability to add to HOT
chains because we have to lock the update chain.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Fri, Aug 5, 2016 at 8:23 AM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> >
>> > OK, that's a lot of text, and I am confused.  Please tell me the
>> > advantage of having an index point to a string of HOT chains, rather
>> > than a single one?  Is the advantage that the index points into the
>> > middle of the HOT chain rather than at the beginning?  I can see some
>> > value to that, but having more ctid HOT headers that can't be removed
>> > except by VACUUM seems bad, plus the need for index rechecks as you
>> > cross to the next HOT chain seems bad.
>> >
>> > The value of WARM is to avoid index bloat --- right now we traverse the
>> > HOT chain on a single page just fine with no complaints about speed so I
>> > don't see a value to optimizing that traversal, and I think the key
>> > rechecks and ctid bloat will make it all much slower.
>> >
>> > It also seems much more complicated.
>>
>> The point is avoiding duplicate rows in the output of index scans.
>>
>> I don't think you can avoid it simply by applying index condition
>> rechecks as the original proposal implies, in this case:
>>
>> CREATE TABLE t (id integer not null primary key, someid integer, dat
>> integer);
>> CREATE INDEX i1 ON t (someid);
>>
>> INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);
>> UPDATE t SET dat = dat + 1 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 2 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
>> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
>> SELECT * FROM t WHERE someid = 2;
>>
>> That, I believe, will cause the problematic chains where the condition
>> recheck passes both times the last visible tuple is visited during the
>> scan. It will return more than one tuple when in reality there is only
>> one.
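For illustration, the failure mode in that SQL sequence can be simulated with a toy version chain (the structures are invented stand-ins, not the real heap format): someid goes 2 -> 3 -> 2, so i1 accumulates two entries for key 2 reaching the same chain, and a plain scan-key recheck passes for both.

```python
# Version chain for the single row, oldest first, after the four UPDATEs.
chain = [
    {'id': 1, 'someid': 2, 'dat': 100, 'visible': False},
    {'id': 1, 'someid': 2, 'dat': 101, 'visible': False},
    {'id': 2, 'someid': 2, 'dat': 102, 'visible': False},
    {'id': 3, 'someid': 3, 'dat': 103, 'visible': False},
    {'id': 1, 'someid': 2, 'dat': 104, 'visible': True},   # current version
]

# i1 got a new entry each time someid changed; since someid returned to 2,
# the key 2 appears twice, both entries reaching the same chain.
index_entries_for_someid_2 = [chain, chain]

rows = []
for entry in index_entries_for_someid_2:
    visible = next(v for v in entry if v['visible'])
    if visible['someid'] == 2:        # recheck passes both times
        rows.append(visible)

assert len(rows) == 2   # duplicate output: the problem described above
```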
>
>
> Hmm. That seems like a real problem. The only way to address that is to
> ensure that for a given WARM chain and given index key, there must exist
> only a single index entry. There can and will be multiple entries if the
> index key changes, but if the index key changes to the original value (or
> any other value already in the WARM chain) again, we must not add another
> index entry. Now that does not seem like a very common scenario, and skipping
> a duplicate index entry does have its own benefit, so maybe it's not a
> terrible idea to try that. You're right, it may be expensive to search for
> an existing matching index key for the same root line pointer. But if we
> could somehow short-circuit the more common case, it might still be worth
> trying.

I think it can be done. I could try to write a POC and see how it works.

> The other idea you mentioned is to index intermediate tuples but somehow
> restrict WARM chain following knowing that the subsequent tuples are
> reachable via different index tuple. I see two major problems with that
> approach:
>
> 1. When HOT or WARM chains are pruned and dead tuples removed, we may lose
> the knowledge about key changes between intermediate updates. Without that
> its seems difficult to know when to stop while following chain starting from
> the old index tuple.
>
> 2. The existence of index pointers to intermediate tuples will lead to index
> bloat. This is the same problem that we found in Simon's original proposal
> (discussed internally). None of the intermediate line pointers can be
> vacuumed until the entire chain becomes DEAD. Even if a duplicate index
> key is inserted for every index, knowing that and reclaiming the index
> pointers to the original root line pointer will be difficult.

I don't see the difference it makes for bloat between storing the root
tid and the intermediate tid, but I haven't yet figured out how
vacuuming would work so maybe I have to think about that to realize
the difference.

>> Not to mention the cost of scanning the chain of N tuples N times,
>> making it quadratic - bound by the number of tuples that can fit a
>> page, but still high.
>>
>> Solving that, I believe, will remove all the simplicity of pointing to
>> root tuples.
>>
>
> You're right. But having all indexes point to the root line pointer makes
> things simpler to manage and less buggy. So if we can somehow solve the
> duplicate tuples problem, even at additional cost at update time, it might
> still be worth it. I would suggest that we think more; we may find
> some solution to the problem.

Will try to go down that road and see what can be optimized.

>> I don't really see how you'd do that on yours. You seem to assume
>> finding a particular key-item pointer pair in an index would be
>> efficient, but IMHO that's not the case.
>
>
> That's true. But we could somehow short-circuit the more common pattern,
> that might still be worth it. For corner cases, we can fall back to non-HOT
> update and keep things simple. It will still be a huge improvement over what
> we have currently.

Well, my proposed second variant can also work with root pointers, so
I believe that one should be viable. The only thing left is trying to
optimize duplicate path removal for the most common cases.


On Fri, Aug 5, 2016 at 10:52 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>     I don't really see how you'd do that on yours. You seem to assume
>>     finding a particular key-item pointer pair in an index would be
>>     efficient, but IMHO that's not the case.
>>
>>
>> That's true. But we could somehow short-circuit the more common pattern, that
>> might still be worth it. For corner cases, we can fall back to non-HOT update and
>> keep things simple. It will still be a huge improvement over what we have
>> currently.
>
> I don't see why it is hard.  Trying to find the index entry for the old
> update row seems odd, and updating an index row seems odd, but skipping
> an index addition for the new row seems simple.  Is the problem
> concurrent activity?  I assumed we already had the ability to add to HOT
> chains because we have to lock the update chain.


Think of an index over a 1TB table whose keys have only 2 values: true
and false.

Sure, it's a horrible index. But I've seen things like that, and I've
seen cases when they're useful too.

So, conceptually, for each key you have circa N/2 tids on the index.
nbtree finds the leftmost valid insert point comparing keys, it
doesn't care about tids, so to find the index entries that point to
the page where the new tuple is, you'd have to scan the N/2 tids in
the index, an extremely expensive operation.

Insert is fast, because you just have to add a tid somewhere in that
range, if you want to find a specific tid, it's a wholly different
story.
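A toy sketch of that asymmetry, with a sorted Python list standing in for the leaf level (purely illustrative, not nbtree's on-disk format): the insertion point is found by binary search on the key alone, while locating one specific (key, tid) pair means walking the whole equal-key run.

```python
import bisect

# Keys sorted; the tids within an equal-key run are in arbitrary order,
# as in nbtree of this era (tids were not a sort column).
N = 100_000
keys = [i >= N // 2 for i in range(N)]          # sorted: False..., True...
tids = [(i // 200, i % 200) for i in range(N)]  # arbitrary-order item pointers

def find_insert_point(key):
    """Fast: O(log N) binary search on the key alone."""
    return bisect.bisect_left(keys, key)

def find_exact(key, tid):
    """Slow: must walk the whole equal-key run to locate one (key, tid)."""
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    for i in range(lo, hi):                     # up to N/2 comparisons here
        if tids[i] == tid:
            return i
    return -1

assert find_insert_point(True) == N // 2
assert find_exact(True, tids[N - 1]) == N - 1   # had to scan the entire run
```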

And that's only for nbtree, other indexams may have their own perks.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 12:25:39PM -0300, Claudio Freire wrote:
> > 2. The existence of index pointers to intermediate tuples will lead to index
> > bloat. This is the same problem that we found in Simon's original proposal
> > (discussed internally). None of the intermediate line pointers can be
> > vacuumed until the entire chain becomes DEAD. Even if a duplicate index
> > key is inserted for every index, knowing that and reclaiming the index
> > pointers to the original root line pointer will be difficult.
> 
> I don't see the difference it makes for bloat between storing the root
> tid and the intermediate tid, but I haven't yet figured out how
> vacuuming would work so maybe I have to think about that to realize
> the difference.

Think of page pruning --- we can't remove a ctid that an index points
to.  The more ctids you point to in a HOT chain, the fewer ctids you can
remove --- that's why we want to point only to the head of the HOT/WARM
chain.




Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Fri, Aug 5, 2016 at 8:55 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> I don't see why it is hard.  Trying to find the index entry for the old
> update row seems odd, and updating an index row seems odd, but skipping
> an index addition for the new row seems simple.  Is the problem
> concurrent activity?  I assumed we already had the ability to add to HOT
> chains because we have to lock the update chain.


Think of an index over a 1TB table whose keys have only 2 values: true
and false.

Sure, it's a horrible index. But I've seen things like that, and I've
seen cases when they're useful too.

So, conceptually, for each key you have circa N/2 tids on the index.
nbtree finds the leftmost valid insert point comparing keys, it
doesn't care about tids, so to find the index entries that point to
the page where the new tuple is, you'd have to scan the N/2 tids in
the index, an extremely expensive operation.


Well, it's always going to be extremely hard to solve for all use cases. This is one such extreme case, and we should just give up and do a cold update.

I think we can look at the index type (unique vs non-unique) along with table statistics to find what fraction of column values are distinct, and then estimate whether it's worthwhile to look for a duplicate (key, CTID) pair or just do a cold update. In addition, put a cap on how hard we try once we decide to check for duplicates, and give up after we cross that threshold.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Fri, Aug 5, 2016 at 2:26 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> On Fri, Aug 5, 2016 at 8:55 PM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee <pavan.deolasee@gmail.com>
>> wrote:
>> >
>> >
>> > I don't see why it is hard.  Trying to find the index entry for the old
>> > update row seems odd, and updating an index row seems odd, but skipping
>> > an index addition for the new row seems simple.  Is the problem
>> > concurrent activity?  I assumed we already had the ability to add to HOT
>> > chains because we have to lock the update chain.
>>
>>
>> Think of an index over a 1TB table whose keys have only 2 values: true
>> and false.
>>
>> Sure, it's a horrible index. But I've seen things like that, and I've
>> seen cases when they're useful too.
>>
>> So, conceptually, for each key you have circa N/2 tids on the index.
>> nbtree finds the leftmost valid insert point comparing keys, it
>> doesn't care about tids, so to find the index entries that point to
>> the page where the new tuple is, you'd have to scan the N/2 tids in
>> the index, an extremely expensive operation.
>>
>
> Well, it's always going to be extremely hard to solve for all use cases.
> This is one such extreme case, and we should just give up and do a cold update.
>
> I think we can look at the index type (unique vs non-unique) along with
> table statistics to find what fraction of column values are distinct, and
> then estimate whether it's worthwhile to look for a duplicate (key, CTID)
> pair or just do a cold update. In addition, put a cap on how hard we try
> once we decide to check for duplicates, and give up after we cross that
> threshold.

I don't think bailing out in this case is necessary, and it could be
more common than you'd think. It doesn't need to be this extreme to
cause the issue, it only needs equal-key runs that span more than an
index page, and that could be far more common than the extreme example
I gave.

But doing the WARM chain backtracking and checking for previous
versions with appropriate keys should be enough to support this use
case too, it just needs to be well optimized to avoid seriously
impacting performance on the average case.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
> But doing the WARM chain backtracking and checking for previous
> versions with appropriate keys should be enough to support this use
> case too, it just needs to be well optimized to avoid seriously
> impacting performance on the average case.

Yes, that was an idea I had too --- if the in-page HOT chain already has
the key, we know it is already in the index and we can skip the index
addition.




Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
> But doing the WARM chain backtracking and checking for previous
> versions with appropriate keys should be enough to support this use
> case too, it just needs to be well optimized to avoid seriously
> impacting performance on the average case.

Yes, that was an idea I had too --- if the in-page HOT chain already has
the key, we know it is already in the index and we can skip the index
addition.


I just don't know how you would do that without delaying or skipping HOT chain pruning. As soon as the root and intermediate HOT tuples are gone, all that information is lost. You may delay pruning, but that will defeat the whole purpose. If chains are not pruned in time, you may not find any free space in the page and will end up doing a cold update anyway. But maybe I am missing something and Claudio has a different idea.

Thanks,
Pavan


Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 11:30:26PM +0530, Pavan Deolasee wrote:
> On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
> 
>     On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
>     > But doing the WARM chain backtracking and checking for previous
>     > versions with appropriate keys should be enough to support this use
>     > case too, it just needs to be well optimized to avoid seriously
>     > impacting performance on the average case.
> 
>     Yes, that was an idea I had too --- if the in-page HOT chain already has
>     the key, we know it is already in the index and we can skip the index
>     addition.
> 
> I just don't know how would you do that without delaying/not-doing HOT chain
> prune. As soon as root and intermediate HOT tuples are gone, all information is
> lost. You may delay that, but that will defeat the whole purpose. If chains are
> not pruned in-time, you may not find any free space in the page and end up
> doing a cold update anyway. But maybe I am missing something and Claudio has
> a different idea.

Yes, pruning would be a problem.  :-(

A check only needs to happen when the indexed key changes, right?  So we
are comparing the cost of adding an index key vs. the cost of trying to
find a matching key/ctid in the index.  Seems the latter is cheaper, but
it is not obvious.

I think I see Claudio's idea now --- from his diagram, you can have WARM
chains span multiple HOT chains.  What he is doing is creating a new HOT
chain every time _any_ key changes, and he knows the entire HOT chain has
identical values for all indexes.  He moves from one HOT chain to another
during an index scan by checking if the index value is the same in the
old and new HOT chain (that is the same WARM chain).

This does create more HOT chains where the root ctid cannot be removed,
but it does avoid the index key/ctid check because any entry in the HOT
chain has identical keys.  What this would not handle is when an entire
HOT chain is pruned, as the keys would be gone.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 03:26:15PM -0400, Bruce Momjian wrote:
> > I just don't know how would you do that without delaying/not-doing HOT chain
> > prune. As soon as root and intermediate HOT tuples are gone, all information is
> > lost. You may delay that, but that will defeat the whole purpose. If chains are
> > not pruned in-time, you may not find any free space in the page and end up
> > doing a cold update anyway. But maybe I am missing something and Claudio has
> > a different idea.
> 
> Yes, pruning would be a problem.  :-(
> 
> A check only needs to happen when the indexed key changes, right?  So we
> are comparing the cost of adding an index key vs. the cost of trying to
> find a matching key/ctid in the index.  Seems the latter is cheaper, but
> it is not obvious.

Here is an illustration of the issue:
CREATE TABLE test (col1 INTEGER, col2 INTEGER, col3 INTEGER);
-- index first two columns
CREATE INDEX i_test1 ON test (col1);
CREATE INDEX i_test2 ON test (col2);
INSERT INTO test VALUES (1, 7, 20);
-- create a HOT chain
UPDATE test SET col3 = 30;
-- change the HOT chain to a WARM chain, changes i_test2 but not i_test1
UPDATE test SET col2 = 8;
-- we should avoid adding a col2=7 i_test2 index tuple
-- because we already have one;  how do we know that?
UPDATE test SET col2 = 7;
-- we should see only one col2=7 i_test2 index tuple
SELECT * FROM test WHERE col2 = 7;
 col1 | col2 | col3
------+------+------
    1 |    7 |   30




Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Fri, Aug 5, 2016 at 4:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Aug  5, 2016 at 11:30:26PM +0530, Pavan Deolasee wrote:
>> On Fri, Aug 5, 2016 at 11:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>>     On Fri, Aug  5, 2016 at 02:43:37PM -0300, Claudio Freire wrote:
>>     > But doing the WARM chain backtracking and checking for previous
>>     > versions with appropriate keys should be enough to support this use
>>     > case too, it just needs to be well optimized to avoid seriously
>>     > impacting performance on the average case.
>>
>>     Yes, that was an idea I had to --- if the in-page HOT chain already has
>>     the key, we know it is already in the index and we can skip the index
>>     addition.
>>
>> I just don't know how would you do that without delaying/not-doing HOT chain
>> prune. As soon as root and intermediate HOT tuples are gone, all information is
>> lost. You may delay that, but that will defeat the whole purpose. If chains are
>> not pruned in-time, you may not find any free space in the page and end up
>> doing a cold update anyway. But maybe I am missing something and Claudio has
>> a different idea.
>
> Yes, pruning would be a problem.  :-(
>
> A check only needs to happen when the indexed key changes, right?  So we
> are comparing the cost of adding an index key vs. the cost of trying to
> find a matching key/ctid in the index.  Seems the latter is cheaper, but
> it is not obvious.
>
> I think I see Claudio's idea now --- from his diagram, you can have WARM
> chains span multiple HOT chains.  What he is doing is creating a new HOT
> chain every time _any_ key changes, and he knows the entire HOT chain has
> identical values for all indexes.  He moves from one HOT chain to another
> during an index scan by checking if the index value is the same in the
> old and new HOT chain (that is the same WARM chain).
>
> This does create more HOT chains where the root ctid cannot be removed,
> but it does avoid the index key/ctid check because any entry in the HOT
> chain has identical keys.  What this would not handle is when an entire
> HOT chain is pruned, as the keys would be gone.

I believe the only solution is to use bitmap index scans with WARM indexes.

Doing so simplifies things considerably.

For one, it isolates the new functionality and makes it opt-in.

Then, WARM indexes can always point to the root of the HOT chain, and
be ignored for satisfies HOT. The planner will only consider bitmap
index scans with WARM indexes, and always issue a recheck.

The bitmap is needed to remove duplicate item pointers, and the
recheck, as usual, to filter out unwanted versions.

On update, one only needs a pointer to the HOT head, which would
arguably be at hand, or at worst reachable by backtracking t_ctid
pointers. No key checks of any kind, and no weird flags required.

I don't see another solution to the information loss when pruning
whole HOT chains.

Suppose we start with this situation:

lp:  1  2h 3h 4w 5h 6h 7h
k1:  a  a  a  b  b  a  a
k2:  c  c  c  c  c  c  c

So we have 2 HOT chains, 1-3, 4-7, two indexes, one never updated, the
other with an update and then returning to the same key.

The first index will have two index entries, one for key a, another
for b, both pointing at 1.

a -> (p,1)
b -> (p,1)

Now versions 1-3 are dead, so we want to prune them.

We end up with a redirect on the LP for 1 -> 4, LPs 2 and 3 unused
because they were HOT
     r4 u  u
lp:  1  2  3  4w 5h 6  7h
k1:  _  _  _  b  b  a  a
k2:  _  _  _  c  c  c  c

Now suppose versions 4 and 5 die. So we end up with:
     r6 u  u  u  u
lp:  1  2  3  4  5  6  7h
k1:  _  _  _  _  _  a  a
k2:  _  _  _  _  _  c  c

We just forgot that "b" was in k1, yet we still have an index entry
pointing to the chain.

Should an update come:
     r6 u  u  u  u
lp:  1  2  3  4  5  6  7h 8?
k1:  _  _  _  _  _  a  a  b
k2:  _  _  _  _  _  c  c  c

Is an index entry b->(p,1) needed for 8?

According to the logic, it is, k1 would need a new entry b -> (p,1),
but the entry is already there. It's just unverifiable in reasonable
time.

So a simpler solution is to always add such entries, and let the
bitmap index scan filter duplicates.
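As a toy model of that approach (invented structures, not PostgreSQL code): both index entries for the key land in the tid bitmap, the set semantics collapse the duplicates, and the usual recheck discards entries whose visible version no longer matches.

```python
def bitmap_scan(index, key, heap):
    # Collect tids for the key; set() plays the role of the tid bitmap,
    # removing the duplicate entries a WARM chain can accumulate.
    bitmap = set(index.get(key, []))
    # Recheck: follow each tid to the visible version, keep only matches.
    return [tid for tid in sorted(bitmap) if heap[tid] == key]

# The scenario above: k1 went a -> b, so the index holds entries for both
# keys pointing at the same root line pointer (page 7, lp 1), and always
# adding entries means 'b' may even appear twice.
index = {'a': [(7, 1)], 'b': [(7, 1), (7, 1)]}
heap = {(7, 1): 'b'}   # the currently visible version has k1 = 'b'

assert bitmap_scan(index, 'b', heap) == [(7, 1)]   # one row, not two
assert bitmap_scan(index, 'a', heap) == []          # stale entry rechecked away
```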



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
> > This does create more HOT chains where the root ctid cannot be removed,
> > but it does avoid the index key/ctid check because any entry in the HOT
> > chain has identical keys.  What this would not handle is when an entire
> > HOT chain is pruned, as the keys would be gone.
> 
> I believe the only solution is to use bitmap index scans with WARM indexes.

Let me back up and explain the benefits we get from allowing HOT chains
to be WARM:

*  no index entries for WARM indexes whose values don't change
*  improved pruning because the HOT/WARM chains can be longer because we
   don't have to break chains for index changes

While I realize bitmap indexes would allow us to remove duplicate index
entries, it removes one of the two benefits of WARM, and it causes every
index read to be expensive --- I can't see how this would be a win over
doing the index check on writes.




Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
>> > This does create more HOT chains where the root ctid cannot be removed,
>> > but it does avoid the index key/ctid check because any entry in the HOT
>> > chain has identical keys.  What this would not handle is when an entire
>> > HOT chain is pruned, as the keys would be gone.
>>
>> I believe the only solution is to use bitmap index scans with WARM indexes.
>
> Let me back up and explain the benefits we get from allowing HOT chains
> to be WARM:
>
> *  no index entries for WARM indexes whose values don't change
> *  improved pruning because the HOT/WARM chains can be longer because we
>    don't have to break chains for index changes
>
> While I realize bitmap indexes would allow us to remove duplicate index
> entries, it removes one of the two benefits of WARM, and it causes every
> index read to be expensive --- I can't see how this would be a win over
> doing the index check on writes.

But the index check could be prohibitively expensive.

So we're back to bailing out?

When an update comes and we're considering WARM, we need to query the
indexes, each one, and if one index cannot quickly verify the
existence of an entry for the root tid for the key, bail out from
WARM.

That may prevent the benefits of WARM in a lot of real world cases.
You just need one less-than-unique index to disable WARM for all
updates, and IIRC those are common. You'd pay the CPU price of WARM
without any of the benefits.

I don't think I have a single table in my databases that doesn't exhibit
this behavior.



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Fri, Aug 5, 2016 at 9:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
>>> > This does create more HOT chains where the root ctid cannot be removed,
>>> > but it does avoid the index key/ctid check because any entry in the HOT
>>> > chain has identical keys.  What this would not handle is when an entire
>>> > HOT chain is pruned, as the keys would be gone.
>>>
>>> I believe the only solution is to use bitmap index scans with WARM indexes.
>>
>> Let me back up and explain the benefits we get from allowing HOT chains
>> to be WARM:
>>
>> *  no index entries for WARM indexes whose values don't change
>> *  improved pruning because the HOT/WARM chains can be longer because we
>>    don't have to break chains for index changes
>>
>> While I realize bitmap indexes would allow us to remove duplicate index
>> entries, it removes one of the two benefits of WARM, and it causes every
>> index read to be expensive --- I can't see how this would be a win over
>> doing the index check on writes.
>
> But the index check could be prohibitively expensive.

Well... it could be made efficient for the case of nbtree with a format change.

If nbtree sorted by tid as well, for instance.

But an upgrade there would involve a reindex before WARM can be applied.
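A sketch of that hypothetical format change: once equal-key runs are also ordered by tid, the composite (key, tid) is fully ordered and the existence check becomes a plain binary search. This is a toy model of the idea, not nbtree's actual layout.

```python
import bisect

# Leaf entries sorted by (key, tid) rather than by key alone.
entries = sorted((i >= 50_000, (i // 200, i % 200)) for i in range(100_000))

def find_exact(key, tid):
    """O(log N): (key, tid) is now a fully ordered composite key."""
    pos = bisect.bisect_left(entries, (key, tid))
    return pos if pos < len(entries) and entries[pos] == (key, tid) else -1

assert find_exact(True, (499, 199)) != -1   # present pair: found by bisection
assert find_exact(True, (999, 0)) == -1     # absent pair: cheap to confirm
```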



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 09:21:49PM -0300, Claudio Freire wrote:
> On Fri, Aug 5, 2016 at 9:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, Aug  5, 2016 at 07:51:05PM -0300, Claudio Freire wrote:
> >> > This does create more HOT chains where the root ctid cannot be removed,
> >> > but it does avoid the index key/ctid check because any entry in the HOT
> >> > chain has identical keys.  What this would not handle is when an entire
> >> > HOT chain is pruned, as the keys would be gone.
> >>
> >> I believe the only solution is to use bitmap index scans with WARM indexes.
> >
> > Let me back up and explain the benefits we get from allowing HOT chains
> > to be WARM:
> >
> > *  no index entries for WARM indexes whose values don't change
> > *  improved pruning because the HOT/WARM chains can be longer because we
> >    don't have to break chains for index changes
> >
> > While I realize bitmap indexes would allow us to remove duplicate index
> > entries, it removes one of the two benefits of WARM, and it causes every
> > index read to be expensive --- I can't see how this would be a win over
> > doing the index check on writes.
> 
> But the index check could be prohibitively expensive.

True, but I am afraid the bitmap handling on reads will be worse.

> So we're back to bailing out?
> 
> When an update comes and we're considering WARM, we need to query the
> indexes, each one, and if one index cannot quickly verify the
> existence of an entry for the root tid for the key, bail out from
> WARM.

Yes, it seems we either find the entry fast and avoid the index
addition, or don't find it quickly and use a non-HOT, non-WARM update.
The problem is that you have to do this for multiple indexes, and if you
quickly find you need to add an entry to the first index, when you get
to the second one you can't easily bail out and go with a non-HOT,
non-WARM update.  I suppose we could bail out of a long index search if
there is only one index with a changed key.

Here's how I understand it --- if you are looking for a key that has
only a few index entries, it will be fast to find if the key/ctid is
listed.  If the index has many index entries for the key, it will be
expensive to find if there is a matching key/ctid, but a read-only-query
index lookup for that key will be expensive too, whether you use the
bitmap scan or not.  And, effectively, if we bail out and decide to go
with a non-HOT, non-WARM update, we are making the index even bigger.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 09:02:39PM -0400, Bruce Momjian wrote:
> Yes, it seems we either find the entry fast and avoid the index
> addition, or don't find it quickly and use a non-HOT, non-WARM update.
> The problem is that you have to do this for multiple indexes, and if you
> quickly find you need to add an entry to the first index, when you get
> to the second one you can't easily bail out and go with a non-HOT,
> non-WARM update.  I suppose we could bail out of a long index search if
> there is only one index with a changed key.
> 
> Here's how I understand it --- if you are looking for a key that has
> only a few index entries, it will be fast to find if the key/ctid is
> listed.  If the index has many index entries for the key, it will be
> expensive to find if there is a matching key/ctid, but a read-only-query
> index lookup for that key will be expensive too, whether you use the
> bitmap scan or not.  And, effectively, if we bail out and decide to go
> with a non-HOT, non-WARM update, we are making the index even bigger.

So to summarize again:

o  chains start out as HOT
o  they become WARM when some indexes change and others don't
o  for multiple index changes, we have to check all indexes
   for key/ctid matches
o  for single index changes, we can fail HOT and create a new
   non-HOT/WARM tuple if there are too many index matches
o  99% of index checks will not find a key/ctid match

I am not sure how to optimize an index non-match.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Fri, Aug  5, 2016 at 09:40:35PM -0400, Bruce Momjian wrote:
> So to summarize again:
> 
> o  chains start out as HOT
> o  they become WARM when some indexes change and others don't
> o  for multiple index changes, we have to check all indexes
>    for key/ctid matches
> o  for single index changes, we can fail HOT and create a new
>    non-HOT/WARM tuple if there are too many index matches
> o  99% of index checks will not find a key/ctid match

I think a WARM chain where the head ctid isn't LP_REDIRECT wasn't
pruned, so here is an updated list:

o  chains start out as HOT
o  they become WARM when some indexes change and others don't
o  if WARM chain head is not LP_REDIRECT, check existing chain for key
   matches
o  if WARM chain head is LP_REDIRECT:
        o  for single index changes, we can fail HOT and create a new
           non-HOT/WARM tuple if there are too many index matches
        o  for multiple index changes, we have to check all indexes
           for key/ctid matches
        o  99% of index checks will not find a key/ctid match

So, we are only checking the index if the WARM chain was pruned, and we
can bail out if there is only one index changed.  This is looking more
doable.




Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Sat, Aug 6, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Aug  5, 2016 at 09:40:35PM -0400, Bruce Momjian wrote:
> So to summarize again:
>
> o  chains start out as HOT
> o  they become WARM when some indexes change and others don't
> o  for multiple index changes, we have to check all indexes
>    for key/ctid matches
> o  for single index changes, we can fail HOT and create a new
>    non-HOT/WARM tuple if there are too many index matches
> o  99% of index checks will not find a key/ctid match

I think a WARM chain where the head ctid isn't LP_REDIRECT wasn't
pruned, so here is an updated list:

o  chains start out as HOT
o  they become WARM when some indexes change and others don't
o  if WARM chain head is not LP_REDIRECT, check existing chain for key
   matches
o  if WARM chain head is LP_REDIRECT:
        o  for single index changes, we can fail HOT and create a new
           non-HOT/WARM tuple if there are too many index matches
        o  for multiple index changes, we have to check all indexes
           for key/ctid matches
        o  99% of index checks will not find a key/ctid match

So, we are only checking the index if the WARM chain was pruned, and we
can bail out if there is only one index changed.  This is looking more
doable.

The duplicate-tuples problem that we are focusing on happens when an index already has two or more index tuples pointing to the same root tuple/lp. When it's time to insert a third index tuple, we must not insert a duplicate (key, CTID) tuple. I have a design where we can track which columns (we are interested only in the columns that indexes use) were ever changed in the WARM chain. We allow one change for every index column, but a second change to the same column requires a duplicate lookup. This is still quite an improvement, since cold updates may already drop by more than 50%, but someone can argue that it does not handle the case where the same index column is repeatedly updated.

If we can find an efficient way to convert WARM chains back to HOT, which will happen soon after the old index tuples retire, the system can reach a stable state, not for all but for many use cases.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Sat, Aug  6, 2016 at 10:08:47AM +0530, Pavan Deolasee wrote:
>     So, we are only checking the index if the WARM chain was pruned, and we
>     can bail out if there is only one index changed.  This is looking more
>     doable.
> 
> 
> The duplicate tuples problem that we are focusing on, happens when an index
> already has two or index tuples pointing to the same root tuple/lp. When it's
> time to insert third index tuple, we must not insert a duplicate (key, CTID)
> tuple. I've a design where we can track which columns (we are interested only
> in the columns on which indexes use) were ever changed in the WARM chain. We
> allow one change for every index column, but the second change will require a
> duplicate lookup. This is still quite an improvement, the cold updates may
> reduce by at least more than 50% already, but someone can argue that this does
> not handle the case where same index column is repeatedly updated.

Yes, I was thinking of that too.  We can use the LP_REDIRECT ctid's lp_len,
which is normally all zeros and has 15 unused bits, to record which of
the first 15 columns have had changed tuples pruned.  A simpler idea
would be to use just one bit to record whether WARM tuples have been
pruned, because if we prune only HOT tuples, we can still traverse the
chain.

A more serious concern is index creation.  If we already have a WARM
chain and create an index, we can't assume that all indexes will have
entries for every change in the WARM chain.  One fix would be for index
creation to walk WARM chains and add index entries for all changed
columns.  We don't need to do that now because we assume the index will
only be used by new transactions that can't see the old chain.  Another
idea would be to look at the index xmin and determine if the index was
recording tuples in the chain.

(FYI, in your idea above, I don't think you can track which _indexes_
had changes because index creation will need to know information on
column changes that weren't recorded at WARM chain creation time, hence
I am tracking column changes, not index changes.)

Another issue is that while index walking is usually fast, there are
pathological cases.  For example, if a column value is 80% of all rows,
the optimizer would use statistics to avoid using the index for that
constant, but if someone changes a WARM chain to use that value, we
would need to read 80% of the index to find if the key/ctid exists, and
in most cases it will not, so we have to read the entire thing.

We do have the fallback to create a non-HOT, non-WARM tuple for this
case for single-index changes.

> If we need to find an efficient way to convert WARM chains back to HOT, which
> will happen soon when the old index tuple retires, the system can attain a
> stable state, not for all but many use cases.

I don't see how that is possible, except perhaps by vacuum.

OK, now I am less certain.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Sat, Aug  6, 2016 at 10:51:21AM -0400, Bruce Momjian wrote:
> > If we need to find an efficient way to convert WARM chains back to HOT, which
> > will happen soon when the old index tuple retires, the system can attain a
> > stable state, not for all but many use cases.
> 
> I don't see how that is possible, except perhaps by vacuum.
> 
> OK, now I am less certain.

OK, crazy idea time --- what if we only do WARM chain additions when all
indexed values are increasing (with NULLs higher than all values)?  (If
a key is always-increasing, it can't match a previous value in the
chain.)  That avoids the problem of having to check the WARM chain,
except for the previous tuple, and the problem of pruning removing
changed rows.  It avoids having to check the index for matching key/ctid
values, and it prevents CREATE INDEX from having to index WARM chain
values.

Any decreasing value would cause a normal tuple to be created.

Sure, it is a limitation, but it removes almost all of the overhead
of WARM updates.  In fact, the key comparisons of old/new tuples
already return -1, 0, or 1, and from there you can know whether the new
key is unchanged or increasing.  I assume we already do this comparison
to determine if we can do a HOT update.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Sun, Aug  7, 2016 at 10:49:45AM -0400, Bruce Momjian wrote:
> OK, crazy idea time --- what if we only do WARM chain additions when all
> indexed values are increasing (with NULLs higher than all values)?  (If
> a key is always-increasing, it can't match a previous value in the
> chain.)  That avoids the problem of having to check the WARM chain,
> except for the previous tuple, and the problem of pruning removing
> changed rows.  It avoids having to check the index for matching key/ctid
> values, and it prevents CREATE INDEX from having to index WARM chain
> values.
> 
> Any decreasing value would cause a normal tuple be created.

Actually, when we add the first WARM tuple, we can mark the HOT/WARM
chain as either all-incrementing or all-decrementing.  We would need a
bit to indicate that.

Also, it would be possible for keys involved in multi-key indexes to not
match the direction of the chain as long as keys earlier in the index
matched, e.g. key (1,5,6) would be less than (2,1,1) since 1 < 2, even
though 5 > 1.  I am not sure it would be worth detecting this.




Re: Heap WARM Tuples - Design Draft

From
Amit Kapila
Date:
On Fri, Aug 5, 2016 at 9:57 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Fri, Aug 5, 2016 at 8:23 AM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> >
>> > OK, that's a lot of text, and I am confused.  Please tell me the
>> > advantage of having an index point to a string of HOT chains, rather
>> > than a single one?  Is the advantage that the index points into the
>> > middle of the HOT chain rather than at the beginning?  I can see some
>> > value to that, but having more ctid HOT headers that can't be removed
>> > except by VACUUM seems bad, plus the need for index rechecks as you
>> > cross to the next HOT chain seems bad.
>> >
>> > The value of WARM is to avoid index bloat --- right now we traverse the
>> > HOT chain on a single page just fine with no complaints about speed so I
>> > don't see a value to optimizing that traversal, and I think the key
>> > rechecks and ctid bloat will make it all much slower.
>> >
>> > It also seems much more complicated.
>>
>> The point is avoiding duplicate rows in the output of index scans.
>>
>> I don't think you can avoid it simply by applying index condition
>> rechecks as the original proposal implies, in this case:
>>
>> CREATE TABLE t (id integer not null primary key, someid integer, dat
>> integer);
>> CREATE INDEX i1 ON t (someid);
>>
>> INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);
>> UPDATE t SET dat = dat + 1 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 2 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
>> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
>> SELECT * FROM t WHERE someid = 2;
>>
>> That, I believe, will cause the problematic chains where the condition
>> recheck passes both times the last visible tuple is visited during the
>> scan. It will return more than one tuple when in reality there is only
>> one.
>
>
> Hmm. That seems like a real problem. The only way to address that is to
> ensure that for a given WARM chain and given index key, there must exists
> only a single index entry. There can and will be multiple entries if the
> index key changes, but if the index key changes to the original value (or
> any other value already in the WARM chain) again, we must not add another
> index entry. Now that does not seem like a very common scenario and skipping
> a duplicate index entry does have its own benefit, so may be its not a
> terrible idea to try that. You're right, it may be expensive to search for
> an existing matching index key for the same root line pointer. But if we
> could somehow short-circuit the more common case, it might still be worth
> trying.
>

I think the expensive part here would be the recheck for cases where the
index value is changed to a different value (a value which doesn't exist
in the WARM chain).  You anyway have to add the new (key, TID) entry to
the index, but traversing the WARM chain each time would be additional
effort.  For cases where there are just two index entries and one of
them is being updated, it might regress compared to what we do now.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Mon, Aug  8, 2016 at 06:34:46PM +0530, Amit Kapila wrote:
> I think here expensive part would be recheck for the cases where the
> index value is changed to a different value (value which doesn't exist
> in WARM chain).   You anyway have to add the new entry (key,TID) in
> index, but each time traversing the WARM chain would be an additional
> effort.  For cases, where there are just two index entries and one
> them is being updated, it might regress as compare to what we do now.

Yes, I think the all-increment or all-decrement WARM chain is going to
be the right approach.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Sun, Aug  7, 2016 at 12:55:01PM -0400, Bruce Momjian wrote:
> On Sun, Aug  7, 2016 at 10:49:45AM -0400, Bruce Momjian wrote:
> > OK, crazy idea time --- what if we only do WARM chain additions when all
> > indexed values are increasing (with NULLs higher than all values)?  (If
> > a key is always-increasing, it can't match a previous value in the
> > chain.)  That avoids the problem of having to check the WARM chain,
> > except for the previous tuple, and the problem of pruning removing
> > changed rows.  It avoids having to check the index for matching key/ctid
> > values, and it prevents CREATE INDEX from having to index WARM chain
> > values.
> > 
> > Any decreasing value would cause a normal tuple be created.
> 
> Actually, when we add the first WARM tuple, we can mark the HOT/WARM
> chain as either all-incrementing or all-decrementing.  We would need a
> bit to indicate that.

FYI, I see at least two available tuple header bits here, 0x0800 and
0x1000:

        /*
         * information stored in t_infomask2:
         */
        #define HEAP_NATTS_MASK         0x07FF  /* 11 bits for number of attributes */
        /* bits 0x1800 are available */
        #define HEAP_KEYS_UPDATED       0x2000  /* tuple was updated and key cols
                                                 * modified, or tuple deleted */
        #define HEAP_HOT_UPDATED        0x4000  /* tuple was HOT-updated */
        #define HEAP_ONLY_TUPLE         0x8000  /* this is heap-only tuple */

        #define HEAP2_XACT_MASK         0xE000  /* visibility-related bits */

My guess is we would need one bit to mark a WARM chain, and perhaps
reuse obsolete pre-9.0 HEAP_MOVED_OFF to indicate increment-only or
decrement-only.  These flags have to be passed to all forward tuples in
the chain so an addition to the chain knows quickly if it is WARM chain,
and which direction.

We can't use the bits LP_REDIRECT lp_len because we need to create WARM
chains before pruning, and I don't think walking the pre-pruned chain is
worth it.  (As I understand HOT, LP_REDIRECT is only created during
pruning.)




Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Mon, Aug 8, 2016 at 11:08 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Sun, Aug  7, 2016 at 12:55:01PM -0400, Bruce Momjian wrote:
> On Sun, Aug  7, 2016 at 10:49:45AM -0400, Bruce Momjian wrote:
> > OK, crazy idea time --- what if we only do WARM chain additions when all
> > indexed values are increasing (with NULLs higher than all values)?  (If
> > a key is always-increasing, it can't match a previous value in the
> > chain.)  That avoids the problem of having to check the WARM chain,
> > except for the previous tuple, and the problem of pruning removing
> > changed rows.  It avoids having to check the index for matching key/ctid
> > values, and it prevents CREATE INDEX from having to index WARM chain
> > values.
> >
> > Any decreasing value would cause a normal tuple be created.
>
> Actually, when we add the first WARM tuple, we can mark the HOT/WARM
> chain as either all-incrementing or all-decrementing.  We would need a
> bit to indicate that.

FYI, I see at least two available tuple header bits here, 0x0800 and
0x1000:

        /*
         * information stored in t_infomask2:
         */
        #define HEAP_NATTS_MASK         0x07FF  /* 11 bits for number of attributes */
        /* bits 0x1800 are available */
        #define HEAP_KEYS_UPDATED       0x2000  /* tuple was updated and key cols
                                                 * modified, or tuple deleted */
        #define HEAP_HOT_UPDATED        0x4000  /* tuple was HOT-updated */
        #define HEAP_ONLY_TUPLE         0x8000  /* this is heap-only tuple */

        #define HEAP2_XACT_MASK         0xE000  /* visibility-related bits */

What I am currently trying to do is to reuse at least the BlockNumber field in t_ctid. For HOT/WARM chains, that field is really unused (except the last tuple when regular update needs to store block number of the new block). My idea is to use one free bit in t_infomask2 to tell us that t_ctid is really not a CTID, but contains new information (for pg_upgrade's sake). For example, one bit in bi_hi can tell us that this is the last tuple in the chain (information today conveyed by t_ctid pointing to self). Another bit can tell us that this tuple was WARM updated. We will still have plenty of bits to store additional information about WARM chains.


My guess is we would need one bit to mark a WARM chain, and perhaps
reuse obsolete pre-9.0 HEAP_MOVED_OFF to indicate increment-only or
decrement-only. 

I am not convinced that checking for increment/decrement adds a lot of value. Sure, we might be able to address some typical workloads, but is that really a common use case? Instead, what I am looking at is storing a bitmap which shows us which table columns have changed so far in the WARM chain. We only have limited bits, so we can track only limited columns. This will help the cases where different columns are updated, but not so much if the same column is updated repeatedly.

What will help, and something I haven't yet applied any thought to, is whether we can turn WARM chains back to HOT by removing stale index entries.

Some heuristics and limits on amount of work done to detect duplicate index entries will help too.
 

We can't use the bits LP_REDIRECT lp_len because we need to create WARM
chains before pruning, and I don't think walking the pre-pruned chain is
worth it.  (As I understand HOT, LP_REDIRECT is only created during
pruning.)


That's correct. But lp_len provides us some place to stash information from heap tuples when they are pruned.
 
Thanks,
Pavan


Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Mon, Aug 8, 2016 at 2:52 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>
>> My guess is we would need one bit to mark a WARM chain, and perhaps
>> reuse obsolete pre-9.0 HEAP_MOVED_OFF to indicate increment-only or
>> decrement-only.
>
>
> I am not convinced that the checking for increment/decrement adds a lot of
> value. Sure, we might be able to address some typical work load, but is that
> really a common use case? Instead, what I am looking at storing a bitmap
> which shows us which table columns have changed so far in the WARM chain. We
> only have limited bits, so we can track only limited columns. This will help
> the cases where different columns are updated, but not so much if the same
> column is updated repeatedly.
>
> What will help, and something I haven't yet applied any thoughts, is when we
> can turn WARM chains back to HOT by removing stale index entries.
>
> Some heuristics and limits on amount of work done to detect duplicate index
> entries will help too.

I think I prefer a more thorough approach.

Increment/decrement may get very complicated with custom opclasses,
for instance. A column-change bitmap won't know how to handle
functional indexes, etc.

What I intend to try, is modify btree to allow efficient search of a
key-ctid pair, by adding the ctid to the sort order. Only inner pages
need to be changed, to include the ctid in the pointers, leaf pages
already have the ctid there, so they don't need any kind of change. A
bit in the metapage could indicate whether it's a new format or an old
one, and yes, only new indices will be able to use WARM.

But with efficient key-ctid searches, you handle all cases, and not
just a few common ones.

A lot of the work for the key-ctid search can be shared with the
update itself, so it's probably not such a big performance impact.



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Mon, Aug  8, 2016 at 11:22:49PM +0530, Pavan Deolasee wrote:
> What I am currently trying to do is to reuse at least the BlockNumber field in
> t_ctid. For HOT/WARM chains, that field is really unused (except the last tuple
> when regular update needs to store block number of the new block). My idea is
> to use one free bit in t_infomask2 to tell us that t_ctid is really not a CTID,
> but contains new information (for pg_upgrade's sake). For example, one bit in
> bi_hi can tell us that this is the last tuple in the chain (information today
> conveyed by t_ctid pointing to self). Another bit can tell us that this tuple
> was WARM updated. We will still have plenty of bits to store additional
> information about WARM chains.

Yes, that works too, though it probably only makes sense if we can make
use of more than one bit in bi_hi.

>     My guess is we would need one bit to mark a WARM chain, and perhaps
>     reuse obsolete pre-9.0 HEAP_MOVED_OFF to indicate increment-only or
>     decrement-only. 
> 
> 
> I am not convinced that the checking for increment/decrement adds a lot of
> value. Sure, we might be able to address some typical work load, but is that
> really a common use case? Instead, what I am looking at storing a bitmap which

I have no idea, but it guarantees that the first WARM update works
because there is no direction set.  Then, if the direction changes, you
create a new chain and hope the changes stay in that direction for a
while.

> shows us which table columns have changed so far in the WARM chain. We only
> have limited bits, so we can track only limited columns. This will help the
> cases where different columns are updated, but not so much if the same column
> is updated repeatedly.

Well, I don't think in 15 bits we have enough space to store many column
numbers, let alone column numbers and _values_.  You would need four
bits (1-16) to exceed what you can store in a simple bitmap of the first
15 columns.  If you want to extend that range, you can use 8 bits (2^8)
to record one of the first 256 column numbers.  You could do 7 bits (2^7
= 128) and use another bit per column to record the increment/decrement
direction, meaning that repeated changes to the same column in the same
direction would be allowed in the same WARM chain.  I think it is more
likely that the same column is going to be changed in the same WARM
chain, than changes in different columns.

Frankly, with only 16 bits, I can't see how recording specific columns
really buys us much because we have to limit the column number storage.
Plus, if a column changes twice, you need to create a new WARM chain
unless you record the increment/decrement direction.

What we could do is to record the first two changed columns in the
16-bit field, with direction, then record a bit for direction of all
columns not in the first two that change.  That allows you to record
three sets of directions in the same HOT chain.  It does not allow you
to change the direction of any column previously recorded in the WARM
chain.

You could say you are going to scan the WARM chain for changes, but that
limits pruning.  You could try storing just the changes for pruned rows,
but then you are going to have a lot of overhead scanning the WARM chain
looking for changes.

It would be interesting to store the change _direction_ for the first 15
columns in the bitmap, and then use the 16th bit for the rest of the
columns, but I can't figure out how to record which bits are set with a
direction and which are the default/unused.  You really need two bits
per column, so that only records the first seven or eight columns.

> What will help, and something I haven't yet applied any thoughts, is when we
> can turn WARM chains back to HOT by removing stale index entries.

I can't see how we can ever do that because we have multiple indexes
pointing to the chain, and keys that might be duplicated if we switched
to HOT.  Seems only VACUUM can fix that.

> Some heuristics and limits on amount of work done to detect duplicate index
> entries will help too.

Yeah, I have kind of given up on that.

> > We can't use the bits LP_REDIRECT lp_len because we need to create WARM
> > chains before pruning, and I don't think walking the pre-pruned chain is
> > worth it.  (As I understand HOT, LP_REDIRECT is only created during
> > pruning.)
> 
> That's correct. But lp_len provides us some place to stash information from
> heap tuples when they are pruned.

Right.  However, I see storing information at prune time as only useful
if you are willing to scan the chain, and frankly, I have given up on
chain scanning (with column comparisons) as being too expensive for 
its limited value.  What we could do is to use the LP_REDIRECT lp_len to
store information of two more changed column numbers, with directions. 
However, once you store one bit that records the direction of all other
columns, you can't go back and record the changes, unless you do a chain
scan at prune time.

You have to wonder how much complexity is reasonable for this.




Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Mon, Aug  8, 2016 at 03:36:12PM -0300, Claudio Freire wrote:
> I think I prefer a more thorough approach.
> 
> Increment/decrement may get very complicated with custom opclasses,
> for instance. A column-change bitmap won't know how to handle
> funcional indexes, etc.

How does HOT handle them?  If it does, it has to look at all the columns
passed to the expression index, so it seems to be the same amount of
work.

> What I intend to try, is modify btree to allow efficient search of a
> key-ctid pair, by adding the ctid to the sort order. Only inner pages
> need to be changed, to include the ctid in the pointers, leaf pages
> already have the ctid there, so they don't need any kind of change. A
> bit in the metapage could indicate whether it's a new format or an old
> one, and yes, only new indices will be able to use WARM.
> 
> But with efficient key-ctid searches, you handle all cases, and not
> just a few common ones.

True.  I am worried about page splits caused by having to insert a row
into an existing page and an index entry having to be pushed to an
adjacent page to maintain ctid index order.




Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Mon, Aug 8, 2016 at 5:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug  8, 2016 at 03:36:12PM -0300, Claudio Freire wrote:
>> I think I prefer a more thorough approach.
>>
>> Increment/decrement may get very complicated with custom opclasses,
>> for instance. A column-change bitmap won't know how to handle
>> funcional indexes, etc.
>
> How does HOT handle them?  If it does, it has to look at all the columns
> passed to the expression index, so it seems to be the same amount of
> work.

The way HOT handles it IIRC induces false positives (as in: HOT is
forbidden when it might be OK) because it may flag as changed
expressions that don't actually change. Think date_trunc('day',
timestamp): if the timestamp changes within a day, there's no change in
reality, but the column changed, so HOT is forbidden. But the logic
won't create false negatives (allow HOT when it's not allowed).

But for WARM that might be the case, because the guarantee that you
need is that no key already present in the WARM chain will be
re-added. An index over (a - b) could have both a and b in increasing
order yet repeat keys and violate the WARM chain invariant.

>> What I intend to try, is modify btree to allow efficient search of a
>> key-ctid pair, by adding the ctid to the sort order. Only inner pages
>> need to be changed, to include the ctid in the pointers, leaf pages
>> already have the ctid there, so they don't need any kind of change. A
>> bit in the metapage could indicate whether it's a new format or an old
>> one, and yes, only new indices will be able to use WARM.
>>
>> But with efficient key-ctid searches, you handle all cases, and not
>> just a few common ones.
>
> True.  I am worried about page spills caused by having to insert a rows
> into an existing page and and index entry having to be pushed to an
> adjacent page to maintain ctid index order.

I don't think it would be a concern, just as inserting a serial column isn't.

Maybe some pages will split that wouldn't have, but the split will add
room to the new leaf pages, and some pages that would have split before
won't split afterwards.

Think of it as equivalent to adding the oid to the index - it's some
immutable attribute of the row being inserted, the fact that it is a
tid shouldn't make a difference to the performance of the btree, aside
from the extra comparisons perhaps.



Re: Heap WARM Tuples - Design Draft

From
Jim Nasby
Date:
On 8/8/16 3:19 PM, Bruce Momjian wrote:
>> What will help, and something I haven't yet applied any thoughts, is when we
>> > can turn WARM chains back to HOT by removing stale index entries.
> I can't see how we can ever do that because we have multiple indexes
> pointing to the chain, and keys that might be duplicated if we switched
> to HOT.  Seems only VACUUM can fix that.

Are these changes still predicated on being able to re-find all index 
entries by key value? If so, that makes incremental vacuums practical, 
perhaps eliminating a lot of these issues.

>>> > > We can't use the bits LP_REDIRECT lp_len because we need to create WARM
>>> > > chains before pruning, and I don't think walking the pre-pruned chain is
>>> > > worth it.  (As I understand HOT, LP_REDIRECT is only created during
>>> > > pruning.)
>> >
>> > That's correct. But lp_len provides us some place to stash information from
>> > heap tuples when they are pruned.
> Right.  However, I see storing information at prune time as only useful
> if you are willing to scan the chain, and frankly, I have given up on
> chain scanning (with column comparisons) as being too expensive for
> its limited value.

What if some of this work happened asynchronously? I'm thinking 
something that runs through shared_buffers in front of bgwriter.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Tue, Aug 9, 2016 at 8:19 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 8/8/16 3:19 PM, Bruce Momjian wrote:
>>>
>>> What will help, and something I haven't yet applied any thoughts, is when
>>> we can turn WARM chains back to HOT by removing stale index entries.
>>
>> I can't see how we can ever do that because we have multiple indexes
>> pointing to the chain, and keys that might be duplicated if we switched
>> to HOT.  Seems only VACUUM can fix that.
>
>
> Are these changes still predicated on being able to re-find all index
> entries by key value? If so, that makes incremental vacuums practical,
> perhaps eliminating a lot of these issues.
>
>>>> We can't use the bits LP_REDIRECT lp_len because we need to create WARM
>>>> chains before pruning, and I don't think walking the pre-pruned chain is
>>>> worth it.  (As I understand HOT, LP_REDIRECT is only created during
>>>> pruning.)
>>>
>>> That's correct. But lp_len provides us some place to stash information
>>> from heap tuples when they are pruned.
>>
>> Right.  However, I see storing information at prune time as only useful
>> if you are willing to scan the chain, and frankly, I have given up on
>> chain scanning (with column comparisons) as being too expensive for
>> its limited value.
>
>
> What if some of this work happened asynchronously? I'm thinking something
> that runs through shared_buffers in front of bgwriter.

If one can find key-ctid pairs efficiently in the index, this can be
done during WARM pruning (ie: during updates, when we're already doing
the index lookups anyway).

Suppose you have the following chain:

index   0   1   2   3   4
k1      a   a   a   a   a
k2      a   a   b   a   a
i1      ^
i2      ^       ^
hot         *       *

If versions 0-2 die, pruning can free 1 (it's HOT), and leave
redirects in 0 and 2:

index   0   1   2   3   4
k1      a   .   a   a   a
k2      a   .   b   a   a
i1      ^
i2      ^       ^
lp      r2  u   r3
hot                 *

Since we can lookup all occurrences of k1=a index=0 and k2=a index=0,
and in fact we probably did so already as part of the update logic, we
can remove the first redirect by pointing indexes to 2:

index   0   1   2   3   4
k1      .   .   a   a   a
k2      .   .   b   a   a
i1      ^
i2      ^       ^
lp      u   u   r3
hot                 *

So WARM pruning would have to happen just prior to adding a WARM tuple
to be able to reuse all the index lookup work to do the index pruning
with the writing.

The indexam interface will get considerably more complex in order to
do this, however. So perhaps it would be ok to do 2 independent
lookups instead? I'm undecided yet.

But one way or the other, pruning can happen early and incrementally.
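The redirect collapse above can be sketched roughly like this (illustrative
Python with made-up structures, not the actual heap or indexam code):

```python
# Illustrative sketch of collapsing a redirect line pointer once every index
# entry pointing at it can be re-found by (key, ctid). Structures are made up:
# line pointers are ("redirect", target), ("tuple", data) or ("unused", None);
# each index is a dict mapping key -> line pointer slot.

def collapse_redirect(line_pointers, indexes, slot):
    kind, target = line_pointers[slot]
    if kind != "redirect":
        return
    for index in indexes:
        for key, s in list(index.items()):
            if s == slot:
                index[key] = target  # repoint the index entry past the redirect
    line_pointers[slot] = ("unused", None)  # the slot can now be reused

line_pointers = {0: ("redirect", 2), 1: ("unused", None),
                 2: ("redirect", 3), 3: ("tuple", "latest version")}
i1 = {"a": 0}
i2 = {"a": 0, "b": 2}
collapse_redirect(line_pointers, [i1, i2], 0)
# Both indexes now point their key "a" entries at slot 2, and slot 0 is freed.
```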



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Tue, Aug 9, 2016 at 8:44 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> index   0   1   2   3   4
> k1      .   .   a   a   a
> k2      .   .   b   a   a
> i1      ^
> i2      ^       ^
> lp      u   u   r3
> hot                 *


Sorry, that should read:

index   0   1   2   3   4
k1      .   .   a   a   a
k2      .   .   b   a   a
i1              ^
i2              ^
lp      u   u   r3
hot                 *



Re: Heap WARM Tuples - Design Draft

From
Bruce Momjian
Date:
On Tue, Aug  9, 2016 at 06:19:57PM -0500, Jim Nasby wrote:
> On 8/8/16 3:19 PM, Bruce Momjian wrote:
> >> What will help, and something I haven't yet applied any thoughts, is when we
> >> can turn WARM chains back to HOT by removing stale index entries.
> >
> > I can't see how we can ever do that because we have multiple indexes
> > pointing to the chain, and keys that might be duplicated if we switched
> > to HOT.  Seems only VACUUM can fix that.
> 
> Are these changes still predicated on being able to re-find all index
> entries by key value? If so, that makes incremental vacuums practical,
> perhaps eliminating a lot of these issues.

No, the increment/decrement doesn't require btree lookups.

> >>> We can't use the bits LP_REDIRECT lp_len because we need to create WARM
> >>> chains before pruning, and I don't think walking the pre-pruned chain is
> >>> worth it.  (As I understand HOT, LP_REDIRECT is only created during
> >>> pruning.)
> >>
> >> That's correct. But lp_len provides us some place to stash information from
> >> heap tuples when they are pruned.
> >
> > Right.  However, I see storing information at prune time as only useful
> > if you are willing to scan the chain, and frankly, I have given up on
> > chain scanning (with column comparisons) as being too expensive for
> > its limited value.
> 
> What if some of this work happened asynchronously? I'm thinking something
> that runs through shared_buffers in front of bgwriter.

Well, we prune when we need the space --- that is the big value of page
pruning: it is fast and can be done when you need it.  VACUUM is all
about background cleanup.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Heap WARM Tuples - Design Draft

From
Jim Nasby
Date:
On 8/9/16 6:46 PM, Claudio Freire wrote:
> On Tue, Aug 9, 2016 at 8:44 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> index  0   1   2   3   4
>> k1      .    .   a   a   a
>> k2      .    .   b   a   a
>> i1       ^
>> i2       ^         ^
>> lp       u   u   r3
>> hot                    *
>
>
> Sorry, that should read:
>
> index  0   1   2   3   4
> k1      .    .   a   a   a
> k2      .    .   b   a   a
> i1                 ^
> i2                 ^
> lp       u   u   r3
> hot                    *
>

FWIW, your ascii-art is unfortunately getting bungled somewhere I think :/.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Heap WARM Tuples - Design Draft

From
Jim Nasby
Date:
On 8/9/16 6:44 PM, Claudio Freire wrote:
> Since we can lookup all occurrences of k1=a index=0 and k2=a index=0,
> and in fact we probably did so already as part of the update logic

That's a change from what currently happens, right?

The reason I think that's important is that dropping the assumption that 
we can't safely re-find index entries from the heap opens up other 
optimizations, ones that should be significantly simpler to implement. 
The most obvious example being getting rid of full index scans in 
vacuum. While that won't help with write amplification, it would reduce 
the cost of vacuum enormously. Orders of magnitude wouldn't surprise me 
in the least.

If that's indeed a prerequisite to WARM it would be great to get that 
groundwork laid early so others could work on other optimizations it 
would enable.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Heap WARM Tuples - Design Draft

From
Pavan Deolasee
Date:


On Tue, Aug 9, 2016 at 12:06 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Mon, Aug 8, 2016 at 2:52 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

> Some heuristics and limits on amount of work done to detect duplicate index
> entries will help too.

I think I prefer a more thorough approach.

Increment/decrement may get very complicated with custom opclasses,
for instance. A column-change bitmap won't know how to handle
functional indexes, etc.

What I intend to try, is modify btree to allow efficient search of a
key-ctid pair, by adding the ctid to the sort order.

Yes, that's a good medium term plan. And this is kind of independent from the core idea. So I'll go ahead and write a patch that implements the core idea and some basic optimizations. We can then try different approaches such as tracking changed columns, tracking increment/decrement and teaching btree to skip duplicate (key, CTID) entries.
 
Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Tue, Aug 9, 2016 at 11:39 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 8/9/16 6:44 PM, Claudio Freire wrote:
>>
>> Since we can lookup all occurrences of k1=a index=0 and k2=a index=0,
>> and in fact we probably did so already as part of the update logic
>
>
> That's a change from what currently happens, right?
>
> The reason I think that's important is that dropping the assumption that we
> can't safely re-find index entries from the heap opens up other
> optimizations, ones that should be significantly simpler to implement. The
> most obvious example being getting rid of full index scans in vacuum. While
> that won't help with write amplification, it would reduce the cost of vacuum
> enormously. Orders of magnitude wouldn't surprise me in the least.
>
> If that's indeed a prerequisite to WARM it would be great to get that
> groundwork laid early so others could work on other optimizations it would
> enable.

I can do that. I've been prospecting the code to see what changes it
would entail already.

But it's still specific to btree, I'm not sure the same optimizations
can be applied to GIN (maybe, if the posting list is sorted) or GIST
(probably, since it's like a btree, but I don't know the code well
enough).

Certainly hash indexes won't support it.



Re: Heap WARM Tuples - Design Draft

From
Simon Riggs
Date:
On 10 August 2016 at 03:45, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Tue, Aug 9, 2016 at 12:06 AM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Mon, Aug 8, 2016 at 2:52 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
>> wrote:
>>
>> > Some heuristics and limits on amount of work done to detect duplicate
>> > index
>> > entries will help too.
>>
>> I think I prefer a more thorough approach.
>>
>> Increment/decrement may get very complicated with custom opclasses,
>> for instance. A column-change bitmap won't know how to handle
>> functional indexes, etc.
>>
>> What I intend to try, is modify btree to allow efficient search of a
>> key-ctid pair, by adding the ctid to the sort order.
>
>
> Yes, that's a good medium term plan. And this is kind of independent from
> the core idea.

+1 That seems like a good idea. It would allow us to produce a bitmap
scan in block-sorted order.

> So I'll go ahead and write a patch that implements the core
> idea and some basic optimizations.

+1

> We can then try different approaches such
> as tracking changed columns, tracking increment/decrement and teaching btree
> to skip duplicate (key, CTID) entries.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Wed, Aug 10, 2016 at 4:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 10 August 2016 at 03:45, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>
>>
>> On Tue, Aug 9, 2016 at 12:06 AM, Claudio Freire <klaussfreire@gmail.com>
>> wrote:
>>>
>>> On Mon, Aug 8, 2016 at 2:52 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
>>> wrote:
>>>
>>> > Some heuristics and limits on amount of work done to detect duplicate
>>> > index
>>> > entries will help too.
>>>
>>> I think I prefer a more thorough approach.
>>>
>>> Increment/decrement may get very complicated with custom opclasses,
>>> for instance. A column-change bitmap won't know how to handle
>>> functional indexes, etc.
>>>
>>> What I intend to try, is modify btree to allow efficient search of a
>>> key-ctid pair, by adding the ctid to the sort order.
>>
>>
>> Yes, that's a good medium term plan. And this is kind of independent from
>> the core idea.
>
> +1 That seems like a good idea. It would allow us to produce a bitmap
> scan in block-sorted order.

Well, not really. Only equal-key runs will be block-sorted, since the
sort order would be (key, block, offset).
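A quick illustration of that point (Python, purely illustrative):

```python
# Purely illustrative: with sort order (key, block, offset), blocks come out
# sorted only within runs of equal keys, not across the whole index scan.

entries = [("a", 9, 1), ("b", 2, 1), ("a", 3, 2), ("b", 8, 1)]
entries.sort()  # lexicographic: (key, block, offset)

scan_blocks = [block for _key, block, _off in entries]
# Full scan order of blocks is [3, 9, 2, 8] -- not globally sorted...
for key in ("a", "b"):
    run = [block for k, block, _off in entries if k == key]
    assert run == sorted(run)  # ...but each equal-key run is block-sorted.
```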



Re: Heap WARM Tuples - Design Draft

From
Amit Kapila
Date:
On Mon, Aug 8, 2016 at 9:56 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug  8, 2016 at 06:34:46PM +0530, Amit Kapila wrote:
>> I think here expensive part would be recheck for the cases where the
>> index value is changed to a different value (value which doesn't exist
>> in WARM chain).   You anyway have to add the new entry (key,TID) in
>> index, but each time traversing the WARM chain would be an additional
>> effort.  For cases, where there are just two index entries and one
>> them is being updated, it might regress as compare to what we do now.
>
> Yes, I think the all-increment or all-decrement WARM chain is going to
> be the right approach.
>

Probably, it will mitigate the cost in some cases, but there will still
be a cost in comparisons, especially when the index is on char/varchar
columns.  Also, I think it will reduce the chance of performing a WARM
update in certain cases, considering we can create a WARM tuple only
when it follows the order.  OTOH, if we store pointers in the index to
intermediate tuples, we won't face such issues.  Yes, there will be
some delay in cleanup of intermediate line pointers; however, I think
we can clear those once we mark the corresponding index tuples as dead
in the next scan on the corresponding index.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Heap WARM Tuples - Design Draft

From
Jim Nasby
Date:
On 8/10/16 12:48 PM, Claudio Freire wrote:
> On Tue, Aug 9, 2016 at 11:39 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 8/9/16 6:44 PM, Claudio Freire wrote:
>>>
>>> Since we can lookup all occurrences of k1=a index=0 and k2=a index=0,
>>> and in fact we probably did so already as part of the update logic
>>
>>
>> That's a change from what currently happens, right?
>>
>> The reason I think that's important is that dropping the assumption that we
>> can't safely re-find index entries from the heap opens up other
>> optimizations, ones that should be significantly simpler to implement. The
>> most obvious example being getting rid of full index scans in vacuum. While
>> that won't help with write amplification, it would reduce the cost of vacuum
>> enormously. Orders of magnitude wouldn't surprise me in the least.
>>
>> If that's indeed a prerequisite to WARM it would be great to get that
>> groundwork laid early so others could work on other optimizations it would
>> enable.
>
> I can do that. I've been prospecting the code to see what changes it
> would entail already.
>
> But it's still specific to btree, I'm not sure the same optimizations
> can be applied to GIN (maybe, if the posting list is sorted) or GIST
> (probably, since it's like a btree, but I don't know the code well
> enough).
>
> Certainly hash indexes won't support it.

Why not? If this is all predicated on re-finding index keys based on 
heap data then this is just another index lookup, no?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Heap WARM Tuples - Design Draft

From
Claudio Freire
Date:
On Thu, Aug 11, 2016 at 11:07 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 8/10/16 12:48 PM, Claudio Freire wrote:
>>
>> On Tue, Aug 9, 2016 at 11:39 PM, Jim Nasby <Jim.Nasby@bluetreble.com>
>> wrote:
>>>
>>> On 8/9/16 6:44 PM, Claudio Freire wrote:
>>>>
>>>>
>>>> Since we can lookup all occurrences of k1=a index=0 and k2=a index=0,
>>>> and in fact we probably did so already as part of the update logic
>>>
>>>
>>>
>>> That's a change from what currently happens, right?
>>>
>>> The reason I think that's important is that dropping the assumption that
>>> we
>>> can't safely re-find index entries from the heap opens up other
>>> optimizations, ones that should be significantly simpler to implement.
>>> The
>>> most obvious example being getting rid of full index scans in vacuum.
>>> While
>>> that won't help with write amplification, it would reduce the cost of
>>> vacuum
>>> enormously. Orders of magnitude wouldn't surprise me in the least.
>>>
>>> If that's indeed a prerequisite to WARM it would be great to get that
>>> groundwork laid early so others could work on other optimizations it
>>> would
>>> enable.
>>
>>
>> I can do that. I've been prospecting the code to see what changes it
>> would entail already.
>>
>> But it's still specific to btree, I'm not sure the same optimizations
>> can be applied to GIN (maybe, if the posting list is sorted) or GIST
>> (probably, since it's like a btree, but I don't know the code well
>> enough).
>>
>> Certainly hash indexes won't support it.
>
>
> Why not? If this is all predicated on re-finding index keys based on heap
> data then this is just another index lookup, no?

A lookup on a hash index cannot be made to work for both key-only
lookups and key-ctid lookups; it's a limitation of the data structure.

A key-only lookup can potentially return too many results that don't
match the ctid, so a walk of all equal-key item pointers is out of the
question.
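To illustrate the limitation (Python, purely illustrative; a toy hash
table standing in for the hash AM, with made-up names):

```python
# Toy hash index, purely illustrative. Hashing the key alone means a probe
# lands on the whole bucket of equal-key entries; the ctid can only be
# filtered out afterwards by walking all of them.

buckets = {}

def hash_insert(key, ctid):
    buckets.setdefault(hash(key), []).append((key, ctid))

def probe(key, ctid):
    bucket = buckets.get(hash(key), [])
    # No way to descend by ctid: every equal-key entry must be inspected.
    matches = [(k, t) for k, t in bucket if k == key and t == ctid]
    return matches, len(bucket)

for ctid in [(1, 1), (2, 4), (7, 3)]:
    hash_insert("a", ctid)

matches, inspected = probe("a", (2, 4))
# One match, but all three equal-key entries had to be walked to find it.
```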