Re: HOT latest patch - version 8 - Mailing list pgsql-patches

From Gregory Stark
Subject Re: HOT latest patch - version 8
Date
Msg-id 87d4y8y6kj.fsf@oxford.xeocode.com
Whole thread Raw
In response to Re: HOT latest patch - version 8  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-patches

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> CREATE INDEX
> CREATE INDEX CONCURRENTLY
> -------------------------
> I'm not very familiar with how these, so I'll just shut up..

Here is an addendum for how HOT handles CREATE INDEX and CREATE INDEX
CONCURRENTLY.

CREATE INDEX

There are two strategies for CREATE INDEX depending on whether we're doing it
concurrently or not. For a regular locking create index we take the two
precautions:

1) When indexing a table we must take care that the index contains pointers
   only to Root tuples. These are the non-HOT tuples which everyone else
   expects to have index pointers.

   However the key value we want to index is actually the key at the *end* of
   the chain, ie, the most recent tuple on the index chain. Note that this
   means we could create broken HOT chains because the index key of the new
   index could have changed in the HOT chain.

   If we find any HOT-updated tuple with is RECENTLY_DEAD or
   DELETE_IN_PROGRESS we ignore it assuming that we'll also come across the
   *end* of the update chain and index that instead.

2) If we ignore any HOT-updated tuples above then we mark the index as
   unusable by older transactions for which those tuples might be visible.
   This ensures that nobody for whom the older tuples in a broken HOT chain
   are visible will use the index.

   We do this by putting our transaction-id in pg_index.indcreatexid after
   building the index. Queries check whether that indcreatexid is in their
   serializable snapshot, if it isn't then the index is not usable for that
   query.

   This means that transactions started before the create index commits will
   not get the benefit of the new index even if they first scan the table
   only after the index exists. However new transactions get the benefit of
   the new index immediately but they will always follow the HOT update
   chain since the old tuples with the possibly different keys will never be
   visible to them.

   The tricky case arises with queries executed in the same transaction as
   the CREATE INDEX. In the case of a new table created within the same
   transaction such as with pg_dump -1 the index will always be usable
   because there will never be any HOT update chains so the indcreatexid
   will never be set. Also in the case of a read-committed transaction new
   queries will always be able to use the index. Only in the case of a
   serializable transaction building an index on an existing table with HOT
   updates in it will the index go unused.


CREATE INDEX CONCURRENTLY

In the concurrent case we take a different approach.

1) we create the pg_index entry immediately, before we scan the table. The
   pg_index is marked as "not ready for inserts". Then we commit and wait for
   any transactions which have the table open to finish. This ensures that no
   *new* HOT updates will change the key value for our new index.

2) We wait out any transactions which had the table open. Then we build the
   index with a snapshot. Because we waited out anyone who might have been
   around before the index was created any tuples seen in the snapshot will
   have only valid HOT chains following them. They may still have older HOT
   updates behind them which are broken. As above we point the index pointers
   at the Root of the HOT-update chain but we use the key from the end of the
   chain.

4) We mark the index open for inserts (but still not ready for reads) then we
   again wait out any transactions which had the table open. Then we take a
   second reference snapshot and validate the index. This searches for tuples
   missing from the index -- but it again has to look bup the root of the HOT
   chains and search for those tuples in the index.

4) Then we wait until *every* transaction in progress in the validate_index
   reference snapshot is finished. This ensures that nobody is alive any
   longer who could possibly see any of the tuples in a broken HOT chain.

Glossary:

Broken HOT Chain
 .     A HOT chain in which the key value for an index has changed. This is
 not allowed to occur normally but if a new index is created it can happen. In
 that case various strategies are used to ensure that no transaction for which
 the older tuples are visible can use the index.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: tsearch core path, v0.58
Next
From: Tom Lane
Date:
Subject: Re: Export user visible function to make use of convert_to_scalar