Thread: Question: pg_class attributes and race conditions ?

Question: pg_class attributes and race conditions ?

From
"Pavan Deolasee"
Date:

What is the safest way to access/modify the pg_class attribute
and still avoid any race conditions with the other backends ?

A specific example is: To solve the CREATE INDEX problem with
HOT, I am thinking of adding (along with other things) a pg_class
boolean attribute, say hot_update_enable. All backends are
supposed to check this attribute before they perform an UPDATE.
The attribute would usually be available in relation->rd_rel

My understanding is that the backend which sets this attribute
must first acquire a lock on the heap relation of sufficient
strength so as to ensure that there are no concurrent UPDATErs,
update the pg_class row and then release the lock on the relation.
This would ensure that no backend has a stale "Relation"
pointer with stale value of hot_update_enable.

Also, should I use heap_inplace_update() rather than
simple_heap_update() because I want other backends to see the
change immediately irrespective of their snapshot ?

Is this a fair analysis ? Are there any rules I must follow
to avoid any deadlock and race conditions. I know we should
not be requesting a higher grade lock while holding a
lower grade lock, but are there any other restrictions/best
practices ?

Thanks,
Pavan

-- 


EnterpriseDB        http://www.enterprisedb.com



Re: Question: pg_class attributes and race conditions ?

From
Alvaro Herrera
Date:
Pavan Deolasee wrote:
> 
> 
> What is the safest way to access/modify the pg_class attribute
> and still avoid any race conditions with the other backends ?
> 
> A specific example is: To solve the CREATE INDEX problem with
> HOT, I am thinking of adding (along with other things) a pg_class
> boolean attribute, say hot_update_enable. All backends are
> supposed to check this attribute before they perform an UPDATE.
> The attribute would usually be available in relation->rd_rel
> 
> My understanding is that the backend which sets this attribute
> must first acquire a lock on the heap relation of sufficient
> strength so as to ensure that there are no concurrent UPDATErs,
> update the pg_class row and then release the lock on the relation.
> This would ensure that no backend has a stale "Relation"
> pointer with stale value of hot_update_enable.

FWIW this is pretty much the same I wanted to do with setting
relfrozenxid to FrozenTransactionId.  To this end I wrote a patch to add
a catalog pg_ntclass (later renamed to pg_class_nt), which was
ultimately rejected for reasons I don't remember at the time.  Maybe it
would be illuminating to investigate that -- please see the archives.

(I still think it would be good to have a pg_class_nt catalog, so it's
not a dead idea).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Question: pg_class attributes and race conditions ?

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:
> My understanding is that the backend which sets this attribute
> must first acquire a lock on the heap relation of sufficient
> strength so as to ensure that there are no concurrent UPDATErs,
> update the pg_class row and then release the lock on the relation.

In what context are you proposing to do that, and won't this
high-strength lock in itself lead to deadlocks?

The whole thing sounds exceedingly ugly anyway --- for example
what happens if the backend doing the CREATE INDEX fails and
is therefore unable to clear the flag again?
        regards, tom lane


Re: Question: pg_class attributes and race conditions ?

From
"Pavan Deolasee"
Date:
Tom Lane wrote:>> In what context are you proposing to do that, and won't this> high-strength lock in itself lead to
deadlocks?>>The whole thing sounds exceedingly ugly anyway --- for example> what happens if the backend doing the
CREATEINDEX fails and> is therefore unable to clear the flag again?>
 

Let me state the problem and a vague solution I am thinking of.
I would appreciate comments and suggestions.

The major known issue left with HOT is support for
CREATE INDEX and CREATE INDEX CONCURRENTLY. The
problem is with HEAP_ONLY tuples in the heap which do not have index
entries in the existing indexes. When we build a new index, some or all
of the HEAP_ONLY tuples may need index entries in the new index.
It would be very ugly if we try to keep the existing indexes
without index entries for those tuples. A clean solution
would be to add index entries for the HEAP_ONLY tuples in
the existing indexes and break all the HOT-chains.

I would leave the details, but rather explain what I have in
mind at high level. Any help to fill in the details or any
suggestions to do things differently would immensely help.

This is what I have in mind:

In the context of CREATE INDEX [CONCURRENTLY],

We first disable HOT-updates on the table. This would ensure
that no new HOT tuples are added while we CHILL the heap.
(How do we do this ?)

We then start scanning the heap and start building the new
index. If a HEAP_ONLY tuple is found which needs to be
indexed, we mark the tuple with a CHILL_IN_PROGRESS flag
and insert index entries into all the existing indexes.
(The buffer is exclusively locked and the operation is WAL
logged).

We do this until entire heap is scanned. At this point, we
would have inserted missing index entries for the HEAP_ONLY
tuples. Till this point, we don't use the direct index
entries to fetch the HEAP_ONLY tuples to avoid duplicate
fetches of the same tuple.

We now wait for all the concurrent index scans to end and
then disable HOT-chain following logic to fetch tuples.
(How do we do this ?)

At this point, all index scans would ONLY use the direct
path from the index to fetch tuples. The HOT-chains are
not followed to avoid duplicate fetches of the same tuple.

A second pass over the heap is now required to clear the
CHILL_IN_PROGRESS, HEAP_ONLY and HEAP_HOT_UPDATED flags.

At the end of this step, all the indexes and the table are
in sync. Once again we need to ensure that there are no
concurrent index scans in progress and then enable HOT-fetch.
Also, HOT-updates can be turned on.

If CREATE INDEX crashes, VACUUM is required to clear the
CHILL_IN_PROGRESS flags and the corresponding index entries
are removed. Since VACUUM runs mutually exclusive to CREATE
INDEX, we don't need any special mechanism to handle race
conditions between them.

There are some other details like running multiple CREATE
INDEX in parallel and still be able to CHILL the table
safely. May be one of them needs to act as the chiller
and others wait for it finish successfully.

Any thoughts on the overall approach ? Any suggestions to
simplify things or any alternate designs ? Can something
as simple as CHILLing the table holding VACUUM FULL
strength lock be acceptable ?


Thanks,
Pavan

-- 


EnterpriseDB        http://www.enterprisedb.com



Re: Question: pg_class attributes and race conditions ?

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:
> Any thoughts on the overall approach ?

Fragile and full of race conditions :-(.  I thought from the beginning
that CREATE INDEX might be a showstopper for the whole HOT concept,
and it's starting to look like that's the case.

I think what we need to get away from is the assumption that HOT-ness
for one index is the same as HOT-ness for all.  What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?
        regards, tom lane


Re: Question: pg_class attributes and race conditions ?

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:
>> Any thoughts on the overall approach ?
> 
> Fragile and full of race conditions :-(.  I thought from the beginning
> that CREATE INDEX might be a showstopper for the whole HOT concept,
> and it's starting to look like that's the case.
> 
> I think what we need to get away from is the assumption that HOT-ness
> for one index is the same as HOT-ness for all.  What if we only applied
> HOT to primary-key indexes, so that there was certainly not more than
> one index per table that the property applies to?

Just to throw my two bits in here :). If we do that, how does that
effect the idea that most people in the web world use (id serial primary
key), even though that is not what they are searching on?

More specifically, does HOT help conditions where a composite comes into
play (session_id,last_active) ... which would be a more heavily updated
index than just the primary key.

Sincerely,

Joshua D. Drake


> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Question: pg_class attributes and race conditions ?

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> What if we only applied
> HOT to primary-key indexes, so that there was certainly not more than
> one index per table that the property applies to?

The main objective of HOT is to enable retail vacuum of HOT-updated 
tuples. Doing the above would make it useless for that purpose, at least 
when there's more than one index on the table. Granted, there's a lot of 
tables with just one index out there, but it's a big limitation 
nevertheless.

An extension of that idea, though is to store a flag per index in the 
HOT-updated tuple. We would then need a mapping between bits in the 
tuple header to indexes, for example as a new column in pg_index.

Let's assume that we'd use one byte in the tuple header to store 
HOT-flags. That's enough to store the HOT-status for 8 indexes. A set 
bit means that the index corresponding that bit doesn't have an index 
pointer in it.

When you CREATE INDEX, assign a bit for the new index that's not 
currently in use. When you scan the table to build the index, clear that 
bit for every tuple if set and insert index entry as usual.

DROP INDEX wouldn't need to scan the heap to clear the flags, because we 
clear them on CREATE INDEX when necessary.

If you run out of bits in the header, IOW have more than 8 indexes on a 
table, indexes unlucky enough to not have a bit assigned to them 
wouldn't be HOT-updateable.

This would also enable us to skip index inserts for those indexes whose 
key columns are not updated, and do the index inserts as usual for the 
rest. The limitation that you can only retail vacuum HOT-updated tuples 
when none of the indexed keys were changed remains, but we've accepted 
that already.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Question: pg_class attributes and race conditions ?

From
"Pavan Deolasee"
Date:
Tom Lane wrote:> "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:>> Any thoughts on the overall approach ?>>
Fragileand full of race conditions :-(.>
 

Yes, it looks a bit complex. But IMHO we can get around that.
Do you have any ideas in mind about doing that ?
> I thought from the beginning> that CREATE INDEX might be a showstopper for the whole HOT concept,> and it's starting
tolook like that's the case.
 

I remember you raised this concern very early, but I am hopeful
that we would be able to solve this. Would it be acceptable
to have a simple (though not the best) solution for this release
and then improve later on ? As I mentioned earlier, one option
is to CHILL the table, if required, holding AccessExclusive lock,
just like VACUUM FULL. I am assuming here that CREATE INDEX is
not such a common activity, isn't that true ?
> I think what we need to get away from is the assumption that HOT-ness> for one index is the same as HOT-ness for all.
What if we only applied> HOT to primary-key indexes, so that there was certainly not more than> one index per table
thatthe property applies to?>
 

I think that will take away the ability to reuse HEAP_ONLY tuples
without vacuuming the heap and index.

Thanks,
Pavan


-- 


EnterpriseDB        http://www.enterprisedb.com



Re: Question: pg_class attributes and race conditions ?

From
"Simon Riggs"
Date:
On Fri, 2007-03-16 at 21:56 +0530, Pavan Deolasee wrote:

> Any thoughts on the overall approach ? Any suggestions to
> simplify things or any alternate designs ? 

Well your design is very different from what we discussed, so I think I
should post my proposed design alongside this, for further discussion.

- - -

Proposed solutions for CREATE INDEX and CREATE INDEX CONCURRENTLY.

CREATE INDEX CONCURRENTLY has no feature changes to work with HOT.

CREATE INDEX works normally, except when HOT tuples are found, in which
case special processing occurs requiring additional locking. The index
build occurs in a single scan, as now.

CREATE INDEX CONCURRENTLY
-------------------------
We perform no chilling during the initial scan. We index the tuple
identified by SnapshotNow, but we take the root tuple's htid, not the
htid of the tuple version being indexed. We assume that the tuple
version indexed will be the root of the HOT chain by the time the index
is complete.

Currently, we wait until all pre-existing transactions exit before we
allow this to COMMIT. With HOT, we simply move the wait so it occurs
*before* the second scan, then we can prune the HOT chains as we pass
through the heap on the second scan. There will be no pre-existing HOT
tuples and so no chilling is required.

CREATE INDEX
------------

We add a field, xchill, to pg_class that stores TransactionIds. This is
set to InvalidTransactionId if no index has been built yet. The concept
of "main indexer" is introduced, so we allow multiple concurrent index
builds, but only one of these can chill tuples at a time.

1. In IndexBuildHeapScan, as we pass through the table:

a) if we find any any HOT rows, we check xchill and do one of steps
(i-iii). Until this point, it hasn't been important whether we are the
main or a secondary indexer.

i) if xchill is InvalidTransactionId or is committed then we attempt to
become main indexer immediately, following these steps before we
continue building the index (1b)

-- If the table is temp, or if we created the table then we immediately
become the main indexer, so return immediately. If the table being
indexed is already visible to everybody, then:
-- Update pg_class entry for the table, setting the xchill field for the
table to the builder's Xid. (Use heap_inplace_update for this, which is
OK, whether we commit or abort).
-- acquire AccessExclusiveLock on all existing indexes (not the heap)

ii) If xchill is in-progress we wait for that transaction to complete,
then do either step i) or iii). We cannot continue building our index
until the other transaction commits because we cannot yet see the other
index, yet we have to insert into it in order to correctly chill a tuple
to allow *our* index to be built.

iii) if xchill is aborted we abort also, saying that a VACUUM is needed.

b) If we get here then we are the main indexer and can chill tuples. As
we move through the scan we chill all HOT tuples, mark them
HEAP_CHILL_IN_PROGRESS, write WAL for this and insert index entries for
them in all existing indexes, as well as this one. Then remove
CHILL_IN_PROGRESS flags, without writing WAL.

c) release locks on indexes, before end of transaction

2. If we crash or a transaction abort occurs:
- we cannot prune a HEAP_ONLY_TUPLE that points to a tuple with
HEAP_CHILL_IN_PROGRESS. 

- VACUUM must be used to clean up after an aborted index build and needs
some additional code to allow this to occur.

3. Concurrent index builds are allowed. If we are not the main indexer,
then we can attempt to build an index, but any scan that sees a HOT
tuple will block and wait for the main index builder to complete before
it proceeds.

4. When an indexscan reads the table, if it finds a
HEAP_CHILL_IN_PROGRESS tuple it may or may not be valid. Concurrent
index scans and tuple chilling can mean that an index scan find the same
tuple twice, by different routes, if a CREATE INDEX crashed. To avoid
this an IndexScan will only find a tuple visible if it came across a
HEAP_CHILL_IN_PROGRESS tuple using an indirect route, i.e. it followed
the path from root->HOT tuple.

In this design, CREATE INDEX does have a deadlock risk when it is used
within a transaction *and* the index is being built on a publicly
visible table (i.e. not just-built and not temp). IMHO that risk is
acceptable, since if users are worried about concurrent access to a
table during CREATE INDEX they can use CREATE INDEX CONCURRENTLY.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: Question: pg_class attributes and race conditions ?

From
Gregory Stark
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:

> Just to throw my two bits in here :). If we do that, how does that
> effect the idea that most people in the web world use (id serial primary
> key), even though that is not what they are searching on?

"affect". But I think you're right that generally you'll have two indexes.

> More specifically, does HOT help conditions where a composite comes into
> play (session_id,last_active) ... which would be a more heavily updated
> index than just the primary key.

Well if you're indexing a column that you're updating then you've already
failed your saving throw.

The case we're trying to deal with is when you're updating columns that
*aren't* indexed and therefore really don't need redundant index pointers for
each tuple version with identical to the old versions. Especially since those
index pointers are what's preventing us from vacuuming the old tuple versions.

If you are updating an index key then there's no question you're going to need
vacuum to clean out your index.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Question: pg_class attributes and race conditions ?

From
"Pavan Deolasee"
Date:
Heikki Linnakangas wrote:> Tom Lane wrote:>> What if we only applied>> HOT to primary-key indexes, so that there was
certainlynot more than>> one index per table that the property applies to?>> The main objective of HOT is to enable
retailvacuum of HOT-updated> tuples. Doing the above would make it useless for that purpose,> at least when there's
morethan one index on the table. Granted,> there's a lot of tables with just one index out there, but it's a> big
limitationnevertheless.>
 

Agree.
> An extension of that idea, though is to store a flag per index in> the HOT-updated tuple. We would then need a
mappingbetween bits in> the tuple header to indexes, for example as a new column in pg_index.>
 

I like the idea. The major objection would be that it adds a byte
to the tuple header which when considered along with the null
bitmap, may actually make the header 8 bytes larger in the
worst case.

Also, I am also worried about the additional complexity introduced
with this. We can and should work on this idea, I am wondering
whether it would be too much to do before the feature freeze.

I am personally inclined towards doing something simpler to
tackle the CREATE INDEX issue at the moment. But if that is not
acceptable and/or you or anyone else is willing help me on this,
we can work on a better solution.


Thanks,
Pavan

-- 


EnterpriseDB        http://www.enterprisedb.com



Re: Question: pg_class attributes and race conditions ?

From
"Simon Riggs"
Date:
On Fri, 2007-03-16 at 12:40 -0400, Tom Lane wrote:
> "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:
> > Any thoughts on the overall approach ?
> 
> Fragile and full of race conditions :-(.  I thought from the beginning
> that CREATE INDEX might be a showstopper for the whole HOT concept,
> and it's starting to look like that's the case.

Seems like we can fix all but some strange CREATE INDEX use cases. Since
we have CREATE INDEX CONCURRENTLY, seems like HOT is a showstopper for
the whole CREATE INDEX concept.

> I think what we need to get away from is the assumption that HOT-ness
> for one index is the same as HOT-ness for all. 

Sounds interesting. I'd not considered that before.

>  What if we only applied
> HOT to primary-key indexes, so that there was certainly not more than
> one index per table that the property applies to?

On its own, I don't think this is a sufficiently wide use-case.

Perhaps we should do this PLUS make HOT-semantics optional for each
additional index. i.e. HOT is always enforced on primary indexes and
optionally on other indexes (but not by default).

If you accept the HOT option on an index, you then accept the additional
issues surrounding chilling tuples. Bear in mind that there aren't any
at all if you use CREATE INDEX CONCURRENTLY and many other cases.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: Question: pg_class attributes and race conditions ?

From
"Joshua D. Drake"
Date:
Gregory Stark wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> 
>> Just to throw my two bits in here :). If we do that, how does that
>> effect the idea that most people in the web world use (id serial primary
>> key), even though that is not what they are searching on?
> 
> "affect". But I think you're right that generally you'll have two indexes.
> 
>> More specifically, does HOT help conditions where a composite comes into
>> play (session_id,last_active) ... which would be a more heavily updated
>> index than just the primary key.
> 
> Well if you're indexing a column that you're updating then you've already
> failed your saving throw.

Just for everyone who missed this. Greg Stark obviously spends his time
(or at some time) playing|ed D&D. I have an Epic level Sorcerer, how
about you Greg? ;)

Sincerely,

Joshua D. Drake


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Question: pg_class attributes and race conditions ?

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> >  What if we only applied
> > HOT to primary-key indexes, so that there was certainly not more than
> > one index per table that the property applies to?
> 
> On its own, I don't think this is a sufficiently wide use-case.
>
> Perhaps we should do this PLUS make HOT-semantics optional for each
> additional index. i.e. HOT is always enforced on primary indexes and
> optionally on other indexes (but not by default).

Here's is a very simple, low-tech idea.  How about checking whether the
new index requires chilling tuples; if it does, then elog(ERROR) until
all the indexes have been manually chilled, which would be done with an
"ALTER INDEX ... CHILL" command or something like that.  Only when all
indexes are known chilled, you can create another one, and then the user
can "hotify" indexes as appropriate.

(Disclaimer: I haven't followed the HOT design closely to know if this
makes enough sense)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Question: pg_class attributes and race conditions ?

From
"Simon Riggs"
Date:
On Fri, 2007-03-16 at 16:59 -0400, Alvaro Herrera wrote:

> Here's is a very simple, low-tech idea.  How about checking whether the
> new index requires chilling tuples; if it does, then elog(ERROR) until
> all the indexes have been manually chilled, which would be done with an
> "ALTER INDEX ... CHILL" command or something like that.  Only when all
> indexes are known chilled, you can create another one, and then the user
> can "hotify" indexes as appropriate.

Well, I've spent two weeks searching for a design that does CREATE INDEX
without changing existing functionality. What's been proposed is very
close, but not exact.

CREATE INDEX CONCURRENTLY can work, so we're just discussing the other
increasingly edgy cases.

I agree some kind of compromise on CREATE INDEX seems to be required if
we want HOT without some drastic reductions in function. I'm happy to go
for low tech approaches, or anything really. Simple is good, so we can
hit feature freeze.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

From
"Pavan Deolasee"
Date:
How do we move forward with the CREATE INDEX issue with
HOT ? There are quite a few suggestions and objections.
Can we please discuss and decide on the plan ? I am very
comfortable with the current state of HOT, the results
are encouraging and I hope this issue does not become
a showstopper.

Here is what different people have suggested:

1. Simon and I posted couple of designs which include prior
suggestions from Heikki, Tom, Hannu and others. I believe
these designs are not very distinct and if done correctly
would help us keep the current behavior of CREATE INDEX
and CREATE INDEX CONCURRENTLY same. There are concerns
about race conditions and deadlock issues though. What
is a general feeling at this point ? Are these issues very
hard to address ? At a high level, IMO we need some or all
of these things to make either of these designs work:
   - ability to enable/disable HOT-updates on the table   - ability to enable/disable HOT-fetches on the table   -
abilityto wait for concurrent index scans to finish   - ability to allow only one backend to CHILL the table
 

How hard would these be without causing any race conditions
and deadlocks ?


2. Heikki suggested an approach where we add a byte
to tuple header and track HOT-ness of different indexes.
The idea looks good but had a downside of increasing tuple
header and complexity.


3. Alvaro suggested to have a ALTER TABLE .. SET CHILL kind
of syntax to chill the table. I would like to extend that
a bit further:

Can we have a mechanism to explicitely ON/OFF HOT-updates
on tables ? By default, HOT is disabled. If user wants to
take advantage of HOT, he/she should do something like:

ALTER TABLE test SET HOT ON;

While creating an index, if a HEAP_ONLY tuple is found,
CREATE INDEX [CONCURRENTLY] fails with an error and the
user needs to SET HOT OFF and then try again. While turning
HOT off, the entire table is CHILLed, holding AccessExclusive
lock on the table. Once the new index is created, user
can turn HOT on again.

This infrastructure would ensure that the current
behavioral expectations of CREATE INDEX [CONCURRENTLY]
don't suddenly change unless user explicitely turns
HOT on. In that case, we can very well assume that the
user is aware of the advantages/disadvantages of HOT
and also cost associated with creating a new index on
HOT-updated tables. This would also allow users
experiment with HOT and turn it off if there is no gain
in their environment.

Any thoughts on the these approaches ? What is feasible
and acceptable ?

I am more inclined towards  the third approach for this
release and improve things later on. Is this acceptable ?


Thanks,
Pavan


-- 


EnterpriseDB        http://www.enterprisedb.com



On Sat, 2007-03-17 at 12:43 +0530, Pavan Deolasee wrote:
> How do we move forward with the CREATE INDEX issue with
> HOT ? There are quite a few suggestions and objections.
> Can we please discuss and decide on the plan ? I am very
> comfortable with the current state of HOT, the results
> are encouraging and I hope this issue does not become
> a showstopper.

My suggested plan would be:

First, we should make CREATE INDEX CONCURRENTLY work, since we have a
design to do that. Many other things are acceptable if that works fully
and correctly with HOT, if it cannot, we have problems and that is a
showstopper. 

Second, I can't put my finger on it exactly, but I think having a DDL
command to chill a table, as Alvaro suggests, sounds like a very wise
utility to include, whatever we do later. That does have the useful
by-product of making CREATE INDEX work without change, but I would not
want to rely on it in the longer term.

While those are happening we can explore the other possibilities in more
detail, but they do seem like safe bets to me.

> Here is what different people have suggested:
> 
> 1. Simon and I posted couple of designs which include prior
> suggestions from Heikki, Tom, Hannu and others. I believe
> these designs are not very distinct and if done correctly
> would help us keep the current behavior of CREATE INDEX
> and CREATE INDEX CONCURRENTLY same. 

Agreed. Apologies for confusing things by posting the second design,
that wasn't very helpful or polite.

> There are concerns
> about race conditions and deadlock issues though. What
> is a general feeling at this point ? Are these issues very
> hard to address ? At a high level, IMO we need some or all
> of these things to make either of these designs work:
> 
>     - ability to enable/disable HOT-updates on the table
>     - ability to enable/disable HOT-fetches on the table
>     - ability to wait for concurrent index scans to finish
>     - ability to allow only one backend to CHILL the table

That seems like a good summary of what we need. I would add only that
these need not be explicit user-level commands, just internal
capabilities.

> How hard would these be without causing any race conditions
> and deadlocks ?

We should be able to take advantage of two things:

- We only need to chill some of the tuples, in some cases.

- Chilling only requires us to lock the indexes, which would be more
practical if the index locking were more transitory.

I note that IndexScans hold the locks they have on an index until
transaction end, even after the IndexScans and BitmapIndexScans
complete. ISTM we could relax that locking; that seems OK since we
earlier discussed removing locks completely on indexes. The lock on the
heap would not be upgraded or changed.

> 2. Heikki suggested an approach where we add a byte
> to tuple header and track HOT-ness of different indexes.
> The idea looks good but had a downside of increasing tuple
> header and complexity.

I'm very comfortable with the idea that HOT can be turned on/off for a
table. That gives us a workaround to bugs. Previously, changing things
like WITHOUT OIDS was done over two releases, so I'd suggest the same
thing here. Add the option now, disabled, then look to make it the
default option in the next release. We can override that with the
default_use_hot parameter for those that feel bold, at least initially.
I know Bruce has been long opposed to the idea of a table-level switch,
which is why we've been trying so hard to avoid it. So we should add his
-1 to this idea from the start.

Right now, ideas around (2) sound like too much complexity. Maybe
there's a good idea there waiting to break out, so we should pursue that
also - but I'm not sure you can wait for that to happen.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:
> While creating an index, if a HEAP_ONLY tuple is found,
> CREATE INDEX [CONCURRENTLY] fails with an error and the
> user needs to SET HOT OFF and then try again. While turning
> HOT off, the entire table is CHILLed, holding AccessExclusive
> lock on the table. Once the new index is created, user
> can turn HOT on again.

It hardly seems acceptable to require exclusive lock to chill a table.
In production situations, knowing that you'd have to do that to do
index maintenance on a large table would probably scare you off of
ever enabling the feature at all.  Last year we were getting beaten up
about how it wasn't acceptable for CREATE INDEX to lock out writes
for a long time; how is it suddenly acceptable to need to lock out
both reads and writes for a long time before you can even think
about creating an index?
        regards, tom lane


On Sat, 2007-03-17 at 11:45 -0400, Tom Lane wrote:
> "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:
> > While creating an index, if a HEAP_ONLY tuple is found,
> > CREATE INDEX [CONCURRENTLY] fails with an error and the
> > user needs to SET HOT OFF and then try again. While turning
> > HOT off, the entire table is CHILLed, holding AccessExclusive
> > lock on the table. Once the new index is created, user
> > can turn HOT on again.
> 
> It hardly seems acceptable to require exclusive lock to chill a table.
> In production situations, knowing that you'd have to do that to do
> index maintenance on a large table would probably scare you off of
> ever enabling the feature at all.  Last year we were getting beaten up
> about how it wasn't acceptable for CREATE INDEX to lock out writes
> for a long time; how is it suddenly acceptable to need to lock out
> both reads and writes for a long time before you can even think
> about creating an index?

I agree with you: It isn't acceptable for us to contemplate an
AccessExclusiveLock before we can build any index.

We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is
I think we can without significant difficulty.

The problems are with CREATE INDEX, in some cases. I regret that I did
not see those difficulties until recently, which is why I was concerned
that we spent time on VACUUM FULL rather than this issue. I'm glad now
that you both pressed ahead and solved that though.

As a result of the issues, I think Pavan is playing safe, to make sure
there is *an* option, so that we can build upwards from there. The
proposal is pragmatism only, while we discuss other approaches.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

From
"Pavan Deolasee"
Date:
Tom Lane wrote:> "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:>> While creating an index, if a HEAP_ONLY
tupleis found,>> CREATE INDEX [CONCURRENTLY] fails with an error and the>> user needs to SET HOT OFF and then try
again.While turning>> HOT off, the entire table is CHILLed, holding AccessExclusive>> lock on the table. Once the new
indexis created, user>> can turn HOT on again.>> It hardly seems acceptable to require exclusive lock to chill a
table.>In production situations, knowing that you'd have to do that to do> index maintenance on a large table would
probablyscare you off of> ever enabling the feature at all.  Last year we were getting beaten up> about how it wasn't
acceptablefor CREATE INDEX to lock out writes> for a long time; how is it suddenly acceptable to need to lock out> both
readsand writes for a long time before you can even think> about creating an index?>
 

Yeah, I agree. I was proposing this as a stop-gap solution though.
Something which would help us solve the problem without changing
the current behavior for non-HOT tables.

So what do you suggest ? Do you feel that there is no way we can
solve the problem ?

ISTM that if we run CHILL as a seperate transaction (just like
VACUUM), we should be able to CHILL the table with
ShareUpdateExclusiveLock. Running it as a seperate transaction
would reduce the risk of causing deadlocks. Is that a fair
assessment ?

If we need to CHILL with ShareUpdateExclusiveLock, IMHO we
would again be back to something similar to the first approach.
I know you said its fragile and full of race conditions, but
do you think we can handle it better if we have a seperate
DDL command, running within its own transaction ?

The algorithm would look like:

1. Disable HOT-updates
2. CHILL the table by inserting appropriate index entries and  marking tuples CHILL_IN_PROGRESS
3. Establish a point when there are no open index scans
4. Disable HOT-fetches
5. Scan the heap again, reset CHILL_IN_PROGRESS, HEAP_ONLY  and HOT_UPDATED flags
6. Establish a point when there are no open index scans
7. Enable HOT-fetches
8. Enable HOT-updates


I need help to do the step 1,3,4,6,7 and 8 (well all :-))
in a deadlock and race condition free mannner. Any
suggestions ? Where do we keep the global state about
HOT-updates/HOT-fetches ? If we keep it in pg_class, a
crash of the CHILL command or the server may leave the
pg_class row in a stale state. That does not look like
a problem though. In the worst case, we might not be
able to do HOT-updates without manual intervention.

Again comments, suggestions ? I really appreciate
everyone's time and patience. Help is what I need to
solve this problem.

Thanks,
Pavan

-- 


EnterpriseDB        http://www.enterprisedb.com



Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>> As a result of the issues, I think Pavan is playing safe, to make sure> there is *an* option, so
thatwe can build upwards from there. The> proposal is pragmatism only, while we discuss other approaches.>
 

Absolutely true. I agree that CHILLing the table with AccessExclusive
lock is not a solution that I like. I was looking at it just as a
stop-gap solution, given the time left for feature freeze.
But if there is something better that we can do, I am for it.

Thanks,
Pavan


-- 


EnterpriseDB        http://www.enterprisedb.com



On Sat, 2007-03-17 at 23:11 +0530, Pavan Deolasee wrote:

> The algorithm would look like:
> 
> 1. Disable HOT-updates

The lock held by CREATE INDEX already prevents HOT updates. So steps 1
and 8 aren't needed.

We need to be clear that we already have a solution to CREATE INDEX
CONCURRENTLY. Do you agree that we do? Does anyone see a problem with
the posted design for that?

Hopefully it is only CREATE INDEX that we need to think about.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>> We need to be clear that we already have a solution to CREATE INDEX> CONCURRENTLY. Do you agree
thatwe do? Does anyone see a problem with> the posted design for that?>> Hopefully it is only CREATE INDEX that we need
tothink about.>
 

I agree. Lets first decide whether its only CREATE INDEX that
needs solution or its both.

Lets also decide whether we want to fix CREATE INDEX
[CONCURRENTLY] or we want to provide a seperate DLL to
CHILL the table and then build index normally. Tom
has already rejected the idea of holding exclusive lock
while chilling, but if we get around that, is rest of
the approach acceptable ?

If its going to be a seperate DLL, lets decide whether
its acceptable to run it as a seperate transaction.


Thanks,
Pavan


-- 


EnterpriseDB        http://www.enterprisedb.com



On Sun, 2007-03-18 at 00:44 +0530, Pavan Deolasee wrote:
> Simon Riggs wrote:
>  >
>  > We need to be clear that we already have a solution to CREATE INDEX
>  > CONCURRENTLY. Do you agree that we do? Does anyone see a problem with
>  > the posted design for that?

If we have solved CREATE INDEX CONCURRENTLY, then I would propose that
this becomes the default option for creating an index, when the
statement is issued outside of a statement block.

That seems better than reminding everybody to run with the CONCURRENTLY
option, or advise them of different performance characteristics or
behaviour of the normal CREATE INDEX route.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>>> We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is> I think we can without
significantdifficulty.>
 

Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though
I am not completely convinced that we can do that without much changes
to CREATE INDEX CONCURRENTLY logic. For example, I believe we still
need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY.
Otherwise we might end up creating two paths to the same tuple in
the new index.

Say, we have a table with two columns (int a, int b). We have an
index on 'a' and building another index on 'b'. We got a tuple
(10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY,
this tuple would be indexed. If the tuple is HOT-updated to (10, 30)
before the first phase ends, the updated tuple would again get
indexed in the second phase. This would lead to two paths to the
latest visible tuple from the new index.

Am I missing something in your design that stops this from
happening ?

Thanks,
Pavan


-- 


EnterpriseDB        http://www.enterprisedb.com



On Mon, 2007-03-19 at 14:33 +0530, Pavan Deolasee wrote:
> Simon Riggs wrote:
>  >
>  >
>  > We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is
>  > I think we can without significant difficulty.
>  >
> 
> Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though
> I am not completely convinced that we can do that without much changes
> to CREATE INDEX CONCURRENTLY logic. For example, I believe we still
> need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY.
> Otherwise we might end up creating two paths to the same tuple in
> the new index.
> 
> Say, we have a table with two columns (int a, int b). We have an
> index on 'a' and building another index on 'b'. We got a tuple
> (10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY,
> this tuple would be indexed. If the tuple is HOT-updated to (10, 30)
> before the first phase ends, the updated tuple would again get
> indexed in the second phase. This would lead to two paths to the
> latest visible tuple from the new index.
> 
> Am I missing something in your design that stops this from
> happening ?

This problem is solved by moving the wait (for all transactions in
reference snapshot to finish) so that it is now between the first and
second scans, as described.

During the second scan we would prune each block, so the only remaining
tuple in the block when the second scan sees it would be (10,30) and it
would no longer be a HOT tuple - the index would have a pointer to it,
so no new index pointer would be added. The pointer to (10,30) is the
same pointer that was added in the first phase for the tuple (10,20).

The wait and subsequent prune ensures that all HOT tuples are now the
root of their HOT chain. The index created in the fist phase ensures
that the HOT chains are never added to.


--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




On Mon, 2007-03-19 at 09:28 +0000, Simon Riggs wrote:
> On Mon, 2007-03-19 at 14:33 +0530, Pavan Deolasee wrote:
> > Simon Riggs wrote:
> >  >
> >  >
> >  > We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is
> >  > I think we can without significant difficulty.
> >  >
> > 
> > Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though
> > I am not completely convinced that we can do that without much changes
> > to CREATE INDEX CONCURRENTLY logic. For example, I believe we still
> > need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY.
> > Otherwise we might end up creating two paths to the same tuple in
> > the new index.
> > 
> > Say, we have a table with two columns (int a, int b). We have an
> > index on 'a' and building another index on 'b'. We got a tuple
> > (10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY,
> > this tuple would be indexed. If the tuple is HOT-updated to (10, 30)
> > before the first phase ends, the updated tuple would again get
> > indexed in the second phase. This would lead to two paths to the
> > latest visible tuple from the new index.
> > 
> > Am I missing something in your design that stops this from
> > happening ?
> 
> This problem is solved by moving the wait (for all transactions in
> reference snapshot to finish) so that it is now between the first and
> second scans, as described.
> 
> During the second scan we would prune each block, so the only remaining
> tuple in the block when the second scan sees it would be (10,30) and it
> would no longer be a HOT tuple - the index would have a pointer to it,
> so no new index pointer would be added. The pointer to (10,30) is the
> same pointer that was added in the first phase for the tuple (10,20).
> 
> The wait and subsequent prune ensures that all HOT tuples are now the
> root of their HOT chain. The index created in the fist phase ensures
> that the HOT chains are never added to.

AFAICS this is all you need to make CREATE INDEX CONCURRENTLY work with
HOT, which is even simpler than my original post. [This presumes that we
do pruning automatically on a heap scan, not sure what the current state
of that is, but it could be a scan option].

Index: src/backend/commands/indexcmds.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/indexcmds.c,v
retrieving revision 1.157
diff -c -r1.157 indexcmds.c
*** src/backend/commands/indexcmds.c    13 Mar 2007 00:33:39 -0000
1.157
--- src/backend/commands/indexcmds.c    19 Mar 2007 09:49:56 -0000
***************
*** 497,507 ****       ActiveSnapshot = snapshot;
       /*
-        * Scan the index and the heap, insert any missing index
entries.
-        */
-       validate_index(relationId, indexRelationId, snapshot);
-
-       /*        * The index is now valid in the sense that it contains all
currently        * interesting tuples.  But since it might not contain tuples
deleted just        * before the reference snap was taken, we have to wait out any
--- 497,502 ----
***************
*** 514,519 ****
--- 509,519 ----       for (ixcnt = 0; ixcnt < snapshot->xcnt; ixcnt++)
XactLockTableWait(snapshot->xip[ixcnt]);

+       /*
+        * Scan the index and the heap, insert any missing index
entries.
+        */
+       validate_index(relationId, indexRelationId, snapshot);
+       /* Index can now be marked valid -- update its pg_index entry */       pg_index = heap_open(IndexRelationId,
RowExclusiveLock);



--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

From
Heikki Linnakangas
Date:
Pavan Deolasee wrote:
> 2. Heikki suggested an approach where we add a byte
> to tuple header and track HOT-ness of different indexes.
> The idea looks good but had a downside of increasing tuple
> header and complexity.

We would only need the extra byte in HOT-updated tuples. Alternatively, 
we could use the bits we have free in infomask2. There's currently 5 
bits free, using just 2 or 3 of those would get us quite far. Or just 
one, which would be the Tom's suggestion of only using HOT for tables 
with a single index.

Complexity is in the eye of the beholder. Chilling existing tuples isn't 
exactly trivial either, and neither is getting all the locking and 
waiting needed in the other proposals correct.

The simplicity of the other proposals depend a lot on what kind of 
restrictions and changes to current semantics of CREATE INDEX 
[CONCURRENTLY] we accept. Which of the following restrictions are we OK 
with, if a table has HOT-updated tuples:

1. Throw an error
2. Require a vacuum after crash during CREATE INDEX
3. Do multiple heap-scan passes
4. Wait longer in CREATE INDEX CONCURRENTLY
5. Wait in CREATE INDEX, like we do in CREATE INDEX CONCURRENTLY
6. Lock the table exclusively
7. Disallow multiple CREATE INDEXes at the same time.

I've lost track of which proposals lead to which restrictions. Maybe we 
should look at the restrictions first, and judge which ones are 
acceptable and which ones are not?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

From
"Pavan Deolasee"
Date:
Heikki Linnakangas wrote:> Pavan Deolasee wrote:>> 2. Heikki suggested an approach where we add a byte>> to tuple
headerand track HOT-ness of different indexes.>> The idea looks good but had a downside of increasing tuple>> header
andcomplexity.>> We would only need the extra byte in HOT-updated tuples. 
 
Alternatively, we could use the bits we have free in infomask2. There's 
currently 5 bits free, using just 2 or 3 of those would get us quite 
far. Or just one, which would be the Tom's suggestion of only using HOT 
for tables with a single index.>

We've already used three of those, two for tracking HEAP_ONLY
and HOT_UPDATED tuples and one for tracking fragmented tuple.
Doing it for just one index seems too restrictive. Are we ok
with adding another byte to the tuple header ?
> Complexity is in the eye of the beholder. Chilling existing tuples 
isn't exactly trivial either, and neither is getting all the locking and 
waiting needed in the other proposals correct.>

I agree. I am just worried about the short term and long
term solution. Your proposal is certainly the better of
all as it also gives us the ability to restrict bloats
on a index whose key does not change during UPDATE.

I would like to do something which is acceptable and is
also feasible to complete by feature freeze. Do you want
to give a shot to this approach while I try to build
the ALTER TABLE and CHILL utilities ?
> The simplicity of the other proposals depend a lot on what kind of 
restrictions and changes to current semantics of CREATE INDEX 
[CONCURRENTLY] we accept. Which of the following restrictions are we OK 
with, if a table has HOT-updated tuples:>> 1. Throw an error> 2. Require a vacuum after crash during CREATE INDEX> 3.
Domultiple heap-scan passes> 4. Wait longer in CREATE INDEX CONCURRENTLY> 5. Wait in CREATE INDEX, like we do in CREATE
INDEXCONCURRENTLY> 6. Lock the table exclusively> 7. Disallow multiple CREATE INDEXes at the same time.>> I've lost
trackof which proposals lead to which restrictions. Maybe 
 
we should look at the restrictions first, and judge which ones are 
acceptable and which ones are not?>

This is a good summary. With the assumption that creating
index is not very frequent operation, I would live with
1, 2, 3 and 4. But frankly I'm least knowledgable in this
regard and would rely on others to decide.

Thanks,
Pavan

-- 


EnterpriseDB        http://www.enterprisedb.com



Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>> This problem is solved by moving the wait (for all transactions in> reference snapshot to finish)
sothat it is now between the first and> second scans, as described.>> During the second Vscan we would prune each
block,so the only remaining> tuple in the block when the second scan sees it would be (10,30) and it> would no longer
bea HOT tuple - the index would have a pointer to it,> so no new index pointer would be added. The pointer to (10,30)
isthe> same pointer that was added in the first phase for the tuple (10,20).>
 

The problem is that in the first phase, the pointer was inserted
with key=20 whereas now its changed to 30. So we need to delete the old
index entry and add a new one.

Thanks,
Pavan


-- 


EnterpriseDB        http://www.enterprisedb.com



Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

From
Heikki Linnakangas
Date:
Pavan Deolasee wrote:
> Heikki Linnakangas wrote:
>  > Pavan Deolasee wrote:
>  > We would only need the extra byte in HOT-updated tuples. 
> Alternatively, we could use the bits we have free in infomask2. There's 
> currently 5 bits free, using just 2 or 3 of those would get us quite 
> far. Or just one, which would be the Tom's suggestion of only using HOT 
> for tables with a single index.
>  >
> 
> We've already used three of those, two for tracking HEAP_ONLY
> and HOT_UPDATED tuples and one for tracking fragmented tuple.

HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have 
bits available for three indexes.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


On Mon, 2007-03-19 at 16:06 +0530, Pavan Deolasee wrote:
> Simon Riggs wrote:
>  >
>  > This problem is solved by moving the wait (for all transactions in
>  > reference snapshot to finish) so that it is now between the first and
>  > second scans, as described.
>  >
>  > During the second Vscan we would prune each block, so the only remaining
>  > tuple in the block when the second scan sees it would be (10,30) and it
>  > would no longer be a HOT tuple - the index would have a pointer to it,
>  > so no new index pointer would be added. The pointer to (10,30) is the
>  > same pointer that was added in the first phase for the tuple (10,20).
>  >
> 
> The problem is that in the first phase, the pointer was inserted
> with key=20 whereas now its changed to 30. So we need to delete the old
> index entry and add a new one.

So don't index HOT tuples in the first phase, wait until the second.
That should be just a single if() test in IndexBuildHeapScan().

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




On Mon, 2007-03-19 at 10:51 +0000, Heikki Linnakangas wrote:
> Pavan Deolasee wrote:
> > Heikki Linnakangas wrote:
> >  > Pavan Deolasee wrote:
> >  > We would only need the extra byte in HOT-updated tuples. 
> > Alternatively, we could use the bits we have free in infomask2. There's 
> > currently 5 bits free, using just 2 or 3 of those would get us quite 
> > far. Or just one, which would be the Tom's suggestion of only using HOT 
> > for tables with a single index.
> >  >
> > 
> > We've already used three of those, two for tracking HEAP_ONLY
> > and HOT_UPDATED tuples and one for tracking fragmented tuple.
> 
> HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have 
> bits available for three indexes.

ISTM that we are getting very close to a great idea here.

I was unwilling to compromise to have HOT if only one index existed, but
IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this
release. (We can always use vertical partitioning techniques to allow
additional access paths to be added to the same table - I'd be very
happy to document that with worked examples, if requried).

I trust that we will think of ways of extending that limit in later
releases.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Ühel kenal päeval, E, 2007-03-19 kell 12:05, kirjutas Simon Riggs:
> On Mon, 2007-03-19 at 10:51 +0000, Heikki Linnakangas wrote:
> > Pavan Deolasee wrote:
> > > Heikki Linnakangas wrote:
> > >  > Pavan Deolasee wrote:
> > >  > We would only need the extra byte in HOT-updated tuples. 
> > > Alternatively, we could use the bits we have free in infomask2. There's 
> > > currently 5 bits free, using just 2 or 3 of those would get us quite 
> > > far. Or just one, which would be the Tom's suggestion of only using HOT 
> > > for tables with a single index.
> > >  >
> > > 
> > > We've already used three of those, two for tracking HEAP_ONLY
> > > and HOT_UPDATED tuples and one for tracking fragmented tuple.
> > 
> > HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have 
> > bits available for three indexes.

But you probably have to do some kind of SUPERFULL VACUUM if you want to
DROP and CREATE the third index. You will probably have to touch all
tuples, regardless of weather they are live or not, or if will be moved
or not, just to kclean ot bits for the just-deleted index.

Maybe a CLUSTER would be an answer here.

> ISTM that we are getting very close to a great idea here.
> 
> I was unwilling to compromise to have HOT if only one index existed, but
> IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this
> release. (We can always use vertical partitioning techniques to allow
> additional access paths to be added to the same table - I'd be very
> happy to document that with worked examples, if requried).

Maybe using more than one TOAST table as means of vertical
partitioning ?

> I trust that we will think of ways of extending that limit in later
> releases.
> 
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)

From
Heikki Linnakangas
Date:
Hannu Krosing wrote:
> Ühel kenal päeval, E, 2007-03-19 kell 12:05, kirjutas Simon Riggs:
>> On Mon, 2007-03-19 at 10:51 +0000, Heikki Linnakangas wrote:
>>> Pavan Deolasee wrote:
>>>> We've already used three of those, two for tracking HEAP_ONLY
>>>> and HOT_UPDATED tuples and one for tracking fragmented tuple.
>>> HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have 
>>> bits available for three indexes.
> 
> But you probably have to do some kind of SUPERFULL VACUUM if you want to
> DROP and CREATE the third index. You will probably have to touch all
> tuples, regardless of weather they are live or not, or if will be moved
> or not, just to kclean ot bits for the just-deleted index.

DROP INDEX wouldn't do anything extra. CREATE INDEX would have to clear 
the bit assigned to the new index, which would mean dirtying every heap 
page in the worst case. As a further optimization, CREATE INDEX could 
skip index inserts for HOT-updated tuples, if the key for new index 
wasn't changed, and leave the flag set.

Vacuum isn't needed.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)

From
"Merlin Moncure"
Date:
On 3/17/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> I'm very comfortable with the idea that HOT can be turned on/off for a
> table. That gives us a workaround to bugs. Previously, changing things
> like WITHOUT OIDS was done over two releases, so I'd suggest the same
> thing here. Add the option now, disabled, then look to make it the
> default option in the next release. We can override that with the
> default_use_hot parameter for those that feel bold, at least initially.
> I know Bruce has been long opposed to the idea of a table-level switch,
> which is why we've been trying so hard to avoid it. So we should add his
> -1 to this idea from the start.

Is fear of bugs a justification of guc setting?  Or is there a
trade-off involved with HOT?

merlin


On Mon, 2007-03-19 at 10:29 -0500, Merlin Moncure wrote:
> On 3/17/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> > I'm very comfortable with the idea that HOT can be turned on/off for a
> > table. That gives us a workaround to bugs. Previously, changing things
> > like WITHOUT OIDS was done over two releases, so I'd suggest the same
> > thing here. Add the option now, disabled, then look to make it the
> > default option in the next release. We can override that with the
> > default_use_hot parameter for those that feel bold, at least initially.
> > I know Bruce has been long opposed to the idea of a table-level switch,
> > which is why we've been trying so hard to avoid it. So we should add his
> > -1 to this idea from the start.
> 
> Is fear of bugs a justification of guc setting?  

Probably not on its own, but the inspiration was that we currently have
user-visible behaviour in the recent proposals, hence the GUC. 

> Or is there a trade-off involved with HOT?

At the moment, there is no downside to HOT in normal operation that I'm
aware of, but its a great question. 

The problem we have is with normal CREATE INDEX because there are two
sources of race conditions that complicate this: concurrent index scans
and crash safety. Currently there are no perfect solutions to this. We
have two main options:
1. additional locking, either within CIDX or as a separate DDL
2. additional complexity and possible limitation in the number of
indexes to just 3 before we stop doing HOT updates.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)

From
"Merlin Moncure"
Date:
On 3/19/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote:
> Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though
> I am not completely convinced that we can do that without much changes
> to CREATE INDEX CONCURRENTLY logic. For example, I believe we still
> need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY.
> Otherwise we might end up creating two paths to the same tuple in
> the new index.
>
> Say, we have a table with two columns (int a, int b). We have an
> index on 'a' and building another index on 'b'. We got a tuple
> (10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY,
> this tuple would be indexed. If the tuple is HOT-updated to (10, 30)
> before the first phase ends, the updated tuple would again get
> indexed in the second phase. This would lead to two paths to the
> latest visible tuple from the new index.

just a thought...can you disable HOT on the fly?  why not disable hot
updates completely during these types of operations?.

merlin


Re: Question: pg_class attributes and race conditions ?

From
Gregory Stark
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> Tom Lane wrote:
>> What if we only applied
>> HOT to primary-key indexes, so that there was certainly not more than
>> one index per table that the property applies to?
>
> The main objective of HOT is to enable retail vacuum of HOT-updated tuples.
> Doing the above would make it useless for that purpose, at least when there's
> more than one index on the table. Granted, there's a lot of tables with just
> one index out there, but it's a big limitation nevertheless.
>
> An extension of that idea, though is to store a flag per index in the
> HOT-updated tuple. We would then need a mapping between bits in the tuple
> header to indexes, for example as a new column in pg_index.

I had an interesting thought this morning, these bits might less us do retail
vacuum in a lot of cases.

When you traverse an index and find that an index pointer points to a DEAD
tuples you set the LP_DELETE flag. If at that time you set the tuple's bit
indicating that the index pointer for that index then if we find all the bits
are set (and whatever condition we have for ensuring that all indexes are
represented by bits) we know there are no index pointers left and the tuple
can now be retail vacuumed.

I think in order for this to work we may want a rule that we don't have to
dirty a page to set a "index pointer missing" bit though we would certainly
need to dirty it (and wal log) it if we *clear* a bit. In other words a set
bit would be a guarantee that the index pointer was missing but a clear bit
would only be a hint that it might be present.

The main problem with this is that it would necessitate WAL logging setting
the LP_DELETE flag on index pointers which could be a large overhead for a
SELECT.



This interacts with two other proposed changes, HOT and truncating line
pointers, which I think are both valuable. But I think it works with both.

What I would suggest is the following:

When we follow an index pointer, find a (non-truncated) DEAD tuple we truncate
the line pointer, and initialize the length bits to an empty bitmask of "index
pointer missing" flags. In the case of a HOT-updated tuple HOT would have to
provide enough information for us to initialize the bitmask -- it's the same
information that it needs anyways. Then we set the LP_DELETE flag on our own
index pointer (and wal log it) and set the corresponding bit on the page.

If we find an index pointer pointing to a truncated line pointer we set the
flag in the bitmask. If we find that all the bits are set indicating that all
indexes have successfully set their LP_DELETE flag then we the line pointer
can be marked as !LP_USED.

Of course only certain types of indexes would be able to do this, indexes that
only ever have exactly one pointer to every tuple, and which have space for an
LP_DELETED or equivalent flag. I believe currently this includes all except
GIN.



The part I'm most worried about with both this and the equivalent bits for HOT
are maintaining the mapping from index to bit. I think it could be worked out,
but it has to be done carefully. Dropping an index can't ever change the
mapping and creating a new index can't ever leave a tuple with a bit
incorrectly set for that index.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


On Mon, Mar 19, 2007 at 12:05:19PM +0000, Simon Riggs wrote:
> I was unwilling to compromise to have HOT if only one index existed, but
> IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this
> release. (We can always use vertical partitioning techniques to allow
> additional access paths to be added to the same table - I'd be very
> happy to document that with worked examples, if requried).

I'm not sure where we're sitting with this, but I've got another idea I
haven't seen (one that I think is better than an arbitrary limit on the
number of indexes)... what if we just disallow non-concurrent index
builds on hot tables? It sounds like the additional pain involved in
chilling an entire table and keeping it chilled for the index build is
even more overhead than just doing a concurrent index build.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Jim C. Nasby wrote:
> On Mon, Mar 19, 2007 at 12:05:19PM +0000, Simon Riggs wrote:
>   
>> I was unwilling to compromise to have HOT if only one index existed, but
>> IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this
>> release. (We can always use vertical partitioning techniques to allow
>> additional access paths to be added to the same table - I'd be very
>> happy to document that with worked examples, if requried).
>>     
>
> I'm not sure where we're sitting with this, but I've got another idea I
> haven't seen (one that I think is better than an arbitrary limit on the
> number of indexes)... what if we just disallow non-concurrent index
> builds on hot tables? It sounds like the additional pain involved in
> chilling an entire table and keeping it chilled for the index build is
> even more overhead than just doing a concurrent index build.
>   
I thought about making it even simpler.  Only allow CREATE INDEX builds 
on non HOT tables.  However as I mentioned in another thread, this idea 
dies if you expect to be able to have HOT enabled by default in any 
future release.  Chilling needs to be able to be done with a regular 
Vacuum style lock for chilling to be a usable reality.

I'm sure there are use cases or this, but it seems unlikely that a high 
update table is going to have an index added to it.  Am I a long way 
from reality when saying that?


Regards

Russell Smith