Re: HOT patch - version 14 - Mailing list pgsql-patches

From Pavan Deolasee
Subject Re: HOT patch - version 14
Date
Msg-id 2e78013d0708300904g10920497k5037ff8e3aa6ba3d@mail.gmail.com
Whole thread Raw
In response to Re: HOT patch - version 14  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: HOT patch - version 14  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches


On 8/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> You are right - a new index might mean that an existing HOT chain
> is broken as far as the new index is concerned. The way we address
> that is by indexing the root tuple of the chain, but the index key is
> extracted from the last tuple in the chain. The index is marked "unusable"
> for all those existing transactions which can potentially see any
> intermediate tuples in the chain.

I don't think that works --- what if the last tuple in the chain isn't
committed good yet?  If its inserter ultimately rolls back, you've
indexed the wrong value.


I am confused. How could we get ShareLock on the relation while
there is some open transaction which has inserted a tuple in the
table ? (Of course, I am not considering the system tables here)
If the transaction which inserted the last tuple in the chain is already
aborted, we are not indexing that tuple (even if that tuple is at the
end). We would rather index the last committed-good tuple in the
chain. Running the tuples with HeapTupleSatisfiesVacuum guarantees
that. Isn't it ?

> Please see this document written by Greg Stark. He has nicely summarized
> how CREATE INDEX and CREATE INDEX CONCURRENTLY works with HOT.
> http://archives.postgresql.org/pgsql-patches/2007-07/msg00360.php

Isn't the extra machination for C.I.C. just useless complication?
What's the point of avoiding creation of new broken HOT chains when
you still have to deal with existing ones?


IMHO the extra step in C.I.C simplifies the index build. The transaction-waits
takes care of the existing broken chains and the early catalog entry for
the index helps us avoid creating new broken chains. I am not against
doing it a different way, but this is the best solution we could come up
when we worked on CIC.
 

>> I also don't think I
>> believe the reasoning for not indexing DELETE_IN_PROGRESS hot-updated
>> tuples: what if the index completion commits, but the concurrent delete
>> rolls back?  Then you've got a valid tuple that's not in the index.

> Since CREATE INDEX works with ShareLock on the relation, we can
> safely assume that we can't find any DELETE_IN_PROGRESS tuples except
> those deleted by our own transaction. The only exception is system relation,
> but we don't do HOT updates on system relation.

That chain of reasoning is way too shaky for me.  Essentially what
you're saying is you'll promise not to corrupt non-system indexes.
Nor am I really thrilled about having to disable HOT for system
catalogs.


I am not against supporting HOT for system catalogs. But IMHO
its not a strict requirements  because  system catalogs are small, less
frequently updated and HOT adds little value to them. If we don't have
a generic solution which works for system and non-system tables,
thats the best we can get. We can start with non-system
tables and expand its scope later.


> The only reason to redefine MaxHeapTuplesPerPage to higher side is
> because HOT allows us to accommodate more tuples per page because
> of redirect-dead line pointers.

No, it doesn't allow more tuples per page.  It might mean there can be
more line pointers than that on a page, but not more actual tuples.
The question is whether there is any real use in allowing more line
pointers than that --- the limit is already unrealistically high,
since it assumes no data content in any of the tuples.  If there is a
rationale for it then you should invent a different constant
MaxLinePointersPerPage or some such, but I really think there's no
point.


I agree. I think the current limit on MaxHeapTuplesPerPage is sufficiently
large. May be we can keep it the same and tune it later if we have numbers
to prove its worthiness.
 

> Doubling the value is chosen as a balance between heap page
> utilization, line pointer bloating and overhead for bitmap scans.

I'd say it allows a ridiculous amount of line pointer bloat.


OK. Lets keep MaxHeapTuplesPerPage unchanged.
 

>> Even if it's safe, ISTM what you're mostly accomplishing there is to
>> expend a lot of cycles while holding exclusive lock on the page, when
>> there is good reason to think that you're blocking other people who are
>> interested in using the page.  Eliminating the separation between that
>> and cleanup would also allow eliminating the separate "PD_FRAGMENTED"
>> page state.

> The reason we did it that way because repairing fragmentation seems
> much more costly that pruning. Please note that we prune a single
> chain during index fetch. Its only for heap-scans (and VACUUM) that
> we try to prune all chains in the page. So unless we are doing heap-scan,
> I am not sure if we are spending too much time holding the exclusive
> lock. I agree we don't have any specific numbers to prove that though.

If you don't have numbers proving that this extra complication has a
benefit, I'd vote for simplifying it.  The SnapshotAny case is going to
bite other people besides you in future.


OK.  So if I get you correctly, you are suggesting to acquire cleanup lock.
If we don't get that, we don't to any maintenance work. Otherwise, we prune
and repair fragmentation in one go.

A related question is: should we always  prune all chains in the
page ? I guess if we are going to repair fragmentation, it would make
more sense to do that.

> Another reasoning behind  separating these two steps is:  pruning
> requires exclusive lock whereas repairing fragmentation requires
> cleanup lock.

This is nonsense.  Those are the same lock.  If you have the former and
not the latter, it just means that you *know* there is contention for
the page.  It seems to me that performing optional maintenance work in
such a situation is completely wrong.


Oh yes, they are the same lock. The difference is the chances of getting
one is more than the other. But I agree with your argument about the contention
and maintenance work. I think we can do what you are suggesting and
then fine tune it.


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_dump --no-tablespaces patch
Next
From: Andrew Dunstan
Date:
Subject: Re: pg_dump --no-tablespaces patch