Thread: Performance-improvement idea: shortcircuit unique-index checks

Performance-improvement idea: shortcircuit unique-index checks

From
Tom Lane
Date:
Over the weekend I noticed that Tatsuo's pgbench benchmark seems to
spend an awful lot of its time down inside _bt_check_unique.  This
happens because with table declarations like

create table accounts(aid int, primary key(aid), bid int,        abalance int, filler char(84)) 

and commands like

update accounts set abalance = abalance + x where aid = y

the "update" is inserting a new tuple and the index on aid wants to
make sure this insertion doesn't violate the uniqueness constraint.
To do that, it has to visit all active *and dead* tuples with the same
aid index value, to make sure they're all deleted or being deleted by
the current transaction.  That's expensive if they're scattered all over
the table.

However, since we have not changed the aid column from its prior value,
it seems like this check is wasted effort.  We should be able to deduce
that if the prior state of the row was OK then this one is too.

I'm not quite sure how to implement this, but I wanted to toss the idea
out for discussion.  Probably we'd have to have some cooperation between
the heap_update level (where the fact that it's an update is known, and
where we'd have a chance to test for changes in particular columns) and
the index access level.  Maybe it's wrong for the index access level to
have primary responsibility for uniqueness checks in the first place.

Obviously this isn't going to happen for 7.1, but it might make a nice
performance improvement for 7.2.

Comments?
        regards, tom lane


Re: Performance-improvement idea: shortcircuit unique-index checks

From
Bruce Momjian
Date:
> I'm not quite sure how to implement this, but I wanted to toss the idea
> out for discussion.  Probably we'd have to have some cooperation between
> the heap_update level (where the fact that it's an update is known, and
> where we'd have a chance to test for changes in particular columns) and
> the index access level.  Maybe it's wrong for the index access level to
> have primary responsibility for uniqueness checks in the first place.
> 
> Obviously this isn't going to happen for 7.1, but it might make a nice
> performance improvement for 7.2.

Seems a better solution would be to put a 'deleted' bit in the index so
we would have to visit those heap tuples only once for a committed
status.  Similar to what we do with heap tuples so we don't have to
visit pg_log repeatedly.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Performance-improvement idea: shortcircuit unique-index checks

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Seems a better solution would be to put a 'deleted' bit in the index so
> we would have to visit those heap tuples only once for a committed
> status.  Similar to what we do with heap tuples so we don't have to
> visit pg_log repeatedly.

That's only a partial solution, since the index is still going to have
to visit the row's existing tuple (which is, by definition, not yet
committed dead).  My point is that the index scanning done for
uniqueness checks can be eliminated *entirely* for one pretty-common
case.

A deleted bit in index entries might be useful too, but I think it
attacks a different set of cases.
        regards, tom lane


Re: Performance-improvement idea: shortcircuit unique-index checks

From
Stephan Szabo
Date:
On Mon, 19 Feb 2001, Tom Lane wrote:

> I'm not quite sure how to implement this, but I wanted to toss the idea
> out for discussion.  Probably we'd have to have some cooperation between
> the heap_update level (where the fact that it's an update is known, and
> where we'd have a chance to test for changes in particular columns) and
> the index access level.  Maybe it's wrong for the index access level to
> have primary responsibility for uniqueness checks in the first place.
> 
> Obviously this isn't going to happen for 7.1, but it might make a nice
> performance improvement for 7.2.
> 
> Comments?

This sounds like a win for alot of updates where keys don't change.

Also, if work is going to be done here, it might be nice to make the
unique constraint have the correct semantics for checking after statement
rather than per-row when multiple rows are changed in the same statement
since I'm pretty sure the standard semantics is that as long as the
rows are different at the end of the statement it's okay (which is
not what we do currently AFAICS).  I'm really not sure what's involved in
that though.



Re: Performance-improvement idea: shortcircuit unique-index checks

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Seems a better solution would be to put a 'deleted' bit in the index so
> > we would have to visit those heap tuples only once for a committed
> > status.  Similar to what we do with heap tuples so we don't have to
> > visit pg_log repeatedly.
> 
> That's only a partial solution, since the index is still going to have
> to visit the row's existing tuple (which is, by definition, not yet
> committed dead).  My point is that the index scanning done for
> uniqueness checks can be eliminated *entirely* for one pretty-common
> case.

I see.

> 
> A deleted bit in index entries might be useful too, but I think it
> attacks a different set of cases.

Yes.  Let me add some TODO items:
* Add deleted bit to index tuples to reduce heap access   * Prevent index uniqueness checks when UPDATE does not modify
column

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Performance-improvement idea: shortcircuit unique-indexchecks

From
Hiroshi Inoue
Date:
Bruce Momjian wrote:
> 
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> Yes.  Let me add some TODO items:
> 
>         * Add deleted bit to index tuples to reduce heap access

ISTM this isn't a bad idea. However note that there remains only
1 bit unused in IndexTupleData.

Regards,
Hiroshi Inoue