Thread: AW: Coping with huge deferred-trigger lists

AW: Coping with huge deferred-trigger lists

From: Zeugswetter Andreas SB
> Perhaps instead
> of storing each and every trigger-related tuple in memory, we only need
> to store one value per affected table: the lowest CTID of any tuple
> that we need to revisit for deferred-trigger purposes.  At the end of
> the transaction, scan forward from that point to the end of the table,
> looking for tuples that were inserted by the current xact.

I thought that the current placing of new rows at the end of the file is
subject to change soon (overwriting smgr)?

I thus think it would be better to remember all ctids per table.
The rest imho sounds great.

Andreas


Re: AW: Coping with huge deferred-trigger lists

From: Tom Lane
Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
>> Perhaps instead
>> of storing each and every trigger-related tuple in memory, we only need
>> to store one value per affected table: the lowest CTID of any tuple
>> that we need to revisit for deferred-trigger purposes.  At the end of
>> the transaction, scan forward from that point to the end of the table,
>> looking for tuples that were inserted by the current xact.

> I thought that the current placing of new rows at the end of the file is
> subject to change soon (overwriting smgr)?

Well, the scheme would still *work* if rows were not always placed at
the end of file, though it might get inefficient.  But you're right, the
merits of this trigger idea depend a lot on whether we decide to go to
an overwriting smgr, and so we should probably wait till that's decided
before we think about doing this.  I just wanted to get the idea
recorded before I forgot about it.
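
For concreteness, here is a minimal standalone C sketch of the
lowest-position-per-table idea.  It is only an illustrative model, not
backend code; every type and function name in it is invented.

/* Standalone model of the "lowest CTID per table" idea: instead of queuing
 * every trigger event, remember only the lowest position of any tuple we
 * must revisit, then rescan from there at commit time. */
#include <stdio.h>

typedef unsigned int Xid;

typedef struct
{
    Xid  xmin;          /* inserting transaction */
    int  needs_check;   /* would a deferred trigger have to look at this? */
} ModelTuple;

typedef struct
{
    ModelTuple tuples[1000];
    int        ntuples;
    int        lowest_deferred;   /* -1 means nothing to revisit */
} ModelTable;

static void
model_insert(ModelTable *t, Xid xid)
{
    int pos = t->ntuples++;

    t->tuples[pos].xmin = xid;
    t->tuples[pos].needs_check = 1;
    if (t->lowest_deferred < 0)
        t->lowest_deferred = pos;   /* remember only the first such position */
}

static void
model_commit_scan(ModelTable *t, Xid current_xid)
{
    if (t->lowest_deferred < 0)
        return;
    /* Scan forward from the remembered point to the end of the table,
     * firing deferred checks only for tuples inserted by this xact. */
    for (int pos = t->lowest_deferred; pos < t->ntuples; pos++)
        if (t->tuples[pos].xmin == current_xid && t->tuples[pos].needs_check)
            printf("deferred check for tuple at position %d\n", pos);
    t->lowest_deferred = -1;
}

int
main(void)
{
    ModelTable t = {.ntuples = 0, .lowest_deferred = -1};

    model_insert(&t, 42);
    model_insert(&t, 42);
    model_commit_scan(&t, 42);
    return 0;
}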

BTW, I don't think the overwriting-smgr idea is a done deal.  We haven't
seen any design yet for exactly how it should work.  Moreover, I'm
really hesitant to throw away one of the fundamental design choices of
Postgres: the non-overwriting smgr is one of the things that got us to where we
are today.  Before we commit to that, we ought to do some serious study
of the alternatives.  ISTM the problem with VACUUM is not that you need
to do a regular maintenance procedure, it's that it grabs an exclusive
lock on the table for so long.  We could live with VACUUM if it could be
made either incremental (do a few pages and release the lock) or capable
of running in parallel with reader & writer transactions.  Vadim's
still-not-integrated LAZY VACUUM code is an indicator that progress
might be made in that direction.  (Actually, I suppose if you look at it
in the right way, you might think that a backgroundable VACUUM *is* an
overwriting smgr, just an asynchronous implementation of it...)
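
As a rough illustration of the incremental variant (vacuum a few pages,
release the lock, repeat), here is a toy standalone sketch.  The lock and
page routines are mere stand-ins, not real backend calls.

/* Toy standalone illustration of incremental VACUUM: process a small batch
 * of pages, release the lock, and let other transactions in before
 * continuing. */
#include <stdio.h>

#define PAGES_PER_BATCH 8

static void lock_table(void)    { puts("lock acquired"); }
static void unlock_table(void)  { puts("lock released"); }
static void vacuum_page(int p)  { printf("vacuumed page %d\n", p); }

static void
incremental_vacuum(int total_pages)
{
    int next = 0;

    while (next < total_pages)
    {
        lock_table();
        /* Hold the lock only long enough for a small batch of pages. */
        for (int i = 0; i < PAGES_PER_BATCH && next < total_pages; i++, next++)
            vacuum_page(next);
        unlock_table();
        /* Readers and writers can run here before the next batch. */
    }
}

int
main(void)
{
    incremental_vacuum(20);
    return 0;
}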

> I thus think it would be better to remember all ctids per table.

If we do that then we still have a problem with overrunning memory
after a sufficiently large number of tuples.  However, that'd improve
the constant factor by at least an order of magnitude, so it might be
worth doing as an intermediate step.  Still have to figure out whether
the triggered-data-change business is significant or not.
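
To illustrate the constant-factor point above, here is a back-of-the-envelope
standalone comparison; both struct layouts are invented, so the exact numbers
are only indicative.

/* A per-event queue entry carries much more than the 6 bytes of ctid
 * payload, so a ctid-only list per table is far smaller. */
#include <stdio.h>

/* A ctid is a (block number, line number) pair. */
typedef struct
{
    unsigned int   block;
    unsigned short offset;
} ModelCtid;

/* A hypothetical full deferred-event entry: old and new ctids plus trigger
 * and relation identifiers, event type, and list linkage. */
typedef struct ModelEvent
{
    ModelCtid    old_ctid;
    ModelCtid    new_ctid;
    unsigned int trigger_oid;
    unsigned int relation_oid;
    int          event_type;
    struct ModelEvent *next;
} ModelEvent;

int
main(void)
{
    const long n = 10L * 1000 * 1000;   /* ten million affected tuples */

    printf("per-event entries: about %ld MB\n",
           n * (long) sizeof(ModelEvent) / (1024 * 1024));
    printf("ctid-only entries: about %ld MB\n",
           n * (long) sizeof(ModelCtid) / (1024 * 1024));
    return 0;
}
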
        regards, tom lane


Re: AW: Coping with huge deferred-trigger lists

From: Bruce Momjian
> BTW, I don't think the overwriting-smgr idea is a done deal.  We haven't
> seen any design yet for exactly how it should work.  Moreover, I'm
> really hesitant to throw away one of the fundamental design choices of
> Postgres: the non-overwriting smgr is one of the things that got us to where we
> are today.  Before we commit to that, we ought to do some serious study
> of the alternatives.  ISTM the problem with VACUUM is not that you need
> to do a regular maintenance procedure, it's that it grabs an exclusive
> lock on the table for so long.  We could live with VACUUM if it could be
> made either incremental (do a few pages and release the lock) or capable
> of running in parallel with reader & writer transactions.  Vadim's
> still-not-integrated LAZY VACUUM code is an indicator that progress
> might be made in that direction.  (Actually, I suppose if you look at it
> in the right way, you might think that a backgroundable VACUUM *is* an
> overwriting smgr, just an asynchronous implementation of it...)

I agree overwriting storage manager is not a done deal, and I don't see
us eliminating it entirely.  We have to keep the old tuples in scope, so
I assume we would just create new tuples, and reuse the expired tuples
once they were out of scope.
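
Here is a toy standalone model of that reuse idea, with all structures and
names invented for illustration: an insert first looks for an expired slot
whose deleter is older than every running transaction, and extends the table
only if none is found.

#include <stdio.h>

typedef unsigned int Xid;

#define INVALID_XID 0

typedef struct
{
    Xid xmin;   /* inserting xact */
    Xid xmax;   /* deleting xact, or INVALID_XID while the tuple is live */
} Slot;

typedef struct
{
    Slot slots[100];
    int  nslots;
} SlotTable;

/* A slot is reusable only if its deleter is older than the oldest
 * transaction that might still need to see the expired version. */
static int
find_reusable(SlotTable *t, Xid oldest_running)
{
    for (int i = 0; i < t->nslots; i++)
        if (t->slots[i].xmax != INVALID_XID && t->slots[i].xmax < oldest_running)
            return i;
    return -1;
}

static int
insert_tuple(SlotTable *t, Xid xid, Xid oldest_running)
{
    int slot = find_reusable(t, oldest_running);

    if (slot < 0)
        slot = t->nslots++;          /* nothing reusable: extend the table */
    t->slots[slot].xmin = xid;
    t->slots[slot].xmax = INVALID_XID;
    return slot;
}

int
main(void)
{
    SlotTable t = {.nslots = 0};
    int a = insert_tuple(&t, 10, 10);   /* goes into a fresh slot */

    t.slots[a].xmax = 11;               /* later deleted by xact 11 */
    int b = insert_tuple(&t, 20, 15);   /* xact 11 is out of scope: reuse */
    printf("first insert used slot %d, second insert reused slot %d\n", a, b);
    return 0;
}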

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: AW: Coping with huge deferred-trigger lists

From: Hannu Krosing
Tom Lane wrote:
> 
> BTW, I don't think the overwriting-smgr idea is a done deal.  We haven't
> seen any design yet for exactly how it should work.  Moreover, I'm
> really hesitant to throw away one of the fundamental design choices of
> Postgres: the non-overwriting smgr is one of the things that got us to where we
> are today.  Before we commit to that, we ought to do some serious study
> of the alternatives.  ISTM the problem with VACUUM is not that you need
> to do a regular maintenance procedure, it's that it grabs an exclusive
> lock on the table for so long.  We could live with VACUUM if it could be
> made either incremental (do a few pages and release the lock) or capable
> of running in parallel with reader & writer transactions.  Vadim's
> still-not-integrated LAZY VACUUM code is an indicator that progress
> might be made in that direction.  (Actually, I suppose if you look at it
> in the right way, you might think that a backgroundable VACUUM *is* an
> overwriting smgr, just an asynchronous implementation of it...)

And it allows the writes that need to be done quickly to be kept together,
while the slow part runs asynchronously. I suspect that we will never be
able to get very good statistics without a separate ANALYZE, so we will
have asynchronous processes anyhow.

Also, we might want to get time travel back sometime, which I guess is
still done most effectively with the current scheme plus having VACUUM
keep some history on a per-table basis.

Other than that, time travel only ;) needs recording the wall-clock time
of commits that have modified data, plus some extended query features.

The (wall-clock-time, xid) table is naturally ordered by that wall-clock
time, so it won't even need an index, just a binary-search access method.
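
A small standalone sketch of that point, with invented types: a commit log
appended in time order answers "as of time T" lookups with a plain binary
search.

#include <stdio.h>
#include <time.h>

typedef struct
{
    time_t       commit_time;
    unsigned int xid;
} CommitLogEntry;

/* Return the index of the last entry committed at or before "asof",
 * or -1 if nothing that old exists. */
static int
commitlog_search(const CommitLogEntry *log, int n, time_t asof)
{
    int lo = 0,
        hi = n - 1,
        best = -1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (log[mid].commit_time <= asof)
        {
            best = mid;
            lo = mid + 1;
        }
        else
            hi = mid - 1;
    }
    return best;
}

int
main(void)
{
    CommitLogEntry log[] = {
        {1000, 501}, {1010, 502}, {1025, 505}, {1100, 510},
    };
    int i = commitlog_search(log, 4, (time_t) 1030);

    if (i >= 0)
        printf("as of t=1030, last visible commit is xid %u\n", log[i].xid);
    return 0;
}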

------------------
Hannu


Re: AW: Coping with huge deferred-trigger lists

From: Stephan Szabo
> If we do that then we still have a problem with overrunning memory
> after a sufficiently large number of tuples.  However, that'd improve
> the constant factor by at least an order of magnitude, so it might be
> worth doing as an intermediate step.  Still have to figure out whether
> the triggered-data-change business is significant or not.

I think that was part of the misunderstanding of the spec.  I think the
spec means it to apply within one statement (and its associated immediate
actions) rather than the rest of the transaction.  I think it's mostly to
prevent loop cases: A row 1 modifies B row 1, which modifies A row 1,
which modifies ...  However, I only looked at it briefly a while back.



Re: AW: Coping with huge deferred-trigger lists

From: Hiroshi Inoue
Tom Lane wrote:
> 
> Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
> >> Perhaps instead
> >> of storing each and every trigger-related tuple in memory, we only need
> >> to store one value per affected table: the lowest CTID of any tuple
> >> that we need to revisit for deferred-trigger purposes.  At the end of
> >> the transaction, scan forward from that point to the end of the table,
> >> looking for tuples that were inserted by the current xact.
> 
> > I thought that the current placing of new rows at the end of the file is
> > subject to change soon (overwriting smgr)?
> 
> Well, the scheme would still *work* if rows were not always placed at
> the end of file, though it might get inefficient.

Even under current smgr, new rows aren't necessarily at the end.

[snip]

> 
> BTW, I don't think the overwriting-smgr idea is a done deal.  We haven't
> seen any design yet for exactly how it should work.  Moreover, I'm
> really hesitant to throw away one of the fundamental design choices of
> Postgres: the non-overwriting smgr is one of the things that got us to where we
> are today. 

I don't think we could or should introduce an overwriting smgr
in 7.2 unless we give up the current level of stability and
reliability. We don't even have UNDO functionality yet under
the current, simple no-overwrite smgr.

> Before we commit to that, we ought to do some serious study
> of the alternatives.  ISTM the problem with VACUUM is not that you need
> to do a regular maintenance procedure, it's that it grabs an exclusive
> lock on the table for so long.  We could live with VACUUM if it could be
> made either incremental (do a few pages and release the lock) or capable
> of running in parallel with reader & writer transactions.  Vadim's
> still-not-integrated LAZY VACUUM code is an indicator that progress
> might be made in that direction.  (Actually, I suppose if you look at it
> in the right way, you might think that a backgroundable VACUUM *is* an
> overwriting smgr, just an asynchronous implementation of it...)
> 

A backgroundable VACUUM, reuse of dead space, etc. could never
be an overwriting smgr. When a tuple is updated, corresponding
index tuples must always be inserted.

regards,
Hiroshi Inoue


Re: AW: Coping with huge deferred-trigger lists

From: Tom Lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> I thought that the current placing of new rows at the end of the file is
>> subject to change soon (overwriting smgr)?

> Even under current smgr, new rows aren't necessarily at the end.

Hmm ... you're right, heap_update will try to store an updated tuple on
the same page as its original.

That doesn't make my suggestion unworkable, however, since this case is
not very likely to occur except on pages at/near the end of file.  One
way to deal with it is to keep a list of pages (still not individual
tuples) that contain tuples we need to revisit for deferred triggers.
The list would be of the form "scan these individual pages plus all
pages from point X to the end of file", where point X would be at or
perhaps a little before the end of file as it stood at the start of the
transaction.  We'd only need to explicitly store the page numbers for
relatively few pages, usually.
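
Here is a standalone sketch of that structure, with invented names and sizes:
a short explicit page list plus the point-X-to-end range, both consulted at
end of transaction.

#include <stdbool.h>
#include <stdio.h>

#define MAX_EXPLICIT_PAGES 32

typedef struct
{
    unsigned int explicit_pages[MAX_EXPLICIT_PAGES];
    int          nexplicit;     /* individually remembered pages */
    unsigned int scan_from;     /* point X: scan from this page to EOF */
} DeferredPageList;

/* Remember a page that received a new tuple below point X (for example,
 * heap_update placing the new version on the old tuple's page). */
static bool
remember_page(DeferredPageList *list, unsigned int page)
{
    if (page >= list->scan_from)
        return true;            /* already covered by the tail range */
    for (int i = 0; i < list->nexplicit; i++)
        if (list->explicit_pages[i] == page)
            return true;        /* already listed */
    if (list->nexplicit == MAX_EXPLICIT_PAGES)
        return false;           /* overflow: caller must fall back somehow */
    list->explicit_pages[list->nexplicit++] = page;
    return true;
}

/* At end of transaction: visit the listed pages, then everything from
 * point X through the current end of file. */
static void
scan_for_deferred_triggers(const DeferredPageList *list,
                           unsigned int pages_in_file)
{
    for (int i = 0; i < list->nexplicit; i++)
        printf("revisit page %u\n", list->explicit_pages[i]);
    for (unsigned int p = list->scan_from; p < pages_in_file; p++)
        printf("revisit page %u\n", p);
}

int
main(void)
{
    DeferredPageList list = {.nexplicit = 0, .scan_from = 90};

    remember_page(&list, 17);   /* updated tuple stored on an early page */
    remember_page(&list, 95);   /* covered by the tail range already */
    scan_for_deferred_triggers(&list, 100);
    return 0;
}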

BTW, thanks for pointing that out --- it validates my idea in another
thread that we can avoid locking on every single call to
RelationGetBufferForTuple, if it's OK to store newly inserted tuples
on pages that aren't necessarily last in the file.
        regards, tom lane