Thread: Reducing the memory footprint of large sets of pending triggers

Reducing the memory footprint of large sets of pending triggers

From: Tom Lane
Date: Thu, 2008-10-23 21:32 -0400
We've occasionally talked about allowing pending-trigger-event lists to
spill to disk when there get to be so many events that it's a memory
problem.  I'm not especially interested in doing that right now, but
I noticed recently that we could alleviate the problem a lot by adopting
a more compact representation.

Currently, each event is a separately palloc'd instance of struct
AfterTriggerEventData.  On a 32-bit machine that struct takes 32 bytes,
plus 8 bytes palloc overhead = 40 bytes.  On a 64-bit machine the struct
takes 36 bytes, but palloc rounds that up to 64 bytes, plus there's 16
bytes palloc overhead = 80 bytes :-(.
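
[Editorial note: for reference, the struct being discussed looks roughly
like this; the field layout below is paraphrased from
src/backend/commands/trigger.c of that era and should be read as an
approximation, not a quotation.]

    #include "postgres.h"           /* Oid, CommandId, uint32 */
    #include "storage/itemptr.h"    /* ItemPointerData */

    typedef uint32 TriggerEvent;    /* event type and status flag bits */

    /* Each pending event is a separately palloc'd node in a linked list. */
    typedef struct AfterTriggerEventData *AfterTriggerEvent;

    typedef struct AfterTriggerEventData
    {
        AfterTriggerEvent ate_next;       /* list link (4 or 8 bytes) */
        TriggerEvent    ate_event;        /* event type and status bits */
        Oid             ate_tgoid;        /* OID of the trigger to fire */
        Oid             ate_relid;        /* OID of the relation it is on */
        CommandId       ate_firing_id;    /* firing cycle that queued it */
        ItemPointerData ate_oldctid;      /* old tuple (UPDATE/DELETE) */
        ItemPointerData ate_newctid;      /* new tuple (UPDATE/INSERT) */
    } AfterTriggerEventData;

Because every node goes through palloc individually, the per-event cost is
dominated by allocator round-up and chunk headers rather than by the payload
itself, which is what the numbers above reflect.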

I see several things we could do here:

* Allocate the event structs in reasonably-large arrays instead of
separate palloc chunks.  This would require some data copying where we
now get away with pointer-swinging --- but on the other hand per-event
palloc'ing isn't exactly free either, so I suspect that this would net
out to a wash if not an actual speedup.

* Don't store the ate_tgoid and ate_relid fields in each individual
event struct.  Instead keep a separate array with one instance of these
values for each distinct trigger that's been fired in the current
transaction (in most cases this list should be pretty short, even if
there are many events).  We can commandeer the high order bits of
ate_event to store an index into that array.  Currently only 8 bits
of ate_event are actually used, so we'd have room for 16 million
distinct triggers fired in a transaction.  Even if we need a few more
ate_event flag bits later, I don't see a problem there.

* Don't store two ItemPointers in insert or delete events.  This would
make the array element stride variable, but since we don't need random
access into the arrays AFAICS, that doesn't seem to be a problem.
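
[Editorial note: a rough sketch of how the three ideas above might fit
together; the names, macros, and exact layout here are illustrative only,
not the actual patch.]

    #include "postgres.h"           /* Oid, CommandId, uint32 */
    #include "storage/itemptr.h"    /* ItemPointerData */

    typedef uint32 TriggerEvent;

    /*
     * Hypothetical split of ate_event: the low 8 bits keep the existing
     * type/status flags, the high 24 bits index a per-(sub)transaction
     * array of shared data, leaving room for ~16 million distinct triggers.
     */
    #define AFTER_TRIGGER_FLAG_MASK     0x000000FFU
    #define AFTER_TRIGGER_SHARED_SHIFT  8

    /* One entry per distinct trigger fired, instead of one per event. */
    typedef struct AfterTriggerSharedData
    {
        Oid         ats_tgoid;          /* trigger to fire */
        Oid         ats_relid;          /* relation it is on */
    } AfterTriggerSharedData;

    /*
     * Per-event data.  UPDATE events carry two ItemPointers (20 bytes);
     * INSERT/DELETE events use the shorter variant (16 bytes after
     * alignment padding), so the stride within a chunk varies and the
     * queue is only ever walked sequentially.
     */
    typedef struct AfterTriggerEventData
    {
        TriggerEvent    ate_flags;      /* flags + shared-data index */
        CommandId       ate_firing_id;  /* firing cycle */
        ItemPointerData ate_ctid1;      /* old tuple */
        ItemPointerData ate_ctid2;      /* new tuple (UPDATE only) */
    } AfterTriggerEventData;

    typedef struct AfterTriggerEventDataOneCtid
    {
        TriggerEvent    ate_flags;
        CommandId       ate_firing_id;
        ItemPointerData ate_ctid1;      /* the only tuple (INSERT/DELETE) */
    } AfterTriggerEventDataOneCtid;

    /*
     * Events are appended into large palloc'd chunks rather than palloc'd
     * one at a time, eliminating per-event allocator overhead and round-up.
     */
    typedef struct AfterTriggerEventChunk
    {
        struct AfterTriggerEventChunk *next;
        char       *freeptr;            /* next free byte within data[] */
        char       *endptr;             /* one past the end of data[] */
        char        data[FLEXIBLE_ARRAY_MEMBER];
    } AfterTriggerEventChunk;

Dropping the list link and the two per-event OIDs is what produces the
per-event sizes quoted in the next paragraph.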

In combination these changes would get us down to 16 bytes per
insert/delete and 20 per update event, which represents a factor of 2
or 2.5 savings on a 32-bit machine and a factor of 4 or 5 on a 64-bit
machine.  Seems worth doing to me, especially since it looks like
only about a 1-day project touching only a single source file.

It might be possible to go further and move the event status bits and
firing_id into the separate array, which would save a further four bytes
per event in the typical situation that a lot of events of the same
trigger are queued by a single command.  I think I'd want to tackle that
as a follow-on patch though, because it would be a change in the data
structure semantics not just rearranging the representation a tad.

Comments, better ideas?
        regards, tom lane


Re: Reducing the memory footprint of large sets of pending triggers

From: Simon Riggs
Date:
On Thu, 2008-10-23 at 21:32 -0400, Tom Lane wrote:

> We've occasionally talked about allowing pending-trigger-event lists to
> spill to disk when there get to be so many events that it's a memory
> problem.  I'm not especially interested in doing that right now, but
> I noticed recently that we could alleviate the problem a lot by adopting
> a more compact representation.

This change goes in the right direction, whereas the spill to disk
probably doesn't, except for certain cases.

A much better objective would be to remove duplicate trigger calls, so
there isn't any build up of trigger data in the first place. That would
apply only to immutable functions. RI checks certainly fall into that
category.

I mention this now in case you have a bright idea about how to do that
quickly and cheaply. Otherwise I'll be looking into that in the next dev
cycle.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Reducing the memory footprint of large sets of pending triggers

From: Tom Lane
Date: Sat, 2008-10-25 08:48 -0400
Simon Riggs <simon@2ndQuadrant.com> writes:
> A much better objective would be to remove duplicate trigger calls, so
> there isn't any build up of trigger data in the first place. That would
> apply only to immutable functions. RI checks certainly fall into that
> category.

They're hardly "duplicates": each event is for a different tuple.

For RI checks, once you get past a certain percentage of the table it'd
be better to throw away all the per-tuple events and do a full-table
verification a la RI_Initial_Check().  I've got no idea about a sane
way to make that happen, though.
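
[Editorial note: for readers unfamiliar with RI_Initial_Check() (in
src/backend/utils/adt/ri_triggers.c): it validates an entire foreign-key
constraint with a single set-oriented query run through SPI, roughly of the
shape sketched below; the table and column names are placeholders.]

    /*
     * Illustrative only: the kind of whole-table validation query that
     * RI_Initial_Check() builds and executes via SPI.  Any row returned
     * is an FK violation; an empty result proves the constraint holds
     * for the whole table in one pass, with no per-tuple events at all.
     */
    static const char *ri_full_check_query =
        "SELECT fk.keycol FROM ONLY fktable fk "
        "LEFT OUTER JOIN ONLY pktable pk ON pk.keycol = fk.keycol "
        "WHERE pk.keycol IS NULL AND fk.keycol IS NOT NULL";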
        regards, tom lane


Re: Reducing the memory footprint of large sets of pending triggers

From: Simon Riggs
Date:
On Sat, 2008-10-25 at 08:48 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > A much better objective would be to remove duplicate trigger calls, so
> > there isn't any build up of trigger data in the first place. That would
> > apply only to immutable functions. RI checks certainly fall into that
> > category.
> 
> They're hardly "duplicates": each event is for a different tuple.

That's what makes it hard; we may find the same trigger parameter values
but on different tuples.

> For RI checks, once you get past a certain percentage of the table it'd
> be better to throw away all the per-tuple events and do a full-table
> verification a la RI_Initial_Check().  I've got no idea about a sane
> way to make that happen, though.

Me neither, yet.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Reducing the memory footprint of large sets of pending triggers

From: Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Simon Riggs <simon@2ndQuadrant.com> writes:
>> A much better objective would be to remove duplicate trigger calls, so
>> there isn't any build up of trigger data in the first place. That would
>> apply only to immutable functions. RI checks certainly fall into that
>> category.
>
> They're hardly "duplicates": each event is for a different tuple.
>
> For RI checks, once you get past a certain percentage of the table it'd
> be better to throw away all the per-tuple events and do a full-table
> verification a la RI_Initial_Check().  I've got no idea about a sane
> way to make that happen, though.

One idea I had was to accumulate the data in something like a tuplestore and
then perform the RI check as a join between a materialize node and the target
table. Then we could use any join type (hash join, nested loop, merge join,
etc.) depending on how many rows there are on each side and how many of them
are distinct values.
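
[Editorial note: a minimal sketch, not from the thread, of the accumulation
side of this idea using the existing tuplestore API; the function and
variable names are hypothetical, and the deferred join against the PK table
is only gestured at in the final comment.]

    #include "postgres.h"
    #include "executor/tuptable.h"      /* TupleTableSlot */
    #include "miscadmin.h"              /* work_mem */
    #include "utils/tuplestore.h"

    /* Hypothetical per-constraint queue of FK keys awaiting checking. */
    static Tuplestorestate *pending_fk_keys = NULL;

    /*
     * Instead of queuing one after-trigger event per modified row, remember
     * just the FK key columns; the tuplestore spills to disk on its own
     * once it exceeds work_mem.
     */
    static void
    remember_fk_key(TupleTableSlot *slot)
    {
        if (pending_fk_keys == NULL)
            pending_fk_keys = tuplestore_begin_heap(false,  /* no random access */
                                                    false,  /* not cross-transaction */
                                                    work_mem);
        tuplestore_puttupleslot(pending_fk_keys, slot);
    }

    /*
     * At constraint-check time the accumulated keys would be fed, via a
     * Materialize-like node, into whatever join the planner picks against
     * the referenced table (hash, merge, or nestloop).  That planner and
     * executor plumbing is the unsolved part of the proposal.
     */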

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!