Thread: Reducing the memory footprint of large sets of pending triggers
We've occasionally talked about allowing pending-trigger-event lists to spill to disk when there get to be so many events that it's a memory problem. I'm not especially interested in doing that right now, but I noticed recently that we could alleviate the problem a lot by adopting a more compact representation.

Currently, each event is a separately palloc'd instance of struct AfterTriggerEventData. On a 32-bit machine that struct takes 32 bytes, plus 8 bytes palloc overhead = 40 bytes. On a 64-bit machine the struct takes 36 bytes, but palloc rounds that up to 64 bytes, plus there's 16 bytes palloc overhead = 80 bytes :-(.

I see several things we could do here:

* Allocate the event structs in reasonably-large arrays instead of separate palloc chunks. This would require some data copying where we now get away with pointer-swinging --- but on the other hand per-event palloc'ing isn't exactly free either, so I suspect that this would net out to a wash if not an actual speedup.

* Don't store the ate_tgoid and ate_relid fields in each individual event struct. Instead, keep a separate array with one instance of these values for each distinct trigger that's been fired in the current transaction (in most cases this list should be pretty short, even if there are many events). We can commandeer the high-order bits of ate_event to store an index into that array. Currently only 8 bits of ate_event are actually used, so we'd have room for 16 million distinct triggers fired in a transaction. Even if we need a few more ate_event flag bits later, I don't see a problem there.

* Don't store two ItemPointers in insert or delete events. This would make the array element stride variable, but since we don't need random access into the arrays AFAICS, that doesn't seem to be a problem.
In combination these changes would get us down to 16 bytes per insert/delete event and 20 per update event, which represents a factor of 2 or 2.5 savings on a 32-bit machine and a factor of 4 or 5 on a 64-bit machine. Seems worth doing to me, especially since it looks like only about a 1-day project touching only a single source file.

It might be possible to go further and move the event status bits and firing_id into the separate array, which would save a further four bytes per event in the typical situation that a lot of events of the same trigger are queued by a single command. I think I'd want to tackle that as a follow-on patch though, because it would be a change in the data structure semantics, not just rearranging the representation a tad.

Comments, better ideas?

			regards, tom lane
On Thu, 2008-10-23 at 21:32 -0400, Tom Lane wrote:
> We've occasionally talked about allowing pending-trigger-event lists to
> spill to disk when there get to be so many events that it's a memory
> problem. I'm not especially interested in doing that right now, but
> I noticed recently that we could alleviate the problem a lot by adopting
> a more compact representation.

This change goes in the right direction, whereas the spill to disk probably doesn't, except for certain cases.

A much better objective would be to remove duplicate trigger calls, so there isn't any build-up of trigger data in the first place. That would apply only to immutable functions. RI checks certainly fall into that category.

I mention this now in case you have a bright idea about how to do that quickly and cheaply. Otherwise I'll be looking into that in the next dev cycle.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
> A much better objective would be to remove duplicate trigger calls, so
> there isn't any build up of trigger data in the first place. That would
> apply only to immutable functions. RI checks certainly fall into that
> category.

They're hardly "duplicates": each event is for a different tuple.

For RI checks, once you get past a certain percentage of the table it'd be better to throw away all the per-tuple events and do a full-table verification a la RI_Initial_Check(). I've got no idea about a sane way to make that happen, though.

			regards, tom lane
On Sat, 2008-10-25 at 08:48 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > A much better objective would be to remove duplicate trigger calls, so
> > there isn't any build up of trigger data in the first place. That would
> > apply only to immutable functions. RI checks certainly fall into that
> > category.
>
> They're hardly "duplicates": each event is for a different tuple.

That's what makes it hard; we may find the same trigger parameter values but on different tuples.

> For RI checks, once you get past a certain percentage of the table it'd
> be better to throw away all the per-tuple events and do a full-table
> verification a la RI_Initial_Check(). I've got no idea about a sane
> way to make that happen, though.

Me neither, yet.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> A much better objective would be to remove duplicate trigger calls, so
>> there isn't any build up of trigger data in the first place. That would
>> apply only to immutable functions. RI checks certainly fall into that
>> category.
>
> They're hardly "duplicates": each event is for a different tuple.
>
> For RI checks, once you get past a certain percentage of the table it'd
> be better to throw away all the per-tuple events and do a full-table
> verification a la RI_Initial_Check(). I've got no idea about a sane
> way to make that happen, though.

One idea I had was to accumulate the data in something like a tuplestore and then perform the RI check as a join between a Materialize node and the target table. Then we could use any join type --- hash join, nested loop, merge join, etc. --- depending on how many rows there are on each side and how many of them are distinct values.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!