On Wed, 2008-05-28 at 22:45 +0100, Simon Riggs wrote:
> On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote:
> > Gregory Stark <stark@enterprisedb.com> writes:
> > > "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> > >> This is expected to take lots of memory because each row-requiring-check
> > >> generates an entry in the pending trigger event list.
> >
> > > Hm, it occurs to me that we could still do a join against the pending event
> > > trigger list... I wonder how feasible it would be to store the pending trigger
> > > event list in a temporary table instead of in ram.
> >
> > We could make that list spill to disk, but the problem remains that
> > verifying the rows one at a time will take forever.
> >
> > The idea that's been kicked around occasionally is that once you get
> > past N pending events, throw them all away and instead queue a single
> > operation to do a bulk verify (just like initial establishment of the
> > FK constraint). I'm not sure how to do the queue management for this
> > though.
>
> Neither of those approaches is really suitable. Just spilling to disk is
> O(N) of the number of rows loaded, the second one is O(N) at least on
> the number of rows (loaded + existing). The second one doesn't help
> either since if the table was empty you'd have added the FK afterwards,
> so we must assume there is already rows in there and in most cases rows
> already loaded will exceed those being added by the bulk operation.
>
> AFAICS we must aggregate the trigger checks. We would need a special
> property of triggers that allowed them to be aggregated when two similar
> checks arrived. We can then use hash aggregation to accumulate them. We
> might conceivably need to spill to disk also, since the aggregation may
> not always be effective.
Can't we just do the checks for the FKs accumulated at the point they
don't fit in memory, instead of spilling to disk ?
> But in most cases the tables against which FK
> checks are made are significantly smaller than the tables being loaded.
> Once we have hash aggregated them, that is then the first part of a hash
> join to the target table.
>
> We certainly need a TODO item for "improve RI checks during bulk
> operations".
Agreed.
----------------
Hannu