Greg Sabino Mullane wrote:
> A few things spring to mind:
>
> 1) Use a separate table, rather than storing things inside of
> dataset itself. This will reduce the activity on the dataset table.
A separate table just for that one column? Would that really help,
given that I'd have to add the foreign key dataset_id to the related
table? How does splitting activity across dataset and, say,
dataset_counts help things?
> 2) Do you really need bigint for the counts?
Probably not. Still, changing to INTEGER hasn't changed the performance
in any measurable way.
> 3) If you do want to do this, you'll need a different approach as
> Tom mentioned. One way to do this is to have a special method for
> bulk loading, that gets around the normal updates and requires that
> the user take responsibility for knowing when and how to call the
> alternate path. The basic scheme is this:
>
> 1. Disable the normal triggers
> 2. Enable special (perl) triggers that keep the count in memory
> 3. Do the bulk changes
> 4. Enable normal triggers, disable special perl one
> 5. Do other things as needed....
> 6. Commit the changes to the assoc_count field.
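For anyone following along, here's roughly how I read the scheme in SQL. All the names (assoc, dataset.assoc_count, the trigger names, the PL/Perl function) are my guesses based on this thread, not anything from an actual schema:

```sql
-- 1. Disable the normal per-row counting trigger:
ALTER TABLE assoc DISABLE TRIGGER assoc_count_trigger;

-- 2. Enable a cheap replacement that only tallies in memory.
--    A PL/Perl trigger can keep the running count in %_SHARED,
--    which persists for the life of the session:
CREATE OR REPLACE FUNCTION bulk_count_assoc() RETURNS trigger AS $$
    $_SHARED{count}++;
    return;   -- return value is ignored for AFTER triggers
$$ LANGUAGE plperl;

CREATE TRIGGER assoc_bulk_count
    AFTER INSERT ON assoc
    FOR EACH ROW EXECUTE PROCEDURE bulk_count_assoc();

-- 3. Do the bulk changes (COPY, INSERT ... SELECT, etc.)

-- 4. Swap the triggers back:
DROP TRIGGER assoc_bulk_count ON assoc;
ALTER TABLE assoc ENABLE TRIGGER assoc_count_trigger;

-- 5. Do other things as needed...

-- 6. Commit the tally to the real column via another PL/Perl
--    function that reads $_SHARED{count} and issues the UPDATE
--    against dataset.assoc_count.
```

The key point, as I understand it, is that steps 2–4 avoid one UPDATE of the dataset row per inserted assoc row; the count only touches the table once, in step 6.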
I gave this a shot, and my initial testing looks very promising. Using
your scheme, the performance looks to be just as good as without the
trigger.
I haven't done any kind of edge case testing to see if weird things
happen when multiple datasets are added simultaneously, or when someone
inserts an assoc record out-of-band while a bulk dataset load is
happening, but you've certainly got me well on my way to a workable
solution. Many thanks!
There's one part I don't get, though...
> Number 6 can be done anytime, as long as you are in the same session. The danger
> is in leaving the session without calling the final function. This can be
> solved with some deferred FK trickery, or by careful scripting of the events.
> All this doesn't completely remove the pain, but it may shift it around enough
> in useful ways for your app.
I'm not sure I understand the danger you're talking about here. Doesn't
putting the whole start_bulkload_assoc_count(), bulk insert, and
end_bulkload_assoc_count() process in a transaction save me from any of
these problems? Or is there some more subtle problem I'm not seeing?
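Concretely, what I'm picturing is just this (the COPY is only a stand-in for whatever the actual bulk insert is):

```sql
BEGIN;
SELECT start_bulkload_assoc_count();
COPY assoc FROM '/tmp/assoc.dat';   -- the bulk load itself
SELECT end_bulkload_assoc_count();  -- flushes the tally to assoc_count
COMMIT;
```

If the script dies anywhere in the middle, the whole thing rolls back, triggers and all, so I don't see where an inconsistent count could sneak in.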