Thread: Triggers During COPY

Triggers During COPY

From
Thomas F. O'Connell
Date:
I'm involved in an implementation of doing trigger-based counting as a
substitute for count( * ) in real time in an application. My
trigger-based counts seem to be working fine and dramatically improve
the performance of the display of the counts in the application layer.

The problem comes in importing new data into the tables for which the
counts are maintained. The current import process does some
preprocessing and then does a COPY from the filesystem to one of the
tables on which counts are maintained. This means that for each row
being inserted by COPY, a trigger is fired. This didn't seem like a big
deal to me until testing began on realistic data sets.

For a 5,000-record import, preprocessing plus the COPY took about 5
minutes. Once the triggers used for maintaining the counts were added,
this grew to 25 minutes. While I knew there would be a slowdown per row
affected, I expected something closer to 2x than to 5x.

It's not unrealistic for this system to require data imports on the
order of 100,000 records. Whereas this would've taken at most an hour
and a half before (preprocessing takes a couple of minutes, so the
actual original COPY takes closer to 2-3 minutes, or just over 1500
rows per minute), the new version is likely to take more than 7 hours,
which seems unreasonable to me. Additionally, the process is fairly CPU
intensive.

I've examined the plans, and, as far as I can tell, the trigger
functions are being prepared and using the indexes on the involved
tables, which are hundreds of thousands of rows in the worst cases. The
basic structure of the functions is a status lookup SELECT (to
determine whether a count needs to be updated and which one) and one or
two UPDATE statements (depending on whether both an increment and a
decrement need to be performed). As I said, it looks like this basic
format is using indexes appropriately.
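(For concreteness, the shape described above looks roughly like the following. This is only an illustrative sketch in 7.4-style plpgsql; the table, column, and function names here are invented, not the actual schema.)

```sql
-- Hypothetical sketch: status lookup, then a conditional count update.
CREATE OR REPLACE FUNCTION maintain_counts() RETURNS trigger AS '
DECLARE
    v_status text;
BEGIN
    -- Status lookup SELECT: determine whether a count needs updating,
    -- and which one.
    SELECT status INTO v_status
      FROM item_status
     WHERE item_id = NEW.item_id;

    IF v_status IS NOT NULL THEN
        -- One UPDATE per affected summary row (a second UPDATE would
        -- handle the decrement case on status changes).
        UPDATE item_counts
           SET cnt = cnt + 1
         WHERE status = v_status;
    END IF;

    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER item_count_trig
    AFTER INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE maintain_counts();
```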

Is there anything I could be overlooking that would tweak some more
performance out of this scenario?

Would it be absurd to drop the triggers during import and recreate them
afterward and update the counts in a summary update based on
information from the import process?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


Re: Triggers During COPY

From
Josh Berkus
Date:
Thomas,

> Would it be absurd to drop the triggers during import and recreate them
> afterward and update the counts in a summary update based on
> information from the import process?

That's what I'd do.

Also, might I suggest storing the counts in memcached (see the pgmemcache
project on pgFoundry) rather than in a table?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Triggers During COPY

From
Mark Kirkwood
Date:
Thomas F. O'Connell wrote:
>
> The problem comes in importing new data into the tables for which the
> counts are maintained. The current import process does some
> preprocessing and then does a COPY from the filesystem to one of the
> tables on which counts are maintained. This means that for each row
> being inserted by COPY, a trigger is fired. This didn't seem like a big
> deal to me until testing began on realistic data sets.
>
> For a 5,000-record import, preprocessing plus the COPY took about 5
> minutes. Once the triggers used for maintaining the counts were added,
> this grew to 25 minutes. While I knew there would be a slowdown per row
> affected, I expected something closer to 2x than to 5x.
> Is there anything I could be overlooking that would tweak some more
> performance out of this scenario?
>
I have been seeing similar behavior whilst testing sample code for the 8.0
docs (the summary table plpgsql trigger example).

I think the nub of the problem is dead tuple bloat in the summary /
count table: every triggered UPDATE leaves behind a dead row version,
so each additional update becomes more and more expensive as time goes
on. I suspect the performance decrease is exponential in the number of
rows to be processed.


> Would it be absurd to drop the triggers during import and recreate them
> afterward and update the counts in a summary update based on information
> from the import process?
>
>
That's the conclusion I came to :-)

regards

Mark


Re: Triggers During COPY

From
Thomas F. O'Connell
Date:
I forgot to mention that I'm running 7.4.6. The README includes the
caveat that pgmemcache is designed for use with 8.0. My instinct is to
be hesitant about using something like that in a production environment
without some confidence that others have done so reliably, or without
more extensive testing than I'm likely to have time for, primarily
because support for 7.4.x is never likely to increase. Thanks for the
tip, though.

For the time being, it sounds like I'll probably try to implement the
drop/create trigger setup during import.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jan 27, 2005, at 11:41 PM, Josh Berkus wrote:

> Thomas,
>
>> Would it be absurd to drop the triggers during import and recreate
>> them
>> afterward and update the counts in a summary update based on
>> information from the import process?
>
> That's what I'd do.
>
> Also, might I suggest storing the counts in memcached (see the
> pgmemcache
> project on pgFoundry) rather than in a table?
>
> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco


Re: Triggers During COPY

From
Thomas F. O'Connell
Date:
As far as dropping/recreating triggers, there seem to be two strategies:

1. Perform the drop-import-create operation in a transaction, thereby
guaranteeing the accuracy of the counts but presumably locking the
table during the operation, which could take many minutes (up to an
hour or two) in extreme cases.

2. Drop the triggers, import, create the triggers, and update with the
import count, recognizing that other updates could've occurred without
accumulating updates during the import process, then later (nightly,
maybe?) do a full update to recalibrate the counts. In this case the
count( * ) involved could also lock the table for a bit pending the
sequential scan(s) if the update is performed in a transaction.
Otherwise, again, there is a realistic possibility of inaccurate counts
occurring and persisting between calibrations.
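(A rough sketch of what strategy 1 might look like, assuming hypothetical names for the table, trigger, and function; the DROP TRIGGER inside the transaction is what would hold the lock for the duration.)

```sql
-- Strategy 1: drop-import-create in one transaction (placeholder names).
BEGIN;
DROP TRIGGER item_count_trig ON items;       -- takes an exclusive lock

COPY items FROM '/path/to/import.dat';       -- bulk load, no triggers fire

-- Recalculate the summary from the authoritative data while locked,
-- so the counts are exact at commit time.
UPDATE item_counts
   SET cnt = (SELECT count(*) FROM items
               WHERE items.status = item_counts.status);

CREATE TRIGGER item_count_trig
    AFTER INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE maintain_counts();
COMMIT;
```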

Is there a best practice anywhere here?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jan 27, 2005, at 11:41 PM, Josh Berkus wrote:

> Thomas,
>
>> Would it be absurd to drop the triggers during import and recreate
>> them
>> afterward and update the counts in a summary update based on
>> information from the import process?
>
> That's what I'd do.
>
> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco


Re: Triggers During COPY

From
Josh Berkus
Date:
Thomas,

> I forgot to mention that I'm running 7.4.6. The README includes the
> caveat that pgmemcache is designed for use with 8.0.

Well, you could always hire Sean to backport it.

> 1. Perform the drop-import-create operation in a transaction, thereby
> guaranteeing the accuracy of the counts but presumably locking the
> table during the operation, which could take many minutes (up to an
> hour or two) in extreme cases.

What other operations are occurring on the table concurrently with the COPY?
COPY isn't really intended to be run in parallel with regular insert/update
on the same table, AFAIK.

> 2. Drop the triggers, import, create the triggers, and update with the
> import count, recognizing that other updates could've occurred without
> accumulating updates during the import process, then later (nightly,
> maybe?) do a full update to recalibrate the counts. In this case the
> count( * ) involved could also lock the table for a bit pending the
> sequential scan(s) if the update is performed in a transaction.
> Otherwise, again, there is a realistic possibility of inaccurate counts
> occurring and persisting between calibrations.

Alternately:  bulk load the new rows into a "holding" table.  Do counts on
that table.  Then, as one transaction, drop the triggers, merge the holding
table with the live table and update the counts, and restore the triggers.
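(In outline, with invented names, that holding-table approach could look something like this; the COPY into the holding table fires no triggers, and only the merge step needs the trigger dropped.)

```sql
-- Bulk load into a trigger-free holding table first (placeholder names).
COPY items_holding FROM '/path/to/import.dat';

-- Merge and adjust counts as one transaction.
BEGIN;
DROP TRIGGER item_count_trig ON items;

INSERT INTO items SELECT * FROM items_holding;

-- Counts were taken from the holding table, so only an increment
-- per status is needed rather than a full recount.
UPDATE item_counts
   SET cnt = cnt + (SELECT count(*) FROM items_holding h
                     WHERE h.status = item_counts.status);

CREATE TRIGGER item_count_trig
    AFTER INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE maintain_counts();
COMMIT;

TRUNCATE items_holding;
```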

Alternately:  Move the count maintenance out of triggers into middleware where
you can deal with it more flexibly.

Alternately:  Resign yourself to the idea that keeping running statistics is
incompatible with doing a fast bulk load, and buy faster/better hardware.

--
Josh Berkus
Aglio Database Solutions
San Francisco