Triggers During COPY - Mailing list pgsql-performance

From Thomas F.O'Connell
Subject Triggers During COPY
Date
Msg-id f2a4e35d83b33da9b5d007d276df484a@sitening.com
Whole thread Raw
Responses Re: Triggers During COPY
Re: Triggers During COPY
List pgsql-performance
I'm involved in an implementation of doing trigger-based counting as a
substitute for count( * ) in real time in an application. My
trigger-based counts seem to be working fine and dramatically improve
the performance of the display of the counts in the application layer.

The problem comes in importing new data into the tables for which the
counts are maintained. The current import process does some
preprocessing and then does a COPY from the filesystem to one of the
tables on which counts are maintained. This means that for each row
being inserted by COPY, a trigger is fired. This didn't seem like a big
deal to me until testing began on realistic data sets.

For a 5,000-record import, preprocessing plus the COPY took about 5
minutes. Once the triggers used for maintaining the counts were added,
this grew to 25 minutes. While I knew there would be a slowdown per row
affected, I expected something closer to 2x than to 5x.

It's not unrealistic for this system to require data imports on the
order of 100,000 records. Whereas this would've taken at most an hour
and a half before (preprocessing takes a couple of minutes, so the
actual original COPY takes closer to 2-3 minutes, or just over 1500
rows per minute), the new version is likely to take more than 7 hours,
which seems unreasonable to me. Additionally, the process is fairly CPU
intensive.

I've examined the plans, and, as far as I can tell, the trigger
functions are being prepared and using the indexes on the involved
tables, which are hundreds of thousands of rows in the worst cases. The
basic structure of the functions is a status lookup SELECT (to
determine whether a count needs to be updated and which one) and one or
two UPDATE statements (depending on whether both an increment and a
decrement need to be performed). As I said, it looks like this basic
format is using indexes appropriately.

Is there anything I could be overlooking that would tweak some more
performance out of this scenario?

Would it be absurd to drop the triggers during import and recreate them
afterward and update the counts in a summary update based on
information from the import process?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


pgsql-performance by date:

Previous
From: Zavier Sheran
Date:
Subject: slow count()
Next
From: Bruno Wolff III
Date:
Subject: Re: slow count()