Re: ETL optimization - Mailing list pgsql-performance

From Bricklen Anderson
Subject Re: ETL optimization
Date
Msg-id 42BB31F1.7000303@PresiNET.com
Whole thread Raw
In response to Re: ETL optimization  (Jacques Caron <jc@directinfos.com>)
List pgsql-performance
Jacques Caron wrote:
>
> I have a similar situation, and the solution I use (though I haven't
> really tested many different situations):
> - have a trigger ON INSERT which does:
> UPDATE set whatever_value=NEW.whatever_value,... WHERE
> whatever_key=NEW.whatever.key AND...
> IF FOUND THEN
>  RETURN NULL;
> ELSE
>  RETURN NEW;
> END IF;
> - use COPY
>
> For optimal performance, a different trigger function is created for
> each table, which allows the query plan of the UPDATE to be cached.
>
> Let us know how that works out for you and if you find a better solution!
>
> Jacques.
>
Hi Jacques, thanks for the suggestion. I've previously tested triggers under a
variety of situations and there was no way that they would work under the load
we currently have, and the much greater load that we will be expecting soon
(~40x increase in data).

I'm in the process of testing the delete scenario right now, and at first blush
seems to perform fairly well. 2.5 million rows before aggregation, and 171000
after, in a little under 7 minutes.

Currently testing again with about 18.5 million rows. A drawback by using the
delete method is that we cannot do any of the aggregation incrementally, but so
far that hasn't been a big help anyways. I still need to test the performance of
concurrent querying against the destination table whilst the aggregation is
occurring.

--
_______________________________

This e-mail may be privileged and/or confidential, and the sender does
not waive any related rights and obligations. Any distribution, use or
copying of this e-mail or the information it contains by other than an
intended recipient is unauthorized. If you received this e-mail in
error, please advise me (by return e-mail or otherwise) immediately.
_______________________________

pgsql-performance by date:

Previous
From: Jacques Caron
Date:
Subject: Re: ETL optimization
Next
From: Dennis Bjorklund
Date:
Subject: Re: ETL optimization