On 2005-02-25, "Dave Held" <dave.held@arrayservicesgrp.com> wrote:
> A possibility that I would like to pursue is to keep the staging data
> from the previous day, do a COPY TO, import the new data into
> another staging table with a COPY FROM, then export the fresh
> data with another COPY TO. Then, I can write a fast C/C++
> program to do a line-by-line comparison of each record, isolating
> the ones that have changed from the previous day. I can then
> emit those records in a change file that should be relatively small
> and easy to update.
I have an application that does something like this, but rather than use an
external program, I do the comparison in the database itself:
- import data from external system into a temporary table
- compare the temporary table against the live data (a full outer join
  is a convenient way of doing this - I create an index on the temp
  table first)
- perform an insert/update/delete for each record that was added,
  changed or removed
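
As a rough illustration of the compare step, a full outer join between the
freshly loaded table and the live table can classify every row in one pass.
This is only a sketch, not the actual schema: the table names live and
staging, the columns id and payload, and the index name are all hypothetical,
and it assumes id is a non-null key in both tables.

```sql
-- Hypothetical stand-in for the live table (assumed schema, not the real one).
CREATE TABLE live (id integer PRIMARY KEY, payload text);

-- Temporary table that receives the fresh import, e.g. via COPY ... FROM.
CREATE TEMP TABLE staging (id integer NOT NULL, payload text);

-- Index the temp table on the join key after loading, then refresh statistics.
CREATE INDEX staging_id_idx ON staging (id);
ANALYZE staging;

-- Classify each key: only in staging, only in live, or in both but different.
SELECT COALESCE(s.id, l.id) AS id,
       CASE WHEN l.id IS NULL THEN 'insert'   -- new row, not yet in live
            WHEN s.id IS NULL THEN 'delete'   -- row has vanished from the import
            ELSE 'update'                     -- present in both, payload differs
       END AS action
FROM staging s
FULL OUTER JOIN live l ON l.id = s.id
WHERE l.id IS NULL
   OR s.id IS NULL
   OR s.payload IS DISTINCT FROM l.payload;  -- NULL-safe comparison
```

Rows that are identical in both tables are filtered out by the WHERE clause,
so the result is just the change set.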
In my case the compare/update is in a pl/pgsql function. My data is only
2-3 million rows, a bit smaller than yours, but I have to update hourly,
not daily, and spend no more than 5-10 minutes on each update (currently
I can do it in 5: 2 to load the data, 3 to do the compare/update).
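
For illustration only, the update step could be wrapped in a PL/pgSQL function
along the following lines. This is a sketch under assumptions, not the
function described above: the function name apply_staging and the tables
live(id, payload) and staging(id, payload) are hypothetical, id is assumed to
be a non-null key, and it uses three set-based statements in place of a loop
over a full outer join.

```sql
CREATE OR REPLACE FUNCTION apply_staging() RETURNS void AS $$
BEGIN
    -- Remove live rows whose key no longer appears in the fresh import.
    DELETE FROM live l
    WHERE NOT EXISTS (SELECT 1 FROM staging s WHERE s.id = l.id);

    -- Update live rows whose payload differs from the import (NULL-safe).
    UPDATE live l
    SET payload = s.payload
    FROM staging s
    WHERE s.id = l.id
      AND s.payload IS DISTINCT FROM l.payload;

    -- Insert rows that are new in the import.
    INSERT INTO live (id, payload)
    SELECT s.id, s.payload
    FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM live l WHERE l.id = s.id);
END;
$$ LANGUAGE plpgsql;
```

It would be called with SELECT apply_staging(); after each load, and all
three statements run inside the caller's transaction.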
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services