On Tue, 2005-07-26 at 10:50 -0600, Dan Harris wrote:
> I am working on a process that will be inserting tens of million rows
> and need this to be as quick as possible.
>
> The catch is that for each row I could potentially insert, I need to
> look and see if the relationship is already there to prevent
> multiple entries. Currently I am doing a SELECT before doing the
> INSERT, but I recognize the speed penalty in doing to operations. I
> wonder if there is some way I can say "insert this record, only if it
> doesn't exist already". To see if it exists, I would need to compare
> 3 fields instead of just enforcing a primary key.
>
> Even if this could be a small increase per record, even a few percent
> faster compounded over the whole load could be a significant reduction.
>
> Thanks for any ideas you might have.
>
Perhaps a trigger:
CREATE FUNCTION verify_unique() RETURNS TRIGGER AS $func$
BEGIN
PERFORM a,b,c FROM table1 WHERE a = NEW.a and b = NEW.b and c = NEW.c;
IF FOUND THEN
RETURN NULL;
END IF;
RETURN NEW;
END;
$func$ LANGUAGE plpgsql STABLE;
CREATE TRIGGER verify_unique BEFORE INSERT ON table1 FOR EACH ROW
EXECUTE PROCEDURE verify_unique();
Triggers are fired on COPY commands and if table1 is able to be cached
and you have an index on table1(a,b,c) the results should be fairly
decent. I would be interested in seeing the difference in timing between
this approach and the temp table approach.