On Thu, Jun 3, 2010 at 11:19 AM, Torsten Zühlsdorff
<foo@meisterderspiele.de> wrote:
> Scott Marlowe schrieb:
>>
>> On Tue, Jun 1, 2010 at 9:03 AM, Torsten Zühlsdorff
>> <foo@meisterderspiele.de> wrote:
>>>
>>> Hello,
>>>
>>> i have a set of unique data which about 150.000.000 rows. Regullary i get
>>> a
>>> list of data, which contains multiple times of rows than the already
>>> stored
>>> one. Often around 2.000.000.000 rows. Within this rows are many
>>> duplicates
>>> and often the set of already stored data.
>>> I want to store just every entry, which is not within the already stored
>>> one. Also i do not want to store duplicates. Example:
>>
>> The standard method in pgsql is to load the data into a temp table
>> then insert where not exists in old table.
>
> Sorry, i didn't get it. I've googled some examples, but no one match at my
> case. Every example i found was a single insert which should be done or
> ignored, if the row is already stored.
>
> But in my case i have a bulk of rows with duplicates. Either your tipp
> doesn't match my case or i didn't unterstand it correctly. Can you provide a
> simple example?
create table main (id int primary key, info text);
create table loader (id int, info text);
insert into main values (1,'abc'),(2,'def'),(3,'ghi');
insert into loader values (1,'abc'),(4,'xyz');
select * from main;
id | info
----+------
1 | abc
2 | def
3 | ghi
(3 rows)
select * from loader;
id | info
----+------
1 | abc
4 | xyz
(2 rows)
insert into main select * from loader except select * from main;
select * from main;
id | info
----+------
1 | abc
2 | def
3 | ghi
4 | xyz
(4 rows)
Note that for the where not exists to work the fields would need to be
all the same, or you'd need a more complex query. If the info field
here was different you'd get an error an no insert / update. For that
case you might want to use "where not in":
insert into main select * from loader where id not in (select id from main);
If you wanted the new rows to update pre-existing rows, then you could
run an update first where the ids matched, then the insert where no id
matches.