Home > mailing lists

Re: How to insert a bulk of data with unique-violations very fast - Mailing list pgsql-performance

From	Cédric Villemain
Subject	Re: How to insert a bulk of data with unique-violations very fast
Date	June 3, 2010 20:04:08
Msg-id	AANLkTim0X5AyDz6K7uPdwCwoqNNSOjAUEsVJ9maS8aO0@mail.gmail.com Whole thread Raw
In response to	How to insert a bulk of data with unique-violations very fast (Torsten Zühlsdorff <foo@meisterderspiele.de>)
Responses	Re: How to insert a bulk of data with unique-violations very fast
List	pgsql-performance

Tree view

2010/6/1 Torsten Zühlsdorff <foo@meisterderspiele.de>:
> Hello,
>
> i have a set of unique data which about 150.000.000 rows. Regullary i get a
> list of data, which contains multiple times of rows than the already stored
> one. Often around 2.000.000.000 rows. Within this rows are many duplicates
> and often the set of already stored data.
> I want to store just every entry, which is not within the already stored
> one. Also i do not want to store duplicates. Example:
>
> Already stored set:
> a,b,c
>
> Given set:
> a,b,a,c,d,a,c,d,b
>
> Expected set after import:
> a,b,c,d
>
> I now looking for a faster way for the import. At the moment i import the
> new data with copy into an table 'import'. then i remove the duplicates and
> insert every row which is not already known. after that import is truncated.
>
> Is there a faster way? Should i just insert every row and ignore it, if the
> unique constrain fails?
>
> Here the simplified table-schema. in real life it's with partitions:
> test=# \d urls
>                         Tabelle »public.urls«
>  Spalte |   Typ   |                       Attribute
> --------+---------+-------------------------------------------------------
>  url_id | integer | not null default nextval('urls_url_id_seq'::regclass)
>  url    | text    | not null
> Indexe:
>    »urls_url« UNIQUE, btree (url)
>    »urls_url_id« btree (url_id)
>
> Thanks for every hint or advice! :)

I think you need to have a look at pgloader. It does COPY with error
handling. very effective.

http://pgloader.projects.postgresql.org/

>
> Greetings from Germany,
> Torsten
> --
> http://www.dddbl.de - ein Datenbank-Layer, der die Arbeit mit 8
> verschiedenen Datenbanksystemen abstrahiert,
> Queries von Applikationen trennt und automatisch die Query-Ergebnisse
> auswerten kann.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

pgsql-performance by date:

From: Anj Adu
Date: 03 June 2010, 17:37:33
Subject: Re: slow query performance

From: Anj Adu
Date: 03 June 2010, 22:45:41
Subject: slow query

Re: How to insert a bulk of data with unique-violations very fast - Mailing list pgsql-performance

Previous

Next