Re: How to insert a bulk of data with unique-violations very fast

From: Torsten Zühlsdorff
Subject: Re: How to insert a bulk of data with unique-violations very fast
Msg-id: hungra$vlt$2@news.eternal-september.org
In response to: Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C")
Responses: Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C")
List: pgsql-performance

Pierre C wrote:
>
>> Within the data to import most rows have 20 till 50 duplicates.
>> Sometime much more, sometimes less.
>
> In that case (source data has lots of redundancy), after importing the
> data chunks in parallel, you can run a first pass of de-duplication on
> the chunks, also in parallel, something like :
>
> CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;
>
> or you could compute some aggregates, counts, etc. Same as before, no
> WAL needed, and you can use all your cores in parallel.
>
> From what you say this should reduce the size of your imported data by
> a lot (and hence the time spent in the non-parallel operation).

Thank you very much for this advice. I've tried it in another project
with similar import problems. It really sped the import up.
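
For readers of the archive, here is a minimal sketch of the approach quoted
above. It is an illustration under assumptions, not the schema used in
either project: the table target and its column val are hypothetical, and
the sample rows stand in for a chunk that would normally be loaded with COPY.

```sql
-- Hypothetical final table with the unique constraint that causes the
-- violations (name and column are assumptions for this sketch).
CREATE TABLE target (val text PRIMARY KEY);

-- One session per chunk. Temp tables are private to the session and are
-- not WAL-logged, so several sessions can do this in parallel.
CREATE TEMP TABLE foo_1 (val text);

-- Stand-in for the real bulk load (normally COPY foo_1 FROM ...).
INSERT INTO foo_1 (val) VALUES ('a'), ('a'), ('b'), ('a'), ('c'), ('b');

-- First pass: remove duplicates inside the chunk, as suggested by Pierre C.
CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;

-- Final step: insert only values not yet present in the target.
-- This part should run serialized (one chunk at a time), because two
-- sessions inserting the same new value concurrently could still collide.
INSERT INTO target (val)
SELECT d.val
FROM foo_1_dedup d
WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.val = d.val);
```

With 20 to 50 duplicates per row in the source data, the serialized insert
only has to process a small fraction of the rows that were imported.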

Thanks to everyone for your time and help!

Greetings,
Torsten
--
http://www.dddbl.de - a database layer that abstracts working with 8
different database systems,
separates queries from applications and can automatically evaluate
the query results.

