
Re: Removing duplicate records from a bulk upload (rationale behind selecting a method) - Mailing list pgsql-general

From	David G Johnston
Subject	Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)
Date	December 12, 2014 18:40:38
Msg-id	1418409627311-5830353.post@n5.nabble.com
In response to	Re: Removing duplicate records from a bulk upload (rationale behind selecting a method) (John McKown <john.archie.mckown@gmail.com>)
List	pgsql-general


John McKown wrote
> I don't
> know, myself, why this would be faster. But I'm not any kind of a
> PostgreSQL expert either.

It is faster because PostgreSQL does not have native parallelism: a single
query runs in a single backend process. By putting a condition such as
a % n = k in the WHERE clause, you can start n separate sessions, give each
one a different remainder k (0 through n-1), and so manually introduce
parallelism into the work.
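To illustrate the idea, here is a minimal, hypothetical sketch that generates the per-session statements and simulates the split. The table name `staging`, the column `a`, and the use of a plain SELECT are assumptions for illustration only. It also assumes `a` is a non-negative integer (e.g. a serial id), because PostgreSQL's % keeps the sign of the dividend, so negative values would match none of the k = 0..n-1 conditions.

```python
"""Hypothetical sketch: split one bulk statement into n modulo partitions.

Assumptions (not from the original post): the table is called `staging`,
the integer column used for partitioning is `a`, and its values are
non-negative. Each generated statement would be run in its own
PostgreSQL session (e.g. a separate psql connection).
"""


def partition_queries(n: int, table: str = "staging", column: str = "a") -> list[str]:
    """Return one query per session; session k handles rows where column % n = k."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [f"SELECT * FROM {table} WHERE {column} % {n} = {k};" for k in range(n)]


def partition_rows(values: list[int], n: int) -> list[list[int]]:
    """Simulate which non-negative values each session would see.

    Every value lands in exactly one bucket, so the sessions neither
    overlap nor miss rows.
    """
    buckets: list[list[int]] = [[] for _ in range(n)]
    for v in values:
        buckets[v % n].append(v)
    return buckets


if __name__ == "__main__":
    # Print the statements for four hypothetical sessions.
    for q in partition_queries(4):
        print(q)
```

Each session then does roughly 1/n of the rows, and because the remainders are disjoint the sessions do not step on each other's rows.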

That said, since this work is likely to be I/O-bound, the gains will not
scale linearly with the number of sessions, and the number of useful
sessions effectively tops out at the number of cores available to the server.

David J.




