Re: Import large data set into a table and resolve duplicates? - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Import large data set into a table and resolve duplicates?
Date
Msg-id CA+bJJbytzU2qerqmibSj4jTGcGJtQUvyg-Stw+8NC6QYSqEP1w@mail.gmail.com
In response to Re: Import large data set into a table and resolve duplicates?  (Eugene Dzhurinsky <jdevelop@gmail.com>)
List pgsql-general
Hi Eugene:

On Sun, Feb 15, 2015 at 6:36 PM, Eugene Dzhurinsky <jdevelop@gmail.com> wrote:
...
Since the "dictionary" already has an index on the "series", it seems that
patch_data doesn't need to have any index here.
....
At this point "patch_data" needs to get an index on "already_exists = false",
which seems to be cheap.

As I told you before, do not focus on the indexes too much. When you do bulk updates like this, they tend to be much slower than a proper sort.

The reason is locality of reference. When you do things with sorts, you make two or three nicely ordered passes over the data, using full pages. When you use indexes, you spend a lot of time traversing index structures and switching between reading index pages and data pages, index, data, ... ( they are cached, but you still have to switch between them ). Also, with your kind of data, the indexes on series are going to be big, so there is less cache available for the data.
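
To make that concrete, here is a minimal sketch of the idea ( in Python, and only for illustration: it assumes the dictionary and the patch have each been dumped to a plain text file, one series per line, already sorted, e.g. with sort -u ). A single ordered merge pass finds the new entries without touching any index:

def missing_series(dictionary_path, patch_path, output_path):
    # Both inputs are assumed sorted ascending on the series value.
    with open(dictionary_path) as dic, \
         open(patch_path) as patch, \
         open(output_path, "w") as out:
        dic_line = dic.readline()
        for series in patch:
            # Advance the dictionary until it catches up with the patch key.
            while dic_line and dic_line < series:
                dic_line = dic.readline()
            if dic_line != series:
                # Not present in the dictionary, so it needs to be inserted.
                out.write(series)

Everything is read and written strictly in order, which is exactly why the sort-based approach plays so nicely with the caches and read-ahead.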


As I said before, it depends on your data anyway. With current machines, what I would do for this problem is just write a program ( Perl seems adequate for this ), copy the dictionary into client memory, and then read the patch data, spitting out the result file and inserting the needed lines along the way. It should all fit in 1 GB without problems, which is not much by today's standards.
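
I am not going to write the Perl for you, but a rough sketch of that program in Python would look something like this ( the table layout, dictionary(id, series), the patch file format and the psycopg2 driver are my assumptions, not something from your mails ):

import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

# Pull the whole dictionary into client memory once.
cur.execute("SELECT series, id FROM dictionary")
known = dict(cur.fetchall())

with open("patch_data.txt") as patch, open("result.txt", "w") as result:
    for line in patch:
        series = line.rstrip("\n")
        if series not in known:
            # New series: insert it and remember the generated id.
            cur.execute(
                "INSERT INTO dictionary (series) VALUES (%s) RETURNING id",
                (series,))
            known[series] = cur.fetchone()[0]
        # Emit id + series for every patch line, old or new.
        result.write("%s\t%s\n" % (known[series], series))

conn.commit()
cur.close()
conn.close()

A hash lookup per line replaces the per-row index probes, and only the genuinely new series ever hit the database.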

Regards.
Francisco Olarte.


