Import large data set into a table and resolve duplicates? - Mailing list pgsql-general

From	Eugene Dzhurinsky
Subject	Import large data set into a table and resolve duplicates?
Date	February 16, 2015 16:41:52
Msg-id	20150214173744.GA13063@devbox
List	pgsql-general

Hello!

I have a huge dictionary table with series data generated by a third-party
service. The table has two columns:

- id : serial, primary key
- series : varchar, not null, indexed

From time to time I need to apply a "patch" to the dictionary. The patch file
consists of "series" values, one per line.

Now I need to import the patch into the database and produce another file
that contains, for each row in the source file:
- if the "series" value already exists in the database, the existing ID:series
- otherwise, a new row inserted into the table, with the newly generated ID returned as ID:series

So the new file will contain both the ID and the series data, separated by a
tab or a similar delimiter.
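
To make the requirement concrete, here is a minimal sketch of the
lookup-or-insert step. It uses Python's built-in sqlite3 module purely as a
stand-in for the real database. The table name "dictionary", the function name
"apply_patch", the temporary table "patch", and the assumption that "series"
values are unique in the table are all illustrative and not part of the setup
described above.

```python
import sqlite3


def apply_patch(conn, patch_lines):
    """Return (id, series) for every patch line, inserting unseen series first.

    Assumes a table dictionary(id, series) whose series values are unique
    (hypothetical schema, mirroring the two columns described in the post).
    """
    series = [line.strip() for line in patch_lines if line.strip()]
    cur = conn.cursor()

    # Stage the patch so the duplicate check is a single set-based statement.
    # "pos" keeps the order of the lines in the patch file.
    cur.execute(
        "CREATE TEMP TABLE patch (pos INTEGER PRIMARY KEY, series TEXT NOT NULL)"
    )
    cur.executemany("INSERT INTO patch (series) VALUES (?)", [(s,) for s in series])

    # Insert only the series that the dictionary does not contain yet.
    cur.execute(
        """
        INSERT INTO dictionary (series)
        SELECT DISTINCT p.series
        FROM patch p
        WHERE NOT EXISTS (SELECT 1 FROM dictionary d WHERE d.series = p.series)
        """
    )

    # Every patch row now has a match, so a join yields ID:series per line.
    cur.execute(
        """
        SELECT d.id, p.series
        FROM patch p
        JOIN dictionary d ON d.series = p.series
        ORDER BY p.pos
        """
    )
    rows = cur.fetchall()

    cur.execute("DROP TABLE patch")
    conn.commit()
    return rows
```

The returned pairs can then be written out one per line, separated by a tab.
On PostgreSQL the same shape would presumably use a temporary table loaded
with COPY; whether that is the most efficient option is exactly the open
question.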

Reading and writing the files is not the issue (I described the whole task
just in case). What I wonder is the most efficient way of importing such
data into the table, keeping in mind that

- the dictionary table already contains ~200K records
- the patch could be ~1-50K records long

Thanks!

--
Eugene N Dzhurinsky

Attachment

msg-32091-252899.dat
