On 10/26/22 17:30, Siddharth Jain wrote:
> Hello,
>
> Given a long list like this:
>
> 1,2
> 2,4 --> 2 appears once
> 7,9
> 8,9
> 5,3
> 2,5 --> note 2 appears twice
>
> I want to load it into this table:
>
> create table tbl (
> id integer primary key,
> fids integer[]
> )
>
> so we will have 2 -> [4,5] where 2 is id and [4,5] are the fids
>
> My actual dataset is very large and has 100M rows in it. How can it be
> efficiently loaded into postgres?
>
> I think I have to use a program for this and am trying to use the pg
> library that comes with Node.js. I am reading the data in batches of 1M
> or 100k rows for example. I have created a dictionary in Node.js where I
> am storing the hashmap. The part where I am stuck is how to generate the
> SQL command?
>
> 1. I want to make one call to the server for the batch of 1M rows, not
> 1M calls
> 2. Some keys in the hashmap might already exist in the database. For
> these keys we want to append to the array
> 3. Some keys will not exist and for these we want to insert new rows
>
> Can someone help me please? I am using Postgres for the first time.
The best way to deal with importing large batches of data is to use COPY
https://www.postgresql.org/docs/current/sql-copy.html
But that will not play well with modifying the data as you input it.
What I can see doing is:
1) COPY the data into a staging table:
create table staging_tbl (id integer, fid integer)
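For example, assuming the raw pairs are in a CSV file (the file name
below is just a placeholder), from psql:

\copy staging_tbl (id, fid) from 'pairs.csv' with (format csv)

A client driver can do the same by streaming to
COPY staging_tbl (id, fid) FROM STDIN WITH (FORMAT csv), which avoids
sending one INSERT per row for each batch.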
2) Then use SQL statements to move the data to the final table.
As an example of one possibility, using ON CONFLICT from here:
https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT
insert into tbl (id, fids) select id, array_agg(fid)
from staging_tbl group by id
on conflict (id) DO UPDATE SET fids = tbl.fids || excluded.fids;
The group by collapses duplicate ids within the batch into one array
first; without it, a batch that contains the same id twice (like 2 in
your sample) would make ON CONFLICT try to update the same row twice in
one statement, which is an error. For ids already present in tbl, the
DO UPDATE appends the new fids to the existing array.
I would test with a smaller example data set to verify.
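For instance, with the six sample rows above loaded into staging_tbl, I
would expect something like:

select id, fids from tbl order by id;

 id | fids
----+-------
  1 | {2}
  2 | {4,5}
  5 | {3}
  7 | {9}
  8 | {9}

(The order of elements inside each array follows the aggregation order;
add an ORDER BY inside array_agg() if that matters to you.)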
--
Adrian Klaver
adrian.klaver@aklaver.com