Re: need help with import - Mailing list pgsql-sql

From David Johnston
Subject Re: need help with import
Date
Msg-id D4BF5726-54B8-4887-B038-F8AA3E1792FF@yahoo.com
Whole thread Raw
In response to Re: need help with import  (Andreas <maps.on@gmx.net>)
List pgsql-sql
On Feb 15, 2012, at 21:05, Andreas <maps.on@gmx.net> wrote:

> Am 16.02.2012 02:13, schrieb David Johnston:
>> -----Original Message-----
>> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Andreas
>> Sent: Wednesday, February 15, 2012 8:03 PM
>> To: pgsql-sql@postgresql.org
>> Subject: [SQL] need help with import
>>
>> Hi
>> I get CSV files to import.
>> Th structure is like this.
>> main part, sub part
>> Could be like this
>>
>> A, a1
>> A, a2
>> A, a3
>> B, b1
>> B, b2
>>
>> The database has a table for main_part and one for sub_part.
>> The relation needs to be n:m so there is a relation table that holds ( main_id, sub_id ).
>> The 2 primary keys main_part.id and sub_part.id are both serials.
>>
>> Is there a way to do an import with SQL?
>>
>> I can read the CSV into a temporary table and I can do a INSERT INTO main_part ( ... ) SELECT DISTINCT main columns
FROMimport; as well as a INSERT INTO sub_part ( ... ) SELECT sub columns FROM import; 
>>
>> But how would I know what main_id and sub_id to insert into the n:m relation?
>>
>> At first when I do the import the relation is actually 1:n.
>>
>> --------------------------------------------
>>
>> You will need to use the temporary table and perform multiple insert+select.
>>
>> I do not understand where you are confused.  It would help to provide more meaningful sample data and/or the final
resultyou are trying to achieve.  Keep in mind any n:m setup requires three tables with the joining table usually
havingsome descriptive meaning.  Is time one of your components that you are not  showing us? 
>>
>>
> As you say there are 3 tables
> main_part ( id serial primary key, ... )
> sub_part ( id serial primary key, ... )
> main_to_sub ( main_id, sub_id )
>
> I would read the csv into a temporary table "import" and insert the main columns into main_part ().
> Then there are new tuples in main_part()
> 42, A
> 43, B
>
> Now I insert the sub columns into sub_part()
> I'll get e.g.
> 1000, a1
> 1001, a2
> 1002, a3
> 1003, b1
> 1004, b2
>
> To fill up main_to_sub ( ... ) I needed to know which sub_id belongs to which main_id.
> ( 42, 1000 )
> ( 42, 1001 )
> ( 42, 1002 )
> ( 43, 1003 )
> ( 43, 1004 )
>
> I could compare every main-column in "import" to every related data-column in main_part to get the newly created
main_idand do the same with every sub-data-column but this seems to be a wee bit tedious. 
>
> Is there a more elegant way hat I don't see, yet?
>

Ditch the whole idea of using a sequence and take your primary keys from the source data. Otherwise yes, you will need
toperform the join between the import and live tables to determine the newly created identifier. 

The question to answer is if you see the same values in subsequent import files do you create a new sequence value or
reusethe existing value?  Why? 

David J.

pgsql-sql by date:

Previous
From: Andreas
Date:
Subject: Re: need help with import
Next
From: "Raj Mathur (राज माथुर)"
Date:
Subject: Re: need help with import