Re: "Ungroup" data for import into PostgreSQL - Mailing list pgsql-general

From Adrian Klaver
Subject Re: "Ungroup" data for import into PostgreSQL
Date
Msg-id 54B8630E.1020203@aklaver.com
Whole thread Raw
In response to Re: "Ungroup" data for import into PostgreSQL  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-general
On 01/15/2015 04:56 PM, Jim Nasby wrote:
> On 1/15/15 9:43 AM, George Weaver wrote:
>> Hi List,
>>
>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>> table.  I have a program that uses ODBC to connect to Excel and
>> extract data using SQL queries.  The program then inserts the data
>> into a PostgreSQL table.
>>
>> The challenge with this particular spreadsheet is that it is arrayed
>> thus:
>>
>> Model No 1     Product Code 15
>>     Serial No 1       No on Hand
>>     Serial No 2       No on Hand
>>     Serial No 3       No on Hand
>> Model No 4    Product Code 9
>>     Serial No 12     No on Hand
>> Model No 5    Product Code 27
>>     Serial No 6       No on Hand
>>     Serial No 14     No on Hand
>>
>> etc.
>>
>> I need the data in PostgreSQL arrayed thus
>>
>> Model No 1    Product Code 15    Serial No 1       No on Hand
>> Model No 1    Product Code 15    Serial No 2       No on Hand
>> Model No 1    Product Code 15    Serial No 3       No on Hand
>> Model No 4    Product Code 9      Serial No 12     No on Hand
>> Model No 5    Product Code 27    Serial No 6       No on Hand
>> Model No 5    Product Code 27    Serial No 14     No on Hand
>>
>> I can import the data procedurely using plpgsql to match the
>> individual rows to the master for each row (import the raw data into a
>> buffer table in PostgreSQL and then looping through the rows in the
>> buffer table and checking to see when the Model No changes).
>
> Note that if you're doing that you better be putting the rownumber from
> excel into the table... result sets are NOT guaranteed to be in insert
> order!
>
>> I'm wondering if there is a more elegant way to do this using straight
>> sql from Excel?
>
> Well, that's really an excel question, not a Postgres question...
>
> If you load the whole spreadsheet into a single table and have a way to
> differentiate between the different rows then you might be able to do
> something with CTE's to relate a serial number to the product code. That
> might be faster than plpgsql.
>
> You might also be able to do something creative with formulas in excel
> to copy the product code data to the serial # rows. You could then
> import the whole thing and re-normalize it.
>
> There's probably some stuff you could do with VBA too. If you care about
> performance you don't want to execute SQL statements for each
> spreadsheet row.

Or if you really want to slice and dice and you use Python, then take a
look at Pandas:

http://pandas.pydata.org/

In particular the IO functions:

http://pandas.pydata.org/pandas-docs/stable/io.html


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Jim Nasby
Date:
Subject: Re: "Ungroup" data for import into PostgreSQL
Next
From: Fabio Ugo Venchiarutti
Date:
Subject: Proper use of pg_xlog_location_diff()