Re: [NOVICE] Normalizing Unnormalized Input - Mailing list pgsql-novice

From Stephen Froehlich
Subject Re: [NOVICE] Normalizing Unnormalized Input
Date
Msg-id DM5PR06MB2891A0E4B84FD7C5FF521313E5DA0@DM5PR06MB2891.namprd06.prod.outlook.com
In response to Re: [NOVICE] Normalizing Unnormalized Input  (Stephen Froehlich <s.froehlich@cablelabs.com>)
List pgsql-novice
One other follow-up question.

The incoming CSV files range in size from 100k lines to 14 million.  Does it make sense to build indexes on the to-be-foreign-key columns of the temp table before doing the final set of joins to write to the primary table, or is it faster to leave them unindexed for this one-time use?

Some columns have as few as 2 or 3 values.
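In case it helps, what I have in mind is something like this (the staging table tt and column device_id are invented names, just for illustration):

```sql
-- Option A: index the to-be-foreign-key column before the joins
-- (mainly pays off if the planner picks nested loops):
CREATE INDEX ON tt (device_id);
ANALYZE tt;  -- fresh statistics on the temp table help the planner either way

-- Option B: leave tt unindexed and let a single hash or merge join
-- scan it once for the one-time load.
```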

--Stephen

-----Original Message-----
From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Stephen Froehlich
Sent: Tuesday, June 20, 2017 7:20 PM
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Normalizing Unnormalized Input

I hadn't even thought of temp tables being limited in scope ... that helps tremendously.

The main caveat is that it is a good idea to drop tt at the end as well.

As for "I don't get why" ... confused questions from novices like me are why this mailing list exists :) ... I clearly wasn't thinking right from a Postgres PoV.

One thing that baffles me: when performing the write to STDIN from R, does one just call a function as one would a table?

Once I get into it in the next couple of days, I'm sure I'll have specific code questions, but that is a big help.

Thanks again,
Stephen


-----Original Message-----
From: David G. Johnston [mailto:david.g.johnston@gmail.com]
Sent: Tuesday, June 20, 2017 5:11 PM
To: Stephen Froehlich <s.froehlich@cablelabs.com>
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Normalizing Unnormalized Input

On Tue, Jun 20, 2017 at 3:50 PM, Stephen Froehlich <s.froehlich@cablelabs.com> wrote:
> The part of the problem that I haven’t solved conceptually yet is how 
> to normalize the incoming data.

The specifics of the data matter but...if at all possible I do something like:

BEGIN
CREATE TEMP TABLE tt
COPY tt FROM STDIN
INSERT NEW RECORDS into t FROM tt - one statement (per target table)
UPDATE EXISTING RECORDS in t USING tt - one statement (per target table)
END
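Spelled out with invented names (target table t with key id and payload val), that sketch might look something like:

```sql
BEGIN;

-- Staging table is private to this session; ON COMMIT DROP cleans it up
-- automatically when the transaction ends.
CREATE TEMP TABLE tt (LIKE t INCLUDING DEFAULTS) ON COMMIT DROP;

COPY tt FROM STDIN WITH (FORMAT csv);

-- Insert rows whose keys are not yet in t (one statement per target table).
INSERT INTO t (id, val)
SELECT s.id, s.val
FROM tt AS s
WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = s.id);

-- Update rows that already exist (one statement per target table).
UPDATE t
SET val = s.val
FROM tt AS s
WHERE t.id = s.id;

COMMIT;
```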

I don't get why (or how) you'd "rename the table into a temp table"...

It's nice that we've added upsert, but it seems more useful for streaming compared to batch.  At scale you should try to avoid collisions in the first place.
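For completeness, on 9.5+ the insert+update pair could collapse into a single upsert statement, something like (same invented names as before, and assuming a unique constraint on id):

```sql
INSERT INTO t (id, val)
SELECT s.id, s.val
FROM tt AS s
ON CONFLICT (id) DO UPDATE
SET val = EXCLUDED.val;  -- EXCLUDED refers to the row that was proposed for insertion
```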

Temporary table names only need to be unique within the session.

The need for indexes on the temporary table is usually limited, since the goal is to move large subsets of it around all at once.

David J.

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice
