Re: optimising data load - Mailing list pgsql-novice

From Ron Johnson
Subject Re: optimising data load
Date
Msg-id 1022121607.19121.81.camel@rebel
Whole thread Raw
In response to optimising data load  (John Taylor <postgres@jtresponse.co.uk>)
Responses Re: optimising data load  (John Taylor <postgres@jtresponse.co.uk>)
List pgsql-novice
On Wed, 2002-05-22 at 16:05, John Taylor wrote:
> On Wednesday 22 May 2002 18:40, Ron Johnson wrote:
> > On Wed, 2002-05-22 at 08:45, John Taylor wrote:
> > >
> > > Hi all,
> > >
> > > I'm (still) working on an application to regularly populate my database with some data provided from
> > > a third party DB.
> > >
> > > I'm still having really bad performance problems.
> > > There are 300,000 records to be inserted, but I'm only getting 10,000/hour.
> > > 30 hours to populate a single table is just not on. There must be something seriously bad going on.
> >
> > From what I see of your INSERT statement, you are joining orderheader
> > to orderlines and then inserting those records into orderlines.
> > Is that correct?
>
> Yes, unfortunately I get incomplete data, so I need to select some details before inserting.
>
> >
> > I think I'd make a view from the SELECT portion.  Then, nightly:
> > 1. COPY out the view.
> > 2. Drop the index(es) on orderlines
> > 3. COPY into orderlines
> > 4. recreate indexes on orderlines.
>
> So, how would I merge the data from the view with the data to copy in ?

???  Unless I am misunderstanding you:

0. create view v_new_orderlines (theorder, type, stock, line,
       ordercurrent, sellingquant, price, discount, vatrate, comment)
   as  SELECT oh.theorder, 'P', '0310', coalesce(ol.line+1,1), 5, 0,
              .52, 0, 0, ''
       FROM orderheader oh
               LEFT OUTER JOIN orderlines ol
                       ON oh.theorder = ol.theorder
       WHERE oh.account=' MILN1'
         AND oh.delivery=1
         AND oh.thedate='2002-06-01'
         AND oh.ordertype='O'
       ORDER BY ol.line ;
1. COPY v_new_orderlines to '/tmp/new_orderlines.tsv';
2. DROP each index from orderlines
3. COPY orderlines from '/tmp/new_orderlines.tsv';
4. CREATE each index on orderlines


> I was thinking of generating the copy file by performing a select for each row,
> but a selecting each row into a file, and then copy in might be more expensive than
> just inserting it like I do now.
>
> >
> > Do the same for orderlineupdates.  The thing is is that
> > modifying indexes is a _very_ expensive operation, no matter
> > what RDBMS or hardware configuration you have...
> >
>
> Yes, I was thinking of that for my next step.
> Once I figure how to get copy working, I'll also try dropping indexes.

--
+---------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net     |
| Jefferson, LA  USA      http://ronandheather.dhs.org:81 |
|                                                         |
| "I have created a government of whirled peas..."        |
|   Maharishi Mahesh Yogi, 12-May-2002,                   |
!   CNN, Larry King Live                                  |
+---------------------------------------------------------+


pgsql-novice by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Moving data to different hard drive /or/ incremental backup
Next
From: "Thomas A. Lowery"
Date:
Subject: Re: pl/perl Documentation