Re: Importing Large Amounts of Data - Mailing list pgsql-hackers

From: Curt Sampson
Subject: Re: Importing Large Amounts of Data
Date:
Msg-id: Pine.NEB.4.43.0204151655080.439-100000@angelic.cynic.net
In response to: Re: Importing Large Amounts of Data  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Responses: Re: Importing Large Amounts of Data  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
List: pgsql-hackers
On Mon, 15 Apr 2002, Christopher Kings-Lynne wrote:

> OK, well now it depends on what kind of selects you're doing.  Do you
> regularly select over a certain subset of the data, in which case using
> partial indices might give you significant speedup.

I believe from the information I've been given that we will indeed
be regularly selecting over certain subsets, based on day. (One of
the test queries I've been asked to use selects based on user_id
and a date range.) But I was intending to partition the tables
based on date range (to keep the index rebuild time from getting
completely out of hand), so that will handily take care of that
requirement anyway.
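
To make that concrete, roughly what I have in mind (table and column names here are just placeholders for the
real schema):

    -- one table per date range, so each index stays small enough to rebuild quickly
    CREATE TABLE bigone_200204 (
        user_id   integer,
        rec_date  date,
        rec_no    integer
        -- ... remaining columns ...
    );

    -- alternatively, a partial index over just the subset we query most often
    CREATE INDEX bigone_recent_idx ON bigone (user_id, rec_date)
        WHERE rec_date >= '2002-04-01';

    -- the sort of test query I've been asked to run
    SELECT count(*) FROM bigone_200204
     WHERE user_id = 42
       AND rec_date BETWEEN '2002-04-01' AND '2002-04-07';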

> Do you select functions of columns?

No.

> It depends on your definition.  You have to accept a certain overhead if
> you're to have data integrity and MVCC.  If you can't handle that overhead,
> then you can't have data integrity and vice versa.

Well, a few points:
 a) I am not convinced that data integrity should cost a five-fold decrease in performance, and
 b) in fact, at times I don't need that data integrity. I'm perfectly happy to risk the loss of a table during
import if it lets me do the import more quickly, especially if I'm taking the database off line to do the import
anyway. MS SQL Server actually allows me to specify relaxed integrity (with the attendant risks) when doing a
BULK INSERT; it would be cool if Postgres allowed that too.
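
To make it concrete, I mean something along these lines (the NOCHECK option is invented, just to show the sort
of thing I'm after):

    -- hypothetical relaxed-integrity load: skip constraint checking
    -- during the COPY, at my own risk, and verify afterward if at all
    COPY bigone FROM '/tmp/bigone.dat' WITH NOCHECK;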
 

> BTW, instead of:
>
> CREATE UNIQUE INDEX bigone_pkey ON bigone (rec_no);
>
> do:
>
> ALTER TABLE bigone ADD PRIMARY KEY(rec_no);
>
> And remember to run "VACUUM ANALYZE bigone;" or just "ANALYZE bigone;" after
> the COPY and before trying to use the table.  I'm not sure if it's better to
> analyze before or after the indexes are added, but it's definitely better to
> vacuum before the indexes are added.

Thanks. This is the kind of useful information I'm looking for. I
was doing a vacuum after, rather than before, generating the indices.
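
So, if I understand correctly, the load sequence should look more like this (the file path is just a placeholder):

    COPY bigone FROM '/tmp/bigone.dat';
    VACUUM ANALYZE bigone;                        -- vacuum/analyze before building the index
    ALTER TABLE bigone ADD PRIMARY KEY (rec_no);  -- rather than CREATE UNIQUE INDEX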

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
 


