Re: Bulk Insert and Index use - Mailing list pgsql-performance

From Gregory S. Williamson
Subject Re: Bulk Insert and Index use
Date
Msg-id 71E37EF6B7DCC1499CEA0316A256832801057DE5@loki.wc.globexplorer.net
In response to Bulk Insert and Index use  (Rudi Starcevic <tech@wildcash.com>)
List pgsql-performance
If it has to read a majority (or even a good percentage) of the rows in question, a sequential scan is probably faster
... and as Jim pointed out, a temp table can often be a useful medium for getting speed in a load, and then allows you
to clean/alter data for a final (easy) push.
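
For the record, a rough sketch of that load-then-push (untested, and assuming a reasonably recent PostgreSQL; the property/property_load names and the owner column are stand-ins, only street/suburb come from the thread):

  -- load the raw CSV into an unindexed scratch table
  CREATE TEMP TABLE property_load (LIKE property);
  COPY property_load FROM '/path/to/month.csv' WITH CSV;  -- placeholder path

  -- clean: drop duplicates within the load itself
  DELETE FROM property_load a
   USING property_load b
   WHERE a.ctid < b.ctid
     AND a.street = b.street
     AND a.suburb = b.suburb;

  -- final push: only the genuinely new rows reach the real table
  INSERT INTO property (street, suburb, owner)
  SELECT l.street, l.suburb, l.owner
    FROM property_load l
    LEFT JOIN property p
      ON p.street = l.street AND p.suburb = l.suburb
   WHERE p.street IS NULL;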

G
-----Original Message-----
From:    Rudi Starcevic [mailto:tech@wildcash.com]
Sent:    Tue 8/10/2004 8:33 PM
To:    pgsql-performance@postgresql.org
Cc:
Subject:    Re: [PERFORM] Bulk Insert and Index use
Hi Jim,

Thanks for your time.

 > If the bulk load has the possibility of duplicating data

Yes, each row will require either:

a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE
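
Roughly, the same work can be batched into two set-based statements instead of row-by-row checks (a sketch only, assuming the CSV is first COPYed into a temp table named property_load; the owner column is illustrative):

  -- case (b): properties already present get one UPDATE pass
  UPDATE property p
     SET owner = l.owner
    FROM property_load l
   WHERE p.street = l.street AND p.suburb = l.suburb;

  -- case (a): the remainder get one INSERT pass
  INSERT INTO property (street, suburb, owner)
  SELECT l.street, l.suburb, l.owner
    FROM property_load l
   WHERE NOT EXISTS (SELECT 1 FROM property p
                      WHERE p.street = l.street
                        AND p.suburb = l.suburb);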

I did think of using more than one table, i.e. a temp table.
As each month's worth of data is added I expect to see
a change from lots of INSERTS to lots of UPDATES.

Perhaps when the UPDATES become more dominant it would
be best to start using Indexes.

While INSERTS are more prevalent perhaps a seq. scan is better.
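
If so, one option is to load the INSERT-heavy months with no index at all and pay the indexing cost once, just before the UPDATE-heavy phase (a sketch; the index name and path are made up):

  -- bulk load with no index to maintain per row
  COPY property FROM '/path/to/month.csv' WITH CSV;  -- placeholder path

  -- then build the index once and refresh planner stats
  CREATE INDEX property_street_suburb_idx ON property (street, suburb);
  ANALYZE property;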

I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.
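
EXPLAIN ANALYZE should answer that directly on a representative lookup (values below are placeholders):

  EXPLAIN ANALYZE
  SELECT 1 FROM property
   WHERE street = '10 Example St' AND suburb = 'Sometown';

  -- time the seq scan too by discouraging index use for the session
  SET enable_indexscan = off;   -- newer versions: also enable_bitmapscan
  EXPLAIN ANALYZE
  SELECT 1 FROM property
   WHERE street = '10 Example St' AND suburb = 'Sometown';
  RESET enable_indexscan;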

Many thanks.

Jim J wrote:

> If the bulk load has the possibility of duplicating data, then you need
> to change methods.  Try bulk loading into a temp table,  index it like
> the original, eliminate the dups and merge the tables.
>
> It is also possible to do an insert from the temp table into the final
> table like:
> insert into original (street, suburb, ...)
>   (select temp.street, temp.suburb, ... from temp
>    left join original on temp.street = original.street
>    where original.street is null)
>
> Good Luck
> Jim
>
> Rudi Starcevic wrote:
>
>> Hi,
>>
>> I have a question on bulk checking, inserting into a table and
>> how best to use an index for performance.
>>
>> The data I have to work with is a monthly CD Rom csv data dump of
>> 300,000 property owners from one area/shire.
>>
>> So every CD has 300,000-odd lines, each line filling a row of
>> the 'property' table.
>>
>> Beginning with the first CD each line should require one SELECT and
>> one INSERT as it will be the first property with this address.
>>
>> The SELECT uses fields like 'street' and 'suburb' to check for an
>> existing property, so I have built an index on those fields.
>>
>> My question is does each INSERT rebuild the index on the 'street' and
>> 'suburb' fields?
>> I believe it does but I'm asking to be sure.
>>
>> If this is the case I guess performance will suffer when I have, say,
>> 200,000
>> rows in the table.
>>
>> Would it be like:
>>
>> a) Use index to search on 'street' and 'suburb'
>> b) No result? Insert new record
>> c) Rebuild index on 'street' and 'suburb'
>>
>> for each row?
>> Would this mean that after 200,000 rows each INSERT will require
>> an index of 100,000s of rows to be re-indexed?
>>
>> So far I believe my only options are to use either an index
>> or a sequential scan and see which is faster.
>>
>> A minute for your thoughts and/or suggestions would be great.
>>
>> Thanks.
>> Regards,
>> Rudi.
>>
>


--


Regards,
Rudi.

Internet Media Productions
