Re: Bulk Insert and Index use - Mailing list pgsql-performance
From | Gregory S. Williamson
---|---
Subject | Re: Bulk Insert and Index use
Date |
Msg-id | 71E37EF6B7DCC1499CEA0316A256832801057DE5@loki.wc.globexplorer.net
In response to | Bulk Insert and Index use (Rudi Starcevic <tech@wildcash.com>)
List | pgsql-performance
If it has to read a majority (or even a good percentage) of the rows in question, a sequential scan is probably faster ... and as Jim pointed out, a temp table can often be a useful medium for getting speed in a load and then allowing you to clean/alter data for a final (easy) push.

G

-----Original Message-----
From: Rudi Starcevic [mailto:tech@wildcash.com]
Sent: Tue 8/10/2004 8:33 PM
To: pgsql-performance@postgresql.org
Cc:
Subject: Re: [PERFORM] Bulk Insert and Index use

Hi Jim,

Thanks for your time.

> If the bulk load has the possibility of duplicating data

Yes, each row will require either:

a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE

I did think of using more than one table, i.e. a temp table.

As each month's worth of data is added I expect to see a change from lots of INSERTs to lots of UPDATEs. Perhaps when the UPDATEs become more dominant it would be best to start using indexes; while INSERTs are more prevalent perhaps a seq. scan is better.

I guess of all the options available it boils down to which is quicker for my data: index or sequential scan.

Many thanks.

Jim J wrote:

> If the bulk load has the possibility of duplicating data, then you need
> to change methods. Try bulk loading into a temp table, index it like
> the original, eliminate the dups and merge the tables.
>
> It is also possible to do an insert from the temp table into the final
> table like:
> insert into original (x,x,x) (select temp.1, temp.2, etc from temp left
> join original on temp.street=original.street where original.street is null)
>
> Good Luck
> Jim
>
> Rudi Starcevic wrote:
>
>> Hi,
>>
>> I have a question on bulk checking, inserting into a table and
>> how best to use an index for performance.
>>
>> The data I have to work with is a monthly CD-ROM CSV data dump of
>> 300,000 property owners from one area/shire.
>>
>> So every CD has 300,000-odd lines, each line of data filling the
>> 'property' table.
>>
>> Beginning with the first CD, each line should require one SELECT and
>> one INSERT, as it will be the first property with this address.
>>
>> The SELECT uses fields like 'street' and 'suburb' to check for an
>> existing property, so I have built an index on those fields.
>>
>> My question is: does each INSERT rebuild the index on the 'street'
>> and 'suburb' fields? I believe it does, but I'm asking to be sure.
>>
>> If this is the case I guess performance will suffer when I have,
>> say, 200,000 rows in the table.
>>
>> Would it be like:
>>
>> a) Use index to search on 'street' and 'suburb'
>> b) No result? Insert new record
>> c) Rebuild index on 'street' and 'suburb'
>>
>> for each row?
>> Would this mean that after 200,000 rows each INSERT will require
>> the index of 000's of rows to be re-indexed?
>>
>> So far I believe my only options are to use either an index
>> or a sequential scan and see which is faster.
>>
>> A minute for your thoughts and/or suggestions would be great.
>>
>> Thanks.
>> Regards,
>> Rudi.

--
Regards,
Rudi.

Internet Media Productions
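[Editor's note] For readers following the thread, here is a minimal SQL sketch of the temp-table load-and-merge approach Jim describes. Only the 'property' table and its 'street' and 'suburb' columns come from the thread; the staging table name, the 'owner' column, and the CSV path are hypothetical stand-ins, not the posters' actual schema.

```sql
-- A sketch of the suggested workflow, assuming PostgreSQL 8.0+ for
-- COPY ... CSV. Names other than property/street/suburb are hypothetical.

-- 1. Bulk load the monthly CSV into a temporary staging table.
CREATE TEMP TABLE property_load (
    street text,
    suburb text,
    owner  text          -- placeholder for the remaining CSV columns
);

COPY property_load FROM '/path/to/monthly_dump.csv' WITH CSV;
-- (use psql's \copy instead if the file lives on the client machine)

-- 2. Index the staging table like the original so the merge is cheap.
CREATE INDEX property_load_addr_idx ON property_load (street, suburb);

-- 3. Merge: insert only addresses not already in 'property', keeping
--    one row per street/suburb pair to drop duplicates within the load.
INSERT INTO property (street, suburb, owner)
SELECT DISTINCT ON (l.street, l.suburb)
       l.street, l.suburb, l.owner
  FROM property_load l
  LEFT JOIN property p
    ON p.street = l.street
   AND p.suburb = l.suburb
 WHERE p.street IS NULL
 ORDER BY l.street, l.suburb;
```

On the index-vs-sequential-scan question, running EXPLAIN ANALYZE on the duplicate-checking SELECT at different table sizes (after an ANALYZE of the table) is the usual way to see which plan the planner actually chooses.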