Re: [GENERAL] 7.4Beta - Mailing list pgsql-hackers
| From | Andreas Pflug |
|---|---|
| Subject | Re: [GENERAL] 7.4Beta |
| Date | |
| Msg-id | 3F3D30ED.4070405@pse-consulting.de |
| In response to | Re: [GENERAL] 7.4Beta ("Dann Corbit" <DCorbit@connx.com>) |
| List | pgsql-hackers |
Dann Corbit wrote:

>> -----Original Message-----
>> From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
>> Sent: Friday, August 15, 2003 11:36 AM
>> To: Dann Corbit
>> Cc: Stephan Szabo; PostgreSQL-development
>> Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>>
>> Dann Corbit wrote:
>>
>>> Simplification of bulk operations can be very important for
>>> customers (on the other hand). For the CONNX tool set, we offer an
>>> escape on INSERT/SELECT that performs the operation in bulk mode.
>>>
>>> There are serious downsides to bulk operations also (such as not
>>> being logged and therefore existing outside of a transaction).
>>> Therefore, they are useful really only in cases where a complete
>>> destruction and repopulation is called for. If anything goes
>>> haywire, you can't simply roll it back.
>>>
>>> Yet the speed savings can be enormous (orders of magnitude).
>>>
>>> Compared to iteration over a set of prepared inserts, a bulk insert
>>> (such as using Microsoft's BCP API or Oracle's Direct Path loading)
>>> can be 100 times faster. If you are moving gigabytes of data and
>>> performing a complete refresh, the method to use becomes obvious.
>>>
>>> When we go outside of the language bounds, a curly-braced escape
>>> notation is used. For instance, an insert/select might look like
>>> this:
>>>
>>> INSERT INTO <destination_table> SELECT <column_list> FROM
>>> <source_table> {fn commitcount 1000} {bulkmode}
>>>
>>> The commit count says to use batches of 1000 rows, and bulkmode
>>> says to use the fastest possible insert method.
>>>
>>> Imagine (for instance) that PostgreSQL can use an ODBC {or similar}
>>> data source as a table. Then, with a syntax such as the above (or
>>> SELECT INTO etc.), you could very rapidly move data from one system
>>> into another.
>>
>> When saying "bulk operation" I don't necessarily mean using bulk
>> load or stuff like that. What I mean is handling large amounts of
>> similar data at the same time. That doesn't say anything about
>> transactions or logging problems.
>> Imagine you have 100k rows to load, each having FKs to (hopefully)
>> existing rows:
>> - Step 1: load the rows into a temp table
>> - Step 2: identify duplicates, logging and deleting them
>> - Step 3: insert all data satisfying the FK constraints
>> - Step 4: log all rows that didn't insert
>>
>> These are relatively few statements (not the simplest), which can be
>> handled in a safe manner.
>> Just an example of how an RDBMS can do the job in a mass-oriented
>> (and thus optimizable) way.
>
> I really, really like SQL*Server's solution to the problem you have
> stated above. A typical example is you have a pile of incoming words
> and you want to create a dictionary. An example might be a library
> database, where you have an abstract for each item. You parse the
> words and look for new ones to add.
>
> What SQL*Server allows is an index with an option called "IGNORE
> DUPLICATES" that simply tosses out rows that are already in the
> table. For applications like what I have described and what you have
> described it is an incredibly useful extension. Once I got used to
> it, I found myself using it all the time. Of course, you must be very
> careful to ensure that the duplicates really are completely
> unimportant.

Hm, that option is quite a quick hack. I like logging in the database
much better than some SQL messages. And it's only about duplicates, not
about FK constraint violations.

Regards,
Andreas
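For concreteness, Andreas's four steps could look roughly like this in
plain PostgreSQL SQL. This is only a sketch: the tables staging, target,
parent, and rejects, and the input file path, are invented for the
example and do not come from the thread.

    BEGIN;

    -- Step 1: bulk-load the raw rows into a temp table
    -- (COPY needs a server-side path; path is illustrative only)
    CREATE TEMP TABLE staging (id integer, parent_id integer, payload text);
    COPY staging FROM '/tmp/input.dat';

    -- Step 2: log rows whose key already exists in the target, then
    -- delete them from the staging table
    INSERT INTO rejects (id, reason)
        SELECT s.id, 'duplicate'
        FROM staging s
        WHERE EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);
    DELETE FROM staging
        WHERE EXISTS (SELECT 1 FROM target t WHERE t.id = staging.id);

    -- Step 3: insert every row whose FK target actually exists
    INSERT INTO target (id, parent_id, payload)
        SELECT s.id, s.parent_id, s.payload
        FROM staging s
        WHERE EXISTS (SELECT 1 FROM parent p WHERE p.id = s.parent_id);

    -- Step 4: log everything that could not be inserted
    INSERT INTO rejects (id, reason)
        SELECT s.id, 'missing parent'
        FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM parent p WHERE p.id = s.parent_id);

    COMMIT;

All four steps run as set-oriented statements inside one transaction,
which is Andreas's point: the whole load is optimizable, logged in the
database, and can be rolled back. (Duplicates within the input file
itself would need one extra pass along the same lines.)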
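The SQL Server feature Dann describes is presumably the IGNORE_DUP_KEY
option on a unique index. A minimal sketch, assuming SQL Server 2000-era
syntax; the dictionary table and incoming_words source are invented for
illustration:

    -- SQL Server: a unique index that discards duplicate inserts with
    -- a warning instead of failing the whole statement
    CREATE TABLE dictionary (word varchar(100) NOT NULL);
    CREATE UNIQUE INDEX ux_dictionary_word
        ON dictionary (word)
        WITH IGNORE_DUP_KEY;

    -- Loading a word list now silently skips words already present
    INSERT INTO dictionary (word)
        SELECT word FROM incoming_words;

As Andreas points out, this only covers duplicate keys; a row violating
an FK constraint would still abort the insert.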