Dann Corbit wrote:
>>-----Original Message-----
>>From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
>>Sent: Friday, August 15, 2003 11:36 AM
>>To: Dann Corbit
>>Cc: Stephan Szabo; PostgreSQL-development
>>Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>>
>>Dann Corbit wrote:
>>
>>>Simplification of bulk operations can be very important for
>>>customers (on the other hand). For the CONNX tool set, we offer an
>>>escape on INSERT/SELECT that performs the operation in bulk mode.
>>>
>>>There are serious downsides to bulk operations also (such as not
>>>being logged and therefore existing outside of a transaction).
>>>Therefore, they are useful really only in cases where a complete
>>>destruction and repopulation is called for. If anything goes
>>>haywire, you can't simply roll it back.
>>>
>>>Yet the speed savings can be enormous (orders of magnitude).
>>>
>>>Compared to iteration over a set of prepared inserts, a bulk insert
>>>(such as using Microsoft's BCP API or Oracle's Direct Path loading)
>>>can be 100 times faster. If you are moving gigabytes of data and
>>>performing a complete refresh, the method to use becomes obvious.
>>>
>>>When we go outside of the language bounds, a curly braced escape
>>>notation is used. For instance, an insert/select might look like
>>>this:
>>>
>>>INSERT INTO <destination_table> SELECT <column_list> FROM
>>><source_table> {fn commitcount 1000} {bulkmode}
>>>
>>>The commit count says to use batches of 1000 rows and bulkmode says
>>>to use the fastest possible insert method.
>>>
>>>Imagine (for instance) that PostgreSQL can use an ODBC {or similar}
>>>data source as a table. Then, with a syntax such as the above (or
>>>SELECT INTO etc.), you could very rapidly move data from one system
>>>into another.
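
Rendered as a standalone statement for readability, with concrete
names substituted for the placeholders in the quoted syntax (the
table and column names here are illustrative only):

    -- CONNX escape notation as quoted above: commit in batches of
    -- 1000 rows and use the fastest available insert path.
    INSERT INTO destination_table
        SELECT col1, col2
        FROM source_table
        {fn commitcount 1000} {bulkmode}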
>>
>>When saying "bulk operation" I don't necessarily mean using bulk
>>load or stuff like that. What I mean is handling large amounts of
>>similar data at the same time. That doesn't say anything about
>>transactions or logging problems.
>>Imagine you have 100k or so rows to load, each having FKs to
>>(hopefully) existing rows:
>>- Step 1: load the rows into a temp table.
>>- Step 2: identify duplicates, logging and deleting them.
>>- Step 3: insert all data satisfying the FK constraints.
>>- Step 4: log all that didn't insert.
>>
>>These are relatively few statements (not the simplest), which can be
>>handled in a safe manner (sketched below). Just an example of how an
>>RDBMS can do the job in a mass-oriented (and thus optimizable) way.
>>
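
A minimal SQL sketch of those four steps, under assumptions not in
the original mails: staging, target, parent, and load_log are
hypothetical tables, target(id) is unique with an FK parent_id
referencing parent(id), and PostgreSQL's COPY stands in for the
load step:

    -- Clone the target's structure (but no rows) as a temp table.
    CREATE TEMP TABLE staging AS SELECT * FROM target WHERE false;

    -- Step 1: bulk-load the raw rows into the temp table.
    COPY staging FROM '/tmp/incoming.dat';

    -- Step 2: log rows whose key already exists, then delete them.
    INSERT INTO load_log (id, reason)
        SELECT s.id, 'duplicate' FROM staging s
        WHERE EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);
    DELETE FROM staging
        WHERE EXISTS (SELECT 1 FROM target t WHERE t.id = staging.id);

    -- Step 3: insert only rows whose FK reference exists.
    INSERT INTO target
        SELECT s.* FROM staging s
        WHERE EXISTS (SELECT 1 FROM parent p WHERE p.id = s.parent_id);

    -- Step 4: log everything left over (FK violations).
    INSERT INTO load_log (id, reason)
        SELECT s.id, 'missing parent' FROM staging s
        WHERE NOT EXISTS
            (SELECT 1 FROM parent p WHERE p.id = s.parent_id);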
>
>I really, really like SQL*Server's solution to the problem you have
>stated above. A typical example is you have a pile of incoming words
>and you want to create a dictionary. An example might be a library
>database, where you have an abstract for each item. You parse the words
>and look for new ones to add.
>
>What SQL*Server allows is an index with an option called "IGNORE
>DUPLICATES" that simply tosses out rows that are already in the table.
>For applications like what I have described and what you have described
>it is an incredibly useful extension. Once I got used to it, I found
>myself using it all the time. Of course, you must be very careful to
>ensure that the duplicates really are completely unimportant.
>
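A minimal sketch of the dictionary example Dann describes; in SQL
Server the index option is spelled IGNORE_DUP_KEY, and the table
names here (dictionary, abstract_words) are hypothetical:

    -- SQL Server (2000-era syntax): a unique index created WITH
    -- IGNORE_DUP_KEY silently discards duplicate rows with a
    -- warning instead of failing the whole INSERT.
    CREATE TABLE dictionary (word varchar(100) NOT NULL);
    CREATE UNIQUE INDEX ix_word
        ON dictionary (word) WITH IGNORE_DUP_KEY;

    -- Re-running this only adds words not already present;
    -- abstract_words is a hypothetical source of parsed words.
    INSERT INTO dictionary (word)
        SELECT word FROM abstract_words;
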
Hm, that option is quite a quick hack. I like logging in the database
much better than some SQL messages. And it's only about duplicates,
not about FK constraint violations.
Regards,
Andreas