Re: [GENERAL] 7.4Beta - Mailing list pgsql-hackers

From Andreas Pflug
Subject Re: [GENERAL] 7.4Beta
Date
Msg-id 3F3D30ED.4070405@pse-consulting.de
Whole thread Raw
In response to Re: [GENERAL] 7.4Beta  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-hackers
Dann Corbit wrote:

>>-----Original Message-----
>>From: Andreas Pflug [mailto:pgadmin@pse-consulting.de] 
>>Sent: Friday, August 15, 2003 11:36 AM
>>To: Dann Corbit
>>Cc: Stephan Szabo; PostgreSQL-development
>>Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>>
>>Dann Corbit wrote:
>>
>>    
>>
>>>Simplification of bulk operations can be very important for 
>>>      
>>>
>>customers 
>>    
>>
>>>(on the other hand).  For the CONNX tool set, we offer an escape on 
>>>INSERT/SELECT that performs the operation in bulk mode.
>>>
>>>There are serious downsides to bulk operations also (such as 
>>>      
>>>
>>not being 
>>    
>>
>>>logged and therefore existing outside of a transaction).  Therefore, 
>>>they are useful really only in cases where a complete 
>>>      
>>>
>>destruction and 
>>    
>>
>>>repopulation is called for.  If anything goes haywire, you 
>>>      
>>>
>>can't simply 
>>    
>>
>>>roll it back.
>>>
>>>Yet the speed savings can be enormous (orders of magnitude).
>>>
>>>Compared to iteration over a set of prepared inserts, a bulk insert 
>>>(such as using Microsoft's BCP API or Oracles Direct Path 
>>>      
>>>
>>loading) can 
>>    
>>
>>>be 100 times faster.  If you are moving gigabytes of data and 
>>>performing a complete refresh, the method to use becomes obvious.
>>>
>>>When we go outside of the language bounds, a curly braced escape 
>>>notation is used.  For instance, an insert/select might look 
>>>      
>>>
>>like this: 
>>    
>>
>>>INSERT INTO <destination_table> SELECT <column_list> FROM 
>>><source_table> {fn commitcount 1000} {bulkmode} The commit 
>>>      
>>>
>>count says 
>>    
>>
>>>to use batches of 1000 rows and bulkmode says to use the fastest 
>>>possible insert method.
>>>
>>>Imagine (for instance) that PostgreSQL can use an ODBC {or similar} 
>>>data source as a table.  Then, with a syntax such as the above (or 
>>>SELECT INTO etc.), you could very rapidly move data from one system 
>>>into another.
>>> 
>>>
>>>      
>>>
>>When saying "bulk operation" I don't necessarily mean using 
>>bulk load or 
>>stuff like that. What I mean is handling large amounts of 
>>similar data 
>>at the same time. That doesn't say anything about transactions or 
>>logging problems.
>>Imagine you have 100k or rows to load, each having FKs to (hopefully) 
>>existing rows
>>- Step 1: load the rows into a temp table
>>- Step 2: identify duplicates, logging  and deleting them
>>- Step 3: insert all data satisfying the FK constraints.
>>- Step 4: log all that didn't insert.
>>
>>This are relatively few statements (not the simplest), which can be 
>>handled in a safe manner.
>>Just an example, how a RDBMS can do the job in a mass 
>>oriented (and thus 
>>optimizable) way.
>>    
>>
>
>I really, really like SQL*Server's solution to the problem you have
>stated above.  A typical example is you have a pile of incoming words
>and you want to create a dictionary.  An example might be a library
>database, where you have an abstract for each item.  You parse the words
>and look for new ones to add.
>
>What SQL*Server allows is an index with an option called "IGNORE
>DUPLICATES" that simply tosses out rows that are already in the table.
>For applications like what I have described and what you have described
>it is an incredibly useful extension.  Once I got used to it, I found
>myself using it all the time.  Of course, you must be very careful to
>ensure that the duplicates really are completely unimportant.
>  
>

Hm, option is quite a quick hack. I like logging in the database much 
better than some sql messages. And it's only about duplicates, not about 
fk constraint violations.

Regards,
Andresa



pgsql-hackers by date:

Previous
From: "Dann Corbit"
Date:
Subject: Re: [GENERAL] 7.4Beta
Next
From: Tom Lane
Date:
Subject: Re: Bounds error in LockMethodInit().