Re: [GENERAL] 7.4Beta - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: [GENERAL] 7.4Beta
Date
Msg-id D90A5A6C612A39408103E6ECDD77B829408B8A@voyager.corporate.connx.com
List pgsql-hackers
> -----Original Message-----
> From: Andreas Pflug [mailto:pgadmin@pse-consulting.de]
> Sent: Friday, August 15, 2003 10:49 AM
> To: Stephan Szabo
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] [GENERAL] 7.4Beta
>
>
> Stephan Szabo wrote:
>
> >I don't know if there will be or not, but in one case it's a single
> >table select with constant values, in the other it's probably some
> >kind of scan and subselect. I'm just not going to rule out the
> >possibility, so we should profile it in large transactions with,
> >say, 100k single inserts and see.
> >
> >
> You're talking about bulk operations, which should be handled
> carefully as well. Usually, loading all the data into a temporary
> table and then doing an INSERT INTO xxx SELECT FROM tmptable gives
> better performance where indexes and constraints are concerned.
> PostgreSQL shouldn't be expected to accept the most abusive ways of
> operating, but it should offer a reasonable set of tools that make
> the job convenient. The best situation is when many small random
> transactions perform well, for TPC-like loads, as well as bulk
> operations. Nobody should expect a database to smoothly convert a
> bunch of single transactions into one optimized bulk operation.
> That's the programmer's job.
>
> >Yeah, for the 5 cases above it's pretty easy to show that it's safe,
> >but other cases and referential action cases won't necessarily be so
> >easy.
> >
> So it's the programmer's responsibility to offer mass data to the
> backend, not separate inserts that by chance might be handled in a
> similar way. An RDBMS is not clairvoyant.
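
(As a concrete sketch of the staging-table approach described above --
the table and file names here are only placeholders, and the data is
assumed to arrive in a COPY-compatible text file:)

CREATE TEMP TABLE tmptable AS SELECT * FROM xxx WHERE false;  -- same structure, no indexes or constraints
COPY tmptable FROM '/path/to/data.txt';                       -- bulk-load the raw rows in one statement
INSERT INTO xxx SELECT * FROM tmptable;                       -- one set-oriented insert into the real table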

On the other hand, simplifying bulk operations can be very important
for customers.  In the CONNX tool set, we offer an escape on
INSERT/SELECT that performs the operation in bulk mode.

There are also serious downsides to bulk operations (such as not being
logged and therefore existing outside of a transaction).  They are
really only useful in cases where a complete destruction and
repopulation is called for.  If anything goes haywire, you can't simply
roll it back.

Yet the speed savings can be enormous (orders of magnitude).

Compared to iteration over a set of prepared inserts, a bulk insert
(such as using Microsoft's BCP API or Oracle's Direct Path loading) can
be 100 times faster.  If you are moving gigabytes of data and performing
a complete refresh, the method to use becomes obvious.
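
In PostgreSQL terms the contrast looks roughly like this (a sketch
only; the table is assumed to have an int column and a text column,
and the file path is a placeholder):

PREPARE ins (int, text) AS INSERT INTO xxx VALUES ($1, $2);
EXECUTE ins (1, 'first row');   -- one statement per row,
EXECUTE ins (2, 'second row');  -- repeated for every row loaded

COPY xxx FROM '/path/to/data.txt';  -- versus a single statement that streams all rows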

When we go outside the bounds of the language, a curly-braced escape
notation is used.  For instance, an insert/select might look like this:

INSERT INTO <destination_table> SELECT <column_list> FROM <source_table>
{fn commitcount 1000} {bulkmode}

The commit count says to use batches of 1000 rows, and bulkmode says to
use the fastest possible insert method.

Imagine (for instance) that PostgreSQL could use an ODBC {or similar}
data source as a table.  Then, with a syntax such as the above (or
SELECT INTO, etc.), you could move data from one system into another
very rapidly.
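
Purely as a hypothetical illustration (ext_orders stands in for a
table exposed from such an external source, local_orders for a native
PostgreSQL table; this syntax does not exist today), it might look
like:

INSERT INTO local_orders
SELECT order_id, customer_id, amount
FROM ext_orders
{fn commitcount 1000} {bulkmode}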

