On Oct 25, 2007, at 10:30 AM, Jean-David Beyer wrote:
> I have just changed around some programs that ran too slowly (too much
> time in io-wait), and they sped up greatly. This was not unexpected, but
> I wonder about the limitations.
>
> By transaction, I mean a single INSERT or a few related INSERTs.
>
> What I used to do is roughly like this:
>
> for each file {
>     for each record {
>         BEGIN WORK;
>         INSERT stuff in table(s);
>         if error {
>             ROLLBACK WORK;
>         }
>         else {
>             COMMIT WORK;
>         }
>     }
> }
>
> The speedup was the obvious one:
>
> for each file {
>     BEGIN WORK;
>     for each record {
>         INSERT stuff in table(s);
>     }
>     if error {
>         ROLLBACK WORK;
>     }
>     else {
>         COMMIT WORK;
>     }
> }
>
> This means, of course, that the things I think of as transactions have
> been bunched into a much smaller number of what PostgreSQL thinks of as
> large transactions, since there is only one per file rather than one per
> record. Now if a file has several thousand records, this seems to work
> out just great.
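
The per-file batching pattern above can be sketched concretely. This is a
minimal illustration using SQLite from Python's standard library so it runs
self-contained; the `stuff` table and `load_file` helper are made-up names
for illustration, but the same shape applies to PostgreSQL through any
client library:

```python
import sqlite3

def load_file(conn, records):
    """Insert every record from one file inside a single transaction."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        for rec in records:
            cur.execute("INSERT INTO stuff (val) VALUES (?)", (rec,))
    except Exception:
        cur.execute("ROLLBACK")   # lose the whole file's worth on any error
        raise
    else:
        cur.execute("COMMIT")     # one COMMIT per file, not one per record

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT/ROLLBACK above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE stuff (val TEXT)")
load_file(conn, ["a", "b", "c"])
print(conn.execute("SELECT COUNT(*) FROM stuff").fetchone()[0])  # prints 3
```

The point of the `else` branch is the same as in the pseudocode: commit work
exactly once per file, so per-record fsync overhead disappears.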
>
> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a time. I
> am willing to lose the whole insert of a file if something goes wrong --
> I would fix whatever went wrong and start over anyway.
>
> But at some point, disk IO would have to be done. Is this just a
> function of how big /pgsql/data/postgresql.conf's shared_buffers is set
> to? Or does it have to do with wal_buffers and checkpoint_segments?
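
Those are the usual knobs for a bulk load on the 8.x series. A hedged
postgresql.conf sketch follows; the values are illustrative only, not
recommendations, since the right numbers depend on the machine and should
be benchmarked:

```
# postgresql.conf -- illustrative values only; tune per machine
shared_buffers = 128MB        # shared buffer cache
wal_buffers = 1MB             # WAL buffered in memory before flush at commit
checkpoint_segments = 32      # spread checkpoints out during the load
```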
You're reading data from a file and generating inserts? Can you not
use COPY? That would give the best performance.
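
A sketch of what that might look like, with hypothetical table and file
names; COPY loads the whole file in one command instead of one INSERT per
row:

```sql
-- Server-side COPY: the backend reads the file directly. The default
-- text format expects one tab-delimited row per line.
COPY stuff FROM '/path/to/datafile';

-- Client-side equivalent from psql (the file is read on the client):
-- \copy stuff from '/path/to/datafile'
```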
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com