Thread: Re: Slowdown problem when writing 1.7million records

Re: Slowdown problem when writing 1.7million records

From: Emmanuel Charpentier
Date:
Stephen Livesey wrote:
>
> I am very new to PostgreSQL and have installed v7.03 on a Red Hat Linux
> Server (v6.2), I am accessing the files using JDBC from a Windows 2000 PC.
>
> I have created a small file as follows:
> CREATE TABLE expafh (
> postcode CHAR(8) NOT NULL,
> postcode_record_no INT,
> street_name CHAR(30),
> town CHAR(31),
> PRIMARY KEY(postcode) )
>
> I am now writing 1.7million records to this file.
>
> The first 100,000 records took 15mins.
> The next 100,000 records took 30mins
> The last 100,000 records took 4hours.
>
> In total, it took 43 hours to write 1.7million records.
>
> Is this sort of degradation normal using a PostgreSQL database?

AFAICT, no.

> I have never experienced this sort of degradation with any other database
> and I have done exactly the same test (using the same hardware) on the
> following databases:
>     DB2 v7 in total took 10hours 6mins
>     Oracle 8i in total took 3hours 20mins
>     Interbase v6 in total took 1hr 41min
>     MySQL v3.23 in total took 54mins
>
> Any help or advice would be appreciated.

Did you "vacuum analyse" your DB? This seems to be essential to PG
performance, for various reasons.

Do you have a unique index on your primary key?
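
For illustration, the maintenance step looks something like this (a
minimal sketch using Python's sqlite3 module as a stand-in, since SQLite
also understands VACUUM and ANALYZE; against PostgreSQL itself you would
just run "VACUUM ANALYZE expafh;" from psql):

```python
import sqlite3

# Sketch only: sqlite3 stands in for PostgreSQL here. Both accept
# VACUUM and ANALYZE, though PostgreSQL's VACUUM additionally
# reclaims dead tuples left behind by updates and deletes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expafh (postcode TEXT PRIMARY KEY, town TEXT)")
conn.executemany("INSERT INTO expafh VALUES (?, ?)",
                 [("PC%05d" % i, "Exeter") for i in range(1000)])
conn.commit()

conn.execute("VACUUM")   # compact the table file
conn.execute("ANALYZE")  # refresh the planner's statistics
count = conn.execute("SELECT count(*) FROM expafh").fetchone()[0]
print(count)
```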

HTH,

                    Emmanuel Charpentier

Re: Re: Slowdown problem when writing 1.7million records

From: "Gordan Bobic"
Date:
> > I am very new to PostgreSQL and have installed v7.03 on a Red Hat
> > Linux Server (v6.2), I am accessing the files using JDBC from a
> > Windows 2000 PC.
> >
> > I have created a small file as follows:
> > CREATE TABLE expafh (
> > postcode CHAR(8) NOT NULL,
> > postcode_record_no INT,
> > street_name CHAR(30),
> > town CHAR(31),
> > PRIMARY KEY(postcode) )
> >
> > I am now writing 1.7million records to this file.
> >
> > The first 100,000 records took 15mins.
> > The next 100,000 records took 30mins
> > The last 100,000 records took 4hours.
> >
> > In total, it took 43 hours to write 1.7million records.
> >
> > Is this sort of degradation normal using a PostgreSQL database?
>
> AFAICT, no.

Funny you should say that, but I seem to be experiencing a similar
thing when doing pg_dump/restore. When restoring a table with around
40M records, everything starts nice and fast. However, by the time
it's half done, inserts take forever. It took about 16 hours to
restore all 40M records (3 GB or so, plus indices). Obviously, since it
was a "restore", there was no vacuuming during the operation, but it
still seemed to get unusually slow toward the end... judging by the
rate of increase in the file sizes, anyway, which I accept isn't the
most reliable way to measure this...

But, I guess this is nothing comparable to your problem. I was
inserting 40M records, and it took 16 hours.

You are inserting 1.7M records, and it's taking 43 hours.

Are you using autocommit? Are you using fsync? You should probably
periodically commit the data every few thousand inserts if you don't
have autocommit enabled.
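
The periodic-commit pattern can be sketched roughly as follows (using
Python's sqlite3 module as a generic stand-in for a JDBC/DB-API
connection; the batch size of 1000 is an arbitrary assumption you
would tune for your workload):

```python
import sqlite3

# Sketch only: sqlite3 as a generic stand-in for a JDBC/DB-API
# connection; COMMIT_EVERY = 1000 is an arbitrary batch size.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expafh (postcode TEXT PRIMARY KEY, town TEXT)")

COMMIT_EVERY = 1000

for i in range(5000):
    conn.execute("INSERT INTO expafh VALUES (?, ?)",
                 ("PC%06d" % i, "Exeter"))
    if (i + 1) % COMMIT_EVERY == 0:
        conn.commit()  # one sync per batch instead of one per row
conn.commit()  # flush the final partial batch

total = conn.execute("SELECT count(*) FROM expafh").fetchone()[0]
print(total)
```

With autocommit on, every single INSERT pays the cost of a commit; the
batching above amortises that cost over COMMIT_EVERY rows.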

Are you running out of memory? How much shared memory do you have
configured? How much is Postgres allowed to use? I know this is
fairly obvious stuff, but it's quite easy to overlook...

> > I have never experienced this sort of degradation with any other
> > database and I have done exactly the same test (using the same
> > hardware) on the following databases:
> >     DB2 v7 in total took 10hours 6mins
> >     Oracle 8i in total took 3hours 20mins
> >     Interbase v6 in total took 1hr 41min
> >     MySQL v3.23 in total took 54mins
> >
> > Any help or advice would be appreciated.

The only thing that comes to mind is that if you're doing a bulk
insert, you should probably drop all indices that aren't unique or for
primary keys, and re-create them once you've inserted all your data...
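
That drop/load/rebuild sequence might look roughly like this (a sketch
using Python's sqlite3 module as a stand-in for PostgreSQL; the
non-unique secondary index on town is hypothetical):

```python
import sqlite3

# Sketch only: sqlite3 as a stand-in. The secondary index on "town"
# is hypothetical; the PRIMARY KEY index stays in place throughout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expafh (postcode TEXT PRIMARY KEY, town TEXT)")
conn.execute("CREATE INDEX expafh_town_idx ON expafh (town)")

# 1. Drop the secondary index before the bulk load.
conn.execute("DROP INDEX expafh_town_idx")

# 2. Load the data with only the primary key maintained per row.
conn.executemany("INSERT INTO expafh VALUES (?, ?)",
                 [("PC%06d" % i, "town%d" % (i % 10)) for i in range(2000)])
conn.commit()

# 3. Rebuild the index in one pass over the finished table.
conn.execute("CREATE INDEX expafh_town_idx ON expafh (town)")
total = conn.execute("SELECT count(*) FROM expafh").fetchone()[0]
print(total)
```

Rebuilding once over the finished table is generally cheaper than
updating the index incrementally on every insert.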

Regards.

Gordan



Re: Re: Slowdown problem when writing 1.7million records

From: Marc SCHAEFER
Date:
> Server (v6.2), I am accessing the files using JDBC from a Windows 2000 PC.

I don't use Microsoft software, nor Java, but here are a few general
suggestions for you:

  - you may want to create the indexes at the end, in one operation,
    using CREATE INDEX

  - you may want to run your requests in transactions, e.g. a transaction
    which is committed every N insertions

  - in some rare cases you may want to disable PostgreSQL's fsync()

  - you may want to use a more efficient interface, such as the
    COPY command.

Example (using transactions)

#! /usr/bin/perl -wI /home/schaefer/perl-libs
# $Id$

use DBI;
use test_db;
use strict;

my $debug = 0;

my $result = 0; # success
my $reason = "of an unknown error";

$| = 1;

my $dbh = &open_database();
if (defined($dbh)) {
   my $amount_entries =  4000000;
   my $commit_every    = 10000;

   # Sometimes, large data entries are done better with a COPY.

   $dbh->{AutoCommit} = 0; # Use transactions.

   foreach (('sol_f', 'sol_i', 'sol_d')) {
      # Using transactions should make insertion faster, since an
      # fsync() is probably not required for every row. However, when
      # changes are very big, it might actually get slower or use much
      # more space, which is why we have $commit_every above and below.
      # I was expecting a BEGIN WORK; but that seems to be implicit.

      eval {
         my $i;
         my $failed_reason = "unknown db error";

         print "Populating " . $_ . " ...";
         for ($i = 0; ($i < $amount_entries) && ($result == 0); $i++) {
            my @titles = ('id', 'ref', 'sentpos', 'wordpos');
            if (!&do_query($dbh,
                           "INSERT INTO " . $_ . "("
                           . join(", ", @titles)
                           . ") VALUES ("
                           . join(", ", ('?') x @titles)
                           . ")",
                           \@titles,
                           undef,
                           [ int(rand(32768)),
                             'truc',
                             int(rand(32768)),
                             'temp'
                           ],
                           undef,
                           \$failed_reason)) {
               $result = 1;
               $reason = "can't insert " . $i . ": " . $failed_reason;
            }
            else {
               if (($i % $commit_every) == 0) {
                  if ($dbh->commit) {
                     print "C ";
                  }
                  else {
                     $result = 1;
                     $reason = "can't commit: " . $dbh->errstr;
                  }
               }
            }
         }

         if ($result == 0) {
            print " POPULATED.\n";
         }
         else {
            print " FAILED.\n";
         }
      }; # DB is not set to die mode, so we will catch only our bugs.
      if ($@) {
         print;
         $result = 1;
         $reason = "transaction failed: " . $@; # Not always right.
         $dbh->rollback; # res. ign. (in failure mode anyway)
      }
      elsif ($result) {
         $dbh->rollback; # res. ign. (in failure mode anyway)
      }
      else {
         if (!$dbh->commit) {
            $result = 1;
            $reason = "can't commit: " . $dbh->errstr;
         }
         else {
            print "COMMITTED.\n";
         }
      }
   }

   $dbh->{AutoCommit} = 1; # No transactions

   if (!$dbh->disconnect) {
      $result = 1;
      $reason = "disconnect error: " . $dbh->errstr;
   }
   undef $dbh;
}
else {
   # Obviously, can't use $dbh->errstr here.
   $reason = "can't database connect: " . $DBI::errstr;
   $result = 1;
}

if ($result) {
   print $0 . ": failed " . $result . " because " . $reason . "\n";
}
else {
   print "SUCCESSFUL.\n";
}
exit $result;

sub create_table {
   my($dbh, $name, $val) = @_;
   my $result = 0;

   my $sth = $dbh->prepare("CREATE TABLE $name ($val)");
   if (defined($sth)) {
      my $rv = $sth->execute;
      if (defined($rv)) {
         if ($debug) {
            print "$name: succeeded.\n";
         }

         $result = 1;
      }

      $sth->finish;
      undef $sth;
   }

   return $result;
}

And using COPY:

#! /usr/bin/perl -wI /home/schaefer/perl-libs
# USAGE
#    ./copy.pl | psql test_db
# $Id$

my $amount_entries = 4000000;
my $tell_every     =  100000;

print "COPY sol_f FROM stdin;\n";
my $i;
for ($i = 1; $i <= $amount_entries; $i++) {
   print int(rand(32768)) . "\t" . 'truc' . "\t" . int(rand(32768)) . "\t"
         . 'temp' . "\n";
   if (($i % $tell_every) == 0) {
      print STDERR $i . "\n";
   }
}
print "\\.\n"; # COPY's end-of-data marker is a backslash-dot line





Re: Re: Slowdown problem when writing 1.7million records

From: Tom Lane
Date:
"Gordan Bobic" <gordan@freeuk.com> writes:
> Are you using autocommit? Are you using fsync? You should probably
> periodically commit the data every few thousand inserts if you don't
> have autocommit enabled.
> [ various other speedup suggestions from other people ]

The thing that interests me here is the apparent slowdown as more data
is inserted.  I don't see a good reason for that, and none of the
suggested remedies seem likely to eliminate it if we don't know what's
causing it.

INSERT per se should be an essentially constant-time operation, since
it's just slapping another record at the end of the table.  Unless
you're using advanced features like triggers/rules/foreign keys,
it seems like any progressive slowdown would have to be blamed on
updates of indexes.  But that's not normally a performance problem
except in pathological cases (zillions of equal keys for example ---
but Stephen's only index is a PRIMARY KEY, therefore UNIQUE, therefore
no equal keys).  So I'm confused, and think this deserves more careful
examination as to *why* the slowdown, rather than just looking for
a quick&dirty workaround.

            regards, tom lane