Re: Slowdown problem when writing 1.7million records - Mailing list pgsql-general

From: Tom Lane
Subject: Re: Slowdown problem when writing 1.7million records
Date:
Msg-id: 5312.989536956@sss.pgh.pa.us
In response to: Re: Slowdown problem when writing 1.7million records (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general
This is a followup to a problem report Stephen Livesey made back in
February, to the effect that successive insertions got slower and
slower.  At the time we speculated that btree indexes had a problem
with becoming out-of-balance when fed steadily increasing data values.
I have now tried to reproduce that effect --- and failed, in both
current sources and 7.0.2.

I did
    create table foo (f1 serial primary key);
and then ran a process that just did
    insert into foo default values;
over and over (each insert as a separate transaction).  This will result
in inserting a continually increasing sequence of key values into the
pkey index.
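
For anyone who wants to repeat the experiment, a minimal driver is a
shell loop around psql (the database name "bench" here is just a
placeholder, and psql/createdb are assumed to be on the PATH).  Each
psql -c invocation commits as its own transaction, matching the test
above; a client program that holds one connection open will of course
run much faster:

    createdb bench
    psql bench -c "create table foo (f1 serial primary key);"
    n=0
    while [ $n -lt 3400000 ]; do
        psql bench -c "insert into foo default values;" > /dev/null
        n=$((n+1))
    done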

I ran this out to about 3.4 million records (twice the number of records
Stephen used) on both 7.0.2 and current sources.  I do not see any real
slowdown in the insertion rate, and certainly not the drastic slowdown
that Stephen observed: he said

> I am now writing 1.7million records to this file.
>
> The first 100,000 records took 15mins.
> The next 100,000 records took 30mins
> The last 100,000 records took 4hours.
>
> In total, it took 43 hours to write 1.7million records.

It took me about 140 minutes to write 3.4 million records, on a not
particularly fast machine; the insertion rate held pretty steady at
around 400 records/sec (fsync off).
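
(As a cross-check on those figures: 3,400,000 records / (140 min * 60
sec/min) is about 405 records/sec, consistent with the steady rate
quoted above.)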

So I no longer think that the problem was with the btree index.
Other possibilities are:

* If Stephen had fsync on, and his kernel was using a particularly
  stupid fsync algorithm, the time might all have been going into
  kernel searches of its buffer lists --- the files being fsync'd
  would've been growing and growing, and some fsync routines take
  time proportional to the number of buffered blocks.

* If Stephen had any relevant foreign-key checks (though he said not)
  and was inserting all the records in one transaction, then the known
  O(N^2) behavior in 7.0.*'s handling of deferred triggers could've
  been the culprit (a sketch of that case follows this list).  If so,
  this is fixed in 7.1.

* The problem might have been on the client side.  I'm not sure
  whether JDBC on Windows might suffer from memory leaks or anything
  like that.
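
To make the second point concrete, here is a sketch of the kind of
schema that would hit the 7.0.* deferred-trigger problem; the table
and column names are made up for illustration:

    create table parent (id integer primary key);
    create table child (
        parent_id integer references parent (id)
            deferrable initially deferred
    );

    begin;
    insert into parent values (1);
    -- each insert queues a deferred FK trigger event; in 7.0.* the
    -- event queue is rescanned per event, so N inserts in one
    -- transaction cost O(N^2) work
    insert into child values (1);
    -- ... repeat for 1.7 million rows ...
    commit;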

Anyway: Stephen, if you haven't lost interest, we need to take another
look at your problem and figure out where the issue really is.

Bruce, please remove the TODO item
    * Be smarter about insertion of already-ordered data into btree index
It seems to have been a false alarm.

            regards, tom lane
