Thread: BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES
The following bug has been logged online:

Bug reference:      1541
Logged by:          John Engelhart
Email address:      johne@zang.com
PostgreSQL version: 8.0.1
Operating system:   FreeBSD 5.3-p5
Description:        Unusually long INSERT times after fresh clean/CREATE TABLES

Details:

I recently upgraded from 7.4.7 to 8.0.1 and have discovered an oddity. Since I'm developing an SQL-based application, I routinely "start from scratch" with a script that deletes all the tables in my database and rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1: the first run after the clean takes an UNUSUALLY long time to complete, on the order of a few hundred inserts/sec. A normal run takes ~30 seconds. A from-scratch run takes ~15 minutes, with the next run completing in 30 seconds with zero changes.

I believe I've eliminated any host or build anomalies due to a simple fact: starting fresh, running the data inserter, aborting it a few seconds later, and then re-running the data inserter runs at normal speed. It's as if it needs a few records in the DB in order to work at normal speed. It almost seems as if it's doing a worst-case INSERT plan if there are no primary keys when the BEGIN transaction starts?

I have also dropped all CREATE INDEX statements from my rebuild script and it behaves exactly the same. The only other 'unusual' thing remaining would be the various SERIAL NOT NULL UNIQUE PRIMARY KEY statements. The specific tables and data are quite lengthy, and not appropriate for pasting to fully recreate the problem. I can send a tarball if requested.
As an example of the times involved (same average times, always repeatable):

[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 144Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 320Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
3.861u 3.138s 17:21.25 0.6% 83+4732k 0+0io 0pf+0w
[johne@new] simp_sql>

[IMMEDIATELY after, with no accesses to the database anywhere in between]

[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 24Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 324Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
2.990u 2.200s 0:40.00 12.9% 82+4398k 0+0io 0pf+0w
[johne@new] simp_sql>

FYI, my inserter uses the Boehm garbage collection library, so the heap/free numbers are non-deterministic between runs. This should not be a contributing issue at all. The "performing house keeping" step is a "vacuum full analyze" command, since the program has just inserted a large amount of data both times. All the INSERTs are wrapped in a BEGIN/COMMIT pair (per run, not per statement).
Now a clean refresh, running the inserter with a ^C break, then immediately re-running it:

[johne@new] simp_sql> reload
[drops the database, creates it, then recreates the tables/functions]
[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
^C0.094u 0.075s 0:01.04 15.3% 36+361k 0+0io 0pf+0w
[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 136Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 324Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
2.943u 2.218s 0:33.20 15.5% 90+5051k 0+0io 0pf+0w
[johne@new] simp_sql>

Hopefully this is enough info to track down and recreate the problem independently. If not, I can send you the program, db dumps, etc. to help you recreate it; it's a bit too big to include here.
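For readers trying to reproduce this independently, a schema and load of the shape described in the report would look roughly like the following. The actual tables were not posted, so the table and column names here are invented for illustration; only the general pattern (SERIAL NOT NULL UNIQUE PRIMARY KEY columns, a single transaction per run, VACUUM FULL ANALYZE afterward) comes from the report:

```sql
-- Hypothetical minimal schema in the style described above:
CREATE TABLE parent (
    id   SERIAL NOT NULL UNIQUE PRIMARY KEY,
    name TEXT
);
CREATE TABLE child (
    id        SERIAL NOT NULL UNIQUE PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(id),
    payload   TEXT
);

-- The entire load is wrapped in one transaction, per the report:
BEGIN;
INSERT INTO parent (name) VALUES ('first');
INSERT INTO child (parent_id, payload) VALUES (1, 'row 1');
-- ... tens of thousands more INSERTs ...
COMMIT;

-- The "house keeping" step from the program's log output:
VACUUM FULL ANALYZE;
```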
"John Engelhart" <johne@zang.com> writes:
> Since I'm developing an SQL based application, I routinely "start from
> scratch" with a script that deletes all the tables in my database and
> rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1 in
> that the first run after the clean takes an UNUSUALLY long time to complete,
> on the order of a few hundred inserts/sec. A normal run takes ~30 seconds.
> A from scratch run takes ~15 minutes, with the next run completing in 30
> seconds with zero changes.

The 8.0 planner is intentionally sensitive to the current actual physical sizes of tables. It sounds like you've managed to get it to plan something on the assumption that the tables are tiny and keep using that plan after they aren't tiny any more. The old planner had the same kind of issue but it was far easier to hit, so "revert that change" isn't an answer that I'm particularly interested in.

> Hopefully this is enough info to track down and recreate the problem

Not really.

			regards, tom lane
Re: BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES
From: John Engelhart
On Mar 14, 2005, at 12:13 AM, Tom Lane wrote:
> "John Engelhart" <johne@zang.com> writes:
>> Since I'm developing an SQL based application, I routinely "start from
>> scratch" with a script that deletes all the tables in my database and
>> rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1 in
>> that the first run after the clean takes an UNUSUALLY long time to
>> complete, on the order of a few hundred inserts/sec. A normal run takes
>> ~30 seconds. A from scratch run takes ~15 minutes, with the next run
>> completing in 30 seconds with zero changes.
>
> The 8.0 planner is intentionally sensitive to the current actual
> physical sizes of tables. It sounds like you've managed to get it to
> plan something on the assumption that the tables are tiny and keep
> using that plan after they aren't tiny any more. The old planner had
> the same kind of issue but it was far easier to hit, so "revert that
> change" isn't an answer that I'm particularly interested in.

I found another data point yesterday: it seems to be session related. I tried various combinations of "after X number of statements, COMMIT, ANALYZE, CHECKPOINT", and none of them helped. The one that did help is: after X number of statements, close the database handle and re-open it. My program run does about 60K SQL statements, inserting about 22K records. With the handle closed and re-opened after every 500 statements, the run completes in 30 seconds.

>> Hopefully this is enough info to track down and recreate the problem
>
> Not really.

Let me know if there's anything that you need.
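The session-cycling workaround described above can be sketched in psql terms. The batch size of 500 statements is the one mentioned in the message; the table names are the hypothetical ones used earlier, and `mydb` stands in for the actual database name. The key point is that reconnecting starts a fresh backend with no cached plans:

```sql
-- Hypothetical sketch: after every ~500 statements, close and re-open
-- the database handle so any plans cached by the session are discarded.
INSERT INTO child (parent_id, payload) VALUES (1, 'row 499');
INSERT INTO child (parent_id, payload) VALUES (1, 'row 500');

\c mydb
-- the new backend starts with an empty plan cache

INSERT INTO child (parent_id, payload) VALUES (1, 'row 501');
-- ... and so on ...
```

Note that reconnecting necessarily ends any open transaction, so this trades the report's single per-run BEGIN/COMMIT for one transaction per batch.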
Re: BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES
From: Andrew - Supernews
On 2005-03-14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The 8.0 planner is intentionally sensitive to the current actual
> physical sizes of tables. It sounds like you've managed to get it to
> plan something on the assumption that the tables are tiny and keep
> using that plan after they aren't tiny any more. The old planner had
> the same kind of issue but it was far easier to hit, so "revert that
> change" isn't an answer that I'm particularly interested in.

On the irc channel we were just investigating a very similar case to this (which will show up as a bug report here in due course, though not from me). It turns out that the scenario above is trivial to hit in 8.0 using referential constraints: RI triggers cache their plans, and on 8.0 the RI query is planned as a seqscan if the tables are freshly created. (On 7.4 the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.)

What this basically means is that you can't do even a modest-size insert of new data into a fresh schema without having to either trick the planner (e.g. enable_seqscan=false before the first insert) or defer the addition of constraints until after the data load (which can be quite inconvenient in some cases).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
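The first workaround Andrew mentions can be sketched as follows (table names are the hypothetical ones from earlier, not from the actual bug report). The idea is that the RI trigger builds and caches its plan the first time it fires, so the planner only needs to be "tricked" for that first statement:

```sql
-- Workaround 1 (hedged sketch): disable seqscans before the first insert,
-- so the RI check plan that gets built and cached uses the index even
-- though the freshly created tables currently have zero pages.
SET enable_seqscan = false;

BEGIN;
-- This first INSERT fires the RI trigger, which plans its check query
-- now and caches that plan for the rest of the session:
INSERT INTO child (parent_id, payload) VALUES (1, 'first row');

-- Once the plan is cached, seqscans can be re-enabled for everything else:
SET enable_seqscan = true;

-- ... rest of the bulk load ...
COMMIT;
```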
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> It turns out that the scenario above is trivial to hit in 8.0 using
> referential constraints; RI triggers cache their plans, and on 8.0 the RI
> query is planned as a seqscan if the tables are freshly created. (On 7.4
> the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.)

Hm. One thing we could do is to throw in some default values when we see the table has exactly zero pages --- perhaps ye olde traditional 1000/10, or possibly something else, but anyway not exactly 0/0.

The reason I thought we didn't need to do this sort of hack anymore is that pg_dump loads the tables first and then creates the RI constraints. What exactly is the common case where the wrong thing happens?

			regards, tom lane
On Fri, 18 Mar 2005, Tom Lane wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: > > It turns out that the scenario above is trivial to hit in 8.0 using > > referential constraints; RI triggers cache their plans, and on 8.0 the RI > > query is planned as a seqscan if the tables are freshly created. (On 7.4 > > the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.) > > Hm. One thing we could do is to throw in some default values when we > see the table has exactly zero pages --- perhaps ye olde traditional > 1000/10, or possibly something else, but anyway not exactly 0/0. > > The reason I thought we didn't need to do this sort of hack anymore > is that pg_dump loads the tables first and then creates the RI > constraints. What exactly is the common case where the wrong thing > happens? Probably loading a schema only dump followed by a data load that doesn't turn off the constraint (as I believe that's non-default on data-only dumps now).
On 2005-03-19, Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> On Fri, 18 Mar 2005, Tom Lane wrote:
>> Hm. One thing we could do is to throw in some default values when we
>> see the table has exactly zero pages --- perhaps ye olde traditional
>> 1000/10, or possibly something else, but anyway not exactly 0/0.

I don't think that would necessarily do the job.

>> The reason I thought we didn't need to do this sort of hack anymore
>> is that pg_dump loads the tables first and then creates the RI
>> constraints. What exactly is the common case where the wrong thing
>> happens?
>
> Probably loading a schema only dump followed by a data load that doesn't
> turn off the constraint (as I believe that's non-default on data-only
> dumps now).

Both the original post in this thread, and the recent case seen on irc (the bug report for which I guess is in the moderation queue somewhere), were examples of data imports being done at the application level, i.e. the application is taking data from some external source and loading its tables as it would in normal usage. In the most recent example this amounted to under 200k rows, which was taking about 3 minutes to process on 7.4.x (clearly a reasonable time) and more than 90 minutes on 8.0.1 (clearly not reasonable).

I know on the irc channel there have been at least a couple of reports of slowness in restoring data-only dumps, but I haven't been able to pin that down to the same problem (especially since such a restore should be filling the referenced table before the referencing table).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
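The other workaround raised in the thread, deferring the addition of constraints until after the data load, is the same ordering pg_dump itself uses (tables loaded first, RI constraints created afterward). A rough sketch, again using the hypothetical table names rather than anything from the actual reports:

```sql
-- Workaround 2 (hedged sketch): create the tables without the
-- referential constraint, load the data, then add the constraint.
CREATE TABLE parent (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE child  (id SERIAL PRIMARY KEY, parent_id INTEGER, payload TEXT);

-- Bulk-load first, while no RI triggers exist to cache a bad plan:
BEGIN;
-- ... all the INSERTs ...
COMMIT;

-- Refresh statistics now that the tables have realistic sizes:
ANALYZE parent;
ANALYZE child;

-- Only now add the FK; its RI triggers will plan against the real
-- table sizes, never against empty zero-page tables:
ALTER TABLE child
  ADD CONSTRAINT child_parent_fk
  FOREIGN KEY (parent_id) REFERENCES parent(id);
```

Adding the constraint afterward does mean the existing rows are checked in one pass at ALTER TABLE time, which is usually far cheaper than a cached seqscan check per inserted row.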