Thread: BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES
The following bug has been logged online:

Bug reference:      1541
Logged by:          John Engelhart
Email address:      johne@zang.com
PostgreSQL version: 8.0.1
Operating system:   FreeBSD 5.3-p5
Description:        Unusually long INSERT times after fresh clean/CREATE TABLES

Details:

I recently upgraded from 7.4.7 to 8.0.1 and have discovered an oddity. Since I'm developing an SQL-based application, I routinely "start from scratch" with a script that deletes all the tables in my database and rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1: the first run after the clean takes an UNUSUALLY long time to complete, on the order of a few hundred inserts/sec. A normal run takes ~30 seconds. A from-scratch run takes ~15 minutes, with the next run completing in 30 seconds with zero changes.

I believe I've eliminated any host or build anomalies due to a simple fact: starting fresh, running the data inserter, aborting it a few seconds later, and then re-running the data inserter runs at normal speed. It's as if it needs a few records in the DB in order to work at normal speed. It almost seems as if it's doing a worst-case INSERT plan if there are no primary keys when the BEGIN transaction starts?

I have also dropped all CREATE INDEX statements from my rebuild script and it behaves exactly the same. The only other 'unusual' thing remaining would be the various SERIAL NOT NULL UNIQUE PRIMARY KEY statements. The specific tables and data are quite lengthy, and not appropriate for pasting to fully recreate the problem. I can send a tarball if requested.
As an example of the times involved (same average times, always repeatable):

[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 144Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 320Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
3.861u 3.138s 17:21.25 0.6% 83+4732k 0+0io 0pf+0w
[johne@new] simp_sql>

[IMMEDIATELY after, with no accesses to the database anywhere in between]

[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 24Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 324Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
2.990u 2.200s 0:40.00 12.9% 82+4398k 0+0io 0pf+0w
[johne@new] simp_sql>

FYI, my inserter uses the Boehm garbage collection library, so the heap/free numbers are non-deterministic between runs. This should not be a contributing issue at all. The "performing house keeping" step is a "vacuum full analyze" command, since the program has just inserted a large amount of data both times. All the INSERTs are wrapped in a BEGIN/COMMIT pair (per run, not per statement).
Now a clean refresh, running the inserter with a ^C break, then immediately re-running it:

[johne@new] simp_sql> reload
[drops the database, creates it, then recreates the tables/functions]
[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
^C0.094u 0.075s 0:01.04 15.3% 36+361k 0+0io 0pf+0w
[johne@new] simp_sql> time simp_sql < test8.cpp
INFO: Opening database connection @136/initialize/main_sql.c
INFO: Begining parse @47/main/main_sql.c
INFO: GC- After parse: Heap 348Kb Free 136Kb @49/main/main_sql.c
INFO: Performing house keeping and index rebuild @58/main/main_sql.c
INFO: Marking all blocks @61/main/main_sql.c
INFO: Performing house keeping and index rebuild @76/main/main_sql.c
INFO: GC- Before exiting: Heap 348Kb Free 324Kb @79/main/main_sql.c
INFO: Closing database connection @95/cleanup/main_sql.c
2.943u 2.218s 0:33.20 15.5% 90+5051k 0+0io 0pf+0w
[johne@new] simp_sql>

Hopefully this is enough info to track down and recreate the problem independently. If not, I can send you the program, db dumps, etc. to help you recreate it; it's a bit too big to include here.
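For readers trying to reproduce this independently, a schema and load of the shape described in the report would look roughly like the following. The actual tables were not posted, so the table and column names here are invented for illustration; only the general pattern (SERIAL NOT NULL UNIQUE PRIMARY KEY columns, a single transaction per run, VACUUM FULL ANALYZE afterward) comes from the report:

```sql
-- Hypothetical minimal schema in the style described above:
CREATE TABLE parent (
    id   SERIAL NOT NULL UNIQUE PRIMARY KEY,
    name TEXT
);
CREATE TABLE child (
    id        SERIAL NOT NULL UNIQUE PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(id),
    payload   TEXT
);

-- The entire load is wrapped in one transaction, per the report:
BEGIN;
INSERT INTO parent (name) VALUES ('first');
INSERT INTO child (parent_id, payload) VALUES (1, 'row 1');
-- ... tens of thousands more INSERTs ...
COMMIT;

-- The "house keeping" step from the program's log output:
VACUUM FULL ANALYZE;
```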
"John Engelhart" <johne@zang.com> writes:
> Since I'm developing an SQL based application, I routinely "start from
> scratch" with a script that deletes all the tables in my database and
> rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1 in
> that the first run after the clean takes an UNUSUALLY long time to complete,
> on the order of a few hundred inserts/sec. A normal run takes ~30 seconds.
> A from scratch run takes ~15 minutes, with the next run completing in 30
> seconds with zero changes.

The 8.0 planner is intentionally sensitive to the current actual physical sizes of tables. It sounds like you've managed to get it to plan something on the assumption that the tables are tiny and keep using that plan after they aren't tiny any more. The old planner had the same kind of issue but it was far easier to hit, so "revert that change" isn't an answer that I'm particularly interested in.

> Hopefully this is enough info to track down and recreate the problem

Not really.

			regards, tom lane
Re: BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES
From: John Engelhart
On Mar 14, 2005, at 12:13 AM, Tom Lane wrote:
> "John Engelhart" <johne@zang.com> writes:
>> Since I'm developing an SQL based application, I routinely "start from
>> scratch" with a script that deletes all the tables in my database and
>> rebuilds them. A problem started when I upgraded from 7.4.7 to 8.0.1 in
>> that the first run after the clean takes an UNUSUALLY long time to
>> complete, on the order of a few hundred inserts/sec. A normal run takes
>> ~30 seconds. A from scratch run takes ~15 minutes, with the next run
>> completing in 30 seconds with zero changes.
>
> The 8.0 planner is intentionally sensitive to the current actual
> physical sizes of tables. It sounds like you've managed to get it to
> plan something on the assumption that the tables are tiny and keep
> using that plan after they aren't tiny any more. The old planner had
> the same kind of issue but it was far easier to hit, so "revert that
> change" isn't an answer that I'm particularly interested in.

I found another data point yesterday: it seems to be session related. I tried various combinations of "after X number of statements, COMMIT, ANALYZE, CHECKPOINT", and none of them helped. The one that did help is: after X number of statements, close the database handle and re-open it. My program run does about 60K SQL statements, inserting about 22K records. With the handle closed and re-opened after every 500 statements, the run completes in 30 seconds.

>> Hopefully this is enough info to track down and recreate the problem
>
> Not really.

Let me know if there's anything that you need.
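The session-cycling workaround described above can be sketched in psql terms. The batch size of 500 statements is the one mentioned in the message; the table names are the hypothetical ones used earlier, and `mydb` stands in for the actual database name. The key point is that reconnecting starts a fresh backend with no cached plans:

```sql
-- Hypothetical sketch: after every ~500 statements, close and re-open
-- the database handle so any plans cached by the session are discarded.
INSERT INTO child (parent_id, payload) VALUES (1, 'row 499');
INSERT INTO child (parent_id, payload) VALUES (1, 'row 500');

\c mydb
-- the new backend starts with an empty plan cache

INSERT INTO child (parent_id, payload) VALUES (1, 'row 501');
-- ... and so on ...
```

Note that reconnecting necessarily ends any open transaction, so this trades the report's single per-run BEGIN/COMMIT for one transaction per batch.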
Re: BUG #1541: Unusually long INSERT times after fresh clean/CREATE TABLES
From: Andrew - Supernews
On 2005-03-14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The 8.0 planner is intentionally sensitive to the current actual
> physical sizes of tables. It sounds like you've managed to get it to
> plan something on the assumption that the tables are tiny and keep
> using that plan after they aren't tiny any more. The old planner had
> the same kind of issue but it was far easier to hit, so "revert that
> change" isn't an answer that I'm particularly interested in.

On the irc channel we were just investigating a very similar case to this (which will show up as a bug report here in due course, though not from me). It turns out that the scenario above is trivial to hit in 8.0 using referential constraints: RI triggers cache their plans, and on 8.0 the RI query is planned as a seqscan if the tables are freshly created. (On 7.4 the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.)

What this basically means is that you can't do even a modest-size insert of new data into a fresh schema without having to either trick the planner (e.g. enable_seqscan=false before the first insert) or defer the addition of constraints until after the data load (which can be quite inconvenient in some cases).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
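The first workaround Andrew mentions can be sketched as follows (table names are the hypothetical ones from earlier, not from the actual bug report). The idea is that the RI trigger builds and caches its plan the first time it fires, so the planner only needs to be "tricked" for that first statement:

```sql
-- Workaround 1 (hedged sketch): disable seqscans before the first insert,
-- so the RI check plan that gets built and cached uses the index even
-- though the freshly created tables currently have zero pages.
SET enable_seqscan = false;

BEGIN;
-- This first INSERT fires the RI trigger, which plans its check query
-- now and caches that plan for the rest of the session:
INSERT INTO child (parent_id, payload) VALUES (1, 'first row');

-- Once the plan is cached, seqscans can be re-enabled for everything else:
SET enable_seqscan = true;

-- ... rest of the bulk load ...
COMMIT;
```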
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> It turns out that the scenario above is trivial to hit in 8.0 using
> referential constraints; RI triggers cache their plans, and on 8.0 the RI
> query is planned as a seqscan if the tables are freshly created. (On 7.4
> the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.)

Hm. One thing we could do is to throw in some default values when we see the table has exactly zero pages --- perhaps ye olde traditional 1000/10, or possibly something else, but anyway not exactly 0/0.

The reason I thought we didn't need to do this sort of hack anymore is that pg_dump loads the tables first and then creates the RI constraints. What exactly is the common case where the wrong thing happens?

			regards, tom lane
On Fri, 18 Mar 2005, Tom Lane wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: > > It turns out that the scenario above is trivial to hit in 8.0 using > > referential constraints; RI triggers cache their plans, and on 8.0 the RI > > query is planned as a seqscan if the tables are freshly created. (On 7.4 > > the plan is an index scan, thanks to the default 1000 rows / 10 pages stats.) > > Hm. One thing we could do is to throw in some default values when we > see the table has exactly zero pages --- perhaps ye olde traditional > 1000/10, or possibly something else, but anyway not exactly 0/0. > > The reason I thought we didn't need to do this sort of hack anymore > is that pg_dump loads the tables first and then creates the RI > constraints. What exactly is the common case where the wrong thing > happens? Probably loading a schema only dump followed by a data load that doesn't turn off the constraint (as I believe that's non-default on data-only dumps now).
On 2005-03-19, Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> On Fri, 18 Mar 2005, Tom Lane wrote:
>> Hm. One thing we could do is to throw in some default values when we
>> see the table has exactly zero pages --- perhaps ye olde traditional
>> 1000/10, or possibly something else, but anyway not exactly 0/0.

I don't think that would necessarily do the job.

>> The reason I thought we didn't need to do this sort of hack anymore
>> is that pg_dump loads the tables first and then creates the RI
>> constraints. What exactly is the common case where the wrong thing
>> happens?
>
> Probably loading a schema only dump followed by a data load that doesn't
> turn off the constraint (as I believe that's non-default on data-only
> dumps now).

Both the original post in this thread, and the recent case seen on irc (the bug report for which I guess is in the moderation queue somewhere), were examples of data imports being done at the application level, i.e. the application is taking data from some external source and loading its tables as it would in normal usage. In the most recent example this amounted to under 200k rows, which was taking about 3 minutes to process on 7.4.x (clearly a reasonable time) and more than 90 minutes on 8.0.1 (clearly not reasonable).

I know on the irc channel there have been at least a couple of reports of slowness in restoring data-only dumps, but I haven't been able to pin that down to the same problem (especially since such a restore should be filling the referenced table before the referencing table).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
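The other workaround raised in the thread, deferring the addition of constraints until after the data load, is the same ordering pg_dump itself uses (tables loaded first, RI constraints created afterward). A rough sketch, again using the hypothetical table names rather than anything from the actual reports:

```sql
-- Workaround 2 (hedged sketch): create the tables without the
-- referential constraint, load the data, then add the constraint.
CREATE TABLE parent (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE child  (id SERIAL PRIMARY KEY, parent_id INTEGER, payload TEXT);

-- Bulk-load first, while no RI triggers exist to cache a bad plan:
BEGIN;
-- ... all the INSERTs ...
COMMIT;

-- Refresh statistics now that the tables have realistic sizes:
ANALYZE parent;
ANALYZE child;

-- Only now add the FK; its RI triggers will plan against the real
-- table sizes, never against empty zero-page tables:
ALTER TABLE child
  ADD CONSTRAINT child_parent_fk
  FOREIGN KEY (parent_id) REFERENCES parent(id);
```

Adding the constraint afterward does mean the existing rows are checked in one pass at ALTER TABLE time, which is usually far cheaper than a cached seqscan check per inserted row.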