Re: BUG #1552: massive performance hit between 7.4 and 8.0.1 - Mailing list pgsql-bugs

From: Keith Browne
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Msg-id: 4241C1DF.2020000@deepsky.com
In response to: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1 (Simon Riggs <simon@2ndquadrant.com>)
Responses: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
List: pgsql-bugs
Simon Riggs wrote:

> The EXPLAINs you've enclosed are for SELECTs, yet your bug report
> describes INSERTs as being the things that are slow.
> [You may find better performance from using COPY]

Simon,

Brian and I are working together on this problem.

We're starting with an empty database, creating four tables, and
populating those tables with a total of 180,000-200,000 rows.  Each
table has a primary key, and several of the tables reference foreign
keys in other tables.  We've written a Python script, using psycopg,
which executes all the queries to create the tables and insert the rows.
The database is running on the same machine where the script runs.

I've seen similar performance when issuing a COMMIT after each
insertion, and also when batching insertions in blocks of 250 per
COMMIT, so batching the commits is not helping much.  I've looked at the
possibility of using COPY, but in our production environment it would be
prohibitively expensive to build a flat file with all this data.  I'd
rather generate it on the fly, as we've been able to do with PostgreSQL 7.4.
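Schematically, the batching I tried amounts to the sketch below.  The
table and column names are stand-ins, the psycopg connection handling is
elided, and real code should use psycopg's parameter binding rather than
string interpolation:

```python
# Sketch of batching INSERTs into blocks of 250 per COMMIT.
# Table/column names are hypothetical; in real code, pass values
# via psycopg's parameter binding, not string interpolation.

def batched_inserts(rows, batch_size=250):
    """Return SQL statements with a COMMIT after every batch_size inserts."""
    statements = []
    for i, (name, parent_id) in enumerate(rows, start=1):
        statements.append(
            "INSERT INTO child (name, parent_id) VALUES ('%s', %d);"
            % (name, parent_id))
        if i % batch_size == 0:
            statements.append("COMMIT;")
    if statements and statements[-1] != "COMMIT;":
        statements.append("COMMIT;")   # flush any partial final batch
    return statements

stmts = batched_inserts([("row%d" % n, 1) for n in range(500)])
```

Whether the commits come every row or every 250 rows, the insert rate I
measured was about the same.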

> Also, your tests have compared two systems, so it might be that the
> hardware or configuration of one system is different from the other.

When running PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
with a gigabyte of RAM, running Debian Linux (kernel 2.6.8.1), we were
able to insert all this data in 5-7 minutes.  It's taken a while to
install Postgres 8.0.1 on the same machine, but now I have, and it's
taking 40-45 minutes to run the same insert script.  This is similar to
the performance we saw on another machine, a fast single-CPU AMD64 box
running Gentoo.

I don't think it's a hardware issue.  I dug around a bit, and found
suggestions that this sort of problem could be worked around by breaking
the database connection and restarting it after the tables had been
partially filled.  I modified our script to break and re-establish the
database connection when each table first has 4,000 records inserted,
and the performance is greatly improved; it now takes only about 3.5
minutes to insert 180,000+ rows.
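The reconnect workaround boils down to the cadence sketched below.  The
connect/execute callables here are stand-ins so the logic is
self-contained; in the real script they would be psycopg's connect() and
cursor.execute():

```python
# Sketch of the workaround: close and re-open the connection once a
# table first reaches 4,000 rows.  connect/execute are injected
# stand-ins; with psycopg they would wrap psycopg.connect(dsn) and
# cursor.execute(sql, params).

def insert_with_reconnect(rows, connect, execute, threshold=4000):
    """Insert rows, reconnecting once after `threshold` insertions."""
    conn = connect()
    for i, row in enumerate(rows, start=1):
        execute(conn, row)
        if i == threshold:      # reconnect once the table is partially filled
            conn.close()
            conn = connect()
    conn.close()

class FakeConn:
    def close(self):
        pass

opened = []
insert_with_reconnect(
    rows=range(10000),
    connect=lambda: opened.append(1) or FakeConn(),
    execute=lambda conn, row: None)
```

With this one extra reconnect per table, the 180,000+ row load drops
from 40-45 minutes to about 3.5 minutes on our hardware.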

I've since modified this script to build and populate a fifth table with
over 1.3 million rows.  The fifth table has no primary key, but lists a
foreign key into one of the first four tables.  With the above
modification (break and re-build the DB connection after 4,000 rows have
been inserted), the whole database can be populated in about 15 minutes.
I wouldn't have dared try to build a one-million-plus-row table until
I found this speed-up.

> If you could repeat the test on one single system, then this would
> assist in the diagnosis of this bug report. Also, if you could describe
> the workload that is giving you a problem more exactly, that would help.
> Specifically, can you confirm that you have run ANALYZE on the tables,
> and also give us some idea of numbers of rows in each table at the time
> you first run your programs.

Just to see if it would help, I tried modifying the script to run an
ANALYZE against each table after 4,000 insertions, instead of breaking
and re-establishing the DB connection.  I still saw ~45-minute times to
insert 180,000 rows.  I then tried running ANALYZE against each table
after *each* 4,000 rows inserted, and again, it took about 45 minutes to
run the insert.
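The periodic-ANALYZE variant I tried amounts to interleaving statements
like this (the table name is a stand-in):

```python
# Sketch of interleaving an ANALYZE after every 4,000 INSERTs so the
# planner sees fresh statistics.  Table name is hypothetical.

def inserts_with_analyze(statements, table, every=4000):
    """Yield the INSERTs unchanged, adding an ANALYZE after every
    `every` rows."""
    for i, stmt in enumerate(statements, start=1):
        yield stmt
        if i % every == 0:
            yield "ANALYZE %s;" % table

out = list(inserts_with_analyze(["INSERT ..."] * 9000, "child"))
```

In my tests, neither a one-time ANALYZE nor this repeated variant made
any difference: the run still took about 45 minutes.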

Each table is empty when I first run the program.  I am dropping and
re-creating the database for each test run.

> There is clearly a problem, but it is not yet clearly a bug. If it is a
> bug, we're interested in solving it as much as you.

I'd be happy to run further tests or provide more details, if they'll
help.  We now have a workaround which is allowing us to proceed with our
project, but I'd like to know if there's another way to do this.  While
I understand that large or complex databases require careful tuning, I
was surprised to see a six- or seven-fold increase in run times between
PostgreSQL 7.4 and 8.0.1 on the same hardware, on an operation which
seems fairly straightforward: populating an empty table.

One other thing which puzzled me: as a test, I tried modifying our
script to spit out raw SQL statements instead of connecting to the
database and performing the inserts itself.  Normally, our script
populates two tables in one pass, and then populates the third and
fourth tables in a second pass.  I massaged the SQL by hand to group the
inserts together by table, so that the first table would be entirely
populated, then the second, etc.  When I ran this SQL script by piping
it straight into psql, it finished in about four minutes.  This is
comparable to the time it takes to run my modified script which breaks
and re-establishes the connection to the database.
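The hand-massaging of the SQL could be automated along these lines; this
is a rough sketch that assumes simple, single-line INSERT statements:

```python
import re

def group_by_table(statements):
    """Reorder single-line INSERT statements so each table's rows are
    contiguous, preserving the order in which tables first appear.
    Assumes INSERTs only; FK ordering must already hold table-by-table."""
    groups = {}          # dict preserves insertion order in Python 3.7+
    pattern = re.compile(r"INSERT\s+INTO\s+(\w+)", re.IGNORECASE)
    for stmt in statements:
        m = pattern.match(stmt)
        table = m.group(1) if m else ""
        groups.setdefault(table, []).append(stmt)
    return [s for stmts in groups.values() for s in stmts]

mixed = ["INSERT INTO a VALUES (1);",
         "INSERT INTO b VALUES (1);",
         "INSERT INTO a VALUES (2);"]
grouped = group_by_table(mixed)
```

Piping the grouped output straight into psql is what finished in about
four minutes.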

It would appear that psql is doing something right here which we have
had to go out of our way to get with psycopg.

Keith Browne
tuxedo@deepsky.com
