Thread: optimizing postgres

optimizing postgres

From
lawpoop@gmail.com
Date:
Hello all -

I'm working on a postgres project after coming from a MySQL background
(no flames, please :). We are importing fairly large xml datasets
(10-20 MB of xml files per 'project', currently 5 projects) into the
database for querying.

We are using PHP to create a web interface where users can create
projects and upload their files. We have a parsing routine that gets
the xml data into the database.

The parsing has turned out to be pretty intense. It takes about 10-20
minutes for any project. When we are parsing data, it really slows
down the site's response. I tested serving static webpages from
apache and endless loops in php, but the choke point seems to be doing
any other query on postgres when constructing a php page during
parsing.

As an example, the original designer specified separate tables for
each project. Since they were all the same structure, I suggested
combining them into a single table with a project_id column, but he
said it would take too long to query. I was suspicious, but I went
with his design anyway.

It turned out he was right for our current set up. When I needed to
empty the project table to re-parse data, doing a cascading delete
could take up to 10 minutes! I cut re-parsing time in half by just
dropping the table and creating a new one. Which was an okay thing to
do, since the table only belonged to one project anyway. But I hate to
think how long it would have taken to do a delete, cascading to child
tables, if all the project data was in a single table.

Since I'm not an expert in Postgres database design, I'm assuming I've
done something sub-optimal. Are there some common techniques for
tuning postgres performance? Do we need beefier hardware?

Or is it a problem with how PHP or apache pools connections to the
database?


Re: optimizing postgres

From
Stephen Frost
Date:
* lawpoop@gmail.com (lawpoop@gmail.com) wrote:
> Since I'm not an expert in Postgres database design, I'm assuming I've
> done something sub-optimal. Are there some common techniques for
> tuning postgres performance? Do we need beefier hardware?

Honestly, it sounds like the database design might be the best place to
start.  Can you provide the schema definition and queries (the actual
queries and the 'explain' output from them)?  10-20MB is certainly small
enough that you'd have to be doing something particularly terrible to
make it slow on any decent hardware...

> Or is it a problem with how PHP or apache pools connections to the
> database?

This seems unlikely to be the issue.  If you're doing a lot of web page
loads and they were all sluggish or something, I might say you want to
make sure you're using connection pooling to Postgres, but it sounds like
you've got a different problem (perhaps a constraint against a column
which doesn't have an index?).

    Thanks,

        Stephen


Re: optimizing postgres

From
Tom Lane
Date:
lawpoop@gmail.com writes:
> It turned out he was right for our current set up. When I needed to
> empty the project table to re-parse data, doing a cascading delete
> could take up to 10 minutes!

You mean ON CASCADE DELETE foreign keys?  Usually the reason that's
slow is you forgot to put an index on the referencing column.  PG
doesn't force you to have such an index, but unless the referenced
table is nearly static you'll want one.
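A minimal sketch of what Tom describes, using hypothetical table and
column names (`projects` and `items` are made up for illustration):

```sql
-- Hypothetical schema: each items row references a project.
-- The foreign key below does NOT create an index on items.project_id;
-- without one, every DELETE on projects must scan all of items to find
-- (and cascade to) the referencing rows.
CREATE TABLE projects (id integer PRIMARY KEY);
CREATE TABLE items (
    id integer PRIMARY KEY,
    project_id integer REFERENCES projects (id) ON DELETE CASCADE
);

-- The index that makes the cascading delete fast:
CREATE INDEX items_project_id_idx ON items (project_id);
```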

I too am fairly suspicious of the N-tables-are-faster-than-another-
key-column mindset, but you'd need to do some actual experimentation
(with correctly optimized table definitions ;-)) to be sure.

            regards, tom lane

Re: optimizing postgres

From
Zlatko Matić
Date:
Hello, Tom.
I don't understand the relation between constraints and indexes.
Using EMS PostgreSQL Manager Lite, I created indexes on columns, some
of them with unique values.
But when I open it in PgAdmin, all such "unique" indexes are listed as
constraints and there are no indexes in the Indexes section. When I open it
again in EMS PostgreSQL Manager, they are listed as "Indexes".
Does that mean I need to create additional indexes on the same columns?
Is a constraint an index as well?
Thanks,

Zlatko


----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: <lawpoop@gmail.com>
Cc: <pgsql-general@postgresql.org>
Sent: Friday, July 13, 2007 3:39 AM
Subject: Re: [GENERAL] optimizing postgres




Re: optimizing postgres

From
Richard Huxton
Date:
Zlatko Matić wrote:
> Hello, Tom.
> I don't understand the relation between constraints and indexes.
> Using EMS PostgreSQL Manager Lite, I created indexes on columns, some
> of them with unique values.
> But when I open it in PgAdmin, all such "unique" indexes are listed as
> constraints and there are no indexes in the Indexes section. When I open it
> again in EMS PostgreSQL Manager, they are listed as "Indexes".
> Does that mean I need to create additional indexes on the same columns?
> Is a constraint an index as well?

The default tool is "psql", the command-line client. If in doubt, refer
to that (and the manuals).

In this case, a quick look at the manuals covers it:
   http://www.postgresql.org/docs/8.2/static/indexes-unique.html
"PostgreSQL automatically creates a unique index when a unique
constraint or a primary key is defined for a table. The index covers the
columns that make up the primary key or unique columns (a multicolumn
index, if appropriate), and is the mechanism that enforces the constraint."
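In other words, the two tools are showing the same object from
different sides. A sketch with a made-up table:

```sql
-- Declaring the constraint...
CREATE TABLE users (email text UNIQUE);
-- ...implicitly creates a unique index (here named users_email_key)
-- that enforces it. pgAdmin lists the constraint; EMS lists the index.
-- No additional index on the same column is needed; a duplicate would
-- only slow down writes.
```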

--
   Richard Huxton
   Archonet Ltd

Re: optimizing postgres

From
PFC
Date:
> The parsing has turned out to be pretty intense. It takes about 10-20
> minutes for any project. When we are parsing data, it really slows
> down the site's response. I tested serving static webpages from
> apache and endless loops in php, but the choke point seems to be doing
> any other query on postgres when constructing a php page during
> parsing.

    Do you do lots of INSERTs without explicitly using transactions?
    You also need to run EXPLAIN ANALYZE on your most frequent queries.
    It is very possible the slowdown is just from a forgotten index.
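    For example (table and column names are made up), wrapping the
inserts in one transaction avoids a commit, and its disk sync, per row:

```sql
-- In autocommit mode each INSERT is its own transaction and its own
-- commit. Batching them turns thousands of commits into one:
BEGIN;
INSERT INTO items (project_id, name) VALUES (1, 'alpha');
INSERT INTO items (project_id, name) VALUES (1, 'beta');
-- ... thousands more rows ...
COMMIT;
```

    For pure bulk loading, COPY is faster still than batched INSERTs.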

> As an example, the original designer specified separate tables for
> each project. Since they were all the same structure, I suggested
> combining them into a single table with a project_id column, but he
> said it would take too long to query. I was suspicious, but I went
> with his design anyway.

    From the small size of the dataset I don't see a justification for this...

> It turned out he was right for our current set up. When I needed to
> empty the project table to re-parse data, doing a cascading delete
> could take up to 10 minutes! I cut re-parsing time in half by just
> dropping the table and creating a new one. Which was an okay thing to
> do, since the table only belonged to one project anyway. But I hate to
> think how long it would have taken to do a delete, cascading to child
> tables, if all the project data was in a single table.

    That's probably because you forgot to create an index on the referencing
column (the foreign key column in the child table). They are not created
automatically.

> Since I'm not an expert in Postgres database design, I'm assuming I've
> done something sub-optimal. Are there some common techniques for
> tuning postgres performance? Do we need beefier hardware?
>
> Or is it a problem with how PHP or apache pools connections to the
> database?

    It depends on a lot of stuff, but the first thing is to run EXPLAIN
ANALYZE on your queries and post the results here.
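    For instance (hypothetical query), in psql:

```sql
EXPLAIN ANALYZE
SELECT * FROM items WHERE project_id = 3;
-- A "Seq Scan" on a large table here usually means a missing index;
-- an "Index Scan" with low actual time is what you want to see.
```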