Thread: DELETE performance issues

DELETE performance issues

From
"Schwenker, Stephen"
Date:
Hello,
 
I'm having a performance issue with postgresql.  I'm using it with JBoss BPM api and when I use the api to delete a process instance from the database, it seems like the delete requests are taking quite a long time to delete.  I've also noticed that when I execute the delete, the cpu load on the postgresql process on the server ramps up to 100%.  Can someone tell me what logging I can turn on so I can figure out what is causing the high load and slow deletes? I can then use that info to tune the runtime parameters.
 
Thank you,
 
 
Steve.
 

Re: DELETE performance issues

From
Michael Fuhr
Date:
On Fri, Oct 27, 2006 at 03:40:51PM -0400, Schwenker, Stephen wrote:
> I'm having a performance issue with postgresql.  I'm using it with JBoss
> BPM api and when I use the api to delete a process instance from the
> database, it seems like the delete requests are taking quite a long time
> to delete.

How long is "quite a long time"?  How long are you expecting the
delete to take?  Do you get better performance if you don't use the
API, like when executing the same statement in psql?  How many rows
are you deleting?  How many rows are in the table?  Is the delete
able to use an index?  Do you have unindexed foreign key references
to the table you're deleting from?  What does EXPLAIN ANALYZE show?
Are you regularly vacuuming and analyzing the table?  What version
of PostgreSQL are you using?  What OS and hardware?

> Can someone tell me what logging I can turn on so I can figure out
> what is causing the high load and slow deletes? I can then use that
> info to tune the runtime parameters.

What non-default settings are you currently using?

--
Michael Fuhr

Re: DELETE performance issues

From
"Schwenker, Stephen"
Date:
Hey,

The api uses hibernate to do all the work.  Here is a bit of the log of
the sql queries it does.  The JBPM_bytelock queries seem to execute
quite fast but when the api starts switching back and forth from
JBPM_BYTEARRAY and JBPM_VARIABLEINSTANCE it seems to run slow.  Like
probably about 1 second for each set of 2 deletes.  Which means that
they are taking approximately 500ms to execute each one which seems slow
because the application can create records faster than then can be
deleted.  I'm working on a way to bulk purge the database but I haven't
got there yet so I'm just trying to tune the current process.

Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_BYTEBLOCK where PROCESSFILE_=?
Hibernate: delete from JBPM_MODULEINSTANCE where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_VARIABLEINSTANCE where ID_=?
Hibernate: delete from JBPM_BYTEARRAY where ID_=?
Hibernate: delete from JBPM_TOKENVARIABLEMAP where ID_=?
Hibernate: delete from JBPM_MODULEINSTANCE where ID_=?
Hibernate: delete from JBPM_PROCESSINSTANCE where ID_=? and VERSION_=?
Hibernate: delete from JBPM_TOKEN where ID_=? and VERSION_=?


Now, to answer your questions,

All the tables have primary keys and when I explain the queries, they
use the primary key index.  I have increased the shared memory which
only seems to help when the same table is accessed synchronously but
this switching back and forth between tables seems to be slow.  There
are approximately 600,000 rows in the tables.  And these tables do not
have any references point to them but they do reference other tables.
The autovacuum is on.

We're using Postgresql 8.1.4 and the server is a dual 3.2ghz cpu linux
system.

I'm guessing that the problem lies in loading the indexes off the disk
each time the table switches.  Is there a way to view what Postgres is
doing in the back end.  Ie,  loading index from disk or using it in
memory....  Is there a way to see what memory is being used and what's
in it?

Here is my current config changes...
shared_buffers = 24000   # 192MB
work_mem = 50240
max_fsm_pages = 40000
max_fsm_relations = 2000


bgwriter_lru_percent = 50.0
bgwriter_lru_maxpages = 50


wal_buffers = 64


I hope you can all help.

Thank you,


Steve.


-----Original Message-----
From: Michael Fuhr [mailto:mike@fuhr.org]
Sent: Sunday, October 29, 2006 11:01 AM
To: Schwenker, Stephen
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] DELETE performance issues

On Fri, Oct 27, 2006 at 03:40:51PM -0400, Schwenker, Stephen wrote:
> I'm having a performance issue with postgresql.  I'm using it with
> JBoss BPM api and when I use the api to delete a process instance from

> the database, it seems like the delete requests are taking quite a
> long time to delete.

How long is "quite a long time"?  How long are you expecting the delete
to take?  Do you get better performance if you don't use the API, like
when executing the same statement in psql?  How many rows are you
deleting?  How many rows are in the table?  Is the delete able to use an
index?  Do you have unindexed foreign key references to the table you're
deleting from?  What does EXPLAIN ANALYZE show?
Are you regularly vacuuming and analyzing the table?  What version of
PostgreSQL are you using?  What OS and hardware?

> Can someone tell me what logging I can turn on so I can figure out
> what is causing the high load and slow deletes? I can then use that
> info to tune the runtime parameters.

What non-default settings are you currently using?

--
Michael Fuhr

Re: DELETE performance issues

From
Tom Lane
Date:
"Schwenker, Stephen" <SSchwenker@thestar.ca> writes:
> All the tables have primary keys and when I explain the queries, they
> use the primary key index.  I have increased the shared memory which
> only seems to help when the same table is accessed synchronously but
> this switching back and forth between tables seems to be slow.  There
> are approximately 600,000 rows in the tables.  And these tables do not
> have any references point to them but they do reference other tables.

Are you absolutely sure about that last?  Unindexed foreign key
references are the explanation nine times out of ten when someone
complains about deletes being slow.

> I'm guessing that the problem lies in loading the indexes off the disk
> each time the table switches.

For tables with only 600K rows the indexes should be plenty small enough
to fit in memory on any modern machine.  I'm thinking maybe the problem
is enormous index bloat ... does REINDEXing the tables improve matters?

> max_fsm_pages = 40000

That seems mighty small :-(.  Do a database-wide VACUUM VERBOSE and see
what it says about FSM usage in the last few lines of output.

            regards, tom lane

Re: DELETE performance issues

From
Reece Hart
Date:
On Thu, 2006-11-02 at 11:27 -0500, Tom Lane wrote:
Are you absolutely sure about that last?  Unindexed foreign key
references are the explanation nine times out of ten when someone
complains about deletes being slow.

This was certainly the major cause when I had slow deletes. (The other problem was chaining of cascading deletes.) In order to help correct such problems, I wrote some views to identify unindexed, cascading foreign keys. An example:

rkh@csb-dev=> select * from pgutils.foreign_keys_missing_indexes ;fk_namespace | fk_relation  |      fk_column      | fk_indexed | pk_namespace | pk_relation |  pk_column  | pk_indexed | ud ...
--------------+--------------+---------------------+------------+--------------+-------------+-------------+------------+--- ...gong         | node         | alias_id            | f          | gong         | alias       | alias_id    | t          | cn ...taxonomy     | node         | division_id         | f          | taxonomy     | division    | division_id | t          | cc ...gong         | alias        | go_id               | f          | gong         | node        | go_id       | t          | cc ...
etc...

ud is an abbreviation for update and delete constraint type (cascade, set null, restrict, etc).

In this view, "indexed" means that the column is the first or only column in some index, i.e., pg_index.indkey[0] = pg_attribute.attnum. I suppose that one might want to distinguish the indexing cases more precisely as unindexed, sole-column index, first col of mult-col index, second col of multi-col index, etc, but I didn't do so. The views were originally written for 7.4 and I don't know what's appropriate for current multicolumn index behavior.

The code is in http://harts.net/reece/pgutils/ .

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Re: DELETE performance issues

From
"Schwenker, Stephen"
Date:
Hey,
 
I've gone and increased shared memory and now the deletes seem to be really fast.  I guess it had to do with postgresql not being able to keep all the tables/indexes in memory and having to read/write everything from the disk.
 
I'll look at the utils anyway and see if there really is unindexed foreign keys, but from first glance it looks like they're all indexed.
 
Thanks for your suggestion.
 
Steve.


From: Reece Hart [mailto:reece@harts.net]
Sent: Thursday, November 02, 2006 2:53 PM
To: Tom Lane
Cc: Schwenker, Stephen; Michael Fuhr; pgsql-general@postgresql.org
Subject: Re: [GENERAL] DELETE performance issues

On Thu, 2006-11-02 at 11:27 -0500, Tom Lane wrote:
Are you absolutely sure about that last?  Unindexed foreign key
references are the explanation nine times out of ten when someone
complains about deletes being slow.

This was certainly the major cause when I had slow deletes. (The other problem was chaining of cascading deletes.) In order to help correct such problems, I wrote some views to identify unindexed, cascading foreign keys. An example:

rkh@csb-dev=> select * from pgutils.foreign_keys_missing_indexes ;fk_namespace | fk_relation  |      fk_column      | fk_indexed | pk_namespace | pk_relation |  pk_column  | pk_indexed | ud ...
--------------+--------------+---------------------+------------+--------------+-------------+-------------+------------+--- ...gong         | node         | alias_id            | f          | gong         | alias       | alias_id    | t          | cn ...taxonomy     | node         | division_id         | f          | taxonomy     | division    | division_id | t          | cc ...gong         | alias        | go_id               | f          | gong         | node        | go_id       | t          | cc ...
etc...

ud is an abbreviation for update and delete constraint type (cascade, set null, restrict, etc).

In this view, "indexed" means that the column is the first or only column in some index, i.e., pg_index.indkey[0] = pg_attribute.attnum. I suppose that one might want to distinguish the indexing cases more precisely as unindexed, sole-column index, first col of mult-col index, second col of multi-col index, etc, but I didn't do so. The views were originally written for 7.4 and I don't know what's appropriate for current multicolumn index behavior.

The code is in http://harts.net/reece/pgutils/ .

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0