Thread: slow DELETE on 12 M row table

From: Janet Jacobsen
Hi.  We are running Postgres 8.3.7 on an eight-processor Linux system.
Because the machine has very little local disk, the database files are on
a file system running GPFS.

The machine is mostly dedicated to processing images.  After the images
are processed, the image attributes and processing parameters are
written to the database.  This is repeated nightly.

Generally, unless image processing is taking place, queries are pretty
fast - ms to seconds.  But deletes from the largest tables, which
currently have 12 M rows, are very slow: on the order of four minutes
for 60 rows.
We don't have to do a lot of deletions, but do need to be able to do some
from time to time, and generally several thousand at a time.

We also don't have many users - generally no more than one to five
connections at a time.

While waiting for some deletions, I went to search.postgresql.org and
typed "slow delete" in the search field.

Based on what I read, I tried "explain analyze delete ...":
> subtest=> explain analyze delete from table1 where id > 11592550;
>                                                 QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using table1_pkey on table1  (cost=0.00..2565.25 rows=1136 width=6) (actual time=77.819..107.476 rows=50 loops=1)
>    Index Cond: (id > 11592550)
>  Trigger for constraint table2_table1_id_fkey: time=198484.158 calls=50
>  Total runtime: 198591.894 ms
> (4 rows)
which immediately showed me that I had forgotten about the foreign key in
another table that references the primary key of the table I am deleting
from: table2.table1_id -> table1.id.

The posts I read and the trigger time above suggest that I should create an
index on the foreign key column in table2, so I am now waiting for that
index to be created.

My questions are:
(1) Is my interpretation of the posts correct, i.e., if I am deleting rows
from table1, where the pkey of table1 is a fkey in table2, do I need to
create an index on the fkey field in table2?
(2) Do you have any suggestions on how I can determine why it is taking
several hours to create an index on a field in a table with 12 M rows?
Does that seem like a reasonable amount of time?  I have
maintenance_work_mem set to 512MB - is that too low, or is that the wrong
config parameter to change?
[ps aux shows "CREATE INDEX waiting"; there is nothing (no image
processing) running on the machine at this time.]
(3) Would I be better off dropping the foreign keys?  In general, is it
workable to have foreign keys on tables with > 100 M rows (assuming I
create all of the 'right' indexes)?

Thank you,
Janet

Re: slow DELETE on 12 M row table

From: Greg Stark
On Fri, Jun 26, 2009 at 3:33 AM, Janet Jacobsen <jsjacobsen@lbl.gov> wrote:
> (1) Is my interpretation of the posts correct, i.e., if I am deleting rows
> from table1, where the pkey of table1 is a fkey in table2, do I need to
> create an index on the fkey field in table2?

Exactly right. The index on table2 is optional, but deletes and updates
on table1 will be very slow without it, as each one has to do a full
table scan of table2 to ensure no references remain.
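
In your case that would be something along these lines (a sketch; the
index name is just a suggestion, and the column follows from the
constraint name in your EXPLAIN output):

  CREATE INDEX table2_table1_id_idx ON table2 (table1_id);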

> (2) Do you have any suggestions on how I can determine why it is taking
> several hours to create an index on a field in a table with 12 M rows?
> Does that seem like a reasonable amount of time?  I have
> maintenance_work_mem set to 512MB - is that too low, or is that the wrong
> config parameter to change?
> [ps aux shows "CREATE INDEX waiting"; there is nothing (no image
> processing) running on the machine at this time.]

512MB is a perfectly reasonable maintenance_work_mem. Larger than that
is overkill.

"waiting" means it's blocked trying to acquire a lock. Some open
transaction has the table you're trying to index locked. Look in
pg_locks and pg_stat_activity to find out who.
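
Something along these lines should show who has locks on the table (a
sketch; note that in 8.3 the process id column in pg_stat_activity is
called procpid):

  SELECT l.pid, a.usename, a.current_query, l.mode, l.granted
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
  WHERE l.relation = 'table1'::regclass;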


> (3) Would I be better off dropping the foreign keys?  In general, is it
> workable to have foreign keys on tables with > 100 M rows (assuming I
> create all of the 'right' indexes)?

If you have the right indexes then the size of the table shouldn't be a
large factor. The number of transactions per second being processed is
perhaps more of a factor, but even on very busy systems, foreign key
constraints usually aren't a problem to keep.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: slow DELETE on 12 M row table

From: Richard Huxton
Greg Stark wrote:
> "waiting" means it's blocked trying to acquire a lock. Some open
> transaction has the table you're trying to index locked. Look in
> pg_locks and pg_stat_activity to find out who.

Or you might find CREATE INDEX CONCURRENTLY fits your situation.

http://www.postgresql.org/docs/8.3/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
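
It builds the index without holding the lock that blocks writes to the
table, at the cost of a slower build, and it cannot be run inside a
transaction block. A sketch, reusing the assumed index definition from
earlier in the thread:

  CREATE INDEX CONCURRENTLY table2_table1_id_idx ON table2 (table1_id);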

--
   Richard Huxton
   Archonet Ltd

Re: slow DELETE on 12 M row table

From: Janet Jacobsen
Thank you for the answers.  Very helpful.

Between the time that I sent my original post and saw your reply,
I tried to drop a couple of foreign key constraints.  The alter
table statements also showed up as "waiting" when I ran ps aux. 

I took your suggestion to look at pg_locks and pg_stat_activity.
pg_stat_activity showed that I had three statements that were
waiting, and that there was one user whose query was given
as "<insufficient privilege>".  I killed the process associated
with that user, and my three waiting statements executed
immediately. 

I assume that killing the user's process released the lock on the
table.  This user has only SELECT privileges.  Under what
conditions would a SELECT lock a table?  The user connects
to the database via a (Python?) script that runs on another
machine.  Would this way of connecting to the database result
in a lock?

Thanks,
Janet


Re: slow DELETE on 12 M row table

From: Marcin Stępnicki
On Fri, Jun 26, 2009 at 9:34 AM, Janet Jacobsen <jsjacobsen@lbl.gov> wrote:

> I assume that killing the user's process released the lock on the
> table.  This user has only SELECT privileges.  Under what
> conditions would a SELECT lock a table?  The user connects
> to the database via a (Python?) script that runs on another
> machine.  Would this way of connecting to the database result
> in a lock?

Was this process 'idle in transaction' perhaps? Does this Python
script use any ORM, like SQLAlchemy? If not, which library does it use
to connect? If it's psycopg2, which isolation level (autocommit, read
committed, serializable) is set?

Regards,
Marcin

Re: slow DELETE on 12 M row table

From: Janet Jacobsen
Hi.  The user in question is using psycopg2, which he uses like this:
> import psycopg2
> conn = psycopg2.connect("dbname=%s  user=%s host=%s password=%s port=%s" ...)
> pg_cursor = conn.cursor()
> pg_cursor.execute(<select string>)
> rows = pg_cursor.fetchall()
Note that
(1) he said that he does not set an isolation level, and
(2) he does not close the database connection after the
fetchall - instead he has a Python sleep command, so
he is checking the database every 60 s to see whether
new entries have been added to a given table.  (His
code is part of the analysis pipeline - we process the
image data and load it into the database, and other
groups fetch the data from the database and do some
analyses.)

Yes, it is the case that the user's process shows up in
ps aux as "idle in transaction".

What would you recommend in this case?  Should the
user set the isolation_level for psycopg, and if so to what?

Is there any Postgres configuration parameter that I
should set?

Should the user close the database connection after
every fetchall?

Thank you for any help you can give.

Janet


Marcin Stępnicki wrote:
> On Fri, Jun 26, 2009 at 9:34 AM, Janet Jacobsen <jsjacobsen@lbl.gov> wrote:
>
>
>> I assume that killing the user's process released the lock on the
>> table.  This user has only SELECT privileges.  Under what
>> conditions would a SELECT lock a table?  The user connects
>> to the database via a (Python?) script that runs on another
>> machine.  Would this way of connecting to the database result
>> in a lock?
>>
>
> Was this process 'idle in transaction' perhaps? Does this Python
> script use any ORM, like SQLAlchemy? If not, which library does it use
> to connect? If it's psycopg2, which isolation level (autocommit, read
> committed, serializable) is set?
>
> Regards,
> Marcin
>
>


Re: slow DELETE on 12 M row table

From: Robert Haas
2009/6/26 Janet Jacobsen <jsjacobsen@lbl.gov>:
> Hi.  The user in question is using psycopg2, which he uses like this:
>> import psycopg2
>> conn = psycopg2.connect("dbname=%s  user=%s host=%s password=%s port=%s" ...)
>> pg_cursor = conn.cursor()
>> pg_cursor.execute(<select string>)
>> rows = pg_cursor.fetchall()
> Note that
> (1) he said that he does not set an isolation level, and
> (2) he does not close the database connection after the
> fetchall - instead he has a Python sleep command, so
> he is checking the database every 60 s to see whether
> new entries have been added to a given table.  (His
> code is part of the analysis pipeline - we process the
> image data and load it into the database, and other
> groups fetch the data from the database and do some
> analyses.)
>
> Yes, it is the case that the user's process shows up in
> ps aux as "idle in transaction".
>
> What would you recommend in this case?  Should the
> user set the isolation_level for psycopg, and if so to what?
>
> Is there any Postgres configuration parameter that I
> should set?
>
> Should the user close the database connection after
> every fetchall?

You need to COMMIT or ROLLBACK the in-process transaction and then not
start a new transaction until you're ready to execute the next query.
Possibly calling .commit() after executing your query might be all you
need to do, but never having used psycopg2 I couldn't say.  You might
try asking on the psycopg mailing list.
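
Untested, but based on the description the fixed polling loop might look
something like this (a sketch; the connection parameters and the SELECT
are elided, as in the original script):

  import time
  import psycopg2

  # Connection parameters elided, as in the original script.
  conn = psycopg2.connect("dbname=... user=... host=... password=... port=...")
  select_string = "..."  # the user's SELECT, elided in the original

  while True:
      pg_cursor = conn.cursor()
      pg_cursor.execute(select_string)
      rows = pg_cursor.fetchall()
      pg_cursor.close()
      # End the transaction psycopg2 opened implicitly for the SELECT;
      # otherwise the session sits "idle in transaction" holding locks.
      conn.rollback()
      # ... hand rows off to the analysis code ...
      time.sleep(60)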

...Robert

Re: slow DELETE on 12 M row table

From: Scott Carey


On 6/26/09 6:36 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:

> 2009/6/26 Janet Jacobsen <jsjacobsen@lbl.gov>:
>> Hi.  The user in question is using psycopg2, which he uses like this:
>>> import psycopg2
>>> conn = psycopg2.connect("dbname=%s  user=%s host=%s password=%s port=%s"
>>> ...)
>>> pg_cursor = conn.cursor()
>>> pg_cursor.execute(<select string>)
>>> rows = pg_cursor.fetchall()
>> Note that
>> (1) he said that he does not set an isolation level, and
>> (2) he does not close the database connection after the
>> fetchall - instead he has a Python sleep command, so
>> he is checking the database every 60 s to see whether
>> new entries have been added to a given table.  (His
>> code is part of the analysis pipeline - we process the
>> image data and load it into the database, and other
>> groups fetch the data from the database and do some
>> analyses.)
>> 
>> Yes, it is the case that the user's process shows up in
>> ps aux as "idle in transaction".
>> 
>> What would you recommend in this case?  Should the
>> user set the isolation_level for psycopg, and if so to what?
>> 
>> Is there any Postgres configuration parameter that I
>> should set?
>> 
>> Should the user close the database connection after
>> every fetchall?
> 
> You need to COMMIT or ROLLBACK the in-process transaction and then not
> start a new transaction until you're ready to execute the next query.
> Possibly calling .commit() after executing your query might be all you
> need to do, but never having used psycopg2 I couldn't say.  You might
> try asking on the psycopg mailing list.
> 
> ...Robert
> 

In addition to the above, note that long-lived transactions cause all sorts
of other problems in the database.  In particular, table and index bloat can
become severe due to this sort of poor client behavior if there is a lot of
update or delete activity.  You can find out with VACUUM VERBOSE ANALYZE
on tables of interest whether there is a high ratio of dead tuples in the
tables and indexes.
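
For example (note that in 8.3 the keyword order is VERBOSE before ANALYZE):

  VACUUM VERBOSE ANALYZE table1;

The per-table output reports how many dead row versions were found, and in
particular how many "cannot be removed yet" - a large count there is a sign
that an open transaction is holding back cleanup.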



Re: slow DELETE on 12 M row table

From: Robert Haas
2009/6/27 Scott Carey <scott@richrelevance.com>:
> In addition to the above, note that long-lived transactions cause all sorts
> of other problems in the database.  In particular, table and index bloat can
> become severe due to this sort of poor client behavior if there is a lot of
> update or delete activity.  You can find out with VACUUM VERBOSE ANALYZE
> on tables of interest whether there is a high ratio of dead tuples in the
> tables and indexes.

Yes indeed...  by the by, I understand Alvaro Herrera has improved
this situation considerably for the forthcoming 8.4 release.

...Robert

Re: slow DELETE on 12 M row table

From: Janet Jacobsen
Hi.  I posted a question about a very slow DELETE on a table
with 12 M rows last week, and I wanted to (1) thank everyone
who provided a reply since each clue helped to find the solution,
and (2) give the solution.

The slow DELETE was due to another user having a lock on
the table - which several people on this list pointed out must
be the case.  Since the user was only running SELECT on
the table (no inserts, deletes, or updates), it wasn't obvious at
first whether or how his process was locking the table.

Robert suggested the need for a commit or rollback, as well as
posting to the psycopg list.  Pasted below is the response that
I got from Federico Di Gregorio.

The user added a conn.rollback() to his script, and that solved
the problem.  Now it is possible to delete rows, create indexes,
etc. without having to kill the user's process.

Many thanks,
Janet


Robert Haas wrote:
> 2009/6/26 Janet Jacobsen <jsjacobsen@lbl.gov>:
>
>> Hi.  The user in question is using psycopg2, which he uses like this:
>>
>>> import psycopg2
>>> conn = psycopg2.connect("dbname=%s  user=%s host=%s password=%s port=%s" ...)
>>> pg_cursor = conn.cursor()
>>> pg_cursor.execute(<select string>)
>>> rows = pg_cursor.fetchall()
>>>
>> Note that
>> (1) he said that he does not set an isolation level, and
>> (2) he does not close the database connection after the
>> fetchall - instead he has a Python sleep command, so
>> he is checking the database every 60 s to see whether
>> new entries have been added to a given table.  (His
>> code is part of the analysis pipeline - we process the
>> image data and load it into the database, and other
>> groups fetch the data from the database and do some
>> analyses.)
>>
>> Yes, it is the case that the user's process shows up in
>> ps aux as "idle in transaction".
>>
>> What would you recommend in this case?  Should the
>> user set the isolation_level for psycopg, and if so to what?
>>
>> Is there any Postgres configuration parameter that I
>> should set?
>>
>> Should the user close the database connection after
>> every fetchall?
>>
>
> You need to COMMIT or ROLLBACK the in-process transaction and then not
> start a new transaction until you're ready to execute the next query.
> Possibly calling .commit() after executing your query might be all you
> need to do, but never having used psycopg2 I couldn't say.  You might
> try asking on the psycopg mailing list.
>
> ...Robert
>

> On Mon, 29/06/2009 at 12:26 -0700, Janet Jacobsen wrote:
> [snip]
>
>> > The user told me that he does not close the database connection
>> > after the fetchall - instead he has a Python sleep command, so that
>> > he is checking the database every 60 s to see whether new entries
>> > have been added to a given table
>> > His code is part of an analysis pipeline, whereas the part of the
>> > database that I work on is loading processed data into the
>> > database.
>> > Is there something missing from his code sample, like a commit or
>> > a set_isolation_level, that if added would prevent the "idle in
>> > transaction" from happening?
>>
>
> The user is wrong and you're right: the "idle in transaction" can be
> avoided either by calling commit() (or rollback()) before going to sleep,
> or by setting the transaction mode to "autocommit":
>
> conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
>
> Hope this helps,
> federico
>
> --
> Federico Di Gregorio                    http://people.initd.org/fog
> Debian GNU/Linux Developer              fog@debian.org
> INIT.D Developer                        fog@initd.org
>   Sei una bergogna. Vergonga. Vergogna.              -- Valentina

Re: slow DELETE on 12 M row table

From: Scott Marlowe
On Fri, Jun 26, 2009 at 1:34 AM, Janet Jacobsen <jsjacobsen@lbl.gov> wrote:
> Thank you for the answers.  Very helpful.
>
> Between the time that I sent my original post and saw your reply,
> I tried to drop a couple of foreign key constraints.  The alter
> table statements also showed up as "waiting" when I ran ps aux.
>
> I took your suggestion to run pg_locks and pg_stat_activity.
> pg_stat_activity showed that I had three statements that were
> waiting, and that there was one user whose query was given
> as "<insufficient privilege>".  I killed the process associated
> with that user, and my three waiting statements executed
> immediately.

FYI, that means you, the user, don't have sufficient privileges to
view their query.