Thread: Deletes and large tables

Deletes and large tables

From
Edmund Dengler
Date:
Greetings!

We have a table with more than 250 million rows. I am trying to delete the
first 100,000 rows (based on a bigint primary key), and I had to cancel
after 4 hours of the system not actually finishing the delete. I wrote a
script to delete individual rows 10,000 at a time using transactions, and
am finding each individual delete takes on the order of 0.1 seconds to 2-3
seconds. There are 4 indexes on the table, one of which is very "hashlike"
(ie, distribution is throught the index for sequential rows).

Is this considered "normal" for delete speed? Currently using 8.0.1.
Drives are capable of 40MB/sec sustained transfers, but only acheiving
about 2-10kB/sec, and mostly CPU bound.

Regards!
Ed

Re: Deletes and large tables

From
Richard Huxton
Date:
Edmund Dengler wrote:
> Greetings!
>
> We have a table with more than 250 million rows. I am trying to delete the
> first 100,000 rows (based on a bigint primary key), and I had to cancel
> after 4 hours of the system not actually finishing the delete. I wrote a
> script to delete individual rows 10,000 at a time using transactions, and
> am finding each individual delete takes on the order of 0.1 seconds to 2-3
> seconds. There are 4 indexes on the table, one of which is very "hashlike"
> (ie, distribution is throught the index for sequential rows).

I don't suppose it's off checking foreign-keys in a lot of tables is it?

--
   Richard Huxton
   Archonet Ltd

Re: Deletes and large tables

From
David Gagnon
Date:
Hi,

  You have any foreign keys pointing this table ?  That's the problem I
got  when I wanted to delete all rows from a  table with 5 FK. You may
search my name in the list archive and found the thread on this matter.

Ciao
/David
Edmund Dengler wrote:

>Greetings!
>
>We have a table with more than 250 million rows. I am trying to delete the
>first 100,000 rows (based on a bigint primary key), and I had to cancel
>after 4 hours of the system not actually finishing the delete. I wrote a
>script to delete individual rows 10,000 at a time using transactions, and
>am finding each individual delete takes on the order of 0.1 seconds to 2-3
>seconds. There are 4 indexes on the table, one of which is very "hashlike"
>(ie, distribution is throught the index for sequential rows).
>
>Is this considered "normal" for delete speed? Currently using 8.0.1.
>Drives are capable of 40MB/sec sustained transfers, but only acheiving
>about 2-10kB/sec, and mostly CPU bound.
>
>Regards!
>Ed
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>               http://archives.postgresql.org
>
>
>


Re: Deletes and large tables

From
Edmund Dengler
Date:
Greetings!

This table has 3 foreign keys, but that should not matter during deletes.
In addition, the tables being referred to are small, and should be in
cache.

There are no tables depending on it for references, so no dependent
triggers should be running.

Also, if this was a foreign key issue, I would expect I/O issues/bounds
and not CPU.

Regards!
Ed


On Fri, 10 Jun 2005, Richard Huxton wrote:

> Edmund Dengler wrote:
> > Greetings!
> >
> > We have a table with more than 250 million rows. I am trying to delete the
> > first 100,000 rows (based on a bigint primary key), and I had to cancel
> > after 4 hours of the system not actually finishing the delete. I wrote a
> > script to delete individual rows 10,000 at a time using transactions, and
> > am finding each individual delete takes on the order of 0.1 seconds to 2-3
> > seconds. There are 4 indexes on the table, one of which is very "hashlike"
> > (ie, distribution is throught the index for sequential rows).
>
> I don't suppose it's off checking foreign-keys in a lot of tables is it?
>
> --
>    Richard Huxton
>    Archonet Ltd
>

Re: Deletes and large tables

From
David Gagnon
Date:
>This table has 3 foreign keys, but that should not matter during deletes.
>In addition, the tables being referred to are small, and should be in
>cache.
>
>
I'm talking about FK that point this table... Not FK defined for this
table that point to other table.  If Table A is referenced by 10 other
tables 10 referencial check are needed.


>There are no tables depending on it for references, so no dependent
>triggers should be running.
>
>Also, if this was a foreign key issue, I would expect I/O issues/bounds
>and not CPU.
>
>
Maybe... I'm honetly not sure.

Like I said in my previous mail... I got a similar problem (maybe not
the same).  It was taking 10 minutes to delete 10k line in a table.  I
turn on some log info in postgresql.conf and I saw that for each row
deleted 4 selects were issued to check FK.  I drop those FK and the
after the delete was taking less than a second.

 Hope it help
/David



>Regards!
>Ed
>
>
>On Fri, 10 Jun 2005, Richard Huxton wrote:
>
>
>
>>Edmund Dengler wrote:
>>
>>
>>>Greetings!
>>>
>>>We have a table with more than 250 million rows. I am trying to delete the
>>>first 100,000 rows (based on a bigint primary key), and I had to cancel
>>>after 4 hours of the system not actually finishing the delete. I wrote a
>>>script to delete individual rows 10,000 at a time using transactions, and
>>>am finding each individual delete takes on the order of 0.1 seconds to 2-3
>>>seconds. There are 4 indexes on the table, one of which is very "hashlike"
>>>(ie, distribution is throught the index for sequential rows).
>>>
>>>
>>I don't suppose it's off checking foreign-keys in a lot of tables is it?
>>
>>--
>>   Richard Huxton
>>   Archonet Ltd
>>
>>
>>
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>
>
>


Re: Deletes and large tables

From
Tom Lane
Date:
Edmund Dengler <edmundd@eSentire.com> writes:
> This table has 3 foreign keys, but that should not matter during deletes.

Richard was inquiring about foreign keys linking *into* this table, not
out of it.

In particular, foreign keys that don't have indexes on the referencing
columns; those will incur seqscans to check for appearances of the value
you are deleting.

            regards, tom lane

Re: Deletes and large tables

From
Edmund Dengler
Date:
Just did a sanity check. I dumped the DB schema, and there is indeed a
foreign key reference into the table. Now interestingly, the table
pointing in has no index on the column, but is a relatively small table
with only entries near the end of the large table.

So looks like I was getting CPU bound because of a sequental scan of the
smaller table per delete.

Dropped the constraint, and deletes are now much faster.

Regards!
Ed

On Fri, 10 Jun 2005, David Gagnon wrote:

>
> >This table has 3 foreign keys, but that should not matter during deletes.
> >In addition, the tables being referred to are small, and should be in
> >cache.
> >
> >
> I'm talking about FK that point this table... Not FK defined for this
> table that point to other table.  If Table A is referenced by 10 other
> tables 10 referencial check are needed.
>
>
> >There are no tables depending on it for references, so no dependent
> >triggers should be running.
> >
> >Also, if this was a foreign key issue, I would expect I/O issues/bounds
> >and not CPU.
> >
> >
> Maybe... I'm honetly not sure.
>
> Like I said in my previous mail... I got a similar problem (maybe not
> the same).  It was taking 10 minutes to delete 10k line in a table.  I
> turn on some log info in postgresql.conf and I saw that for each row
> deleted 4 selects were issued to check FK.  I drop those FK and the
> after the delete was taking less than a second.
>
>  Hope it help
> /David
>
>
>
> >Regards!
> >Ed
> >
> >
> >On Fri, 10 Jun 2005, Richard Huxton wrote:
> >
> >
> >
> >>Edmund Dengler wrote:
> >>
> >>
> >>>Greetings!
> >>>
> >>>We have a table with more than 250 million rows. I am trying to delete the
> >>>first 100,000 rows (based on a bigint primary key), and I had to cancel
> >>>after 4 hours of the system not actually finishing the delete. I wrote a
> >>>script to delete individual rows 10,000 at a time using transactions, and
> >>>am finding each individual delete takes on the order of 0.1 seconds to 2-3
> >>>seconds. There are 4 indexes on the table, one of which is very "hashlike"
> >>>(ie, distribution is throught the index for sequential rows).
> >>>
> >>>
> >>I don't suppose it's off checking foreign-keys in a lot of tables is it?
> >>
> >>--
> >>   Richard Huxton
> >>   Archonet Ltd
> >>
> >>
> >>
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 3: if posting/reading through Usenet, please send an appropriate
> >      subscribe-nomail command to majordomo@postgresql.org so that your
> >      message can get through to the mailing list cleanly
> >
> >
> >
>
>