Thread: Slow Delete : Seq scan instead of index scan

From: Sylvain CAILLET
Date:
Hi to all,

I'm having trouble with some DELETE statements. My database contains a little more than 10,000 tables and runs on a dedicated server (Debian 6, dual quad-core, 16 GB RAM, SAS disks in RAID 0). Most of the tables contain between 2 and 3 million rows, and no foreign keys exist between them. Each is indexed (btree) on its start_date / end_date fields (bigint). The PostgreSQL server has been tuned (I can give the modified values if needed).

I perform recurrent DELETEs on a subset of the tables (~1900 tables), and each time I delete a few rows (between 0 and 1200). It usually takes between 10 s and more than 2 min, which seems like a huge amount of time! An EXPLAIN ANALYZE on a DELETE shows me that the planner uses a Seq Scan instead of an Index Scan. Autovacuum is on, and I expect the db stats to be updated in near real time (the pg_stats file is stored on a /dev/shm RAM disk for quick access).

Do you have any idea about this problem?

Sylvain Caillet
Bureau : + 33 5 59 41 51 10
scaillet@alaloop.com

ALALOOP S.A.S. - Technopole Izarbel - 64210 Bidart
www.alaloop.com

Re: Slow Delete : Seq scan instead of index scan

From: Sékine Coulibaly
Date:
Hi Sylvain,

It might sound like a naive question, and the gurus will correct me if I'm wrong, but the first thing to investigate is why the index is not used:
- You have 2-3 million rows per table, so the planner should use the index; a seq scan is preferred only for small tables.
- Maybe the WHERE clause of your DELETE statement doesn't make use of your start and end date columns? If so, in which order are they used?

Please provide your Pg version and the table definition with its indexes.
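One quick check in the meantime (a sketch only; the table name and cutoff value below are placeholders): disabling seq scans for the session shows whether the planner is even able to use the index for your predicate.

```sql
-- Diagnostic only: never leave enable_seqscan off in production.
SET enable_seqscan = off;
EXPLAIN DELETE FROM some_agg_table WHERE start_date < 1346486100000;
-- If the plan still shows a Seq Scan here, the index cannot serve
-- this predicate at all; if it switches to an Index Scan, the issue
-- is cost estimation (stale stats or cost settings).
RESET enable_seqscan;
```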

Regards,

Sekine

2012/10/16 Sylvain CAILLET <scaillet@alaloop.com>


Re: Slow Delete : Seq scan instead of index scan

From: Craig Ringer
Date:
On 10/16/2012 03:50 PM, Sylvain CAILLET wrote:
> [...]
> An EXPLAIN ANALYZE on a DELETE shows me that the planner uses a
> Seq Scan instead of an Index Scan.

Can you post that (or paste to explain.depesz.com and link to it here)
along with a "\d tablename" from psql?

--
Craig Ringer


Re: Slow Delete : Seq scan instead of index scan

From: Sylvain CAILLET
Date:
Hi Sékine,

You're right: my question is exactly why the planner doesn't use the index! My DELETE statements have a WHERE clause like: start_date < 1346486100000. They are executed to delete rows that are too old.
My PostgreSQL version is 8.4. Below is an example of a table (they all have the same structure):
CREATE TABLE agg_t100_outgoing_a39_src_net_f5
(
  total_pkts bigint,
  end_date bigint,
  src_network inet,
  start_date bigint,
  total_flows bigint,
  total_bytes bigint
)
WITH (
  OIDS=FALSE
);

CREATE INDEX agg_t100_outgoing_a39_src_net_f5_end_date
  ON agg_t100_outgoing_a39_src_net_f5
  USING btree
  (end_date);

CREATE INDEX agg_t100_outgoing_a39_src_net_f5_start_date
  ON agg_t100_outgoing_a39_src_net_f5
  USING btree
  (start_date);

I have investigated the pg_stat_all_tables view, and it seems autovacuum / autoanalyze aren't doing their job: many tables have no last_autovacuum / last_autoanalyze dates! So the planner doesn't have fresh stats to plan with. Don't you think that could be a good reason for the slow DELETEs? In that case, the problem could come from the autovacuum configuration.
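The check was a query along these lines (a sketch; the column names are as in 8.4's pg_stat_all_tables, and the LIMIT is arbitrary):

```sql
-- Find tables that autoanalyze has never touched, worst offenders first
SELECT schemaname, relname, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM   pg_stat_all_tables
WHERE  last_autoanalyze IS NULL
ORDER  BY n_dead_tup DESC
LIMIT  20;
```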

Regards,

Sylvain




Re: Slow Delete : Seq scan instead of index scan

From: Sylvain CAILLET
Date:
Hi Craig,

Here are the outputs:

flows=# explain analyze delete from agg_t377_incoming_a40_dst_net_f5 where start_date < 1346487911000;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on agg_t377_incoming_a40_dst_net_f5  (cost=0.00..34448.96 rows=657622 width=6) (actual time=3429.058..7135.901 rows=143 loops=1)
   Filter: (start_date < 1346487911000::bigint)
 Total runtime: 7136.191 ms
(3 rows)
flows=# \d agg_t377_incoming_a40_dst_net_f5
Table "public.agg_t377_incoming_a40_dst_net_f5"
   Column    |  Type  | Modifiers 
-------------+--------+-----------
 end_date    | bigint | 
 dst_network | inet   | 
 total_pkts  | bigint | 
 total_bytes | bigint | 
 start_date  | bigint | 
 total_flows | bigint | 
Indexes:
    "agg_t377_incoming_a40_dst_net_f5_end_date" btree (end_date)
    "agg_t377_incoming_a40_dst_net_f5_start_date" btree (start_date)

Thanks for your help,

Sylvain



Re: Slow Delete : Seq scan instead of index scan

From: Filippos Kalamidas
Date:
The first thing you should probably do is run ANALYZE on one of these tables and then run the DELETE statement again. If there are no stats for these tables, it's normal not to get very good plans.
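Something along these lines, using the table from your EXPLAIN output (a sketch):

```sql
-- Rebuild the planner statistics for this one table
ANALYZE agg_t377_incoming_a40_dst_net_f5;

-- Then re-check the plan for the same statement
EXPLAIN ANALYZE
DELETE FROM agg_t377_incoming_a40_dst_net_f5
WHERE start_date < 1346487911000;
-- With fresh stats the row estimate should drop from ~657k toward the
-- actual 143, which should make an Index Scan on the start_date index
-- the cheaper plan.
```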



On Tue, Oct 16, 2012 at 11:24 AM, Sylvain CAILLET <scaillet@alaloop.com> wrote:


Re: Slow Delete : Seq scan instead of index scan

From: Craig Ringer
Date:
On 10/16/2012 04:41 PM, Filippos Kalamidas wrote:
> the first thing you should probably do is run an 'analyze' on one of
> these tables and then run again the delete statement. if there are no
> stats for these tables, it's normal not to have very good plans.

Yep, and the fact that the stats are that bad suggests that autovacuum
probably isn't running, or isn't running often enough.

If you have a high INSERT/UPDATE/DELETE load, then turn autovacuum up on
that table. See:

   http://www.postgresql.org/docs/current/static/routine-vacuuming.html
   http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html
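Since 8.4 the per-table settings can be set as storage parameters. The thresholds below are illustrative only, not recommendations:

```sql
-- Hypothetical thresholds: make autovacuum and autoanalyze trigger
-- sooner on a heavily-churned table (syntax valid from 8.4 onward)
ALTER TABLE agg_t377_incoming_a40_dst_net_f5
  SET (autovacuum_vacuum_scale_factor  = 0.05,
       autovacuum_analyze_scale_factor = 0.02);
```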


If the table is badly bloated it might be worth running "VACUUM FULL" on
it, or (if you're on PostgreSQL 8.4 or below) instead CLUSTERing the table
on an index, as "VACUUM FULL" is very inefficient in 8.4 and older (I
think; I might be misremembering the versions).


Please specify your PostgreSQL version in all questions. See
https://wiki.postgresql.org/wiki/Slow_Query_Questions

--
Craig Ringer