Unexpected seq scans when expected result is 1 row out of milions - Mailing list pgsql-performance

From Svetlin Manavski
Subject Unexpected seq scans when expected result is 1 row out of milions
Date
Msg-id CAMm+ggR=-+pqw9Qo7uQ6dx9pa0bUfBEXz6z-+UfUNGvRuJuBXA@mail.gmail.com
Whole thread Raw
Responses Re: Unexpected seq scans when expected result is 1 row out of milions
List pgsql-performance
Hi all,
here is my postgresql configuration:

"version";"PostgreSQL 9.0.3 on amd64-portbld-freebsd8.0, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 64-bit"
"bytea_output";"escape"
"checkpoint_segments";"64"
"client_encoding";"UNICODE"
"effective_cache_size";"6GB"
"fsync";"off"
"lc_collate";"C"
"lc_ctype";"C"
"listen_addresses";"*"
"log_destination";"syslog"
"max_connections";"20"
"max_stack_depth";"2MB"
"server_encoding";"UTF8"
"shared_buffers";"4GB"
"silent_mode";"on"
"synchronous_commit";"off"
"TimeZone";"Europe/Jersey"
"update_process_title";"off"
"work_mem";"24MB"

I have a partitioned table tcpsessiondata:

CREATE TABLE appqosdata.tcpsessiondata
(
    detectorid smallint not null,
    createdtime bigint not null,
    sessionid bigint not null,
...
);

That table has many millions of rows in each partition and no data in the main one. Here is an example:

select count(*) from appqosdata.tcpsessiondata_1;
  count  
----------
 49377910
(1 row)

Every partition has a "Primary Key (detectorid, createdtime)"

I run the following query on that table:
select
    cast (SD.detectorid as numeric),
    CAST( (createdtime / 61000000000::bigint) AS numeric) as timegroup,
    sum(datafromsource)+sum(datafromdestination) as numbytes,
    CAST ( sum(packetsfromsource)+sum(packetsfromdestination) AS numeric) as numpackets
from
    appqosdata.tcpsessiondata SD
where
    SD.detectorid >= 0  and SD.createdtime >= 1297266601368086000::bigint
    and SD.createdtime < 1297270202368086000::bigint
group by SD.detectorid, timegroup

The table is partitioned by a "sessionid" which is not used in this particular query so I already expect all partitions to be touched. However I have a bad scan choice on at least a couple of partitions:

"HashAggregate  (cost=5679026.42..5679028.76 rows=67 width=34) (actual time=160113.366..160113.366 rows=0 loops=1)"
"  Output: (sd.detectorid)::numeric, (((sd.createdtime / 61000000000::bigint))::numeric), (sum(sd.datafromsource) + sum(sd.datafromdestination)), ((sum(sd.packetsfromsource) + sum(sd.packetsfromdestination)))::numeric, sd.detectorid"
"  ->  Result  (cost=0.00..5679025.41 rows=67 width=34) (actual time=160113.360..160113.360 rows=0 loops=1)"
"        Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, ((sd.createdtime / 61000000000::bigint))::numeric"
"        ->  Append  (cost=0.00..5679025.08 rows=67 width=34) (actual time=160113.356..160113.356 rows=0 loops=1)"
"              ->  Seq Scan on appqosdata.tcpsessiondata sd  (cost=0.00..23.65 rows=1 width=34) (actual time=0.002..0.002 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_1 sd  (cost=0.00..1373197.46 rows=1 width=34) (actual time=46436.737..46436.737 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_2 sd  (cost=0.00..2447484.00 rows=1 width=34) (actual time=108359.967..108359.967 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_3_pkey on appqosdata.tcpsessiondata_3 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.016..0.016 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"

... (many more partitions here)....

"              ->  Index Scan using tcpsessiondata_61_pkey on appqosdata.tcpsessiondata_61 sd  (cost=0.00..8162.42 rows=1 width=34) (actual time=25.446..25.446 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_62_pkey on appqosdata.tcpsessiondata_62 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.008..0.008 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_63_pkey on appqosdata.tcpsessiondata_63 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.006..0.006 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_64 sd  (cost=0.00..13.00 rows=1 width=34) (actual time=0.102..0.102 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_65 sd  (cost=0.00..117.64 rows=1 width=34) (actual time=0.854..0.854 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_66_pkey on appqosdata.tcpsessiondata_66 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.007..0.007 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_67_pkey on appqosdata.tcpsessiondata_67 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.005..0.005 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"Total runtime: 160114.339 ms"

The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1 and appqosdata.tcpsessiondata_2 even if the planner estimates correctly 1 row out of millions could potentially be selected? As you can see ~90% of the time is spent on those 2 partitions even if they are not apparently different from any of the others.

I would appreciate any help with this issue.

Thank you,
Svetlin Manavski

pgsql-performance by date:

Previous
From: jesper@krogh.cc
Date:
Subject: Re: Hardware advice for scalable warehouse db
Next
From: Robert Schnabel
Date:
Subject: Re: Hardware advice for scalable warehouse db