Thread: Unexpected seq scans when expected result is 1 row out of milions

Unexpected seq scans when expected result is 1 row out of milions

From
Svetlin Manavski
Date:
Hi all,
here is my postgresql configuration:

"version";"PostgreSQL 9.0.3 on amd64-portbld-freebsd8.0, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 64-bit"
"bytea_output";"escape"
"checkpoint_segments";"64"
"client_encoding";"UNICODE"
"effective_cache_size";"6GB"
"fsync";"off"
"lc_collate";"C"
"lc_ctype";"C"
"listen_addresses";"*"
"log_destination";"syslog"
"max_connections";"20"
"max_stack_depth";"2MB"
"server_encoding";"UTF8"
"shared_buffers";"4GB"
"silent_mode";"on"
"synchronous_commit";"off"
"TimeZone";"Europe/Jersey"
"update_process_title";"off"
"work_mem";"24MB"

I have a partitioned table tcpsessiondata:

CREATE TABLE appqosdata.tcpsessiondata
(
    detectorid smallint not null,
    createdtime bigint not null,
    sessionid bigint not null,
...
);

That table has many millions of rows in each partition and no data in the main one. Here is an example:

select count(*) from appqosdata.tcpsessiondata_1;
  count  
----------
 49377910
(1 row)

Every partition has a "Primary Key (detectorid, createdtime)"

I run the following query on that table:
select
    cast (SD.detectorid as numeric),
    CAST( (createdtime / 61000000000::bigint) AS numeric) as timegroup,
    sum(datafromsource)+sum(datafromdestination) as numbytes,
    CAST ( sum(packetsfromsource)+sum(packetsfromdestination) AS numeric) as numpackets
from
    appqosdata.tcpsessiondata SD
where
    SD.detectorid >= 0  and SD.createdtime >= 1297266601368086000::bigint
    and SD.createdtime < 1297270202368086000::bigint
group by SD.detectorid, timegroup

The table is partitioned by a "sessionid" which is not used in this particular query so I already expect all partitions to be touched. However I have a bad scan choice on at least a couple of partitions:

"HashAggregate  (cost=5679026.42..5679028.76 rows=67 width=34) (actual time=160113.366..160113.366 rows=0 loops=1)"
"  Output: (sd.detectorid)::numeric, (((sd.createdtime / 61000000000::bigint))::numeric), (sum(sd.datafromsource) + sum(sd.datafromdestination)), ((sum(sd.packetsfromsource) + sum(sd.packetsfromdestination)))::numeric, sd.detectorid"
"  ->  Result  (cost=0.00..5679025.41 rows=67 width=34) (actual time=160113.360..160113.360 rows=0 loops=1)"
"        Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, ((sd.createdtime / 61000000000::bigint))::numeric"
"        ->  Append  (cost=0.00..5679025.08 rows=67 width=34) (actual time=160113.356..160113.356 rows=0 loops=1)"
"              ->  Seq Scan on appqosdata.tcpsessiondata sd  (cost=0.00..23.65 rows=1 width=34) (actual time=0.002..0.002 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_1 sd  (cost=0.00..1373197.46 rows=1 width=34) (actual time=46436.737..46436.737 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_2 sd  (cost=0.00..2447484.00 rows=1 width=34) (actual time=108359.967..108359.967 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_3_pkey on appqosdata.tcpsessiondata_3 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.016..0.016 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"

... (many more partitions here)....

"              ->  Index Scan using tcpsessiondata_61_pkey on appqosdata.tcpsessiondata_61 sd  (cost=0.00..8162.42 rows=1 width=34) (actual time=25.446..25.446 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_62_pkey on appqosdata.tcpsessiondata_62 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.008..0.008 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_63_pkey on appqosdata.tcpsessiondata_63 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.006..0.006 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_64 sd  (cost=0.00..13.00 rows=1 width=34) (actual time=0.102..0.102 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Seq Scan on appqosdata.tcpsessiondata_65 sd  (cost=0.00..117.64 rows=1 width=34) (actual time=0.854..0.854 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_66_pkey on appqosdata.tcpsessiondata_66 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.007..0.007 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"              ->  Index Scan using tcpsessiondata_67_pkey on appqosdata.tcpsessiondata_67 sd  (cost=0.00..11.51 rows=1 width=34) (actual time=0.005..0.005 rows=0 loops=1)"
"                    Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination"
"                    Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))"
"Total runtime: 160114.339 ms"

The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1 and appqosdata.tcpsessiondata_2 even if the planner estimates correctly 1 row out of millions could potentially be selected? As you can see ~90% of the time is spent on those 2 partitions even if they are not apparently different from any of the others.

I would appreciate any help with this issue.

Thank you,
Svetlin Manavski

Re: Unexpected seq scans when expected result is 1 row out of milions

From
Tom Lane
Date:
Svetlin Manavski <svetlin.manavski@gmail.com> writes:
> The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1 and
> appqosdata.tcpsessiondata_2 even if the planner estimates correctly 1 row
> out of millions could potentially be selected? As you can see ~90% of the
> time is spent on those 2 partitions even if they are not apparently
> different from any of the others.

Well, there must be *something* different about them.  Are you sure
they've got the same indexes as the others?  It would be useful to see
psql's \d report for those partitions, as well as for one of the
partitions that's behaving as expected.  You might also compare the
EXPLAIN results for doing the query on just one child table between
the normal and misbehaving partitions.

            regards, tom lane