Weird row estimate - Mailing list pgsql-performance

From Marc Cousin
Subject Weird row estimate
Date
Msg-id 200707111728.43416.mcousin@sigma.fr
Whole thread Raw
Responses Re: Weird row estimate  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance

Hi,

I'm having a weird problem on a query :

I've simplified it to get the significant part (see end of message).

The point is I've got a simple

SELECT field FROM table WHERE 'condition1'

Estimated returned rows : 5453

Then

SELECT field FROM table WHERE 'condition2'

Estimated returned rows : 705

Then

SELECT field FROM table WHERE 'condition1' OR 'condition2'

Estimated returned rows : 143998

Condition2 is a bit complicated (it's a subquery).

Nevertheless, shouldn't the third estimate be smaller or equal to the sum of the two others ?

Postgresql is 8.2.4 on Linux, stats are up to date,

show default_statistics_target;

default_statistics_target

---------------------------

1000

Any ideas ?

explain analyze

SELECT stc.CMD_ID

FROM STOL_STC stc

WHERE (stc.STC_DATE>='2007-07-05' AND stc.STC_DATEPLAN<='2007-07-05');

QUERY PLAN

------------------------------------------------------------------------------------------------------------------

Seq Scan on stol_stc stc (cost=0.00..24265.15 rows=5453 width=8) (actual time=17.186..100.941 rows=721 loops=1)

Filter: ((stc_date >= '2007-07-05'::date) AND (stc_dateplan <= '2007-07-05'::date))

Total runtime: 101.656 ms

(3 rows)

explain analyze

SELECT stc.CMD_ID

FROM STOL_STC stc

WHERE stc.STC_ID IN

(SELECT STC_ID FROM STOL_TRJ

WHERE TRJ_DATEARRT>='2007-07-05'

AND TRJ_DATEDEPT>=TRJ_DATEARRT

AND (TRJ_DATEDEPT<='2007-07-05'

OR TRJ_DATECREAT<='2007-07-05') );

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=4649.62..10079.52 rows=705 width=8) (actual time=6.266..13.037 rows=640 loops=1)

-> HashAggregate (cost=4649.62..4657.13 rows=751 width=8) (actual time=6.242..6.975 rows=648 loops=1)

-> Index Scan using stol_trj_fk5 on stol_trj (cost=0.00..4647.61 rows=803 width=8) (actual time=0.055..4.901 rows=688 loops=1)

Index Cond: (trj_datearrt >= '2007-07-05'::date)

Filter: ((trj_datedept >= trj_datearrt) AND ((trj_datedept <= '2007-07-05'::date) OR (trj_datecreat <= '2007-07-05'::date)))

-> Index Scan using stol_stc_pk on stol_stc stc (cost=0.00..7.21 rows=1 width=16) (actual time=0.004..0.005 rows=1 loops=648)

Index Cond: (stc.stc_id = stol_trj.stc_id)

Total runtime: 13.765 ms

(8 rows)

explain analyze

SELECT stc.CMD_ID

FROM STOL_STC stc

WHERE (stc.STC_DATE>='2007-07-05' AND stc.STC_DATEPLAN<='2007-07-05')

OR

(stc.STC_ID IN

(SELECT STC_ID FROM STOL_TRJ

WHERE TRJ_DATEARRT>='2007-07-05'

AND TRJ_DATEDEPT>=TRJ_DATEARRT

AND (TRJ_DATEDEPT<='2007-07-05'

OR TRJ_DATECREAT<='2007-07-05') ));

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------

Seq Scan on stol_stc stc (cost=4649.62..29621.12 rows=143998 width=8) (actual time=21.564..146.365 rows=1048 loops=1)

Filter: (((stc_date >= '2007-07-05'::date) AND (stc_dateplan <= '2007-07-05'::date)) OR (hashed subplan))

SubPlan

-> Index Scan using stol_trj_fk5 on stol_trj (cost=0.00..4647.61 rows=803 width=8) (actual time=0.054..4.941 rows=688 loops=1)

Index Cond: (trj_datearrt >= '2007-07-05'::date)

Filter: ((trj_datedept >= trj_datearrt) AND ((trj_datedept <= '2007-07-05'::date) OR (trj_datecreat <= '2007-07-05'::date)))

Total runtime: 147.407 ms

SELECT count(*) from stol_stc ;

count

--------

140960

(1 row)

pgsql-performance by date:

Previous
From: "Bryan Murphy"
Date:
Subject: Re: Two questions.. shared_buffers and long reader issue
Next
From: Heikki Linnakangas
Date:
Subject: Re: PostgreSQL publishes first real benchmark