Re: Performance of query (fwd) - Mailing list pgsql-general

From Tom Lane
Subject Re: Performance of query (fwd)
Date
Msg-id 7884.1055303428@sss.pgh.pa.us
Whole thread Raw
In response to Performance of query (fwd)  (Edmund Dengler <edmundd@eSentire.com>)
Responses Re: Performance of query (fwd)  (Edmund Dengler <edmundd@eSentire.com>)
List pgsql-general
Edmund Dengler <edmundd@eSentire.com> writes:
>   select * from event
>   where timestamp > (select now() - '2 hours'::interval)
>     and exists (select 1 from hack_pull_sid where sid = event.sid)

> (note: <hack_pull_sid> is a table of SIDs I am interested in so that I
> avoid the issues with IN)

I think you're creating more issues than you're avoiding.  With the
above query, the planner has little chance of guessing how many rows
will be retrieved from "event" ... and indeed the EXPLAIN output shows
that its guess is off by more than a factor of 1000:

>      ->  Index Scan using timestamp_idx on event  (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05
rows=129loops=1) 

With a misestimate of that magnitude at the core of the query, it's
unsurprising that all the other plan choices are bad too.

But actually I suspect the easiest point of attack is not the EXISTS
subquery, but the timestamp comparison.  Can you get your application
to supply a simple literal constant to compare to the timestamp, viz
'2003-06-10 21:44' rather than now()-'2 hours'?  The former gives the
planner something to compare to its statistics, the latter doesn't.

Oh ... you have done an ANALYZE on event reasonably recently, no?

            regards, tom lane

pgsql-general by date:

Previous
From: Edmund Dengler
Date:
Subject: Performance of query (fwd)
Next
From: Ron Snyder
Date:
Subject: Re: error restoring large objects during pg_restore