Re: Query optimization using order by and limit - Mailing list pgsql-performance

From Stephen Frost
Subject Re: Query optimization using order by and limit
Date
Msg-id 20110922145319.GJ12765@tamriel.snowman.net
Whole thread Raw
In response to Re: Query optimization using order by and limit  (Michael Viscuso <michael.viscuso@getcarbonblack.com>)
Responses Re: Query optimization using order by and limit  (Michael Viscuso <michael.viscuso@getcarbonblack.com>)
Re: Query optimization using order by and limit  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
* Michael Viscuso (michael.viscuso@getcarbonblack.com) wrote:
> Adding the final condition hosts_guid = '2007075705813916178' is what
> ultimately kills it http://explain.depesz.com/s/8zy.  By adding the
> host_guid, it spends considerably more time in the older tables than
> without this condition and I'm not sure why.

What I think is happening here is that PG is pushing down that filter
(not typically a bad thing..), but with that condition, it's going to
scan the index until it finds a match for that filter before returning
back up only to have that result cut out due to the limit.  Having it as
numerics isn't helping here, but the bigger issue is having to check all
those tuples for a match to the filter.

Mike, the filter has to be applied before the order by/limit, since
those clauses come after the filter has been applied (you wouldn't want
a 'where x = 2 limit 10' to return early just because it found 10
records where x didn't equal 2).

What would be great is if PG would realize that the CHECK constraints
prevent earlier records from being in these earlier tables, so it
shouldn't need to consider them at all once the records from the
'latest' table has been found and the limit reached (reverse all this
for an 'ascending' query, of course), which we can do when there's no
order by.  I don't believe we have that kind of logic or that
information available at this late stage- the CHECK constraints are used
to eliminate the impossible-to-match tables, but that's it.

One option, which isn't great of course, would be to implement your own
'nested loop' construct (something I typically despise..) in the
application which just walks backwards from the latest and pulls
whatever records it can from each day and then stops once it hits the
limit.

    Thanks,

        Stephen

Attachment

pgsql-performance by date:

Previous
From: Michael Viscuso
Date:
Subject: Re: Query optimization using order by and limit
Next
From: Jonathan Bartlett
Date:
Subject: Optimizing Trigram searches in PG 9.1