Re: BUG #2658: Query not using index - Mailing list pgsql-performance

From Mark Lewis
Subject Re: BUG #2658: Query not using index
Date
Msg-id 1159912457.18640.96.camel@archimedes
Whole thread Raw
In response to Re: BUG #2658: Query not using index  (Graham Davis <gdavis@refractions.net>)
Responses Re: BUG #2658: Query not using index
List pgsql-performance
A few hundred is quite a lot for the next proposal and it's kind of an
ugly one, but might as well throw the idea out since you never know.

Have you considered creating one partial index per assetid?  Something
along the lines of "CREATE INDEX asset_index_N ON asset_positions(ts)
WHERE assetid=N"?  I'd guess that the planner probably wouldn't be smart
enough to use the partial indexes unless you issued a separate query for
each assetid, but each one of those queries should be really fast.  Of
course, this is all assuming that PG knows how to use partial indexes to
satisfy MAX queries; I'm not sure if it does.

-- Mark Lewis

On Tue, 2006-10-03 at 14:35 -0700, Graham Davis wrote:
> Not many.  It fluctuates, but there are usually only ever a few hundred
> at most.  Each assetid has multi-millions of positions though.
>
> Mark Lewis wrote:
>
> >Hmmm.  How many distinct assetids are there?
> >-- Mark Lewis
> >
> >On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote:
> >
> >
> >>The "summary table" approach maintained by triggers is something we are
> >>considering, but it becomes a bit more complicated to implement.
> >>Currently we have groups of new positions coming in every few seconds or
> >>less.  They are not guaranteed to be in order.  So for instance, a group
> >>of positions from today could come in and be inserted, then a group of
> >>positions that got lost from yesterday could come in and be inserted
> >>afterwards.
> >>
> >>This means the triggers would have to do some sort of logic to figure
> >>out if the newly inserted position is actually the most recent by
> >>timestamp.  If positions are ever deleted or updated, the same sort of
> >>query that is currently running slow will need to be executed in order
> >>to get the new most recent position.  So there is the possibility that
> >>new positions can be inserted faster than the triggers can calculate
> >>and  maintain the summary table.  There are some other complications
> >>with maintaining such a summary table in our system too, but I won't get
> >>into those.
> >>
> >>Right now I'm just trying to see if I can get the query itself running
> >>faster, which would be the easiest solution for now.
> >>
> >>Graham.
> >>
> >>
> >>Mark Lewis wrote:
> >>
> >>
> >>
> >>>Have you looked into a materialized view sort of approach?  You could
> >>>create a table which had assetid as a primary key, and max_ts as a
> >>>column.  Then use triggers to keep that table up to date as rows are
> >>>added/updated/removed from the main table.
> >>>
> >>>This approach would only make sense if there were far fewer distinct
> >>>assetid values than rows in the main table, and would get slow if you
> >>>commonly delete rows from the main table or decrease the value for ts in
> >>>the row with the highest ts for a given assetid.
> >>>
> >>>-- Mark Lewis
> >>>
> >>>On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote:
> >>>
> >>>
> >>>
> >>>
> >>>>Thanks Tom, that explains it and makes sense.  I guess I will have to
> >>>>accept this query taking 40 seconds, unless I can figure out another way
> >>>>to write it so it can use indexes.  If there are any more syntax
> >>>>suggestions, please pass them on.  Thanks for the help everyone.
> >>>>
> >>>>Graham.
> >>>>
> >>>>
> >>>>Tom Lane wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Graham Davis <gdavis@refractions.net> writes:
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>How come an aggreate like that has to use a sequential scan?  I know
> >>>>>>that PostgreSQL use to have to do a sequential scan for all aggregates,
> >>>>>>but there was support added to version 8 so that aggregates would take
> >>>>>>advantage of indexes.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>Not in a GROUP BY context, only for the simple case.  Per the comment in
> >>>>>planagg.c:
> >>>>>
> >>>>>     * We don't handle GROUP BY, because our current implementations of
> >>>>>     * grouping require looking at all the rows anyway, and so there's not
> >>>>>     * much point in optimizing MIN/MAX.
> >>>>>
> >>>>>The problem is that using an index to obtain the maximum value of ts for
> >>>>>a given value of assetid is not the same thing as finding out what all
> >>>>>the distinct values of assetid are.
> >>>>>
> >>>>>This could possibly be improved but it would take a considerable amount
> >>>>>more work.  It's definitely not in the category of "bug fix".
> >>>>>
> >>>>>            regards, tom lane
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
>
>

pgsql-performance by date:

Previous
From: "Adnan DURSUN"
Date:
Subject: PostgreSQL Caching
Next
From: "Tomeh, Husam"
Date:
Subject: Re: PostgreSQL Caching