Re: BUG #2658: Query not using index - Mailing list pgsql-performance

From Graham Davis
Subject Re: BUG #2658: Query not using index
Date
Msg-id 4522D7B1.3040008@refractions.net
In response to Re: BUG #2658: Query not using index  (Mark Lewis <mark.lewis@mir3.com>)
Responses Re: BUG #2658: Query not using index  (Mark Lewis <mark.lewis@mir3.com>)
List pgsql-performance
Not many.  It fluctuates, but there are usually only a few hundred at
most.  Each assetid has many millions of positions, though.
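
Since there are only a few hundred assetids but millions of rows per
asset, one rewrite worth trying is to probe the index once per asset
instead of grouping over the whole table.  A rough sketch, assuming a
composite index on (assetid, ts) and a small lookup table of assets; the
"assets", "positions" and "ts" names here are only illustrative:

    SELECT a.assetid,
           (SELECT p.ts
              FROM positions p
             WHERE p.assetid = a.assetid
             ORDER BY p.assetid DESC, p.ts DESC
             LIMIT 1) AS last_ts
      FROM assets a;

Spelling the ORDER BY out as "assetid DESC, ts DESC" (rather than just
"ts DESC") can make it easier for older planners to match the subquery to
the (assetid, ts) index, so each subselect becomes a single backward index
scan and the total cost grows with the number of assets rather than the
number of position rows.  Two more sketches, on the MIN/MAX optimization
Tom mentions and on the trigger approach, follow the quoted thread below.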

Mark Lewis wrote:

>Hmmm.  How many distinct assetids are there?
>-- Mark Lewis
>
>On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote:
>
>
>>The "summary table" approach maintained by triggers is something we are
>>considering, but it becomes a bit more complicated to implement.
>>Currently we have groups of new positions coming in every few seconds or
>>less.  They are not guaranteed to be in order.  So for instance, a group
>>of positions from today could come in and be inserted, then a group of
>>positions that got lost from yesterday could come in and be inserted
>>afterwards.
>>
>>This means the triggers would have to do some sort of logic to figure
>>out if the newly inserted position is actually the most recent by
>>timestamp.  If positions are ever deleted or updated, the same sort of
>>query that is currently running slow will need to be executed in order
>>to get the new most recent position.  So there is the possibility that
>>new positions can be inserted faster than the triggers can calculate
>>and maintain the summary table.  There are some other complications
>>with maintaining such a summary table in our system too, but I won't get
>>into those.
>>
>>Right now I'm just trying to see if I can get the query itself running
>>faster, which would be the easiest solution for now.
>>
>>Graham.
>>
>>
>>Mark Lewis wrote:
>>
>>>Have you looked into a materialized view sort of approach?  You could
>>>create a table which had assetid as a primary key, and max_ts as a
>>>column.  Then use triggers to keep that table up to date as rows are
>>>added/updated/removed from the main table.
>>>
>>>This approach would only make sense if there were far fewer distinct
>>>assetid values than rows in the main table, and would get slow if you
>>>commonly delete rows from the main table or decrease the value for ts in
>>>the row with the highest ts for a given assetid.
>>>
>>>-- Mark Lewis
>>>
>>>On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote:
>>>
>>>>Thanks Tom, that explains it and makes sense.  I guess I will have to
>>>>accept this query taking 40 seconds, unless I can figure out another way
>>>>to write it so it can use indexes.  If there are any more syntax
>>>>suggestions, please pass them on.  Thanks for the help everyone.
>>>>
>>>>Graham.
>>>>
>>>>
>>>>Tom Lane wrote:
>>>>
>>>>>Graham Davis <gdavis@refractions.net> writes:
>>>>>
>>>>>>How come an aggregate like that has to use a sequential scan?  I know
>>>>>>that PostgreSQL used to have to do a sequential scan for all aggregates,
>>>>>>but support was added in version 8 so that aggregates could take
>>>>>>advantage of indexes.
>>>>>
>>>>>Not in a GROUP BY context, only for the simple case.  Per the comment in
>>>>>planagg.c:
>>>>>
>>>>>     * We don't handle GROUP BY, because our current implementations of
>>>>>     * grouping require looking at all the rows anyway, and so there's not
>>>>>     * much point in optimizing MIN/MAX.
>>>>>
>>>>>The problem is that using an index to obtain the maximum value of ts for
>>>>>a given value of assetid is not the same thing as finding out what all
>>>>>the distinct values of assetid are.
>>>>>
>>>>>This could possibly be improved but it would take a considerable amount
>>>>>more work.  It's definitely not in the category of "bug fix".
>>>>>
>>>>>            regards, tom lane
>>>>>
>>>>
>>
>>
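
To put Tom's point in concrete terms: the 8.x MIN/MAX optimization only
kicks in for the simple, ungrouped form of the aggregate.  Roughly (again
with illustrative names, and assuming an index whose leading column is ts
for the first statement):

    -- Can be rewritten by the planner into a single index probe:
    SELECT max(ts) FROM positions;

    -- Cannot: the planner has no cheap way to enumerate the distinct
    -- assetids from the index, so it has to look at every row:
    SELECT assetid, max(ts) FROM positions GROUP BY assetid;

The per-asset subquery sketched near the top of this message is basically
a manual way of getting one index probe per asset in spite of that
limitation.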

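On the trigger-maintained summary table discussed above, the out-of-order
arrivals can be handled by having the trigger compare timestamps instead
of assuming the newest insert is the latest position.  A rough sketch in
PL/pgSQL, with illustrative names (latest_position as the summary table,
positions(assetid, ts) as the main table); as noted above, deletes or
updates of the current maximum would still need the slow requery, and
concurrent first-ever inserts for a brand-new asset would need extra
handling of the primary-key collision:

    CREATE TABLE latest_position (
        assetid integer PRIMARY KEY,
        max_ts  timestamp with time zone NOT NULL
    );

    CREATE OR REPLACE FUNCTION update_latest_position() RETURNS trigger AS $$
    BEGIN
        -- Bump the stored maximum only if the new row is actually newer.
        UPDATE latest_position
           SET max_ts = NEW.ts
         WHERE assetid = NEW.assetid
           AND max_ts < NEW.ts;

        -- First position ever seen for this asset: create its summary row.
        IF NOT FOUND AND NOT EXISTS (SELECT 1 FROM latest_position
                                      WHERE assetid = NEW.assetid) THEN
            INSERT INTO latest_position (assetid, max_ts)
            VALUES (NEW.assetid, NEW.ts);
        END IF;

        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER positions_latest_position
        AFTER INSERT ON positions
        FOR EACH ROW EXECUTE PROCEDURE update_latest_position();

Because the UPDATE only fires when NEW.ts is newer than the stored
maximum, a late batch of yesterday's positions simply falls through
without touching the summary, and reading the latest position per asset
becomes a lookup on a table with only a few hundred rows.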

--
Graham Davis
Refractions Research Inc.
gdavis@refractions.net

