Thread: Performance problems on a fairly big table with two key columns.

Performance problems on a fairly big table with two key columns.

From
"Rasmus Aveskogh"
Date:
Hi,

I have a table that looks like this:

  DATA   ID   TIME
|------|----|------|

The table holds app. 14M rows now and grows by app. 350k rows a day.

The ID-column holds about 1500 unique values (integer).
The TIME-columns is of type timestamp without timezone.

I have one index (b-tree) on the ID-column and one index (b-tree) on the
time-column.

My queries most often look like this:

SELECT DATA FROM <tbl> WHERE ID = 1 AND TIME > now() - '1 day'::interval;

or

SELECT DATA FROM <tbl> WHERE ID = 2 AND TIME > now() - '1 week'::interval;


Since I have about 350000 rows the last 24 hours the query planner chooses
to use my ID-index to get hold of the rows - then using only a filter on
the time column.

This takes a lot of time (over a minute) on a P4 1900MHz which
unfortenately isn't good enough for my purpose (webpages times out and so
on..).


If I SELECT only the rows with a certain ID (regardless of time):

SELECT DATA FROM <tbl> WHERE ID = 3;

..it still takes almost a minute so I guess this is the problem (not the
filtering on the TIME-column), especially since it recieves a lot of rows
which will be descarded using my filter anyway.
(I recieve ~6000 rows and want about 250).

But using the TIME-column as a first subset of rows and discarding using
the ID-column as a filter is even worse since I then get 350k rows and
discards about 349750 of them using the filter.

I tried applying a multicolumn index on ID and TIME, but that one won't
even be used (after ANALYZE).

My only option here seems to have like a "daily" table which will only
carry the rows for the past 24 hours which will give my SELECT a result of
6000 initial rows out of ~350k (instead of 14M like now) and then 250 when
filtered.
But I really hope there is a cleaner solution to the problem - actually I
though a multicolumn index would do it.

-ra


Re: Performance problems on a fairly big table with two key columns.

From
Richard Huxton
Date:
On Thursday 04 September 2003 23:53, Rasmus Aveskogh wrote:
> Hi,
>
> I have a table that looks like this:
>
>   DATA   ID   TIME
>
> |------|----|------|
>
> The table holds app. 14M rows now and grows by app. 350k rows a day.
>
> The ID-column holds about 1500 unique values (integer).
> The TIME-columns is of type timestamp without timezone.
>
> I have one index (b-tree) on the ID-column and one index (b-tree) on the
> time-column.
>
> My queries most often look like this:
>
> SELECT DATA FROM <tbl> WHERE ID = 1 AND TIME > now() - '1 day'::interval;
[snip]
> I tried applying a multicolumn index on ID and TIME, but that one won't
> even be used (after ANALYZE).

The problem is likely to be that the parser isn't spotting that now()-'1 day'
is constant. Try an explicit time and see if the index is used. If so, you
can write a wrapper function for your expression (mark it STABLE so the
planner knows it won't change during the statement).

Alternatively, you can do the calculation in the application and use an
explicit time.

HTH
--
  Richard Huxton
  Archonet Ltd

Re: Performance problems on a fairly big table with two

From
"Rasmus Aveskogh"
Date:
Richard,

Thanks a lot! You were right - the query parser "misunderstood"
now() - '1 day'::interval and only used one of the indexes (as I already
noticed).

Actually all I had to do was to cast the result like this:

(now() - '1 day'::interval)::date

75s is not between 10ms and 200ms.

Thanks again!

-ra


> On Thursday 04 September 2003 23:53, Rasmus Aveskogh wrote:
>> Hi,
>>
>> I have a table that looks like this:
>>
>>   DATA   ID   TIME
>>
>> |------|----|------|
>>
>> The table holds app. 14M rows now and grows by app. 350k rows a day.
>>
>> The ID-column holds about 1500 unique values (integer).
>> The TIME-columns is of type timestamp without timezone.
>>
>> I have one index (b-tree) on the ID-column and one index (b-tree) on the
>> time-column.
>>
>> My queries most often look like this:
>>
>> SELECT DATA FROM <tbl> WHERE ID = 1 AND TIME > now() - '1
>> day'::interval;
> [snip]
>> I tried applying a multicolumn index on ID and TIME, but that one won't
>> even be used (after ANALYZE).
>
> The problem is likely to be that the parser isn't spotting that now()-'1
> day'
> is constant. Try an explicit time and see if the index is used. If so, you
> can write a wrapper function for your expression (mark it STABLE so the
> planner knows it won't change during the statement).
>
> Alternatively, you can do the calculation in the application and use an
> explicit time.
>
> HTH
> --
>   Richard Huxton
>   Archonet Ltd
>


Re: Performance problems on a fairly big table with two

From
Richard Huxton
Date:
On Friday 05 September 2003 16:36, Rasmus Aveskogh wrote:
> Richard,
>
> Thanks a lot! You were right - the query parser "misunderstood"
> now() - '1 day'::interval and only used one of the indexes (as I already
> noticed).
>
> Actually all I had to do was to cast the result like this:
>
> (now() - '1 day'::interval)::date
>
> 75s is not between 10ms and 200ms.
>
> Thanks again!

Ah - good. You also want to be careful with differences between timestamp
with/without time zone etc.
--
  Richard Huxton
  Archonet Ltd