Home > mailing lists

Re: Optimising "in" queries - Mailing list pgsql-performance

From	Russell Smith
Subject	Re: Optimising "in" queries
Date	August 22, 2007 08:02:58
Msg-id	46CC16F3.7060606@pws.com.au Whole thread Raw
In response to	Optimising "in" queries (Stephen Davies <scldad@sdc.com.au>)
Responses	Re: Optimising "in" queries Re: Optimising "in" queries
List	pgsql-performance

Tree view

Stephen Davies wrote:
> I have a PostgreSQL 8.2.4 table with some seven million rows.
>
> The psql query:
>
> select count(rdate),rdate from reading where sensor_id in
> (1137,1138,1139) group by rdate order by rdate desc limit 1;
>
> takes a few seconds but:
>
> select count(rdate),rdate from reading where sensor_id in
> (1137,1138,1139,1140) group by rdate order by rdate desc limit 1;
>
>
It would have been helpful to see the table definition here.  I can say
up front that array processing in postgres is SLOW.

> (anything with four or more values in the "in" list) takes several
> minutes.
>
> Is there any way to make the "larger" queries more efficient?
>
> Both rdate and sensor_id are indexed and the database is vacuumed every
> night.
>
> The values in the "in" list are seldom as "neat" as in the above
> examples. Actual values can range from 1 to about 2000. The number of
> values ranges from 2 to about 10.
>
> Explain outputs are:
>
> benparts=# explain select count(rdate),rdate from reading where
> sensor_id in (1137,1138,1139,1140) group by rdate order by rdate desc
> limit 1;
>                                             QUERY PLAN
> ---------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..39890.96 rows=1 width=8)
>    ->  GroupAggregate  (cost=0.00..7938300.21 rows=199 width=8)
>          ->  Index Scan Backward using date on reading
> (cost=0.00..7937884.59 rows=82625 width=8)
>                Filter: (sensor_id = ANY
> ('{1137,1138,1139,1140}'::integer[]))
> (4 rows)
>
I'm unsure of how you produced a plan like this without the benefit of
seeing the table definition.
> benparts=# explain select count(rdate),rdate from reading where
> sensor_id in (1137,1138,1139) group by rdate order by rdate desc limit
> 1;
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Limit  (cost=48364.32..48364.32 rows=1 width=8)
>    ->  Sort  (cost=48364.32..48364.49 rows=69 width=8)
>          Sort Key: rdate
>          ->  HashAggregate  (cost=48361.35..48362.21 rows=69 width=8)
>                ->  Bitmap Heap Scan on reading  (cost=535.53..48218.10
> rows=28650 width=8)
>                      Recheck Cond: (sensor_id = ANY
> ('{1137,1138,1139}'::integer[]))
>                      ->  Bitmap Index Scan on reading_sensor
> (cost=0.00..528.37 rows=28650 width=0)
>                            Index Cond: (sensor_id = ANY
> ('{1137,1138,1139}'::integer[]))
> (8 rows)
>
>
>
As mentioned already, you need explain analyze.

However I again will say that array processing is postgres is SLOW.  It
would strongly recommend redesigning your schema to use a table with
sensor_id's that correspond to the primary key in the reading table.

Rethinking the way you are going about this will probably be the most
effective solution, but we will need more information if you are not
comfortable doing that yourself.

Regards

Russell Smith

pgsql-performance by date:

From: Russell Smith
Date: 22 August 2007, 07:56:38
Subject: Re: Poor Performance after Upgrade

From: Michael Glaesemann
Date: 22 August 2007, 10:25:39
Subject: Re: Optimising "in" queries

Re: Optimising "in" queries - Mailing list pgsql-performance

Previous

Next