Thread: Improve Query

Improve Query

From: Zach Calvert
So Google hasn't been helpful and I'm not entirely sure what to look
for in the mailing lists to find the answer to my problem, so here
goes.

I have a query and I have run
explain analyze
select count(*)
from score
where leaderboardid=35 and score <= 6841 and active

The result is
"Aggregate  (cost=2491.06..2491.07 rows=1 width=0) (actual
time=38.878..38.878 rows=1 loops=1)"
"  ->  Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
(actual time=0.012..30.760 rows=38571 loops=1)"
"        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
"Total runtime: 38.937 ms"

I have an index on score, and an index on score, leaderboardid, and
active.  I can't seem to figure out how to create an index that will
turn that "Seq Scan" into an index scan.  The biggest problem is that
the query degrades very quickly with a lot more rows, and I will be
getting A LOT MORE rows.  What can I do to improve the performance of
this query?

Thank you so much,
ZC

Re: Improve Query

From: Grzegorz Jaśkiewicz
Try creating an index on all three columns.
Btw, 38 ms is pretty fast.  If you run that query very often, prepare
it, because I reckon it takes a few ms to actually create the plan for
it.
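
For example, following the column names in your query (the index name
here is just an example; put the equality-tested columns first and the
range-tested column last):

CREATE INDEX idx_score_ldb_active_score
  ON score
  USING btree
  (leaderboardid, active, score);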

Re: Improve Query

From: Nikolas Everett
The plan ought to be different when there are more scores and the table is analyzed and your statistics target is high enough.  At this point you don't have enough data to merit doing anything but a seq scan.  The overhead is simply not worth it.

You could try inserting a lot more rows.  I'd create a function to do several million inserts with random numbers and then analyze and rerun.
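
Something along these lines would do it (a rough sketch; it assumes
the other columns in score have defaults or allow NULLs, and the
value ranges are made up):

-- insert a million random rows, then refresh the statistics
INSERT INTO score (leaderboardid, score, active)
SELECT (random() * 100)::int,
       (random() * 100000)::int,
       random() < 0.9
FROM generate_series(1, 1000000);

ANALYZE score;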

I think in the end you're probably going to see a couple of bitmap index scans ANDed together, followed by a bitmap heap scan.

Keep in mind that postgres stores statistics information about each column, but doesn't at this point store statistics about two columns together.

Preparing the query might actually hurt performance, because postgres treats a prepare as "plan this query, but I'm not going to tell you the values of the parameters".  If you instead let the query replan every time, you will get a different plan for the leaderboards or score ranges that are more popular.
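
For comparison, a prepared version would look something like this (the
statement name is made up); the plan is built once, before the actual
values are known:

PREPARE count_scores(int, int) AS
  SELECT count(*)
  FROM score
  WHERE leaderboardid = $1 AND score <= $2 AND active;

EXECUTE count_scores(35, 6841);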


Re: Improve Query

From: Zach Calvert
I'm running the inserts now via a JDBC call I have, which is then
followed up by the query I'm showing and a few others.  I have run
tests on all of the others, and all of them use index scans and are
very fast, 10 ms or less.  This one started at 2 milliseconds when the
table was empty and is up to 40 milliseconds with 40K inserts.  It is
degrading fast and I can't imagine what will happen with 400K, let
alone 400 million.

It is getting slower at a fairly fast clip and I need it to remain
fast.  Does Postgres just not do count(*) with index scans?  Is that
my problem?

I'm still running the exact same query.  Here are the indexes I have tried
CREATE INDEX idx_score_score
  ON score
  USING btree
  (score);

CREATE INDEX idx_score_ldbscore
  ON score
  USING btree
  (leaderboardid, score);

CREATE INDEX idx_score_ldbactive
  ON score
  USING btree
  (leaderboardid, active);

CREATE INDEX idx_score_ldbactivescore
  ON score
  USING btree
  (leaderboardid, active, score);

CREATE INDEX idx_score_scoreactiveldb
  ON score
  USING btree
  (score, active, leaderboardid);

Yet still I run
explain analyze
select count(*)
from score
where leaderboardid=35 and active and score <= 6841

and get
"Aggregate  (cost=2641.29..2641.30 rows=1 width=0) (actual
time=134.826..134.826 rows=1 loops=1)"
"  ->  Seq Scan on score  (cost=0.00..2536.44 rows=41938 width=0)
(actual time=0.011..126.250 rows=40918 loops=1)"
"        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
"Total runtime: 48.891 ms"

Re: Improve Query

From: Grzegorz Jaśkiewicz
You have to VACUUM ANALYZE after you've created the index, AFAIK.
And no, count(*) still has to visit the rows; the index alone can't
answer it.
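
i.e.:

VACUUM ANALYZE score;

You can also check whether the planner is able to use the index at all
by disabling seq scans for one test (a diagnostic only, don't leave it
off):

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT count(*)
FROM score
WHERE leaderboardid = 35 AND active AND score <= 6841;
SET enable_seqscan = on;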

Re: Improve Query

From: Zach Calvert
Still getting a seq scan after doing vacuum analyze.  Any other ideas?


Re: Improve Query

From: Heikki Linnakangas
Zach Calvert wrote:
> Still getting a seq scan after doing vacuum analyze.  Any other ideas?

Try CLUSTERing the table on the (leaderboardid, active, score) index.
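
On 8.3 or later, using the combined index you created earlier, that
would be something like:

CLUSTER score USING idx_score_ldbactivescore;
ANALYZE score;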

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com