Thread: Optimizing select count query which often takes over 10 seconds

Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
Hello,

for a PostgreSQL 8.4.13 database + pgbouncer
on a 32 GB RAM machine with CentOS 6.3 / 64 bit
I use the following settings:

max_connections = 100
shared_buffers = 4096MB
work_mem = 32MB
checkpoint_segments = 32   # to shut up nightly pg_dump
escape_string_warning = off # to shut up Drupal 7.19 warnings
log_min_duration_statement = 10000

And the latter setting always
reports just one statement
(besides Drupal's, which I can't fix):

LOG:  duration: 12590.394 ms  statement:
        select count(id) from (
            select id,
                   row_number() over(partition by yw order by money
desc) as ranking
            from pref_money
        ) x
        where x.ranking = 1 and id='OK471018960997'

This command comes from a PHP script
of mine which displays "medals" on
a player's profile page - meaning how many
times she won a weekly tournament:

# \d pref_money
                        Table "public.pref_money"
 Column |         Type          |                Modifiers
--------+-----------------------+-----------------------------------------
 id     | character varying(32) |
 money  | integer               | not null
 yw     | character(7)          | default to_char(now(), 'IYYY-IW'::text)
Indexes:
    "pref_money_yw_index" btree (yw)
Foreign-key constraints:
    "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
DELETE CASCADE

Does anybody have an idea
how I could optimize it, or should
I introduce an hourly job and a "medals"
column (that would make my player
stats less "live")?

Here is the EXPLAIN output
(which I hardly understand) for
a player with 9 weekly medals:

# explain analyze select count(id) from (
            select id,
                   row_number() over(partition by yw order by money
desc) as ranking
            from pref_money
        ) x
        where x.ranking = 1 and id='OK452217781481';
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=63694.22..63694.23 rows=1 width=82) (actual
time=4520.719..4520.719 rows=1 loops=1)
   ->  Subquery Scan x  (cost=48519.10..63694.19 rows=11 width=82)
(actual time=4470.620..4520.710 rows=6 loops=1)
         Filter: ((x.ranking = 1) AND ((x.id)::text = 'OK452217781481'::text))
         ->  WindowAgg  (cost=48519.10..57190.58 rows=433574 width=26)
(actual time=4293.315..4491.652 rows=429803 loops=1)
               ->  Sort  (cost=48519.10..49603.03 rows=433574
width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
                     Sort Key: pref_money.yw, pref_money.money
                     Sort Method:  external sort  Disk: 15856kB
                     ->  Seq Scan on pref_money  (cost=0.00..7923.74
rows=433574 width=26) (actual time=0.006..41.907 rows=429803 loops=1)
 Total runtime: 4525.662 ms
(9 rows)

Thank you for any hints
Alex


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Optimizing select count query which often takes over 10 seconds

From
Moshe Jacobson
Date:
On Thu, Jan 24, 2013 at 4:57 AM, Alexander Farber <alexander.farber@gmail.com> wrote:
>         select count(id) from (
>             select id,
>                    row_number() over(partition by yw order by money
> desc) as ranking
>             from pref_money
>         ) x
>         where x.ranking = 1 and id='OK471018960997'
>
> Does anybody please have an idea
> how could I optimize it or should
> I introduce a hourly job and a "medals"
> column (that would make my players
> stats less "live")?

One idea is to have a new ranking column to cache every player's ranking for every weekly tournament. However, instead of updating it hourly with a cron job, you could have a trigger on the table, such that when any row is updated/inserted, you recalculate the rankings for only those rows having the same "yw" value.

Then, you might want to create an index on the ranking column as well as the yw column, which you already have indexed.
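A minimal sketch of such a trigger, assuming a hypothetical "ranking" column has been added to pref_money and that (id, yw) identifies a row uniquely (neither assumption is in the posted schema):

```sql
-- Hypothetical sketch: assumes pref_money has a new "ranking" column
-- and that (id, yw) is unique.
CREATE OR REPLACE FUNCTION refresh_week_rankings() RETURNS trigger AS $$
BEGIN
    -- Guard against recursion: the UPDATE below changes only "ranking",
    -- so when the trigger re-fires for those rows it returns here.
    IF TG_OP = 'UPDATE'
       AND NEW.money = OLD.money AND NEW.yw = OLD.yw THEN
        RETURN NEW;
    END IF;

    -- Recompute rankings only for the affected week.
    UPDATE pref_money p
    SET ranking = r.rn
    FROM (SELECT id, yw,
                 row_number() OVER (ORDER BY money DESC) AS rn
          FROM pref_money
          WHERE yw = NEW.yw) r
    WHERE p.id = r.id AND p.yw = r.yw;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pref_money_ranking_trg
    AFTER INSERT OR UPDATE ON pref_money
    FOR EACH ROW EXECUTE PROCEDURE refresh_week_rankings();
```

Note that this rewrites every row of the affected week on each change, so on a busy week it could become its own hot spot.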

Moshe
 
--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | 
www.neadwerx.com

Re: Optimizing select count query which often takes over 10 seconds

From
Alban Hertroys
Date:
On 24 January 2013 10:57, Alexander Farber <alexander.farber@gmail.com> wrote:
> # explain analyze select count(id) from (
>             select id,
>                    row_number() over(partition by yw order by money
> desc) as ranking
>             from pref_money
>         ) x
>         where x.ranking = 1 and id='OK452217781481';
>                                                                 QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=63694.22..63694.23 rows=1 width=82) (actual
> time=4520.719..4520.719 rows=1 loops=1)
>    ->  Subquery Scan x  (cost=48519.10..63694.19 rows=11 width=82)
> (actual time=4470.620..4520.710 rows=6 loops=1)
>          Filter: ((x.ranking = 1) AND ((x.id)::text = 'OK452217781481'::text))
>          ->  WindowAgg  (cost=48519.10..57190.58 rows=433574 width=26)
> (actual time=4293.315..4491.652 rows=429803 loops=1)
>                ->  Sort  (cost=48519.10..49603.03 rows=433574
> width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
>                      Sort Key: pref_money.yw, pref_money.money
>                      Sort Method:  external sort  Disk: 15856kB

It's sorting on disk. That's not going to be fast. Indeed, it's taking nearly all the time the query takes (4.4s for this step out of 4.5s for the query).
 
>                      ->  Seq Scan on pref_money  (cost=0.00..7923.74
> rows=433574 width=26) (actual time=0.006..41.907 rows=429803 loops=1)

And then it's doing a sequential scan to sort the data. I suspect that's because it's sorting on disk. Then again, this only takes 42ms, just once (loops=1), so perhaps a seqscan is indeed the fastest approach here (actually, wow, it scans 10000 records/ms - rows are 26 bytes wide, so that's 260MB/s! I'm doubting my math here...).

>  Total runtime: 4525.662 ms
> (9 rows)
>
> Thank you for any hints
> Alex





--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
Hello -

On Thu, Jan 24, 2013 at 3:39 PM, Alban Hertroys <haramrae@gmail.com> wrote:
> On 24 January 2013 10:57, Alexander Farber <alexander.farber@gmail.com>
> wrote:
>>
>> # explain analyze select count(id) from (
>>             select id,
>>                    row_number() over(partition by yw order by money
>> desc) as ranking
>>             from pref_money
>>         ) x
>>         where x.ranking = 1 and id='OK452217781481';
>>                                                                 QUERY PLAN
>>
>>
-------------------------------------------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=63694.22..63694.23 rows=1 width=82) (actual
>> time=4520.719..4520.719 rows=1 loops=1)
>>    ->  Subquery Scan x  (cost=48519.10..63694.19 rows=11 width=82)
>> (actual time=4470.620..4520.710 rows=6 loops=1)
>>          Filter: ((x.ranking = 1) AND ((x.id)::text =
>> 'OK452217781481'::text))
>>          ->  WindowAgg  (cost=48519.10..57190.58 rows=433574 width=26)
>> (actual time=4293.315..4491.652 rows=429803 loops=1)
>>                ->  Sort  (cost=48519.10..49603.03 rows=433574
>> width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
>>                      Sort Key: pref_money.yw, pref_money.money
>>                      Sort Method:  external sort  Disk: 15856kB
>
>
> It's sorting on disk. That's not going to be fast. Indeed, it's taking
> nearly all the time the query takes (4.4s for this step out of 4.5s for the
> query).

I've noticed that too, but what
does "sorting on disk" mean?

I have a lot of RAM (32 GB);
should I increase work_mem even more?
(it is currently 32 MB)

Regards
Alex




Re: Optimizing select count query which often takes over 10 seconds

From
Alban Hertroys
Date:

>> It's sorting on disk. That's not going to be fast. Indeed, it's taking
>> nearly all the time the query takes (4.4s for this step out of 4.5s for the
>> query).
>
> I've noticed that too, but what
> does "sorting on disk" mean?
>
> I have a lot of RAM (32 GB),
> should I increase work_mem even more?
> (it is currently 32 MB)

You can try increasing the amount of work_mem in your psql session only and see what amount helps. That way you don't need to permanently increase it for all your queries.
I'd start with 48 MB and increase in increments of 16 MB (as that's the size the sort operation claims to require on disk).
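The session-level experiment could look like this (48 MB is just the starting point suggested above, not a recommendation):

```sql
-- Affects only the current session; postgresql.conf stays untouched.
SET work_mem = '48MB';

EXPLAIN ANALYZE
SELECT count(id) FROM (
    SELECT id,
           row_number() OVER (PARTITION BY yw ORDER BY money DESC) AS ranking
    FROM pref_money
) x
WHERE x.ranking = 1 AND id = 'OK452217781481';

-- Back to the configured default.
RESET work_mem;
```

If the plan's Sort node switches from "external sort  Disk" to "quicksort  Memory", the value is large enough for this query.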

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: Optimizing select count query which often takes over 10 seconds

From
Guillaume Lelarge
Date:
On Thu, 2013-01-24 at 15:45 +0100, Alexander Farber wrote:
> Hello -
>
> On Thu, Jan 24, 2013 at 3:39 PM, Alban Hertroys <haramrae@gmail.com> wrote:
> > On 24 January 2013 10:57, Alexander Farber <alexander.farber@gmail.com>
> > wrote:
> >>
> >> # explain analyze select count(id) from (
> >>             select id,
> >>                    row_number() over(partition by yw order by money
> >> desc) as ranking
> >>             from pref_money
> >>         ) x
> >>         where x.ranking = 1 and id='OK452217781481';
> >>                                                                 QUERY PLAN
> >>
> >>
-------------------------------------------------------------------------------------------------------------------------------------------
> >>  Aggregate  (cost=63694.22..63694.23 rows=1 width=82) (actual
> >> time=4520.719..4520.719 rows=1 loops=1)
> >>    ->  Subquery Scan x  (cost=48519.10..63694.19 rows=11 width=82)
> >> (actual time=4470.620..4520.710 rows=6 loops=1)
> >>          Filter: ((x.ranking = 1) AND ((x.id)::text =
> >> 'OK452217781481'::text))
> >>          ->  WindowAgg  (cost=48519.10..57190.58 rows=433574 width=26)
> >> (actual time=4293.315..4491.652 rows=429803 loops=1)
> >>                ->  Sort  (cost=48519.10..49603.03 rows=433574
> >> width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
> >>                      Sort Key: pref_money.yw, pref_money.money
> >>                      Sort Method:  external sort  Disk: 15856kB
> >
> >
> > It's sorting on disk. That's not going to be fast. Indeed, it's taking
> > nearly all the time the query takes (4.4s for this step out of 4.5s for the
> > query).
>
> I've noticed that too, but what
> does "sorting on disk" mean?
>
> I have a lot of RAM (32 GB),
> should I increase work_mem even more?
> (it is currently 32 MB)
>

You'd be better off creating an index on pref_money(yw, money). It could
help you get rid of the seqscan and sort operations.
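In SQL, that is:

```sql
create index pref_money_yw_money_idx on pref_money(yw, money);
```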


--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com





Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
Hi -

On Thu, Jan 24, 2013 at 8:12 PM, Guillaume Lelarge
<guillaume@lelarge.info> wrote:
> You should better create an index on pref_money(yw, money). It could
> help you get rid of the seqscan and sort operations.

I've created an index with

# create index pref_money_money_index on pref_money(money desc);

and posted the new EXPLAIN output here:

http://stackoverflow.com/questions/14498974/optimizing-select-count-query-which-often-takes-over-10-seconds

But it doesn't seem to change much, or
am I too inexperienced to see the change?

Thanks
Alex




Re: Optimizing select count query which often takes over 10 seconds

From
Guillaume Lelarge
Date:
On Fri, 2013-01-25 at 16:13 +0100, Alexander Farber wrote:
> Hi -
>
> On Thu, Jan 24, 2013 at 8:12 PM, Guillaume Lelarge
> <guillaume@lelarge.info> wrote:
> > You should better create an index on pref_money(yw, money). It could
> > help you get rid of the seqscan and sort operations.
>
> I've created an index with
>
> # create index pref_money_money_index on pref_money(money desc);
>
> and posted the new EXPLAIN output here:
>
> http://stackoverflow.com/questions/14498974/optimizing-select-count-query-which-often-takes-over-10-seconds
>
> But it doesn't seem to change much or
> am I too unexperienced to see the change?
>

There's no change because you created an index on money alone, which
doesn't give PostgreSQL a chance to do anything better. What I suggested
before was an index on yw and money together, like this:

create index pref_money_yw_money_idx on pref_money(yw, money);

This should change the plan and, I hope, give better performance.


--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com





Re: Optimizing select count query which often takes over 10 seconds

From
Jeff Janes
Date:
On Thu, Jan 24, 2013 at 1:57 AM, Alexander Farber
<alexander.farber@gmail.com> wrote:
> Hello,
>
> for a PostgreSQL 8.4.13 database + pgbouncer

Using 8.4 is really going to limit your options.

..

>
> LOG:  duration: 12590.394 ms  statement:
>         select count(id) from (
>             select id,
>                    row_number() over(partition by yw order by money
> desc) as ranking
>             from pref_money
>         ) x
>         where x.ranking = 1 and id='OK471018960997'

Since you only care about ranking=1, it might be better to rewrite
that using something like:

where money = (select max(money....)

But, I doubt it.  I don't think even the 9.2 planner has the smarts to
do what you want efficiently.  It might be possible to make it do it
efficiently using a recursive query, once you have the index on
(yw,money).
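One possible expansion of that sketch (my reading, not Jeff's exact query - and note it counts ties for a week's maximum differently than row_number() does):

```sql
SELECT count(*)
FROM pref_money m
WHERE m.id = 'OK471018960997'
  AND m.money = (SELECT max(money)
                 FROM pref_money
                 WHERE yw = m.yw);
```

With an index on (yw, money), the correlated max() can be answered by an index probe per week; an index on pref_money(id) would also be needed to avoid scanning the whole table for the outer filter.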


> This command comes from a PHP-script
> of mine which displays "medals" on
> a player profile page - meaning how many
> times she won a weekly tournament:
..
>
> Does anybody please have an idea
> how could I optimize it or should
> I introduce a hourly job and a "medals"
> column (that would make my players
> stats less "live")?

This sounds like a good idea.  But if the tournament is weekly, why
would the job have to be hourly?  Why do the results of a weekly
tournament need to be 'live'?

Cheers,

Jeff




Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
Hello -

On Fri, Jan 25, 2013 at 7:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 1:57 AM, Alexander Farber
> <alexander.farber@gmail.com> wrote:
>>
>> LOG:  duration: 12590.394 ms  statement:
>>         select count(id) from (
>>             select id,
>>                    row_number() over(partition by yw order by money
>> desc) as ranking
>>             from pref_money
>>         ) x
>>         where x.ranking = 1 and id='OK471018960997'
>
> This sounds like a good idea.  But if the tournament is weekly why
> would the job have to be hourly?  Why do the results of a weekly
> tournament need to be 'live'?

because the medals for the current week
are displayed too.

And when a player enters a top spot,
he should get +1 medal, and the one
he pushed out of the top, -1 medal.

So even hourly isn't really good enough for me...
It should be "live" stats.

Regards
Alex


Re: Optimizing select count query which often takes over 10 seconds

From
Jeff Janes
Date:
On Sun, Jan 27, 2013 at 9:25 AM, Alexander Farber
<alexander.farber@gmail.com> wrote:
> Hello -
>
> On Fri, Jan 25, 2013 at 7:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

>> This sounds like a good idea.  But if the tournament is weekly why
>> would the job have to be hourly?  Why do the results of a weekly
>> tournament need to be 'live'?
>
> because for the current week
> the medals are displayed too.
>
> And when a player enters a top
> then he should get +1 medals and
> the one he pushed from the top -1 medals
>
> So even hourly isn't really good enough for me...
> It should be "live" stats.

Once the week is over, materialize the medals for that week.  Then the
live part of the query only needs to consider the currently live week,
not the entire history.  And in that case, the query should be quite
efficient if you have an index on both week and money.
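A sketch of the resulting profile-page query, assuming a hypothetical "medals" column on pref_users that a weekly job keeps current for finished weeks:

```sql
-- Finished weeks come from the materialized column;
-- only the current week's winner is computed live.
SELECT u.medals
     + (SELECT count(*)
        FROM (SELECT id,
                     row_number() OVER (ORDER BY money DESC) AS rn
              FROM pref_money
              WHERE yw = to_char(now(), 'IYYY-IW')) w
        WHERE w.rn = 1 AND w.id = u.id) AS total_medals
FROM pref_users u
WHERE u.id = 'OK471018960997';
```

The yw filter means only one week's rows ever get sorted, and the existing pref_money_yw_index can find them.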

Cheers,

Jeff


Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
Thanks -

On Sun, Jan 27, 2013 at 8:41 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Once the week is over, materialize the medals for that week.  Then the
> live part of the query only needs to specify the currently live week,
> not the entire history.  And in that case, the query should be quite
> efficient if you have in index on both week and money.

I will try that!


Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
Thank you, I've ended up with this cronjob
(yes, unfortunately I'm storing week numbers as strings):

/* reset and then update medals count */

update pref_users set medals = 0;
UPDATE 223456

update pref_users u
set medals = s.medals
from (
    select id, count(id) medals
    from (
        select id,
            row_number() over(partition by yw order by money desc) as ranking
        from pref_money where yw <> to_char(CURRENT_TIMESTAMP, 'IYYY-IW')
    ) x
    where ranking = 1
    group by id
) s
where u.id = s.id;
UPDATE 65


Re: Optimizing select count query which often takes over 10 seconds

From
Kevin Grittner
Date:
Alexander Farber <alexander.farber@gmail.com> wrote:

> update pref_users set medals = 0;
> UPDATE 223456

You're probably going to like your performance a lot better if you
modify that to:

update pref_users set medals = 0 where medals <> 0;

-Kevin



Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
Hello Kevin,

On Wed, Jan 30, 2013 at 12:09 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Alexander Farber <alexander.farber@gmail.com> wrote:
>
>> update pref_users set medals = 0;
>> UPDATE 223456
>
> You're probably going to like your performance a lot better if you
> modify that to:
>
> update pref_users set medals = 0 where medals <> 0;

is it really so?

I only have 65 users (out of 223456) with medals != 0.

When programming other languages, I never do
if (x != 0) { x = 0; } but just set x = 0 straight away.

Regards
Alex


Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
The cronjob gives me now occasionally:

/* reset and then update medals count */
update pref_users set medals = 0;
psql:/home/afarber/bin/clean-database.sql:63: ERROR:  deadlock detected

DETAIL:  Process 31072 waits for ShareLock on transaction 124735679;
blocked by process 30368.
Process 30368 waits for ShareLock on transaction 124735675; blocked by
process 31072.
HINT:  See server log for query details.
update pref_users u
set medals = s.medals
from (
    select id, count(id) medals
    from (
        select id,
            row_number() over(partition by yw order by money desc) as ranking
        from pref_money where yw <> to_char(CURRENT_TIMESTAMP, 'IYYY-IW')
    ) x
    where ranking = 1
    group by id
) s
where u.id = s.id;

Any ideas please how to workaround?

Thank you
Alex


Re: Optimizing select count query which often takes over 10 seconds

From
Kevin Grittner
Date:
Alexander Farber <alexander.farber@gmail.com> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> Alexander Farber <alexander.farber@gmail.com> wrote:
>>
>>> update pref_users set medals = 0;
>>> UPDATE 223456
>>
>> You're probably going to like your performance a lot better if
>> you modify that to:
>>
>> update pref_users set medals = 0 where medals <> 0;
>
> is it really so?

Yes.

> I only have 65 users (out of 223456) with medals != 0.

That's precisely the point.  You don't want to update all 223456
rows when there are only 65 which need to be changed.

> When programming other languages, I never do
> if (x != 0) { x = 0; } but just set x = 0 straight away.

Well, if updating a row was as cheap as assigning zero to x I
wouldn't suggest a change to your code.  If assigning something to
x involved an expensive function or disk access, you might try to
put an "if" around it.

If you don't want to burden your query with the condition, you
could consider attaching a trigger to any table where you might
assign rows values they already have.  See the
suppress_redundant_updates_trigger() function for details:

http://www.postgresql.org/docs/current/interactive/functions-trigger.html
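Attaching it is a one-liner (the trigger name is arbitrary; the function ships with PostgreSQL 8.4 and later):

```sql
CREATE TRIGGER pref_users_skip_noop_updates
    BEFORE UPDATE ON pref_users
    FOR EACH ROW
    EXECUTE PROCEDURE suppress_redundant_updates_trigger();
```

Updates that would write back the very same values are then skipped entirely, so the unconditional "set medals = 0" stops rewriting all 223456 rows.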

-Kevin



Re: Optimizing select count query which often takes over 10 seconds

From
Kevin Grittner
Date:
Alexander Farber <alexander.farber@gmail.com> wrote:

> The cronjob gives me now occasionally:
>
> /* reset and then update medals count */
> update pref_users set medals = 0;
> psql:/home/afarber/bin/clean-database.sql:63: ERROR:  deadlock detected
> DETAIL:  Process 31072 waits for ShareLock on transaction 124735679; blocked by process 30368.
> Process 30368 waits for ShareLock on transaction 124735675; blocked by process 31072.
> HINT:  See server log for query details.

> Any ideas please how to workaround?

Yeah, try this:

update pref_users set medals = 0 where medals <> 0;

:-)

That should significantly reduce the frequency of deadlocks;
however, IMO any application using a relational database should be
prepared to retry database transactions which fail with a
serialization error, and a deadlock is one form of that.  The
standard SQLSTATE to look for is '40001' and in PostgreSQL you
should also check for '40P01'.

-Kevin



Re: Optimizing select count query which often takes over 10 seconds

From
Jeff Janes
Date:
On Tue, Jan 29, 2013 at 11:41 PM, Alexander Farber
<alexander.farber@gmail.com> wrote:
>
> When programming other languages, I never do
> if (x != 0) { x = 0; } but just set x = 0 straight away.

Most other languages are not transactional and durable.  Databases are
different.

Cheers,

Jeff


Re: Optimizing select count query which often takes over 10 seconds

From
Alexander Farber
Date:
On Wed, Jan 30, 2013 at 2:06 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> update pref_users set medals = 0 where medals <> 0;

Thank you all for your insightful comments

This has cured my cronjob