From Andrew Gierth
Subject Re: Not sure which part of the query needs optimization
Msg-id 87ef9oqnr2.fsf@news-spur.riddles.org.uk
In response to Not sure which part of the query needs optimization  (Alexander Farber <alexander.farber@gmail.com>)
Responses Re: Not sure which part of the query needs optimization  (Alexander Farber <alexander.farber@gmail.com>)
List pgsql-general
>>>>> "Alexander" == Alexander Farber <alexander.farber@gmail.com> writes:

 Alexander> Good afternoon,

 Alexander> for each visitor of my website I generate a JSON list of 30
 Alexander> top players ( https://slova.de/words/top.php ), who played
 Alexander> in the past week, with their average scores and average time
 Alexander> between moves.

 -> Parallel Seq Scan on words_moves m_1
     (cost=0.00..73600.05 rows=8666 width=16)
     (actual time=0.761..130.844 rows=11074 loops=3)
      Filter: (played > (CURRENT_TIMESTAMP - '7 days'::interval))
      Rows Removed by Filter: 492241

This is telling you that an index on words_moves(played) would likely
help.
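
Something along these lines should do it (the index name here is my own
pick; CONCURRENTLY just avoids blocking writes while the index builds):

  CREATE INDEX CONCURRENTLY words_moves_played_idx
      ON words_moves (played);

With that, the planner can pull just the last week's moves from the
index instead of filtering out ~492k rows.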

But the real hotspot in the query is here (reformatted for clarity):

 -> Aggregate  (cost=6097.83..6097.84 rows=1 width=32)
               (actual time=19.401..19.401 rows=1 loops=30)
    -> Bitmap Heap Scan on words_moves
        (cost=33.97..6093.45 rows=1748 width=4)
        (actual time=1.680..18.153 rows=15011 loops=30)
        Recheck Cond: (uid = u.uid)
        Heap Blocks: exact=216312
       -> Bitmap Index Scan on words_moves_uid_idx
           (cost=0.00..33.54 rows=1748 width=0)
           (actual time=0.979..0.979 rows=15011 loops=30)
           Index Cond: (uid = u.uid)

(the explain.depesz.com view points this out with an orange highlight)

This corresponds to this subquery:

  (SELECT ROUND(AVG(score), 1)
     FROM words_moves
    WHERE uid = u.uid) AS avg_score,

The basic issue here is that you're calculating an average over roughly
15k rows per user, for each of the 30 users in the query result, so
about 450k rows in all. You can see from the "Heap Blocks" stat that
this has to touch a lot of data; at an average of 19.4ms per user,
that's ~580ms for 30 users, or about 70% of the total execution time.

The obvious thing to do is to keep a precomputed average score for each
user: either in a separate table that you update whenever words_moves
changes (which you could do with a trigger), or in a materialized view
that you refresh at suitable intervals (with the drawback that the data
will not be immediately up-to-date).
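
A minimal sketch of the materialized-view variant (the view name is my
own; adjust to taste):

  CREATE MATERIALIZED VIEW words_avg_score AS
  SELECT uid, ROUND(AVG(score), 1) AS avg_score
    FROM words_moves
   GROUP BY uid;

  -- REFRESH ... CONCURRENTLY needs a unique index on the matview
  CREATE UNIQUE INDEX ON words_avg_score (uid);

  -- then periodically, e.g. from cron:
  REFRESH MATERIALIZED VIEW CONCURRENTLY words_avg_score;

The avg_score subquery above then becomes a simple join against
words_avg_score.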

Combining these two changes should get you to under 100ms, maybe.

-- 
Andrew (irc:RhodiumToad)

