Thread: CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play'

CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play'

From
Alexander Farber
Date:
Good evening,

in PostgreSQL 11 I have a table holding player moves (could be: 'play', 'swap', 'skip', ...) in a word game:

# \d words_moves;
                                      Table "public.words_moves"
 Column  |           Type           | Collation | Nullable |                 Default                  
---------+--------------------------+-----------+----------+------------------------------------------
 mid     | bigint                   |           | not null | nextval('words_moves_mid_seq'::regclass)
 action  | text                     |           | not null |
 gid     | integer                  |           | not null |
 uid     | integer                  |           | not null |
 played  | timestamp with time zone |           | not null |
 tiles   | jsonb                    |           |          |
 score   | integer                  |           |          |
 letters | text                     |           |          |
 hand    | text                     |           |          |
 puzzle  | boolean                  |           | not null | false
Indexes:
    "words_moves_pkey" PRIMARY KEY, btree (mid)
    "words_moves_gid_played_idx" btree (gid, played DESC)
    "words_moves_uid_action_played_idx" btree (uid, action, played)
    "words_moves_uid_idx" btree (uid)
Check constraints:
    "words_moves_score_check" CHECK (score >= 0)
Foreign-key constraints:
    "words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
    "words_moves_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE
Referenced by:
    TABLE "words_scores" CONSTRAINT "words_scores_mid_fkey" FOREIGN KEY (mid) REFERENCES words_moves(mid) ON DELETE CASCADE

When I search for "interesting moves" with score higher than 90 or all 7 tiles played, then the query takes a bit longer:

EXPLAIN ANALYZE 
SELECT 
TO_CHAR(played, 'Mon YYYY'), 
COUNT(NULLIF(puzzle, FALSE)) OVER (PARTITION BY TO_CHAR(played, 'Mon YYYY')),
puzzle, 
mid, 
 MD5(mid || 'cookie'), 
 gid, 
score
FROM words_moves 
 WHERE action = 'play'  
AND LENGTH(hand) = 7 
AND (LENGTH(letters) = 7 OR score > 90) 
AND played > CURRENT_TIMESTAMP - interval '1 year' 
AND played < CURRENT_TIMESTAMP - interval '3 day'  
 ORDER BY played DESC;
                                                                                                                   QUERY PLAN                                                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=168533.19..168533.30 rows=44 width=97) (actual time=2126.433..2126.541 rows=1036 loops=1)
   Sort Key: played DESC
   Sort Method: quicksort  Memory: 194kB
   ->  WindowAgg  (cost=168530.56..168531.99 rows=44 width=97) (actual time=2122.991..2125.593 rows=1036 loops=1)
         ->  Sort  (cost=168530.56..168530.67 rows=44 width=57) (actual time=2122.934..2123.049 rows=1036 loops=1)
               Sort Key: (to_char(played, 'Mon YYYY'::text))
               Sort Method: quicksort  Memory: 129kB
               ->  Gather  (cost=1000.00..168529.36 rows=44 width=57) (actual time=287.461..2121.445 rows=1036 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Seq Scan on words_moves  (cost=0.00..167524.96 rows=18 width=57) (actual time=207.234..2115.686 rows=345 loops=3)
                           Filter: ((action = 'play'::text) AND (length(hand) = 7) AND ((length(letters) = 7) OR (score > 90)) AND (played > (CURRENT_TIMESTAMP - '1 year'::interval)) AND (played < (CURRENT_TIMESTAMP - '3 days'::interval)))
                           Rows Removed by Filter: 1088073
 Planning Time: 0.383 ms
 Execution Time: 2126.922 ms
(15 rows)


So I add an index with -

CREATE INDEX ON words_moves(length(letters), score, action); 

That is because I have several such queries looking for interesting 'play' moves with high score or all 7 tiles played.

This helps a bit:

                                                                                  QUERY PLAN                                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=126487.34..126487.45 rows=44 width=97) (actual time=219.649..219.744 rows=1036 loops=1)
   Sort Key: played DESC
   Sort Method: quicksort  Memory: 194kB
   ->  WindowAgg  (cost=126484.71..126486.14 rows=44 width=97) (actual time=216.180..218.680 rows=1036 loops=1)
         ->  Sort  (cost=126484.71..126484.82 rows=44 width=57) (actual time=216.125..216.226 rows=1036 loops=1)
               Sort Key: (to_char(played, 'Mon YYYY'::text))
               Sort Method: quicksort  Memory: 129kB
               ->  Bitmap Heap Scan on words_moves  (cost=83892.69..126483.50 rows=44 width=57) (actual time=209.447..214.879 rows=1036 loops=1)
                     Recheck Cond: (((length(letters) = 7) AND (action = 'play'::text)) OR ((score > 90) AND (action = 'play'::text)))
                     Filter: ((length(hand) = 7) AND (played > (CURRENT_TIMESTAMP - '1 year'::interval)) AND (played < (CURRENT_TIMESTAMP - '3 days'::interval)))
                     Rows Removed by Filter: 468
                     Heap Blocks: exact=1495
                     ->  BitmapOr  (cost=83892.69..83892.69 rows=15280 width=0) (actual time=209.083..209.083 rows=0 loops=1)
                           ->  Bitmap Index Scan on words_moves_length_score_action_idx  (cost=0.00..419.69 rows=14838 width=0) (actual time=2.040..2.040 rows=912 loops=1)
                                 Index Cond: ((length(letters) = 7) AND (action = 'play'::text))
                           ->  Bitmap Index Scan on words_moves_length_score_action_idx  (cost=0.00..83472.98 rows=442 width=0) (actual time=207.038..207.038 rows=608 loops=1)
                                 Index Cond: ((score > 90) AND (action = 'play'::text))
 Planning Time: 1.102 ms
 Execution Time: 220.676 ms
(19 rows)

Here the resulting link: https://explain.depesz.com/s/Pwbt

Then I drop that index and create another one:

CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play';

And it seems to have a better performance on the query:

                                                                              QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=97687.61..97687.72 rows=44 width=97) (actual time=175.832..175.927 rows=1036 loops=1)
   Sort Key: played DESC
   Sort Method: quicksort  Memory: 194kB
   ->  WindowAgg  (cost=97684.98..97686.41 rows=44 width=97) (actual time=172.443..174.937 rows=1036 loops=1)
         ->  Sort  (cost=97684.98..97685.09 rows=44 width=57) (actual time=172.390..172.490 rows=1036 loops=1)
               Sort Key: (to_char(played, 'Mon YYYY'::text))
               Sort Method: quicksort  Memory: 129kB
               ->  Bitmap Heap Scan on words_moves  (cost=55092.96..97683.78 rows=44 width=57) (actual time=165.420..171.164 rows=1036 loops=1)
                     Recheck Cond: (((length(letters) = 7) AND (action = 'play'::text)) OR ((score > 90) AND (action = 'play'::text)))
                     Filter: ((length(hand) = 7) AND (played > (CURRENT_TIMESTAMP - '1 year'::interval)) AND (played < (CURRENT_TIMESTAMP - '3 days'::interval)))
                     Rows Removed by Filter: 468
                     Heap Blocks: exact=1495
                     ->  BitmapOr  (cost=55092.96..55092.96 rows=15280 width=0) (actual time=165.036..165.036 rows=0 loops=1)
                           ->  Bitmap Index Scan on words_moves_length_score_idx  (cost=0.00..275.71 rows=14838 width=0) (actual time=0.620..0.620 rows=912 loops=1)
                                 Index Cond: (length(letters) = 7)
                           ->  Bitmap Index Scan on words_moves_length_score_idx  (cost=0.00..54817.23 rows=442 width=0) (actual time=164.413..164.413 rows=608 loops=1)
                                 Index Cond: (score > 90)
 Planning Time: 0.948 ms
 Execution Time: 177.604 ms
(19 rows)

Here the resulting link: https://explain.depesz.com/s/pmCw

I still wonder, what am I missing, what could be improved there?

Does anybody please have a hint?

Thanks
Alex

Re: CREATE INDEX ON words_moves(length(letters), score) WHERE action= 'play'

From
"Peter J. Holzer"
Date:
On 2019-12-07 16:20:44 +0100, Alexander Farber wrote:
> in PostgreSQL 11 I have a table holding player moves (could be: 'play', 'swap',
> 'skip', ...) in a word game:
>
> # \d words_moves;
>                                       Table "public.words_moves"
>  Column  |           Type           | Collation | Nullable |                
> Default                  
> ---------+--------------------------+-----------+----------+------------------------------------------
>  mid     | bigint                   |           | not null | nextval
> ('words_moves_mid_seq'::regclass)
>  action  | text                     |           | not null |
>  gid     | integer                  |           | not null |
>  uid     | integer                  |           | not null |
>  played  | timestamp with time zone |           | not null |
>  tiles   | jsonb                    |           |          |
>  score   | integer                  |           |          |
>  letters | text                     |           |          |
>  hand    | text                     |           |          |
>  puzzle  | boolean                  |           | not null | false
[...]
> When I search for "interesting moves" with score higher than 90 or all 7 tiles
> played, then the query takes a bit longer:
>
> EXPLAIN ANALYZE 
> SELECT 
[...]
> FROM words_moves 
>  WHERE action = 'play'  
> AND LENGTH(hand) = 7 
> AND (LENGTH(letters) = 7 OR score > 90) 
> AND played > CURRENT_TIMESTAMP - interval '1 year' 
> AND played < CURRENT_TIMESTAMP - interval '3 day'  
>  ORDER BY played DESC;
>                                                                                
[...]
> Then I drop that index and create another one:
>
> CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play';
>
> And it seems to have a better performance on the query:
>
>                                                                              
> QUERY PLAN                                                                    
>            
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=97687.61..97687.72 rows=44 width=97) (actual time=175.832..175.927
> rows=1036 loops=1)
>    Sort Key: played DESC
>    Sort Method: quicksort  Memory: 194kB
>    ->  WindowAgg  (cost=97684.98..97686.41 rows=44 width=97) (actual time= > 172.443..174.937 rows=1036 loops=1)
>          ->  Sort  (cost=97684.98..97685.09 rows=44 width=57) (actual time= > 172.390..172.490 rows=1036 loops=1)
>                Sort Key: (to_char(played, 'Mon YYYY'::text))
>                Sort Method: quicksort  Memory: 129kB
>                ->  Bitmap Heap Scan on words_moves  (cost=55092.96..97683.78 > rows=44 width=57) (actual
time=165.420..171.164rows=1036 loops=1) 
>                      Recheck Cond: (((length(letters) = 7) AND (action = > 'play'::text)) OR ((score > 90) AND
(action= 'play'::text))) 
>                      Filter: ((length(hand) = 7) AND (played > > (CURRENT_TIMESTAMP - '1 year'::interval)) AND
(played< (CURRENT_TIMESTAMP - '3 > days'::interval))) 
>                      Rows Removed by Filter: 468
>                      Heap Blocks: exact=1495
>                      ->  BitmapOr  (cost=55092.96..55092.96 rows=15280 width=0) > (actual time=165.036..165.036
rows=0loops=1) 
>                            ->  Bitmap Index Scan on > words_moves_length_score_idx  (cost=0.00..275.71 rows=14838
width=0)(actual > time=0.620..0.620 rows=912 loops=1) 
>                                  Index Cond: (length(letters) = 7)
>                            ->  Bitmap Index Scan on > words_moves_length_score_idx  (cost=0.00..54817.23 rows=442
width=0)(actual > time=164.413..164.413 rows=608 loops=1) 
>                                  Index Cond: (score > 90)
>  Planning Time: 0.948 ms
>  Execution Time: 177.604 ms
> (19 rows)
>
> Here the resulting link: https://explain.depesz.com/s/pmCw
>
> I still wonder, what am I missing, what could be improved there?

Several ideas:

 1. You are only interested in moves from the last year or so. How much
    history do have stored in your table? If it's much more than one
    year, then an index on played will help. However, it looks like that
    filter removes at most 468 of ~ 1500 rows, so that doesn't seem to
    be case (yet).

 2. You spend now most of the time scanning words_moves_length_score_idx
    for scores > 90. This is basically a full index scan since that
    index is ordered by length(letters). I'm surprised that PostgreSQL
    can even do that :-). This is a separate index scan than the one for
    length(letters) = 7, so separating the indexes should be no worse
    and probably a lot better. Create an index on score (possibly
    conditional on action = 'play').

 3. A lot of the conditions is fixed. So you might want to move them into
    the condition of a partial index:

        create index on words_moves(played)
            where action = 'play' and LENGTH(hand) = 7 and (LENGTH(letters) = 7 OR score > 90);

    Then the planner should recognize that it can use this index and the
    index contains only interesting moves - no logical operations needed
    at query time at all, only an index scan to find recent moves.

    Be warned though that creating such ultra-specific indexes comes at
    a cost: You may end up with a lot of them if you have many different
    queries and maintaining them may make inserts noticably slower.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment