Re: SELECT LIMIT 1 VIEW Performance Issue - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: SELECT LIMIT 1 VIEW Performance Issue
Date
Msg-id 4334AE89.8060806@paradise.net.nz
Whole thread Raw
In response to Re: SELECT LIMIT 1 VIEW Performance Issue  (K C Lau <kclau60@netvigator.com>)
Responses Re: SELECT LIMIT 1 VIEW Performance Issue  (K C Lau <kclau60@netvigator.com>)
List pgsql-performance
K C Lau wrote:
> Thank you all for your suggestions. I' tried, with some variations too,
> but still no success. The times given are the best of a few repeated
> tries on an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.
>

A small denormalization, where you mark the row with the latest atdate
for each playerid may get you the performance you want.

e.g: (8.1beta1)

ALTER TABLE player ADD islastatdate boolean;

UPDATE player SET islastatdate = true where (playerid,atdate) IN
(SELECT playerid, atdate FROM vcurplayer);

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE islastatdate;

CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate)
WHERE islastatdate;

ANALYZE player;

Generating some test data produced:

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='22220';

                                                            QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------
  Index Scan using player_id_lastatdate on player a  (cost=0.00..4.33
rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1)
    Index Cond: ((playerid = '22220'::text) AND (lastatdate = true))
    Filter: lastatdate
  Total runtime: 0.272 ms
(4 rows)

Whereas with the original view definition:

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE a.atdate =
( SELECT max(b.atdate) FROM player b
   WHERE a.playerid = b.playerid);

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='22220';

     QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using player_id_date on player a  (cost=0.00..7399.23
rows=11 width=13) (actual time=121.738..121.745 rows=1 loops=1)
    Index Cond: (playerid = '22220'::text)
    Filter: (atdate = (subplan))
    SubPlan
      ->  Result  (cost=1.72..1.73 rows=1 width=0) (actual
time=0.044..0.047 rows=1 loops=2000)
            InitPlan
              ->  Limit  (cost=0.00..1.72 rows=1 width=4) (actual
time=0.028..0.031 rows=1 loops=2000)
                    ->  Index Scan Backward using player_id_date on
player b  (cost=0.00..3787.94 rows=2198 width=4) (actual
time=0.019..0.019 rows=1 loops=2000)
                          Index Cond: ($0 = playerid)
                          Filter: (atdate IS NOT NULL)
  Total runtime: 121.916 ms
(11 rows)

Note that my generated data has too many rows for each playerid, but the
  difference in performance should illustrate the idea.

Cheers

Mark

pgsql-performance by date:

Previous
From: "Cristian Prieto"
Date:
Subject: Index use in BETWEEN statement...
Next
From: Tom Lane
Date:
Subject: Re: optimization downgrade perfomance?