Re: SELECT LIMIT 1 VIEW Performance Issue - Mailing list pgsql-performance
From | K C Lau |
---|---|
Subject | Re: SELECT LIMIT 1 VIEW Performance Issue |
Date | |
Msg-id | 6.2.1.2.0.20050924105728.05388920@localhost Whole thread Raw |
In response to | Re: SELECT LIMIT 1 VIEW Performance Issue (Mark Kirkwood <markir@paradise.net.nz>) |
Responses |
Re: SELECT LIMIT 1 VIEW Performance Issue
|
List | pgsql-performance |
Dear Mark, Thank you. That seems like a more manageable alternative if nothing else works out. It should cover many of the OLTP update transactions. But it does mean quite a bit of programming changes and adding another index on all such tables, and it would not cover those cases when we need to get the latest record before a certain time, for example. I'm wondering if this performance issue is common enough for other users to merit a fix in pg, especially as it seems that with MVCC, each of the data records need to be accessed in addition to scanning the index. Best regards, KC. At 09:40 05/09/24, Mark Kirkwood wrote: >A small denormalization, where you mark the row with the latest atdate for >each playerid may get you the performance you want. > >e.g: (8.1beta1) > >ALTER TABLE player ADD islastatdate boolean; > >UPDATE player SET islastatdate = true where (playerid,atdate) IN >(SELECT playerid, atdate FROM vcurplayer); > >CREATE OR REPLACE VIEW vcurplayer AS >SELECT * FROM player a >WHERE islastatdate; > >CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate) >WHERE islastatdate; > >ANALYZE player; > >Generating some test data produced: > >EXPLAIN ANALYZE >SELECT playerid,atdate >FROM vcurplayer >WHERE playerid='22220'; > > QUERY PLAN >-------------------------------------------------------------------------------------------------------------------------------- > Index Scan using player_id_lastatdate on player a (cost=0.00..4.33 > rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1) > Index Cond: ((playerid = '22220'::text) AND (lastatdate = true)) > Filter: lastatdate > Total runtime: 0.272 ms >(4 rows) > >Whereas with the original view definition: > >CREATE OR REPLACE VIEW vcurplayer AS >SELECT * FROM player a >WHERE a.atdate = >( SELECT max(b.atdate) FROM player b > WHERE a.playerid = b.playerid); > >EXPLAIN ANALYZE >SELECT playerid,atdate >FROM vcurplayer >WHERE playerid='22220'; > QUERY PLAN >----------------------------------------------------------------------------------------------------------------------------------------------------------------- > Index Scan using player_id_date on player a (cost=0.00..7399.23 rows=11 > width=13) (actual time=121.738..121.745 rows=1 loops=1) > Index Cond: (playerid = '22220'::text) > Filter: (atdate = (subplan)) > SubPlan > -> Result (cost=1.72..1.73 rows=1 width=0) (actual > time=0.044..0.047 rows=1 loops=2000) > InitPlan > -> Limit (cost=0.00..1.72 rows=1 width=4) (actual > time=0.028..0.031 rows=1 loops=2000) > -> Index Scan Backward using player_id_date on player > b (cost=0.00..3787.94 rows=2198 width=4) (actual time=0.019..0.019 > rows=1 loops=2000) > Index Cond: ($0 = playerid) > Filter: (atdate IS NOT NULL) > Total runtime: 121.916 ms >(11 rows) > >Note that my generated data has too many rows for each playerid, but >the difference in performance should illustrate the idea. > >Cheers > >Mark
pgsql-performance by date: