Re: Query involving views - Mailing list pgsql-performance
From | Laurent Martelli |
---|---|
Subject | Re: Query involving views |
Date | |
Msg-id | 87llj0tmqg.fsf@stan.aopsys.com Whole thread Raw |
In response to | Re: Query involving views (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Laurent Martelli <laurent@aopsys.com> writes: >> Now, if I use the following view to abstract access rights: >> CREATE VIEW userpictures ( >> PictureID,RollID,FrameID,Description,Filename, >> Owner,EntryDate,Date, >> NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates, UserID) AS >> SELECT DISTINCT ON (Permissions.PictureID,UserID) >> Pictures.PictureID,RollID,FrameID,Description,Filename,Owner, >> EntryDate,Date,NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates, >> UserID FROM Permissions JOIN Groupsdef using (GroupID) JOIN >> pictures using (PictureID); >> [ performance sucks ] Tom> Find a way to get rid of the DISTINCT ON. That's essentially Tom> an optimization fence. Worse, the way you are using it here, Tom> it doesn't even give well-defined results, since there's no Tom> ORDER BY constraining which row will be selected out of a set Tom> of duplicates. (I think it may not matter to you, since you Tom> don't really care which groupsdef row is selected, That's true. I do not use columns from groupsdef in the end. Tom> but in general a view constructed like this is broken.) Tom> It might work to do the view as Tom> SELECT ... all that stuff ... FROM pictures p, users u WHERE Tom> EXISTS (SELECT 1 FROM permissions prm, groupsdef g WHERE Tom> p.pictureid = prm.pictureid AND prm.groupid = g.groupid AND Tom> g.userid = u.userid); Tom> I'm not sure offhand about the performance properties of this Tom> either, but it would be worth trying. This one does not yield very good performance. In fact, the best performances I have is when I use a where clause like this one: WHERE PictureID IN (SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID) WHERE groupsdef.UserID=2) But it's not as elegant to write as the initial view using "distinct on". I could create a view like this: CREATE VIEW userpictures (PictureID,UserID) AS SELECT pictureid,userid FROM permissions JOIN groupsdef USING(GroupID) and then do queries like this: SELECT * FROM pictures WHERE PictureID IN (SELECT PictureID FROM userpictures WHERE UserID=2) but it's stillnot as elegant as SELECT * FROM userpictures WHERE UserID=2 I think I'll try a function: CREATE FUNCTION picturesID(int) RETURNS SETOF int AS ' SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID) WHERE groupsdef.UserID=$1 ' LANGUAGE sql; SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2)); Here's something funny: using a function seems gives slihtly better results than inlining the query (I did a dozen of runs and the timings were consistent): SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2)); QUERY PLAN Hash Join (cost=15.50..100.49 rows=200 width=97) (actual time=28.609..46.568 rows=2906 loops=1) Hash Cond: ("outer".pictureid = "inner".picturesid) -> Seq Scan on pictures (cost=0.00..68.33 rows=2933 width=97) (actual time=0.018..2.610 rows=2933 loops=1) -> Hash (cost=15.00..15.00 rows=200 width=4) (actual time=28.467..28.467 rows=0 loops=1) -> HashAggregate (cost=15.00..15.00 rows=200 width=4) (actual time=23.698..26.201 rows=2906 loops=1) -> Function Scan on picturesid (cost=0.00..12.50 rows=1000 width=4) (actual time=16.202..19.952 rows=5076loops=1) Total runtime: 48.601 ms SELECT * FROM pictures WHERE PictureID IN ( SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID) WHERE groupsdef.UserID=2); QUERY PLAN Hash Join (cost=394.93..504.24 rows=2632 width=97) (actual time=35.770..53.574 rows=2906 loops=1) Hash Cond: ("outer".pictureid = "inner".pictureid) -> Seq Scan on pictures (cost=0.00..68.33 rows=2933 width=97) (actual time=0.014..2.543 rows=2933 loops=1) -> Hash (cost=388.35..388.35 rows=2632 width=4) (actual time=35.626..35.626 rows=0 loops=1) -> HashAggregate (cost=388.35..388.35 rows=2632 width=4) (actual time=30.988..33.502 rows=2906 loops=1) -> Merge Join (cost=5.40..376.72 rows=4652 width=4) (actual time=0.247..26.628 rows=5076 loops=1) Merge Cond: ("outer".groupid = "inner".groupid) -> Index Scan using permissions_groupid_key on permissions (cost=0.00..280.77 rows=8305 width=8) (actualtime=0.031..11.629 rows=7633 loops=1) -> Sort (cost=5.40..5.43 rows=12 width=4) (actual time=0.207..1.720 rows=5078 loops=1) Sort Key: groupsdef.groupid -> Seq Scan on groupsdef (cost=0.00..5.19 rows=12 width=4) (actual time=0.030..0.182 rows=11loops=1) Filter: (userid = 2) Total runtime: 54.748 ms Tom> A cruder answer is just to accept that the view may give you Tom> multiple hits, and put the DISTINCT in the top-level query. I thought of that. But it has the drawback that if you use an ORDER BY, you must have the same columns in the DISTINCT. Tom> I think though that in the long run you're going to need to Tom> rethink this representation of permissions. It's nice and Tom> simple but it's not going to scale well. Even your "fast" Tom> query is going to look like a dog once you get to many Tom> thousands of permission entries. Tom> It might work to maintain a derived table (basically a Tom> materialized view) of the form (userid, groupid, pictureid) Tom> signifying that a user can access a picture through membership Tom> in a group. Put a nonunique index on (userid, pictureid) on Tom> it. This could then drive the EXISTS test efficiently. I'll probably do that if perf goes down when the database grows bigger. Thanks for all the advice. Best regards, Laurent -- Laurent Martelli laurent@aopsys.com Java Aspect Components http://www.aopsys.com/ http://jac.objectweb.org
pgsql-performance by date: