Re: Query involving views - Mailing list pgsql-performance

From Laurent Martelli
Subject Re: Query involving views
Date
Msg-id 87llj0tmqg.fsf@stan.aopsys.com
Whole thread Raw
In response to Re: Query involving views  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

  Tom> Laurent Martelli <laurent@aopsys.com> writes:
  >> Now, if I use the following view to abstract access rights:

  >> CREATE VIEW userpictures (
  >> PictureID,RollID,FrameID,Description,Filename,
  >> Owner,EntryDate,Date,
  >> NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates, UserID) AS
  >> SELECT DISTINCT ON (Permissions.PictureID,UserID)
  >> Pictures.PictureID,RollID,FrameID,Description,Filename,Owner,
  >> EntryDate,Date,NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates,
  >> UserID FROM Permissions JOIN Groupsdef using (GroupID) JOIN
  >> pictures using (PictureID);

  >> [ performance sucks ]

  Tom> Find a way to get rid of the DISTINCT ON.  That's essentially
  Tom> an optimization fence.  Worse, the way you are using it here,
  Tom> it doesn't even give well-defined results, since there's no
  Tom> ORDER BY constraining which row will be selected out of a set
  Tom> of duplicates.  (I think it may not matter to you, since you
  Tom> don't really care which groupsdef row is selected,

That's true. I do not use columns from groupsdef in the end.

  Tom> but in general a view constructed like this is broken.)

  Tom> It might work to do the view as

  Tom> SELECT ... all that stuff ...  FROM pictures p, users u WHERE
  Tom> EXISTS (SELECT 1 FROM permissions prm, groupsdef g WHERE
  Tom> p.pictureid = prm.pictureid AND prm.groupid = g.groupid AND
  Tom> g.userid = u.userid);

  Tom> I'm not sure offhand about the performance properties of this
  Tom> either, but it would be worth trying.

This one does not yield very good performance. In fact, the best
performances I have is when I use a where clause like this one:

    WHERE PictureID IN
       (SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
              WHERE groupsdef.UserID=2)

But it's not as elegant to write as the initial view using "distinct
on". I could create a view like this:

    CREATE VIEW userpictures (PictureID,UserID)
        AS SELECT pictureid,userid
            FROM permissions JOIN groupsdef USING(GroupID)

and then do queries like this:

    SELECT * FROM pictures
        WHERE PictureID IN (SELECT PictureID FROM userpictures WHERE UserID=2)

but it's stillnot as elegant as

    SELECT * FROM userpictures WHERE UserID=2

I think I'll try a function:

CREATE FUNCTION picturesID(int) RETURNS SETOF int AS '
    SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
    WHERE groupsdef.UserID=$1
' LANGUAGE sql;

SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2));

Here's something funny: using a function seems gives slihtly better results
than inlining the query (I did a dozen of runs and the timings were consistent):

SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2));
QUERY PLAN
 Hash Join  (cost=15.50..100.49 rows=200 width=97) (actual time=28.609..46.568 rows=2906 loops=1)
   Hash Cond: ("outer".pictureid = "inner".picturesid)
   ->  Seq Scan on pictures  (cost=0.00..68.33 rows=2933 width=97) (actual time=0.018..2.610 rows=2933 loops=1)
   ->  Hash  (cost=15.00..15.00 rows=200 width=4) (actual time=28.467..28.467 rows=0 loops=1)
         ->  HashAggregate  (cost=15.00..15.00 rows=200 width=4) (actual time=23.698..26.201 rows=2906 loops=1)
               ->  Function Scan on picturesid  (cost=0.00..12.50 rows=1000 width=4) (actual time=16.202..19.952
rows=5076loops=1) 
 Total runtime: 48.601 ms

SELECT * FROM pictures WHERE PictureID IN (
    SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
        WHERE groupsdef.UserID=2);
QUERY PLAN

 Hash Join  (cost=394.93..504.24 rows=2632 width=97) (actual time=35.770..53.574 rows=2906 loops=1)
   Hash Cond: ("outer".pictureid = "inner".pictureid)
   ->  Seq Scan on pictures  (cost=0.00..68.33 rows=2933 width=97) (actual time=0.014..2.543 rows=2933 loops=1)
   ->  Hash  (cost=388.35..388.35 rows=2632 width=4) (actual time=35.626..35.626 rows=0 loops=1)
         ->  HashAggregate  (cost=388.35..388.35 rows=2632 width=4) (actual time=30.988..33.502 rows=2906 loops=1)
               ->  Merge Join  (cost=5.40..376.72 rows=4652 width=4) (actual time=0.247..26.628 rows=5076 loops=1)
                     Merge Cond: ("outer".groupid = "inner".groupid)
                     ->  Index Scan using permissions_groupid_key on permissions  (cost=0.00..280.77 rows=8305 width=8)
(actualtime=0.031..11.629 rows=7633 loops=1) 
                     ->  Sort  (cost=5.40..5.43 rows=12 width=4) (actual time=0.207..1.720 rows=5078 loops=1)
                           Sort Key: groupsdef.groupid
                           ->  Seq Scan on groupsdef  (cost=0.00..5.19 rows=12 width=4) (actual time=0.030..0.182
rows=11loops=1) 
                                 Filter: (userid = 2)
 Total runtime: 54.748 ms

  Tom> A cruder answer is just to accept that the view may give you
  Tom> multiple hits, and put the DISTINCT in the top-level query.

I thought of that. But it has the drawback that if you use an ORDER
BY, you must have the same columns in the DISTINCT.

  Tom> I think though that in the long run you're going to need to
  Tom> rethink this representation of permissions.  It's nice and
  Tom> simple but it's not going to scale well.  Even your "fast"
  Tom> query is going to look like a dog once you get to many
  Tom> thousands of permission entries.

  Tom> It might work to maintain a derived table (basically a
  Tom> materialized view) of the form (userid, groupid, pictureid)
  Tom> signifying that a user can access a picture through membership
  Tom> in a group.  Put a nonunique index on (userid, pictureid) on
  Tom> it.  This could then drive the EXISTS test efficiently.

I'll probably do that if perf goes down when the database grows
bigger.

Thanks for all the advice.

Best regards,
Laurent

--
Laurent Martelli
laurent@aopsys.com                                Java Aspect Components
http://www.aopsys.com/                          http://jac.objectweb.org


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Query involving views
Next
From: Rod Taylor
Date:
Subject: Re: postgres performance: comparing 2 data centers