Re: Query involving views - Mailing list pgsql-performance

From Tom Lane
Subject Re: Query involving views
Date
Msg-id 4646.1086538945@sss.pgh.pa.us
Whole thread Raw
In response to Query involving views  (Laurent Martelli <laurent@aopsys.com>)
Responses Re: Query involving views  (Laurent Martelli <laurent@aopsys.com>)
List pgsql-performance
Laurent Martelli <laurent@aopsys.com> writes:
> Now, if I use the following view to abstract access rights:

> CREATE VIEW userpictures (
>        PictureID,RollID,FrameID,Description,Filename,
>        Owner,EntryDate,Date,
>        NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates,
>        UserID)
>    AS SELECT DISTINCT ON (Permissions.PictureID,UserID)
>          Pictures.PictureID,RollID,FrameID,Description,Filename,Owner,
>          EntryDate,Date,NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates,
>          UserID
>      FROM Permissions
>           JOIN Groupsdef using (GroupID)
>           JOIN pictures using (PictureID);

> [ performance sucks ]

Find a way to get rid of the DISTINCT ON.  That's essentially an
optimization fence.  Worse, the way you are using it here, it doesn't
even give well-defined results, since there's no ORDER BY constraining
which row will be selected out of a set of duplicates.  (I think it may
not matter to you, since you don't really care which groupsdef row is
selected, but in general a view constructed like this is broken.)

It might work to do the view as

SELECT ... all that stuff ...
FROM pictures p, users u
WHERE
  EXISTS (SELECT 1 FROM permissions prm, groupsdef g
          WHERE p.pictureid = prm.pictureid AND prm.groupid = g.groupid
                AND g.userid = u.userid);

I'm not sure offhand about the performance properties of this either,
but it would be worth trying.

A cruder answer is just to accept that the view may give you multiple
hits, and put the DISTINCT in the top-level query.

I think though that in the long run you're going to need to rethink this
representation of permissions.  It's nice and simple but it's not going
to scale well.  Even your "fast" query is going to look like a dog once
you get to many thousands of permission entries.

It might work to maintain a derived table (basically a materialized
view) of the form (userid, groupid, pictureid) signifying that a user
can access a picture through membership in a group.  Put a nonunique
index on (userid, pictureid) on it.  This could then drive the EXISTS
test efficiently.

            regards, tom lane

pgsql-performance by date:

Previous
From: Laurent Martelli
Date:
Subject: Query involving views
Next
From: Laurent Martelli
Date:
Subject: Re: Query involving views