Re: Optimize query for listing un-read messages - Mailing list pgsql-performance

From Andreas Joseph Krogh
Subject Re: Optimize query for listing un-read messages
Date
Msg-id OfficeNetEmail.2f.30dd17c640e2229.145b9c57087@prod2
Whole thread Raw
In response to Re: Optimize query for listing un-read messages  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: Optimize query for listing un-read messages
List pgsql-performance
På torsdag 01. mai 2014 kl. 23:45:49, skrev Tomas Vondra <tv@fuzzy.cz>:
On 1.5.2014 23:19, Andreas Joseph Krogh wrote:
> På torsdag 01. mai 2014 kl. 23:02:13, skrev Pavel Stehule
> <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>:
>
>     
>     
>     2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh <andreas@visena.com
>     <mailto:andreas@visena.com>>:
>
>         På torsdag 01. mai 2014 kl. 21:53:32, skrev Pavel Stehule
>         <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>:
>
>             
>             
>             2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh
>             <andreas@visena.com <mailto:andreas@visena.com>>:
>
>                 På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel
>                 Stehule <pavel.stehule@gmail.com
>                 <mailto:pavel.stehule@gmail.com>>:
>
>                     Hello
>                     [snip]
>                     
>                     I had a perfect success on similar use case with
>                     descent ordered partial index
>
>                     http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html
>
>                 
>                 I'm not getting good performance. Are you able to craft
>                 an example using my schema and partial index?
>
>             
>             maybe some like
>             
>             CREATE INDEX ON message_property (person_id, message_id)
>             WHERE pr.is_read
>             
>             When I am thinking about your schema, it is designed well,
>             but it is not index friendly, so for some fast access you
>             should to hold a cache (table) of unread messages
>
>         
>         Ah, that's what I was hoping to not having to do. In my system,
>         messages arrive all the time and having to update a cache for
>         all new messages for all users seems messy... Seems I could just
>         as well create a message_property for all users when a new
>         message arrives, so I can INNER JOIN it and get good
>         performance. But that table will quickly grow *very* large...
>
>     
>     What you need is a JOIN index, that is not possible in Postgres.
>     
>     I afraid so some "ugly" solutions is necessary (when you require
>     extra fast access). You need a index (small index) and it require
>     some existing set - you cannot do index on the difference two sets.
>     
>     I expect so some flag on the relation "message" - like "it should
>     not be not read" can helps little bit - and can be used in partial
>     index as conditions. Other possibility is some variant of
>     partitioning - you can divide a messages and users to distinct sets
>     and then you decrease a number of possible combinations.
>

> Just curious:
> Is such a JOIN index possible in other DBs, if so - which?
> Can other DBs do index on difference between two sets?
> Will PG ever have this, is it even possible?

I'm not aware of such database, but maybe it's possible at least for
some cases. But I'd expect that to significantly depend on the schema.
And I'm not aware of any such effort in case of PostgreSQL, do don't
hold your breath.

IMHO the problem with your schema is that while each 'read' message has
a matching row in message_property, 'undread' messages may or may not
have a matching row. Is there a particular reason for that?
 
 
Yes. The point is that maintaining a message_property pair for all messages for all users in the system imposes quite a maintainance-headache. As the schema is now any new message is per definition un-read, and when a user reads it then it gets an entry with is_read=true in message_property. This table holds other properties too. This way I'm avoiding having to book-keep so much when a new message arrives and when a new user is created. A message in my system does not necessarily have only one recipient, it might have one, many or none, and might be visible to many.
 
If you could get rid of this, i.e. require that each pair (message,
recipient) has a row in message_propery (irrespectedly whether the
message was read or not), you can do this:

CREATE INDEX message_unread_idx
    ON message_property(message_id, person_id) WHERE (NOT is_read)

and then simply do the query like this:

SELECT
    m.id,
    prop.person_id,
    prop.is_read,
    m.subject
FROM messages m JOIN message_property p ON (m.id = p.message_id)
WHERE (NOT is_read) AND person_id = :pid

and I'd expect this to use the partial index, and being much faster.
 
I'm aware of the performance-gain using such a plain JOIN-query.
 
Thanks for your feedback.
 
--
Andreas Jospeh Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Optimize query for listing un-read messages
Next
From: Tomas Vondra
Date:
Subject: Re: Optimize query for listing un-read messages