Query optimization....

From: Karl Denninger
Subject: Query optimization....
Date: ,
Msg-id: 46AE8B9C.1000208@denninger.net
(view: Whole thread, Raw)
List: pgsql-performance

In a followup to a question I put forward here on performance which I
traced to the "stats" bug (and fixed it).  Now I'm trying to optimize
that query and....... I'm getting confused fast...

I have the following (fairly complex) statement which is run with some

select post.forum, post.subject, post.replied from post where toppost =
1 and (replied > (select lastview from forumlog where login='someone'
and forum=post.forum and number is null)) is not false AND (replied >
(select lastview from forumlog where login='someone' and
forum=post.forum and number=post.number)) is not false order by pinned
desc, replied desc;

This gives me exactly what I'm looking for BUT can be quite slow.

The "forumlog" table has one tuple for each post and user; it has the
fields "forum", "number", "login" and "lastview".  The "post" items have
a "forum", "number" and "replied" field (which is used to match the
"lastview" one.)

When you look at a "post" (which may have replies) the application
updates your existing entry in that table if there is one, or INSERTs a
new tuple if not.

Therefore, for each post you have viewed, there is a tuple in the
"forumlog" table which represents the last time you looked at that item.

The problem is that for a person who has NOT visited a specific thread
of discussion, there is no "forumlog" entry for that person and post in
the table.  Thus, to get all posts which (1) you've not seen at all, or
(2) you've seen but someone has added to since you saw them, the above
complex query is what I've come up with; there may be a "null" table
entry which a "wildcard" match if its present - if there is no match
then the item also must treated as new.  The above statement works - but
its slow.

The following query is VERY fast but only returns those in which there
IS an entry in the table (e.g. you've visited the item at least once)

select post.forum, post.subject, post.replied from post, forumlog where
post.number = forumlog.number and post.toppost = 1 and post.replied >
forumlog.lastview and forumlog.login='someone' order by pinned desc,
replied desc;

What I haven't been able to figure out is how to structure a query that
is both fast and will return the posts for which you DO NOT have a
matching entry in the "forumlog" table for the specific post but DO
either (1) match the "null" number entry (that is, they're posted later
than that) OR (2) have no match at all.  (The first statement matches
these other two cases)

Any ideas?  (Its ok if that query(s) are separate; in other words, its
cool if I have to execute two or even three queries and get the results
separately - in fact, that might be preferrable in some circumstances)


Karl Denninger ()

%SPAMBLOCK-SYS: Matched [@postgresql.org+], message ok

pgsql-performance by date:

From: Decibel!
Subject: Re: Vacuum looping?
From: Henrik Zagerholm
Subject: Seq scan on join table despite index and high statistics