DISTINCT ON, when you want it, you're glad it's there - Mailing list pgsql-sql

From Robert Forsman
Subject DISTINCT ON, when you want it, you're glad it's there
Date
Msg-id 199910310138.VAA01343@nile.purplefrog.com
Whole thread Raw
List pgsql-sql
 I was recently reading the DISTINCT ON conversation logged at
http://www.postgresql.org/docs/pgsql/doc/TODO.detail/distinct and felt I
had to kick in.
 Here at Incanta I'm writing a query to choose a list of video clips for
a user.  One of the features of this problem is that we may have multiple
encodings for a single clip.  If the user can decode two formats, I'd
like to offer them the shortest version of the movie (let's assume the
quality is identical).
 I'm using the following for a jdbc PreparedStatement (using a
PreparedStatement doesn't buy me anything in this case, but ignore that):

"SELECT DISTINCT ON clipid   c.clipid, f.fileid, c.producerid,
c.creationdate, c.duration, rankVideoClip(c.clipid, c.producerid) as
rank, f.length FROM videoclipregistry c, videofileregistry f WHERE 
c.clipid = f.clipid AND f.encrate < ? AND c.producerid = ? AND f.format
IN ("+formatStr+") AND c.creationdate > ? ORDER BY rank DESC, clipid,
length ASC, creationdate"

 I discovered rather quickly that DISTINCT ON seems to eliminate
SUCCESSIVE duplicates, not global duplicates, so I order by clipid, and
then length to eliminate the longer files representing a clip.  I still
consider the DISTINCT ON behavior a bug.
 I feel that DISTINCT ON is handy, otherwise I'd have to eliminate
duplicates in the calling code (which might be faster, but would cost
more transmission overhead and make me write code, and we all know "more
code = more bugs")
 The other alternative would be a temporary table, then a SELECT MAX
correlated subquery in the WHERE clause which I suspect postgresql can't
do so you'd write an SQL function for the subquery, except how do you
pass a list for IN into an SQL function?.
 Still, you could end up with two files with identical length and you'd
have to write code to eliminate that duplicate in the calling language
anyway.
 ARGH!

P.S. If I could use a ? for the formatStr for the IN clause I could
actually get some benefit from the PreparedStatement, but I don't think
JDBC or ODBC has any way to do that.


P.P.S.  prepare to vomit at the following function definitions.  Note
carefully the use of COALESCE:

"CREATE FUNCTION userRankVideoClip(INT, INT) RETURNS INT AS 'select
COALESCE(sum(INT4( p.value) * t.tagvalue), 0) FROM "+TAGREGISTRY+" t,
userpreferences p WHERE $1 = t.clipid AND
(''InMotion.Producer.''||($2)||''.''||t.tagname) = p.key' LANGUAGE
'SQL'"

"CREATE FUNCTION marqueeRankVideoClip(INT) RETURNS INT AS 'SELECT
COALESCE(SUM(t.tagvalue), 0) FROM "+TAGREGISTRY+" t WHERE t.clipid = $1
AND t.tagname = ''marquee'' ' language 'SQL'"

"CREATE FUNCTION rankVideoClip(INT, INT) RETURNS INT AS 'select
userRankVideoClip($1, $2)+marqueeRankVideoClip($1)' LANGUAGE 'SQL'"


pgsql-sql by date:

Previous
From: wieck@debis.com (Jan Wieck)
Date:
Subject: Re: [SQL] Decimal precsion?
Next
From: "Mario Simeone"
Date:
Subject: subscribe