Thread: Restriction by grouping problem.

Restriction by grouping problem.

From
"Jeff Barrett"
Date:
The query I have now:

SELECT min(datettime), sid FROM logs GROUP by sid;

This returns the first instance of an sid in the logs table, there can be
many rows in the table for each sid. The problem I have is that I do not
need the SID I just need to group the min(datetime) by it. I need to return
the column getfile. So the question I have is how would I create the
retstriction of the min(datetime), sid and only return the value of getfile?

The english of what I am looking for is.

Select the getfile for the earliest of each sid.

Any help would be greatly appreciated.

--Jeff




Re: Restriction by grouping problem.

From
"Josh Berkus"
Date:
Jeff,

> The query I have now:

> 

> SELECT min(datettime), sid FROM logs GROUP by sid;

> 

> This returns the first instance of an sid in the logs table, there

> can be

> many rows in the table for each sid. The problem I have is that I do

> not

> need the SID I just need to group the min(datetime) by it.

The answer to this question is simple:

SELECT min(datetime) as mintime FROM logs GROUP BY sid;

However, I get the impression that your intended problem was more
complicated.  Can you re-explain it?

-Josh


______AGLIO DATABASE SOLUTIONS___________________________
                                      Josh Berkus
 Complete information technology      josh@agliodbs.com
  and data management solutions       (415) 565-7293
 for law firms, small businesses        fax 621-2533
   and non-profit organizations.      San Francisco


Re: Restriction by grouping problem.

From
"Jeff Barrett"
Date:
Yeah I have been having a bit of a problem expressing myself in terms of
this query, sorry about that.

What I need is:

SELECT getfile FROM logs (and a restriction that results in finding the rows
with the lowest datetime for each unique sid)

To define this table a bit more:

Logs table has a primary key of logid (int4) and serveral columns, of which
I am deling with sid (text), getfile (text), and datetime (int4). Now a
select getfile, datetime, logid from logs where sid = onevalue; would return
a set of rows for that sid, the row I want is for the one with the smallest
aka min(datetime) and I want this for every row in the table.

Thanks for the quick response Josh, hopefully above will help out some.

--Jeff

""Josh Berkus"" <josh@agliodbs.com> wrote in message
news:web-92249@davinci.ethosmedia.com...
> Jeff,
>
> > The query I have now:
>
> >
>
> > SELECT min(datettime), sid FROM logs GROUP by sid;
>
> >
>
> > This returns the first instance of an sid in the logs table, there
>
> > can be
>
> > many rows in the table for each sid. The problem I have is that I do
>
> > not
>
> > need the SID I just need to group the min(datetime) by it.
>
> The answer to this question is simple:
>
> SELECT min(datetime) as mintime FROM logs GROUP BY sid;
>
> However, I get the impression that your intended problem was more
> complicated.  Can you re-explain it?
>
> -Josh
>
>
> ______AGLIO DATABASE SOLUTIONS___________________________
>
>                                        Josh Berkus
>
>   Complete information technology      josh@agliodbs.com
>
>    and data management solutions       (415) 565-7293
>
>   for law firms, small businesses        fax 621-2533
>
>     and non-profit organizations.      San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly




Re: Re: Restriction by grouping problem.

From
Philip Warner
Date:
A way to do this is:
   Select GETFILE From       (Select Min(DATETIME),SID From LOGS Group by SID) as MDT,       LOGS L   Where      L.SID
=MDT.SID      And L.DATETIME = MDT.DATETIME
 

But this fails if there are more than one row for a given SID/DATETIME pair
(doe you have a unique index on them?).

Alternatively, you could try:
   Select        (Select GETFILE From LOGS L Where L.SID = S.SID         Order By DATETIME Asc Limit 1) as GETFILE
From      (Select Distinct SID From LOGS) as S
 

...the FROM statement would be a lot nicer if your had a table of SIDs, or
at least had a good way of generating the list of SIDs.


[Not sure if I got the PG dialect right]


At 18:03 26/07/01 -0400, Jeff Barrett wrote:
>
>Logs table has a primary key of logid (int4) and serveral columns, of which
>I am deling with sid (text), getfile (text), and datetime (int4). Now a
>select getfile, datetime, logid from logs where sid = onevalue; would return
>a set of rows for that sid, the row I want is for the one with the smallest
>aka min(datetime) and I want this for every row in the table.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Re: Restriction by grouping problem.

From
Philip Warner
Date:
At 11:47 27/07/01 +1000, Philip Warner wrote:
>
>...the FROM statement would be a lot nicer if your had a table of SIDs, or
>at least had a good way of generating the list of SIDs.
>

A final (& possibly best) option, is:
   Select   MDT.GETFILE,  [anything else from 'best' record]   From       (Select Distinct SID From LOGS) as S,
(Select* From LOGS L Where L.SID = S.SID         Order By DATETIME Asc Limit 1) as MDT
 

But unfortunately PG 7.1 does not allow this - the second select can not
reference the first select. I have not tried it in 7.1.2.




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Re: Restriction by grouping problem.

From
"Josh Berkus"
Date:
Phillip,

> What it effectively does is executes the second SELECT for each row
> of the
> first select, a bit like a column-select expression, but it allows
> access
> to all rows.

Ah!  I knew that there had to be a simple way to do what you want to do
in PGSQL:

SELECT MIN(getfile) as bestfile
FROM logs,   (SELECT SID, MAX(logtime) as lasttime FROM logs) MDT
WHERE logs.SID = MDT.SID   AND logs.logtime = MDT.lasttime
GROUP BY SID

This gives you the "getfile" from the logs with the latest timestamp.
In the event of two with the exact same logtime, it selects the first
one alphabetically so that you don't get two files for one SID.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: Re: Restriction by grouping problem.

From
Philip Warner
Date:
At 17:19 27/07/01 -0700, Josh Berkus wrote:
>Phillip,
>
>> What it effectively does is executes the second SELECT for each row
>> of the
>> first select, a bit like a column-select expression, but it allows
>> access
>> to all rows.
>
>Ah!  I knew that there had to be a simple way to do what you want to do
>in PGSQL:

The first two examples I sent do it 'simply' in PG. Unfortunately the
example you give will only return one row, and since Jeff wanted one
gettime for each SID, we'd need to modify both your and my first solution to:
   Select SID,Min(GETFILE) From       (Select Min(DATETIME),SID From LOGS Group by SID) as MDT,       LOGS L   Where
 L.SID = MDT.SID      And L.DATETIME = MDT.DATETIME
 


This will work, but will not solve the larger problem of 'give me all the
attrs of of the record with the least datetime for a given SID'. Jeff does
not need this, but it is a common problem. Simply using min/max works for
one attr, but fails to provide a consistent view of multiple attrs. PG has
no elegant single-statement solution to this problem. The SQL solution I
use in other systems is, as I mentioned earlier:
   Select  [any list of attrs from the 'best' record]   From       (Select Distinct SID From LOGS) as S, -- Get the
groupingattrs       (Select * From LOGS L Where L.SID = S.SID  -- For each group do a
 
'row-select'        Order By DATETIME Asc Limit 1) as MDT

The second select can be thought of as analagous to a
column-select-expression, but allowing access to all attrs of the resulting
row.

If you want f1,f2,and f3 from LOGS, then a similar result would be achieved
(inelegantly) by:
   Select  (Select F1 From LOGS L Where L.SID = S.SID                     Order By DATETIME Asc Limit 1) as F1,
(SelectF2 From LOGS L Where L.SID = S.SID                     Order By DATETIME Asc Limit 1) as F2,  (Select F3 From
LOGSL Where L.SID = S.SID                     Order By DATETIME Asc Limit 1) as F3   From       (Select Distinct SID
FromLOGS) as S, -- Get the grouping attrs       
 
Assuming DATETIME is unique then F1, F2, F3 will all come from the same row
and you will have a consistent record.

I have not checked, but I'd guess that PG will create a cross block with
four entries, whereas the original syntax above should just use 2 entries.
Like I said, it's just an optimizer hint.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Re: Restriction by grouping problem.

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> This will work, but will not solve the larger problem of 'give me all the
> attrs of of the record with the least datetime for a given SID'. Jeff does
> not need this, but it is a common problem. Simply using min/max works for
> one attr, but fails to provide a consistent view of multiple attrs. PG has
> no elegant single-statement solution to this problem.

Yes it does:
SELECT DISTINCT ON (sid) * FROM logs ORDER BY sid, datetime;

This gives you just one output row per distinct "sid" value, and that
row will be the one with least datetime.

See the DISTINCT ON example in the SELECT reference page.
        regards, tom lane