Thread: Finding the "most recent" rows

Finding the "most recent" rows

From

Julian Scarfe

Date:

22 April 1999, 10:14:33

I have a table (representing a set of observations) with datetime fields and a
non-unique place field.

e.g.
create table obs (
the_time datetime,
the_place char(8),
...other fields...
)

I'd like an efficient way to pull out the most recent row (i.e. highest
datatime) belonging to *each* of a number of places selected by a simple
query.

e.g. given a table such as:

the_time    the_place    ...
0910        London
1130        London
0910        Paris
0930        London
0840        Paris
1020        London
0740        Paris

I'd like to select:
1130        London
0910        Paris

Most of my attempts at this (as an SQL novice) feel very clumsy and
inefficient. Is there an efficient way of doing this in SQL?
-- 

Julian Scarfe

Re: [SQL] Finding the "most recent" rows

From

Chris Bitmead

Date:

22 April 1999, 10:45:41

Try
SELECT the_place, max(the_time) FROM the_place GROUP BY the_place;

Julian Scarfe wrote:
> 
> I have a table (representing a set of observations) with datetime fields and a
> non-unique place field.
> 
> e.g.
> create table obs (
> the_time datetime,
> the_place char(8),
> ...other fields...
> )
> 
> I'd like an efficient way to pull out the most recent row (i.e. highest
> datatime) belonging to *each* of a number of places selected by a simple
> query.
> 
> e.g. given a table such as:
> 
> the_time    the_place   ...
> 0910        London
> 1130        London
> 0910        Paris
> 0930        London
> 0840        Paris
> 1020        London
> 0740        Paris
> 
> I'd like to select:
> 1130        London
> 0910        Paris
> 
> Most of my attempts at this (as an SQL novice) feel very clumsy and
> inefficient. Is there an efficient way of doing this in SQL?
> --
> 
> Julian Scarfe

-- 
Chris Bitmead
http://www.bigfoot.com/~chris.bitmead
mailto:chris.bitmead@bigfoot.com

Re: [SQL] Finding the "most recent" rows

From

George Moga

Date:

22 April 1999, 10:50:34

Julian Scarfe wrote: <blockquote type="CITE">I have a table (representing a set of observations) with datetime fields
anda <br />non-unique place field. <p>e.g. <br />create table obs ( <br />the_time datetime, <br />the_place char(8),
<br/>...other fields... <br />) <p>I'd like an efficient way to pull out the most recent row (i.e. highest <br
/>datatime)belonging to *each* of a number of places selected by a simple <br />query. <p>e.g. given a table such as:
<p>the_time   the_place   ... <br />0910        London <br />1130        London <br />0910        Paris <br
/>0930       London <br />0840        Paris <br />1020        London <br />0740        Paris <p>I'd like to select: <br
/>1130       London <br />0910        Paris <p>Most of my attempts at this (as an SQL novice) feel very clumsy and <br
/>inefficient.Is there an efficient way of doing this in SQL? <br />-- <p>Julian Scarfe</blockquote> If I understund
theproblem try this: <br /> <tt></tt><p><tt>test=> create table test (time datetime default now(), place
char(16));</tt><br/><tt>CREATE</tt><br /><tt>test=> insert into test (place) values ('London');</tt><br /><tt>INSERT
1948241</tt><br /><tt>test=> insert into test (place) values ('Paris');</tt><br /><tt>INSERT 194825 1</tt><br
/><tt>.............</tt><br/><tt>test=> insert into test (place) values ('Berlin');</tt><br /><tt>INSERT 194835
1</tt><br/><tt>test=> insert into test (place) values ('London');</tt><br /><tt>INSERT 194836 1</tt><br
/><tt>test=>insert into test (place) values ('Berlin');</tt><br /><tt>INSERT 194837 1</tt><br /><tt>test=> select
*from test;</tt><br /><tt>time                         |place</tt><br
/><tt>-----------------------------+----------------</tt><br/><tt>Thu 22 Apr 17:33:23 1999 EEST|London</tt><br
/><tt>Thu22 Apr 17:33:30 1999 EEST|Paris</tt><br /><tt>Thu 22 Apr 17:33:36 1999 EEST|London</tt><br /><tt>Thu 22 Apr
17:33:491999 EEST|Madrid</tt><br /><tt>Thu 22 Apr 17:33:54 1999 EEST|Paris</tt><br /><tt>Thu 22 Apr 17:34:03 1999
EEST|Berlin</tt><br/><tt>Thu 22 Apr 17:34:05 1999 EEST|Madrid</tt><br /><tt>Thu 22 Apr 17:34:08 1999
EEST|London</tt><br/><tt>Thu 22 Apr 17:34:12 1999 EEST|Paris</tt><br /><tt>Thu 22 Apr 17:34:14 1999 EEST|Madrid</tt><br
/><tt>Thu22 Apr 17:34:16 1999 EEST|Paris</tt><br /><tt>Thu 22 Apr 17:34:20 1999 EEST|Berlin</tt><br /><tt>Thu 22 Apr
17:34:221999 EEST|London</tt><br /><tt>Thu 22 Apr 17:34:31 1999 EEST|Berlin</tt><br /><tt>(14
rows)</tt><tt></tt><p><tt>test=>select place, time from test t where time = (select max(s.time) from test s where
s.place= t.place) order by place;</tt><br /><tt>place           |time</tt><br
/><tt>----------------+-----------------------------</tt><br/><tt>Berlin          |Thu 22 Apr 17:34:31 1999
EEST</tt><br/><tt>London          |Thu 22 Apr 17:34:22 1999 EEST</tt><br /><tt>Madrid          |Thu 22 Apr 17:34:14
1999EEST</tt><br /><tt>Paris           |Thu 22 Apr 17:34:16 1999 EEST</tt><br /><tt>(4
rows)</tt><tt></tt><p><tt>test=></tt><p>Iuse PostgreSQL 6.5.0 beta1 on Red Hat LINUX 5.2 with 2.2.2 kernel.
<p>-- <br/> Best,<br />   George Moga,<br />   george@flex.ro<br />   Braila, ROMANIA <br />

Re: [SQL] Finding the "most recent" rows

From

Tom Lane

Date:

22 April 1999, 11:30:07

Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> Julian Scarfe wrote:
>> I'd like an efficient way to pull out the most recent row (i.e. highest
>> datatime) belonging to *each* of a number of places selected by a simple
>> query.
>
> Try
> SELECT the_place, max(the_time) FROM the_place GROUP BY the_place;

But I'll bet he wants the whole row containing the max time, not just
the place and time columns.  I've run into similar problems and never
felt like I had a clean solution, either.

You could do something like

SELECT * FROM table AS t1 WHERE NOT
EXISTS(SELECT * FROM table AS t2 WHERE t2.place = t1.place AND t2.time > t1.time);

but this is ugly, and probably horribly inefficient as well.  (It might
not be unacceptably slow if the table has indexes on place and time,
but it sure looks like a brute-force approach.)

What you'd really like is something like a SELECT DISTINCT with a user-
specifiable row comparison operator; then you'd just "ORDER BY place, time"
and make a comparator that discards all but the last row for each place
value.  Hmm ... a little experimentation suggests that

SELECT DISTINCT ON place * FROM table ORDER BY place, time DESC;

might do the right thing.  It *seems* to select the first row for each
value of place.  I've never seen a spec for this feature, however, so
I'm not sure if it's reliable or not...
        regards, tom lane

Re: [SQL] Finding the "most recent" rows

From

Brook Milligan

Date:

22 April 1999, 14:52:07

I'd like an efficient way to pull out the most recent row (i.e. highest  datatime) belonging to *each* of a number
ofplaces selected by a simple  query.
 

The "Practical SQL Handbook" has a description of exactly what you are
looking for (don't have it handy or I'd give you the page number).
They discuss two ways to do it.  One uses the HAVING clause with GROUP
BY (I think that is the section of the book to look in), but I don't
think psql supports this.  The other way uses a subselect which is
supported by psql.

The script at the bottom illustrates some of the ideas.

Cheers,
Brook

===========================================================================
/* -*- C -*-* recent.sql*/

/** find the most recent entry (order) for each group (customer)*/

-- create tables

drop sequence invoices_id_seq;
drop table invoices;
create table invoices
(id        serial,customer    int,order_no    int,
unique (customer, order_no)
);

insert into invoices (customer, order_no) values (1, 1);
insert into invoices (customer, order_no) values (1, 2);
insert into invoices (customer, order_no) values (1, 3);
insert into invoices (customer, order_no) values (2, 1);
insert into invoices (customer, order_no) values (2, 2);
insert into invoices (customer, order_no) values (3, 1);

select * from invoices order by customer, order_no;

select * from invoices rwhere order_no = (select max (order_no) from invoices r2 where r.customer = r2.customer and
order_no< 3)order by r.customer, r.order_no;

Re: [SQL] Finding the "most recent" rows

From

Chris Bitmead

Date:

22 April 1999, 19:56:11

Tom Lane wrote:

> SELECT DISTINCT ON place * FROM table ORDER BY place, time DESC;

Clever. But why doesn't this work....

select title, summary, time from story t where time = (select
max(s.time) from story s GROUP BY s.title);               
ERROR:  parser: Subselect has too many or too few fields.

Re: [SQL] Finding the "most recent" rows

From

Tom Lane

Date:

22 April 1999, 20:46:31

Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> Clever. But why doesn't this work....

> select title, summary, time from story t where time = (select
> max(s.time) from story s GROUP BY s.title);               
> ERROR:  parser: Subselect has too many or too few fields.

A subselect used in an expression has to return exactly one value;
yours will return as many tuples as there are distinct titles.

I think you meant

select title, summary, time from story t where time = (select
max(s.time) from story s WHERE s.title = t.title);               

Here the subselect should give a single result each time it's
executed.  Unfortunately, it's gonna be executed once for each
tuple scanned by the outer select :-(
        regards, tom lane

Re: [SQL] Finding the "most recent" rows

From

Chris Bitmead

Date:

22 April 1999, 22:48:18

Tom Lane wrote:
> 
> Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> > Clever. But why doesn't this work....
> 
> > select title, summary, time from story t where time = (select
> > max(s.time) from story s GROUP BY s.title);
> > ERROR:  parser: Subselect has too many or too few fields.
> 
> A subselect used in an expression has to return exactly one value;
> yours will return as many tuples as there are distinct titles.

Why doesn't replacing "=" with "IN" produce a result? It wouldn't be the
desired result, but I thought this was legal.

Re: [SQL] Finding the "most recent" rows

From

Chairudin Sentosa

Date:

22 April 1999, 22:58:53

Your script doesn't show how to get the "most recent" rows.
The output from the 1st SQL is :
id|customer|order_no
--+--------+--------1|       1|       12|       1|       23|       1|       34|       2|       15|       2|       26|
   3|       1
 

The output from the 2nd SQL is:
id|customer|order_no
--+--------+--------2|       1|       25|       2|       26|       3|       1

What are you trying to show here?

I think you could just select the highest id, which means the latest data input.

Regards,
Chai


Brook Milligan wrote:

>    I'd like an efficient way to pull out the most recent row (i.e. highest
>    datatime) belonging to *each* of a number of places selected by a simple
>    query.
>
> The "Practical SQL Handbook" has a description of exactly what you are
> looking for (don't have it handy or I'd give you the page number).
> They discuss two ways to do it.  One uses the HAVING clause with GROUP
> BY (I think that is the section of the book to look in), but I don't
> think psql supports this.  The other way uses a subselect which is
> supported by psql.
>
> The script at the bottom illustrates some of the ideas.
>
> Cheers,
> Brook
>
> ===========================================================================
> /* -*- C -*-
>  * recent.sql
>  */
>
> /*
>  * find the most recent entry (order) for each group (customer)
>  */
>
> -- create tables
>
> drop sequence invoices_id_seq;
> drop table invoices;
> create table invoices
> (
>  id             serial,
>  customer       int,
>  order_no       int,
>
>  unique (customer, order_no)
> );
>
> insert into invoices (customer, order_no) values (1, 1);
> insert into invoices (customer, order_no) values (1, 2);
> insert into invoices (customer, order_no) values (1, 3);
> insert into invoices (customer, order_no) values (2, 1);
> insert into invoices (customer, order_no) values (2, 2);
> insert into invoices (customer, order_no) values (3, 1);
>
> select * from invoices order by customer, order_no;
>
> select * from invoices r
>         where order_no = (select max (order_no) from invoices r2 where r.customer = r2.customer and order_no < 3)
>         order by r.customer, r.order_no;

Re: [SQL] Finding the "most recent" rows

From

Tom Lane

Date:

23 April 1999, 00:09:15

Chris Bitmead <chris.bitmead@bigfoot.com> writes:
>>>> select title, summary, time from story t where time = (select
>>>> max(s.time) from story s GROUP BY s.title);

> Why doesn't replacing "=" with "IN" produce a result? It wouldn't be the
> desired result, but I thought this was legal.

I thought so too (on both counts).  Are you saying it doesn't work?
What happens?  Which version are you using?
        regards, tom lane

Re: [SQL] Finding the "most recent" rows

From

Brook Milligan

Date:

23 April 1999, 00:11:40

Your script doesn't show how to get the "most recent" rows.

True enough, but that's just because of the < 3 condition within the
subselect. Take that out and you'll get all the most recent rows (in
this example that means largest order_no, but the same works with
dates or whatever).
What are you trying to show here?

The first query was just showing the table, the second the action of
the relevant select.

I thought this was what you wanted and that you would recognize the
effect of the < 3 condition. I just happened to have this example
already that illustrated the general idea and expected that you would
get the idea. Sorry if I should have been more complete.
I think you could just select the highest id, which means the latest data input.

In this particular case the id and the order_no column happen to yield
the same results (they are both ordered in the same way). That
generally won't be the case. But, if you want the highest id within
each customer category a simple change to the select below will do it.

In any case, to get the largest (or most recent or whatever) anything
in each category you need to do a condition involving a subselect with
an aggregate. Something like the following will give you the largest
(without additional constraint) order_no within each customer.
select * from invoices rwhere order_no = (select max (order_no) from invoices r2 where r.customer = r2.customer)order
byr.customer, r.order_no;

Sorry for the confusion.

Cheers,
Brook

Re: [SQL] Finding the "most recent" rows

From

Chris Bitmead

Date:

23 April 1999, 00:37:59

Tom Lane wrote:
> 
> Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> >>>> select title, summary, time from story t where time = (select
> >>>> max(s.time) from story s GROUP BY s.title);
> 
> > Why doesn't replacing "=" with "IN" produce a result? It wouldn't be the
> > desired result, but I thought this was legal.
> 
> I thought so too (on both counts).  Are you saying it doesn't work?
> What happens?  Which version are you using?

httpd=> select title, summary, time from story t where time IN (select
max(s.time) from story s GROUP BY s.title);
ERROR:  parser: Subselect has too many or too few fields.

I'm using postgresql-snap-990329.tgz

Re: [SQL] Finding the "most recent" rows

From

Julian Scarfe

Date:

23 April 1999, 06:30:03

Tom Lane wrote:
> 
> SELECT DISTINCT ON place * FROM table ORDER BY place, time DESC;
> 
> might do the right thing.  It *seems* to select the first row for each
> value of place.  I've never seen a spec for this feature, however, so
> I'm not sure if it's reliable or not...

Works a treat when I try it on the "real" database (using 6.4.2). It seems
very efficient. I didn't realise that "DISTINCT ON" worked that way with an
"ORDER BY".

I must say I'm overwhelmed by the variety of constructive responses to my
request.  Thank you all very much.  It beats the hell out of sitting as no 17
in a premium rate telephone queue for support from a vendor of a "commercial"
product.
-- 

Julian Scarfe

Re: [SQL] Finding the "most recent" rows

From

Brook Milligan

Date:

23 April 1999, 10:43:14

Tom Lane wrote:  >   > SELECT DISTINCT ON place * FROM table ORDER BY place, time DESC;  >   > might do the right
thing. It *seems* to select the first row for each  > value of place.  I've never seen a spec for this feature,
however,so  > I'm not sure if it's reliable or not...
 

Cool idea.  Is DISTINCT guarranteed to choose the first row that
matches or can it choose any row?

Cheers,
Brook

Re: [SQL] Finding the "most recent" rows

From

Tom Lane

Date:

23 April 1999, 12:06:47

Chris Bitmead <chris.bitmead@bigfoot.com> writes:
>>>>>>> select title, summary, time from story t where time = (select
>>>>>>> max(s.time) from story s GROUP BY s.title);
>> 
>>>> Why doesn't replacing "=" with "IN" produce a result? It wouldn't be the
>>>> desired result, but I thought this was legal.
>> 
>> I thought so too (on both counts).  Are you saying it doesn't work?
>> What happens?  Which version are you using?

> httpd=> select title, summary, time from story t where time IN (select
> max(s.time) from story s GROUP BY s.title);
> ERROR:  parser: Subselect has too many or too few fields.

> I'm using postgresql-snap-990329.tgz

Yeah, I see it too.  This looks like a definite bug to me, but I have
other bugs to squash right now :-(.  Anyone else want to jump on this?
        regards, tom lane

Re: [SQL] Finding the "most recent" rows

From

Tom Lane

Date:

23 April 1999, 12:31:41

Brook Milligan <brook@trillium.NMSU.Edu> writes:
>    Tom Lane wrote:
>> 
>> SELECT DISTINCT ON place * FROM table ORDER BY place, time DESC;
>> 
>> might do the right thing.  It *seems* to select the first row for each
>> value of place.  I've never seen a spec for this feature, however, so
>> I'm not sure if it's reliable or not...

> Cool idea.  Is DISTINCT guarranteed to choose the first row that
> matches or can it choose any row?

I dunno, that's why I said I was unsure that this method was reliable.
By experimentation it seems that Postgres' DISTINCT code works that way,
but I have no idea whether the SQL spec mandates it or allows any row
within a group to be chosen.

I recall now that when I first heard of "SELECT DISTINCT ON field"
I objected that the results weren't well-defined (since it's not clear
how DISTINCT will choose which tuple to return).  It might be that the
SQL spec requires the first tuple to be chosen for each value of
"field", which would allow the user to control the results by inserting
a preceding ORDER BY step --- or skip the ORDER BY, if he doesn't really
care which tuple he gets.  That'd actually be a pretty cool design.
Anyone have an SQL spec handy to check it?
        regards, tom lane

Re: [HACKERS] Re: [SQL] Finding the "most recent" rows

From

Bruce Momjian

Date:

23 April 1999, 15:43:48

> Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> >>>>>>> select title, summary, time from story t where time = (select
> >>>>>>> max(s.time) from story s GROUP BY s.title);
> >> 
> >>>> Why doesn't replacing "=" with "IN" produce a result? It wouldn't be the
> >>>> desired result, but I thought this was legal.
> >> 
> >> I thought so too (on both counts).  Are you saying it doesn't work?
> >> What happens?  Which version are you using?
> 
> > httpd=> select title, summary, time from story t where time IN (select
> > max(s.time) from story s GROUP BY s.title);
> > ERROR:  parser: Subselect has too many or too few fields.

This is not legal.  If you use GROUP BY, the field must be in the target
list.  In this case, s.title is not in the target list of the subselect.
I realize it can't be in the subselect target list because you can only
have one column in the target list, but that is the case.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Percentages?

From

Chris Bitmead

Date:

24 April 1999, 12:12:47

How to calculate percentages? What is the correct SQL?

CREATE TABLE poll (candidate text, votes int4);

I want to do something like (this is wrong)...
SELECT candidate, votes, (votes / sum(votes)) * 100) AS percent FROM
poll;

Fred Smith  |  500 | 25
Bill Bloggs | 1000 | 50
Jim Jones   |  500 | 25


-- 
Chris Bitmead
http://www.bigfoot.com/~chris.bitmead
mailto:chris.bitmead@bigfoot.com

Confusion about DISTINCT.

From

Chris Bitmead

Date:

25 April 1999, 05:23:48

I'm a bit confused about DISTINCT. Firstly, what does DISTINCT mean when
you don't use "ON". Like this...

SELECT DISTINCT category.image FROM story, category* WHERE
story.category = category.oid ORDER BY datetime DESC;

The DISTINCT here has no effect on the output other than changing the
ordering. There are duplicates!

2nd question: Why does the following query result in duplicates even
though I use DISTINCT ON? If I change the ORDER BY to be on image, then
there are no duplicates but that isn't what I want. I want the time to
be the sort order because I want the X most recent images but only
unique ones. Is this a bug? It certainly seems wierd that DISTINCT would
return duplicates. Why should it be up to the user to order the output
with reference to the DISTINCT clause? Shouldn't the database take care
of that?
...

SELECT DISTINCT ON image category.image FROM story, category* WHERE
story.category = category.oid ORDER BY datetime DESC;
image               
--------------------
/icon/canon.gif     
/icon/arca-swiss.gif
/icon/canon.gif     
/icon/hasselblad.gif
/icon/nikon.gif     
/icon/olympus.gif   
(6 rows)


-- 
Chris Bitmead
http://www.bigfoot.com/~chris.bitmead
mailto:chris.bitmead@bigfoot.com

Re: [SQL] Confusion about DISTINCT.

From

Tom Lane

Date:

25 April 1999, 12:23:21

Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> I'm a bit confused about DISTINCT. Firstly, what does DISTINCT mean when
> you don't use "ON".

Means the equality test is taken across all columns of the output.
"ON" restricts the test for "duplicate row" to look at just some
of the columns (thus creating the issue of which tuple out of a
group of "duplicates" gets through the filter to determine the values
of the other columns).

> Like this...
> SELECT DISTINCT category.image FROM story, category* WHERE
> story.category = category.oid ORDER BY datetime DESC;
> The DISTINCT here has no effect on the output other than changing the
> ordering. There are duplicates!

This is a known bug, or at least IMHO it's a bug; it's triggered by your
use of ORDER BY with DISTINCT.  You need to sayORDER BY category.image, datetime DESC 
to make the above example work.

The reason is that DISTINCT is implemented by a sort followed by an
adjacent-duplicate-elimination pass (just like "sort | uniq" in Unix
shell programming).  When you put on an explicit ORDER BY clause,
you override the sort order that the DISTINCT wants, and so the
duplicate filter doesn't necessarily spot the duplicates.  You have
to make sure to sort in an order that will keep the unwanted duplicates
together, ie, list all the DISTINCT columns first in the ORDER BY.

A straightforward solution would be to apply the user-specified ORDER BY
sort *after* the DISTINCT-generated sort and dup-filter steps.  I used
to think this was the right fix.  However, this would break that nifty
technique of letting a user-specified ORDER BY resolve the ambiguity of
DISTINCT ON --- the user ordering has to happen before the dup-filter
for that to work.

Alternatively, the system could alter the user's ORDER BY to ensure the
DISTINCT columns are the primary sort keys --- ie, silently change your
ORDER BY in the above example.  I don't think I like that too much
either.  For one thing, people would complain that the resulting order
wasn't what they asked for.  For another, it seems possible that there
are applications where applying the dup-filter to columns that aren't
the primary keys might be useful.  Let's see, if you didSELECT DISTINCT category.image FROM story, category*
WHEREstory.category= category.oid ORDER BY datetime DESC, category.image;

then you'd get each image listed only once *per date* (I think).
Doesn't that sound like it could be a useful behavior?

So right at the moment, I think the system ought not tinker with the
user's ORDER BY.  But it probably should emit a warning message if the
ORDER BY fails to mention all the columns being DISTINCTed on.  If you
aren't sorting by a DISTINCT column at all, you won't even get
reproducible results, let alone desirable ones.

> 2nd question: Why does the following query result in duplicates even
> though I use DISTINCT ON?

Same deal.
        regards, tom lane

Re: [SQL] Percentages?

From

Nuchanard Chiannilkulchai

Date:

27 April 1999, 05:12:56

Chris Bitmead wrote:

> How to calculate percentages? What is the correct SQL?
>
> CREATE TABLE poll (candidate text, votes int4);
>
> I want to do something like (this is wrong)...
> SELECT candidate, votes, (votes / sum(votes)) * 100) AS percent FROM
> poll;
>
> Fred Smith  |  500 | 25
> Bill Bloggs | 1000 | 50
> Jim Jones   |  500 | 25
>
> --
> Chris Bitmead
> http://www.bigfoot.com/~chris.bitmead
> mailto:chris.bitmead@bigfoot.com

I always solve by a temporary table

select sum(votes) as sumvotes  into tmp from poll;
SELECT candidate, votes, (float4(votes)/float4(sumvotes))*100
as percent from poll, tmp;
candidate|votes|         percent
---------+-----+----------------
Fred     |  500|23.8095238804817
Bill     | 1000|47.6190477609634
James    |  600| 28.571429848671
(3 rows)

The problem is now cutting to only to 2 decimal point (ie:  23.80,
47.61, 28.57)
and we need some further help.

Nuch

Re: [SQL] Percentages?

From

José Soares

Date:

27 April 1999, 08:58:39

Nuchanard Chiannilkulchai ha scritto:

Chris Bitmead wrote:
> How to calculate percentages? What is the correct SQL?
>
> CREATE TABLE poll (candidate text, votes int4);
>
> I want to do something like (this is wrong)...
> SELECT candidate, votes, (votes / sum(votes)) * 100) AS percent FROM
> poll;
>
> Fred Smith | 500 | 25
> Bill Bloggs | 1000 | 50
> Jim Jones   | 500 | 25
>
> --
> Chris Bitmead
> http://www.bigfoot.com/~chris.bitmead
> mailto:chris.bitmead@bigfoot.com
I always solve by a temporary table
select sum(votes) as sumvotes into tmp from poll;
SELECT candidate, votes, (float4(votes)/float4(sumvotes))*100
as percent from poll, tmp;
candidate|votes|         percent
---------+-----+----------------
Fred     | 500|23.8095238804817
Bill     | 1000|47.6190477609634
James    | 600| 28.571429848671
(3 rows)
The problem is now cutting to only to 2 decimal point (ie: 23.80,
47.61, 28.57)
and we need some further help.
Nuch

In v6.5 you can use decimal type.
otherwise try this:

select candidate, votes, substr(cast(percents as text),1,position('.' in cast(pe
rcents as text))) ||
substr(cast(percents as text),position('.' in cast(percents as text)) + 1, 2) as
percent from test;
candidate|votes|percent
---------+-----+-------
Fred     | 500| 23.80
Bill     | 1000| 47.61
James    | 600| 28.57
(3 rows)

or this ...

select candidate, votes, format(percents,'###,##') from test;

-- NB: Here in Italy we use comma instead of decimal point.

Attached file contains:
format() function
text(float8) function (to cast percents to text)

José

Re: [SQL] Finding the "most recent" rows

From

Herouth Maoz

Date:

29 April 1999, 06:04:52

At 07:39 +0300 on 23/04/1999, Chris Bitmead wrote:

> httpd=> select title, summary, time from story t where time IN (select
> max(s.time) from story s GROUP BY s.title);
> ERROR:  parser: Subselect has too many or too few fields.

Of course it does. Since you group by a field which is not in the select
list, Postgres adds it silently to the fields to be selected. Thus the
subselect has two fields in each row, not just a single time field.

I think the proper syntax by the standard would be something like:
SELECT title, summary, timeFROM story tWHERE (time, title) IN (   SELECT s.title, max( s.time )   FROM story s   GROUP
BYs.title);

But I'm not sure Postgres even supports this format (of comparing against
several fields).

In any case, the best would be to select just one tuple in the subselect
and have it return only the time, by constraining the title. But
syntactically, the above is what you were trying to do.

Anyway, Postgres adds the group field to the query, which the standard
requires and common practice doesn't. I think perhaps after doing the
grouping, Postgres should drop that field, since it wasn't originally
requested.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

Re: [SQL] Finding the "most recent" rows

From

Herouth Maoz

Date:

29 April 1999, 06:29:27

At 19:29 +0300 on 23/04/1999, Tom Lane wrote:

> I recall now that when I first heard of "SELECT DISTINCT ON field"
> I objected that the results weren't well-defined (since it's not clear
> how DISTINCT will choose which tuple to return).  It might be that the
> SQL spec requires the first tuple to be chosen for each value of
> "field", which would allow the user to control the results by inserting
> a preceding ORDER BY step --- or skip the ORDER BY, if he doesn't really
> care which tuple he gets.  That'd actually be a pretty cool design.
> Anyone have an SQL spec handy to check it?

DISTINCT ON is a non-standard structure. Your initial assertion, that it
will return different results in different implementations holds true.
Standard SQL should return the same result sets (order not important unless
ORDER BY was mentioned).

Tables are considered as unordered sets. There can not be anything in the
standard that relies on the order in the table. And what do you mean by
inserting a preceding ORDER BY step?

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

Re: [SQL] Finding the "most recent" rows

From

Chris Bitmead

Date:

29 April 1999, 10:28:21

Herouth Maoz wrote:

> Tables are considered as unordered sets. There can not be anything in the
> standard that relies on the order in the table. And what do you mean by
> inserting a preceding ORDER BY step?

He was suggesting that an ORDER BY combined with a DISTINCT might be a
good way of solving the difficult problem of finding the, say,
max(field) in different groups in the table. (And it does work for
postgresql) Like a SELECT ...GROUP BY except getting back not just the
group fields but all fields.

You say that a table is "considered an unordered set". But surely a
table is not an unordered set if you've specified an ORDER BY clause?
This idea is so nice, it would be ashame to dismiss it too quickly.

-- 
Chris Bitmead
http://www.bigfoot.com/~chris.bitmead
mailto:chris.bitmead@bigfoot.com

Re: [SQL] Finding the "most recent" rows

From

Herouth Maoz

Date:

29 April 1999, 11:03:51

At 17:28 +0300 on 29/04/1999, Chris Bitmead wrote:

>
> You say that a table is "considered an unordered set". But surely a
> table is not an unordered set if you've specified an ORDER BY clause?
> This idea is so nice, it would be ashame to dismiss it too quickly.

Charming, but alas, non standard. Moreover, what if you want to get the
maximal value, but order in an ascending in the output?

No, if you want compatible SQL, you won't use DISTINCT ON, and you will
specify the exact method of picking the distinguished value.

Furthermore, ORDER BY has to be a late stage, if it should work on the
entire results of unions, for example, or the result of groups.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

Re: [SQL] Confusion about DISTINCT.

From

Bruce Momjian

Date:

10 May 1999, 12:26:52


Nice summary of the issue.


> Chris Bitmead <chris.bitmead@bigfoot.com> writes:
> > I'm a bit confused about DISTINCT. Firstly, what does DISTINCT mean when
> > you don't use "ON".
> 
> Means the equality test is taken across all columns of the output.
> "ON" restricts the test for "duplicate row" to look at just some
> of the columns (thus creating the issue of which tuple out of a
> group of "duplicates" gets through the filter to determine the values
> of the other columns).
> 
> > Like this...
> > SELECT DISTINCT category.image FROM story, category* WHERE
> > story.category = category.oid ORDER BY datetime DESC;
> > The DISTINCT here has no effect on the output other than changing the
> > ordering. There are duplicates!
> 
> This is a known bug, or at least IMHO it's a bug; it's triggered by your
> use of ORDER BY with DISTINCT.  You need to say
>     ORDER BY category.image, datetime DESC 
> to make the above example work.
> 
> The reason is that DISTINCT is implemented by a sort followed by an
> adjacent-duplicate-elimination pass (just like "sort | uniq" in Unix
> shell programming).  When you put on an explicit ORDER BY clause,
> you override the sort order that the DISTINCT wants, and so the
> duplicate filter doesn't necessarily spot the duplicates.  You have
> to make sure to sort in an order that will keep the unwanted duplicates
> together, ie, list all the DISTINCT columns first in the ORDER BY.
> 
> A straightforward solution would be to apply the user-specified ORDER BY
> sort *after* the DISTINCT-generated sort and dup-filter steps.  I used
> to think this was the right fix.  However, this would break that nifty
> technique of letting a user-specified ORDER BY resolve the ambiguity of
> DISTINCT ON --- the user ordering has to happen before the dup-filter
> for that to work.
> 
> Alternatively, the system could alter the user's ORDER BY to ensure the
> DISTINCT columns are the primary sort keys --- ie, silently change your
> ORDER BY in the above example.  I don't think I like that too much
> either.  For one thing, people would complain that the resulting order
> wasn't what they asked for.  For another, it seems possible that there
> are applications where applying the dup-filter to columns that aren't
> the primary keys might be useful.  Let's see, if you did
>     SELECT DISTINCT category.image FROM story, category* WHERE
>     story.category = category.oid ORDER BY datetime DESC, category.image;
> then you'd get each image listed only once *per date* (I think).
> Doesn't that sound like it could be a useful behavior?
> 
> So right at the moment, I think the system ought not tinker with the
> user's ORDER BY.  But it probably should emit a warning message if the
> ORDER BY fails to mention all the columns being DISTINCTed on.  If you
> aren't sorting by a DISTINCT column at all, you won't even get
> reproducible results, let alone desirable ones.
> 
> > 2nd question: Why does the following query result in duplicates even
> > though I use DISTINCT ON?
> 
> Same deal.
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026