Thread: help with pagila

help with pagila

From

"Walter Cruz"

Date:

01 September 2006, 14:17:04

Hi all. I'm with a little doubt.<br /><br />I'm testing the pagila (the postgres port of mysql sakila sample).<br /><br
/>Well,I was trying to translate the query:<br /><br /><pre><code>select<br />     film.film_id AS FID,<br />
film.titleAStitle,<br />     film.description AS description,<br />     <a
href="http://category.name">category.name</a>AS category,<br />     film.rental_rate AS price,<br />     film.length AS
length,<br/>     film.rating AS rating,<br />    group_concat(concat(actor.first_name,_utf8' ',actor.last_name)
separator',') AS actors <br />from<br />     category<br />     inner join film on(category.category_id =
film.category_id)<br/>     inner join film_actor on(film.film_id= film_actor.film_id)<br />     inner join actor
on(film_actor.actor_id= actor.actor_id)<br />group by<br />    film.film_id;</code></pre><br />That cant be find here:
<ahref="http://www.stardata.it/articoli_en/mysql_sample_db_articoli_en.html">
http://www.stardata.it/articoli_en/mysql_sample_db_articoli_en.html</a><br/><br />I read a comment by David Fetter on a
blogabout group concat. I'm following his example:<br /><br />select<br />     film.title AS title,  <br />    
array_to_string(array_accum(actor.first_name || ' ' || actor.last_name),',') AS actors <br />from<br />     film<br
/>    inner join film_actor on film.film_id = film_actor.film_id<br />     inner join actor on film_actor.actor_id =
actor.actor_id<br />    GROUP BY film.title<br />     <br /><br /><br />But, when I add another column on select, like,
film_description,I get the following error:<br /><br />"ERROR:  column "film.description" must appear in the GROUP BY
clauseor be used in an aggregate function" <br /><br />If I put that column on GROUP BY everything works ok. But I want
understantwhy do I need to do that. Can someone teach me, please?<br /><br />[]'s<br />- Walter<br />

Re: help with pagila

From

Andrew Sullivan

Date:

01 September 2006, 17:14:38

On Fri, Sep 01, 2006 at 10:31:48AM -0300, Walter Cruz wrote:
> "ERROR:  column "film.description" must appear in the GROUP BY clause or be
> used in an aggregate function"
> 
> If I put that column on GROUP BY everything works ok. But I want understant
> why do I need to do that. Can someone teach me, please?

You need to because everything else is being grouped or aggregated.
Otherwise, you should get one row for every match of
film.description, and that's not what you want.  (More precisely and
yet still completely imprecise, that's not even something you can
have, because of the way sets work.)

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.                --Brad Holland

Re: help with pagila

From

"Walter Cruz"

Date:

01 September 2006, 19:52:13

So I can assume that the MySQL implementation is strange? (It accepts that kind of query)

[]'s
- Walter

On 9/1/06, Andrew Sullivan < ajs@crankycanuck.ca> wrote:

On Fri, Sep 01, 2006 at 10:31:48AM -0300, Walter Cruz wrote:
> "ERROR:  column "film.description" must appear in the GROUP BY clause or be
> used in an aggregate function"
>
> If I put that column on GROUP BY everything works ok. But I want understant
> why do I need to do that. Can someone teach me, please?

You need to because everything else is being grouped or aggregated.
Otherwise, you should get one row for every match of
film.description, and that's not what you want.  (More precisely and
yet still completely imprecise, that's not even something you can
have, because of the way sets work.)

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                --Brad Holland

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: help with pagila

From

Andrew Sullivan

Date:

01 September 2006, 20:06:25

On Fri, Sep 01, 2006 at 02:26:39PM -0300, Walter Cruz wrote:
> So I can assume that the MySQL implementation is strange? (It accepts that
> kind of query)

In my experience, it is almost never safe to assume that the MySQL
approach to SQL bears anything but a passing resemblance to SQL
proper.  This is considerably better under recent releases, however,
and I think you'd find, if you used the strict mode in the most
recent release, that MySQL would choke on a query like you posted as
well.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the marketplace.    --Philip Greenspun

Re: help with pagila

From

Tomas Vondra

Date:

01 September 2006, 20:37:50

> But, when I add another column on select, like, film_description, I get
> the following error:
> 
> "ERROR:  column "film.description" must appear in the GROUP BY clause or
> be used in an aggregate function"
> 
> If I put that column on GROUP BY everything works ok. But I want
> understant why do I need to do that. Can someone teach me, please?

The reason is pretty simple - GROUPing actually means "sorting into
boxes by values in the columns after the GROUP BY keyword" (and then
applying some aggregate functions to these boxes, as for example COUNT,
AVG etc.) Besides these aggregates, you can select a column that
'constant' for each of the boxes, that is all the rows in that box have
the same value in this column.

That's the case of the first SQL query you've posted - you're grouping
by 'film_id', thus all the rows in a box have the same value in this
column. And thanks to this you can select the value in the SELECT.

But in the second query, you'd like to select another column (directly,
not through an aggregate function) - title. But there could be different
values for each row in the box (PostgreSQL doesn't know that the ID
uniquely identifies the title). For example imagine you would group by
'published_year' instead of the 'film_id' - in this case there would be
many different movies in the same box, in which case it's impossible to
select 'title' for all of them.

There are two ways to solve this:

1) add the 'title' to the GROUP BY clause, thus all the rows in a box  have the same value of 'title' (and postgresql
knowsabout that)
 

2) use a subselect
  film_id,  (SELECT title FROM film WHERE film_id = film_outer.film_id) AS title  FROM film AS film_outer  ...  GROUP
BYfilm_id;
 

Tomas

Re: help with pagila

From

Scott Marlowe

Date:

01 September 2006, 21:43:04

On Fri, 2006-09-01 at 12:26, Walter Cruz wrote:
> So I can assume that the MySQL implementation is strange? (It accepts
> that kind of query)

Yes, according to the SQL spec, you should generally get an error when
you run a query like this:

select field1, field2 from table group by field1

since you could theoretically get a different value for field2 each time
you run the query.  If the data looked like this:

field1 | field2
-------+--------1     | 11     | 32     | 52     | 3

The possible answers to that query would be (1,1)(2,5), (1,1)(2,3),
(1,3)(2,5), (1,3)(2,3)

Re: help with pagila

From

Tom Lane

Date:

01 September 2006, 21:55:26

Andrew Sullivan <ajs@crankycanuck.ca> writes:
> On Fri, Sep 01, 2006 at 02:26:39PM -0300, Walter Cruz wrote:
>> So I can assume that the MySQL implementation is strange? (It accepts that
>> kind of query)

> In my experience, it is almost never safe to assume that the MySQL
> approach to SQL bears anything but a passing resemblance to SQL
> proper.  This is considerably better under recent releases, however,
> and I think you'd find, if you used the strict mode in the most
> recent release, that MySQL would choke on a query like you posted as
> well.

If film_id is a primary key for film, then it's actually legal per SQL99
(though not in earlier SQL specs) to just GROUP BY film_id and then
reference the other columns of film without explicit grouping, because
clearly there can be only one value of them per film_id value.  However
the quoted query includes ungrouped references to other tables as well,
and it's not immediately obvious that those references must have unique
values for any one value of film_id.

It's possible that MySQL is taking the trouble to validate that this
query is legal per SQL99 rules, but I'd find it quite surprising given
their project philosophy --- fine points like whether a query has a
single right answer tend not to matter to them.

Postgres currently implements only the older SQL92 rules, under which
you gotta explicitly GROUP BY all the columns you want to reference
outside aggregate functions.  We'll probably implement some parts of the
looser SQL99 rules in the future, but that's where it stands today.
        regards, tom lane

Re: help with pagila

From

Tomas Vondra

Date:

01 September 2006, 21:57:15

> So I can assume that the MySQL implementation is strange? (It accepts
> that kind of query)

Yes, MySQL behaves strangely in this case (as well as in several other
cases). I wouldn't rely on this as it probably can choose different
values each time (although as far as I remember I haven't seen this).

t.v.

Re: help with pagila

From

"Walter Cruz"

Date:

01 September 2006, 22:04:07

Thank you all.

I found myself with the same trouble in last week, when I tried to port mambo CMS to PostgreSQL.

After some work, In fall i a query like the one that was quoted by Andrew and decided to stop.

In that time, my guess wae that something was wrong with MySQL.

(I don't know.. I have seen queries like that on a lot of mysql free projects. Maybe we can put that on a FAQ?)

[]'s
- Walter

On 9/1/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Sullivan <ajs@crankycanuck.ca> writes:
> On Fri, Sep 01, 2006 at 02:26:39PM -0300, Walter Cruz wrote:
>> So I can assume that the MySQL implementation is strange? (It accepts that
>> kind of query)

> In my experience, it is almost never safe to assume that the MySQL
> approach to SQL bears anything but a passing resemblance to SQL
> proper.  This is considerably better under recent releases, however,
> and I think you'd find, if you used the strict mode in the most
> recent release, that MySQL would choke on a query like you posted as
> well.

If film_id is a primary key for film, then it's actually legal per SQL99
(though not in earlier SQL specs) to just GROUP BY film_id and then
reference the other columns of film without explicit grouping, because
clearly there can be only one value of them per film_id value.  However
the quoted query includes ungrouped references to other tables as well,
and it's not immediately obvious that those references must have unique
values for any one value of film_id.

It's possible that MySQL is taking the trouble to validate that this
query is legal per SQL99 rules, but I'd find it quite surprising given
their project philosophy --- fine points like whether a query has a
single right answer tend not to matter to them.

Postgres currently implements only the older SQL92 rules, under which
you gotta explicitly GROUP BY all the columns you want to reference
outside aggregate functions.  We'll probably implement some parts of the
looser SQL99 rules in the future, but that's where it stands today.

                        regards, tom lane

Re: help with pagila

From

Stephan Szabo

Date:

01 September 2006, 22:24:13

On Fri, 1 Sep 2006, Walter Cruz wrote:

> Hi all. I'm with a little doubt.
>
> I'm testing the pagila (the postgres port of mysql sakila sample).
>
> Well, I was trying to translate the query:
>
> select
>      film.film_id AS FID,
>      film.title AS title,
>      film.description AS description,
>      category.name AS category,
>      film.rental_rate AS price,
>      film.length AS length,
>      film.rating AS rating,
>      group_concat(concat(actor.first_name,_utf8' ',actor.last_name)
> separator ',') AS actors
> from
>      category
>      inner join film on(category.category_id = film.category_id)
>      inner join film_actor on(film.film_id = film_actor.film_id)
>      inner join actor on(film_actor.actor_id = actor.actor_id)
> group by
>     film.film_id;

Assuming that film_id is the primary key on film and category_id is the
primary key on category, I think you'd be allowed to have the other column
references in SQL03 (and 99?) but not in SQL92 (which is the version that
PostgreSQL currently implements).

IIRC, the later specs allow you to not mention columns in group by that
are functionally dependant on other columns that are mentioned.

Re: help with pagila

From

Markus Schaber

Date:

04 September 2006, 08:10:17

Hi, Tom,

Tom Lane wrote:

> If film_id is a primary key for film, then it's actually legal per SQL99
> (though not in earlier SQL specs) to just GROUP BY film_id and then
> reference the other columns of film without explicit grouping, because
> clearly there can be only one value of them per film_id value.  However
> the quoted query includes ungrouped references to other tables as well,
> and it's not immediately obvious that those references must have unique
> values for any one value of film_id.

In situations like this, I often missed (and sometimes implemented) a
simple "first()" aggregate to put around those other columns.

Some of those cases could be fixed by creative use of "DISTINCT ON", but
sometimes lead to suboptimal query plans (due to the useless sorting).

Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org