Thread: help with pagila
Hi all. I'm with a little doubt.<br /><br />I'm testing the pagila (the postgres port of mysql sakila sample).<br /><br />Well,I was trying to translate the query:<br /><br /><pre><code>select<br /> film.film_id AS FID,<br /> film.titleAStitle,<br /> film.description AS description,<br /> <a href="http://category.name">category.name</a>AS category,<br /> film.rental_rate AS price,<br /> film.length AS length,<br/> film.rating AS rating,<br /> group_concat(concat(actor.first_name,_utf8' ',actor.last_name) separator',') AS actors <br />from<br /> category<br /> inner join film on(category.category_id = film.category_id)<br/> inner join film_actor on(film.film_id= film_actor.film_id)<br /> inner join actor on(film_actor.actor_id= actor.actor_id)<br />group by<br /> film.film_id;</code></pre><br />That cant be find here: <ahref="http://www.stardata.it/articoli_en/mysql_sample_db_articoli_en.html"> http://www.stardata.it/articoli_en/mysql_sample_db_articoli_en.html</a><br/><br />I read a comment by David Fetter on a blogabout group concat. I'm following his example:<br /><br />select<br /> film.title AS title, <br /> array_to_string(array_accum(actor.first_name || ' ' || actor.last_name),',') AS actors <br />from<br /> film<br /> inner join film_actor on film.film_id = film_actor.film_id<br /> inner join actor on film_actor.actor_id = actor.actor_id<br /> GROUP BY film.title<br /> <br /><br /><br />But, when I add another column on select, like, film_description,I get the following error:<br /><br />"ERROR: column "film.description" must appear in the GROUP BY clauseor be used in an aggregate function" <br /><br />If I put that column on GROUP BY everything works ok. But I want understantwhy do I need to do that. Can someone teach me, please?<br /><br />[]'s<br />- Walter<br />
On Fri, Sep 01, 2006 at 10:31:48AM -0300, Walter Cruz wrote: > "ERROR: column "film.description" must appear in the GROUP BY clause or be > used in an aggregate function" > > If I put that column on GROUP BY everything works ok. But I want understant > why do I need to do that. Can someone teach me, please? You need to because everything else is being grouped or aggregated. Otherwise, you should get one row for every match of film.description, and that's not what you want. (More precisely and yet still completely imprecise, that's not even something you can have, because of the way sets work.) A -- Andrew Sullivan | ajs@crankycanuck.ca In the future this spectacle of the middle classes shocking the avant- garde will probably become the textbook definition of Postmodernism. --Brad Holland
So I can assume that the MySQL implementation is strange? (It accepts that kind of query)
[]'s
- Walter
[]'s
- Walter
On 9/1/06, Andrew Sullivan < ajs@crankycanuck.ca> wrote:
On Fri, Sep 01, 2006 at 10:31:48AM -0300, Walter Cruz wrote:
> "ERROR: column "film.description" must appear in the GROUP BY clause or be
> used in an aggregate function"
>
> If I put that column on GROUP BY everything works ok. But I want understant
> why do I need to do that. Can someone teach me, please?
You need to because everything else is being grouped or aggregated.
Otherwise, you should get one row for every match of
film.description, and that's not what you want. (More precisely and
yet still completely imprecise, that's not even something you can
have, because of the way sets work.)
A
--
Andrew Sullivan | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
--Brad Holland
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
On Fri, Sep 01, 2006 at 02:26:39PM -0300, Walter Cruz wrote: > So I can assume that the MySQL implementation is strange? (It accepts that > kind of query) In my experience, it is almost never safe to assume that the MySQL approach to SQL bears anything but a passing resemblance to SQL proper. This is considerably better under recent releases, however, and I think you'd find, if you used the strict mode in the most recent release, that MySQL would choke on a query like you posted as well. A -- Andrew Sullivan | ajs@crankycanuck.ca The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
> But, when I add another column on select, like, film_description, I get > the following error: > > "ERROR: column "film.description" must appear in the GROUP BY clause or > be used in an aggregate function" > > If I put that column on GROUP BY everything works ok. But I want > understant why do I need to do that. Can someone teach me, please? The reason is pretty simple - GROUPing actually means "sorting into boxes by values in the columns after the GROUP BY keyword" (and then applying some aggregate functions to these boxes, as for example COUNT, AVG etc.) Besides these aggregates, you can select a column that 'constant' for each of the boxes, that is all the rows in that box have the same value in this column. That's the case of the first SQL query you've posted - you're grouping by 'film_id', thus all the rows in a box have the same value in this column. And thanks to this you can select the value in the SELECT. But in the second query, you'd like to select another column (directly, not through an aggregate function) - title. But there could be different values for each row in the box (PostgreSQL doesn't know that the ID uniquely identifies the title). For example imagine you would group by 'published_year' instead of the 'film_id' - in this case there would be many different movies in the same box, in which case it's impossible to select 'title' for all of them. There are two ways to solve this: 1) add the 'title' to the GROUP BY clause, thus all the rows in a box have the same value of 'title' (and postgresql knowsabout that) 2) use a subselect film_id, (SELECT title FROM film WHERE film_id = film_outer.film_id) AS title FROM film AS film_outer ... GROUP BYfilm_id; Tomas
On Fri, 2006-09-01 at 12:26, Walter Cruz wrote: > So I can assume that the MySQL implementation is strange? (It accepts > that kind of query) Yes, according to the SQL spec, you should generally get an error when you run a query like this: select field1, field2 from table group by field1 since you could theoretically get a different value for field2 each time you run the query. If the data looked like this: field1 | field2 -------+--------1 | 11 | 32 | 52 | 3 The possible answers to that query would be (1,1)(2,5), (1,1)(2,3), (1,3)(2,5), (1,3)(2,3)
Andrew Sullivan <ajs@crankycanuck.ca> writes: > On Fri, Sep 01, 2006 at 02:26:39PM -0300, Walter Cruz wrote: >> So I can assume that the MySQL implementation is strange? (It accepts that >> kind of query) > In my experience, it is almost never safe to assume that the MySQL > approach to SQL bears anything but a passing resemblance to SQL > proper. This is considerably better under recent releases, however, > and I think you'd find, if you used the strict mode in the most > recent release, that MySQL would choke on a query like you posted as > well. If film_id is a primary key for film, then it's actually legal per SQL99 (though not in earlier SQL specs) to just GROUP BY film_id and then reference the other columns of film without explicit grouping, because clearly there can be only one value of them per film_id value. However the quoted query includes ungrouped references to other tables as well, and it's not immediately obvious that those references must have unique values for any one value of film_id. It's possible that MySQL is taking the trouble to validate that this query is legal per SQL99 rules, but I'd find it quite surprising given their project philosophy --- fine points like whether a query has a single right answer tend not to matter to them. Postgres currently implements only the older SQL92 rules, under which you gotta explicitly GROUP BY all the columns you want to reference outside aggregate functions. We'll probably implement some parts of the looser SQL99 rules in the future, but that's where it stands today. regards, tom lane
> So I can assume that the MySQL implementation is strange? (It accepts > that kind of query) Yes, MySQL behaves strangely in this case (as well as in several other cases). I wouldn't rely on this as it probably can choose different values each time (although as far as I remember I haven't seen this). t.v.
Thank you all.
I found myself with the same trouble in last week, when I tried to port mambo CMS to PostgreSQL.
After some work, In fall i a query like the one that was quoted by Andrew and decided to stop.
In that time, my guess wae that something was wrong with MySQL.
(I don't know.. I have seen queries like that on a lot of mysql free projects. Maybe we can put that on a FAQ?)
[]'s
- Walter
I found myself with the same trouble in last week, when I tried to port mambo CMS to PostgreSQL.
After some work, In fall i a query like the one that was quoted by Andrew and decided to stop.
In that time, my guess wae that something was wrong with MySQL.
(I don't know.. I have seen queries like that on a lot of mysql free projects. Maybe we can put that on a FAQ?)
[]'s
- Walter
On 9/1/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> On Fri, Sep 01, 2006 at 02:26:39PM -0300, Walter Cruz wrote:
>> So I can assume that the MySQL implementation is strange? (It accepts that
>> kind of query)
> In my experience, it is almost never safe to assume that the MySQL
> approach to SQL bears anything but a passing resemblance to SQL
> proper. This is considerably better under recent releases, however,
> and I think you'd find, if you used the strict mode in the most
> recent release, that MySQL would choke on a query like you posted as
> well.
If film_id is a primary key for film, then it's actually legal per SQL99
(though not in earlier SQL specs) to just GROUP BY film_id and then
reference the other columns of film without explicit grouping, because
clearly there can be only one value of them per film_id value. However
the quoted query includes ungrouped references to other tables as well,
and it's not immediately obvious that those references must have unique
values for any one value of film_id.
It's possible that MySQL is taking the trouble to validate that this
query is legal per SQL99 rules, but I'd find it quite surprising given
their project philosophy --- fine points like whether a query has a
single right answer tend not to matter to them.
Postgres currently implements only the older SQL92 rules, under which
you gotta explicitly GROUP BY all the columns you want to reference
outside aggregate functions. We'll probably implement some parts of the
looser SQL99 rules in the future, but that's where it stands today.
regards, tom lane
On Fri, 1 Sep 2006, Walter Cruz wrote: > Hi all. I'm with a little doubt. > > I'm testing the pagila (the postgres port of mysql sakila sample). > > Well, I was trying to translate the query: > > select > film.film_id AS FID, > film.title AS title, > film.description AS description, > category.name AS category, > film.rental_rate AS price, > film.length AS length, > film.rating AS rating, > group_concat(concat(actor.first_name,_utf8' ',actor.last_name) > separator ',') AS actors > from > category > inner join film on(category.category_id = film.category_id) > inner join film_actor on(film.film_id = film_actor.film_id) > inner join actor on(film_actor.actor_id = actor.actor_id) > group by > film.film_id; Assuming that film_id is the primary key on film and category_id is the primary key on category, I think you'd be allowed to have the other column references in SQL03 (and 99?) but not in SQL92 (which is the version that PostgreSQL currently implements). IIRC, the later specs allow you to not mention columns in group by that are functionally dependant on other columns that are mentioned.
Hi, Tom, Tom Lane wrote: > If film_id is a primary key for film, then it's actually legal per SQL99 > (though not in earlier SQL specs) to just GROUP BY film_id and then > reference the other columns of film without explicit grouping, because > clearly there can be only one value of them per film_id value. However > the quoted query includes ungrouped references to other tables as well, > and it's not immediately obvious that those references must have unique > values for any one value of film_id. In situations like this, I often missed (and sometimes implemented) a simple "first()" aggregate to put around those other columns. Some of those cases could be fixed by creative use of "DISTINCT ON", but sometimes lead to suboptimal query plans (due to the useless sorting). Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org