Thread: Feedback about Drupal SQL debugging
Dear friends, I have been using PostgreSQL since 6.3 releases and I am a real fan. Of course, I never use nor trust MySQL to deliver data. Now I use Drupal 6.3 with PostgreSQL 8.4. I loose a lot of time correcting Drupal SQL. You may be interested in my developer feedback. I gathered some real examples here: Guidelines for writing MySQL and PostgreSQL compliant SQL => http://drupal.org/node/555514 This page gathers most frequent problems that Drupal users and developers encounter when using PostgreSQL. I would be delighted to have your feedback. Could some issues reasonably be fixed for a better Drupal support? Kind regards, Jean-Michel
> This page gathers most frequent problems that Drupal users and > developers encounter when using PostgreSQL. > > I would be delighted to have your feedback. > Could some issues reasonably be fixed for a better Drupal support? Well I doubt we would do anything to copy MySQL. However Drupal has already made strides to solve this in 7.x. The abstraction layer is much smarter. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
On Fri, 21 Aug 2009 18:22:41 +0200 Jean-Michel Pouré <jm@poure.com> wrote: > I gathered some real examples here: Guidelines for writing MySQL and > PostgreSQL compliant SQL => http://drupal.org/node/555514 > > This page gathers most frequent problems that Drupal users and > developers encounter when using PostgreSQL. > > I would be delighted to have your feedback. It looks to me like you could just reference SQL99 rather than mentioning PostgreSQL other than as an example of a standards compliant database engine. How would those constructs work in MS-SQL or Oracle? -- D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
Jean-Michel Pour� wrote: -- Start of PGP signed section. > Dear friends, > > I have been using PostgreSQL since 6.3 releases and I am a real fan. > Of course, I never use nor trust MySQL to deliver data. > > Now I use Drupal 6.3 with PostgreSQL 8.4. > > I loose a lot of time correcting Drupal SQL. > You may be interested in my developer feedback. > > I gathered some real examples here: Guidelines for writing MySQL and > PostgreSQL compliant SQL => http://drupal.org/node/555514 > > This page gathers most frequent problems that Drupal users and > developers encounter when using PostgreSQL. > > I would be delighted to have your feedback. > Could some issues reasonably be fixed for a better Drupal support? I doubt we are going to change Postgres to improve Drupal support --- it would be better to fix Drupal. However, I have a few suggestions: For this item, http://drupal.org/node/555580, use || for concatentation. I can't believe MySQL doesn't support multi-column indexes, http://drupal.org/node/555558. For this item, I think you want DELETE FROM history USING ..., http://drupal.org/node/555562. The SQL standard doesn't support multiple deletes, so odds are we will not either, http://drupal.org/node/555648. I show multi-value INSERT was added in PG 8.2, not 8.4, * Add support for multiple-row VALUES clauses, per SQL standard (Joe, Tom), http://drupal.org/node/555568. I am confused because I thought Drupal worked with Postgres, but looking at your list, it seems it doesn't. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Fri, 2009-08-21 at 12:50 -0400, Bruce Momjian wrote: > I show multi-value INSERT was added in PG 8.2, not 8.4, * Add > support for multiple-row VALUES clauses, per SQL standard (Joe, Tom), > http://drupal.org/node/555568. > > I am confused because I thought Drupal worked with Postgres, but looking > at your list, it seems it doesn't. Drupal itself works perfectly on Postgres. The problem is module authors that are MySQL developers who never test their module on Postgres. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
Joshua D. Drake wrote: >> This page gathers most frequent problems that Drupal users and >> developers encounter when using PostgreSQL. >> >> I would be delighted to have your feedback. >> Could some issues reasonably be fixed for a better Drupal support? >> > > Well I doubt we would do anything to copy MySQL. However Drupal has > already made strides to solve this in 7.x. The abstraction layer is much > smarter. > > > Joshua, Since you haven't shown us what page this refers to, I at least am totally in the dark about what is being discussed. cheers andrew
2009/8/21 Andrew Dunstan <andrew@dunslane.net>: > > Since you haven't shown us what page this refers to, I at least am totally > in the dark about what is being discussed. It was in the original post http://drupal.org/node/555514 -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark wrote: > 2009/8/21 Andrew Dunstan <andrew@dunslane.net>: > >> Since you haven't shown us what page this refers to, I at least am totally >> in the dark about what is being discussed. >> > > It was in the original post > > http://drupal.org/node/555514 > > Darn. Our mail system sucks badly. For some insane reason some postgresbut not all emails to me get badly delayed. cheers andrew
2009/8/21 Jean-Michel Pouré <jm@poure.com>: > Dear friends, > > I have been using PostgreSQL since 6.3 releases and I am a real fan. > Of course, I never use nor trust MySQL to deliver data. > > Now I use Drupal 6.3 with PostgreSQL 8.4. > > I loose a lot of time correcting Drupal SQL. > You may be interested in my developer feedback. > > I gathered some real examples here: Guidelines for writing MySQL and > PostgreSQL compliant SQL => http://drupal.org/node/555514 > > This page gathers most frequent problems that Drupal users and > developers encounter when using PostgreSQL. > > I would be delighted to have your feedback. > Could some issues reasonably be fixed for a better Drupal support? A lot of these issues seem to have easy workarounds, so I'm not sure what the big deal is. If you don't write standards-compliant SQL, you shouldn't be surprised when you find out that it's not portable. Most of those constructions wouldn't work on Microsoft SQL server either, and I bet at least some of them would fail under Oracle as well. For the int/varchar casting issue, you might try putting single quotes around the values. I would expect that to work in both databases, though I don't use MySQL. ...Robert
Jean-Michel, Thank you for doing this! I've registered for the Drupal site so that I can fix and/or expand some of your items. People who know Drupal better than me should add to them. If you want to discuss Drupal & PostgreSQL again, please post on the pgsql-advocacy list or the pgsql-php mailing lists. pgsql-hackers isn't the best place to get people to help you. BTW, why don't we have a multi-argument version of CONCAT()? In 8.4, it would be possible ... I should add it to mysqlcompat library. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > BTW, why don't we have a multi-argument version of CONCAT()? Why wouldn't people use the SQL-standard || operator instead? regards, tom lane
> I've registered for the Drupal site so that I can fix and/or expand > some > of your items. Thanks. I corrected the index on dual fields page. > If you want to discuss Drupal & PostgreSQL again, please post on the > pgsql-advocacy list or the pgsql-php mailing lists. pgsql-hackers > isn't > the best place to get people to help you. I would prefer no, please. This post is made to understand what needs to be done at PostgreSQL level for better Drupal supports. As written previously, Drupal developers write MySQL code. Some of this code is not portable, okay. > BTW, why don't we have a multi-argument version of CONCAT()? In 8.4, > it > would be possible ... I should add it to mysqlcompat library. yes. In PostgreSQL core ... PostgreSQL requires all non-aggregated fields to be present in the GROUP BY clause (I fixed 10 such issues in Drupal code). http://drupal.org/node/555530 Why can't PostgreSQL add the required field automatically? Could this be added to PostgreSQL to-do-list? Kind regards, Jean-Michel
On Fri, Aug 21, 2009 at 04:19:43PM -0400, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > BTW, why don't we have a multi-argument version of CONCAT()? > > Why wouldn't people use the SQL-standard || operator instead? Because by default, MySQL uses that as, get this, "logical OR." Cheers, David (grateful he's with a project that doesn't just gratuitously go around breaking stuff) -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Fri, Aug 21, 2009 at 04:19:43PM -0400, Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>> BTW, why don't we have a multi-argument version of CONCAT()? >> >> Why wouldn't people use the SQL-standard || operator instead? > Because by default, MySQL uses that as, get this, "logical OR." Egad. Well, I think that's something for the mysqlcompat project not core ... regards, tom lane
On 8/21/09 3:17 PM, Tom Lane wrote: > David Fetter <david@fetter.org> writes: >> On Fri, Aug 21, 2009 at 04:19:43PM -0400, Tom Lane wrote: >>> Josh Berkus <josh@agliodbs.com> writes: >>>> BTW, why don't we have a multi-argument version of CONCAT()? >>> Why wouldn't people use the SQL-standard || operator instead? > >> Because by default, MySQL uses that as, get this, "logical OR." > > Egad. Well, I think that's something for the mysqlcompat project > not core ... Yeah, I'll write one. I'll also blog it as an example of the new variable argument functions. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Jean-Michel Pouré wrote: > >> BTW, why don't we have a multi-argument version of CONCAT()? In 8.4, >> it >> would be possible ... I should add it to mysqlcompat library. >> > > yes. In PostgreSQL core ... > No. That is exactly where it shouldn't go. And frankly, Drupal developers should stop writing non-portable code. Isn't there a Mysql mode that is supposed to conform to the standard, at least more than their default mode? > PostgreSQL requires all non-aggregated fields to be present in the GROUP > BY clause (I fixed 10 such issues in Drupal code). > http://drupal.org/node/555530 > > Why can't PostgreSQL add the required field automatically? Could this be > added to PostgreSQL to-do-list? > > > Isn't that contrary to the standard? cheers andrew
2009/8/21 Jean-Michel Pouré <jm@poure.com>: > PostgreSQL requires all non-aggregated fields to be present in the GROUP > BY clause (I fixed 10 such issues in Drupal code). > http://drupal.org/node/555530 > > Why can't PostgreSQL add the required field automatically? Could this be > added to PostgreSQL to-do-list? This is a more complex (and more interesting) topic than what your blog discusses. Firstly understand what MySQL is *actually* doing: select a,b,c from tab group by a Only sorts and groups by "a" as instructed. The b columns and c columns are not included in the grouping. So if you have data like: a,b,c 1,1,1 1,2,2 2,1,1 2,2,2 You'll get two groups because there are only two values of "a". One group will have a=1 and one group will have a=2. Which value you get for b and c will be completely arbitrary and unpredictable. If Postgres added b,c to the GROUP BY it would produce four groups, because there four different values of <a,b,c>. You *can* get something similar to MySQL's behaviour using DISTINCT ON: select distinct on (a) a,b,c from a ORDER BY a,b,c But Postgres insists you have an ORDER BY which has to agree with the DISTINCT ON columns and provide some extra column(s) to determine which values of b,c are chosen. If Postgres changed on this front it would be to support the SQL Standard concept of "functional dependency". In cases where some columns are guaranteed to be unique you can leave them out of the GROUP BY but still use them in the select list. This isn't MySQL's behaviour of just allowing you to leave them out and hope that it doesn't matter which row's values are used. The database has to actually determine that it really doesn't matter. Typically that would be because you've grouped by a set of columns which form the key of a unique constraint, in which case every other column from that table would also necessarily be the same since they would all come from the same row of that table. -- greg http://mit.edu/~gsstark/resume.pdf
> This page gathers most frequent problems that Drupal users and > developers encounter when using PostgreSQL. > > I would be delighted to have your feedback. > Could some issues reasonably be fixed for a better Drupal support? Well I doubt we would do anything to copy MySQL. However Drupal has already made strides to solve this in 7.x. The abstraction layer is much smarter. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
On Fri, 2009-08-21 at 12:50 -0400, Bruce Momjian wrote: > I show multi-value INSERT was added in PG 8.2, not 8.4, * Add > support for multiple-row VALUES clauses, per SQL standard (Joe, Tom), > http://drupal.org/node/555568. > > I am confused because I thought Drupal worked with Postgres, but looking > at your list, it seems it doesn't. Drupal itself works perfectly on Postgres. The problem is module authors that are MySQL developers who never test their module on Postgres. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
* Greg Stark (gsstark@mit.edu) wrote: > select distinct on (a) a,b,c from a ORDER BY a,b,c > > But Postgres insists you have an ORDER BY which has to agree with the > DISTINCT ON columns and provide some extra column(s) to determine > which values of b,c are chosen. Not quite technically correct. You have to have an ORDER BY which includes the columns inside the DISTINCT ON, but not any more than that. At that point, the values you get for the other columns are arbitrary. PG does *allow* you to provide other columns in the ORDER BY, so you can specify which values from those other columns should be used. I'm not advocating that we force another column to be used, nor do I think you are, but I have to admit that I don't think I've ever used it w/o other columns in the ORDER BY. > If Postgres changed on this front it would be to support the SQL > Standard concept of "functional dependency". In cases where some > columns are guaranteed to be unique you can leave them out of the > GROUP BY but still use them in the select list. This isn't MySQL's > behaviour of just allowing you to leave them out and hope that it > doesn't matter which row's values are used. The database has to > actually determine that it really doesn't matter. Typically that would > be because you've grouped by a set of columns which form the key of a > unique constraint, in which case every other column from that table > would also necessarily be the same since they would all come from the > same row of that table. Hrmm. That sounds kinda neat, but you'd still have to specify one of the columns in the GROUP BY, I presume? Or could you just say 'GROUP BY' without any columns, and have it GROUP BY the key of the table you're using? Thanks, Stephen
2009/8/22 Stephen Frost <sfrost@snowman.net>: > Hrmm. That sounds kinda neat, but you'd still have to specify one of > the columns in the GROUP BY, I presume? Or could you just say 'GROUP > BY' without any columns, and have it GROUP BY the key of the table > you're using? You would have to specify the key. I think typically you would have something like: SELECT a.*, sum(b.col) FROM a,bGROUP BY a.pk Since you have the primary key of a in your group by column you're allowed to use any columns from a in your select list even if they're not listed in the group by clause. The database knows that it can use those values from any output row of the group since they'll all come from the same orginal row of a. Or possibly it could use some plan that doesn't involve multiplying that data in the first place. -- greg http://mit.edu/~gsstark/resume.pdf
* Greg Stark (gsstark@mit.edu) wrote: > You would have to specify the key. I think typically you would have > something like: > > SELECT a.*, sum(b.col) > FROM a,b > GROUP BY a.pk Ahhh, ok, this makes more sense. This is SQL standard? Do we have a TODO for it? > The database knows that it can use those values from any output row of > the group since they'll all come from the same orginal row of a. Or > possibly it could use some plan that doesn't involve multiplying that > data in the first place. Right. It strikes me as a relativly small amount of work to get the initial "just add the columns to the group by" logic implemented. I'd start from exactly where that ERROR comes from, to minimize any performance hit from having to go figure out if the columns in the GROUP BY comprise a key. Doing something different in the planner based on that could come later, if necessary. I havn't looked at any code yet, but those who are familiar with these areas- any gotchas you can think of off-hand to make this more difficult than I'm hoping it is? Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > Right. It strikes me as a relativly small amount of work to get the > initial "just add the columns to the group by" logic implemented. Well, no, you *aren't* adding the columns to the GROUP BY. You're just not throwing the error. You really don't want to add redundant columns to GROUP BY because it makes more work for the planner and executor (unless the planner can figure out they're redundant, which in itself takes work). This is a bit trickier than it looks because it makes the validity of a query dependent on the existence of an appropriate uniqueness constraint; thus for example DROP CONSTRAINT might invalidate a stored rule or view. See prior discussions. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > Right. It strikes me as a relativly small amount of work to get the > > initial "just add the columns to the group by" logic implemented. > > Well, no, you *aren't* adding the columns to the GROUP BY. You're just > not throwing the error. You really don't want to add redundant columns > to GROUP BY because it makes more work for the planner and executor > (unless the planner can figure out they're redundant, which in itself > takes work). Hmm, right. Possibly also add some bit of info to pass to something down the line, if necessary. > This is a bit trickier than it looks because it makes the validity of a > query dependent on the existence of an appropriate uniqueness > constraint; thus for example DROP CONSTRAINT might invalidate a stored > rule or view. See prior discussions. Ah, yes. Couldn't the dependency system be used to handle this though? If you try to drop that constraint it'll complain unless you use cascade which would drop the view? I'll try and find older discussions. Sadly, I don't see it on the TODO. Perhaps I can add it. Thanks, Stephen
2009/8/22 Stephen Frost <sfrost@snowman.net>: > * Greg Stark (gsstark@mit.edu) wrote: >> You would have to specify the key. I think typically you would have >> something like: >> >> SELECT a.*, sum(b.col) >> FROM a,b >> GROUP BY a.pk > > Ahhh, ok, this makes more sense. This is SQL standard? Incidentally it makes even more sense that MySQL would do what they do when you remember that they didn't have subqueries until recently. So MySQL programmers had all become accustomed to the circumlocutions like: SELECT a.* FROM a left join b USING (a.b_id = b.id)WHERE b.id IS NULLGROUP BY a.id to express the much simpler select * from a where b_id in (select id from b) So not many uses of it in MySQL actually *would* be valid if we implemented the shortcut. But MySQL doesn't enforce that so it serves that purpose as well as what we get out of DISTINCT ON. -- greg http://mit.edu/~gsstark/resume.pdf
* Greg Stark (gsstark@mit.edu) wrote: > So not many uses of it in MySQL actually *would* be valid if we > implemented the shortcut. But MySQL doesn't enforce that so it serves > that purpose as well as what we get out of DISTINCT ON. That's probably a good thing- if they're valid then we'd probably return something different which would be a suprise. I'm not really looking at this from the "help MySQL apps" point of view.. It just strikes me as something nice to have. Thanks, Stephen
2009/8/22 Stephen Frost <sfrost@snowman.net>: >> This is a bit trickier than it looks because it makes the validity of a >> query dependent on the existence of an appropriate uniqueness >> constraint; thus for example DROP CONSTRAINT might invalidate a stored >> rule or view. See prior discussions. > > Ah, yes. Couldn't the dependency system be used to handle this though? > If you try to drop that constraint it'll complain unless you use cascade > which would drop the view? I'll try and find older discussions. Sadly, > I don't see it on the TODO. Perhaps I can add it. All this wasn't possible before 8.3 so there are a whole slew of optimizations that have been kind of waiting in the wings until we got that infrastructure. The first step is probably to do the opposite of what we're talking about here: cases where people *have* added extra columns to the GROUP BY key so they can use those columns in their select list. We can remove those columns from the sort or hash comparison key if there's a column (or columns) which is a unique constraint key for the same source. Similarly we can remove columns from an ORDER BY if the order key has earlier columns which are already a unique key for the same source. That would be fairly simple and it would provide a good test case for the dependency tracking stuff and plan invalidation triggered by constraint ddl. It wouldn't be a massive performance change but it would help some cases where the sort node is comparing a lot of redundant keys. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > The first step is probably to do the opposite of what we're talking > about here: cases where people *have* added extra columns to the GROUP > BY key so they can use those columns in their select list. We can > remove those columns from the sort or hash comparison key if there's a > column (or columns) which is a unique constraint key for the same > source. Similarly we can remove columns from an ORDER BY if the order > key has earlier columns which are already a unique key for the same > source. This is something we could only do at plan time --- if we do it at parse time we risk making a robust query into one that will break when somebody drops a constraint. So it's not really the inverse of the other case. regards, tom lane
On Friday 21 August 2009 04:01:36 pm Andrew Dunstan wrote: > Jean-Michel Pouré wrote: > >> BTW, why don't we have a multi-argument version of CONCAT()? In 8.4, > >> it > >> would be possible ... I should add it to mysqlcompat library. > > > > yes. In PostgreSQL core ... > > No. That is exactly where it shouldn't go. And frankly, Drupal > developers should stop writing non-portable code. Isn't there a Mysql > mode that is supposed to conform to the standard, at least more than > their default mode? This is all solved with Drupal 7. We really shouldn't be burning time on this. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc : 503-667-4564 - http://www.commandprompt.com/ Since 1997, Consulting, Development, Support, Training
Greg Stark wrote: > If Postgres changed on this front it would be to support the SQL > Standard concept of "functional dependency". In cases where some > columns are guaranteed to be unique you can leave them out of the > GROUP BY but still use them in the select list. This isn't MySQL's > behaviour of just allowing you to leave them out and hope that it > doesn't matter which row's values are used. The database has to > actually determine that it really doesn't matter. Typically that would > be because you've grouped by a set of columns which form the key of a > unique constraint, in which case every other column from that table > would also necessarily be the same since they would all come from the > same row of that table. > > That would make much more sense. You can also get the effect of picking an arbitrary row now by use max(column) or min(column) in place of the straight column. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Jean-Michel Pour� wrote: >> Why can't PostgreSQL add the required field automatically? Could this be >> added to PostgreSQL to-do-list? > Isn't that contrary to the standard? As of SQL99 it's supposed to be legal if you're grouping by a primary key (or some other cases where the other columns can be proved functionally dependent on the grouping columns, but that's the most useful one). We haven't got round to implementing that, but I'm not sure that it would make the Drupal code work anyway. Are they actually writing to spec here, or just doing whatever mysql will let them? BTW, I was under the impression there already *was* a TODO entry about improving our standards compliance in this area. I can't find it in the list right now, though. regards, tom lane
On fre, 2009-08-21 at 20:07 -0400, Tom Lane wrote: > As of SQL99 it's supposed to be legal if you're grouping by a primary key > (or some other cases where the other columns can be proved functionally > dependent on the grouping columns, but that's the most useful one). > We haven't got round to implementing that, but I'm not sure that it > would make the Drupal code work anyway. Are they actually writing to > spec here, or just doing whatever mysql will let them? > > BTW, I was under the impression there already *was* a TODO entry about > improving our standards compliance in this area. I can't find it in > the list right now, though. I added "Add support for functional dependencies" just now, with a comment how this relates to GROUP BY.