Thread: 8.2 features?
What is the state of the following items that have been previously discussed?

. MERGE (at least in PK case)
. multiple values clauses for INSERT
. recursive WITH queries

Thanks

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> What is the state of the following items that have been previously
> discussed?
> . MERGE (at least in PK case)

No submitted patch; no one working on it AFAIK; doesn't look like
something that could get done in the next three weeks.

> . multiple values clauses for INSERT

Also not done, but if we are willing to accept a limited patch
(ie, not necessarily everything that SQL92 says you can do with
VALUES, but at least the INSERT case) I think it could get done.
I might even volunteer to do it ... but won't object if someone
else volunteers to.

> . recursive WITH queries

I believe Jonah has given up on fixing the originally-submitted
patch, but he mentioned at the code sprint that non-recursive
WITH is potentially doable in time for 8.2.  Not sure if that's
a sufficiently important case to be worth doing.

			regards, tom lane
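For readers new to the thread, a minimal sketch of the multi-row VALUES syntax under discussion (the SQL92 <table value constructor>, restricted to the INSERT case); the table and column names here are illustrative only, not taken from any patch:

    -- Hypothetical table, used only for illustration.
    CREATE TABLE tags (id integer, label text);

    -- One INSERT carrying several rows, instead of one INSERT per row:
    INSERT INTO tags (id, label) VALUES
        (1, 'alpha'),
        (2, 'beta'),
        (3, 'gamma');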
On 7/13/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > . recursive WITH queries
>
> I believe Jonah has given up on fixing the originally-submitted
> patch, but he mentioned at the code sprint that non-recursive
> WITH is potentially doable in time for 8.2.  Not sure if that's
> a sufficiently important case to be worth doing.

A WITH clause that works for the most common use cases will be submitted.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/
Tom Lane wrote: >>. multiple values clauses for INSERT >> >> > >Also not done, but if we are willing to accept a limited patch >(ie, not necessarily everything that SQL92 says you can do with >VALUES, but at least the INSERT case) I think it could get done. >I might even volunteer to do it ... but won't object if someone >else volunteers to. > > > I would be very happy to see it accepted. cheers andrew
On 7/13/06, Andrew Dunstan <andrew@dunslane.net> wrote: > >>. multiple values clauses for INSERT > > I would be very happy to see it accepted. Same here. -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/
On Thu, Jul 13, 2006 at 05:09:32PM -0400, Jonah H. Harris wrote: > On 7/13/06, Andrew Dunstan <andrew@dunslane.net> wrote: > >>>. multiple values clauses for INSERT > > > >I would be very happy to see it accepted. > > Same here. <aol>Me, too!</aol> Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >>What is the state of the following items that have been previously >>discussed? >>. multiple values clauses for INSERT > > Also not done, but if we are willing to accept a limited patch > (ie, not necessarily everything that SQL92 says you can do with > VALUES, but at least the INSERT case) I think it could get done. > I might even volunteer to do it ... but won't object if someone > else volunteers to. I'm looking to contribute something useful for the 8.2 release, and it seems Bernd is going to finish up updateable views himself, so I'd be glad to take a look (at the limited case, that is). Any landmines I should watch out for? Joe
Andrew Dunstan wrote: > . MERGE (at least in PK case) I think that died after we figured out that it didn't do the sort of UPDATE-else-INSERT thing that people wanted out of it. > . multiple values clauses for INSERT Susanne Ebrecht <susanne.ebrecht@credativ.de> was last heard to work on it. Updates, Susanne?
* Peter Eisentraut (peter_e@gmx.net) wrote:
> Andrew Dunstan wrote:
> > . MERGE (at least in PK case)
>
> I think that died after we figured out that it didn't do the sort of
> UPDATE-else-INSERT thing that people wanted out of it.

I agree that it's probably not going to happen for 8.2, but I certainly
have uses for the SQL spec's definition of MERGE (the table-level case
rather than the individual-tuple case).  I'd like to see the
individual-tuple UPSERT/REPLACE issue handled as well, but I don't
believe that MERGE lacking it necessarily means MERGE should be ignored.

	Thanks,

		Stephen
On 7/13/06, Stephen Frost <sfrost@snowman.net> wrote:
> I agree that it's probably not going to happen for 8.2 but I certainly
> have uses for the SQL spec's definition of MERGE (table-level instead of
> the individual-tuple case).  I'd like to see the individual-tuple
> UPSERT/REPLACE issue handled as well but I don't believe MERGE lacking
> that necessarily means MERGE should be ignored.

Where does Jan stand on it?  I know he was doing some thinking about how
to accomplish it.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/
--On Freitag, Juli 14, 2006 01:23:11 +0200 Peter Eisentraut <peter_e@gmx.net> wrote: >> . multiple values clauses for INSERT > > Susanne Ebrecht <susanne.ebrecht@credativ.de> was last heard to work on > it. Updates, Susanne? I've talked to Susanne today and she's just back from hospital and not available online until next week. She was working on the SET (col1, col2) = (val1, val2) syntax for UPDATE commands. Don't know what the status is on this, though. -- Thanks Bernd
Bernd Helmle wrote: > > > --On Freitag, Juli 14, 2006 01:23:11 +0200 Peter Eisentraut > <peter_e@gmx.net> wrote: > >>> . multiple values clauses for INSERT >> >> >> Susanne Ebrecht <susanne.ebrecht@credativ.de> was last heard to work on >> it. Updates, Susanne? > > > I've talked to Susanne today and she's just back from hospital and not > available > online until next week. > She was working on the SET (col1, col2) = (val1, val2) syntax for > UPDATE commands. > Don't know what the status is on this, though. > Not the same thing, surely. So maybe we should gratefully accept Joe Conway's offer to work on it. cheers andrew
Am Freitag, den 14.07.2006, 16:26 +0200 schrieb Bernd Helmle:
> --On Freitag, Juli 14, 2006 01:23:11 +0200 Peter Eisentraut
> <peter_e@gmx.net> wrote:
>
> >> . multiple values clauses for INSERT
> >
> > Susanne Ebrecht <susanne.ebrecht@credativ.de> was last heard to work on
> > it.  Updates, Susanne?
>
> I've talked to Susanne today and she's just back from hospital and not
> available online until next week.
> She was working on the SET (col1, col2) = (val1, val2) syntax for UPDATE
> commands.
> Don't know what the status is on this, though.

Thanks Peter and Bernd for your postings.

I am working on

    update table set
        (col1, col2, ...) = (val1, val2, ...),
        (colx, coly, ...) = (valx, valy, ...),
        ...

I hope it will be finished this week.  Most of the work is done.

Susanne
Andrew Dunstan wrote: > Bernd Helmle wrote: >> >> --On Freitag, Juli 14, 2006 01:23:11 +0200 Peter Eisentraut >> <peter_e@gmx.net> wrote: >> >>>> . multiple values clauses for INSERT >>> >>> Susanne Ebrecht <susanne.ebrecht@credativ.de> was last heard to work on >>> it. Updates, Susanne? >> >> I've talked to Susanne today and she's just back from hospital and not >> available online until next week. >> She was working on the SET (col1, col2) = (val1, val2) syntax for >> UPDATE commands. >> Don't know what the status is on this, though. > > Not the same thing, surely. So maybe we should gratefully accept Joe > Conway's offer to work on it. I've played with this a bit now, and the grammar changes seem pretty straightforward, but the other half is kind of ugly. I can't see a good way to propagate multiple targetlists that isn't a big hack. The best way might be to fabricate a selectStmt equiv to "SELECT <targetlist> UNION ALL SELECT <targetlist>...", but that still feels like a hack. Have there been any past discussions on how this might be implemented (FWIW I couldn't find any in the archives)? Any better ideas for an approach? Thanks, Joe
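Roughly what the internal rewrite Joe describes would amount to at the SQL level, as an illustration of the hack rather than the actual implementation; the tags table is the same illustrative one used earlier in the thread:

    -- What the user writes (multi-row VALUES):
    INSERT INTO tags (id, label) VALUES (1, 'alpha'), (2, 'beta');

    -- A semantically similar form built from plain SELECTs, the kind of
    -- "SELECT ... UNION ALL SELECT ..." expansion Joe calls a hack; note
    -- it cannot express DEFAULT, which Pavel points out downthread.
    INSERT INTO tags (id, label)
    SELECT 1, 'alpha'
    UNION ALL
    SELECT 2, 'beta';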
Hello,

I did some work on multiple value insert.

First: SELECT .. UNION ALL SELECT is the wrong idea, because VALUES can
contain the DEFAULT keyword.

Second: a general implementation of the table value constructor, as in
SQL:2003, is necessary.

The main problem I see is the bigger footprint in the sources if we
implement multi-value INSERT as a classic PostgreSQL statement.

Regards
Pavel Stehule
>> If the use case is people running MySQL dumps, then there will be
>> millions of values-targetlists in MySQL dumps.

I did some experimentation just now, and could not get mysql to accept a
command longer than about 1 million bytes.  It complains about

	Got a packet bigger than 'max_allowed_packet' bytes

which seems a bit odd because max_allowed_packet is allegedly set to
16 million, but anyway I don't think people are going to be loading any
million-row tables using single INSERT commands in mysql either.

			regards, tom lane
From http://dev.mysql.com/doc/refman/4.1/en/blob.html:

  You can change the message buffer size by changing the value of the
  max_allowed_packet variable, but you must do so for both the server
  and your client program.  For example, both mysql and mysqldump allow
  you to change the client-side max_allowed_packet value.

Tom Lane wrote:
>>> If the use case is people running MySQL dumps, then there will be
>>> millions of values-targetlists in MySQL dumps.
>
> I did some experimentation just now, and could not get mysql to accept a
> command longer than about 1 million bytes.  It complains about
> 	Got a packet bigger than 'max_allowed_packet' bytes
> which seems a bit odd because max_allowed_packet is allegedly set to
> 16 million, but anyway I don't think people are going to be loading any
> million-row tables using single INSERT commands in mysql either.
>
> 			regards, tom lane
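For reference, the MySQL-side knob being discussed can be inspected and raised on the server roughly as below; the 16MB figure is only an example, and as the quoted manual says the client programs (mysql, mysqldump) need their own setting as well, which in the MySQL versions of that era may mean a command-line option or config-file entry rather than SET:

    -- On the MySQL server (needs sufficient privileges):
    SHOW VARIABLES LIKE 'max_allowed_packet';
    SET GLOBAL max_allowed_packet = 16 * 1024 * 1024;  -- example: 16MB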
On Tue, Jul 18, 2006 at 02:19:01PM -0400 I heard the voice of Tom Lane, and lo! it spake thus: > > I did some experimentation just now, and could not get mysql to accept a > command longer than about 1 million bytes. It complains about > Got a packet bigger than 'max_allowed_packet' bytes > which seems a bit odd because max_allowed_packet is allegedly set to > 16 million, but anyway I don't think people are going to be loading any > million-row tables using single INSERT commands in mysql either. On the contrary, I've hit it several times by just trying to import [into another database] the output of a mysqldump I just did. Great design, that... -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream.
> I did some experimentation just now, and could not get mysql to accept a > command longer than about 1 million bytes. It complains about > Got a packet bigger than 'max_allowed_packet' bytes > which seems a bit odd because max_allowed_packet is allegedly set to > 16 million, but anyway I don't think people are going to be loading any > million-row tables using single INSERT commands in mysql either. Strange. Last time I checked I thought MySQL dump used 'multivalue lists in inserts' for dumps, for the same reason that we use COPY
> I did some experimentation just now, and could not get mysql to accept a > command longer than about 1 million bytes. It complains about > Got a packet bigger than 'max_allowed_packet' bytes > which seems a bit odd because max_allowed_packet is allegedly set to > 16 million, but anyway I don't think people are going to be loading any > million-row tables using single INSERT commands in mysql either. Ah no, I'm mistaken. It's not by default in mysqldump, but it does seem "recommended". This is from "man mysqldump": -e|--extended-insert Allows utilization of the new, much faster INSERT syntax.
Christopher Kings-Lynne <chris.kings-lynne@calorieking.com> writes: > Strange. Last time I checked I thought MySQL dump used 'multivalue > lists in inserts' for dumps, for the same reason that we use COPY I think Andrew identified the critical point upthread: they don't try to put an unlimited number of rows into one INSERT, only a megabyte or so's worth. Typical klugy-but-effective mysql design approach ... regards, tom lane
Joe Conway wrote:
> Tom Lane wrote:
>
>> Christopher Kings-Lynne <chris.kings-lynne@calorieking.com> writes:
>>
>>> Strange.  Last time I checked I thought MySQL dump used 'multivalue
>>> lists in inserts' for dumps, for the same reason that we use COPY
>>
>> I think Andrew identified the critical point upthread: they don't try
>> to put an unlimited number of rows into one INSERT, only a megabyte
>> or so's worth.  Typical klugy-but-effective mysql design approach ...
>
> OK, so given that we don't need to be able to do 1 million
> multi-targetlist insert statements, here is rev 2 of the patch.

I did some testing today against mysql and found that it will easily
absorb insert statements with 1 million targetlists provided you set
max_allowed_packet high enough for the server.  It peaked out at about
600MB, compared to my similar test last night, where it was using about
3.8 GB when I killed it.

So the question is, do we care?

If we do, I'll start looking for a new rev 3 strategy (ideas/pointers
etc very welcome).  If not, I'll start working on docs and regression
tests.

Thanks,

Joe
Joe Conway <mail@joeconway.com> writes: > I did some testing today against mysql and found that it will easily > absorb insert statements with 1 million targetlists provided you set > max_allowed_packet high enough for the server. It peaked out at about > 600MB, compared to my test similar last night where it was using about > 3.8 GB when I killed it. > So the question is, do we care? What's the performance like relative to mysql? It seems hard to believe that we can afford the overhead of a separate INSERT statement per row (duplicating all the work of parse analysis, rewrite, planning, executor start/stop) ... at least not without looking mighty bad. regards, tom lane
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: > >>I did some testing today against mysql and found that it will easily >>absorb insert statements with 1 million targetlists provided you set >>max_allowed_packet high enough for the server. It peaked out at about >>600MB, compared to my test similar last night where it was using about >>3.8 GB when I killed it. > >>So the question is, do we care? > > What's the performance like relative to mysql? It seems hard to believe > that we can afford the overhead of a separate INSERT statement per row > (duplicating all the work of parse analysis, rewrite, planning, executor > start/stop) ... at least not without looking mighty bad. I don't have the exact numbers handy, but not too great. As I recall, with last night's patch we did 100K inserts in about 4 seconds, and today mysql did 100K in about 1 second. We never finished the 1 million insert test due to swapping (I killed it after quite a while), and mysql did 1 million in about 18 seconds (we did 300K in 13 seconds). The hardware was not identical between last night's test and today's on mysql, but very similar (similar CPUs and memory, although the machine I did the mysql tests on had scsi drives, while the pg test was done on sata). The difficulty is finding a way to avoid all that extra work without a very ugly special case kludge just for inserts. I've been banging my head on that on-and-off for a few days now, and every idea looks uglier than the last. One suggestion I got off list was to figure out a way to build a tuplestore and use it to feed the executor. That's starting to sound better and better to me. Any ideas or guidance would be greatly appreciated. Joe
Joe Conway <mail@joeconway.com> writes: > The difficulty is finding a way to avoid all that extra work without a > very ugly special case kludge just for inserts. [ thinks a bit ... ] It seems to me that the reason it's painful is exactly that INSERT ... VALUES is a kluge already. We've special-cased the situation where the INSERT's <query expression> is a <table value constructor> with exactly one row --- but actually a <table value constructor> with multiple rows ought to be allowed anywhere you can currently write "SELECT ...". So ideally fixing this would include eliminating the current artificial distinction between two types of INSERT command. I think the place we'd ultimately like to get to involves changing the executor's Result node type to have a list of targetlists and sequence through those lists to produce its results (cf Append --- perhaps while at it, divorce the "gating node" functionality into a different node type). That part seems clear, what's a bit less clear is what the ripple effect on the upstream parser/planner data structures should be. Should *all* occurrences of Query be changed to have a list-of-targetlists? Sounds ugly, and I don't understand what it would mean for any Query other than one representing a VALUES construct. [ thinks some more ... ] Maybe the right place to put the list-of-targetlists functionality is not in Query per se, but in a new type of jointree node. This would localize the impact as far as changing the datastructures go, but I've not thought hard enough about what the impact would actually be. regards, tom lane
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: > >>The difficulty is finding a way to avoid all that extra work without a >>very ugly special case kludge just for inserts. > > [ thinks a bit ... ] > > It seems to me that the reason it's painful is exactly that INSERT > ... VALUES is a kluge already. We've special-cased the situation where > the INSERT's <query expression> is a <table value constructor> with > exactly one row --- but actually a <table value constructor> with > multiple rows ought to be allowed anywhere you can currently write > "SELECT ...". So ideally fixing this would include eliminating the > current artificial distinction between two types of INSERT command. > > I think the place we'd ultimately like to get to involves changing the > executor's Result node type to have a list of targetlists and sequence > through those lists to produce its results I was actually just looking at that and ended up thinking that it might be better to deal with it one level down in ExecProject (because it is already passing targetlists directly to ExecTargetList). > That part seems clear, what's a bit less clear is what the > ripple effect on the upstream parser/planner data structures should be. > Should *all* occurrences of Query be changed to have a > list-of-targetlists? Sounds ugly, and I don't understand what it would > mean for any Query other than one representing a VALUES construct. There are certainly many places to be looked at if Query.targetList becomes a list-of-targetlists (about 153 if I grep'd correctly). > > [ thinks some more ... ] > > Maybe the right place to put the list-of-targetlists functionality is > not in Query per se, but in a new type of jointree node. This would > localize the impact as far as changing the datastructures go, but I've > not thought hard enough about what the impact would actually be. OK. You've given me a good bit to think about -- thanks! Joe
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> I think the place we'd ultimately like to get to involves changing the
>> executor's Result node type to have a list of targetlists and sequence
>> through those lists to produce its results

> I was actually just looking at that and ended up thinking that it might
> be better to deal with it one level down in ExecProject (because it is
> already passing targetlists directly to ExecTargetList).

I'd vote against that, because (a) ExecProject is used by all executor
node types, and we shouldn't add overhead to all of them for the benefit
of one; (b) ExecProject doesn't actually have any internal state at the
moment.  To keep track of which targetlist to evaluate next, it would
not only need some internal state, it would have to be told the current
"es_direction".  This stuff fits much better at the exec node level ---
again, I'd suggest looking at Append for a comparison.

But really the executor part of this is not the hard part; what we need
to think about first is what's the impact on the Query datastructure
that the parser/rewriter/planner use.

I'm still liking the idea of pushing multi-values into a jointree node
type.  Basically this would suggest representing "VALUES ..." as if it
were "SELECT * FROM VALUES ..." (which I believe is actually legal
syntax per spec) --- in the general case you'd need to have a Query node
that has a trivial "col1, col2, col3, ..." targetlist and then the
multiple values lists are in some kind of jointree entry.  But possibly
this could be short-circuited somehow, at least for INSERT.

BTW, I noticed an interesting property of historical Postgres behavior:
you can put a table reference into a VALUES targetlist.

regression=# create table foo (like tenk1);
CREATE TABLE
regression=# insert into foo values (tenk1.*);
ERROR:  missing FROM-clause entry for table "tenk1"
LINE 1: insert into foo values (tenk1.*);
                                ^
regression=# set add_missing_from to 1;
SET
regression=# insert into foo values (tenk1.*);
NOTICE:  adding missing FROM-clause entry for table "tenk1"
LINE 1: insert into foo values (tenk1.*);
                                ^
INSERT 0 10000
regression=#

So that last is really exactly equivalent to

	insert into foo select * from tenk1;

I do not feel a need to support this sort of thing when there are
multiple VALUES targetlists, but it'd be nice not to break it for the
single-targetlist case.  At least not till we're ready to disable
add_missing_from entirely.

			regards, tom lane
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>I was actually just looking at that and ended up thinking that it might >>be better to deal with it one level down in ExecProject (because it is >>already passing targetlists directly to ExecTargetList). > > I'd vote against that, because (a) ExecProject is used by all executor > node types, and we shouldn't add overhead to all of them for the benefit > of one; (b) ExecProject doesn't actually have any internal state at the > moment. To keep track of which targetlist to evaluate next, it would > not only need some internal state, it would have to be told the current > "es_direction". This stuff fits much better at the exec node level --- > again, I'd suggest looking at Append for a comparison. OK. > But really the executor part of this is not the hard part; what we need > to think about first is what's the impact on the Query datastructure > that the parser/rewriter/planner use. After a quick look, I think changing Query.targetList is too big an impact, and probably unneeded given your suggestion below. One of the problems with the current code is that the targetList in the "VALUES..." case is being used for two purposes -- 1) to define the column types, and 2) to hold the actual data. By putting the data into a new node type, I think the targetList reverts to being just a list of datatypes as it is with INSERT ... SELECT ... > I'm still liking the idea of pushing multi-values into a jointree node > type. Basically this would suggest representing "VALUES ..." as if it > were "SELECT * FROM VALUES ..." (which I believe is actually legal > syntax per spec) --- in the general case you'd need to have a Query node > that has a trivial "col1, col2, col3, ..." targetlist and then the > multiple values lists are in some kind of jointree entry. But possibly > this could be short-circuited somehow, at least for INSERT. I'm liking this too. But when you say "jointree node", are you saying to model the new node type after NestLoop/MergeJoin/HashJoin nodes? These are referred to as "join nodes" in ExecInitNode. Or as you mentioned a couple of times, should this look more like an Append node? Thanks, Joe
Joe Conway <mail@joeconway.com> writes:
> I'm liking this too. But when you say "jointree node", are you saying to
> model the new node type after NestLoop/MergeJoin/HashJoin nodes? These
> are referred to as "join nodes" in ExecInitNode. Or as you mentioned a
> couple of times, should this look more like an Append node?

No, I guess I confused you by talking about the executor representation
at the same time.  This is really unrelated to the executor.  The join
tree I'm thinking of here is the data structure that dangles off
Query.jointree --- it's a representation of the query's FROM clause,
and (at present) can contain RangeTblRef, FromExpr, and JoinExpr nodes.
See the last hundred or so lines of primnodes.h for some details.
The jointree is used by the planner to compute the plan node tree that
the executor will run, but it's not the same thing.

There are basically two ways you could go about this:

1. Make a new jointree leaf node type to represent a VALUES construct,
   and dangle the list of lists of expressions off that.

2. Make a new RangeTblEntry type to represent a VALUES construct, and
   just put a RangeTblRef to it into the jointree.  The expressions
   dangle off the RangeTblEntry.

Offhand I'm not certain which of these would be cleanest.  The second
way has some similarities to the way we handle set operation trees
(UNION et al), so it might be worth looking at that stuff.  However,
being a RangeTblEntry has a lot of baggage (eg, various routines expect
to find an RTE alias, column names, column types, etc) and maybe we
don't need all that for VALUES.

One advantage of the first way is that you could use the same node type
for the "raw parser output" delivered by gram.y.  This is a bit of a
type cheat, because raw parser output is logically distinct from what
parse analysis produces, but we do it in lots of other places too
(JoinExpr for instance is used that way).  You should in any case have
a clear idea of the difference between the raw and analyzed parser
representations --- for instance, the raw form won't contain any
datatype info, whereas the analyzed form must.  This might or might not
need to be visible directly in the VALUES node --- it might be that you
can rely on the datatype info embedded in the analyzed expressions.

			regards, tom lane
Tom Lane wrote: > No, I guess I confused you by talking about the executor representation > at the same time. This is really unrelated to the executor. The join > tree I'm thinking of here is the data structure that dangles off > Query.jointree --- it's a representation of the query's FROM clause, > and (at present) can contain RangeTblRef, FromExpr, and JoinExpr nodes. > See the last hundred or so lines of primnodes.h for some details. > The jointree is used by the planner to compute the plan node tree that > the executor will run, but it's not the same thing. Ah, that helps. Thanks for the explanation. I'll start digging in again... Thanks, Joe
On Thu, Jul 20, 2006 at 08:46:13PM -0400, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> > I'm liking this too. But when you say "jointree node", are you saying to
> > model the new node type after NestLoop/MergeJoin/HashJoin nodes? These
> > are referred to as "join nodes" in ExecInitNode. Or as you mentioned a
> > couple of times, should this look more like an Append node?
>
> No, I guess I confused you by talking about the executor representation
> at the same time.  This is really unrelated to the executor.  The join
> tree I'm thinking of here is the data structure that dangles off
> Query.jointree --- it's a representation of the query's FROM clause,
> and (at present) can contain RangeTblRef, FromExpr, and JoinExpr nodes.
> See the last hundred or so lines of primnodes.h for some details.
> The jointree is used by the planner to compute the plan node tree that
> the executor will run, but it's not the same thing.
>
> There are basically two ways you could go about this:
> 1. Make a new jointree leaf node type to represent a VALUES construct,
>    and dangle the list of lists of expressions off that.
> 2. Make a new RangeTblEntry type to represent a VALUES construct, and
>    just put a RangeTblRef to it into the jointree.  The expressions
>    dangle off the RangeTblEntry.
>
> Offhand I'm not certain which of these would be cleanest.  The second
> way has some similarities to the way we handle set operation trees
> (UNION et al), so it might be worth looking at that stuff.  However,
> being a RangeTblEntry has a lot of baggage (eg, various routines expect
> to find an RTE alias, column names, column types, etc) and maybe we
> don't need all that for VALUES.

I misread that to include SRFs, but it got me thinking... another
possibility would be to change VALUES() so that it was treated as a
function and allowed to have an arbitrary number of parameters.  That
would automatically allow the case of SELECT * FROM VALUES(...).
INSERT would need to learn how to accept SRFs, but that would have the
nice side-effect of allowing

	INSERT INTO table set_returning_function();

Of course, adding the ability for functions to have an arbitrary
argument list could well be more complex than any of the options
discussed thus far... though it would be a very handy feature to have.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
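As a side note on the INSERT-from-SRF idea, feeding an INSERT from a set-returning function already has a workable spelling through SELECT; a trivial sketch using the built-in generate_series, with a purely illustrative target table:

    CREATE TABLE numbers (n integer);

    -- INSERT fed by a set-returning function, via SELECT:
    INSERT INTO numbers
    SELECT g FROM generate_series(1, 3) AS g;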
Tom Lane wrote:
> If I'm reading the spec correctly, VALUES is exactly parallel to SELECT
> in the grammar, which means that to use it in FROM you would need
> parentheses and an alias:
>
>	SELECT ... FROM (SELECT ...) AS foo
>
>	SELECT ... FROM (VALUES ...) AS foo

One of the things I'm struggling with is lack of column aliases.  Would
it be reasonable to require something like this?

	SELECT ... FROM (VALUES ...) AS foo(col1, col2, ...)

The other issue is how to determine column type.  Even better would be
to require (similar to SRF returning record):

	SELECT ... FROM (VALUES ...) AS foo(col1 type1, col2 type2, ...)

This would unambiguously identify the column aliases and types.

Assuming we stick with the spec: SELECT ... FROM (VALUES ...) AS foo

1. How should we assign column names? values1, values2, ...? or col1,
   col2, ...? or ???

2. How should we assign datatypes? Use the first "row" and try to coerce
   the rest to that type?

Joe
Subject: Re: Values list-of-targetlists patch for comments (was Re: [PATCHES] 8.2 features?)
From: Tom Lane
Joe Conway <mail@joeconway.com> writes: > One of the things I'm struggling with is lack of column aliases. Would > it be reasonable to require something like this? > SELECT ... FROM (VALUES ...) AS foo(col1, col2, ...) Requiring column aliases is counter to spec ... > The other issue is how to determine column type. Even better would be to > require (similar to SRF returning record): > SELECT ... FROM (VALUES ...) AS foo(col1 type1, col2 type2, ...) ... and this is even further away from it. As for the names, just use "?column?", same as we do now in INSERT ... VALUES. Anyone who wants to refer to those columns explicitly will need to assign aliases, but if they don't assign aliases, we don't have to do anything very intelligent. As for the types, I believe that the spec pretty much dictates that we apply the same type resolution algorithm as for a UNION. This is fairly expensive and we should avoid it in the case of INSERT ... VALUES, but for VALUES appearing anywhere else I think we have little choice. regards, tom lane
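To make that concrete, a small sketch of the usage being settled on, assuming the feature ends up behaving the way Tom describes; the default column names and the exact resolved types are not meant as a statement of the final behavior, and the data is illustrative only:

    -- VALUES in FROM with an explicit alias and column aliases:
    SELECT v.id, v.label
    FROM (VALUES (1, 'alpha'), (2, 'beta')) AS v(id, label);

    -- Without column aliases the system supplies its own names, and the
    -- column types get resolved UNION-style across all the rows:
    SELECT * FROM (VALUES (1, 'alpha'), (2.5, 'beta')) AS v;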
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> One of the things I'm struggling with is lack of column aliases. Would
>> it be reasonable to require something like this?
>
> Requiring column aliases is counter to spec ...
>
>>	SELECT ... FROM (VALUES ...) AS foo(col1 type1, col2 type2, ...)
>
> ... and this is even further away from it.

I figured as much, but thought I'd ask anyway :-).  I did find something
in the appendix to the spec after sending this:

  Annex C (informative)
  Implementation-dependent elements

  18) Subclause 7.3, "<table value constructor>":
      a) The column names of a <table value constructor> or a
         <contextually typed table value constructor> are
         implementation-dependent.

> As for the names, just use "?column?", same as we do now in INSERT
> ... VALUES.  Anyone who wants to refer to those columns explicitly will
> need to assign aliases, but if they don't assign aliases, we don't have
> to do anything very intelligent.

OK, I just thought "?column?" was ugly and useless.

> As for the types, I believe that the spec pretty much dictates that we
> apply the same type resolution algorithm as for a UNION.  This is fairly
> expensive and we should avoid it in the case of INSERT ... VALUES, but
> for VALUES appearing anywhere else I think we have little choice.

Where do I find that algorithm -- somewhere in nodeAppend.c?

Thanks,

Joe
Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> As for the types, I believe that the spec pretty much dictates that we >> apply the same type resolution algorithm as for a UNION. > Where do I find that algorithm -- somewhere in nodeAppend.c? select_common_type(), in the parser. regards, tom lane
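For a quick feel of what that resolution does today, the same UNION-style algorithm can be exercised with plain SELECTs; a trivial example in which integer and numeric literals are resolved to a common type:

    -- Both branches get coerced to a common type (numeric here), which is
    -- the treatment Tom suggests for the columns of a VALUES list:
    SELECT 1 AS x
    UNION ALL
    SELECT 2.5;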
I've found a problem with the VALUES-as-RTE approach:

regression=# create table src(f1 int, f2 int);
CREATE TABLE
regression=# create table log(f1 int, f2 int, tag text);
CREATE TABLE
regression=# insert into src values(1,2);
INSERT 0 1
regression=# create rule r2 as on update to src do
regression-# insert into log values(old.*, 'old'), (new.*, 'new');
CREATE RULE
regression=# update src set f2 = f2 + 1;
server closed the connection unexpectedly

The problem with this is that the rewriter is substituting Vars
referencing "src" into the values lists of the VALUES RTE, within a
query that looks like a Cartesian product of src and *VALUES*:

regression=# explain update src set f2 = f2 + 1;
                              QUERY PLAN
--------------------------------------------------------------------
 Nested Loop  (cost=0.00..97.62 rows=3880 width=40)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.02 rows=2 width=40)
   ->  Seq Scan on src  (cost=0.00..29.40 rows=1940 width=0)

 Seq Scan on src  (cost=0.00..34.25 rows=1940 width=14)
(5 rows)

The ValuesScan node doesn't have access to the values of the current
row of src ... indeed, the planner doesn't know that it shouldn't put
the VALUES on the outside of the join, as it's done here, so there
*isn't* a current row of src.

AFAICT, the only way to make this work would be to implement SQL99's
LATERAL construct (or something pretty close to it --- I'm not entirely
sure I understand what LATERAL is supposed to do) so that the rewritten
query could be expressed like

	insert into log select ... from src, LATERAL VALUES(src.f1, ...)

That's obviously not something we can get done for 8.2.

We could maybe kluge something to work for 8.2 if we were willing to
abandon the VALUES-as-RTE approach and go back to the notion of some
kind of multiple targetlist in a Query.  I'm disinclined to do that
though, because as I've been working with your patch I've come to agree
that the RTE solution is a pretty clean one.

What I'm inclined to do for 8.2 is to disallow OLD/NEW references in
multi-element VALUES clauses; the feature is still tremendously useful
without that.

			regards, tom lane
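Purely as an illustration of the LATERAL shape sketched above (hypothetical here, since LATERAL was not available in PostgreSQL at the time, so 8.2 could not run this), the rewritten rule query for this example might look roughly like:

    INSERT INTO log
    SELECT v.*
    FROM src,
         LATERAL (VALUES (src.f1, src.f2,     'old'),
                         (src.f1, src.f2 + 1, 'new')) AS v(f1, f2, tag);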
Tom Lane wrote: > I've found a problem with the VALUES-as-RTE approach: > > regression=# create table src(f1 int, f2 int); > CREATE TABLE > regression=# create table log(f1 int, f2 int, tag text); > CREATE TABLE > regression=# insert into src values(1,2); > INSERT 0 1 > regression=# create rule r2 as on update to src do > regression-# insert into log values(old.*, 'old'), (new.*, 'new'); > CREATE RULE > regression=# update src set f2 = f2 + 1; > server closed the connection unexpectedly Does it work if you do regression=# create rule r2 as on update to src do regression-# insert into log values(old.f1, old.f2, 'old'), (new.f1, new.f2, 'new'); ? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote: > What I'm inclined to do for 8.2 is to disallow OLD/NEW references in > multi-element VALUES clauses; the feature is still tremendously useful > without that. Given the timing, this sounds like a reasonable approach. I agree that the feature has lots of interesting uses -- I'd hate to see us lose that. Disallowing OLD/NEW references doesn't contradict the spec in any way AFAIK either. Joe
Alvaro Herrera <alvherre@commandprompt.com> writes: > Does it work if you do > regression=# create rule r2 as on update to src do > regression-# insert into log values(old.f1, old.f2, 'old'), (new.f1, new.f2, 'new'); No, that's not the problem. "*" expansion works just fine here, it's the executor that can't deal with the resulting Vars. regards, tom lane
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> What I'm inclined to do for 8.2 is to disallow OLD/NEW references in
>> multi-element VALUES clauses; the feature is still tremendously useful
>> without that.

> Given the timing, this sounds like a reasonable approach. I agree that
> the feature has lots of interesting uses -- I'd hate to see us lose
> that. Disallowing OLD/NEW references doesn't contradict the spec in any
> way AFAIK either.

I don't think rules are in the spec at all ;-) ... so no, that's not
a problem.  My example demonstrated a pretty likely use:

	create rule r2 as on update to src do
	  insert into log values(old.*, 'old'), (new.*, 'new');

but for the moment we can tell people to work around it the way
they always have:

	create rule r2 as on update to src do
	  insert into log select old.*, 'old' union all select new.*, 'new';

or just use two separate INSERT commands in the rule.

We oughta fix it later, but I don't feel ashamed to have a restriction
like this in the first cut.

			regards, tom lane
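For completeness, a sketch of the "two separate INSERT commands" spelling of that workaround; rules accept a parenthesized list of actions, and the column names follow the example tables used above:

    CREATE RULE r2 AS ON UPDATE TO src DO (
        INSERT INTO log VALUES (old.f1, old.f2, 'old');
        INSERT INTO log VALUES (new.f1, new.f2, 'new')
    );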
Should we wait for someone to actually ask for this before adding it to
the TODO list?  Does it cause a crash now?

---------------------------------------------------------------------------

Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> > Tom Lane wrote:
> >> What I'm inclined to do for 8.2 is to disallow OLD/NEW references in
> >> multi-element VALUES clauses; the feature is still tremendously useful
> >> without that.
>
> > Given the timing, this sounds like a reasonable approach. I agree that
> > the feature has lots of interesting uses -- I'd hate to see us lose
> > that. Disallowing OLD/NEW references doesn't contradict the spec in any
> > way AFAIK either.
>
> I don't think rules are in the spec at all ;-) ... so no, that's not
> a problem.  My example demonstrated a pretty likely use:
>
> 	create rule r2 as on update to src do
> 	  insert into log values(old.*, 'old'), (new.*, 'new');
>
> but for the moment we can tell people to work around it the way
> they always have:
>
> 	create rule r2 as on update to src do
> 	  insert into log select old.*, 'old' union all select new.*, 'new';
>
> or just use two separate INSERT commands in the rule.
>
> We oughta fix it later, but I don't feel ashamed to have a restriction
> like this in the first cut.
>
> 			regards, tom lane

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Should we wait for someone to actually ask for this before adding it
> to the TODO list?

Probably worth adding it to the TODO list so it doesn't get lost.

> Does it cause a crash now?

Nope:

regression=# create table log(f1 int, f2 int, tag text);
CREATE TABLE
regression=# insert into src values(1,2);
INSERT 0 1
regression=# create rule r2 as on update to src do insert into log
values(old.*, 'old'), (new.*, 'new');
ERROR:  VALUES must not contain OLD or NEW references

Joe
On Aug 2, 2006, at 12:47 , Joe Conway wrote: > regression=# create rule r2 as on update to src do insert into log > values(old.*, 'old'), (new.*, 'new'); > ERROR: VALUES must not contain OLD or NEW references > Would it make sense to add a HINT as well, recommending the UNION construct Tom mentioned earlier? On Aug 2, 2006, at 11:52 , Tom Lane wrote: > create rule r2 as on update to src do > insert into log select old.*, 'old' union all new.*, 'new'; > > or just use two separate INSERT commands in the rule. > Or is the general case too, uh, general for a HINT? Michael Glaesemann grzm seespotcode net
Joe Conway wrote: > Bruce Momjian wrote: > > Should we wait for someone to actually ask for this before adding it > > to the TODO list? > > Probably worth adding it to the TODO list so it doen't get lost. Added: o In rules, allow VALUES() to contain a mixture of 'old' and 'new' references -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Michael Glaesemann wrote: > On Aug 2, 2006, at 12:47 , Joe Conway wrote: > > > > regression=# create rule r2 as on update to src do insert into log > > values(old.*, 'old'), (new.*, 'new'); > > ERROR: VALUES must not contain OLD or NEW references > > > > Would it make sense to add a HINT as well, recommending the UNION > construct Tom mentioned earlier? Agreed. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Michael Glaesemann wrote: >> Would it make sense to add a HINT as well, recommending the UNION >> construct Tom mentioned earlier? > Agreed. Done. regards, tom lane
Tom Lane wrote: > I think we could safely list_free the input list in transformInsertRow > as your patch suggests, which would buy back the 144M part. But I don't > believe it's safe at all to free the raw_parser output --- the grammar > sometimes makes multiple links to the same subtree, eg in BETWEEN. In transformExpr the comment implies that it should be safe to reapply to an already transformed expression. What if we free the raw_parser expression list/cells/nodes and replace it with the as-transformed one? Joe
Subject: Re: VALUES clause memory optimization (was: Values list-of-targetlists patch...)
From: Tom Lane
Joe Conway <mail@joeconway.com> writes: > In transformExpr the comment implies that it should be safe to reapply > to an already transformed expression. What if we free the raw_parser > expression list/cells/nodes and replace it with the as-transformed one? How are you going to do the "replace" bit? The entire problem is that you don't know where are all the down-links leading to the subexpression you are currently working on. The reason we could safely list_free inside transformInsertRow is that we know all its callers have just built the passed-in list and so there are no other pointers to it. That doesn't apply in the general case of grammar output. I think in the long run we probably ought to fix things so that the grammar never outputs any multiply-linked trees; that little shortcut has been a continuing source of grief for many reasons. I can't see doing that for 8.2 though. My advice is to get that low-hanging fruit in transformInsertRow and leave the other ideas for 8.3. regards, tom lane
Gavin Sherry <swm@linuxworld.com.au> writes:
> On Fri, 4 Aug 2006, Michael Glaesemann wrote:
>> On Aug 3, 2006, at 23:58 , Tom Lane wrote:
>>> Should we give VALUES its own reference page?  That doesn't quite
>>> seem helpful either.
>>
>> I think we should go for a separate reference page, as VALUES appears
>> to be expanding quite a bit.

> ... with update? I associate it very closely with INSERT. After all,
> INSERT is the only statement where we've had VALUES as part of the
> grammar.

True, but I think that's just a historical artifact.  If you look at the
SQL spec, INSERT ... VALUES and INSERT ... SELECT are not distinct
constructs: they fall out of the fact that VALUES and SELECT are allowed
interchangeably.

	<insert statement> ::=
		INSERT INTO <table name> <insert columns and source>

	<insert columns and source> ::=
		  [ <left paren> <insert column list> <right paren> ] <query expression>
		| DEFAULT VALUES

	<insert column list> ::= <column name list>

and when you trace down <query expression> you find the SELECT and
VALUES options entering at exactly the same place ...

I'd like to see us refactor the docs as necessary to reflect that idea.
Peter is right that this needs some discussion in syntax.sgml as well as
in the reference pages --- but I'm still not very clear on how the
presentation should go.

			regards, tom lane
Tom Lane wrote:
> The reason we could safely list_free inside transformInsertRow is that
> we know all its callers have just built the passed-in list and so there
> are no other pointers to it.  That doesn't apply in the general case of
> grammar output.

What about for the specific case of an InsertStmt? It seems that we
could at least get away with freeing the raw-expression list in that case.

In terms of freeing an entire arbitrary node, could we create a
backend/nodes/freefuncs.c file that does a recursive freeObject()
similar to the way copyObject() does in backend/nodes/copyfuncs.c?

> My advice is to get that low-hanging fruit
> in transformInsertRow and leave the other ideas for 8.3.

OK. This should be safe also, correct?

Thanks,

Joe

8<----------------------------------------
diff -c -r1.341 analyze.c
*** src/backend/parser/analyze.c	2 Aug 2006 01:59:46 -0000	1.341
--- src/backend/parser/analyze.c	2 Aug 2006 05:13:20 -0000
***************
*** 2191,2196 ****
--- 2196,2202 ----
  	for (i = 0; i < sublist_length; i++)
  	{
  		coltypes[i] = select_common_type(coltype_lists[i], "VALUES");
+ 		list_free(coltype_lists[i]);
  	}

  	newExprsLists = NIL;
Joe Conway <mail@joeconway.com> writes:
> What about for the specific case of an InsertStmt? It seems that we
> could at least get away with freeing the raw-expression list in that case.

Not sure ... what about rules, BETWEEN, yadda yadda?

> In terms of freeing an entire arbitrary node, could we create a
> backend/nodes/freefuncs.c file that does a recursive freeObject()
> similar to the way copyObject() does in backend/nodes/copyfuncs.c?

We got rid of freefuncs.c years ago, for good and sufficient reasons
that have not gone away.  The problem is exactly that you don't know
whether any shortcuts were taken in constructing the node tree: multiple
links, pointers to constants, pointers to stuff that wasn't supposed to
be freed are all severe hazards.

>> My advice is to get that low-hanging fruit
>> in transformInsertRow and leave the other ideas for 8.3.

> OK. This should be safe also, correct?

Yes, but what's your point?  The case that seems worth trying to
optimize is "INSERT INTO foo VALUES ... real long list ...".  Certainly
the MySQL crowd is not going to be stressing transformValuesClause,
because they don't know it exists.

$ mysql test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12 to server version: 5.0.22

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> values (1),(2);
ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near 'values (1),(2)' at line 1
mysql> select * from (values (1),(2)) as x(y);
ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near 'values (1),(2)) as x(y)' at line 1
mysql> select * from foo where x in (values (1),(2));
ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near '1),(2))' at line 1
mysql>

mysql shortcomings aside, I don't really see the use-case for enormously
long VALUES lists anywhere except the bulk-data-load scenario, ie,
exactly INSERT ... VALUES.  So I don't feel a need to be real tense in
transformValuesClause.

			regards, tom lane
Subject: Re: [PATCHES] [DOCS] Values list-of-targetlists patch for comments (was Re:
From: Peter Eisentraut
Am Freitag, 4. August 2006 04:50 schrieb Tom Lane: > I'd like to see us refactor the docs as necessary to reflect that idea. > Peter is right that this needs some discussion in syntax.sgml as well as > in the reference pages --- but I'm still not very clear on how the > presentation should go. I'm beginning to think that VALUES might be a separate command after all. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Wed, Aug 09, 2006 at 03:05:02PM +0200, Peter Eisentraut wrote:
> Am Freitag, 4. August 2006 04:50 schrieb Tom Lane:
> > I'd like to see us refactor the docs as necessary to reflect that
> > idea. Peter is right that this needs some discussion in
> > syntax.sgml as well as in the reference pages --- but I'm still
> > not very clear on how the presentation should go.
>
> I'm beginning to think that VALUES might be a separate command after
> all.

What's below definitely bolsters that idea :)

postgres=# VALUES(1);
 column1
---------
       1
(1 row)

However, there are some oddities:

postgres=# SELECT * FROM (VALUES (1,2)) AS foo(bar,baz);
 bar | baz
-----+-----
   1 |   2
(1 row)

postgres=# (VALUES (1,2)) AS foo(bar,baz);
ERROR:  syntax error at or near "AS"
LINE 1: (VALUES (1,2)) AS foo(bar,baz);

Does the SQL standard have anything to say about assigning identifiers
both to the entire VALUES() statement and to its columns when the
VALUES() statement is by itself?

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778       AIM: dfetter666
                             Skype: davidfetter

Remember to vote!
David Fetter <david@fetter.org> writes:
> However, there are some oddities:
> postgres=# SELECT * FROM (VALUES (1,2)) AS foo(bar,baz);
> [ works ]
> postgres=# (VALUES (1,2)) AS foo(bar,baz);
> ERROR:  syntax error at or near "AS"

This is per spec.  Effectively, AS is part of the FROM-clause syntax,
not part of a standalone command.  You can't write this either:

regression=# (select 1,2) as foo(bar,baz);
ERROR:  syntax error at or near "as"
LINE 1: (select 1,2) as foo(bar,baz);
                     ^

			regards, tom lane
Has this been addressed?

---------------------------------------------------------------------------

Joe Conway wrote:
> Tom Lane wrote:
> > The reason we could safely list_free inside transformInsertRow is that
> > we know all its callers have just built the passed-in list and so there
> > are no other pointers to it.  That doesn't apply in the general case of
> > grammar output.
>
> What about for the specific case of an InsertStmt? It seems that we
> could at least get away with freeing the raw-expression list in that case.
>
> In terms of freeing an entire arbitrary node, could we create a
> backend/nodes/freefuncs.c file that does a recursive freeObject()
> similar to the way copyObject() does in backend/nodes/copyfuncs.c?
>
> > My advice is to get that low-hanging fruit
> > in transformInsertRow and leave the other ideas for 8.3.
>
> OK. This should be safe also, correct?
>
> Thanks,
>
> Joe
>
> 8<----------------------------------------
> diff -c -r1.341 analyze.c
> *** src/backend/parser/analyze.c	2 Aug 2006 01:59:46 -0000	1.341
> --- src/backend/parser/analyze.c	2 Aug 2006 05:13:20 -0000
> ***************
> *** 2191,2196 ****
> --- 2196,2202 ----
>   	for (i = 0; i < sublist_length; i++)
>   	{
>   		coltypes[i] = select_common_type(coltype_lists[i], "VALUES");
> + 		list_free(coltype_lists[i]);
>   	}
>
>   	newExprsLists = NIL;

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Has this been addressed?
>
>> Tom Lane wrote:
>>> The reason we could safely list_free inside transformInsertRow is that
>>> we know all its callers have just built the passed-in list and so there
>>> are no other pointers to it.  That doesn't apply in the general case of
>>> grammar output.

After another look, even in this case there could be other pointers.
Starting around line 667 of analyze.c:

8<------------------
	foreach(lc, selectQuery->targetList)
	{
		TargetEntry *tle = (TargetEntry *) lfirst(lc);
		Expr	   *expr;

		if (tle->resjunk)
			continue;
		if (tle->expr &&
			(IsA(tle->expr, Const) || IsA(tle->expr, Param)) &&
			exprType((Node *) tle->expr) == UNKNOWNOID)
			expr = tle->expr;
		else
			expr = (Expr *) makeVar(rtr->rtindex,
									tle->resno,
									exprType((Node *) tle->expr),
									exprTypmod((Node *) tle->expr),
									0);
		exprList = lappend(exprList, expr);
	}

	/* Prepare row for assignment to target table */
	exprList = transformInsertRow(pstate, exprList,
								  stmt->cols,
								  icolumns, attrnos);
8<------------------

So in the UNKNOWNOID case, it wouldn't be safe to free the node after
transformation because it originates in the grammar output.  The *only*
safe thing to free up would be the input exprList itself.  Not much to
be saved there.

>>> My advice is to get that low-hanging fruit
>>> in transformInsertRow and leave the other ideas for 8.3.

This one case doesn't provide that much memory savings by itself anyway.
I guess we'll just live with it until we can come up with a safe way to
free grammar output.

Joe