Thread: Idea: INSERT INTO ... NATURAL SELECT ...
Hi everyone, I started a discussion on the hackers mailing list regarding a patch that was made more than two years ago but was never merged for lack of time then and lack of interest and time now (see https://www.postgresql.org/message-id/flat/709e06c0-59c9-ccec-d216-21e38cb5ed61@joh.to for the original thread and https://www.postgresql.org/message-id/840eb7b0-17ae-fe52-1643-cd7395eed5df%40berkvens.net for the new thread). During that discussion, I arrived at the standpoint that the original idea was perhaps not the best idea to start with after all. I write quite some handmade SQL, and usually, SQL allows me to write what I want to express in a compact, clear fashion. There is one exception that I keep running into. I frequently need to insert manually entered values into various tables with many columns. Of course, INSERT INTO ... (...) {SELECT | VALUES} ... allows me to do just that, however, it is very hard to find which values will go into which column that way: column names and values are not near to each other in this syntax. The patch that I initially mailed about implements a syntax that MySQL has had for a long time: INSERT INTO ... SET a = 1, b = 2, etc. That looks okay on first sight but has several drawbacks. One is that it is non-standard, but a more significant drawback is that it allows only one row to be inserted per statement. The idea that I got was the following, and I'd like to know what other people think about this. If other people think it's a good idea too and nobody has objections against it, I'd be willing to try and create a patch for an implementation. The syntax that I'm proposing is: INSERT INTO table NATURAL query [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] The idea of the NATURAL keyword is that one does not explicitly name the columns of the target table in parentheses, and does not use its default ordering of columns if one does not name any columns, but instead have the NATURAL keyword trigger behavior where the column names to be used in the target table are determined from the column names produced by the query instead. For example: CREATE TABLE test_table(a INTEGER, b INTEGER, c INTEGER, d INTEGER); These two would be equivalent: INSERT INTO test_table (b, c, a) VALUES (3, 5, 7); INSERT INTO test_table NATURAL SELECT 3 AS b, 5 AS c, 7 AS a; As would these two: INSERT INTO test_table (b, c, a) SELECT source.foo, source.bar, source.beep FROM source; INSERT INTO test_table NATURAL SELECT source.foo AS b, source.bar AS c, source.beep AS a FROM source; These examples are of course contrived, and the benefit only really shows when the table has many columns. The query would fail to execute if one (or more) of the column names in the query does not exist in the target table, or if a column name is used more than once in the query. Everything else works just as you would expect, the behavior is identical to using a normal SELECT or VALUES where one specifies the column names by hand. So, specifically highlighting the differences, these queries would fail: INSERT INTO test_table NATURAL SELECT 1 AS does_not_exist, 2 AS also_nonexistent; INSERT INTO test_table NATURAL SELECT 1 AS a, 2 AS a; Anyone with any thoughts about this? An implementation would make inserting data into wide tables by hand very much easier. Because of the placement of the NATURAL keyword, I don't think this will conflict with any current or future proposal from the SQL standard (except maybe for this one :-) ). 
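As a further illustration of the multi-row advantage over MySQL's SET syntax, here is a sketch of how several rows might be inserted under the proposed (hypothetical, as yet unimplemented) syntax, with a VALUES list whose alias supplies the column names:

INSERT INTO test_table NATURAL
SELECT * FROM (VALUES (3, 5, 7), (4, 6, 8)) AS v(b, c, a);

This would be equivalent to:

INSERT INTO test_table (b, c, a) VALUES (3, 5, 7), (4, 6, 8);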
With kind regards,
Sven Berkvens-Matthijsse

PS: I have not subscribed to the SQL list yet; please copy me in explicitly.
On Fri, Feb 1, 2019 at 8:07 AM Sven Berkvens-Matthijsse <sven@postgresql.berkvens.net> wrote:
> <snip>
>
> INSERT INTO test_table NATURAL SELECT
> 1 AS does_not_exist, 2 AS also_nonexistent;
>
> INSERT INTO test_table NATURAL SELECT 1 AS a, 2 AS a;
>
> Anyone with any thoughts about this? An implementation would make
> inserting data into wide tables by hand very much easier. Because of the
> placement of the NATURAL keyword, I don't think this will conflict with
> any current or future proposal from the SQL standard (except maybe for
> this one :-) ).
Hi Sven,
I can clearly see the benefits of your proposal from a custom SQL writing perspective. Personally, I would just write this capability as a DSL in a higher-level language and have it translated into the ugly SQL you're trying to avoid (the kind that splits field names and values far apart, visually).

I have no idea what the implications of your NATURAL keyword would be for the implementation of Postgres itself, but I do think (as another person who writes manual SQL on occasion) that your proposal has merit for this narrow use case. Personally, I don't think the use case is substantial enough to merit implementing a custom keyword unless there are a lot more people with use cases that would also benefit. But if you can write it as an optional module that can be installed outside of core, why not? Just my two cents.
Sincerely,
Steve
Hi Sven,

in many cases I prefer "copy from" to import data, so I can create my test data via spreadsheet and CSV. Sometimes it is helpful to do some shell scripting to create the data, or to import it with psql.

But if you prefer to have column names and column data near each other, why not use JSON?

postgres=# create table test (i integer, t text);
postgres=# insert into test (select * from
json_to_recordset('[{"i":1,"t":"foo"},{"i":"7","t":"bar"}]')
as x(i int, t text));
INSERT 0 2
postgres=# select * from test;
 i |  t
---+-----
 1 | foo
 7 | bar
(2 rows)

Regards Martin

On 01.02.2019 at 17:06, Sven Berkvens-Matthijsse wrote:
> <snip>
Hi Steve,

On 01/02/2019 18.32, Steve Midgley wrote:
> <snip>
>
> I can clearly see the benefits of your proposal from a custom SQL
> writing perspective. Personally, I would just write this capability as
> a DSL in a higher-level language and have it translated into the ugly
> SQL you're trying to avoid (the kind that splits field names and
> values far apart, visually).

Sure, that's definitely an option! The INSERT statements I have are usually part of larger SQL files that do other things as well, but there's no reason I couldn't write a preprocessor or something similar that expands a custom notation into the usual INSERT statement format.

> I have no idea what the implications of your NATURAL keyword would be
> for the implementation of Postgres itself, but I do think (as another
> person who writes manual SQL on occasion) that your proposal has merit
> for this narrow use case.

Maybe it's just not useful that often because people don't write much SQL by hand these days (maybe they never did, I don't know).

> Personally, I don't think the use case is substantial enough to merit
> implementing a custom keyword unless there are a lot more people with
> use cases that would also benefit. But if you can write it as an
> optional module that can be installed outside of core, why not? Just
> my two cents.

I'm afraid PostgreSQL's plug-in architecture wouldn't allow my proposal to be implemented that way (unless I'm mistaken and didn't read and/or understand the documentation correctly).

Thanks for your response and thoughts!

Sven
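For what it's worth, such a preprocessor does not even need a separate language: here is a sketch in plain SQL, using PostgreSQL's format() and string_agg() to assemble the verbose INSERT from name/value pairs. The kv alias and ord ordering column are made up for this example, and a real version should run the names and values through quote_ident() and quote_literal() for safety:

SELECT format(
    'INSERT INTO test_table (%s) VALUES (%s);',
    string_agg(key, ', ' ORDER BY ord),
    string_agg(value, ', ' ORDER BY ord))
FROM (VALUES (1, 'b', '3'), (2, 'c', '5'), (3, 'a', '7'))
    AS kv(ord, key, value);

This produces: INSERT INTO test_table (b, c, a) VALUES (3, 5, 7);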
Hi Martin,

On 01/02/2019 19.41, Martin Stöcker wrote:
> in many cases I prefer "copy from" to import data, so I can create my
> test data via spreadsheet and CSV.

Sure, that works, but it would then be useful if the COPY command would actually read the first line of a CSV file and use it to find the target columns in the table, which it does not.

> Sometimes it is helpful to do some shell scripting to create the data,
> or to import it with psql.

Yes, that definitely works.

> But if you prefer to have column names and column data near each
> other, why not use JSON?
>
> postgres=# insert into test (select * from
> json_to_recordset('[{"i":1,"t":"foo"},{"i":"7","t":"bar"}]')
> as x(i int, t text));

That also works, but it requires one to name all the columns and their types in the "AS x" part, which makes the statement very verbose. My proposal would not require the types to be stated and would only require the column names in the "data part" of the statement.

Thanks for your thoughts!

With kind regards,
Sven
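On the verbosity point, PostgreSQL's json_populate_recordset() takes the record type from the table itself, so (assuming it behaves as documented) neither the column names nor the types have to be repeated in an "AS x" clause, and missing keys become NULL. A minimal sketch, reusing Martin's test table:

INSERT INTO test
SELECT * FROM json_populate_recordset(
    NULL::test,
    '[{"i":1,"t":"foo"},{"i":7,"t":"bar"}]');

This keeps each column name right next to its value, much like the NATURAL proposal, at the cost of writing the rows as JSON.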