Re: Proposal: RETURNING primary_key() - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Proposal: RETURNING primary_key()
Date
Msg-id CAMsr+YGH2RkxeEpOzTqb=Lsn_CDXwBE0b03=t81R4bt-wLUoCw@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: RETURNING primary_key()  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: Proposal: RETURNING primary_key()
List pgsql-hackers
On 11 March 2016 at 16:00, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
Igal, thanks for the analysis.

Craig>Part of the question for Pg is what exactly we should and should not be returning.

I think the following might be a good starting point: return set of columns that would identify the inserted row(s).
E.g. columns of any primary key would do. Columns of any unique key would do as well.
"returning *" would do as well, however it would return excessive columns, thus it would be less efficient.

I do not think it makes sense to tie "generated keys" to sequences or things like that.

That's why (sorry, Igal) I'd like to see some more tests for cases other than identity columns. How is GENERATED ALWAYS handled, if supported? What about if it's on a UNIQUE column? How about a PRIMARY KEY whose value is assigned by a DEFAULT or by a trigger?

Based on the rather funky behaviour Igal found I suspect the answer will be "nothing much" for all of those, i.e. it just doesn't work with other drivers/vendors. But I'd like to know. I
 
2) Same for multicolumn keys:  Pg just returns (col1, col2) == (42, 146). Then client would be able to locate the row via "where col1=42 and col2=146

Yeah, I was wondering about composite PKs.  I think Igal focused only on generated synthetic keys, which are after all overwhelmingly common case when getting generated keys.

3) If multiple unique keys present, it is fine if Pg returns one or the another depending on the phase of the moon. Yet more compact key would be preferable to save on bandwidth.

I disagree there. Behavour must be well-defined and predictable unless it's really unavoidable.
 
I think naming the resulting column(s) like "generated_key" / "generated_keys" does not make much sense. Especially, for multi-column keys.

Yeah. At least in PgJDBC where it's a separate resultset (IIRC), so you have metadata that means you don't have to guess column names etc.
 
If ctid was update-proof, it could could do. Unfortunately, ctid might easily get changed.

Indeed. Especially since many of the apps that want to fetch generated keys will be connection-pool oriented apps doing optimistic concurrency control - ORMs and the like. So they won't be able to hold the transaction that added the row open (to hold locks and protect against vacuum) while fetching more info about the row. That'd be quite undesirable for performance anyway, since it'd force at least one extra round-trip; you couldn't pipeline the query for more info about the row until you knew the ctid of the inserted row.

using ctid is a nonstarter IMO, at least as far as the client goes.

PS. Frankly speaking, I feel "generated keys" is more like a "plug & pray" kind of API. ORMs should know the column names of the primary keys => ORMs should use "return specific column names" API, not just "return something generated".

Yep. There are many "should"s.  I absolutely agree that this is one of them.

One reason some clients do it this way is that earlier versions of the JDBC API didn't have the String[] generatedKeys form of prepareStatement. So they had to cope with not being able to ask for specific cols and getting whatever the DB handed them.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: The plan for FDW-based sharding
Next
From: Daniel Gustafsson
Date:
Subject: Re: Refectoring of receivelog.c