Thread: partial JOIN (was: ID column naming convention)
Hi, Recent interesting discussion on the list, on (just) naming convention reminded me of a related problem which I haven't resolved myself, jet. As slowly I learn SQL (like a blind dog in a meat market), currently I've just started to use table JOINS more extensively ... and I often bump on a "two columns have the same name" error. The point is, that my "large object" often contain sets of (quite) identical component objects; like: a SLED has LEFT_RUNNER and RIGHT_RUNNER, both referring to the same RUNNERS table. And yes, I have sattled with naming convention where table is a plural noun (table RUNNERS), while primary key column name is an unspeciffic singular noun (A_RUNNER). So, when I join the SLEDS table with RUNNERS table (twice: left and right runner, to get a complete bom for a sled): SELECT * FROM sleds s JOIN runners l ON (s.left=l.runner) JOIN runners r ON (s.right=r.runner); .... it doesn't work, when SLED table and RUNNERS table both have the same column (like LENGTH). And it wouldn't work even if I called sled.length a SLED.SLED_LENGTH and runner.length a RUNNER.RUNNER_LENGTH, since RUNNERS are joined twice. Is there an sql-language level solution (idiom) to cope with such queries? And I don't mean: SELECT s.*,r.*,l.* FROM sleds s JOIN runners l ON (s.left=l.runner) JOIN runners r ON (s.right=r.runner); ...since in such case, there would be even more duplicate column names then in the first example. The only way I know to avoid the column name duplication is to explicity select column list: SELECT s.sled,s.length,s....,r.runner as right,r.length as right_length,r....,l.runner as left,l.length as left_length,l.* FROM sleds s JOIN runners l ON (s.left=l.runner) JOIN runners r ON (s.right=r.runner); .... which is truely overtalkative (and thus obfuscates future query analize during code maintenance). To explain the problem a little better, here is a "pseudoSQL" query example, which should ilustrate the problem/solution: SELECT * FROM sleds s JOIN runners(length as left_length, weight as left_weight, runner as left) l USING (left) JOIN runners(length as right_length, weight as right_weight, runner as right) r USING (right); I found one way to get "almost exactly" to this point, that is by createing VIEWs to do the job of renameing columns: CREATE VIEW right_runner AS SELECT length as right_length, weight as right_weight, runner as right FROM runners; CREATE VIEW left_runner AS SELECT length as left_length, weight as right_weight, runner as left FROM runners; .... one problem with this solution is that during the livetime of an application, columns of tables (like RUNNERS) change (like by adding new attributes), and it's quite tricky to promote those changes smoothly up to the top "SELECT * FROM sleds ...", particularly if that last one is actually a CREATED VIEW. ... the other problem is that it exploads application/db schema, and thus make it more difficult to "comprehend" during future maintenance. A real live example of such query is quering addreses of a person: residential_address, office_address, delivery_address, etc... all of them from a single ADDRESSES table. Or people telephone numbers from PHONE_NUMERS table. I have seen the WINDOW-PARTITION-OVER syntax very similar to this problem, but countrary to JOINS, WINDOW can be defined outside of a select column list, which is helpfull. I havent found such construct for JOINs. Is there any? Is there an ordinary SQL phrase/idiom (laconic/tarse in it's form) to get the above result? -R
The only way I know to avoid the column name
duplication is to explicity select column list:
SELECT s.sled,s.length,s....,r.runner as right,r.length as
right_length,r....,l.runner as left,l.length as left_length,l.* FROM
sleds s JOIN runners l ON (s.left=l.runner) JOIN runners r ON
(s.right=r.runner);
.... which is truely overtalkative (and thus obfuscates future query
analize during code maintenance).
Skimmed...
Using explicit column names is expected - using "*" in non-trivial and production queries is not.
You can move the aliases if you would like.
SELECT *
FROM tablea (col1, col2, col4)
JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
JOIN tableb AS tb2
(col1, col6, col7) USING (col1)
The "USING" clause ensure that only a single "col1" appears in the output.
For all other columns in the duplicate join you need to provide a context-specific alias.
David J.
W dniu 24.10.2015 o 15:00, David G. Johnston pisze: > On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu > <mailto:rafal@ztk-rp.eu>>wrote: [----------------------] > > Using explicit column names is expected - using "*" in non-trivial and > production queries is not. > > You can move the aliases if you would like. > > SELECT * > FROM tablea (col1, col2, col4) > JOIN tableb AS tb1 (col1, col3, col5) USING (col1) > JOIN tableb AS tb2 > > (col1, col6, col7) USING (col1) I knew there must have been something like this. thenx, -R
W dniu 24.10.2015 o 21:03, Rafal Pietrak pisze: > > > W dniu 24.10.2015 o 15:00, David G. Johnston pisze: >> On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu >> <mailto:rafal@ztk-rp.eu>>wrote: > [----------------------] >> >> Using explicit column names is expected - using "*" in non-trivial and >> production queries is not. >> >> You can move the aliases if you would like. >> >> SELECT * >> FROM tablea (col1, col2, col4) >> JOIN tableb AS tb1 (col1, col3, col5) USING (col1) >> JOIN tableb AS tb2 >> >> (col1, col6, col7) USING (col1) > > I knew there must have been something like this. Upss. Almost, but not quite. I've just read the manual on that (http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html) and it looks like "col1", "col2", etc in the above example are column *aliases*. Right? So I have to list *all* the columns of the aliased table irrespectively if I need any of them within the output, or not. It's a pity standard didn't choose to make column aliasing optional, allowing for cherry pick what's aliased like following: .. JOIN table AS tb(column7 [as alias1], column3 [as alias2],...) thenx anyway, "Mandatory" column aliasing is helpfull too. -R
On Sat, Oct 24, 2015 at 3:42 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
Good luck,
Dane
W dniu 24.10.2015 o 21:03, Rafal Pietrak pisze:
>
>
> W dniu 24.10.2015 o 15:00, David G. Johnston pisze:
>> On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu
>> <mailto:rafal@ztk-rp.eu>>wrote:
> [----------------------]
>>
>> Using explicit column names is expected - using "*" in non-trivial and
>> production queries is not.
>>
>> You can move the aliases if you would like.
>>
>> SELECT *
>> FROM tablea (col1, col2, col4)
>> JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
>> JOIN tableb AS tb2
>>
>> (col1, col6, col7) USING (col1)
>
> I knew there must have been something like this.
Upss. Almost, but not quite. I've just read the manual on that
(http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html)
and it looks like "col1", "col2", etc in the above example are column
*aliases*. Right?
So I have to list *all* the columns of the aliased table irrespectively
if I need any of them within the output, or not.
It's a pity standard didn't choose to make column aliasing optional,
allowing for cherry pick what's aliased like following:
.. JOIN table AS tb(column7 [as alias1], column3 [as alias2],...)
thenx anyway, "Mandatory" column aliasing is helpfull too.
-R
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
You may be able to accomplish that using aliased sub-selects as in-line views. The purpose of the sub-selects in this use-case is simply to cherry pick the columns you want.
SELECT *
FROM
(SELECT col1, col2, col4 FROM tablea) AS iv
JOIN (SELECT co1, col3, col5 FROM tableb) AS tb1 USING (col1))
JOIN (SELECT col1, col6, col7 FROM tableb) AS tb22 USING (col1)
FROM
(SELECT col1, col2, col4 FROM tablea) AS iv
JOIN (SELECT co1, col3, col5 FROM tableb) AS tb1 USING (col1))
JOIN (SELECT col1, col6, col7 FROM tableb) AS tb22 USING (col1)
Please note, this may be a performance nightmare for large tables because w/o a WHERE clause that can be pushed down to the sub-selects each sub-select will do a full table scan.
Please note that the 3rd JOIN clause is nutty (I translated it from your original) because why would you join a table to itself just to select a different set of columns?
Good luck,
Dane
On Sat, Oct 24, 2015 at 5:23 PM, Dane Foster <studdugie@gmail.com> wrote:
On Sat, Oct 24, 2015 at 3:42 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
W dniu 24.10.2015 o 21:03, Rafal Pietrak pisze:
>
>
> W dniu 24.10.2015 o 15:00, David G. Johnston pisze:
>> On Sat, Oct 24, 2015 at 6:41 AM, Rafal Pietrak <rafal@ztk-rp.eu
>> <mailto:rafal@ztk-rp.eu>>wrote:
> [----------------------]
>>
>> Using explicit column names is expected - using "*" in non-trivial and
>> production queries is not.
>>
>> You can move the aliases if you would like.
>>
>> SELECT *
>> FROM tablea (col1, col2, col4)
>> JOIN tableb AS tb1 (col1, col3, col5) USING (col1)
>> JOIN tableb AS tb2
>>
>> (col1, col6, col7) USING (col1)
>
> I knew there must have been something like this.
Upss. Almost, but not quite. I've just read the manual on that
(http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html)
and it looks like "col1", "col2", etc in the above example are column
*aliases*. Right?
So I have to list *all* the columns of the aliased table irrespectively
if I need any of them within the output, or not.
It's a pity standard didn't choose to make column aliasing optional,
allowing for cherry pick what's aliased like following:
.. JOIN table AS tb(column7 [as alias1], column3 [as alias2],...)
thenx anyway, "Mandatory" column aliasing is helpfull too.
-R
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-generalYou may be able to accomplish that using aliased sub-selects as in-line views. The purpose of the sub-selects in this use-case is simply to cherry pick the columns you want.SELECT *
FROM
(SELECT col1, col2, col4 FROM tablea) AS iv
JOIN (SELECT co1, col3, col5 FROM tableb) AS tb1 USING (col1))
JOIN (SELECT col1, col6, col7 FROM tableb) AS tb22 USING (col1)Please note, this may be a performance nightmare for large tables because w/o a WHERE clause that can be pushed down to the sub-selects each sub-select will do a full table scan.Please note that the 3rd JOIN clause is nutty (I translated it from your original) because why would you join a table to itself just to select a different set of columns?
Good luck,
Dane
For the record SELECT * in my example is absolutely the wrong thing to do but your original didn't leave me w/ any other option.
Dane
W dniu 24.10.2015 o 23:25, Dane Foster pisze: > > On Sat, Oct 24, 2015 at 5:23 PM, Dane Foster <studdugie@gmail.com > <mailto:studdugie@gmail.com>> wrote: > [--------------------] > -- > Sent via pgsql-general mailing list > (pgsql-general@postgresql.org <mailto:pgsql-general@postgresql.org>) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > > You may be able to accomplish that using aliased sub-selects as > in-line views. The purpose of the sub-selects in this use-case is > simply to cherry pick the columns you want. > SELECT * > FROM > (SELECT col1, col2, col4 FROM tablea) AS iv > JOIN (SELECT co1, col3, col5 FROM tableb) AS tb1 USING (col1)) > JOIN (SELECT col1, col6, col7 FROM tableb) AS tb22 USING (col1) > > Please note, this may be a performance nightmare for large tables > because w/o a WHERE clause that can be pushed down to the > sub-selects each sub-select will do a full table scan. Yes. And that's why this is not truely an option. I'd rather give all coluns aliases (when coding), then opt for subquery on every execute. > > Please note that the 3rd JOIN clause is nutty (I translated it from > your original) because why would you join a table to itself just to > select a different set of columns? One example (a bit artificial, I know) might be the address data, for waybill: create table purchases( basket int, customer int, delivery int, ...); select * from purchases p join buyers b(customer, city, address) using (customer) join buyers d (delivery, to_city, to_address, to_zip) using (delivery); ... or something like that. ZIP code is not actually needed to indicate customer (SSN might be instead). But I wouldn't agrue if real life programming actually needs that. I've just wanted to have the most generic example I've imagined. > > Good luck, > > Dane > > > For the record SELECT * in my example is absolutely the wrong thing to > do but your original didn't leave me w/ any other option. > Hmmm. I've seen people say that. I do keep that in mind, but frankly I actually never had to avoid that to get my code working (and maintained). I do that sometimes to limit the bandwidth necesery to deliver the results, but not so often. But I'd say, that "the standard" sort of does that (i.e. the star) notoriusly: 1. with table aliasing (the case we are discussing now), standard expects us to give column aliases *in order* they are defined within the aliased table - that's nothing else but a "hidden star" somwhere there. And I really wish they did it without that. 2. see the systax of INSERT: a list of column names to be prowided with values is optional, and when you don't give it, it's like you've written "a star" in its place. This I find *very bad* and never use it myself. But standard people thought otherwise. So personally, I don't see a star in a select list so harmfull, quite the oposit. -R
On 10/25/15 2:30 AM, Rafal Pietrak wrote: > So personally, I don't see a star in a select list so harmfull, quite > the oposit. Using * outside the database is generally very dangerous. Using it *inside* the database can be very useful, because frequently it's exactly what you want (give me all the columns, dammit!). In particular, I find it useful in views where I want the original data, along with some calculated or other values. For example, SELECT nspname, c.oid, * FROM pg_class c LEFT JOIN pg_namespace ON... But there's also times I've wanted a way to manipulate what * would normally do. In particular, *_except_for(field_list), and *_replace( regexp_replace to run on each field name). If those existed (and maybe a way to combine them), it wouldn't be terribly hard for you do handle your sled query with something like: SELECT s.*, l.*( s/.*/left_%/ ), r.*( s/.*/right_& ) FROM sled s JOIN runner l ON ... JOIN runner r ON ... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
W dniu 26.10.2015 o 01:49, Jim Nasby pisze: > On 10/25/15 2:30 AM, Rafal Pietrak wrote: [----------------] > But there's also times I've wanted a way to manipulate what * would > normally do. In particular, *_except_for(field_list), and *_replace( > regexp_replace to run on each field name). If those existed (and maybe a > way to combine them), it wouldn't be terribly hard for you do handle > your sled query with something like: > > SELECT s.*, l.*( s/.*/left_%/ ), r.*( s/.*/right_& ) > FROM sled s JOIN runner l ON ... JOIN runner r ON ... It certainly look nasty. But in the long run I strongly believe in peoples ability to "settle for conventions", and I think I could find my ways with that construct, if only it was possible. Not that I'd prefere it to the explicit aliasing for aliased table coluns, syntactically inside column alias list. My personal point of view is that: 1. "a star" in select is not so harmfull, since backend gives out the names along with the data; and you cen see them in psql output and retrieve them from client library (libpq) thus adopting for any order "a star" happen to create. 2. in all other cases (from which I know: table aliases and inserts), the backend does not interfere only when your error in arrangeing the list results in a data type missmatch. this is wrong. But anyway, thenx for pointing me to column aliasing, which I didn't know before. -R