Thread: table aliasing problem with 6.5...
odd problem in postgres 6.5. basically, if you use aliased tables _and_ use the full table name, postgres goes nuts. ive verified this on two separate databases with the same postgres installation. (PostgreSQL 6.5.0 on i686-pc-linux-gnu, compiled by egcc) pics=> explain pics-> SELECT pd.picnum, pd.picname, d.dirid, d.diskid, pd.tstamp, pd.mdsum, pd.fsize, pics-> d.dirname FROM picdata pd,dirs d WHERE picdata.dirid=dirs.dirid; NOTICE: QUERY PLAN: Hash Join (cost=35573920.00 rows=1073741849 width=68) -> Nested Loop (cost=19621602.00 rows=483339288 width=64) -> Nested Loop (cost=143382.61 rows=3528024 width=60) -> Seq Scan on picdata pd (cost=1205.82 rows=25752width=40) -> Seq Scan on dirs d (cost=5.52 rows=137 width=20) -> Seq Scan on dirs (cost=5.52rows=137 width=4) -> Hash (cost=1205.82 rows=25752 width=4) -> Seq Scan on picdata (cost=1205.82 rows=25752width=4) picdata contains 25752 rows. dirs contains 137 rows. 483339288 is a cartesian join across (seemingly) pd, d, and dirs. 1073741849 doesnt seem to be anything that i can figure out. 3528024 is from a cartesian join across pd and d ( or picdata and dirs ) but changing 'dirs.dirid' to 'd.dirid' and 'picdata.dirid' to 'pd.dirid' ( using the alised tables ) fixes the excessive row problem: pics=> explain pics-> SELECT pd.picnum, pd.picname, d.dirid, d.diskid, pd.tstamp, pd.mdsum, pd.fsize, pics-> d.dirname FROM picdata pd,dirs d WHERE pd.dirid=d.dirid; NOTICE: QUERY PLAN: Nested Loop (cost=1202.40 rows=25752 width=64) -> Seq Scan on dirs d (cost=5.52 rows=137 width=20) -> Index Scan usingpd_dirid_idx on picdata pd (cost=8.74 rows=25752 width=44) with another database: ircbot=> explain ircbot-> SELECT u.usernum, u.nick, u.botnick, u.passwd, um.hostmask ircbot-> FROM users u, usermasks um ircbot-> WHERE users.usernum=usermasks.usernum; NOTICE: QUERY PLAN: Hash Join (cost=50951.80 rows=3932364 width=60) -> Nested Loop (cost=30348.58 rows=623971 width=56) -> Nested Loop (cost=444.54 rows=9313 width=52) -> Seq Scan on users u (cost=3.21 rows=67 width=40) -> SeqScan on usermasks um (cost=6.59 rows=139 width=12) -> Seq Scan on users (cost=3.21 rows=67 width=4) -> Hash (cost=6.59 rows=139 width=4) -> Seq Scan on usermasks (cost=6.59 rows=139 width=4) ircbot=> explain ircbot-> SELECT u.usernum, u.nick, u.botnick, u.passwd, um.hostmask ircbot-> FROM users u, usermasks um ircbot-> WHERE u.usernum=um.usernum; NOTICE: QUERY PLAN: Hash Join (cost=17.60 rows=423 width=56) -> Seq Scan on usermasks um (cost=6.59 rows=139 width=16) -> Hash (cost=3.21rows=67 width=40) -> Seq Scan on users u (cost=3.21 rows=67 width=40) so, questions: (1) why is it trying to do a cartesian join when not using the aliased tables? (2) is this 'bug' present in 6.5.1 ? (3) why am i still awake at 7.30am EST on a saturday? --- Howie <caffeine@toodarkpark.org> URL: http://www.toodarkpark.org "The distance between insanity and genius is measured only by success."
Howie <caffeine@toodarkpark.org> writes: > (1) why is it trying to do a cartesian join when not using the aliased > tables? That's what it's supposed to do. When you provide an alias for a table name in FROM, then as far as the rest of that query is concerned, that alias *is* the name of the table --- it has no other. When you refer to the original table name in the WHERE clause, that's taken as creating a separate table reference that's implicitly added to FROM. Your query is a four-way join with only one join having a restriction clause :-( The alias behavior is necessary in order to handle self-joins properly, for example to find married couples: SELECT * FROM person, person other WHERE person.spouse = other.spouse; This would be ambiguous if "person" were exposed by the second FROM clause. SQL92 requires it to work this way: <table reference> ::= <table name> [ [ AS ] <correlation name> [ <left paren> <derivedcolumn list> <right paren> ] ] ... 1) A <correlation name> immediately contained in a <table refer- ence> TR is exposed by TR. A <table name>immediately contained in a <table reference> TR is exposed by TR if and only if TR does not specifya <correlation name>. I think that implicitly adding a table to FROM is a Postgres extension not found in SQL92 --- we probably really ought to reject such a query with an error, since this behavior seems to be surprising... regards, tom lane
> I think that implicitly adding a table to FROM is a Postgres extension > not found in SQL92 --- we probably really ought to reject such a query > with an error, since this behavior seems to be surprising... Or perhaps print a message to the user? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Howie <caffeine@toodarkpark.org> writes: > > (1) why is it trying to do a cartesian join when not using the aliased > > tables? > > That's what it's supposed to do. When you provide an alias for a table > name in FROM, then as far as the rest of that query is concerned, that > alias *is* the name of the table --- it has no other. When you refer > to the original table name in the WHERE clause, that's taken as creating > a separate table reference that's implicitly added to FROM. Your query > is a four-way join with only one join having a restriction clause :-( > > The alias behavior is necessary in order to handle self-joins properly, > for example to find married couples: > SELECT * FROM person, person other WHERE person.spouse = other.spouse; > This would be ambiguous if "person" were exposed by the second FROM clause. > SQL92 requires it to work this way: > > <table reference> ::= > <table name> [ [ AS ] <correlation name> > [ <left paren> <derived column list> <right paren> ] ] > > ... > > 1) A <correlation name> immediately contained in a <table refer- > ence> TR is exposed by TR. A <table name> immediately contained > in a <table reference> TR is exposed by TR if and only if TR > does not specify a <correlation name>. > > > I think that implicitly adding a table to FROM is a Postgres extension > not found in SQL92 --- we probably really ought to reject such a query > with an error, since this behavior seems to be surprising... OK, I have added an elog(NOTICE) for this behavior:test=> select * from pg_language;lanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler --------+-------+------------+-------------+--------------internal|f |f | 0|n/a lisp |f |f | 0|/usr/ucb/lisztC |f |f | 0|/bin/cc sql |f |f | 0|postgres (4 rows)test=> selectpg_language.*; --> NOTICE: Auto-creating query reference to table pg_languagelanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler --------+-------+------------+-------------+--------------internal|f |f | 0|n/a lisp |f |f | 0|/usr/ucb/lisztC |f |f | 0|/bin/cc sql |f |f | 0|postgres (4 rows) I am interested in comments about my wording, and this behavour. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > OK, I have added an elog(NOTICE) for this behavior: > test=> select * from pg_language; > lanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler > --------+-------+------------+-------------+-------------- > internal|f |f | 0|n/a > lisp |f |f | 0|/usr/ucb/liszt > C |f |f | 0|/bin/cc > sql |f |f | 0|postgres > (4 rows) > test=> select pg_language.*; --> NOTICE: Auto-creating query reference to table pg_language > lanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler > --------+-------+------------+-------------+-------------- > internal|f |f | 0|n/a > lisp |f |f | 0|/usr/ucb/liszt > C |f |f | 0|/bin/cc > sql |f |f | 0|postgres > (4 rows) > I am interested in comments about my wording, and this behavour. I think this is a fine idea, but that wording might not be much help to novices. Maybe "Adding missing FROM-clause entry for pg_language", or some such? In a situation where you've got subselects, it may not be immediately obvious which FROM list the entry got added to. I can't think of any simple way of identifying that, however :-( regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > OK, I have added an elog(NOTICE) for this behavior: > > > test=> select * from pg_language; > > lanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler > > --------+-------+------------+-------------+-------------- > > internal|f |f | 0|n/a > > lisp |f |f | 0|/usr/ucb/liszt > > C |f |f | 0|/bin/cc > > sql |f |f | 0|postgres > > (4 rows) > > > test=> select pg_language.*; > --> NOTICE: Auto-creating query reference to table pg_language > > lanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler > > --------+-------+------------+-------------+-------------- > > internal|f |f | 0|n/a > > lisp |f |f | 0|/usr/ucb/liszt > > C |f |f | 0|/bin/cc > > sql |f |f | 0|postgres > > (4 rows) > > > I am interested in comments about my wording, and this behavour. > > I think this is a fine idea, but that wording might not be much help > to novices. Maybe "Adding missing FROM-clause entry for pg_language", > or some such? Yes, I like that much better. Let me make the change. > In a situation where you've got subselects, it may not be immediately > obvious which FROM list the entry got added to. I can't think of any > simple way of identifying that, however :-( -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > I am interested in comments about my wording, and this behavour. > > I think this is a fine idea, but that wording might not be much help > to novices. Maybe "Adding missing FROM-clause entry for pg_language", > or some such? Wording change to your version. > > In a situation where you've got subselects, it may not be immediately > obvious which FROM list the entry got added to. I can't think of any > simple way of identifying that, however :-( Not sure how to handle that either. I could print the subquery level number, or I could say "in subquery" or even go fancy and do "in sub-sub-query" depending on the number of levels down. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> In a situation where you've got subselects, it may not be immediately >> obvious which FROM list the entry got added to. I can't think of any >> simple way of identifying that, however :-( > Not sure how to handle that either. I could print the subquery level > number, or I could say "in subquery" or even go fancy and do "in > sub-sub-query" depending on the number of levels down. That seems like a good idea; it won't help tell the difference between two subqueries at the same level, but in a lot of practical cases it would tell you what you needed to know, and it won't confuse a novice. I like the "... in subquery", "... in sub-subquery", etc wording. BTW, Jan was complaining that a number of the regress tests now "fail" because they provoke this message. Should we just update the expected outputs, or should we change the tests not to use the feature? regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> In a situation where you've got subselects, it may not be immediately > >> obvious which FROM list the entry got added to. I can't think of any > >> simple way of identifying that, however :-( > > > Not sure how to handle that either. I could print the subquery level > > number, or I could say "in subquery" or even go fancy and do "in > > sub-sub-query" depending on the number of levels down. > > That seems like a good idea; it won't help tell the difference between > two subqueries at the same level, but in a lot of practical cases it > would tell you what you needed to know, and it won't confuse a novice. > > I like the "... in subquery", "... in sub-subquery", etc wording. > > BTW, Jan was complaining that a number of the regress tests now "fail" > because they provoke this message. Should we just update the expected > outputs, or should we change the tests not to use the feature? I asked Thomas to regenerate the expected output. I have too many other differences to patch the files. I can disable it for the time being if Jan wants me to. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> In a situation where you've got subselects, it may not be immediately > >> obvious which FROM list the entry got added to. I can't think of any > >> simple way of identifying that, however :-( > > > Not sure how to handle that either. I could print the subquery level > > number, or I could say "in subquery" or even go fancy and do "in > > sub-sub-query" depending on the number of levels down. > > That seems like a good idea; it won't help tell the difference between > two subqueries at the same level, but in a lot of practical cases it > would tell you what you needed to know, and it won't confuse a novice. > > I like the "... in subquery", "... in sub-subquery", etc wording. > > BTW, Jan was complaining that a number of the regress tests now "fail" > because they provoke this message. Should we just update the expected > outputs, or should we change the tests not to use the feature? I just looked at it, and I woild be required to loop up through the pstate->parentParseState counting how many times I can continue going up, and creating an elog string on the fly to print. Doesn't seem worth it, does it? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > I just looked at it, and I woild be required to loop up through the > pstate->parentParseState counting how many times I can continue going > up, and creating an elog string on the fly to print. Doesn't seem worth > it, does it? It's probably worth saying "in subquery" or not, but if you don't feel like determining the exact depth then never mind the sub-subquery bit... regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > > I just looked at it, and I woild be required to loop up through the > > pstate->parentParseState counting how many times I can continue going > > up, and creating an elog string on the fly to print. Doesn't seem worth > > it, does it? > > It's probably worth saying "in subquery" or not, but if you don't feel > like determining the exact depth then never mind the sub-subquery bit... > > regards, tom lane > Good idea. I have: --------------------------------------------------------------------------- test=> select pg_language.*; NOTICE: Adding missing FROM-clause entry for table pg_language lanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler --------+-------+------------+-------------+-------------- internal|f |f | 0|n/a lisp |f |f | 0|/usr/ucb/liszt C |f |f | 0|/bin/cc sql |f |f | 0|postgres (4 rows) test=> select * from pg_language a where lanname in (select pg_language.lanname); NOTICE: Adding missing FROM-clause entry in subquery for table pg_language lanname |lanispl|lanpltrusted|lanplcallfoid|lancompiler --------+-------+------------+-------------+-------------- internal|f |f | 0|n/a lisp |f |f | 0|/usr/ucb/liszt C |f |f | 0|/bin/cc sql |f |f | 0|postgres (4 rows) -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026