LEFT OUTER JOIN and WHERE madness (8.3.3 bug?) - Mailing list pgsql-bugs
From | toruvinn |
---|---|
Subject | LEFT OUTER JOIN and WHERE madness (8.3.3 bug?) |
Date | |
Msg-id | op.udeyj4yd33x80h@insanity.lain.pl Whole thread Raw |
Responses |
Re: LEFT OUTER JOIN and WHERE madness (8.3.3 bug?)
Re: LEFT OUTER JOIN and WHERE madness (8.3.3 bug?) |
List | pgsql-bugs |
Hello. I've asked on IRC about that and I was told to send mail to this maillist about the problem I've encountered in postgresql 8.3.3 (well, actually my friend found it). The thing is, for some reason 8.3.3 doesn't correctly (left outer) join the tables when an additional condition in WHERE clause is supplied. Enough talking, here are the exact results: SELECT i.id, i.albumid, i.userid, a.id as aid, a.visible_for AS al_visible_for, i.visible_for FROM items i LEFT OUTER JOIN albums a ON a.id=i.albumid WHERE i.userid=564667 AND ((a.id IS NULL AND (i.visible_for IN (0,1))) OR a.visible_for IN (0,1)) AND i.type=1 ORDER BY i.created DESC limit 4; id | albumid | userid | aid | al_visible_for | visible_for ----------------+---------+--------+-----+----------------+------------- 12145661286279 | 22927 | 564667 | | | 0 12145661007878 | 22927 | 564667 | | | 0 12145660706588 | 22927 | 564667 | | | 0 12145658772889 | 22927 | 564667 | | | 0 (4 rows) Now, what we see here is pretty much correct, unless... SELECT COUNT(*) FROM albums WHERE id = 22927; count ------- 1 (1 row) toruvinn=> \d albums Table "public.albums" Column | Type | Modifiers -------------+-----------------------------+------------------------------------------------------------------- id | integer | not null default nextval(('public.album_id_seq'::text)::regclass) userid | integer | not null visible_for | smallint | not null default 0 created | timestamp without time zone | not null Indexes: "albums_pkey" PRIMARY KEY, btree (id) "albums_created" btree (created) "albums_userid" btree (userid) CLUSTER "albums_visible_for" btree (visible_for) What we see here is the fact that THERE IS a correspoding row in table albums which should be joined to the results (album.id has NOT NULL constraint, so we clearly see NULLs returned there are wrong). Let's see: select visible_for from albums where id = 22927; visible_for ------------- 4 (1 row) Yup. There is one. Now, the madness starts here. Let's take the above query and remove the AND (...) condition: SELECT i.id, i.albumid, i.userid, a.id as aid, a.visible_for AS al_visible_for, i.visible_for FROM items i LEFT OUTER JOIN albums a ON a.id=i.albumid WHERE i.userid=564667 AND i.type=1 ORDER BY i.created DESC limit 4; id | albumid | userid | aid | al_visible_for | visible_for ----------------+---------+--------+-------+----------------+------------- 12145661286279 | 22927 | 564667 | 22927 | 4 | 0 12145661007878 | 22927 | 564667 | 22927 | 4 | 0 12145660706588 | 22927 | 564667 | 22927 | 4 | 0 12145658772889 | 22927 | 564667 | 22927 | 4 | 0 (4 rows) Looks like EXACTLY the same rows as above (items.id is PRIMARY KEY) but suddenly al_visible_for has values, so the join actually occured. Why would WHERE influence the joins, I don't know. I tried the same queries in 8.2.9: SELECT i.id, i.albumid, i.userid, a.id as aid, a.visible_for AS al_visible_for, i.visible_for FROM items i LEFT OUTER JOIN albums a ON a.id=i.albumid WHERE i.userid=564667 AND ((a.id IS NULL AND (i.visible_for IN (0,1))) OR a.visible_for IN (0,1)) AND i.type=1 ORDER BY i.created DESC limit 4; id | albumid | userid | aid | al_visible_for | visible_for ----------------+---------+--------+-------+----------------+------------- 12143809382724 | 22957 | 564667 | 22957 | 1 | 0 12143809046652 | 22957 | 564667 | 22957 | 1 | 0 12143808694383 | 22957 | 564667 | 22957 | 1 | 0 12143809241162 | 22957 | 564667 | 22957 | 1 | 0 (4 rows) That's correct (notice albumid different from 8.3.3 results), all the conditions are met. SELECT i.id, i.albumid, i.userid, a.id as aid, a.visible_for AS al_visible_for, i.visible_for FROM items i LEFT OUTER JOIN albums a ON a.id=i.albumid WHERE i.userid=564667 AND i.type=1 ORDER BY i.created DESC limit 4; id | albumid | userid | aid | al_visible_for | visible_for ----------------+---------+--------+-------+----------------+------------- 12145661286279 | 22927 | 564667 | 22927 | 4 | 0 12145661007878 | 22927 | 564667 | 22927 | 4 | 0 12145660706588 | 22927 | 564667 | 22927 | 4 | 0 12145658772889 | 22927 | 564667 | 22927 | 4 | 0 (4 rows) Well, as expected, exactly the same as in 8.3.3's output. As I know, simple test cases are usually the best, so I've removed irrelevant columns and rows from both tables and the dump (~4KB bzip2 compressed) is available at http://lain.pl/wtf.sql.bz2. After removing the data, of course, I've checked query results and they're still the same (i.e. the first query returns wrong, `unjoined' al_visible_for). Now, to make sure, I've tried this on x86_32 with different CFLAGS (the last one was -pipe -O2 I think) and x86_64, the results were the same. I hope you can reproduce it... Now, as I said above, I did consult people on freenode's #postgresql (Hi crab! thanks! ;-)) in case I was missing something, but apparently I'm not... Just in case, here's EXPLAIN ANALYZE result for both queries in 8.3.3 (after clearing the tables, that's why pg decides to use seqscans here, don't blame seqscan! ;-)): toruvinn=> EXPLAIN ANALYZE SELECT i.id, i.albumid, i.userid, a.id as aid, a.visible_for AS al_visible_for, i.visible_for FROM items i LEFT OUTER JOIN albums a ON a.id=i.albumid WHERE i.userid=564667 AND ((a.id IS NULL AND (i.visible_for IN (0,1))) OR a.visible_for IN (0,1)) AND i.type=1 ORDER BY i.created DESC limit 4; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=18.43..18.44 rows=4 width=32) (actual time=0.418..0.418 rows=4 loops=1) -> Sort (cost=18.43..18.97 rows=216 width=32) (actual time=0.418..0.418 rows=4 loops=1) Sort Key: i.created Sort Method: top-N heapsort Memory: 25kB -> Nested Loop Left Join (cost=1.01..15.19 rows=216 width=32) (actual time=0.022..0.290 rows=216 loops=1) Join Filter: (a.id = i.albumid) Filter: (((a.id IS NULL) AND (i.visible_for = ANY ('{0,1}'::integer[]))) OR (a.visible_for = ANY ('{0,1}'::integer[]))) -> Seq Scan on items i (cost=0.00..8.24 rows=216 width=26) (actual time=0.012..0.153 rows=216 loops=1) Filter: ((userid = 564667) AND (type = 1)) -> Materialize (cost=1.01..1.02 rows=1 width=6) (actual time=0.000..0.000 rows=0 loops=216) -> Seq Scan on albums a (cost=0.00..1.01 rows=1 width=6) (actual time=0.006..0.006 rows=0 loops=1) Filter: ((id IS NULL) OR (visible_for = ANY ('{0,1}'::integer[]))) Total runtime: 0.464 ms (13 rows) toruvinn=> EXPLAIN ANALYZE SELECT toruvinn-> i.id, i.albumid, i.userid, toruvinn-> a.id as aid, a.visible_for AS al_visible_for, i.visible_for toruvinn-> FROM items i toruvinn-> LEFT OUTER JOIN albums a ON a.id=i.albumid toruvinn-> WHERE i.userid=564667 toruvinn-> AND i.type=1 toruvinn-> ORDER BY i.created DESC limit 4; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Limit (cost=17.35..17.36 rows=4 width=32) (actual time=0.455..0.457 rows=4 loops=1) -> Sort (cost=17.35..17.89 rows=216 width=32) (actual time=0.454..0.455 rows=4 loops=1) Sort Key: i.created Sort Method: top-N heapsort Memory: 25kB -> Nested Loop Left Join (cost=1.01..14.11 rows=216 width=32) (actual time=0.021..0.340 rows=216 loops=1) Join Filter: (a.id = i.albumid) -> Seq Scan on items i (cost=0.00..8.24 rows=216 width=26) (actual time=0.012..0.136 rows=216 loops=1) Filter: ((userid = 564667) AND (type = 1)) -> Materialize (cost=1.01..1.02 rows=1 width=6) (actual time=0.000..0.000 rows=1 loops=216) -> Seq Scan on albums a (cost=0.00..1.01 rows=1 width=6) (actual time=0.003..0.004 rows=1 loops=1) Total runtime: 0.498 ms (11 rows) I would be really grateful if someone explained why this happens - whether it's a bug or I'm stupid and don't see something obvious. Also, apart from sending this mail, should I file a bug report (assuming it's the first case in the above statement, not the second one)? Best regards, -- toruvinn
pgsql-bugs by date: