Thread: nested select query failing
Hi everybody, I am new to this mailing list, so please let me know if I am not posting queries the way you are expecting. - We are porting a web based application from MSSQL to postgres as a backend. This is a database intensive application. I am facing a problem in some queries like this : select distinct attached_info.id, ownerid ,attached_info.modified_date from attached_info where attached_info.id in ( select distinct attached_tag_list.id from attached_tag_list where attached_tag_list.id in select attached_info.id from attached_info where ttached_info.deleted='0' ) and attached_tag_list.id in ( select id from attached_tag_list where attached_tag = 262 ) and attached_tag_list.attached_tag in ( select tags.id from tags where tags.id in ( select tag_id from tag_classifier, tag_classifier_association where classifier_tag_id in ( 261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179, 3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080, 3315, 87, 1041, 2343, 2345, 1869, 3088, 3872, 2651, 2923, 2302, 1681, 3636, 3964, 2778, 2694, 1371, 2532, 2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700 ) and association_id='1566' and tag_classifier.uid=tag_classifier_association.uid ) and tags.isdeleted='0' ) ) order by attached_info.modified_date desc, attached_info.id desc; When I fire this query in psql, it does not return back. - top command shows postgres above 95+% cpu usage consistantly PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 1550 postgres 25 0 20268 19M 18904 R 95.3 5.2 6:31 postmaster - I am using RedHat 8 with following postgres rpms postgresql-libs-7.2.2-1 postgresql-7.2.2-1 postgresql-server-7.2.2-1 - RAM size is 384 MB, SWAP size is 384 MB, but top shows that memory is free - I have done following changes after searching for performance realted information on the internet and postgres site - in /etc/rc.d/rc.local added following lines echo "32768" >/proc/sys/fs/file-max echo "98304" >/proc/sys/fs/inode-max - in /etc/init.d/postgresql file a pg_ctl call is changed to : su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -o '-i -N 1024 -B 2048 -d 5' -p /usr/bin/postmaster start >> /var/log/pgsql.log 2>&1" < /dev/null - pgsql log shows : ...... }) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :keycount 3 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :numCols 3 :uniqColIdx 3 1 2 } DEBUG: ProcessQuery ********* log stops here for 5/6 minuts ********* DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: reaping dead processes DEBUG: child process (pid 1595) exited with exit code 0 DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: reaping dead processes DEBUG: child process (pid 1598) exited with exit code 0 DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: reaping dead processes DEBUG: child process (pid 1599) exited with exit code 0 DEBUG: proc_exit(0) DEBUG: shmem_exit(0) DEBUG: exit(0) DEBUG: reaping dead processes DEBUG: child process (pid 1600) exited with exit code 0 - What should I do to get such queries working? - Is there any limit on query size? - Is there anything left in tuning postgres which is causing this problem ? If you want me to try anything, please let me know. thanks Amol
* amol <amol@mithi.com> [15.05.2003 06:47]: > Hi everybody, > I am new to this mailing list, so please let me know if I am not posting > queries the way you are expecting. > > - We are porting a web based application from MSSQL to postgres as a > backend. > This is a database intensive application. I am facing a problem in some > queries like this : > > select distinct attached_info.id, ownerid ,attached_info.modified_date from > attached_info where attached_info.id in ( select distinct > attached_tag_list.id from attached_tag_list where attached_tag_list.id in > select attached_info.id from attached_info where > ttached_info.deleted='0' ) and attached_tag_list.id in ( select id from > attached_tag_list where attached_tag = 262 ) and > attached_tag_list.attached_tag in ( select tags.id from tags where tags.id > in ( select tag_id from tag_classifier, tag_classifier_association where > classifier_tag_id in ( 261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179, > 3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080, 3315, 87, 1041, 2343, > 2345, 1869, 3088, 3872, 2651, 2923, 2302, 1681, 3636, 3964, 2778, 2694, > 1371, 2532, 2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700 ) and > association_id='1566' and > tag_classifier.uid=tag_classifier_association.uid ) and > tags.isdeleted='0' ) ) order by attached_info.modified_date desc, > attached_info.id desc; IN () constructs isn't a good part of postgres (from the performance point of view). Try to rewrite your query using joins or EXISTS/NOT EXISTS constructs. Search archives for more details, there were a discussion of this topic lately. -- Victor Yegorov
Please post the EXPLAIN ANALYZE of that query... Chris ----- Original Message ----- From: "amol" <amol@mithi.com> To: <pgsql-performance@postgresql.org> Sent: Thursday, May 15, 2003 11:57 AM Subject: [PERFORM] nested select query failing > Hi everybody, > I am new to this mailing list, so please let me know if I am not posting > queries the way you are expecting. > > - We are porting a web based application from MSSQL to postgres as a > backend. > This is a database intensive application. I am facing a problem in some > queries like this : > > select distinct attached_info.id, ownerid ,attached_info.modified_date from > attached_info where attached_info.id in ( select distinct > attached_tag_list.id from attached_tag_list where attached_tag_list.id in > select attached_info.id from attached_info where > ttached_info.deleted='0' ) and attached_tag_list.id in ( select id from > attached_tag_list where attached_tag = 262 ) and > attached_tag_list.attached_tag in ( select tags.id from tags where tags.id > in ( select tag_id from tag_classifier, tag_classifier_association where > classifier_tag_id in ( 261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179, > 3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080, 3315, 87, 1041, 2343, > 2345, 1869, 3088, 3872, 2651, 2923, 2302, 1681, 3636, 3964, 2778, 2694, > 1371, 2532, 2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700 ) and > association_id='1566' and > tag_classifier.uid=tag_classifier_association.uid ) and > tags.isdeleted='0' ) ) order by attached_info.modified_date desc, > attached_info.id desc; > > When I fire this query in psql, it does not return back. > > - top command shows postgres above 95+% cpu usage consistantly > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND > 1550 postgres 25 0 20268 19M 18904 R 95.3 5.2 6:31 postmaster > > - I am using RedHat 8 with following postgres rpms > postgresql-libs-7.2.2-1 > postgresql-7.2.2-1 > postgresql-server-7.2.2-1 > > - RAM size is 384 MB, SWAP size is 384 MB, but top shows that memory is free > > - I have done following changes after searching for performance realted > information on the internet and postgres site > - in /etc/rc.d/rc.local added following lines > echo "32768" >/proc/sys/fs/file-max > echo "98304" >/proc/sys/fs/inode-max > - in /etc/init.d/postgresql file a pg_ctl call is changed to : > su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -o > '-i -N 1024 -B 2048 -d 5' -p /usr/bin/postmaster start >> > /var/log/pgsql.log 2>&1" < /dev/null > > - pgsql log shows : > ...... > }) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 > :scanrelid 1 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0 > :keycount 3 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0 > :numCols 3 :uniqColIdx 3 1 2 } > DEBUG: ProcessQuery > ********* > log stops here for 5/6 minuts > ********* > DEBUG: proc_exit(0) > DEBUG: shmem_exit(0) > DEBUG: exit(0) > DEBUG: reaping dead processes > DEBUG: child process (pid 1595) exited with exit code 0 > DEBUG: proc_exit(0) > DEBUG: shmem_exit(0) > DEBUG: exit(0) > DEBUG: reaping dead processes > DEBUG: child process (pid 1598) exited with exit code 0 > DEBUG: proc_exit(0) > DEBUG: shmem_exit(0) > DEBUG: exit(0) > DEBUG: reaping dead processes > DEBUG: child process (pid 1599) exited with exit code 0 > DEBUG: proc_exit(0) > DEBUG: shmem_exit(0) > DEBUG: exit(0) > DEBUG: reaping dead processes > DEBUG: child process (pid 1600) exited with exit code 0 > > > - What should I do to get such queries working? > - Is there any limit on query size? > - Is there anything left in tuning postgres which is causing this problem ? > > If you want me to try anything, please let me know. > > thanks > Amol > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
It's a rather nasty query format, but wrapped it to readable form. Looks like you could make a good join from all these IN's. Another question: does EXPLAIN (without ANALYZE) work for this query? Could you send its output, and table defs? maybe a minimal dump in private email? QUESTION TO PRO'S: Basically, is it true that IN's can be converted to RIGHT JOIN's quite simply? Is it always worth? G. -- while (!asleep()) sheep++; ---------------------------- cut here ------------------------------ ----- Original Message ----- From: "amol" <amol@mithi.com> Sent: Thursday, May 15, 2003 5:57 AM > Hi everybody, > I am new to this mailing list, so please let me know if I am not posting > queries the way you are expecting. > > - We are porting a web based application from MSSQL to postgres as a > backend. > This is a database intensive application. I am facing a problem in some > queries like this : > > select distinct > attached_info.id, ownerid ,attached_info.modified_date > from attached_info > where > attached_info.id in > (select distinct attached_tag_list.id from attached_tag_list > where > attached_tag_list.id in > (select attached_info.id from attached_info > where attached_info.deleted='0') and > attached_tag_list.id in > (select id from attached_tag_list > where attached_tag = 262) and > attached_tag_list.attached_tag in > (select tags.id from tags > where > tags.id in > (select tag_id > from tag_classifier, tag_classifier_association > where > classifier_tag_id in > (261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179, > 3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080, > 3315, 87, 1041, 2343, 2345, 1869, 3088, 3872, 2651, > 2923, 2302, 1681, 3636, 3964, 2778, 2694, 1371, 2532, > 2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700) > and > association_id='1566' and > tag_classifier.uid=tag_classifier_association.uid > ) and > tags.isdeleted='0' > ) > ) > order by attached_info.modified_date desc, attached_info.id desc;
On Thu, 15 May 2003, [iso-8859-1] SZUCS G�bor wrote: > Basically, is it true that IN's can be converted to RIGHT JOIN's quite > simply? Is it always worth? I'm not sure you want to convert to an outer join (since you want to throw away the rows on either side that don't match in an IN). You also have to be careful not to get duplicate entries from what was the subquery. As for whether it's worth doing, in 7.3 and earlier, almost certainly, in 7.4 almost certainly not. :)
thanks allot everybody for your mails, - It helped and now I have got down the query execution time allot. But I am facing problem in following query ----------- explain analyze select attached_info.id from attached_tag_list, attached_info where attached_tag_list.attached_tag = 265 and attached_tag_list.id = attached_info.id ---------- - it's result is ---------- NOTICE: QUERY PLAN: Nested Loop (cost=0.00..165349.50 rows=114 width=16) (actual time=117.14..8994.60 rows=15 loops=1) -> Index Scan using ix_attached_tag_list_id on attached_tag_list (cost=0.00..111.13 rows=96 width=12) (actual time=0.12..0.66 rows=15 loops=1) -> Seq Scan on attached_info (cost=0.00..1211.53 rows=33553 width=4) (actual time=3.67..197.98 rows=33553 loops=15) Total runtime: 8994.92 msec EXPLAIN --------- - I have already indexed attached_info on id using following query ------ CREATE INDEX attached_info_Index_1 ON attached_info(id) ; ------ - But I am wondering why there is "->Seq Scan on attached_info." After reading various documentation on the internet I am assuming it should have been an index scan. BTW I have done vaccume analyze also. Am I right? thanks, Amol ----- Original Message ----- From: "Stephan Szabo" <sszabo@megazone23.bigpanda.com> To: "SZUCS Gábor" <surrano@mailbox.hu> Cc: <pgsql-performance@postgresql.org> Sent: Thursday, May 15, 2003 8:26 PM Subject: Re: [PERFORM] nested select query failing On Thu, 15 May 2003, [iso-8859-1] SZUCS Gábor wrote: > Basically, is it true that IN's can be converted to RIGHT JOIN's quite > simply? Is it always worth? I'm not sure you want to convert to an outer join (since you want to throw away the rows on either side that don't match in an IN). You also have to be careful not to get duplicate entries from what was the subquery. As for whether it's worth doing, in 7.3 and earlier, almost certainly, in 7.4 almost certainly not. :) ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
"amol" <amol@mithi.com> writes: > explain analyze select attached_info.id from attached_tag_list, > attached_info > where > attached_tag_list.attached_tag = 265 > and > attached_tag_list.id = attached_info.id > NOTICE: QUERY PLAN: > Nested Loop (cost=0.00..165349.50 rows=114 width=16) (actual > time=117.14..8994.60 rows=15 loops=1) > -> Index Scan using ix_attached_tag_list_id on attached_tag_list > (cost=0.00..111.13 rows=96 width=12) (actual time=0.12..0.66 rows=15 > loops=1) > -> Seq Scan on attached_info (cost=0.00..1211.53 rows=33553 width=4) > (actual time=3.67..197.98 rows=33553 loops=15) > Total runtime: 8994.92 msec > - I have already indexed attached_info on id using following query > CREATE INDEX attached_info_Index_1 ON attached_info(id) ; Hm. I'd have expected an index scan too. Maybe the two id columns are not of the same datatype? regards, tom lane
Hi Tom, U are great. As you have said, one item was numeric and another serial integer ) so it was applying seq scan. Thank you very much for your help everybody. regards, Amol ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "amol" <amol@mithi.com> Cc: <pgsql-performance@postgresql.org> Sent: Tuesday, May 20, 2003 8:48 PM Subject: Re: [PERFORM] nested select query failing > "amol" <amol@mithi.com> writes: > > explain analyze select attached_info.id from attached_tag_list, > > attached_info > > where > > attached_tag_list.attached_tag = 265 > > and > > attached_tag_list.id = attached_info.id > > > NOTICE: QUERY PLAN: > > > Nested Loop (cost=0.00..165349.50 rows=114 width=16) (actual > > time=117.14..8994.60 rows=15 loops=1) > > -> Index Scan using ix_attached_tag_list_id on attached_tag_list > > (cost=0.00..111.13 rows=96 width=12) (actual time=0.12..0.66 rows=15 > > loops=1) > > -> Seq Scan on attached_info (cost=0.00..1211.53 rows=33553 width=4) > > (actual time=3.67..197.98 rows=33553 loops=15) > > Total runtime: 8994.92 msec > > > - I have already indexed attached_info on id using following query > > CREATE INDEX attached_info_Index_1 ON attached_info(id) ; > > Hm. I'd have expected an index scan too. Maybe the two id columns are > not of the same datatype? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > >