Thread: nested select query failing

nested select query failing

From
"amol"
Date:
Hi everybody,
I am new to this mailing list, so please let me know if I am not posting
queries the way you are expecting.

-  We are porting a web based application from MSSQL to postgres as a
backend.
This is a database intensive application. I am facing a problem in some
queries like this :

select distinct attached_info.id, ownerid ,attached_info.modified_date from
attached_info where attached_info.id in (  select distinct
attached_tag_list.id from attached_tag_list where attached_tag_list.id in
 select attached_info.id from attached_info where
ttached_info.deleted='0'  )  and attached_tag_list.id  in ( select id from
attached_tag_list where attached_tag = 262 )  and
attached_tag_list.attached_tag in ( select tags.id from tags where tags.id
in ( select tag_id from tag_classifier, tag_classifier_association where
classifier_tag_id in ( 261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179,
3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080, 3315, 87, 1041, 2343,
2345, 1869, 3088, 3872, 2651, 2923, 2302, 1681, 3636, 3964, 2778, 2694,
1371, 2532, 2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700 ) and
association_id='1566' and
tag_classifier.uid=tag_classifier_association.uid )  and
tags.isdeleted='0' )  )  order by attached_info.modified_date desc,
attached_info.id desc;

When I fire this query in psql, it does not return back.

- top command shows postgres above 95+% cpu usage consistantly
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1550 postgres  25   0 20268  19M 18904 R    95.3  5.2   6:31 postmaster

- I am using RedHat 8 with following postgres rpms
postgresql-libs-7.2.2-1
postgresql-7.2.2-1
postgresql-server-7.2.2-1

- RAM size is 384 MB, SWAP size is 384 MB, but top shows that memory is free

- I have done following changes after searching for performance realted
information on the internet and postgres site
            - in /etc/rc.d/rc.local added following lines
                            echo "32768" >/proc/sys/fs/file-max
                            echo "98304" >/proc/sys/fs/inode-max
            - in /etc/init.d/postgresql file a pg_ctl call is changed to :
                su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl  -D $PGDATA -o
'-i -N 1024 -B 2048 -d 5' -p /usr/bin/postmaster start  >>
/var/log/pgsql.log 2>&1" < /dev/null

- pgsql log shows :
......
}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0
:scanrelid 1 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0
:keycount 3 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0
:numCols 3 :uniqColIdx 3 1 2 }
DEBUG:  ProcessQuery
*********
log stops here for 5/6 minuts
*********
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes
DEBUG:  child process (pid 1595) exited with exit code 0
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes
DEBUG:  child process (pid 1598) exited with exit code 0
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes
DEBUG:  child process (pid 1599) exited with exit code 0
DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  reaping dead processes
DEBUG:  child process (pid 1600) exited with exit code 0


- What should I do to get such queries working?
- Is there any limit on query size?
- Is there anything left in tuning postgres which is causing this problem ?

If you want me to try anything, please let me know.

thanks
Amol






Re: nested select query failing

From
"Victor Yegorov"
Date:
* amol <amol@mithi.com> [15.05.2003 06:47]:
> Hi everybody,
> I am new to this mailing list, so please let me know if I am not posting
> queries the way you are expecting.
>
> -  We are porting a web based application from MSSQL to postgres as a
> backend.
> This is a database intensive application. I am facing a problem in some
> queries like this :
>
> select distinct attached_info.id, ownerid ,attached_info.modified_date from
> attached_info where attached_info.id in (  select distinct
> attached_tag_list.id from attached_tag_list where attached_tag_list.id in
>  select attached_info.id from attached_info where
> ttached_info.deleted='0'  )  and attached_tag_list.id  in ( select id from
> attached_tag_list where attached_tag = 262 )  and
> attached_tag_list.attached_tag in ( select tags.id from tags where tags.id
> in ( select tag_id from tag_classifier, tag_classifier_association where
> classifier_tag_id in ( 261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179,
> 3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080, 3315, 87, 1041, 2343,
> 2345, 1869, 3088, 3872, 2651, 2923, 2302, 1681, 3636, 3964, 2778, 2694,
> 1371, 2532, 2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700 ) and
> association_id='1566' and
> tag_classifier.uid=tag_classifier_association.uid )  and
> tags.isdeleted='0' )  )  order by attached_info.modified_date desc,
> attached_info.id desc;

IN () constructs isn't a good part of postgres (from the performance point
of view). Try to rewrite your query using joins or EXISTS/NOT EXISTS
constructs.

Search archives for more details, there were a discussion of this topic
lately.

--

Victor Yegorov

Re: nested select query failing

From
"Christopher Kings-Lynne"
Date:
Please post the EXPLAIN ANALYZE of that query...

Chris

----- Original Message -----
From: "amol" <amol@mithi.com>
To: <pgsql-performance@postgresql.org>
Sent: Thursday, May 15, 2003 11:57 AM
Subject: [PERFORM] nested select query failing


> Hi everybody,
> I am new to this mailing list, so please let me know if I am not posting
> queries the way you are expecting.
>
> -  We are porting a web based application from MSSQL to postgres as a
> backend.
> This is a database intensive application. I am facing a problem in some
> queries like this :
>
> select distinct attached_info.id, ownerid ,attached_info.modified_date
from
> attached_info where attached_info.id in (  select distinct
> attached_tag_list.id from attached_tag_list where attached_tag_list.id in
>  select attached_info.id from attached_info where
> ttached_info.deleted='0'  )  and attached_tag_list.id  in ( select id from
> attached_tag_list where attached_tag = 262 )  and
> attached_tag_list.attached_tag in ( select tags.id from tags where tags.id
> in ( select tag_id from tag_classifier, tag_classifier_association where
> classifier_tag_id in ( 261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179,
> 3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080, 3315, 87, 1041, 2343,
> 2345, 1869, 3088, 3872, 2651, 2923, 2302, 1681, 3636, 3964, 2778, 2694,
> 1371, 2532, 2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700 ) and
> association_id='1566' and
> tag_classifier.uid=tag_classifier_association.uid )  and
> tags.isdeleted='0' )  )  order by attached_info.modified_date desc,
> attached_info.id desc;
>
> When I fire this query in psql, it does not return back.
>
> - top command shows postgres above 95+% cpu usage consistantly
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>  1550 postgres  25   0 20268  19M 18904 R    95.3  5.2   6:31 postmaster
>
> - I am using RedHat 8 with following postgres rpms
> postgresql-libs-7.2.2-1
> postgresql-7.2.2-1
> postgresql-server-7.2.2-1
>
> - RAM size is 384 MB, SWAP size is 384 MB, but top shows that memory is
free
>
> - I have done following changes after searching for performance realted
> information on the internet and postgres site
>             - in /etc/rc.d/rc.local added following lines
>                             echo "32768" >/proc/sys/fs/file-max
>                             echo "98304" >/proc/sys/fs/inode-max
>             - in /etc/init.d/postgresql file a pg_ctl call is changed to :
>                 su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl  -D
$PGDATA -o
> '-i -N 1024 -B 2048 -d 5' -p /usr/bin/postmaster start  >>
> /var/log/pgsql.log 2>&1" < /dev/null
>
> - pgsql log shows :
> ......
> }) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0
> :scanrelid 1 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0
> :keycount 3 } :righttree <> :extprm () :locprm () :initplan <> :nprm 0
> :numCols 3 :uniqColIdx 3 1 2 }
> DEBUG:  ProcessQuery
> *********
> log stops here for 5/6 minuts
> *********
> DEBUG:  proc_exit(0)
> DEBUG:  shmem_exit(0)
> DEBUG:  exit(0)
> DEBUG:  reaping dead processes
> DEBUG:  child process (pid 1595) exited with exit code 0
> DEBUG:  proc_exit(0)
> DEBUG:  shmem_exit(0)
> DEBUG:  exit(0)
> DEBUG:  reaping dead processes
> DEBUG:  child process (pid 1598) exited with exit code 0
> DEBUG:  proc_exit(0)
> DEBUG:  shmem_exit(0)
> DEBUG:  exit(0)
> DEBUG:  reaping dead processes
> DEBUG:  child process (pid 1599) exited with exit code 0
> DEBUG:  proc_exit(0)
> DEBUG:  shmem_exit(0)
> DEBUG:  exit(0)
> DEBUG:  reaping dead processes
> DEBUG:  child process (pid 1600) exited with exit code 0
>
>
> - What should I do to get such queries working?
> - Is there any limit on query size?
> - Is there anything left in tuning postgres which is causing this problem
?
>
> If you want me to try anything, please let me know.
>
> thanks
> Amol
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>


Re: nested select query failing

From
SZUCS Gábor
Date:
It's a rather nasty query format, but wrapped it to readable form.
Looks like you could make a good join from all these IN's.

Another question: does EXPLAIN (without ANALYZE) work for this query?
Could you send its output, and table defs? maybe a minimal dump in private
email?


QUESTION TO PRO'S:

Basically, is it true that IN's can be converted to RIGHT JOIN's quite
simply? Is it always worth?

G.
--
while (!asleep()) sheep++;

---------------------------- cut here ------------------------------
----- Original Message -----
From: "amol" <amol@mithi.com>
Sent: Thursday, May 15, 2003 5:57 AM


> Hi everybody,
> I am new to this mailing list, so please let me know if I am not posting
> queries the way you are expecting.
>
> -  We are porting a web based application from MSSQL to postgres as a
> backend.
> This is a database intensive application. I am facing a problem in some
> queries like this :
>
> select distinct
>     attached_info.id, ownerid ,attached_info.modified_date
> from attached_info
> where
>     attached_info.id in
>     (select distinct attached_tag_list.id from attached_tag_list
>      where
>         attached_tag_list.id in
>         (select attached_info.id from attached_info
>          where attached_info.deleted='0')  and
>         attached_tag_list.id  in
>         (select id from attached_tag_list
>          where attached_tag = 262)  and
>         attached_tag_list.attached_tag in
>         (select tags.id from tags
>          where
>             tags.id in
>             (select tag_id
>              from tag_classifier, tag_classifier_association
>              where
>                 classifier_tag_id in
>                 (261, 4467, 1894, 1045, 1087, 1355, 72, 1786, 1179,
>                  3090, 871, 3571, 3565, 3569, 3567, 1043, 2535, 1080,
>                  3315, 87, 1041, 2343, 2345, 1869, 3088, 3872, 2651,
>                  2923, 2302, 1681, 3636, 3964, 2778, 2694, 1371, 2532,
>                  2527, 3742, 3740, 1761, 4530, 4671, 4503, 4512, 3700)
>                 and
>                 association_id='1566' and
>                 tag_classifier.uid=tag_classifier_association.uid
>             )  and
>             tags.isdeleted='0'
>         )
>     )
> order by attached_info.modified_date desc, attached_info.id desc;


Re: nested select query failing

From
Stephan Szabo
Date:
On Thu, 15 May 2003, [iso-8859-1] SZUCS G�bor wrote:

> Basically, is it true that IN's can be converted to RIGHT JOIN's quite
> simply? Is it always worth?

I'm not sure you want to convert to an outer join (since you want to throw
away the rows on either side that don't match in an IN).  You also have to
be careful not to get duplicate entries from what was the subquery.

As for whether it's worth doing, in 7.3 and earlier, almost
certainly, in 7.4 almost certainly not. :)




Re: nested select query failing

From
"amol"
Date:
thanks allot everybody for your mails,

- It helped and now I have got down the query execution time allot. But I am
facing problem in following query
-----------
explain analyze select attached_info.id from attached_tag_list,
attached_info
where
         attached_tag_list.attached_tag = 265
         and
         attached_tag_list.id = attached_info.id
----------

- it's result is
----------
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..165349.50 rows=114 width=16) (actual
time=117.14..8994.60 rows=15 loops=1)
  ->  Index Scan using ix_attached_tag_list_id on attached_tag_list
(cost=0.00..111.13 rows=96 width=12) (actual time=0.12..0.66 rows=15
loops=1)
  ->  Seq Scan on attached_info  (cost=0.00..1211.53 rows=33553 width=4)
(actual time=3.67..197.98 rows=33553 loops=15)
Total runtime: 8994.92 msec

EXPLAIN
---------

- I have already indexed attached_info on id using following query
------
CREATE  INDEX attached_info_Index_1 ON attached_info(id) ;
------

- But I am wondering why there is  "->Seq Scan on attached_info."
        After reading various documentation on the internet I am assuming it
should have been an index scan. BTW I have done vaccume analyze also.

Am I right?

thanks,
Amol



----- Original Message -----
From: "Stephan Szabo" <sszabo@megazone23.bigpanda.com>
To: "SZUCS Gábor" <surrano@mailbox.hu>
Cc: <pgsql-performance@postgresql.org>
Sent: Thursday, May 15, 2003 8:26 PM
Subject: Re: [PERFORM] nested select query failing


On Thu, 15 May 2003, [iso-8859-1] SZUCS Gábor wrote:

> Basically, is it true that IN's can be converted to RIGHT JOIN's quite
> simply? Is it always worth?

I'm not sure you want to convert to an outer join (since you want to throw
away the rows on either side that don't match in an IN).  You also have to
be careful not to get duplicate entries from what was the subquery.

As for whether it's worth doing, in 7.3 and earlier, almost
certainly, in 7.4 almost certainly not. :)




---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster




Re: nested select query failing

From
Tom Lane
Date:
"amol" <amol@mithi.com> writes:
> explain analyze select attached_info.id from attached_tag_list,
> attached_info
> where
>          attached_tag_list.attached_tag = 265
>          and
>          attached_tag_list.id = attached_info.id

> NOTICE:  QUERY PLAN:

> Nested Loop  (cost=0.00..165349.50 rows=114 width=16) (actual
> time=117.14..8994.60 rows=15 loops=1)
>   ->  Index Scan using ix_attached_tag_list_id on attached_tag_list
> (cost=0.00..111.13 rows=96 width=12) (actual time=0.12..0.66 rows=15
> loops=1)
>   ->  Seq Scan on attached_info  (cost=0.00..1211.53 rows=33553 width=4)
> (actual time=3.67..197.98 rows=33553 loops=15)
> Total runtime: 8994.92 msec

> - I have already indexed attached_info on id using following query
> CREATE  INDEX attached_info_Index_1 ON attached_info(id) ;

Hm.  I'd have expected an index scan too.  Maybe the two id columns are
not of the same datatype?

            regards, tom lane

Re: nested select query failing

From
"amol"
Date:
Hi Tom,
U are great. As you have said, one item was numeric and another serial
 integer ) so it was applying seq scan. Thank you very much for your help
everybody.

regards,
Amol

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "amol" <amol@mithi.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Tuesday, May 20, 2003 8:48 PM
Subject: Re: [PERFORM] nested select query failing


> "amol" <amol@mithi.com> writes:
> > explain analyze select attached_info.id from attached_tag_list,
> > attached_info
> > where
> >          attached_tag_list.attached_tag = 265
> >          and
> >          attached_tag_list.id = attached_info.id
>
> > NOTICE:  QUERY PLAN:
>
> > Nested Loop  (cost=0.00..165349.50 rows=114 width=16) (actual
> > time=117.14..8994.60 rows=15 loops=1)
> >   ->  Index Scan using ix_attached_tag_list_id on attached_tag_list
> > (cost=0.00..111.13 rows=96 width=12) (actual time=0.12..0.66 rows=15
> > loops=1)
> >   ->  Seq Scan on attached_info  (cost=0.00..1211.53 rows=33553 width=4)
> > (actual time=3.67..197.98 rows=33553 loops=15)
> > Total runtime: 8994.92 msec
>
> > - I have already indexed attached_info on id using following query
> > CREATE  INDEX attached_info_Index_1 ON attached_info(id) ;
>
> Hm.  I'd have expected an index scan too.  Maybe the two id columns are
> not of the same datatype?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>