Re: Select in subselect vs select = any array - Mailing list pgsql-performance
From | Pavel Stehule |
---|---|
Subject | Re: Select in subselect vs select = any array |
Date | |
Msg-id | AANLkTimbcaiN9PhZeizOGr1aW4SO7h+HfVouxJ7hFijJ@mail.gmail.com Whole thread Raw |
In response to | Re: Select in subselect vs select = any array (Adam Tistler <atistler@gmail.com>) |
List | pgsql-performance |
2011/3/21 Adam Tistler <atistler@gmail.com>: > Pavel, thanks for the help. > > I increased work_mem from 16MB to 64MB, no difference. The queries are really just a test case. My actual queries areactual just large number of primary keys that I am selecting from the db: > > For example: > select * from nodes where node_id in ( 1, 2, 3 ..... ) > > I found that even for small queries, the following is faster: > select * from nodes where node_in = any (array[1,2,3 .... ]) it depends on version. I think so on last postgres, these queries are same, not sure. Regards Pavel > > > Its not really a big deal to me, I was just wondering if others could reproduce it on other systems/versions and if perhapsthis is an issue that I should point out to postgres-dev. > > > Results below: > > logicops2=# explain analyze select count(*) from nodes where node_id in ( select node_id from nodes limit 100000 ); > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=3017.18..3017.19 rows=1 width=0) (actual time=1017.051..1017.051 rows=1 loops=1) > -> Nested Loop (cost=2887.05..3016.68 rows=200 width=0) (actual time=157.290..986.329 rows=100000 loops=1) > -> HashAggregate (cost=2887.05..2889.05 rows=200 width=4) (actual time=157.252..241.995 rows=100000 loops=1) > -> Limit (cost=0.00..1637.05 rows=100000 width=4) (actual time=0.009..73.942 rows=100000 loops=1) > -> Seq Scan on nodes (cost=0.00..12355.34 rows=754734 width=4) (actual time=0.008..35.428 rows=100000loops=1) > -> Index Scan using n_node_id_index on nodes (cost=0.00..0.63 rows=1 width=4) (actual time=0.006..0.006 rows=1loops=100000) > Index Cond: (public.nodes.node_id = public.nodes.node_id) > Total runtime: 1017.794 ms > (8 rows) > > logicops2=# explain analyze select count(*) from nodes where node_id = any(array ( select node_id from nodes limit 100000)); > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=1718.60..1718.61 rows=1 width=0) (actual time=485.554..485.555 rows=1 loops=1) > InitPlan 1 (returns $0) > -> Limit (cost=0.00..1637.05 rows=100000 width=4) (actual time=0.011..73.037 rows=100000 loops=1) > -> Seq Scan on nodes (cost=0.00..12355.34 rows=754734 width=4) (actual time=0.010..34.462 rows=100000 loops=1) > -> Bitmap Heap Scan on nodes (cost=42.67..81.53 rows=10 width=0) (actual time=433.003..461.108 rows=100000 loops=1) > Recheck Cond: (node_id = ANY ($0)) > -> Bitmap Index Scan on n_node_id_index (cost=0.00..42.67 rows=10 width=0) (actual time=432.810..432.810 rows=100000loops=1) > Index Cond: (node_id = ANY ($0)) > Total runtime: 485.638 ms > (9 rows) > > On Mar 21, 2011, at 1:54 AM, Pavel Stehule wrote: > >> Hello >> >> I think so HashAggregate goes out of memory - you can try to increase >> a work_mem. >> >> There are better queries for counting duplicit then cross join >> >> Regards >> >> Pavel Stehule >> >> 2011/3/21 Adam Tistler <atistler@gmail.com>: >>> logicops2=# explain analyze select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000)); >>> QUERY PLAN >>> ----------------------------------------------------------------------------------------------------------------------------------------- >>> Aggregate (cost=1718.59..1718.60 rows=1 width=0) (actual time=509.126..509.127 rows=1 loops=1) >>> InitPlan 1 (returns $0) >>> -> Limit (cost=0.00..1637.04 rows=100000 width=4) (actual time=0.010..76.604 rows=100000 loops=1) >>> -> Seq Scan on nodes (cost=0.00..12355.41 rows=754741 width=4) (actual time=0.008..38.105 rows=100000 loops=1) >>> -> Bitmap Heap Scan on nodes (cost=42.67..81.53 rows=10 width=0) (actual time=447.274..484.283 rows=100000 loops=1) >>> Recheck Cond: (node_id = ANY ($0)) >>> -> Bitmap Index Scan on n_node_id_index (cost=0.00..42.67 rows=10 width=0) (actual time=447.074..447.074 rows=100000loops=1) >>> Index Cond: (node_id = ANY ($0)) >>> Total runtime: 509.209 ms >>> (9 rows) >>> >>> Time: 510.009 ms >>> >>> >>> logicops2=# explain analyze select count(*) from nodes where node_id in (select node_id from nodes limit 100000); >>> QUERY PLAN >>> ---------------------------------------------------------------------------------------------------------------------------------------- >>> Aggregate (cost=3017.17..3017.18 rows=1 width=0) (actual time=1052.866..1052.866 rows=1 loops=1) >>> -> Nested Loop (cost=2887.04..3016.67 rows=200 width=0) (actual time=167.310..1021.540 rows=100000 loops=1) >>> -> HashAggregate (cost=2887.04..2889.04 rows=200 width=4) (actual time=167.198..251.205 rows=100000 loops=1) >>> -> Limit (cost=0.00..1637.04 rows=100000 width=4) (actual time=0.008..80.090 rows=100000 loops=1) >>> -> Seq Scan on nodes (cost=0.00..12355.41 rows=754741 width=4) (actual time=0.007..41.566 rows=100000loops=1) >>> -> Index Scan using n_node_id_index on nodes (cost=0.00..0.63 rows=1 width=4) (actual time=0.006..0.007 rows=1loops=100000) >>> Index Cond: (public.nodes.node_id = public.nodes.node_id) >>> Total runtime: 1053.523 ms >>> (8 rows) >>> >>> Time: 1054.864 ms >>> >>> >>> >>> On Mar 20, 2011, at 2:51 AM, Pavel Stehule wrote: >>> >>>> Hello >>>> >>>> 2011/3/20 Adam Tistler <atistler@gmail.com>: >>>>> I have noticed that SELECT ... = ANY(ARRAY(...)) is about twice as fast as SELECT IN ( ... ). >>>>> Can anyone explain a reason for this? Results are the bottom and are reproducible. I can test with other versionsif that is necessary. >>>>> >>>> >>>> send a result of EXPLAIN ANALYZE SELECT ..., please >>>> >>>> The reasons can be different - less seq scans, indexes >>>> >>>> Regards >>>> >>>> Pavel Stehule >>>> >>>> >>>> >>>>> ./configure --prefix=/usr/local/pgsql84 --with-openssl --with-perl >>>>> CentOS release 5.4 (Final) >>>>> psql (PostgreSQL) 8.4.1 >>>>> >>>>> prompt2=# select count(*) from nodes; >>>>> count >>>>> -------- >>>>> 754734 >>>>> (1 row) >>>>> >>>>> >>>>> prompt2=# \d nodes >>>>> Table "public.nodes" >>>>> Column | Type | Modifiers >>>>> --------------+--------------------------+----------------------------------------------------------- >>>>> node_id | integer | not null default nextval(('node_id_seq'::text)::regclass) >>>>> node_type_id | integer | not null >>>>> template_id | integer | not null >>>>> timestamp | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone >>>>> Indexes: >>>>> "nodes_pkey" PRIMARY KEY, btree (node_id) >>>>> "n_node_id_index" btree (node_id) >>>>> "n_node_type_id_index" btree (node_type_id) >>>>> "n_template_id_index" btree (template_id) >>>>> >>>>> prompt2=# select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000) ); >>>>> count >>>>> -------- >>>>> 100000 >>>>> (1 row) >>>>> >>>>> Time: 404.530 ms >>>>> prompt2=# select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000) ); >>>>> count >>>>> -------- >>>>> 100000 >>>>> (1 row) >>>>> >>>>> Time: 407.316 ms >>>>> prompt2=# select count(*) from nodes where node_id = any( Array(select node_id from nodes limit 100000) ); >>>>> count >>>>> -------- >>>>> 100000 >>>>> (1 row) >>>>> >>>>> Time: 408.728 ms >>>>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 ); >>>>> count >>>>> -------- >>>>> 100000 >>>>> (1 row) >>>>> >>>>> Time: 793.840 ms >>>>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 ); >>>>> count >>>>> -------- >>>>> 100000 >>>>> (1 row) >>>>> >>>>> Time: 779.137 ms >>>>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 ); >>>>> count >>>>> -------- >>>>> 100000 >>>>> (1 row) >>>>> >>>>> Time: 781.820 ms >>>>> >>>>> >>>>> -- >>>>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >>>>> To make changes to your subscription: >>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>> >>> >>> > >
pgsql-performance by date: