Re: Recheck condition - Mailing list pgsql-general

From Josh Harrison
Subject Re: Recheck condition
Date
Msg-id 8d89ea1d0711300637j5c567847vece5299ff6f8e8fc@mail.gmail.com
Whole thread Raw
In response to Re: Recheck condition  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general

> > > *Query1*
> > > SELECT person_id  FROM person   WHERE (column1=1 AND column2='62')
> > > INTERSECT
> > > SELECT person_id  FROM person  WHERE (column1=1 AND column2='189')

> I get the same plan(see below)  with 'sort'  for 'intersect all' operation
> too. Why is intersect not an effecient way? Is there any other way this
> query/index can be written/created so that I can get the intersect results
> in an efficient way?

Set operations are rather inefficient. To find the intersection of two
arbitrary sets you need to sort them and compare. A query like you
write would be better expressed as a join, something like:

SELECT a.person_id
FROM (SELECT person_id  FROM person   WHERE (column1=1 AND column2='62') a,
    (SELECT person_id  FROM person  WHERE (column1=1 AND column2='189') b
WHERE a.person_id = b.person_id;

or perhaps:

SELECT a.person_id
FROM person a, person b
WHERE a.column1=1 AND a.column2='62 '
AND b.column1=1 AND b.column2='189'
AND a.person_id = b.person_id;

Which will probably generate a merge join...

Thanks. But this query seems to be more expensive than using intersect operator.
This is the explain analyse plan for this query. It took 5 1/2 minutes to generate this. I also tried to disable the mergejoin and in that case it uses hash join and still takes more than 3 minutes (intersect took only 40 sec)

QUERY PLAN                                                                                                                                                                   
 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1610648.92..10280119.99 rows=577856095 width=4) (actual time=30562.630..264534.677 rows=225145385 loops=1)                                                 
   Merge Cond: (a.patient_id = b.patient_id )                                                                                                                                  
   ->  Sort  (cost=527974.81..528929.79 rows=381993 width=4) (actual time=3755.361..3845.134 rows=213435 loops=1)                                                             
         Sort Key: a.patient_id                                                                                                                                               
         Sort Method:  quicksort  Memory: 15868kB                                                                                                                             
         ->  Bitmap Heap Scan on clinical_variable2 a  (cost= 17886.42..492557.97 rows=381993 width=4) (actual time=315.753..3410.366 rows=327498 loops=1)                     
               Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text))                                                             
               ->  Bitmap Index Scan on clinical_variable_idx_topserv  (cost=0.00..17790.92 rows=381993 width=0) (actual time=277.185..277.185 rows=327498 loops=1)           
                     Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text))                                                         
   ->  Materialize  (cost=1082674.11..1124546.38 rows=3349781 width=4) (actual time=26807.248..99885.620 rows=225148250 loops=1)                                              
         ->  Sort  (cost=1082674.11..1091048.57 rows=3349781 width=4) (actual time=26807.238..30343.870 rows=3429228 loops=1)                                                 
               Sort Key: b.patient_id                                                                                                                                         
               Sort Method:  external merge  Disk: 53552kB                                                                                                                    
               ->  Bitmap Heap Scan on clinical_variable2 b  (cost= 156754.24..679555.96 rows=3349781 width=4) (actual time=2744.126..20106.160 rows=3429228 loops=1)          
                     Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text))                                                      
                     ->  Bitmap Index Scan on clinical_variable_idx_topserv  (cost=0.00..155916.80 rows=3349781 width=0) (actual time=2686.456..2686.456 rows=3429228 loops=1)
                           Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text))                                                  
 Total runtime: 324646.035 ms                                                                                                                                                 

 18 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] a: 0/ms]

Is there any other way you can think of to solve this problem. May be creating the indexes in a  different way or something?

Thanks
jo

pgsql-general by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: PostgresSQL vs Ingress
Next
From: "Pavel Stehule"
Date:
Subject: Re: scrollable cursor in functions