Re: Recheck condition - Mailing list pgsql-general

From Josh Harrison
Subject Re: Recheck condition
Date
Msg-id 8d89ea1d0711300451h219be3b9y67cd7c84e96e0817@mail.gmail.com
Whole thread Raw
In response to Re: Recheck condition  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Recheck condition  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-general


On Nov 29, 2007 8:15 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Josh Harrison escribió:
> >
> > > For example if I have a table Person with 3 fields (name,city_id,age).
> > And
> > > the table contains 1000 rows. The table has 2 indexes city_id and age
> > > If I have a query :
> > > SELECT * FROM PERSON WHERE city_id=5 AND AGE=30
>
> Okay....So If I have a query like the above and the query plan shows  a
> 'recheck condition' and bitmap scan, then does that mean it scans the
> indexes first to get the intermediate results and goto the heap only for the
> final data?

Yes.

If the table actually contains 1000 rows, the most likely outcome is
that the bitmaps would not be lossy and therefore no rechecking is
needed at all.  (Tuple bitmaps become lossy only if they have to store a
lot of tuples, in which case they forget the idea of storing each tuple,
and instead "compress" the representation to storing only the page
numbers where matching tuples are to be found).

Note however, that even if the bitmaps are not lossy, the visit to the
heap is still required, because the need to check for visibility.
Thanks...
I have 1 more question in the same line...

Query1
SELECT person_id  FROM person   WHERE (column1=1 AND column2='62')
INTERSECT
SELECT person_id  FROM person  WHERE (column1=1 AND column2='189')
 
There is an index created as person_idx(column1,column2)
     
QUERY PLAN                                                                                                                                                                         
 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 SetOp Intersect  (cost=1750719.48..1769378.35 rows=373177 width=4) (actual time=42913.626..47247.650 rows=6352 loops=1)                                                            
   ->  Sort  (cost=1750719.48..1760048.92 rows=3731774 width=4) (actual time=42913.537..45838.472 rows=3756726 loops=1)                                                             
         Sort Key: "*SELECT* 1".patient_id                                                                                                                                          
         Sort method: external merge Disk:73432kB                                                                                                                                          
         ->  Append  (cost= 17886.42..1209431.67 rows=3731774 width=4) (actual time=1474.995..32215.493 rows=3756726 loops=1)                                                        
               ->  Subquery Scan "*SELECT* 1"  (cost= 17886.42..496377.90 rows=381993 width=4) (actual time=1474.993..4936.240 rows=327498 loops=1)                                  
                     ->  Bitmap Heap Scan on person   (cost=17886.42..492557.97 rows=381993 width=4) (actual time= 1474.990..4735.972 rows=327498 loops=1)            
                           Recheck Cond: ((column1 = 1) AND ((column2)::text = '62'::text))                                                       
                           ->  Bitmap Index Scan on person_idx  (cost= 0.00..17790.92 rows=381993 width=0) (actual time=1469.508..1469.508 rows=327498 loops=1)   
                                 Index Cond: ((column1 = 1) AND ((column2)::text = '62'::text))                                                   
               ->  Subquery Scan "*SELECT* 2"  (cost=156754.24..713053.77 rows=3349781 width=4) (actual time=4142.577..25518.305 rows=3429228 loops=1)                              
                     ->  Bitmap Heap Scan on person   (cost= 156754.24..679555.96 rows=3349781 width=4) (actual time=4142.573..23493.596 rows=3429228 loops=1)        
                           Recheck Cond: ((column1 = 1) AND ((column2)::text = '189'::text))                                                      
                           ->  Bitmap Index Scan on person_idx  (cost=0.00..155916.80 rows=3349781 width=0) (actual time=4136.948..4136.948 rows=3429228 loops=1)
                                 Index Cond: ((column1 = 1) AND ((column2)::text = '189'::text))                                                  
 Total runtime: 47250.501 ms 
     


Question:
In this query Intersection is used. How does postgres handle this? The steps in the above query are
1.find all tuples that match column1=1 AND column2='62'
2. find all tuples that match column1=1 AND column2='189'
3. Find the intersection of the above 2
Does it go to the heap even to get the intermediate results (1 & 2) ?
or
Does it do the first 2 steps using index and go to the heap for the final data?

Also what does Sort method: external merge Disk:73432kB  mean?  Should I have to modify this to make this query run faster? Postgres takes 4 times slower than Oracle to return this query.  Is there a way to make this faster?

Thanks
jo

pgsql-general by date:

Previous
From: Wolfgang Keller
Date:
Subject: Re: Linux v.s. Mac OS-X Performance
Next
From: Alvaro Herrera
Date:
Subject: Re: Recheck condition