Re: UNION versus SUB SELECT - Mailing list pgsql-performance

From Robert DiFalco
Subject Re: UNION versus SUB SELECT
Date
Msg-id CAAXGW-xj0mCFTFtE=BqwCv-8t=3OF1bB+_vgft8kHLSy7y-D_Q@mail.gmail.com
Whole thread Raw
In response to Re: UNION versus SUB SELECT  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
So far that one was the worst in terms of cost and time. Here are all the plans with buffers, more records, and results being returned. At this point I have good enough performance with my UNION approach but I'm just trying to learn now. WHY is the union approach the fastest? I would have expected the EXISTS or IN approaches to be faster or at least have the SAME cost? At this point I just want to understand.

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select * 
from contacts c 
where EXISTS(
   (select 1 from phone_numbers p where c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993')
   union 
   (select 1 id = e.contact_id and e.email = 'robert.xxxx@gmail.com'));
                                                                                QUERY PLAN                                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..23238.90 rows=1425 width=95) (actual time=2.241..46.817 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: (SubPlan 1)
   Rows Removed by Filter: 2843
   Buffers: shared hit=11497
   SubPlan 1
     ->  Unique  (cost=8.13..8.13 rows=2 width=0) (actual time=0.015..0.015 rows=0 loops=2850)
           Output: (1)
           Buffers: shared hit=11440
           ->  Sort  (cost=8.13..8.13 rows=2 width=0) (actual time=0.013..0.013 rows=0 loops=2850)
                 Output: (1)
                 Sort Key: (1)
                 Sort Method: quicksort  Memory: 25kB
                 Buffers: shared hit=11440
                 ->  Append  (cost=0.06..8.13 rows=2 width=0) (actual time=0.009..0.009 rows=0 loops=2850)
                       Buffers: shared hit=11440
                       ->  Index Only Scan using idx_phone on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=2850)
                             Output: 1
                             Index Cond: ((p.contact_id = c.id) AND (p."national" = 5038904993::bigint) AND (p.e164 = '+15038904993'::text))
                             Heap Fetches: 11
                             Buffers: shared hit=5721
                       ->  Index Only Scan using idx_email_full on public.email_addresses e  (cost=0.06..4.06 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=2850)
                             Output: 1
                             Index Cond: ((e.contact_id = c.id) AND (e.email = 'robert.xxxx@gmail.com'::text))
                             Heap Fetches: 5
                             Buffers: shared hit=5719
 Total runtime: 46.897 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select * 
from contacts where id IN (
  (select c.id
  from contacts c join phone_numbers p on c.id = p.contact_id and p.national = 5038904993 and p.e164 = '+15038904993')
  union 
  (select c.id from contacts c join email_addresses e on c.id = e.contact_id and e.email = 'robert.xxxx@gmail.com'));
                                                                                                              QUERY PLAN                                                                                                               
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=29.38..53.74 rows=6 width=95) (actual time=0.356..0.418 rows=7 loops=1)
   Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
   Buffers: shared hit=87
   ->  HashAggregate  (cost=29.32..29.34 rows=6 width=8) (actual time=0.347..0.354 rows=7 loops=1)
         Output: c.id
         Buffers: shared hit=66
         ->  Append  (cost=0.11..29.32 rows=6 width=8) (actual time=0.047..0.316 rows=16 loops=1)
               Buffers: shared hit=66
               ->  Nested Loop  (cost=0.11..8.12 rows=1 width=8) (actual time=0.045..0.169 rows=11 loops=1)
                     Output: c.id
                     Buffers: shared hit=43
                     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.027..0.047 rows=11 loops=1)
                           Output: p.id, p.contact_id, p."national", p.e164, p.raw_number
                           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
                           Buffers: shared hit=9
                     ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c  (cost=0.06..4.06 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=11)
                           Output: c.id, c.owner_id, c.user_id
                           Index Cond: (c.id = p.contact_id)
                           Heap Fetches: 11
                           Buffers: shared hit=34
               ->  Nested Loop  (cost=2.12..21.17 rows=5 width=8) (actual time=0.057..0.114 rows=5 loops=1)
                     Output: c_1.id
                     Buffers: shared hit=23
                     ->  Bitmap Heap Scan on public.email_addresses e  (cost=2.06..8.85 rows=5 width=8) (actual time=0.044..0.055 rows=5 loops=1)
                           Output: e.id, e.contact_id, e.email
                           Recheck Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
                           Buffers: shared hit=7
                           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06 rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
                                 Index Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
                                 Buffers: shared hit=2
                     ->  Index Only Scan using idx_contacts_pkey_owner on public.contacts c_1  (cost=0.06..2.46 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=5)
                           Output: c_1.id, c_1.owner_id, c_1.user_id
                           Index Cond: (c_1.id = e.contact_id)
                           Heap Fetches: 5
                           Buffers: shared hit=16
   ->  Index Scan using idx_contacts_pkey_owner on public.contacts  (cost=0.06..4.06 rows=1 width=95) (actual time=0.003..0.004 rows=1 loops=7)
         Output: contacts.id, contacts.owner_id, contacts.user_id, contacts.device_id, contacts.last_call, contacts.record_id, contacts.dtype, contacts.blocked, contacts.details_hash, contacts.fname, contacts.lname, contacts.fb_id
         Index Cond: (contacts.id = c.id)
         Buffers: shared hit=21
 Total runtime: 0.535 ms
(40 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select c.*
from contacts c
where EXISTS(select 1 from phone_numbers p where (p.national = 5038904993 and p.e164 = '+15038904993') and p.contact_id = c.id)
   or EXISTS(select 1 from email_addresses e where e.email = 'robert.xxxx@gmail.com' and e.contact_id = c.id);
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=0.00..23213.25 rows=2138 width=95) (actual time=0.209..1.290 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
   Rows Removed by Filter: 2843
   Buffers: shared hit=73
   SubPlan 1
     ->  Index Only Scan using idx_phone on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((p.contact_id = c.id) AND (p."national" = 5038904993::bigint) AND (p.e164 = '+15038904993'::text))
           Heap Fetches: 0
   SubPlan 2
     ->  Index Scan using idx_phone_address on public.phone_numbers p_1  (cost=0.06..4.06 rows=1 width=8) (actual time=0.033..0.056 rows=11 loops=1)
           Output: p_1.contact_id
           Index Cond: ((p_1."national" = 5038904994::bigint) AND ((p_1.e164)::text = '+15038904993'::text))
           Buffers: shared hit=9
   SubPlan 3
     ->  Index Only Scan using idx_email_full on public.email_addresses e  (cost=0.06..4.06 rows=1 width=0) (never executed)
           Index Cond: ((e.contact_id = c.id) AND (e.email = 'robert.xxxx@gmail.com'::text))
           Heap Fetches: 0
   SubPlan 4
     ->  Bitmap Heap Scan on public.email_addresses e_1  (cost=2.06..8.85 rows=5 width=8) (actual time=0.040..0.050 rows=5 loops=1)
           Output: e_1.contact_id
           Recheck Cond: ((e_1.email)::text = 'robert.xxxx@gmail.com'::text)
           Buffers: shared hit=7
           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06 rows=5 width=0) (actual time=0.030..0.030 rows=6 loops=1)
                 Index Cond: ((e_1.email)::text = 'robert.xxxx@gmail.com'::text)
                 Buffers: shared hit=2
 Total runtime: 1.395 ms
(27 rows)

dft1fjfv106r48=> explain (analyze, buffers, verbose) 
select c.*
from contacts c
where c.id IN (
    select p.contact_id from phone_numbers p where (p.national = 5038904993 and p.e164 = '+15038904993'))
or c.id IN (
    select e.contact_id from email_addresses e where e.email = 'robert.xxxx@gmail.com');
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.contacts c  (cost=12.92..81.32 rows=2138 width=95) (actual time=0.208..1.283 rows=7 loops=1)
   Output: c.id, c.owner_id, c.user_id, c.device_id, c.last_call, c.record_id, c.dtype, c.blocked, c.details_hash, c.fname, c.lname, c.fb_id
   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
   Rows Removed by Filter: 2843
   Buffers: shared hit=73
   SubPlan 1
     ->  Index Scan using idx_phone_address on public.phone_numbers p  (cost=0.06..4.06 rows=1 width=8) (actual time=0.032..0.054 rows=11 loops=1)
           Output: p.contact_id
           Index Cond: ((p."national" = 5038904993::bigint) AND ((p.e164)::text = '+15038904993'::text))
           Buffers: shared hit=9
   SubPlan 2
     ->  Bitmap Heap Scan on public.email_addresses e  (cost=2.06..8.85 rows=5 width=8) (actual time=0.040..0.049 rows=5 loops=1)
           Output: e.contact_id
           Recheck Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
           Buffers: shared hit=7
           ->  Bitmap Index Scan on idx_email_address  (cost=0.00..2.06 rows=5 width=0) (actual time=0.031..0.031 rows=6 loops=1)
                 Index Cond: ((e.email)::text = 'robert.xxxx@gmail.com'::text)
                 Buffers: shared hit=2
 Total runtime: 1.371 ms
(19 rows)

dft1fjfv106r48=> 


On Fri, Nov 22, 2013 at 7:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Thu, Nov 21, 2013 at 2:31 PM, desmodemone <desmodemone@gmail.com> wrote:
> Hi Robert, could you try with "exists" ?
>
> SELECT c.*
> FROM contacts c
> WHERE  exists  ( SELECT  1 FROM phone p WHERE p.addr =? and
> p.contact_id=c.id )
> OR exists (SELECT  1 FROM email e WHERE e.addr = ? and  e.contact_id=c.id );

hm, how about:
SELECT c.*
FROM contacts c
WHERE  exists  (
  SELECT  1
  FROM phone p
  WHERE p.addr =? AND p.contact_id=c.id
  UNION ALL
  SELECT  1 FROM email e
  WHERE e.addr = ? AND e.contact_id=c.id
);

merlin

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: UNION versus SUB SELECT
Next
From: Lee Nguyen
Date:
Subject: Postgresql in a Virtual Machine