Re: Understanding "seq scans" - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Understanding "seq scans"
Date
Msg-id 561C2050.1070506@aklaver.com
Whole thread Raw
In response to Understanding "seq scans"  (Lele Gaifax <lele@metapensiero.it>)
List pgsql-general
On 10/12/2015 12:06 PM, Lele Gaifax wrote:
> Hi all,
>
> I'm doing some experiments to find the better layout for reimplementing
> an existing db (MySQL cough!) with PostgreSQL 9.4+.
>
> I noticed a strange plan coming out from a simple query joining two tables,
> both containing 10Mrecs (and both ANALYZEd):
>
>      l10ntest=# \d master;
>                                Table "public.master"
>       Column |  Type   |                      Modifiers
>      --------+---------+------------------------------------------------------
>       num    | integer | not null default nextval('master_num_seq'::regclass)
>      Indexes:
>          "master_pkey" PRIMARY KEY, btree (num)
>
>      l10ntest=# \d master_l10n;
>                                      Table "public.master_l10n"
>       Column |         Type         |                         Modifiers
>      --------+----------------------+-----------------------------------------------------------
>       num    | integer              | not null default nextval('master_l10n_num_seq'::regclass)
>       lang   | character varying(2) | not null
>       text   | text                 |
>      Indexes:
>          "master_l10n_pkey" PRIMARY KEY, btree (num, lang)
>          "l10n_text_index" btree (lower(text) text_pattern_ops)
>
>      l10ntest=# EXPLAIN SELECT count(l.num) AS count_1 FROM master_l10n l WHERE l.lang = 'it' AND lower(l.text) LIKE
'quattro%';
>                                                    QUERY PLAN
>      ------------------------------------------------------------------------------------------------------
>       Aggregate  (cost=309315.38..309315.39 rows=1 width=4)
>         ->  Bitmap Heap Scan on master_l10n l  (cost=64700.56..307801.65 rows=605492 width=4)
>               Filter: (((lang)::text = 'it'::text) AND (lower(text) ~~ 'quattro%'::text))
>               ->  Bitmap Index Scan on l10n_text_index  (cost=0.00..64549.19 rows=999662 width=0)
>                     Index Cond: ((lower(text) ~>=~ 'quattro'::text) AND (lower(text) ~<~ 'quattrp'::text))
>      (5 rows)
>
>      Time: 1.665 ms
>
>      l10ntest=# EXPLAIN SELECT count(m.num) AS count_1 FROM master_l10n l JOIN master m ON m.num = l.num WHERE l.lang
='it' AND lower(l.text) LIKE 'quattro%'; 
>                                                       QUERY PLAN
>      ------------------------------------------------------------------------------------------------------------
>       Aggregate  (cost=676558.14..676558.15 rows=1 width=4)
>         ->  Hash Join  (cost=373011.02..675044.41 rows=605492 width=4)
>               Hash Cond: (l.num = m.num)
>               ->  Bitmap Heap Scan on master_l10n l  (cost=64700.56..307801.65 rows=605492 width=4)
>                     Filter: (((lang)::text = 'it'::text) AND (lower(text) ~~ 'quattro%'::text))
>                     ->  Bitmap Index Scan on l10n_text_index  (cost=0.00..64549.19 rows=999662 width=0)
>                           Index Cond: ((lower(text) ~>=~ 'quattro'::text) AND (lower(text) ~<~ 'quattrp'::text))
>               ->  Hash  (cost=144247.76..144247.76 rows=9999976 width=4)
>                     ->  Seq Scan on master m  (cost=0.00..144247.76 rows=9999976 width=4)
>      (9 rows)
>
>      Time: 1.244 ms
>
>      l10ntest=# SELECT count(l.num) AS count_1 FROM master_l10n l WHERE l.lang = 'it' AND lower(l.text) LIKE
'quattro%';
>       count_1
>      ---------
>       1101101
>      (1 row)
>
>      Time: 1221.941 ms
>
>      l10ntest=# SELECT count(m.num) AS count_1 FROM master_l10n l JOIN master m ON m.num = l.num WHERE l.lang = 'it'
ANDlower(l.text) LIKE 'quattro%'; 
>       count_1
>      ---------
>       1101101
>      (1 row)
>
>      Time: 3541.852 ms
>
> Why does the join on the master table require a "Seq Scan on master"? I tried
> different kinds of "JOIN", but the resulting explanation remains the same.
>
> Am I missing something, or should I stop worrying about that sequential scan?

Off hand I would say it is because of this --> count(m.num). Try
count(l.num) instead and see what happens. As your queries above show
they are the same number.

>
> Thanks in advance,
> bye, lele.
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Understanding "seq scans"
Next
From: Lele Gaifax
Date:
Subject: Re: Understanding "seq scans"