Re: Slow planning time when public schema included (12 vs. 9.4) - Mailing list pgsql-performance

From Anders Steinlein
Subject Re: Slow planning time when public schema included (12 vs. 9.4)
Date
Msg-id CAC35HN=mieW9zXbJAaYkK3_n8meqgA+gtM71q57SFzy+wrEa7A@mail.gmail.com
Whole thread Raw
In response to Re: Slow planning time when public schema included (12 vs. 9.4)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Slow planning time when public schema included (12 vs. 9.4)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Sat, Mar 21, 2020 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anders Steinlein <anders@e5r.no> writes:
> We haven't noticed any issues with this before now, until we started seeing
> really slow planning time on some relatively simple queries:
> ...
> The planning time is the same even if running the same query multiple times
> within the same session. When having only the tenant's schema in the
> search_path, planning time is much improved:

I notice a difference in these plans:

>  Hash Join  (cost=452.96..1887.72 rows=1518 width=41) (actual
> time=6.581..18.845 rows=2945 loops=1)
>    Hash Cond: ((cs.email)::citext = (cl.email)::citext)
                             ^^^^^^               ^^^^^^

>  Hash Join  (cost=452.96..1887.72 rows=1517 width=41) (actual
> time=3.980..8.554 rows=2945 loops=1)
>    Hash Cond: ((cs.email)::text = (cl.email)::text)
                             ^^^^               ^^^^

I think what is happening is that the "cl.email = cs.email" clause
is resolving as a different operator depending on your search path;
probably there is a "citext = citext" operator in the public
schema, and if available the parser will think it's a better match
than the "text = text" operator.  However, "citext = citext" can
be orders of magnitude slower, depending on what locale settings
you're using.  That's affecting your planning time (since the
planner will apply the operator to the values available from
pg_stats), and it's also visibly affecting the query runtime.

Not sure why you'd not have seen the same effect in your 9.4
installation, but maybe you had citext installed somewhere else?

 The citext extension is installed in the public schema in both instances. Also, the second query example that I could run on both 12 and 9.4 runs with the citext comparison in both cases. From 9.4:

mm_prod=> explain analyze select * from segments_with_contacts swc inner join segments s using (sid) inner join contacts_lists cl on cl.email = swc.email and s.lid = cl.lid where swc.sid = 34983 and lstatus = 'a';
                                                                          QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=700.92..1524.11 rows=444 width=187) (actual time=13.800..20.009 rows=2295 loops=1)
   Hash Cond: (swc.email = (cl.email)::citext)
                                     ^^^^^^^^
mm2_prod=> \d segments_with_contacts
Materialized view "eliksir.segments_with_contacts"
 Column |  Type   | Modifiers
--------+---------+-----------
 lid    | integer |
 sid    | integer |
 email  | citext  | 


The tables segments and contacts_lists are identical on the two instances, i.e. both are using citext (email domain using the citext type) on both 12 and 9.4, with the citext extension in the public schema. Is it the lc_collate setting citext cares about? lc_collate=nb_NO.UTF-8 on both 9.4 and 12.

So I don't understand this big difference? Because it does seem like citext is indeed the difference. I tried to modify the query to cast before joining, and it is indeed much improved:

mm_prod=> set search_path = eliksir, public;
SET
mm_prod=> explain analyze select cs.* from contacts_segments cs inner join segments s on s.sid = cs.segment_id inner join contacts_lists cl on lower(cl.email::text) = lower(cs.email::text) and cl.lid = s.lid where cs.segment_id = 34983 and cl.lstatus = 'a';
                                                                             QUERY PLAN                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=2518.61..4565.20 rows=101259 width=41) (actual time=47.278..51.686 rows=2945 loops=1)
   Merge Cond: ((lower((cl.email)::text)) = (lower((cs.email)::text)))
   ->  Sort  (cost=419.78..425.24 rows=2187 width=25) (actual time=18.283..18.516 rows=4646 loops=1)
         Sort Key: (lower((cl.email)::text))
         Sort Method: quicksort  Memory: 665kB
         ->  Nested Loop  (cost=0.56..298.45 rows=2187 width=25) (actual time=0.057..9.805 rows=4646 loops=1)
               ->  Index Scan using segments_pkey on segments s  (cost=0.27..2.49 rows=1 width=8) (actual time=0.021..0.022 rows=1 loops=1)
                     Index Cond: (sid = 34983)
               ->  Index Scan using contacts_lists_lid_idx on contacts_lists cl  (cost=0.29..288.53 rows=744 width=25) (actual time=0.023..4.953 rows=4646 loops=1)
                     Index Cond: (lid = s.lid)
                     Filter: ((lstatus)::bpchar = 'a'::bpchar)
                     Rows Removed by Filter: 6628
   ->  Sort  (cost=2098.83..2121.98 rows=9258 width=41) (actual time=28.988..29.373 rows=9258 loops=1)
         Sort Key: (lower((cs.email)::text))
         Sort Method: quicksort  Memory: 1598kB
         ->  Bitmap Heap Scan on contacts_segments cs  (cost=127.17..1488.89 rows=9258 width=41) (actual time=0.511..7.910 rows=9258 loops=1)
               Recheck Cond: (segment_id = 34983)
               Heap Blocks: exact=1246
               ->  Bitmap Index Scan on contacts_segments_segment_id_idx  (cost=0.00..124.86 rows=9258 width=0) (actual time=0.390..0.391 rows=9258 loops=1)
                     Index Cond: (segment_id = 34983)
 Planning Time: 0.416 ms
 Execution Time: 51.924 ms
(22 rows)

Best,
-- a.

pgsql-performance by date:

Previous
From: Anders Steinlein
Date:
Subject: Re: Slow planning time when public schema included (12 vs. 9.4)
Next
From: Tom Lane
Date:
Subject: Re: Slow planning time when public schema included (12 vs. 9.4)