Thread: Slow planning time when public schema included (12 vs. 9.4)

Slow planning time when public schema included (12 vs. 9.4)

From
Anders Steinlein
Date:
Hi,

We have a multi-tenant app where each tenant has their own schemas, and inherits tables from tables defined in the public schema. Other shared data such as data types are also stored in the public schema. While running this app, every transaction is started with setting the search_path to <tenant_id>, public. 

We haven't noticed any issues with this before now, until we started seeing really slow planning time on some relatively simple queries:

mm_prod=> explain analyze select cs.* from contacts_segments cs inner join segments s on s.sid = cs.segment_id inner join contacts_lists cl on cl.email = cs.email and cl.lid = s.lid where cs.segment_id = 34983 and cl.lstatus = 'a';
                                                                             QUERY PLAN                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=452.96..1887.72 rows=1518 width=41) (actual time=6.581..18.845 rows=2945 loops=1)
   Hash Cond: ((cs.email)::citext = (cl.email)::citext)
   ->  Bitmap Heap Scan on contacts_segments cs  (cost=127.17..1488.89 rows=9258 width=41) (actual time=0.501..4.085 rows=9258 loops=1)
         Recheck Cond: (segment_id = 34983)
         Heap Blocks: exact=1246
         ->  Bitmap Index Scan on contacts_segments_segment_id_idx  (cost=0.00..124.86 rows=9258 width=0) (actual time=0.380..0.380 rows=9258 loops=1)
               Index Cond: (segment_id = 34983)
   ->  Hash  (cost=298.45..298.45 rows=2187 width=25) (actual time=6.061..6.061 rows=4645 loops=1)
         Buckets: 8192 (originally 4096)  Batches: 1 (originally 1)  Memory Usage: 324kB
         ->  Nested Loop  (cost=0.56..298.45 rows=2187 width=25) (actual time=0.025..3.182 rows=4645 loops=1)
               ->  Index Scan using segments_pkey on segments s  (cost=0.27..2.49 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
                     Index Cond: (sid = 34983)
               ->  Index Scan using contacts_lists_lid_idx on contacts_lists cl  (cost=0.29..288.53 rows=744 width=25) (actual time=0.012..2.791 rows=4645 loops=1)
                     Index Cond: (lid = s.lid)
                     Filter: ((lstatus)::bpchar = 'a'::bpchar)
                     Rows Removed by Filter: 6628
 Planning Time: 1930.901 ms
 Execution Time: 18.996 ms
(18 rows)

The planning time is the same even if running the same query multiple times within the same session. When having only the tenant's schema in the search_path, planning time is much improved:

mm_prod=> set search_path = eliksir;
SET
mm_prod=> explain analyze select cs.* from contacts_segments cs inner join segments s on s.sid = cs.segment_id inner join contacts_lists cl on cl.email = cs.email and cl.lid = s.lid where cs.segment_id = 34983 and cl.lstatus = 'a';
                                                                             QUERY PLAN                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=452.96..1887.72 rows=1517 width=41) (actual time=3.980..8.554 rows=2945 loops=1)
   Hash Cond: ((cs.email)::text = (cl.email)::text)
   ->  Bitmap Heap Scan on contacts_segments cs  (cost=127.17..1488.89 rows=9258 width=41) (actual time=0.495..3.467 rows=9258 loops=1)
         Recheck Cond: (segment_id = 34983)
         Heap Blocks: exact=1246
         ->  Bitmap Index Scan on contacts_segments_segment_id_idx  (cost=0.00..124.86 rows=9258 width=0) (actual time=0.376..0.376 rows=9258 loops=1)
               Index Cond: (segment_id = 34983)
   ->  Hash  (cost=298.45..298.45 rows=2187 width=25) (actual time=3.476..3.476 rows=4645 loops=1)
         Buckets: 8192 (originally 4096)  Batches: 1 (originally 1)  Memory Usage: 324kB
         ->  Nested Loop  (cost=0.56..298.45 rows=2187 width=25) (actual time=0.019..2.726 rows=4645 loops=1)
               ->  Index Scan using segments_pkey on segments s  (cost=0.27..2.49 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=1)
                     Index Cond: (sid = 34983)
               ->  Index Scan using contacts_lists_lid_idx on contacts_lists cl  (cost=0.29..288.53 rows=744 width=25) (actual time=0.012..2.394 rows=4645 loops=1)
                     Index Cond: (lid = s.lid)
                     Filter: ((lstatus)::bpchar = 'a'::bpchar)
                     Rows Removed by Filter: 6628
 Planning Time: 23.416 ms
 Execution Time: 8.668 ms
(18 rows)

To give the schema:

mm_prod=> \d contacts_segments
                     Table "eliksir.contacts_segments"
   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 email      | email                       |           | not null |
 segment_id | integer                     |           | not null |
 entered_at | timestamp without time zone |           | not null | now()
 exited_at  | timestamp without time zone |           |          |
Indexes:
    "contacts_segments_pkey" PRIMARY KEY, btree (email, segment_id)
    "contacts_segments_segment_id_idx" btree (segment_id)
Foreign-key constraints:
    "contacts_segments_email_fkey" FOREIGN KEY (email) REFERENCES contacts(email) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE
    "contacts_segments_segment_id_fkey" FOREIGN KEY (segment_id) REFERENCES segments(sid) ON DELETE CASCADE DEFERRABLE
Inherits: public.contacts_segments

mm_prod=> \d segments
                                         Table "eliksir.segments"
    Column    |            Type             | Collation | Nullable |                Default                
--------------+-----------------------------+-----------+----------+---------------------------------------
 sid          | integer                     |           | not null | nextval('segments_sid_seq'::regclass)
 lid          | integer                     |           | not null |
 segmentname  | text                        |           | not null |
 createdat    | timestamp without time zone |           | not null | now()
 label        | text                        |           |          |
 num_contacts | integer                     |           |          |
 cid          | integer                     |           |          |
 mid          | integer                     |           |          |
 original_sid | integer                     |           |          |
 archived_at  | timestamp without time zone |           |          |
Indexes:
    "segments_pkey" PRIMARY KEY, btree (sid)
    "segments_name" UNIQUE, btree (lid, segmentname)
Foreign-key constraints:
    "segments_cid_fkey" FOREIGN KEY (cid) REFERENCES campaigns(cid) ON UPDATE RESTRICT ON DELETE CASCADE DEFERRABLE
    "segments_lid_fkey" FOREIGN KEY (lid) REFERENCES lists(lid) ON UPDATE RESTRICT ON DELETE CASCADE DEFERRABLE
    "segments_mid_fkey" FOREIGN KEY (mid) REFERENCES mails(mid) ON UPDATE RESTRICT ON DELETE CASCADE DEFERRABLE
    "segments_original_sid_fkey" FOREIGN KEY (original_sid) REFERENCES segments(sid) ON UPDATE RESTRICT ON DELETE CASCADE DEFERRABLE
Referenced by:
    TABLE "contacts_segments" CONSTRAINT "contacts_segments_segment_id_fkey" FOREIGN KEY (segment_id) REFERENCES segments(sid) ON DELETE CASCADE DEFERRABLE
    TABLE "mails_segments" CONSTRAINT "mails_segments_sid_fkey" FOREIGN KEY (sid) REFERENCES segments(sid) ON UPDATE RESTRICT ON DELETE SET NULL DEFERRABLE
    TABLE "segments" CONSTRAINT "segments_original_sid_fkey" FOREIGN KEY (original_sid) REFERENCES segments(sid) ON UPDATE RESTRICT ON DELETE CASCADE DEFERRABLE
Inherits: public.segments

mm_prod=> \d contacts_lists
                                Table "eliksir.contacts_lists"
           Column           |            Type             | Collation | Nullable |   Default  
----------------------------+-----------------------------+-----------+----------+-------------
 email                      | email                       |           | not null |
 lid                        | integer                     |           | not null |
 lstatus                    | contact_status              |           | not null | 'a'::bpchar
 ladded                     | timestamp without time zone |           | not null | now()
 lstatuschanged             | timestamp without time zone |           | not null | now()
 skip_preexisting_campaigns | boolean                     |           |          |
 source_api_client_id       | uuid                        |           |          |
 source_integration_id      | text                        |           |          |
 source_form_id             | integer                     |           |          |
 source_user_id             | text                        |           |          |
 is_bulk_added              | boolean                     |           |          | false
Indexes:
    "contacts_lists_pkey" PRIMARY KEY, btree (email, lid)
    "contacts_lists_lid_idx" btree (lid) CLUSTER
    "contacts_lists_lstatus_idx" btree (lstatus)
    "contacts_lists_lstatuschanged_idx" btree (lstatuschanged)
Foreign-key constraints:
    "contacts_lists_email_fkey" FOREIGN KEY (email) REFERENCES contacts(email) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE
    "contacts_lists_lid_fkey" FOREIGN KEY (lid) REFERENCES lists(lid) ON UPDATE RESTRICT ON DELETE CASCADE DEFERRABLE
    "contacts_lists_source_form_id_fkey" FOREIGN KEY (source_form_id) REFERENCES forms(id) ON UPDATE RESTRICT ON DELETE SET NULL DEFERRABLE
Triggers:
    [multiple triggers]
Inherits: public.contacts_lists

I tried investigated a bit PostgreSQL 12 vs 9.4 we were on a few weeks ago, and while this exact query can not be run on our 9.4 database (since some tables here are new), I found a similar query where the data is more or less unchanged in the old backup. In this query, planning time is much slower and constant in 12 compared to 9.4:

PostgreSQL 12.2:

mm_prod=> set search_path = eliksir, public;
SET
mm_prod=> explain analyze select * from segments_with_contacts swc inner join segments s using (sid) inner join contacts_lists cl on cl.email = swc.email and s.lid = cl.lid where swc.sid = 34983 and lstatus = 'a';
                                                                                       QUERY PLAN                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=326.21..931.38 rows=402 width=187) (actual time=6.708..10.343 rows=2216 loops=1)
   Hash Cond: (swc.email = (cl.email)::citext)
   ->  Index Only Scan using segments_with_contacts_sid_lid_email_idx on segments_with_contacts swc  (cost=0.42..587.71 rows=2218 width=29) (actual time=0.013..0.546 rows=2216 loops=1)
         Index Cond: (sid = 34983)
         Heap Fetches: 2216
   ->  Hash  (cost=298.45..298.45 rows=2187 width=162) (actual time=6.687..6.687 rows=4645 loops=1)
         Buckets: 8192 (originally 4096)  Batches: 1 (originally 1)  Memory Usage: 817kB
         ->  Nested Loop  (cost=0.56..298.45 rows=2187 width=162) (actual time=0.019..3.315 rows=4645 loops=1)
               ->  Index Scan using segments_pkey on segments s  (cost=0.27..2.49 rows=1 width=74) (actual time=0.005..0.006 rows=1 loops=1)
                     Index Cond: (sid = 34983)
               ->  Index Scan using contacts_lists_lid_idx on contacts_lists cl  (cost=0.29..288.53 rows=744 width=88) (actual time=0.011..2.572 rows=4645 loops=1)
                     Index Cond: (lid = s.lid)
                     Filter: ((lstatus)::bpchar = 'a'::bpchar)
                     Rows Removed by Filter: 6628
 Planning Time: 1096.942 ms
 Execution Time: 10.473 ms
(16 rows)


PostgreSQL 9.4.12:

mm_prod=> set search_path = eliksir, public;
SET
mm_prod=> explain analyze select * from segments_with_contacts swc inner join segments s using (sid) inner join contacts_lists cl on cl.email = swc.email and s.lid = cl.lid where swc.sid = 34983 and lstatus = 'a';
                                                                          QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=700.92..1524.11 rows=444 width=187) (actual time=13.800..20.009 rows=2295 loops=1)
   Hash Cond: (swc.email = (cl.email)::citext)
   ->  Bitmap Heap Scan on segments_with_contacts swc  (cost=110.44..888.50 rows=2325 width=29) (actual time=0.318..0.737 rows=2295 loops=1)
         Recheck Cond: (sid = 34983)
         Heap Blocks: exact=19
         ->  Bitmap Index Scan on segments_with_contacts_sid_lid_email_idx  (cost=0.00..109.86 rows=2325 width=0) (actual time=0.299..0.299 rows=2335 loops=1)
               Index Cond: (sid = 34983)
   ->  Hash  (cost=559.75..559.75 rows=2459 width=162) (actual time=13.401..13.401 rows=4624 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 749kB
         ->  Nested Loop  (cost=51.55..559.75 rows=2459 width=162) (actual time=1.375..7.515 rows=4624 loops=1)
               ->  Index Scan using segments_pkey on segments s  (cost=0.28..8.29 rows=1 width=74) (actual time=0.012..0.015 rows=1 loops=1)
                     Index Cond: (sid = 34983)
               ->  Bitmap Heap Scan on contacts_lists cl  (cost=51.27..543.86 rows=760 width=88) (actual time=1.355..6.525 rows=4624 loops=1)
                     Recheck Cond: (lid = s.lid)
                     Filter: ((lstatus)::bpchar = 'a'::bpchar)
                     Rows Removed by Filter: 6428
                     Heap Blocks: exact=455
                     ->  Bitmap Index Scan on contacts_lists_lid_idx  (cost=0.00..51.08 rows=1439 width=0) (actual time=1.275..1.275 rows=11533 loops=1)
                           Index Cond: (lid = s.lid)
 Planning time: 22.120 ms
 Execution time: 20.337 ms
(21 rows)

PostgreSQL 12.2 is run on EPYC 7302P 16-core, 64GB RAM, 2x NVMe disks in RAID1, Ubuntu 18.04.1 running 5.0.0-37 HWE kernel. Some settings:

constraint_exclusion = partition (I tried setting this to off or on, to see if that made any difference; nope)
default_statistics_target = 1000
effective_cache_size = 48GB
effective_io_concurrency = 200
max_worker_processes = 16
random_page_cost = 1.1
shared_buffers = 16GB
work_mem = 16MB

Thanks for any guidance.

-- a.

Re: Slow planning time when public schema included (12 vs. 9.4)

From
Andreas Kretschmer
Date:

Am 21.03.20 um 13:02 schrieb Anders Steinlein:
> default_statistics_target = 1000

not sure if this be the culprit here, but i think this is way too high. 
Leave it at the normal value of 100 and raise it only for particular 
tables and columns.

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Slow planning time when public schema included (12 vs. 9.4)

From
Tom Lane
Date:
Anders Steinlein <anders@e5r.no> writes:
> We haven't noticed any issues with this before now, until we started seeing
> really slow planning time on some relatively simple queries:
> ...
> The planning time is the same even if running the same query multiple times
> within the same session. When having only the tenant's schema in the
> search_path, planning time is much improved:

I notice a difference in these plans:

>  Hash Join  (cost=452.96..1887.72 rows=1518 width=41) (actual
> time=6.581..18.845 rows=2945 loops=1)
>    Hash Cond: ((cs.email)::citext = (cl.email)::citext)
                             ^^^^^^               ^^^^^^

>  Hash Join  (cost=452.96..1887.72 rows=1517 width=41) (actual
> time=3.980..8.554 rows=2945 loops=1)
>    Hash Cond: ((cs.email)::text = (cl.email)::text)
                             ^^^^               ^^^^

I think what is happening is that the "cl.email = cs.email" clause
is resolving as a different operator depending on your search path;
probably there is a "citext = citext" operator in the public
schema, and if available the parser will think it's a better match
than the "text = text" operator.  However, "citext = citext" can
be orders of magnitude slower, depending on what locale settings
you're using.  That's affecting your planning time (since the
planner will apply the operator to the values available from
pg_stats), and it's also visibly affecting the query runtime.

Not sure why you'd not have seen the same effect in your 9.4
installation, but maybe you had citext installed somewhere else?

            regards, tom lane



Re: Slow planning time when public schema included (12 vs. 9.4)

From
Anders Steinlein
Date:
On Sat, Mar 21, 2020 at 2:37 PM Andreas Kretschmer <andreas@a-kretschmer.de> wrote:


Am 21.03.20 um 13:02 schrieb Anders Steinlein:
> default_statistics_target = 1000

not sure if this be the culprit here, but i think this is way too high.
Leave it at the normal value of 100 and raise it only for particular
tables and columns.

It may very well be too high, but the 9.4 instance also has default_statistics_target = 1000.

Best,
-- a.

Re: Slow planning time when public schema included (12 vs. 9.4)

From
Anders Steinlein
Date:
On Sat, Mar 21, 2020 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anders Steinlein <anders@e5r.no> writes:
> We haven't noticed any issues with this before now, until we started seeing
> really slow planning time on some relatively simple queries:
> ...
> The planning time is the same even if running the same query multiple times
> within the same session. When having only the tenant's schema in the
> search_path, planning time is much improved:

I notice a difference in these plans:

>  Hash Join  (cost=452.96..1887.72 rows=1518 width=41) (actual
> time=6.581..18.845 rows=2945 loops=1)
>    Hash Cond: ((cs.email)::citext = (cl.email)::citext)
                             ^^^^^^               ^^^^^^

>  Hash Join  (cost=452.96..1887.72 rows=1517 width=41) (actual
> time=3.980..8.554 rows=2945 loops=1)
>    Hash Cond: ((cs.email)::text = (cl.email)::text)
                             ^^^^               ^^^^

I think what is happening is that the "cl.email = cs.email" clause
is resolving as a different operator depending on your search path;
probably there is a "citext = citext" operator in the public
schema, and if available the parser will think it's a better match
than the "text = text" operator.  However, "citext = citext" can
be orders of magnitude slower, depending on what locale settings
you're using.  That's affecting your planning time (since the
planner will apply the operator to the values available from
pg_stats), and it's also visibly affecting the query runtime.

Not sure why you'd not have seen the same effect in your 9.4
installation, but maybe you had citext installed somewhere else?

 The citext extension is installed in the public schema in both instances. Also, the second query example that I could run on both 12 and 9.4 runs with the citext comparison in both cases. From 9.4:

mm_prod=> explain analyze select * from segments_with_contacts swc inner join segments s using (sid) inner join contacts_lists cl on cl.email = swc.email and s.lid = cl.lid where swc.sid = 34983 and lstatus = 'a';
                                                                          QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=700.92..1524.11 rows=444 width=187) (actual time=13.800..20.009 rows=2295 loops=1)
   Hash Cond: (swc.email = (cl.email)::citext)
                                     ^^^^^^^^
mm2_prod=> \d segments_with_contacts
Materialized view "eliksir.segments_with_contacts"
 Column |  Type   | Modifiers
--------+---------+-----------
 lid    | integer |
 sid    | integer |
 email  | citext  | 


The tables segments and contacts_lists are identical on the two instances, i.e. both are using citext (email domain using the citext type) on both 12 and 9.4, with the citext extension in the public schema. Is it the lc_collate setting citext cares about? lc_collate=nb_NO.UTF-8 on both 9.4 and 12.

So I don't understand this big difference? Because it does seem like citext is indeed the difference. I tried to modify the query to cast before joining, and it is indeed much improved:

mm_prod=> set search_path = eliksir, public;
SET
mm_prod=> explain analyze select cs.* from contacts_segments cs inner join segments s on s.sid = cs.segment_id inner join contacts_lists cl on lower(cl.email::text) = lower(cs.email::text) and cl.lid = s.lid where cs.segment_id = 34983 and cl.lstatus = 'a';
                                                                             QUERY PLAN                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=2518.61..4565.20 rows=101259 width=41) (actual time=47.278..51.686 rows=2945 loops=1)
   Merge Cond: ((lower((cl.email)::text)) = (lower((cs.email)::text)))
   ->  Sort  (cost=419.78..425.24 rows=2187 width=25) (actual time=18.283..18.516 rows=4646 loops=1)
         Sort Key: (lower((cl.email)::text))
         Sort Method: quicksort  Memory: 665kB
         ->  Nested Loop  (cost=0.56..298.45 rows=2187 width=25) (actual time=0.057..9.805 rows=4646 loops=1)
               ->  Index Scan using segments_pkey on segments s  (cost=0.27..2.49 rows=1 width=8) (actual time=0.021..0.022 rows=1 loops=1)
                     Index Cond: (sid = 34983)
               ->  Index Scan using contacts_lists_lid_idx on contacts_lists cl  (cost=0.29..288.53 rows=744 width=25) (actual time=0.023..4.953 rows=4646 loops=1)
                     Index Cond: (lid = s.lid)
                     Filter: ((lstatus)::bpchar = 'a'::bpchar)
                     Rows Removed by Filter: 6628
   ->  Sort  (cost=2098.83..2121.98 rows=9258 width=41) (actual time=28.988..29.373 rows=9258 loops=1)
         Sort Key: (lower((cs.email)::text))
         Sort Method: quicksort  Memory: 1598kB
         ->  Bitmap Heap Scan on contacts_segments cs  (cost=127.17..1488.89 rows=9258 width=41) (actual time=0.511..7.910 rows=9258 loops=1)
               Recheck Cond: (segment_id = 34983)
               Heap Blocks: exact=1246
               ->  Bitmap Index Scan on contacts_segments_segment_id_idx  (cost=0.00..124.86 rows=9258 width=0) (actual time=0.390..0.391 rows=9258 loops=1)
                     Index Cond: (segment_id = 34983)
 Planning Time: 0.416 ms
 Execution Time: 51.924 ms
(22 rows)

Best,
-- a.

Re: Slow planning time when public schema included (12 vs. 9.4)

From
Tom Lane
Date:
Anders Steinlein <anders@e5r.no> writes:
> On Sat, Mar 21, 2020 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Not sure why you'd not have seen the same effect in your 9.4
>> installation, but maybe you had citext installed somewhere else?

> The tables segments and contacts_lists are identical on the two instances,
> i.e. both are using citext (email domain using the citext type) on both 12
> and 9.4, with the citext extension in the public schema. Is it the
> lc_collate setting citext cares about? lc_collate=nb_NO.UTF-8 on both 9.4
> and 12.

I think it depends on both lc_collate and lc_ctype, since basically
what it's doing is lower() on each string and then a strcoll()
comparison.  The strcoll() part should be pretty much equivalent to
text comparisons, though ... or, hmm, maybe not.  texteq() knows
it can reduce that to just a memcmp bitwise-equality test, but
citext doesn't have that optimization.

> So I don't understand this big difference? Because it does seem like citext
> is indeed the difference.

It seems odd to me too.  I'm not at all surprised that citext comparison
is way slower than text, but I am surprised that you don't see that on 9.4
as well.  Is lc_ctype the same in both installs?  For that matter, is the
underlying libc the same?  We have seen large performance discrepancies
between different libc versions in this area.

If you're interested in digging further, getting a "perf" profile while
running the problem query over and over would likely yield some insight
about where the time is going.

https://wiki.postgresql.org/wiki/Profiling_with_perf

            regards, tom lane



Re: Slow planning time when public schema included (12 vs. 9.4)

From
Anders Steinlein
Date:
On Sat, Mar 21, 2020 at 8:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anders Steinlein <anders@e5r.no> writes:
> So I don't understand this big difference? Because it does seem like citext
> is indeed the difference.

It seems odd to me too.  I'm not at all surprised that citext comparison
is way slower than text, but I am surprised that you don't see that on 9.4
as well.

Indeed. But also, how come this is part of the planner time? I would think that would be part of the execution time? (Just a detail I'm curious about.)

Is lc_ctype the same in both installs?

Yes, lc_ctype is also nb_NO.UTF-8 on both installs.
 
For that matter, is the underlying libc the same?  We have seen large performance discrepancies
between different libc versions in this area.

This they most definitely are not. 9.4 was running on an old box, Ubuntu 12.04, while 12 is on an up-to-date Ubuntu 18.04 LTS. AFAICS, 2.15 on the 9.4 box and 2.27 on the 12 box.

If you're interested in digging further, getting a "perf" profile while
running the problem query over and over would likely yield some insight
about where the time is going.

I collected a profile now, but I've never done this before so I'm unsure how to read the report. I'll email you directly with a link to the perf.data file, if you would be so kind as to take a quick look. From what little I think I understand, towlower from libc seems to take up 32% of the total time, although that by itself doesn't seem to explain almost 2 second planner time vs. 20ms... Should really citext/libc string comparison "issues" cause this order of magnitude slower planner time?

Best,
-- a.

Re: Slow planning time when public schema included (12 vs. 9.4)

From
Tom Lane
Date:
Anders Steinlein <anders@e5r.no> writes:
> On Sat, Mar 21, 2020 at 8:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It seems odd to me too.  I'm not at all surprised that citext comparison
>> is way slower than text, but I am surprised that you don't see that on 9.4
>> as well.

> Indeed. But also, how come this is part of the planner time? I would think
> that would be part of the execution time? (Just a detail I'm curious about.)

As part of estimating the size of a join, the planner will run through all
the most-common-values available from pg_stats and see which values from
one table match to which values from the other.  If you have a lot of MCVs
(which'd involve a fairly flat, but not unique, data distribution and a
large stats target setting) and a slow join operator, it's not hard for
that to take a lot of time.  You might care to look into pg_stats and see
just how big those arrays are for each of these columns.

But 9.4 did that too, so we're still at a loss as to why v12 is so much
slower.

> This they most definitely are not. 9.4 was running on an old box, Ubuntu
> 12.04, while 12 is on an up-to-date Ubuntu 18.04 LTS. AFAICS, 2.15 on the
> 9.4 box and 2.27 on the 12 box.

I'm suspicious that the root issue has to do with libc differences,
but I haven't any hard data to back that up with.

Another possibility perhaps is that v12's ANALYZE is collecting a lot
more "common" values than 9.4 did.  Whether it is or not, the advice
you already got to ratchet down the stats target would likely be
helpful to reduce the planning time.

            regards, tom lane



Re: Slow planning time when public schema included (12 vs. 9.4)

From
Anders Steinlein
Date:
On Sat, Mar 21, 2020 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anders Steinlein <anders@e5r.no> writes:
> This they most definitely are not. 9.4 was running on an old box, Ubuntu
> 12.04, while 12 is on an up-to-date Ubuntu 18.04 LTS. AFAICS, 2.15 on the
> 9.4 box and 2.27 on the 12 box.

I'm suspicious that the root issue has to do with libc differences,
but I haven't any hard data to back that up with.

Another possibility perhaps is that v12's ANALYZE is collecting a lot
more "common" values than 9.4 did.  Whether it is or not, the advice
you already got to ratchet down the stats target would likely be
helpful to reduce the planning time.

Yes, indeed, lowering the statistics target to the default 100 decreased the planning time a lot -- to sub-10m! Thanks for the guidance, although the excessive difference between the two boxes/libc versions are disappointing, to say the least.

Do you have any insight into how the Postgres 12 nondeterministic collation feature (with ICU) compares performance-wise in general? Although having a much lower statistics target "fixed" this, I'm concerned joins and sorting is slower in general after having uncovered this (we haven't dug into that performance numbers yet), since email (citext) are PKs in a lot of our tables. Would changing our email domain using citext to instead be a domain over text using a case-insensitive collation be a better choice?

Thanks again,
-- a.