Re: AWS forcing PG upgrade from v9.6 a disaster - Mailing list pgsql-performance

From Dean Gibson (DB Administrator)
Subject Re: AWS forcing PG upgrade from v9.6 a disaster
Date
Msg-id 92cc8563-195f-8727-d887-5f2bb63f231f@mailpen.com
Whole thread Raw
In response to AWS forcing PG upgrade from v9.6 a disaster  ("Dean Gibson (DB Administrator)" <postgresql@mailpen.com>)
Responses Re: AWS forcing PG upgrade from v9.6 a disaster
Re: AWS forcing PG upgrade from v9.6 a disaster
List pgsql-performance
SOLVED !!!  Below is the new EXPLAIN ANALYZE for 13.2 on AWS RDS (with no changes to server parameters) along with the prior EXPLAIN ANALYZE outputs for easy comparison.

While I didn't discount the significance & effect of optimizing the server parameters, this problem always seemed to me like a fundamental difference in how the PostgreSQL planner viewed the structure of the query.  In particular, I had a usage pattern of writing VIEWS that worked very well with v9.6 & prior versions, but which made me suspect a route of attack:

Since the FCC tables contain lots of one-character codes for different conditions, to simplify maintenance & displays to humans, I created over twenty tiny lookup tables (a dozen or so entries in each table), to render a human-readable field as a replacement for the original one-character field in many of the VIEWs.  In some cases those "humanized" fields were used as conditions in SELECT statements.  Of course, fields that are not referenced or selected for output from a particular query, never get looked up (an advantage over using a JOIN for each lookup).  In some cases, for ease of handling multiple or complex lookups, I indeed used a JOIN.  All this worked fine until v10.

Here's the FROM clause that bit me:

   FROM lic_en
     JOIN govt_region   USING (territory_id, country_id)
     LEFT JOIN zip_code USING (territory_id, country_id, zip5)
     LEFT JOIN "County" USING (territory_id, country_id, fips_county);


The first two JOINs are not the problem, & are in fact retained in my solution.  The problem is the third JOIN, where "fips_county" from "County" is actually matched with the corresponding field from the "zip_code" VIEW.  Works fine, if you don't mind the performance impact in v10 & above.  It has now been rewritten, to be a sub-query for an output field.  Voila !  Back to sub-second query times.

This also solved performance issues with other queries as well.  I also now use lookup values as additional fields in the output, in addition to the original fields, which should help some more (but means some changes to some web pages that do queries).

-- Dean

ps: I wonder how many other RDS users of v9.6 are going to get a very rude awakening very soon, as AWS is not allowing new instances of v9.6 after August 2 (see https://forums.aws.amazon.com/ann.jspa?annID=8499 ).  Whether that milestone affects restores from snapshots, remains to be seen (by others, not by me).  In other words, users should plan to be up & running on a newer version well before August.  Total cost to me?  I"m in my 8th day of dealing with this, & I still have a number of web pages to update, due to changes in SQL field names to manage this mess.  This was certainly not a obvious solution.

Here's from 13.2 (new):

=> EXPLAIN ANALYZE  SELECT club_count, extra_count, region_count, callsign AS trustee_callsign, applicant_type, entity_name, licensee_id AS _lid FROM genclub_multi_  WHERE club_count >= 5 ORDER BY extra_count DESC, club_count DESC, entity_name;
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=457.77..457.77 rows=1 width=64) (actual time=48.737..48.742 rows=43 loops=1)
   Sort Key: "_Club".extra_count DESC, "_Club".club_count DESC, "_EN".entity_name
   Sort Method: quicksort  Memory: 31kB
   ->  Nested Loop Left Join  (cost=1.57..457.76 rows=1 width=64) (actual time=1.796..48.635 rows=43 loops=1)
         ->  Nested Loop  (cost=1.28..457.07 rows=1 width=71) (actual time=1.736..48.239 rows=43 loops=1)
               Join Filter: (("_EN".country_id = "_GovtRegion".country_id) AND ("_EN".state = "_GovtRegion".territory_id))
               Rows Removed by Join Filter: 1297
               ->  Nested Loop  (cost=1.28..453.75 rows=1 width=70) (actual time=1.720..47.778 rows=43 loops=1)
                     Join Filter: (("_HD".unique_system_identifier = "_EN".unique_system_identifier) AND ("_HD".callsign = "_EN".callsign))
                     ->  Nested Loop  (cost=0.85..450.98 rows=1 width=65) (actual time=1.207..34.912 rows=43 loops=1)
                           ->  Nested Loop  (cost=0.43..376.57 rows=27 width=50) (actual time=0.620..20.956 rows=43 loops=1)
                                 ->  Seq Scan on "_Club"  (cost=0.00..4.44 rows=44 width=35) (actual time=0.037..0.067 rows=44 loops=1)
                                       Filter: (club_count >= 5)
                                       Rows Removed by Filter: 151
                                 ->  Index Scan using "_HD_callsign" on "_HD"  (cost=0.43..8.45 rows=1 width=15) (actual time=0.474..0.474 rows=1 loops=44)
                                       Index Cond: (callsign = "_Club".trustee_callsign)
                                       Filter: (license_status = 'A'::bpchar)
                                       Rows Removed by Filter: 0
                           ->  Index Scan using "_AM_pkey" on "_AM"  (cost=0.43..2.75 rows=1 width=15) (actual time=0.323..0.323 rows=1 loops=43)
                                 Index Cond: (unique_system_identifier = "_HD".unique_system_identifier)
                                 Filter: ("_HD".callsign = callsign)
                     ->  Index Scan using "_EN_pkey" on "_EN"  (cost=0.43..2.75 rows=1 width=60) (actual time=0.298..0.298 rows=1 loops=43)
                           Index Cond: (unique_system_identifier = "_AM".unique_system_identifier)
                           Filter: ("_AM".callsign = callsign)
               ->  Seq Scan on "_GovtRegion"  (cost=0.00..1.93 rows=93 width=7) (actual time=0.002..0.004 rows=31 loops=43)
         ->  Nested Loop  (cost=0.29..0.68 rows=1 width=7) (actual time=0.008..0.008 rows=1 loops=43)
               ->  Index Only Scan using "_IsoCountry_iso_alpha2_key" on "_IsoCountry"  (cost=0.14..0.38 rows=1 width=3) (actual time=0.004..0.004 rows=1 loops=43)
                     Index Cond: (iso_alpha2 = "_GovtRegion".country_id)
                     Heap Fetches: 43
               ->  Index Only Scan using "_Territory_pkey" on "_Territory"  (cost=0.14..0.29 rows=1 width=7) (actual time=0.003..0.003 rows=1 loops=43)
                     Index Cond: ((country_id = "_IsoCountry".iso_alpha2) AND (territory_id = "_GovtRegion".territory_id))
                     Heap Fetches: 43
 Planning Time: 4.017 ms
 Execution Time: 48.822 ms



On 2021-05-28 11:48, Dean Gibson (DB Administrator) wrote:
...

Here's from v9.6:

=> EXPLAIN ANALYZE SELECT club_count, extra_count, region_count, callsign AS trustee_callsign, applicant_type, entity_name, licensee_id AS _lid FROM genclub_multi_  WHERE club_count >= 5 ORDER BY extra_count DESC, club_count DESC, entity_name;
                                                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=407.13..407.13 rows=1 width=94) (actual time=348.850..348.859 rows=43 loops=1)
   Sort Key: "_Club".extra_count DESC, "_Club".club_count DESC, "_EN".entity_name
   Sort Method: quicksort  Memory: 31kB
   ->  Nested Loop  (cost=4.90..407.12 rows=1 width=94) (actual time=7.587..348.732 rows=43 loops=1)
         ->  Nested Loop  (cost=4.47..394.66 rows=1 width=94) (actual time=5.740..248.149 rows=43 loops=1)
               ->  Nested Loop Left Join  (cost=4.04..382.20 rows=1 width=79) (actual time=2.458..107.908 rows=55 loops=1)
                     ->  Hash Join  (cost=3.75..380.26 rows=1 width=86) (actual time=2.398..106.990 rows=55 loops=1)
                           Hash Cond: (("_EN".country_id = "_GovtRegion".country_id) AND ("_EN".state = "_GovtRegion".territory_id))
                           ->  Nested Loop  (cost=0.43..376.46 rows=47 width=94) (actual time=2.294..106.736 rows=55 loops=1)
                                 ->  Seq Scan on "_Club"  (cost=0.00..4.44 rows=44 width=35) (actual time=0.024..0.101 rows=44 loops=1)
                                       Filter: (club_count >= 5)
                                       Rows Removed by Filter: 151
                                 ->  Index Scan using "_EN_callsign" on "_EN"  (cost=0.43..8.45 rows=1 width=69) (actual time=2.179..2.420 rows=1 loops=44)
                                       Index Cond: (callsign = "_Club".trustee_callsign)
                           ->  Hash  (cost=1.93..1.93 rows=93 width=7) (actual time=0.071..0.071 rows=88 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 12kB
                                 ->  Seq Scan on "_GovtRegion"  (cost=0.00..1.93 rows=93 width=7) (actual time=0.010..0.034 rows=93 loops=1)
                     ->  Nested Loop  (cost=0.29..1.93 rows=1 width=7) (actual time=0.012..0.014 rows=1 loops=55)
                           Join Filter: ("_IsoCountry".iso_alpha2 = "_Territory".country_id)
                           Rows Removed by Join Filter: 0
                           ->  Index Only Scan using "_IsoCountry_iso_alpha2_key" on "_IsoCountry"  (cost=0.14..1.62 rows=1 width=3) (actual time=0.006..0.006 rows=1 loops=55)
                                 Index Cond: (iso_alpha2 = "_GovtRegion".country_id)
                                 Heap Fetches: 55
                           ->  Index Only Scan using "_Territory_pkey" on "_Territory"  (cost=0.14..0.29 rows=1 width=7)
 (actual time=0.004..0.005 rows=1 loops=55)
                                 Index Cond: (territory_id = "_GovtRegion".territory_id)
                                 Heap Fetches: 59
               ->  Index Scan using "_HD_pkey" on "_HD"  (cost=0.43..12.45 rows=1 width=15) (actual time=2.548..2.548 rows=1 loops=55)
                     Index Cond: (unique_system_identifier = "_EN".unique_system_identifier)
                     Filter: (("_EN".callsign = callsign) AND (((((license_status)::text || ' - '::text) || (COALESCE((SubPlan 2), '???'::character varying))::text))::character(1) = 'A'::bpchar))
                     Rows Removed by Filter: 0
                     SubPlan 2
                       ->  Limit  (cost=0.15..8.17 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=55)
                             ->  Index Scan using "_LicStatus_pkey" on "_LicStatus"  (cost=0.15..8.17 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=55)
                                   Index Cond: ("_HD".license_status = status_id)
         ->  Index Scan using "_AM_pkey" on "_AM"  (cost=0.43..4.27 rows=1 width=15) (actual time=2.325..2.325 rows=1 loops=43)
               Index Cond: (unique_system_identifier = "_EN".unique_system_identifier)
               Filter: ("_EN".callsign = callsign)
         SubPlan 1
           ->  Limit  (cost=0.15..8.17 rows=1 width=32) (actual time=0.007..0.007 rows=1 loops=43)
                 ->  Index Scan using "_ApplicantType_pkey" on "_ApplicantType"  (cost=0.15..8.17 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=43)
                       Index Cond: ("_EN".applicant_type_code = app_type_id)
 Planning time: 13.490 ms
 Execution time: 349.182 ms
(43 rows)



Here's from v13.2: 

=> EXPLAIN ANALYZE SELECT club_count, extra_count, region_count, callsign AS trustee_callsign, applicant_type, entity_name, licensee_id AS _lid FROM genclub_multi_  WHERE club_count >= 5 ORDER BY extra_count DESC, club_count DESC, entity_name;
                                                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=144365.60..144365.60 rows=1 width=94) (actual time=31898.860..31901.922 rows=43 loops=1)
   Sort Key: "_Club".extra_count DESC, "_Club".club_count DESC, "_EN".entity_name
   Sort Method: quicksort  Memory: 31kB
   ->  Nested Loop  (cost=58055.66..144365.59 rows=1 width=94) (actual time=6132.403..31894.233 rows=43 loops=1)
         ->  Nested Loop  (cost=58055.51..144364.21 rows=1 width=62) (actual time=1226.085..30337.921 rows=837792 loops=1)
               ->  Nested Loop Left Join  (cost=58055.09..144360.38 rows=1 width=59) (actual time=1062.414..12471.456 rows=1487153 loops=1)
                     ->  Hash Join  (cost=58054.80..144359.69 rows=1 width=66) (actual time=1061.330..6635.041 rows=1487153 loops=1)
                           Hash Cond: (("_EN".unique_system_identifier = "_AM".unique_system_identifier) AND ("_EN".callsign = "_AM".callsign))
                           ->  Hash Join  (cost=3.33..53349.72 rows=1033046 width=51) (actual time=2.151..3433.178 rows=1487153 loops=1)
                                 Hash Cond: (("_EN".country_id = "_GovtRegion".country_id) AND ("_EN".state = "_GovtRegion".territory_id))
                                 ->  Seq Scan on "_EN"  (cost=0.00..45288.05 rows=1509005 width=60) (actual time=0.037..2737.054 rows=1508736 loops=1)
                                 ->  Hash  (cost=1.93..1.93 rows=93 width=7) (actual time=0.706..1.264 rows=88 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 12kB
                                       ->  Seq Scan on "_GovtRegion"  (cost=0.00..1.93 rows=93 width=7) (actual time=0.013..0.577 rows=93 loops=1)
                           ->  Hash  (cost=28093.99..28093.99 rows=1506699 width=15) (actual time=1055.587..1055.588 rows=1506474 loops=1)
                                 Buckets: 131072  Batches: 32  Memory Usage: 3175kB
                                 ->  Seq Scan on "_AM"  (cost=0.00..28093.99 rows=1506699 width=15) (actual time=0.009..742.774 rows=1506474 loops=1)
                     ->  Nested Loop  (cost=0.29..0.68 rows=1 width=7) (actual time=0.003..0.004 rows=1 loops=1487153)
                           Join Filter: ("_IsoCountry".iso_alpha2 = "_Territory".country_id)
                           Rows Removed by Join Filter: 0
                           ->  Index Only Scan using "_IsoCountry_iso_alpha2_key" on "_IsoCountry"  (cost=0.14..0.38 rows=1 width=3) (actual time=0.001..0.002 rows=1 loops=1487153)
                                 Index Cond: (iso_alpha2 = "_GovtRegion".country_id)
                                 Heap Fetches: 1487153
                           ->  Index Only Scan using "_Territory_pkey" on "_Territory"  (cost=0.14..0.29 rows=1 width=7) (actual time=0.001..0.001 rows=1 loops=1487153)
                                 Index Cond: (territory_id = "_GovtRegion".territory_id)
                                 Heap Fetches: 1550706
               ->  Index Scan using "_HD_pkey" on "_HD"  (cost=0.43..3.82 rows=1 width=15) (actual time=0.012..0.012 rows=1 loops=1487153)
                     Index Cond: (unique_system_identifier = "_EN".unique_system_identifier)
                     Filter: (("_EN".callsign = callsign) AND (((((license_status)::text || ' - '::text) || (COALESCE((SubPlan 2), '???'::character varying))::text))::character(1) = 'A'::bpchar))
                     Rows Removed by Filter: 0
                     SubPlan 2
                       ->  Limit  (cost=0.00..1.07 rows=1 width=13) (actual time=0.001..0.001 rows=1 loops=1487153)
                             ->  Seq Scan on "_LicStatus"  (cost=0.00..1.07 rows=1 width=13) (actual time=0.000..0.000 rows=1 loops=1487153)
                                   Filter: ("_HD".license_status = status_id)
                                   Rows Removed by Filter: 1
         ->  Index Scan using "_Club_pkey" on "_Club"  (cost=0.14..0.17 rows=1 width=35) (actual time=0.002..0.002 rows=0 loops=837792)
               Index Cond: (trustee_callsign = "_EN".callsign)
               Filter: (club_count >= 5)
               Rows Removed by Filter: 0
         SubPlan 1
           ->  Limit  (cost=0.00..1.20 rows=1 width=15) (actual time=0.060..0.060 rows=1 loops=43)
                 ->  Seq Scan on "_ApplicantType"  (cost=0.00..1.20 rows=1 width=15) (actual time=0.016..0.016 rows=1 loops=43)
                       Filter: ("_EN".applicant_type_code = app_type_id)
                       Rows Removed by Filter: 7
 Planning Time: 173.753 ms
 Execution Time: 31919.601 ms
(46 rows)


pgsql-performance by date:

Previous
From: "Dean Gibson (DB Administrator)"
Date:
Subject: Re: AWS forcing PG upgrade from v9.6 a disaster
Next
From: Christophe Pettus
Date:
Subject: Re: AWS forcing PG upgrade from v9.6 a disaster