Thread: Big performance slowdown from 11.2 to 13.3

Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

 

Hello,

 

We have a data warehouse running on Postgres V11.2. We have a pretty beefy query that has been taking under 5 minutes to run, consistently, every day for about 2 years as part of a data warehouse ETL process. It’s a pivot over 55 values on a table with some 15M rows. The total table size is over 2GB (table + indices + other).

 

 

CREATE TABLE assessmenticcqa_raw

(

    iccqar_iccassmt_fk integer NOT NULL, -- foreign key to assessment

    iccqar_ques_code character varying(255) COLLATE pg_catalog."default" NOT NULL, -- question code

    iccqar_ans_val character varying(255) COLLATE pg_catalog."default" NOT NULL,   -- answer value

    "lastUpdated" timestamp with time zone NOT NULL DEFAULT now(),

    CONSTRAINT fk_assessmenticcqa_raw_assessment FOREIGN KEY (iccqar_iccassmt_fk)

        REFERENCES assessmenticc_fact (iccassmt_pk) MATCH SIMPLE

        ON UPDATE CASCADE

        ON DELETE RESTRICT

)

 

TABLESPACE pg_default;

 

CREATE UNIQUE INDEX assessmenticcqa_raw_idx_iccqar_assmt_ques

    ON assessmenticcqa_raw USING btree

    (iccqar_iccassmt_fk ASC NULLS LAST, iccqar_ques_code COLLATE pg_catalog."default" ASC NULLS LAST)

    TABLESPACE pg_default;

 

CREATE INDEX assessmenticcqa_raw_idx_iccqar_lastupdated

    ON assessmenticcqa_raw USING btree

    ("lastUpdated" ASC NULLS LAST)

    TABLESPACE pg_default;

 

 

The query that does the pivot is:

 

 

WITH t AS (

         SELECT assessmenticcqa_raw.iccqar_iccassmt_fk AS iccqa_iccassmt_fk,

            assessmenticcqa_raw.iccqar_ques_code,

            max(assessmenticcqa_raw.iccqar_ans_val::text) AS iccqar_ans_val

           FROM assessmenticcqa_raw

          WHERE assessmenticcqa_raw.iccqar_ques_code::text = ANY (ARRAY['DEBRIDEMENT DATE'::character varying::text

                                                                      , 'DEBRIDEMENT THIS VISIT'::character varying::text

                                                                      , 'DEBRIDEMENT TYPE'::character varying::text

                                                                      , 'DEPTH (CM)'::character varying::text

                                                                      , 'DEPTH DESCRIPTION'::character varying::text

                                                                      , … 55 total columns to pivot

                                                                 ])

          GROUP BY assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code

        )

SELECT t.iccqa_iccassmt_fk,

    max(t.iccqar_ans_val) AS iccqar_ans_val,

    tilda.todate(max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEBRIDEMENT DATE'::text)::character varying, NULL::date) AS "iccqa_DEBRIDEMENT_DATE",

    max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEBRIDEMENT THIS VISIT'::text) AS "iccqa_DEBRIDEMENT_THIS_VISIT",

    max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEBRIDEMENT TYPE'::text) AS "iccqa_DEBRIDEMENT_TYPE",

    tilda.tofloat(max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEPTH (CM)'::text)::character varying, NULL::real) AS "iccqa_DEPTH_CM",

    max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEPTH DESCRIPTION'::text) AS "iccqa_DEPTH_DESCRIPTION",

    … 55 total columns being pivotted

   FROM t

  GROUP BY t.iccqa_iccassmt_fk;

 

 

 

This query has been working flawlessly, without so much as a glitch, every day for the last 2 years or so, with of course an increasing amount of data every day (the data grows at about 15-20 thousand records per day). I know the query is not incremental, but at under 5 minutes it’s simple, works well, and can handle inconsistent updates on the data source we use, which is pretty dirty.

 

The problem I am facing is that we are trying to move to Postgres V13.3, and this query (and several others like it) is now taking 10x longer (3,000 seconds vs 300 seconds), which makes it completely unacceptable. I created a V13 instance following standard practices with pg_upgrade. I have V11 and V13 running side by side on the exact same hardware: the VM is an 8-core (16 threads), 64GB Windows Server 2012 R2 machine with SSD storage. I have vacuumed both the V11 and V13 databases (full, freeze, analyze; see the sketch after the settings below). The V13 database is an exact copy of the V11 database content-wise. The postgresql.conf is the same too and hasn’t been touched in years:

 

 

      "effective_cache_size": "52GB",

      "from_collapse_limit": "24",

      "jit": "off",

      "jit_above_cost": "2e+08",

      "jit_inline_above_cost": "5e+08",

      "jit_optimize_above_cost": "5e+08",

      "join_collapse_limit": "24",

      "max_parallel_workers": "20",

      "max_parallel_workers_per_gather": "8",

      "random_page_cost": "1.1",

      "temp_buffers": "4GB",

      "work_mem": "384MB"

 

 

I have done all my testing with only one of the databases running at a time (the other shut down) to make sure there wasn’t any weird interaction. I have read some articles about major changes between 11 and 13 (some of which occurred in 12), in particular about the JIT sometimes causing trouble, and about the way some CTEs can now be inlined, which can also cause trouble.

  • As you can see from the config above, I have disabled the JIT to make this more comparable with 11 and eliminate that possible source of issues.
  • I have also tried different versions of the query (MATERIALIZED vs NOT MATERIALIZED) with little impact (see the sketch after this list).
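
For clarity, the variants differ only in the CTE header; here is a trimmed, two-column sketch of what was tried (the real query lists all 55 codes):

    WITH t AS MATERIALIZED (            -- or NOT MATERIALIZED, or no keyword at all
             SELECT iccqar_iccassmt_fk AS iccqa_iccassmt_fk,
                    iccqar_ques_code,
                    max(iccqar_ans_val::text) AS iccqar_ans_val
               FROM assessmenticcqa_raw
              WHERE iccqar_ques_code IN ('DEBRIDEMENT DATE', 'DEBRIDEMENT THIS VISIT')
              GROUP BY iccqar_iccassmt_fk, iccqar_ques_code
         )
    SELECT t.iccqa_iccassmt_fk,
           max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code = 'DEBRIDEMENT DATE')       AS "iccqa_DEBRIDEMENT_DATE",
           max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code = 'DEBRIDEMENT THIS VISIT') AS "iccqa_DEBRIDEMENT_THIS_VISIT"
      FROM t
     GROUP BY t.iccqa_iccassmt_fk;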

 

The plans are pretty much identical too. I checked line by line and couldn’t see anything much different (note that I have a view over this query). Here is the V13 version of the plan:

"[

  {

    "Plan": {

      "Node Type": "Subquery Scan",

      "Parallel Aware": false,

      "Alias": "assessmenticcqapivotview",

      "Startup Cost": 1785087.62,

      "Total Cost": 1785100.62,

      "Plan Rows": 200,

      "Plan Width": 1228,

      "Output": [

        "assessmenticcqapivotview.iccqa_iccassmt_fk",

        "assessmenticcqapivotview.\"iccqa_DEBRIDEMENT_DATE\"",

        "assessmenticcqapivotview.\"iccqa_DEBRIDEMENT_THIS_VISIT\"",

        "assessmenticcqapivotview.\"iccqa_DEBRIDEMENT_TYPE\"",

        "assessmenticcqapivotview.\"iccqa_DEPTH_CM\"",

        "assessmenticcqapivotview.\"iccqa_DEPTH_DESCRIPTION\"",

        "assessmenticcqapivotview.\"iccqa_DOES_PATIENT_HAVE_PAIN_ASSOCIATED_WITH_THIS_WOUND\"",

        "assessmenticcqapivotview.\"iccqa_DRAIN_PRESENT\"",

        "assessmenticcqapivotview.\"iccqa_DRAIN_TYPE\"",

        "assessmenticcqapivotview.\"iccqa_EDGE_SURROUNDING_TISSUE_MACERATION\"",

        "assessmenticcqapivotview.\"iccqa_EDGES\"",

        "assessmenticcqapivotview.\"iccqa_EPITHELIALIZATION\"",

        "assessmenticcqapivotview.\"iccqa_EXUDATE_AMOUNT\"",

        "assessmenticcqapivotview.\"iccqa_EXUDATE_TYPE\"",

        "assessmenticcqapivotview.\"iccqa_GRANULATION_TISSUE\"",

        "assessmenticcqapivotview.\"iccqa_INDICATE_OTHER_TYPE_OF_WOUND_CLOSURE\"",

        "assessmenticcqapivotview.\"iccqa_INDICATE_TYPE\"",

        "assessmenticcqapivotview.\"iccqa_INDICATE_WOUND_CLOSURE\"",

        "assessmenticcqapivotview.\"iccqa_IS_THIS_A_CLOSED_SURGICAL_WOUND_OR_SUSPECTED_DEEP_TISSUE\"",

        "assessmenticcqapivotview.\"iccqa_LENGTH_CM\"",

        "assessmenticcqapivotview.\"iccqa_MEASUREMENTS_TAKEN\"",

        "assessmenticcqapivotview.\"iccqa_NECROTIC_TISSUE_AMOUNT\"",

        "assessmenticcqapivotview.\"iccqa_NECROTIC_TISSUE_TYPE\"",

        "assessmenticcqapivotview.\"iccqa_ODOR\"",

        "assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_DEBRIDEMENT_TYPE\"",

        "assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_DRAIN_TYPE\"",

        "assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_PAIN_INTERVENTIONS\"",

        "assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_PAIN_QUALITY\"",

        "assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_REASON_MEASUREMENTS_NOT_TAKEN\"",

        "assessmenticcqapivotview.\"iccqa_PAIN_FREQUENCY\"",

        "assessmenticcqapivotview.\"iccqa_PAIN_INTERVENTIONS\"",

        "assessmenticcqapivotview.\"iccqa_PAIN_QUALITY\"",

        "assessmenticcqapivotview.\"iccqa_PERIPHERAL_TISSUE_EDEMA\"",

        "assessmenticcqapivotview.\"iccqa_PERIPHERAL_TISSUE_INDURATION\"",

        "assessmenticcqapivotview.\"iccqa_REASON_MEASUREMENTS_NOT_TAKEN\"",

        "assessmenticcqapivotview.\"iccqa_RESPONSE_TO_PAIN_INTERVENTIONS\"",

        "assessmenticcqapivotview.\"iccqa_SHAPE\"",

        "assessmenticcqapivotview.\"iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION\"",

        "assessmenticcqapivotview.\"iccqa_SKIN_COLOR_SURROUNDING_WOUND\"",

        "assessmenticcqapivotview.\"iccqa_STATE\"",

        "assessmenticcqapivotview.\"iccqa_SURFACE_AREA_SQ_CM\"",

        "assessmenticcqapivotview.\"iccqa_TOTAL_NECROTIC_TISSUE_ESCHAR\"",

        "assessmenticcqapivotview.\"iccqa_TOTAL_NECROTIC_TISSUE_SLOUGH\"",

        "assessmenticcqapivotview.\"iccqa_TUNNELING\"",

        "assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_12_3_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_3_6_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_6_9_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_9_12_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_UNDERMINING\"",

        "assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_12_3_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_3_6_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_6_9_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_9_12_O_CLOCK\"",

        "assessmenticcqapivotview.\"iccqa_WIDTH_CM\"",

        "assessmenticcqapivotview.\"iccqa_WOUND_PAIN_LEVEL_WHERE_0_NO_PAIN_AND_10_WORST_POS\""

      ],

      "Plans": [

        {

          "Node Type": "Aggregate",

          "Strategy": "Hashed",

          "Partial Mode": "Simple",

          "Parent Relationship": "Subquery",

          "Parallel Aware": false,

          "Startup Cost": 1785087.62,

          "Total Cost": 1785098.62,

          "Plan Rows": 200,

          "Plan Width": 1260,

          "Output": [

            "t.iccqa_iccassmt_fk",

            "NULL::text",

            "tilda.todate((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEBRIDEMENT DATE'::text)))::character varying, NULL::date)",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEBRIDEMENT THIS VISIT'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEBRIDEMENT TYPE'::text))",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEPTH (CM)'::text)))::character varying, NULL::real)",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEPTH DESCRIPTION'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?'::text))",

            "tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DRAIN PRESENT'::text)))::character varying, NULL::integer)",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DRAIN TYPE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EDGE / SURROUNDING TISSUE - MACERATION'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EDGES'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EPITHELIALIZATION'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EXUDATE AMOUNT'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EXUDATE TYPE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'GRANULATION TISSUE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'INDICATE OTHER TYPE OF WOUND CLOSURE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'INDICATE TYPE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'INDICATE WOUND CLOSURE'::text))",

            "tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?'::text)))::character varying, NULL::integer)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'LENGTH (CM)'::text)))::character varying, NULL::real)",

            "tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'MEASUREMENTS TAKEN'::text)))::character varying, NULL::integer)",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'NECROTIC TISSUE AMOUNT'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'NECROTIC TISSUE TYPE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'ODOR'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING DEBRIDEMENT TYPE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING DRAIN TYPE'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING PAIN INTERVENTIONS'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING PAIN QUALITY'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PAIN FREQUENCY'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PAIN INTERVENTIONS'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PAIN QUALITY'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PERIPHERAL TISSUE EDEMA'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PERIPHERAL TISSUE INDURATION'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'REASON MEASUREMENTS NOT TAKEN'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'RESPONSE TO PAIN INTERVENTIONS'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SHAPE'::text))",

            "tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SIGNS AND SYMPTOMS OF INFECTION'::text)))::character varying, NULL::integer)",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SKIN COLOR SURROUNDING WOUND'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'STATE'::text))",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SURFACE AREA (SQ CM)'::text)))::character varying, NULL::real)",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TOTAL NECROTIC TISSUE ESCHAR'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TOTAL NECROTIC TISSUE SLOUGH'::text))",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING'::text))",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK'::text)))::character varying, NULL::real)",

            "max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING'::text))",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'WIDTH (CM)'::text)))::character varying, NULL::real)",

            "tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\"'::text)))::character varying, NULL::real)"

          ],

          "Group Key": [

            "t.iccqa_iccassmt_fk"

          ],

          "Planned Partitions": 0,

          "Plans": [

            {

              "Node Type": "Aggregate",

              "Strategy": "Hashed",

              "Partial Mode": "Simple",

              "Parent Relationship": "InitPlan",

              "Subplan Name": "CTE t",

             "Parallel Aware": false,

              "Startup Cost": 1360804.75,

              "Total Cost": 1374830.63,

              "Plan Rows": 1402588,

              "Plan Width": 56,

              "Output": [

                "assessmenticcqa_raw.iccqar_iccassmt_fk",

                "assessmenticcqa_raw.iccqar_ques_code",

                "max((assessmenticcqa_raw.iccqar_ans_val)::text)"

              ],

              "Group Key": [

                "assessmenticcqa_raw.iccqar_iccassmt_fk",

                "assessmenticcqa_raw.iccqar_ques_code"

              ],

              "Planned Partitions": 0,

              "Plans": [

                {

                  "Node Type": "Seq Scan",

                  "Parent Relationship": "Outer",

                  "Parallel Aware": false,

                  "Relation Name": "assessmenticcqa_raw",

                  "Schema": "public",

                  "Alias": "assessmenticcqa_raw",

                  "Startup Cost": 0,

                  "Total Cost": 1256856.62,

                  "Plan Rows": 13859750,

                  "Plan Width": 38,

                  "Output": [

                    "assessmenticcqa_raw.iccqar_iccassmt_fk",

                    "assessmenticcqa_raw.iccqar_ques_code",

                    "assessmenticcqa_raw.iccqar_ques_type",

                    "assessmenticcqa_raw.iccqar_ans_val",

                    "assessmenticcqa_raw.created",

                    "assessmenticcqa_raw.\"lastUpdated\"",

                    "assessmenticcqa_raw.deleted"

                  ],

                  "Filter": "((assessmenticcqa_raw.iccqar_ques_code)::text = ANY ('{\"DEBRIDEMENT DATE\",\"DEBRIDEMENT THIS VISIT\",\"DEBRIDEMENT TYPE\",\"DEPTH (CM)\",\"DEPTH DESCRIPTION\",\"DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?\",\"DRAIN PRESENT\",\"DRAIN TYPE\",\"EDGE / SURROUNDING TISSUE - MACERATION\",EDGES,EPITHELIALIZATION,\"EXUDATE AMOUNT\",\"EXUDATE TYPE\",\"GRANULATION TISSUE\",\"INDICATE OTHER TYPE OF WOUND CLOSURE\",\"INDICATE TYPE\",\"INDICATE WOUND CLOSURE\",\"IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?\",\"LENGTH (CM)\",\"MEASUREMENTS TAKEN\",\"NECROTIC TISSUE AMOUNT\",\"NECROTIC TISSUE TYPE\",ODOR,\"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE\",\"OTHER COMMENTS REGARDING DRAIN TYPE\",\"OTHER COMMENTS REGARDING PAIN INTERVENTIONS\",\"OTHER COMMENTS REGARDING PAIN QUALITY\",\"OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN\",\"PAIN FREQUENCY\",\"PAIN INTERVENTIONS\",\"PAIN QUALITY\",\"PERIPHERAL TISSUE EDEMA\",\"PERIPHERAL TISSUE INDURATION\",\"REASON MEASUREMENTS NOT TAKEN\",\"RESPONSE TO PAIN INTERVENTIONS\",SHAPE,\"SIGNS AND SYMPTOMS OF INFECTION\",\"SKIN COLOR SURROUNDING WOUND\",STATE,\"SURFACE AREA (SQ CM)\",\"TOTAL NECROTIC TISSUE ESCHAR\",\"TOTAL NECROTIC TISSUE SLOUGH\",TUNNELING,\"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK\",\"TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK\",\"TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK\",\"TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK\",UNDERMINING,\"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK\",\"UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK\",\"UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK\",\"UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK\",\"WIDTH (CM)\",\"WOUND PAIN LEVEL, WHERE 0 = \\\"NO PAIN\\\" AND 10 = \\\"WORST POSSIBLE PAIN\\\"\"}'::text[]))"

                }

              ]

            },

            {

              "Node Type": "CTE Scan",

              "Parent Relationship": "Outer",

              "Parallel Aware": false,

              "CTE Name": "t",

              "Alias": "t",

              "Startup Cost": 0,

              "Total Cost": 28051.76,

              "Plan Rows": 1402588,

              "Plan Width": 552,

              "Output": [

                "t.iccqa_iccassmt_fk",

                "t.iccqar_ques_code",

                "t.iccqar_ans_val"

              ]

            }

          ]

        }

      ]

    },

    "Settings": {

         "version":13.3

      "effective_cache_size": "52GB",

      "from_collapse_limit": "24",

      "jit": "off",

      "jit_above_cost": "2e+08",

      "jit_inline_above_cost": "5e+08",

      "jit_optimize_above_cost": "5e+08",

      "join_collapse_limit": "24",

      "max_parallel_workers": "20",

      "max_parallel_workers_per_gather": "8",

      "random_page_cost": "1.1",

      "temp_buffers": "4GB",

      "work_mem": "384MB"

    },

    "Planning Time": 0.784

  }

]"

 

 

I am out of my wits as to what is causing such a massive slowdown and how I could fix it.

 

Any idea out there?

 

Thank you!

Laurent Hasson

 

 

Re: Big performance slowdown from 11.2 to 13.3

From
Justin Pryzby
Date:
On Wed, Jul 21, 2021 at 06:50:58PM +0000, ldh@laurent-hasson.com wrote:
> The plans are pretty much identical too. I checked line by line and couldn't see anything much different (note that I
> have a view over this query). Here is the V13 version of the plan:
 

> I am out of my wits as to what is causing such a massive slowdown and how I could fix it.
> 
> Any idea out there?

Could you send the "explain (analyze,buffers,settings) for query on the v11 and
v13 instances ?

Or a link to the execution plan pasted into explain.depesz.com.

https://wiki.postgresql.org/wiki/Slow_Query_Questions#EXPLAIN_.28ANALYZE.2C_BUFFERS.29.2C_not_just_EXPLAIN
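
For example, something along these lines against whatever view/query you actually run (note that the SETTINGS option
needs v12+, so drop it on the v11 instance):

    EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
    SELECT * FROM assessmenticcqapivotview;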

It might be good to check using a copy of your data that there's no regression
between 11.2 and 11.12.

-- 
Justin



Re: Big performance slowdown from 11.2 to 13.3

From
Michael Lewis
Date:
I'm not seeing the value of the CTE. Why not access assessmenticcqa_raw directly in the main query and only do the GROUP BY once? Do you have many values in iccqar_ques_code which are not used in this query?
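
Roughly what I have in mind, sketched on just two of the question codes (illustrative only; the real query would carry
all 55 FILTERed aggregates):

    SELECT r.iccqar_iccassmt_fk,
           max(r.iccqar_ans_val) FILTER (WHERE r.iccqar_ques_code = 'DEBRIDEMENT DATE')       AS "iccqa_DEBRIDEMENT_DATE",
           max(r.iccqar_ans_val) FILTER (WHERE r.iccqar_ques_code = 'DEBRIDEMENT THIS VISIT') AS "iccqa_DEBRIDEMENT_THIS_VISIT"
           -- one FILTERed aggregate per question code, as in the existing view
      FROM assessmenticcqa_raw r
     WHERE r.iccqar_ques_code IN ('DEBRIDEMENT DATE', 'DEBRIDEMENT THIS VISIT')  -- possibly unnecessary if there are few stray codes
     GROUP BY r.iccqar_iccassmt_fk;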

RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Wednesday, July 21, 2021 15:15
To: ldh@laurent-hasson.com
Cc: pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Wed, Jul 21, 2021 at 06:50:58PM +0000, ldh@laurent-hasson.com wrote:
> The plans are pretty much identical too. I checked line by line and couldn't see anything much different (note that I
> have a view over this query). Here is the V13 version of the plan:

> I am out of my wits as to what is causing such a massive slowdown and how I could fix it.
>
> Any idea out there?

Could you send the "explain (analyze,buffers,settings) for query on the v11 and
v13 instances ?

Or a link to the execution plan pasted into explain.depesz.com.

https://wiki.postgresql.org/wiki/Slow_Query_Questions#EXPLAIN_.28ANALYZE.2C_BUFFERS.29.2C_not_just_EXPLAIN

It might be good to check using a copy of your data that there's no regression between 11.2 and 11.12.

--
Justin





My apologies... I thought this was what I had attached in my original email from pgAdmin. In any case, I reran from the
command line and here are the two plans.


V11.2 explain (analyze,buffers,COSTS,TIMING)
========================================
HashAggregate  (cost=1758361.62..1758372.62 rows=200 width=1260) (actual time=80545.907..161176.867 rows=720950
loops=1)
  Group Key: t.iccqa_iccassmt_fk
  Buffers: shared hit=8 read=170093 written=23, temp written=82961
  CTE t
    ->  HashAggregate  (cost=1338668.50..1352428.93 rows=1376043 width=56) (actual time=23669.075..32038.977
rows=13821646 loops=1)
          Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
          Buffers: shared read=170084 written=23
          ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1236517.01 rows=13620198 width=38) (actual
time=0.081..10525.487 rows=13821646 loops=1)
                Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN
TYPE","EDGE/ SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION
TISSUE","INDICATEOTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND
ORSUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE
TYPE",ODOR,"OTHERCOMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING
PAININTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT
TAKEN","PAINFREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF
INFECTION","SKINCOLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC
TISSUESLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) 
                Rows Removed by Filter: 169940
                Buffers: shared read=170084 written=23
  ->  CTE Scan on t  (cost=0.00..27520.86 rows=1376043 width=552) (actual time=23669.081..39393.726 rows=13821646
loops=1)
        Buffers: shared read=170084 written=23, temp written=82961
Planning Time: 6.160 ms
Execution Time: 161,942.304 ms


V13.3  explain (analyze,buffers,COSTS,TIMING,SETTINGS)
======================================================
HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=81053.437..1699800.741 rows=722853
loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 5  Memory Usage: 284737kB  Disk Usage: 600000kB
  Buffers: shared hit=20 read=169851, temp read=185258 written=305014
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=24967.655..47587.401
rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 21  Memory Usage: 393273kB  Disk Usage: 683448kB
        Buffers: shared read=169851, temp read=110477 written=174216
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual
time=0.104..12406.726 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN
TYPE","EDGE/ SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION
TISSUE","INDICATEOTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND
ORSUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE
TYPE",ODOR,"OTHERCOMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING
PAININTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT
TAKEN","PAINFREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF
INFECTION","SKINCOLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC
TISSUESLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) 
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', jit = 'off', jit_above_cost = '2e+08',
jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers =
'20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '384MB'
Planning:
  Buffers: shared hit=203 read=20
Planning Time: 52.820 ms
Execution Time: 1,700,228.424 ms



As you can see, the V13.3 execution is about 10x slower.

It may be hard for me to create a whole copy of the database on 11.12 and check that environment by itself. I'd want to
do it on the same machine to control variables, and I don't have much extra disk space at the moment.





Thank you,
Laurent.



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
From: Michael Lewis <mlewis@entrata.com> 
Sent: Wednesday, July 21, 2021 18:12
To: ldh@laurent-hasson.com
Cc: pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I'm not seeing the value of the CTE. Why not access assessmenticcqa_raw directly in the main query and only do the GROUP BY
once? Do you have many values in iccqar_ques_code which are not used in this query?
 




Yes, there are close to 600 different values, and we are picking up only a small subset. And by the way, this is a
classic case where the query could be folded as a sub-select in a join clause; I tried this as well, with the same
results.

Thank you,
Laurent.


Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Wed, Jul 21, 2021 at 4:19 PM ldh@laurent-hasson.com
<ldh@laurent-hasson.com> wrote:
> As you can see, the V13.3 execution is about 10x slower.
>
> It may be hard for me to create a whole copy of the database on 11.12 and check that environment by itself. I'd want
> to do it on the same machine to control variables, and I don't have much extra disk space at the moment.
 

I imagine that this has something to do with the fact that the hash
aggregate spills to disk in Postgres 13.

You might try increasing hash_mem_multiplier from its default of 1.0,
to 2.0 or even 4.0. That way you'd be able to use 2x or 4x more memory
for executor nodes that are based on hashing (hash join and hash
aggregate), without also affecting other kinds of nodes, which are
typically much less sensitive to memory availability. This is very
similar to increasing work_mem, except that it is better targeted.

It might even make sense to *decrease* work_mem and increase
hash_mem_multiplier even further than 4.0. That approach is more
aggressive, though, so I wouldn't use it until it actually proved
necessary.
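
For instance, as a session-level experiment (the numbers here are just illustrative):

    SET work_mem = '384MB';          -- unchanged
    SET hash_mem_multiplier = 4.0;   -- hash-based nodes may now use up to 4x work_mem
    EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM assessmenticcqapivotview;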

-- 
Peter Geoghegan



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> My apologies... I thought this was what I had attached in my original email from pgAdmin. In any case, I reran from
> the command line and here are the two plans.

So the pain seems to be coming in with the upper hash aggregation, which
is spilling to disk because work_mem of '384MB' is nowhere near enough.
The v11 explain doesn't show any batching there, which makes me suspect
that it was using a larger value of work_mem.  (There could also be some
edge effect that is making v13 use a bit more memory for the same number
of tuples, which could lead it to spill when v11 had managed to scrape by
without doing so.)

So the first thing I'd try is seeing if setting work_mem to 1GB or so
improves matters.

The other thing that's notable is that v13 has collapsed out the CTE
that used to sit between the two levels of hashagg.  Now I don't know
of any reason that that wouldn't be a strict improvement, but if the
work_mem theory doesn't pan out then that's something that'd deserve
a closer look.  Does marking the WITH as WITH MATERIALIZED change
anything about v13's performance?
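
As a quick session-level test of the work_mem theory, for example (sketch):

    SET work_mem = '1GB';
    EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM assessmenticcqapivotview;
    RESET work_mem;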

            regards, tom lane



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie> 
Sent: Wednesday, July 21, 2021 19:34
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Wed, Jul 21, 2021 at 4:19 PM ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> As you can see, the V13.3 execution is about 10x slower.
>
> It may be hard for me to create a whole copy of the database on 11.12 and check that environment by itself. I'd want
> to do it on the same machine to control variables, and I don't have much extra disk space at the moment.
 

I imagine that this has something to do with the fact that the hash aggregate spills to disk in Postgres 13.

You might try increasing hash_mem_multiplier from its default of 1.0, to 2.0 or even 4.0. That way you'd be able to use
2x or 4x more memory for executor nodes that are based on hashing (hash join and hash aggregate), without also affecting
other kinds of nodes, which are typically much less sensitive to memory availability. This is very similar to increasing
work_mem, except that it is better targeted.
 

It might even make sense to *decrease* work_mem and increase hash_mem_multiplier even further than 4.0. That approach
is more aggressive, though, so I wouldn't use it until it actually proved necessary.
 

--
Peter Geoghegan



So how is this happening? I mean, it's the exact same query, it looks like the same plan to me, it's the same data on the
exact same VM, etc... Why is the behavior so different?
 

As soon as I can, I'll check whether perhaps hash_mem_multiplier is somehow set differently between the two setups? That
would be my first guess, but absent that, it looks like very different behavior across those 2 versions.
 

Thank you,
Laurent.

Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> From: Peter Geoghegan <pg@bowt.ie>
>> I imagine that this has something to do with the fact that the hash aggregate spills to disk in Postgres 13.

> So how is this happening? I mean, it's the exact same query, looks like the same plan to me, it's the same data on
> the exact same VM etc... Why is that behavior so different?

What Peter's pointing out is that v11 never spilled hashagg hash tables to
disk period, no matter how big they got (possibly leading to out-of-memory
situations or swapping, but evidently you have enough RAM to have avoided
that sort of trouble).  I'd momentarily forgotten that, but I think he's
dead on about that explaining the difference.  As he says, messing with
hash_mem_multiplier would be a more targeted fix than increasing work_mem
across the board.

            regards, tom lane



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, July 21, 2021 19:36
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> My apologies... I thought this was what I had attached in my original email from pgAdmin. In any case, I reran from
> the command line and here are the two plans.

So the pain seems to be coming in with the upper hash aggregation, which is spilling to disk because work_mem of
'384MB' is nowhere near enough.
The v11 explain doesn't show any batching there, which makes me suspect that it was using a larger value of work_mem.
(There could also be some edge effect that is making v13 use a bit more memory for the same number of tuples, which
could lead it to spill when v11 had managed to scrape by without doing so.)

So the first thing I'd try is seeing if setting work_mem to 1GB or so improves matters.

The other thing that's notable is that v13 has collapsed out the CTE that used to sit between the two levels of
hashagg.  Now I don't know of any reason that that wouldn't be a strict improvement, but if the work_mem theory doesn't
pan out then that's something that'd deserve a closer look.  Does marking the WITH as WITH MATERIALIZED change anything
about v13's performance?

            regards, tom lane




Hello Tom (and Peter)! Thanks for all this info.

I created 3 versions of this query: CTE MATERIALIZED, CTE NOT MATERIALIZED, and no CTE (a sub-select directly in a join).
Only a very minor change in the final execution time (seconds).

I'll try the following later this evening:
- set work_mem to 1GB
- play with hash_mem_multiplier as per Peter's suggestions, although he did suggest trying to be more aggressive with it
and lower work_mem... so I'll play with those 2 variables.

Thank you,
Laurent.






RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, July 21, 2021 19:43
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> From: Peter Geoghegan <pg@bowt.ie>
>> I imagine that this has something to do with the fact that the hash aggregate spills to disk in Postgres 13.

> So how is this happening? I mean, it's the exact same query, looks like the same plan to me, it's the same data on
> the exact same VM etc... Why is that behavior so different?

What Peter's pointing out is that v11 never spilled hashagg hash tables to
disk period, no matter how big they got (possibly leading to out-of-memory
situations or swapping, but evidently you have enough RAM to have avoided
that sort of trouble).  I'd momentarily forgotten that, but I think he's
dead on about that explaining the difference.  As he says, messing with
hash_mem_multiplier would be a more targeted fix than increasing work_mem
across the board.

            regards, tom lane


OK, got it! That sounds and smells good. Will try later tonight or tomorrow and report back.

Thank you!
Laurent.



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: ldh@laurent-hasson.com <ldh@laurent-hasson.com> 
Sent: Wednesday, July 21, 2021 19:46
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: RE: Big performance slowdown from 11.2 to 13.3



-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Wednesday, July 21, 2021 19:43
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> From: Peter Geoghegan <pg@bowt.ie> 
>> I imagine that this has something to do with the fact that the hash aggregate spills to disk in Postgres 13.

> So how is this happening? I mean, it's the exact same query, looks like the same plan to me, it's the same data on
> the exact same VM etc... Why is that behavior so different?
 

What Peter's pointing out is that v11 never spilled hashagg hash tables to
disk period, no matter how big they got (possibly leading to out-of-memory
situations or swapping, but evidently you have enough RAM to have avoided
that sort of trouble).  I'd momentarily forgotten that, but I think he's
dead on about that explaining the difference.  As he says, messing with
hash_mem_multiplier would be a more targeted fix than increasing work_mem
across the board.

            regards, tom lane


OK, got it! That sounds and smells good. Will try later tonight or tomorrow and report back.

Thank you!
Laurent.






Hello all,

Seems like no cigar ☹ See plan pasted below. I changed the conf as follows:
  - hash_mem_multiplier = '2'
  - work_mem = '1GB'

I tried a few other configurations, e.g., 512MB/4 and 256MB/8, with similar results.

Also, you mentioned previously that the hash was spilling to disk? How are you seeing this in the plans? What should I
be looking for on my end when playing around with parameters to see the intended effect?
 

Thank you,
Laurent.


HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=70844.078..1554843.323 rows=722853
loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=14 read=169854, temp read=15777 written=27588
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=23370.026..33839.347
rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB
        Buffers: shared read=169851, temp read=15777 written=27588
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual
time=0.072..10906.894 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN
TYPE","EDGE/ SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION
TISSUE","INDICATEOTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND
ORSUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE
TYPE",ODOR,"OTHERCOMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING
PAININTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT
TAKEN","PAINFREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF
INFECTION","SKINCOLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC
TISSUESLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
 
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '2', jit = 'off',
jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit =
'24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers =
'4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=186 read=37
Planning Time: 3.667 ms
Execution Time: 1555300.746 ms

Re: Big performance slowdown from 11.2 to 13.3

From
David Rowley
Date:
On Thu, 22 Jul 2021 at 16:37, ldh@laurent-hasson.com
<ldh@laurent-hasson.com> wrote:
> Seems like no cigar ☹ See plan pasted below. I changed the conf as follows:
>   - hash_mem_multiplier = '2'
>   - work_mem = '1GB'

>         Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB

You might want to keep going higher with hash_mem_multiplier until you
see no "Disk Usage" there.  As mentioned, v11 didn't spill to disk and
just used all the memory it pleased.  That was a bit dangerous as it
could result in OOM, so it was fixed.
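
i.e. keep bumping it and re-checking the HashAggregate lines; for example (sketch):

    SET hash_mem_multiplier = 4.0;   -- then 6.0, 8.0, ... until the plan shows "Batches: 1" and no "Disk Usage"
    EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM assessmenticcqapivotview;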

David



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
OK. Will do another round of testing.


-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com> 
Sent: Thursday, July 22, 2021 00:44
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, 22 Jul 2021 at 16:37, ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> Seems like no cigar ☹ See plan pasted below. I changed the conf as follows:
>   - hash_mem_multiplier = '2'
>   - work_mem = '1GB'

>         Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB

You might want to keep going higher with hash_mem_multiplier until you see no "Disk Usage" there.  As mentioned, v11
didn't spill to disk and just used all the memory it pleased.  That was a bit dangerous as it could result in OOM, so it
was fixed.
 

David

RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: ldh@laurent-hasson.com <ldh@laurent-hasson.com> 
Sent: Thursday, July 22, 2021 09:37
To: David Rowley <dgrowleyml@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: RE: Big performance slowdown from 11.2 to 13.3

OK. Will do another round of testing.


-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com> 
Sent: Thursday, July 22, 2021 00:44
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, 22 Jul 2021 at 16:37, ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> Seems like no cigar ☹ See plan pasted below. I changed the conf as follows:
>   - hash_mem_multiplier = '2'
>   - work_mem = '1GB'

>         Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB

You might want to keep going higher with hash_mem_multiplier until you see no "Disk Usage" there.  As mentioned, v11
didn't spill to disk and just used all the memory it pleased.  That was a bit dangerous as it could result in OOM, so it
was fixed.
 

David


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Hello all,

So, I went possibly nuclear, and still no cigar. Something's not right.
- hash_mem_multiplier = '10'
- work_mem = '1GB'

The results are
    Batches: 5  Memory Usage: 2,449,457kB  Disk Usage: 105,936kB
    Execution Time: 1,837,126.766 ms

It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of the configuration. More importantly:
  - I am not understanding how spilling 100MB to disk (which seems low to me and should be fast on our SSD) causes the
query to slow down by a factor of 10.
  - It seems, at the very least, that memory consumption on 11 was more moderate? This process of ours was running
several of these types of queries concurrently and I don't think I ever saw the machine go over 40GB in memory usage.


HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=84860.629..1836583.909 rows=722853
loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=46 read=169822, temp read=13144 written=23035
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=27890.422..39975.074
rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2449457kB  Disk Usage: 105936kB
        Buffers: shared hit=32 read=169819, temp read=13144 written=23035
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual
time=0.053..13623.310 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN
TYPE","EDGE/ SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION
TISSUE","INDICATEOTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND
ORSUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE
TYPE",ODOR,"OTHERCOMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING
PAININTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT
TAKEN","PAINFREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF
INFECTION","SKINCOLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC
TISSUESLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
 
              Rows Removed by Filter: 171680
              Buffers: shared hit=32 read=169819
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '10', jit = 'off',
jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit =
'24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers =
'4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=3
Planning Time: 1.038 ms
Execution Time: 1837126.766 ms


Thank you,



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> So, I went possibly nuclear, and still no cigar. Something's not right.
> - hash_mem_multiplier = '10'
> - work_mem = '1GB'

> The results are
>     Batches: 5  Memory Usage: 2,449,457kB  Disk Usage: 105,936kB
>     Execution Time: 1,837,126.766 ms

> It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of configuration.

That is ... weird.  Maybe you have found a bug in the spill-to-disk logic;
it's quite new after all.  Can you extract a self-contained test case that
behaves this way?

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
I wrote:
> "ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
>> It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of configuration.

> That is ... weird.

Oh: see get_hash_mem:

    hash_mem = (double) work_mem * hash_mem_multiplier;

    /*
     * guc.c enforces a MAX_KILOBYTES limitation on work_mem in order to
     * support the assumption that raw derived byte values can be stored in
     * 'long' variables.  The returned hash_mem value must also meet this
     * assumption.
     *
     * We clamp the final value rather than throw an error because it should
     * be possible to set work_mem and hash_mem_multiplier independently.
     */
    if (hash_mem < MAX_KILOBYTES)
        return (int) hash_mem;

    return MAX_KILOBYTES;

So basically, we now have a hard restriction that hashaggs can't use
more than INT_MAX kilobytes, or approximately 2.5GB, and this use case
is getting eaten alive by that restriction.  Seems like we need to
do something about that.

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
David Rowley
Date:
On Fri, 23 Jul 2021 at 03:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So basically, we now have a hard restriction that hashaggs can't use
> more than INT_MAX kilobytes, or approximately 2.5GB, and this use case
> is getting eaten alive by that restriction.  Seems like we need to
> do something about that.

Hmm, math check?

postgres=# select pg_size_pretty(power(2,31)::numeric*1024);
 pg_size_pretty
----------------
 2048 GB
(1 row)

David



Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Thu, Jul 22, 2021 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That is ... weird.  Maybe you have found a bug in the spill-to-disk logic;
> it's quite new after all.  Can you extract a self-contained test case that
> behaves this way?

I wonder if this has something to do with the way that the input data
is clustered. I recall noticing that that could significantly alter
the behavior of HashAggs as of Postgres 13.

-- 
Peter Geoghegan



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, July 22, 2021 11:57
To: ldh@laurent-hasson.com
Cc: David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I wrote:
> "ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
>> It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of configuration.

> That is ... weird.

Oh: see get_hash_mem:

    hash_mem = (double) work_mem * hash_mem_multiplier;

    /*
     * guc.c enforces a MAX_KILOBYTES limitation on work_mem in order to
     * support the assumption that raw derived byte values can be stored in
     * 'long' variables.  The returned hash_mem value must also meet this
     * assumption.
     *
     * We clamp the final value rather than throw an error because it should
     * be possible to set work_mem and hash_mem_multiplier independently.
     */
    if (hash_mem < MAX_KILOBYTES)
        return (int) hash_mem;

    return MAX_KILOBYTES;

So basically, we now have a hard restriction that hashaggs can't use more than INT_MAX kilobytes, or approximately
2.5GB, and this use case is getting eaten alive by that restriction.  Seems like we need to do something about that.

            regards, tom lane





------------------------------------------------------------------------------------------------------------------------------------------------------

Hello!


Ah... int vs long then? Tried even more (multiplier=16) and this seems to be definitely the case.

Is it fair then to deduce that the total memory usage would be 2,400,305kB + 126,560kB? Is this what under the covers
V11 is consuming more or less?

Is it also expected that a spill over of just 100MB (on top of 2.4GB memory consumption) would cause the query to
collapse like this? I am still not visualizing in my head how that would happen. 100MB just seems so small, and our SSD
is fast.

Generating a dataset would take me a lot of time. This is a clinical database so I cannot reuse the current table. I
would have to entirely mock the use case and create a dummy dataset from scratch.



HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=94618.303..1795311.542 rows=722853
loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=14 read=169854, temp read=15777 written=27588
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=30753.022..45384.558 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB
        Buffers: shared read=169851, temp read=15777 written=27588
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.110..14342.258 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN
TYPE","EDGE/ SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION
TISSUE","INDICATEOTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND
ORSUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE
TYPE",ODOR,"OTHERCOMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING
PAININTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT
TAKEN","PAINFREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF
INFECTION","SKINCOLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC
TISSUESLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) 
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '16', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=186 read=37
Planning Time: 55.709 ms
Execution Time: 1795921.717 ms


Thank you,
Laurent.




Re: Big performance slowdown from 11.2 to 13.3

From
David Rowley
Date:
On Fri, 23 Jul 2021 at 04:14, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jul 22, 2021 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That is ... weird.  Maybe you have found a bug in the spill-to-disk logic;
> > it's quite new after all.  Can you extract a self-contained test case that
> > behaves this way?
>
> I wonder if this has something to do with the way that the input data
> is clustered. I recall noticing that that could significantly alter
> the behavior of HashAggs as of Postgres 13.

Isn't it more likely to be reaching the group limit rather than the
memory limit?

    if (input_groups * hashentrysize < hash_mem * 1024L)
    {
        if (num_partitions != NULL)
            *num_partitions = 0;
        *mem_limit = hash_mem * 1024L;
        *ngroups_limit = *mem_limit / hashentrysize;
        return;
    }

There are 55 aggregates on a varchar(255). I think hashentrysize is
pretty big. If it was 255*55 then only 765591 groups fit in the 10GB
of memory.
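
The arithmetic behind that estimate, as a small sketch (the 255*55 entry size and the
10GB figure are just the rough assumptions from the paragraph above):

#include <stdio.h>

int
main(void)
{
    double  hashentrysize = 255.0 * 55;                 /* ~14 kB per group, if every
                                                         * aggregate held a full varchar(255) */
    double  hash_mem = 10.0 * 1024 * 1024 * 1024;       /* the "10GB" figure above */

    printf("%.0f groups\n", hash_mem / hashentrysize);  /* prints 765591 groups */
    return 0;
}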

David



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie> 
Sent: Thursday, July 22, 2021 12:14
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jeff Davis <pgsql@j-davis.com>; ldh@laurent-hasson.com; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby
<pryzby@telsasoft.com>;pgsql-performance@postgresql.org
 
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That is ... weird.  Maybe you have found a bug in the spill-to-disk 
> logic; it's quite new after all.  Can you extract a self-contained 
> test case that behaves this way?

I wonder if this has something to do with the way that the input data is clustered. I recall noticing that that could
significantly alter the behavior of HashAggs as of Postgres 13.
 

--
Peter Geoghegan



I could execute that test and re-cluster against the index. But I believe that's already done? Let me check.

Thank you,
Laurent.

Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 23 Jul 2021 at 03:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So basically, we now have a hard restriction that hashaggs can't use
>> more than INT_MAX kilobytes, or approximately 2.5GB, and this use case
>> is getting eaten alive by that restriction.  Seems like we need to
>> do something about that.

> Hmm, math check?

Yeah, I should have said "2GB plus palloc slop".  It doesn't surprise
me a bit that we seem to be eating another 20% on top of the nominal
limit.

I think the right fix here is to remove the cap, which will require
changing get_hash_mem to return double, and then maybe some cascading
changes --- I've not looked at its callers.

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
Justin Pryzby
Date:
On Thu, Jul 22, 2021 at 04:16:34PM +0000, ldh@laurent-hasson.com wrote:
> Is it fair then to deduce that the total memory usage would be 2,400,305kB + 126,560kB? Is this what under the covers
V11 is consuming more or less?
 

It might be helpful to know how much RAM v11 is using.

Could you run the query with log_executor_stats=on; client_min_messages=debug;

The interesting part is this:
!       7808 kB max resident size

-- 
Justin



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com> 
Sent: Thursday, July 22, 2021 12:18
To: Peter Geoghegan <pg@bowt.ie>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Jeff Davis <pgsql@j-davis.com>; ldh@laurent-hasson.com; Justin Pryzby
<pryzby@telsasoft.com>;pgsql-performance@postgresql.org
 
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Fri, 23 Jul 2021 at 04:14, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jul 22, 2021 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That is ... weird.  Maybe you have found a bug in the spill-to-disk 
> > logic; it's quite new after all.  Can you extract a self-contained 
> > test case that behaves this way?
>
> I wonder if this has something to do with the way that the input data 
> is clustered. I recall noticing that that could significantly alter 
> the behavior of HashAggs as of Postgres 13.

Isn't it more likely to be reaching the group limit rather than the memory limit?

    if (input_groups * hashentrysize < hash_mem * 1024L)
    {
        if (num_partitions != NULL)
            *num_partitions = 0;
        *mem_limit = hash_mem * 1024L;
        *ngroups_limit = *mem_limit / hashentrysize;
        return;
    }
 

There are 55 aggregates on a varchar(255). I think hashentrysize is pretty big. If it was 255*55 then only 765591
groups fit in the 10GB of memory.
 

David




-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Hello,

So, FYI.... The query I shared is actually a simpler use case of ours 😊 We do have a similar pivot query over 600
columns to create a large flat table for analysis on an even larger table. It takes about 15mn to run on V11 with strong CPU
usage and no particular memory usage spike that I can detect via TaskManager. We have been pushing PG hard to simplify
the workflows of our analysts and data scientists downstream.
 

Thank you,
Laurent.


Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
I wrote:
> I think the right fix here is to remove the cap, which will require
> changing get_hash_mem to return double, and then maybe some cascading
> changes --- I've not looked at its callers.

Or, actually, returning size_t would likely make the most sense.
We'd fold the 1024L multiplier in here too instead of doing that
at the callers.

            regards, tom lane



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Thursday, July 22, 2021 12:23
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 04:16:34PM +0000, ldh@laurent-hasson.com wrote:
> Is it fair then to deduce that the total memory usage would be 2,400,305kB + 126,560kB? Is this what under the covers
V11 is consuming more or less?

It might be helpful to know how much RAM v11 is using.

Could you run the query with log_executor_stats=on; client_min_messages=debug;

The interesting part is this:
!       7808 kB max resident size

--
Justin


-------------------------------------------

Hello Justin,

> log_executor_stats=on; client_min_messages=debug;

Would the results then come in EXPLAIN or would I need to pick something up from the logs?

Thank you,
Laurent.





Re: Big performance slowdown from 11.2 to 13.3

From
Justin Pryzby
Date:
On Thu, Jul 22, 2021 at 04:30:00PM +0000, ldh@laurent-hasson.com wrote:
> Hello Justin,
> 
> > log_executor_stats=on; client_min_messages=debug;
> 
> Would the results then come in EXPLAIN or would I need to pick something up from the logs?

If you're running with psql, and client_min_messages=debug, then it'll show up
as a debug message to the client:

ts=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1;
SET
SET
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.000011 s user, 0.000209 s system, 0.000219 s elapsed
!       [0.040608 s user, 0.020304 s system total]
!       7808 kB max resident size
...

It can but doesn't have to use "explain" - that just avoids showing the query
output, since it's not what's interesting here.

-- 
Justin



Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Thu, Jul 22, 2021 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I should have said "2GB plus palloc slop".  It doesn't surprise
> me a bit that we seem to be eating another 20% on top of the nominal
> limit.

MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been
for many years. The function get_hash_mem() returns a work_mem-style
int that callers refer to as hash_mem -- the convention is that
callers pretend that there is a work_mem style GUC (called hash_mem)
that they must access by calling get_hash_mem().

I don't see how it's possible for get_hash_mem() to be unable to
return a hash_mem value that could be represented by work_mem
directly. MAX_KILOBYTES is an annoyingly low limit on Windows, where
sizeof(long) is 4. But that's nothing new.

-- 
Peter Geoghegan



Re: Big performance slowdown from 11.2 to 13.3

From
Justin Pryzby
Date:
On Thu, Jul 22, 2021 at 09:36:02AM -0700, Peter Geoghegan wrote:
> I don't see how it's possible for get_hash_mem() to be unable to
> return a hash_mem value that could be represented by work_mem
> directly. MAX_KILOBYTES is an annoyingly low limit on Windows, where
> sizeof(long) is 4. But that's nothing new.

Oh.  So the problem seems to be that:

1) In v13, HashAgg now obeys work_mem*hash_mem_multiplier;
2) Under Windows, work_mem is limited to 2GB.

-- 
Justin



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 22, 2021 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I should have said "2GB plus palloc slop".  It doesn't surprise
>> me a bit that we seem to be eating another 20% on top of the nominal
>> limit.

> MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been
> for many years.

Right.  The point here is that before v13, hash aggregation was not
subject to the work_mem limit, nor any related limit.  If you did an
aggregation requiring more than 2GB-plus-slop, it would work just fine
as long as your machine had enough RAM.  Now, the performance sucks and
there is no knob you can turn to fix it.  That's unacceptable in my book.

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Thu, Jul 22, 2021 at 9:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right.  The point here is that before v13, hash aggregation was not
> subject to the work_mem limit, nor any related limit.  If you did an
> aggregation requiring more than 2GB-plus-slop, it would work just fine
> as long as your machine had enough RAM.  Now, the performance sucks and
> there is no knob you can turn to fix it.  That's unacceptable in my book.

Oh! That makes way more sense.

I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
correct. It certainly seems like a good starting point.

-- 
Peter Geoghegan



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> Oh.  So the problem seems to be that:

> 1) In v13, HashAgg now obeys work_mem*hash_mem_multiplier;
> 2) Under Windows, work_mem is limited to 2GB.

And more to the point, work_mem*hash_mem_multiplier is *also* limited
to 2GB.  We didn't think that through very carefully.  The point of
the hash_mem_multiplier feature was to allow hash aggregation to still
consume more than the work_mem limit, but we failed to free it from
this 2GB limit.

You're right though that this is Windows-only; on machines with
64-bit "long" there's less of a problem.

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
> correct. It certainly seems like a good starting point.

I also suspect that if Laurent set work_mem and/or hash_mem_multiplier
*extremely* aggressively, then eventually the hash agg would be
in-memory. And without actually using all that much memory.

I'm not suggesting that that is a sensible resolution to Laurent's
complaint. I'm just pointing out that it's probably not fundamentally
impossible to make the hash agg avoid spilling through tuning these
GUCs. At least I see no evidence of that right now.

-- 
Peter Geoghegan



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> I also suspect that if Laurent set work_mem and/or hash_mem_multiplier
> *extremely* aggressively, then eventually the hash agg would be
> in-memory. And without actually using all that much memory.

No, he already tried, upthread.  The trouble is that he's on a Windows
machine, so get_hash_mem is quasi-artificially constraining the product
to 2GB.  And he needs it to be a bit more than that.  Whether the
constraint is hitting at the ngroups stage or it's related to actual
memory consumption isn't that relevant.

What I'm wondering about is whether it's worth putting in a solution
for this issue in isolation, or whether we ought to embark on the
long-ignored project of getting rid of use of "long" for any
memory-size-related computations.  There would be no chance of
back-patching something like the latter into v13, though.

            regards, tom lane



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
I did try 2000MB work_mem and 16 multiplier 😊 It seems to plateau at 2GB no matter what. This is what the explain had:

HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=94618.303..1795311.542 rows=722853
loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=14 read=169854, temp read=15777 written=27588
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=30753.022..45384.558 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB
        Buffers: shared read=169851, temp read=15777 written=27588
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.110..14342.258 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN
TYPE","EDGE/ SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION
TISSUE","INDICATEOTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND
ORSUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE
TYPE",ODOR,"OTHERCOMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING
PAININTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT
TAKEN","PAINFREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF
INFECTION","SKINCOLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC
TISSUESLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
 
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '16', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '2000MB'
Planning:
  Buffers: shared hit=186 read=37
Planning Time: 55.709 ms
Execution Time: 1795921.717 ms





-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie> 
Sent: Thursday, July 22, 2021 13:05
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: David Rowley <dgrowleyml@gmail.com>; ldh@laurent-hasson.com; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I suspect David's theory about hash_agg_set_limits()'s ngroup limit is 
> correct. It certainly seems like a good starting point.

I also suspect that if Laurent set work_mem and/or hash_mem_multiplier
*extremely* aggressively, then eventually the hash agg would be in-memory. And without actually using all that much
memory.

I'm not suggesting that that is a sensible resolution to Laurent's complaint. I'm just pointing out that it's probably
not fundamentally impossible to make the hash agg avoid spilling through tuning these GUCs. At least I see no evidence
of that right now.
 

--
Peter Geoghegan

RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Thursday, July 22, 2021 12:36
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 04:30:00PM +0000, ldh@laurent-hasson.com wrote:
> Hello Justin,
>
> > log_executor_stats=on; client_min_messages=debug;
>
> Would the results then come in EXPLAIN or would I need to pick something up from the logs?

If you're running with psql, and client_min_messages=debug, then it'll show up as a debug message to the client:

ts=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1; SET SET
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.000011 s user, 0.000209 s system, 0.000219 s elapsed
!       [0.040608 s user, 0.020304 s system total]
!       7808 kB max resident size
...

It can but doesn't have to use "explain" - that just avoids showing the query output, since it's not what's interesting
here.

--
Justin


-------------------------------------------------------------------------------------

Justin,

I tried this but not seeing max resident size data output.

Pepper=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1;
SET
SET
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.000000 s user, 0.000000 s system, 0.000109 s elapsed
!       [494.640625 s user, 19.171875 s system total]
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
 Planning Time: 0.041 ms
 Execution Time: 0.012 ms
(3 rows)


For my query:


LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       169.625000 s user, 5.843750 s system, 175.490088 s elapsed
!       [494.640625 s user, 19.171875 s system total]
 HashAggregate  (cost=1764285.18..1764296.18 rows=200 width=1260) (actual time=86323.813..174737.442 rows=723659
loops=1)
   Group Key: t.iccqa_iccassmt_fk
   Buffers: shared hit=364 read=170293, temp written=83229
   CTE t
     ->  HashAggregate  (cost=1343178.39..1356985.17 rows=1380678 width=56) (actual time=22594.053..32519.573 rows=13865785 loops=1)
           Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
           Buffers: shared hit=364 read=170293
           ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1240682.76 rows=13666084 width=38) (actual time=0.170..10714.598 rows=13865785 loops=1)
                 Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOE 
S PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE -
MACERATION",EDGES,EPITHELIALIZATION,"EXUDATEAMOUNT", 
"EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS
THISA CLOSED SURGICAL WOUND OR SUSPECTE 
D DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER
COMMENTSREGARDING DEBRIDEMENT TYPE","OTHE 
R COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN
QUALITY","OTHERCOMMENTS REGARDING REASON MEASUREM 
ENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","R 
ESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE
AREA(SQ CM)","TOTAL NECROTIC TISSUE ESCHAR 
","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION
-3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LO 
CATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION -
12- 3 O''CLOCK","UNDERMINING SIZE(CM)/LOC 
ATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK","WIDTH(CM)","WOUND PAIN LEVEL, WHERE 
 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
                 Rows Removed by Filter: 172390
                 Buffers: shared hit=364 read=170293
   ->  CTE Scan on t  (cost=0.00..27613.56 rows=1380678 width=552) (actual time=22594.062..40248.874 rows=13865785
loops=1)
         Buffers: shared hit=364 read=170293, temp written=83229
 Planning Time: 0.728 ms
 Execution Time: 175482.904 ms
(15 rows)





RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Thursday, July 22, 2021 12:42
To: Peter Geoghegan <pg@bowt.ie>
Cc: David Rowley <dgrowleyml@gmail.com>; ldh@laurent-hasson.com; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 22, 2021 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I should have said "2GB plus palloc slop".  It doesn't surprise 
>> me a bit that we seem to be eating another 20% on top of the nominal 
>> limit.

> MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been 
> for many years.

Right.  The point here is that before v13, hash aggregation was not subject to the work_mem limit, nor any related
limit. If you did an aggregation requiring more than 2GB-plus-slop, it would work just fine as long as your machine had
enough RAM.  Now, the performance sucks and there is no knob you can turn to fix it.  That's unacceptable in my book.
 

            regards, tom lane


-------------------------------------------------------

Hello all,

As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great
performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are
overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs
Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused
issues elsewhere though. Are we doing things that are that far out of the norm? Are we early in pushing V13 to full
production? 😊 Doing analytics with pivoted tables with hundreds of columns is not uncommon in our world.
 

As for the three other requests from the team:

Clustering:
==========================
I re-clustered the table on the index that drives the pivot logic but I didn't see any change:

cluster verbose assessmenticcqa_raw using assessmenticcqa_raw_idx_iccqar_assmt_ques;

HashAggregate  (cost=1774465.36..1774476.36 rows=200 width=1260) (actual time=80848.591..1763443.586 rows=722853
loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=369 read=169577, temp read=15780 written=27584
  ->  HashAggregate  (cost=1360748.50..1374772.80 rows=1402430 width=56) (actual time=25475.554..38256.923 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126552kB
        Buffers: shared hit=352 read=169577, temp read=15780 written=27584
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256812.09 rows=13858188 width=38) (actual time=0.085..11914.135 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN
TYPE","EDGE/ SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION
TISSUE","INDICATEOTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND
ORSUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE
TYPE",ODOR,"OTHERCOMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING
PAININTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT
TAKEN","PAINFREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF
INFECTION","SKINCOLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC
TISSUESLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
 
              Rows Removed by Filter: 171680
              Buffers: shared hit=352 read=169577
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '4', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=100 read=1
Planning Time: 1.663 ms
Execution Time: 1763967.567 ms



More debugging on V11:
==========================
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       169.625000 s user, 5.843750 s system, 175.490088 s elapsed
!       [494.640625 s user, 19.171875 s system total]
 HashAggregate  (cost=1764285.18..1764296.18 rows=200 width=1260) (actual time=86323.813..174737.442 rows=723659
loops=1)
   Group Key: t.iccqa_iccassmt_fk
   Buffers: shared hit=364 read=170293, temp written=83229
   CTE t
     ->  HashAggregate  (cost=1343178.39..1356985.17 rows=1380678 width=56) (actual time=22594.053..32519.573 rows=13865785 loops=1)
           Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
           Buffers: shared hit=364 read=170293
           ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1240682.76 rows=13666084 width=38) (actual time=0.170..10714.598 rows=13865785 loops=1)
                 Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT
TYPE","DEPTH(CM)","DEPTH DESCRIPTION","DOE
 
S PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE -
MACERATION",EDGES,EPITHELIALIZATION,"EXUDATEAMOUNT",
 
"EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS
THISA CLOSED SURGICAL WOUND OR SUSPECTE
 
D DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER
COMMENTSREGARDING DEBRIDEMENT TYPE","OTHE
 
R COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN
QUALITY","OTHERCOMMENTS REGARDING REASON MEASUREM
 
ENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE
INDURATION","REASONMEASUREMENTS NOT TAKEN","R
 
ESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE
AREA(SQ CM)","TOTAL NECROTIC TISSUE ESCHAR
 
","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION
-3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LO
 
CATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION -
12- 3 O''CLOCK","UNDERMINING SIZE(CM)/LOC
 
ATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK","WIDTH(CM)","WOUND PAIN LEVEL, WHERE
 
 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
                 Rows Removed by Filter: 172390
                 Buffers: shared hit=364 read=170293
   ->  CTE Scan on t  (cost=0.00..27613.56 rows=1380678 width=552) (actual time=22594.062..40248.874 rows=13865785
loops=1)
         Buffers: shared hit=364 read=170293, temp written=83229
 Planning Time: 0.728 ms
 Execution Time: 175482.904 ms
(15 rows)



Patch on V13?
==========================
Maybe there can be a patch on V13 and then a longer-term effort afterwards? As it is, I have no way to deploy V13 as
this is a hard regression for us.
 

Thank you,
Laurent.




Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Thu, Jul 22, 2021 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> No, he already tried, upthread.  The trouble is that he's on a Windows
> machine, so get_hash_mem is quasi-artificially constraining the product
> to 2GB.  And he needs it to be a bit more than that.  Whether the
> constraint is hitting at the ngroups stage or it's related to actual
> memory consumption isn't that relevant.

Somehow I missed that part.

> What I'm wondering about is whether it's worth putting in a solution
> for this issue in isolation, or whether we ought to embark on the
> long-ignored project of getting rid of use of "long" for any
> memory-size-related computations.  There would be no chance of
> back-patching something like the latter into v13, though.

+1. Even if we assume that Windows is a low priority platform, in the
long run it'll be easier to make it more like every other platform.

The use of "long" is inherently suspect to me. It signals that the
programmer wants something wider than "int", even if the standard
doesn't actually require that "long" be wider. This seems to
contradict what we know to be true for Postgres, which is that in
general it's unsafe to assume that long is int64. It's not just
work_mem related calculations. There is also code like logtape.c,
which uses long for block numbers -- that also exposes us to risk on
Windows.

By requiring int64 be used instead of long, we don't actually increase
risk for non-Windows platforms to any significant degree. I'm pretty
sure that "long" means int64 on non-Windows 64-bit platforms anyway.

-- 
Peter Geoghegan



Re: Big performance slowdown from 11.2 to 13.3

From
Justin Pryzby
Date:
On Thu, Jul 22, 2021 at 05:26:26PM +0000, ldh@laurent-hasson.com wrote:
> I tried this but not seeing max resident size data output.

Oh.  Apparently, that's not supported under windows..

#if defined(HAVE_GETRUSAGE)
        appendStringInfo(&str,
                                         "!\t%ld kB max resident size\n",



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Thursday, July 22, 2021 13:29
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 05:26:26PM +0000, ldh@laurent-hasson.com wrote:
> I tried this but not seeing max resident size data output.

Oh.  Apparently, that's not supported under windows..

#if defined(HAVE_GETRUSAGE)
        appendStringInfo(&str,
                                         "!\t%ld kB max resident size\n",


----------------------------

Hello,

Damn... I know Windows is a lower priority, and this is yet another issue, but in Healthcare, Windows is so prevalent
everywhere...

Thank you,
Laurent.



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 22, 2021 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What I'm wondering about is whether it's worth putting in a solution
>> for this issue in isolation, or whether we ought to embark on the
>> long-ignored project of getting rid of use of "long" for any
>> memory-size-related computations.  There would be no chance of
>> back-patching something like the latter into v13, though.

> By requiring int64 be used instead of long, we don't actually increase
> risk for non-Windows platforms to any significant degree. I'm pretty
> sure that "long" means int64 on non-Windows 64-bit platforms anyway.

Well, what we really ought to be using is size_t (a/k/a Size), at least
for memory-space-related calculations.  I don't have an opinion right
now about what logtape.c ought to use.  I do agree that avoiding "long"
altogether would be a good ultimate goal.

In the short term though, the question is whether we want to regard this
hashagg issue as something we need a fix for in v13/v14.  The fact that
it's Windows-only makes it slightly less pressing in my mind, but it's
still a regression that some people are going to hit.

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Thu, Jul 22, 2021 at 10:33 AM ldh@laurent-hasson.com
<ldh@laurent-hasson.com> wrote:
> Damn... I know Windows is a lower priority, and this is yet another issue, but in Healthcare, Windows is so prevalent
everywhere...

To be clear, I didn't actually say that. I said that it doesn't matter
either way, as far as addressing this long standing "int64 vs long"
issue goes.

-- 
Peter Geoghegan



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie> 
Sent: Thursday, July 22, 2021 13:36
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 10:33 AM ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> Damn... I know Windows is a lower priority, and this is yet another issue, but in Healthcare, Windows is so prevalent
everywhere...

To be clear, I didn't actually say that. I said that it doesn't matter either way, as far as addressing this long
standing"int64 vs long"
 
issue goes.

--
Peter Geoghegan


Yes, agreed Peter... The "lower priority" issue was mentioned, but not in terms of the applicability of the fix
overall. Personally, I would prefer going the size_t route vs int/long/int64 in C/C++. Of course, as a user, I'd love a
patch on V13 and something cleaner in V14.

Thanks,
Laurent.



Re: Big performance slowdown from 11.2 to 13.3

From
Peter Geoghegan
Date:
On Thu, Jul 22, 2021 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, what we really ought to be using is size_t (a/k/a Size), at least
> for memory-space-related calculations.  I don't have an opinion right
> now about what logtape.c ought to use.  I do agree that avoiding "long"
> altogether would be a good ultimate goal.

I assume that we often use "long" in contexts where a signed integer
type is required. Maybe this is not true in the case of the work_mem
style calculations. But I know that it works that way in logtape.c,
where -1 is a sentinel value.

We already use int64 (not size_t) in tuplesort.c for roughly the same
reason: LACKMEM() needs to work with negative values, to handle
certain edge cases.
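
A tiny sketch of why that kind of accounting wants a signed 64-bit type rather than
size_t (the variable name is only illustrative):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    int64_t availMem = 100;     /* bytes of budget left */

    /* An allocation that overshoots the budget drives the balance negative;
     * an unsigned size_t would silently wrap around instead. */
    availMem -= 356;
    printf("lacking memory? %s (availMem = %lld)\n",
           availMem < 0 ? "yes" : "no", (long long) availMem);
    return 0;
}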

> In the short term though, the question is whether we want to regard this
> hashagg issue as something we need a fix for in v13/v14.  The fact that
> it's Windows-only makes it slightly less pressing in my mind, but it's
> still a regression that some people are going to hit.

True. I worry about the potential for introducing new bugs on Windows
by backpatching a fix for this. Technically this restriction existed
in every other work_mem consumer on Windows. Of course this won't
matter much to users like Laurent.

-- 
Peter Geoghegan



Re: Big performance slowdown from 11.2 to 13.3

From
Ranier Vilela
Date:
On Thu, Jul 22, 2021 at 14:28, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Jul 22, 2021 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> No, he already tried, upthread.  The trouble is that he's on a Windows
> machine, so get_hash_mem is quasi-artificially constraining the product
> to 2GB.  And he needs it to be a bit more than that.  Whether the
> constraint is hitting at the ngroups stage or it's related to actual
> memory consumption isn't that relevant.

Somehow I missed that part.

> What I'm wondering about is whether it's worth putting in a solution
> for this issue in isolation, or whether we ought to embark on the
> long-ignored project of getting rid of use of "long" for any
> memory-size-related computations.  There would be no chance of
> back-patching something like the latter into v13, though.

+1. Even if we assume that Windows is a low priority platform, in the
long run it'll be easier to make it more like every other platform.

The use of "long" is inherently suspect to me. It signals that the
programmer wants something wider than "int", even if the standard
doesn't actually require that "long" be wider. This seems to
contradict what we know to be true for Postgres, which is that in
general it's unsafe to assume that long is int64. It's not just
work_mem related calculations. There is also code like logtape.c,
which uses long for block numbers -- that also exposes us to risk on
Windows.

By requiring int64 be used instead of long, we don't actually increase
risk for non-Windows platforms to any significant degree. I'm pretty
sure that "long" means int64 on non-Windows 64-bit platforms anyway.
I wonder if similar issues don't arise from this [1].

(b/src/backend/optimizer/path/costsize.c)
cost_tuplesort uses *long* to store sort_mem_bytes.
I suggested switching to int64, but obviously to no avail.

+1 to switch long to int64.

regards,
Ranier Vilela
 

Re: Big performance slowdown from 11.2 to 13.3

From
Alvaro Herrera
Date:
On 2021-Jul-22, ldh@laurent-hasson.com wrote:

> Yes, agreed Peter... The "lower priority" issue was mentioned, but not
> in terms of the applicability of the fix overall. Personally, I would
> prefer going the size_t route vs int/long/int64 in C/C++/. Of course,
> as a user, I'd love a patch on V13 and something cleaner in V14.

Just to clarify our terminology here.  "A patch" means any kind of
change to the source code, regardless of its cleanliness or
applicability to versions deemed stable.  You can have one patch which
is a ugly hack for a stable version that can't have invasive changes,
and another patch which is a clean, more maintainable version of a
totally different fix for the version in development.  We use the same
word, "patch", for both types of source code changes.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Big performance slowdown from 11.2 to 13.3

From
Vijaykumar Jain
Date:
Just asking, I may be completely wrong.

Is this query parallel safe?
Can we force parallel workers, by setting a low parallel_setup_cost or otherwise, to make use of scatter-gather and Partial HashAggregate(s)?
I am just assuming that more workers doing things in parallel would require less disk spill per hash aggregate (or partial hash aggregate?) and the scatter-gather at the end.

I did some runs in my demo environment, not with the same query, but with some GROUP BY aggregates over around 25M rows, and it showed reasonable results, not too far off.
This was pg14 on Ubuntu.




RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

I am not sure I understand this parameter well enough but it’s with a default value right now of 1000. I have read Robert’s post (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html) and could play with those parameters, but unsure whether what you are describing will unlock this 2GB limit.

 

 

From: Vijaykumar Jain <vijaykumarjain.github@gmail.com>
Sent: Thursday, July 22, 2021 16:32
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

 

Just asking, I may be completely wrong.

 

is this query parallel safe?

can we force parallel workers, by setting low parallel_setup_cost or otherwise to make use of scatter gather and Partial HashAggregate(s)?

I am just assuming more workers doing things in parallel, would require less disk spill per hash aggregate (or partial hash aggregate ?) and the scatter gather at the end.

 

I did some runs in my demo environment, not with the same query, some group by aggregates  with around 25M rows, and it showed reasonable results, not too off.

this was pg14 on ubuntu.

 

 

 

 

Re: Big performance slowdown from 11.2 to 13.3

From
Vijaykumar Jain
Date:
On Fri, 23 Jul 2021 at 03:06, ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:

I am not sure I understand this parameter well enough but it’s with a default value right now of 1000. I have read Robert’s post (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html) and could play with those parameters, but unsure whether what you are describing will unlock this 2GB limit.

 


Yeah, maybe I was digressing, and this possibly cannot get around the Windows bottleneck.

Although the query is different, the steps were:
1) use system defaults, work_mem = 4MB, parallel_setup_cost = 1000
-- runs the query in parallel, no disk spill as work_mem is sufficient for my data

postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates  group by year_actual, month_name, day_name) select max(date),year_actual  from cte group by year_actual;
                                                                         QUERY PLAN                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=931227.78..932398.85 rows=200 width=8) (actual time=7850.214..7855.848 rows=51 loops=1)
   Group Key: dates.year_actual
   ->  Finalize GroupAggregate  (cost=931227.78..932333.85 rows=4200 width=28) (actual time=7850.075..7855.611 rows=4201 loops=1)
         Group Key: dates.year_actual, dates.month_name, dates.day_name
         ->  Gather Merge  (cost=931227.78..932207.85 rows=8400 width=28) (actual time=7850.069..7854.008 rows=11295 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=930227.76..930238.26 rows=4200 width=28) (actual time=7846.419..7846.551 rows=3765 loops=3)
                     Sort Key: dates.year_actual, dates.month_name, dates.day_name
                     Sort Method: quicksort  Memory: 391kB
                     Worker 0:  Sort Method: quicksort  Memory: 392kB
                     Worker 1:  Sort Method: quicksort  Memory: 389kB
                     ->  Partial HashAggregate  (cost=929933.00..929975.00 rows=4200 width=28) (actual time=7841.979..7842.531 rows=3765 loops=3)
                           Group Key: dates.year_actual, dates.month_name, dates.day_name
                           Batches: 1  Memory Usage: 721kB
                           Worker 0:  Batches: 1  Memory Usage: 721kB
                           Worker 1:  Batches: 1  Memory Usage: 721kB
                           ->  Parallel Seq Scan on dates  (cost=0.00..820355.00 rows=10957800 width=28) (actual time=3.347..4779.784 rows=8766240 loops=3)
 Planning Time: 0.133 ms
 Execution Time: 7855.958 ms
(20 rows)

-- set  work_mem to a very low value, to spill to disk and compare the spill in parallel vs serial
postgres=# set work_mem TO 64; -- 
SET
postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates  group by year_actual, month_name, day_name) select max(date),year_actual  from cte group by year_actual;
                                                                         QUERY PLAN                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2867778.00..2868949.07 rows=200 width=8) (actual time=18116.529..18156.972 rows=51 loops=1)
   Group Key: dates.year_actual
   ->  Finalize GroupAggregate  (cost=2867778.00..2868884.07 rows=4200 width=28) (actual time=18116.421..18156.729 rows=4201 loops=1)
         Group Key: dates.year_actual, dates.month_name, dates.day_name
         ->  Gather Merge  (cost=2867778.00..2868758.07 rows=8400 width=28) (actual time=18116.412..18155.136 rows=11126 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=2866777.98..2866788.48 rows=4200 width=28) (actual time=17983.836..17984.981 rows=3709 loops=3)
                     Sort Key: dates.year_actual, dates.month_name, dates.day_name
                     Sort Method: external merge  Disk: 160kB
                     Worker 0:  Sort Method: external merge  Disk: 168kB
                     Worker 1:  Sort Method: external merge  Disk: 160kB
                     ->  Partial HashAggregate  (cost=2566754.38..2866423.72 rows=4200 width=28) (actual time=10957.390..17976.250 rows=3709 loops=3)
                           Group Key: dates.year_actual, dates.month_name, dates.day_name
                           Planned Partitions: 4  Batches: 21  Memory Usage: 93kB  Disk Usage: 457480kB
                           Worker 0:  Batches: 21  Memory Usage: 93kB  Disk Usage: 473056kB
                           Worker 1:  Batches: 21  Memory Usage: 93kB  Disk Usage: 456792kB
                           ->  Parallel Seq Scan on dates  (cost=0.00..820355.00 rows=10957800 width=28) (actual time=1.042..5893.803 rows=8766240 loops=3)
 Planning Time: 0.142 ms
 Execution Time: 18195.973 ms
(20 rows)

postgres=# set parallel_setup_cost TO 1000000000; -- make sure it never uses parallel, check disk spill (much more than when parallel workers used)
SET
postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates  group by year_actual, month_name, day_name) select max(date),year_actual  from cte group by year_actual;
                                                                  QUERY PLAN                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5884624.58..5884658.08 rows=200 width=8) (actual time=35462.340..35463.142 rows=51 loops=1)
   Group Key: cte.year_actual
   ->  Sort  (cost=5884624.58..5884635.08 rows=4200 width=8) (actual time=35462.325..35462.752 rows=4201 loops=1)
         Sort Key: cte.year_actual
         Sort Method: external merge  Disk: 80kB
         ->  Subquery Scan on cte  (cost=5165122.70..5884312.33 rows=4200 width=8) (actual time=21747.139..35461.371 rows=4201 loops=1)
               ->  HashAggregate  (cost=5165122.70..5884270.33 rows=4200 width=28) (actual time=21747.138..35461.140 rows=4201 loops=1)
                     Group Key: dates.year_actual, dates.month_name, dates.day_name
                     Planned Partitions: 4  Batches: 21  Memory Usage: 93kB  Disk Usage: 1393192kB
                     ->  Seq Scan on dates  (cost=0.00..973764.20 rows=26298720 width=28) (actual time=0.005..10698.392 rows=26298721 loops=1)
 Planning Time: 0.124 ms
 Execution Time: 35548.514 ms
(12 rows)

I was thinking that making the query run in parallel would reduce disk IO per worker and maybe speed up the aggregate, especially since it runs for around 1 hour.
Of course, this was just trying things; maybe I am trying to override the optimizer, but I just wanted to understand the cost differences and resource usage when forcing custom plans.

I also tried with enable_sort off and enable_hashagg off <it only got worse, so not sharing as it would derail the thread>.

Again, ignore if it does not make sense :)
 

 

RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

 

 

From: Vijaykumar Jain <vijaykumarjain.github@gmail.com>
Sent: Friday, July 23, 2021 10:45
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

 

On Fri, 23 Jul 2021 at 03:06, ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:

I am not sure I understand this parameter well enough but it’s with a default value right now of 1000. I have read Robert’s post (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html) and could play with those parameters, but unsure whether what you are describing will unlock this 2GB limit.

 

 

Yeah, may be i was diverting, and possibly cannot use the windows bottleneck.

 

although the query is diff, the steps were

1) use system default, work_mem = 4MB, parallel_setup_cost = 1000 

-- runs the query in parallel, no disk spill as work_mem suff.for my data

 

postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates  group by year_actual, month_name, day_name) select max(date),year_actual  from cte group by year_actual;

                                                                         QUERY PLAN                                                     

------------------------------------------------------------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=931227.78..932398.85 rows=200 width=8) (actual time=7850.214..7855.848 rows=51 loops=1)

   Group Key: dates.year_actual

   ->  Finalize GroupAggregate  (cost=931227.78..932333.85 rows=4200 width=28) (actual time=7850.075..7855.611 rows=4201 loops=1)

         Group Key: dates.year_actual, dates.month_name, dates.day_name

         ->  Gather Merge  (cost=931227.78..932207.85 rows=8400 width=28) (actual time=7850.069..7854.008 rows=11295 loops=1)

               Workers Planned: 2

               Workers Launched: 2

               ->  Sort  (cost=930227.76..930238.26 rows=4200 width=28) (actual time=7846.419..7846.551 rows=3765 loops=3)

                     Sort Key: dates.year_actual, dates.month_name, dates.day_name

                     Sort Method: quicksort  Memory: 391kB

                     Worker 0:  Sort Method: quicksort  Memory: 392kB

                     Worker 1:  Sort Method: quicksort  Memory: 389kB

                     ->  Partial HashAggregate  (cost=929933.00..929975.00 rows=4200 width=28) (actual time=7841.979..7842.531 rows=3765 loops=3)

                           Group Key: dates.year_actual, dates.month_name, dates.day_name

                           Batches: 1  Memory Usage: 721kB

                           Worker 0:  Batches: 1  Memory Usage: 721kB

                           Worker 1:  Batches: 1  Memory Usage: 721kB

                           ->  Parallel Seq Scan on dates  (cost=0.00..820355.00 rows=10957800 width=28) (actual time=3.347..4779.784 rows=8766240 loops=3)

 Planning Time: 0.133 ms

 Execution Time: 7855.958 ms

(20 rows)

 

-- set work_mem to a very low value to force spilling to disk, and compare the spill in parallel vs serial

postgres=# set work_mem TO 64; -- 

SET

postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates  group by year_actual, month_name, day_name) select max(date),year_actual  from cte group by year_actual;

                                                                         QUERY PLAN                                                     

------------------------------------------------------------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=2867778.00..2868949.07 rows=200 width=8) (actual time=18116.529..18156.972 rows=51 loops=1)

   Group Key: dates.year_actual

   ->  Finalize GroupAggregate  (cost=2867778.00..2868884.07 rows=4200 width=28) (actual time=18116.421..18156.729 rows=4201 loops=1)

         Group Key: dates.year_actual, dates.month_name, dates.day_name

         ->  Gather Merge  (cost=2867778.00..2868758.07 rows=8400 width=28) (actual time=18116.412..18155.136 rows=11126 loops=1)

               Workers Planned: 2

               Workers Launched: 2

               ->  Sort  (cost=2866777.98..2866788.48 rows=4200 width=28) (actual time=17983.836..17984.981 rows=3709 loops=3)

                     Sort Key: dates.year_actual, dates.month_name, dates.day_name

                     Sort Method: external merge  Disk: 160kB

                     Worker 0:  Sort Method: external merge  Disk: 168kB

                     Worker 1:  Sort Method: external merge  Disk: 160kB

                     ->  Partial HashAggregate  (cost=2566754.38..2866423.72 rows=4200 width=28) (actual time=10957.390..17976.250 rows=3709 loops=3)

                           Group Key: dates.year_actual, dates.month_name, dates.day_name

                           Planned Partitions: 4  Batches: 21  Memory Usage: 93kB  Disk Usage: 457480kB

                           Worker 0:  Batches: 21  Memory Usage: 93kB  Disk Usage: 473056kB

                           Worker 1:  Batches: 21  Memory Usage: 93kB  Disk Usage: 456792kB

                           ->  Parallel Seq Scan on dates  (cost=0.00..820355.00 rows=10957800 width=28) (actual time=1.042..5893.803 rows=8766240 loops=3)

 Planning Time: 0.142 ms

 Execution Time: 18195.973 ms

(20 rows)

 

postgres=# set parallel_setup_cost TO 1000000000; -- make sure it never uses parallel; check disk spill (much more than when parallel workers are used)

SET

postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates  group by year_actual, month_name, day_name) select max(date),year_actual  from cte group by year_actual;

                                                                  QUERY PLAN                                                            

-----------------------------------------------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=5884624.58..5884658.08 rows=200 width=8) (actual time=35462.340..35463.142 rows=51 loops=1)

   Group Key: cte.year_actual

   ->  Sort  (cost=5884624.58..5884635.08 rows=4200 width=8) (actual time=35462.325..35462.752 rows=4201 loops=1)

         Sort Key: cte.year_actual

         Sort Method: external merge  Disk: 80kB

         ->  Subquery Scan on cte  (cost=5165122.70..5884312.33 rows=4200 width=8) (actual time=21747.139..35461.371 rows=4201 loops=1)

               ->  HashAggregate  (cost=5165122.70..5884270.33 rows=4200 width=28) (actual time=21747.138..35461.140 rows=4201 loops=1)

                     Group Key: dates.year_actual, dates.month_name, dates.day_name

                     Planned Partitions: 4  Batches: 21  Memory Usage: 93kB  Disk Usage: 1393192kB

                     ->  Seq Scan on dates  (cost=0.00..973764.20 rows=26298720 width=28) (actual time=0.005..10698.392 rows=26298721 loops=1)

 Planning Time: 0.124 ms

 Execution Time: 35548.514 ms

(12 rows)

 

I was thinking that making the query run in parallel would reduce disk I/O per worker and maybe speed up the aggregate, especially since it runs for around 1 hour.

Of course, this was just trying things; maybe I am trying to override the optimizer, but I just wanted to understand the differences in cost and resource usage by forcing custom plans.

 

I also tried with enable_sort set to off and enable_hashagg set to off (it only got worse, so I am not sharing those results as they would derail the thread).

 

Again, ignore this if it does not make sense :)

 

 

 

 

Hello,

 

OK, that makes sense. I have limited time to test those additional scenarios, but I'll see what I can do. The query on 11 takes under 5mn, and 50mn+ on 13.

 

Thank you,

Laurent.

Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great
performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are
overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs
Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused
issues elsewhere though.

Maybe it has, but you're the first to report the problem, or at least
the first to report it with enough detail to trace the cause.

I've pushed a fix that removes the artificial restriction on work_mem
times hash_mem_multiplier; it will be in next month's 13.4 release.
You'll still need to increase hash_mem_multiplier to get satisfactory
performance on your workload, but at least it'll be possible to do that.

            regards, tom lane



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Sunday, July 25, 2021 14:08
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great
performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are
overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs
Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused
issues elsewhere though.
 

Maybe it has, but you're the first to report the problem, or at least the first to report it with enough detail to
trace the cause.
 

I've pushed a fix that removes the artificial restriction on work_mem times hash_mem_multiplier; it will be in next
month's 13.4 release.
 
You'll still need to increase hash_mem_multiplier to get satisfactory performance on your workload, but at least it'll
be possible to do that.
 

            regards, tom lane



---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Hello Tom,

So happy to help in whatever capacity I can 😊

This is fantastic news! Can't wait to try 13.4 asap and will report back.

Thank you,
Laurent.

RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
Tom,

One question that popped up in my head. hash_mem_multiplier is an upper-bound right: it doesn't reserve memory ahead of
time correct? So there is no reason for me to spend undue amounts of time fine-tuning this parameter? If I have work_mem
to 521MB, then I can set hash_mem_multiplier to 8 and should be OK. This doesn't mean that every query will consume 4GB
of memory.
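
Concretely, what I have in mind is just a couple of session-level settings for the ETL job, along these lines (only a sketch at this point, not tested):

SET work_mem = '521MB';
SET hash_mem_multiplier = 8;   -- i.e., hash nodes could use up to ~4GB, but (I assume) only if they actually need it
-- run the pivot query here
RESET hash_mem_multiplier;
RESET work_mem;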

Thank you,
Laurent.


-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Sunday, July 25, 2021 14:08
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great
performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are
overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs
Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused
issues elsewhere though.

Maybe it has, but you're the first to report the problem, or at least the first to report it with enough detail to
trace the cause.

I've pushed a fix that removes the artificial restriction on work_mem times hash_mem_multiplier; it will be in next
month's 13.4 release.
You'll still need to increase hash_mem_multiplier to get satisfactory performance on your workload, but at least it'll
be possible to do that.

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
"David G. Johnston"
Date:
On Tue, Jul 27, 2021 at 7:57 PM ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
hash_mem_multiplier is an upper-bound right: it doesn't reserve memory ahead of time correct?

Yes, that is what the phrasing "maximum amount" in the docs is trying to convey.


But also note that it is "each operation" that gets access to that limit.
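
For instance (hypothetical table names, just to illustrate the point), a single statement like the one below could end up with both a Hash Join and a HashAggregate in its plan, and each of those nodes is separately allowed up to work_mem * hash_mem_multiplier:

EXPLAIN (ANALYZE, BUFFERS)
SELECT f.assessment_id, count(*)
  FROM fact_answers f                           -- hypothetical tables
  JOIN dim_assessments d USING (assessment_id)  -- may become a Hash Join
 GROUP BY f.assessment_id;                      -- may become a HashAggregate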

David J.

Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> One question that popped up in my head. hash_mem_multiplier is an upper-bound right: it doesn't reserve memory ahead
of time correct? So there is no reason for me to spend undue amounts of time fine-tuning this parameter? If I have
work_mem to 521MB, then I can set hash_mem_multiplier to 8 and should be OK. This doesn't mean that every query will
consume 4GB of memory.

Yeah, I wouldn't sweat over the specific value.  The pre-v13 behavior
was effectively equivalent to hash_mem_multiplier = infinity, so if
you weren't having any OOM problems before, just crank it up.

            regards, tom lane



Re: Big performance slowdown from 11.2 to 13.3

From
Tom Lane
Date:
I wrote:
> Yeah, I wouldn't sweat over the specific value.  The pre-v13 behavior
> was effectively equivalent to hash_mem_multiplier = infinity, so if
> you weren't having any OOM problems before, just crank it up.

Oh, wait, scratch that: the old executor's behavior is accurately
described by that statement, but the planner's is not.  The planner
will not pick a hashagg plan if it guesses that the hash table
would exceed the configured limit (work_mem before, now work_mem
times hash_mem_multiplier).  So raising hash_mem_multiplier to the
moon might bias the v13 planner to pick hashagg plans in cases
where earlier versions would not have.  This doesn't describe your
immediate problem, but it might be a reason to not just set the
value as high as you can.

BTW, this also suggests that the planner is underestimating the
amount of memory needed for the hashagg, both before and after.
That might be something to investigate at some point.

            regards, tom lane



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 27, 2021 23:31
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I wrote:
> Yeah, I wouldn't sweat over the specific value.  The pre-v13 behavior
> was effectively equivalent to hash_mem_multiplier = infinity, so if
> you weren't having any OOM problems before, just crank it up.

Oh, wait, scratch that: the old executor's behavior is accurately described by that statement, but the planner's is
not. The planner will not pick a hashagg plan if it guesses that the hash table would exceed the configured limit
(work_mem before, now work_mem times hash_mem_multiplier).  So raising hash_mem_multiplier to the moon might bias the
v13 planner to pick hashagg plans in cases where earlier versions would not have.  This doesn't describe your immediate
problem, but it might be a reason to not just set the value as high as you can.

BTW, this also suggests that the planner is underestimating the amount of memory needed for the hashagg, both before
and after.
That might be something to investigate at some point.

            regards, tom lane


This is very useful to know... all things I'll get to test after 13.4 is released. I'll report back when I am able to.
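
One concrete thing I plan to try once 13.4 is out is to compare the plans the 13 planner picks for the inner aggregation at different hash_mem_multiplier values, along these lines (just a sketch, untested):

SET hash_mem_multiplier = 1;
EXPLAIN SELECT "iccqar_iccassmt_fk", "iccqar_ques_code", max("iccqar_ans_val")
          FROM VNAHGEDW_FACTS.AssessmentICCQA_Raw
         GROUP BY 1, 2;   -- may pick GroupAggregate + Sort if the hash table looks too big
SET hash_mem_multiplier = 16;
EXPLAIN SELECT "iccqar_iccassmt_fk", "iccqar_ques_code", max("iccqar_ans_val")
          FROM VNAHGEDW_FACTS.AssessmentICCQA_Raw
         GROUP BY 1, 2;   -- may switch to HashAggregate once the limit is high enough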

Thank you,
Laurent.



RE: Big performance slowdown from 11.2 to 13.3

From
"ldh@laurent-hasson.com"
Date:
Hello all...

I upgraded to 13.4 and re-ran the various versions of the query. I think I get a mixed bag of results. In the spirit of
closing this thread, I'll start with the fantastic news and then start a new thread with the other issue I have
isolated.

It looks like the query I submitted initially is now performing very well!!! It's funny but it looks like the query is
consuming a tiny bit over 2.5GB, so my use case was literally a hair above the memory limit that we had discovered.
V13.4 takes 54s vs V11.2 takes 74s!! This is quite amazing to have such a performance gain! 25% better. The query was as
follows:

select "iccqa_iccassmt_fk"
     , MAX("iccqar_ans_val") as "iccqar_ans_val"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEBRIDEMENT DATE'))::text as "iccqa_DEBRIDEMENT_DATE"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEBRIDEMENT THIS VISIT') )::text as
"iccqa_DEBRIDEMENT_THIS_VISIT"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEBRIDEMENT TYPE') )::text as "iccqa_DEBRIDEMENT_TYPE"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)'))::text as "iccqa_DEPTH_CM"
     , ... 50 more columns
from (
-- 'A pivoted view of ICC QA assessments'
select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.
     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable, from the meta-data.
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
 where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEBRIDEMENT DATE', 'DEBRIDEMENT THIS VISIT',
'DEBRIDEMENT TYPE', 'DEPTH (CM)', ... 50 more values)
 group by 1, 2
) T
     group by 1
;

The plans are:

V13.4 - 54s
HashAggregate  (cost=1486844.46..1486846.46 rows=200 width=1764) (actual time=50714.016..53813.049 rows=677899 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1196065kB
  Buffers: shared hit=158815
  ->  Finalize HashAggregate  (cost=1100125.42..1113234.54 rows=1310912 width=56) (actual time=14487.522..20498.241
rows=12926549 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 1  Memory Usage: 2572305kB
        Buffers: shared hit=158815
        ->  Gather  (cost=382401.10..1050966.22 rows=6554560 width=56) (actual time=2891.177..6614.288 rows=12926549
loops=1)
              Workers Planned: 5
              Workers Launched: 5
              Buffers: shared hit=158815
              ->  Partial HashAggregate  (cost=381401.10..394510.22 rows=1310912 width=56) (actual
time=2790.736..3680.249 rows=2154425 loops=6)
                    Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
                    Batches: 1  Memory Usage: 417809kB
                    Buffers: shared hit=158815
                    Worker 0:  Batches: 1  Memory Usage: 401425kB
                    Worker 1:  Batches: 1  Memory Usage: 409617kB
                    Worker 2:  Batches: 1  Memory Usage: 393233kB
                    Worker 3:  Batches: 1  Memory Usage: 385041kB
                    Worker 4:  Batches: 1  Memory Usage: 393233kB
                    ->  Parallel Seq Scan on assessmenticcqa_raw  (cost=0.00..362006.30 rows=2585974 width=38) (actual
time=0.042..1600.138 rows=2154425 loops=6)
                          Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS
VISIT","DEBRIDEMENTTYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN
PRESENT","DRAINTYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE
TYPE","GRANULATIONTISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A
CLOSEDSURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE
AMOUNT","NECROTICTISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN
TYPE","OTHERCOMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING
REASONMEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE
EDEMA","PERIPHERALTISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND
SYMPTOMSOF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL
NECROTICTISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) 
                          Rows Removed by Filter: 30428
                          Buffers: shared hit=158815
Planning:
  Buffers: shared hit=3
Planning Time: 0.552 ms
Execution Time: 54241.152 ms



V11.2 - 74s
HashAggregate  (cost=1629249.48..1629251.48 rows=200 width=1764) (actual time=68833.447..73384.374 rows=742896 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Buffers: shared read=173985
  ->  Finalize HashAggregate  (cost=1205455.43..1219821.33 rows=1436590 width=56) (actual time=19441.489..28630.297
rows=14176024 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Buffers: shared read=173985
        ->  Gather  (cost=418922.41..1151583.31 rows=7182950 width=56) (actual time=3698.235..8445.971 rows=14176024
loops=1)
              Workers Planned: 5
              Workers Launched: 5
              Buffers: shared read=173985
              ->  Partial HashAggregate  (cost=417922.41..432288.31 rows=1436590 width=56) (actual
time=3559.562..4619.406 rows=2362671 loops=6)
                    Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
                    Buffers: shared read=173985
                    ->  Parallel Seq Scan on assessmenticcqa_raw  (cost=0.00..396656.48 rows=2835457 width=38) (actual
time=0.261..1817.102 rows=2362671 loops=6)
                          Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS
VISIT","DEBRIDEMENTTYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN
PRESENT","DRAINTYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE
TYPE","GRANULATIONTISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A
CLOSEDSURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE
AMOUNT","NECROTICTISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN
TYPE","OTHERCOMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING
REASONMEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE
EDEMA","PERIPHERALTISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND
SYMPTOMSOF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL
NECROTICTISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","TUNNELINGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12
O''CLOCK",UNDERMINING,"UNDERMININGSIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6
O''CLOCK","UNDERMININGSIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH
(CM)","WOUNDPAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[])) 
                          Rows Removed by Filter: 31608
                          Buffers: shared read=173985
Planning Time: 22.673 ms
Execution Time: 74110.779 ms


Thank you so much to the whole team for this great work!

I'll send a separate email with the other issue I think I have isolated.

Thank you!
Laurent Hasson


-----Original Message-----
From: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Sent: Wednesday, July 28, 2021 16:13
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: RE: Big performance slowdown from 11.2 to 13.3


-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 27, 2021 23:31
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>;
pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I wrote:
> Yeah, I wouldn't sweat over the specific value.  The pre-v13 behavior
> was effectively equivalent to hash_mem_multiplier = infinity, so if
> you weren't having any OOM problems before, just crank it up.

Oh, wait, scratch that: the old executor's behavior is accurately described by that statement, but the planner's is
not. The planner will not pick a hashagg plan if it guesses that the hash table would exceed the configured limit
(work_mem before, now work_mem times hash_mem_multiplier).  So raising hash_mem_multiplier to the moon might bias the
v13 planner to pick hashagg plans in cases where earlier versions would not have.  This doesn't describe your immediate
problem, but it might be a reason to not just set the value as high as you can.

BTW, this also suggests that the planner is underestimating the amount of memory needed for the hashagg, both before
and after.
That might be something to investigate at some point.

            regards, tom lane


This is very useful to know... all things I'll get to test after 13.4 is released. I'll report back when I am able to.

Thank you,
Laurent.