Thread: Big performance slowdown from 11.2 to 13.3
Hello,
We have a data warehouse running on Postgres V11.2. As part of its ETL process, we have a fairly beefy query that has been taking under 5 minutes to run, consistently, every day for about 2 years. It's a pivot over 55 values on a table with some 15M rows. The total table size is over 2GB (table + indices + other).
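As a point of reference, a size figure like that can be pulled with the standard size functions (a generic sketch against the table defined below, not the exact command used for the number above):

SELECT pg_size_pretty(pg_total_relation_size('assessmenticcqa_raw')) AS total,      -- table + indexes + TOAST
       pg_size_pretty(pg_relation_size('assessmenticcqa_raw'))       AS table_only,
       pg_size_pretty(pg_indexes_size('assessmenticcqa_raw'))        AS indexes_only;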
CREATE TABLE assessmenticcqa_raw
(
iccqar_iccassmt_fk integer NOT NULL, -- foreign key to assessment
iccqar_ques_code character varying(255) COLLATE pg_catalog."default" NOT NULL, -- question code
iccqar_ans_val character varying(255) COLLATE pg_catalog."default" NOT NULL, -- answer value
"lastUpdated" timestamp with time zone NOT NULL DEFAULT now(),
CONSTRAINT fk_assessmenticcqa_raw_assessment FOREIGN KEY (iccqar_iccassmt_fk)
REFERENCES assessmenticc_fact (iccassmt_pk) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
)
TABLESPACE pg_default;
CREATE UNIQUE INDEX assessmenticcqa_raw_idx_iccqar_assmt_ques
ON assessmenticcqa_raw USING btree
(iccqar_iccassmt_fk ASC NULLS LAST, iccqar_ques_code COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX assessmenticcqa_raw_idx_iccqar_lastupdated
ON assessmenticcqa_raw USING btree
("lastUpdated" ASC NULLS LAST)
TABLESPACE pg_default;
The query that does the pivot is:
WITH t AS (
SELECT assessmenticcqa_raw.iccqar_iccassmt_fk AS iccqa_iccassmt_fk,
assessmenticcqa_raw.iccqar_ques_code,
max(assessmenticcqa_raw.iccqar_ans_val::text) AS iccqar_ans_val
FROM assessmenticcqa_raw
WHERE assessmenticcqa_raw.iccqar_ques_code::text = ANY (ARRAY['DEBRIDEMENT DATE'::character varying::text
, 'DEBRIDEMENT THIS VISIT'::character varying::text
, 'DEBRIDEMENT TYPE'::character varying::text
, 'DEPTH (CM)'::character varying::text
, 'DEPTH DESCRIPTION'::character varying::text
, … 55 total columns to pivot
])
GROUP BY assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
)
SELECT t.iccqa_iccassmt_fk,
max(t.iccqar_ans_val) AS iccqar_ans_val,
tilda.todate(max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEBRIDEMENT DATE'::text)::character varying, NULL::date) AS "iccqa_DEBRIDEMENT_DATE",
max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEBRIDEMENT THIS VISIT'::text) AS "iccqa_DEBRIDEMENT_THIS_VISIT",
max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEBRIDEMENT TYPE'::text) AS "iccqa_DEBRIDEMENT_TYPE",
tilda.tofloat(max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEPTH (CM)'::text)::character varying, NULL::real) AS "iccqa_DEPTH_CM",
max(t.iccqar_ans_val) FILTER (WHERE t.iccqar_ques_code::text = 'DEPTH DESCRIPTION'::text) AS "iccqa_DEPTH_DESCRIPTION",
… 55 total columns being pivotted
FROM t
GROUP BY t.iccqa_iccassmt_fk;
This query has been working flawlessly, without so much as a glitch, every day for the last 2 years or so, with of course an increasing amount of data (it grows by about 15-20 thousand records per day). I know the query is not incremental, but at under 5 minutes it's simple, works well, and can handle inconsistent updates from the data source we use, which is pretty dirty.
The problem I am facing is that we are trying to move to Postgres V13.3, and this query (and several others like it) now takes 10x longer (3,000 seconds vs. 300 seconds), which makes it completely unacceptable. I created a V13 instance following standard practices with pg_upgrade. I have V11 and V13 running side by side on the exact same hardware: the VM is an 8-core (16-thread), 64GB Windows Server 2012 R2 machine with SSD storage. I have run VACUUM (FULL, FREEZE, ANALYZE) on both the V11 and V13 databases. The V13 database is an exact copy of the V11 database content-wise. The postgresql.conf is the same too and hasn't been touched in years:
"effective_cache_size": "52GB",
"from_collapse_limit": "24",
"jit": "off",
"jit_above_cost": "2e+08",
"jit_inline_above_cost": "5e+08",
"jit_optimize_above_cost": "5e+08",
"join_collapse_limit": "24",
"max_parallel_workers": "20",
"max_parallel_workers_per_gather": "8",
"random_page_cost": "1.1",
"temp_buffers": "4GB",
"work_mem": "384MB"
I have done all my testing with only one database running at a time (shutting the other one down) to make sure there wasn't any weird interaction. I have read some articles about major changes between 11 and 13 (some of which occurred in 12), in particular about the JIT sometimes causing trouble, and about the way some CTEs can now be inlined, which can also cause trouble.
- As you can see from the config above, I have disabled the JIT to make this more comparable with 11 and eliminate that possible source of issues.
- I have also tried different versions of the query (MATERIALIZED vs NOT MATERIALIZED) with little impact.
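For reference, the MATERIALIZED vs. NOT MATERIALIZED variants only differ in the WITH clause. A trivial, self-contained illustration of the two forms on v12+ (not the actual pivot query):

EXPLAIN WITH t AS MATERIALIZED (SELECT generate_series(1, 10) AS x) SELECT max(x) FROM t;      -- forces the pre-12 behavior: the CTE is evaluated separately
EXPLAIN WITH t AS NOT MATERIALIZED (SELECT generate_series(1, 10) AS x) SELECT max(x) FROM t;  -- allows the planner to inline the CTE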
The plans are pretty much identical too. I checked line by line and couldn't see anything much different (note that I have a view over this query). Here is the V13 version of the plan:
[
{
"Plan": {
"Node Type": "Subquery Scan",
"Parallel Aware": false,
"Alias": "assessmenticcqapivotview",
"Startup Cost": 1785087.62,
"Total Cost": 1785100.62,
"Plan Rows": 200,
"Plan Width": 1228,
"Output": [
"assessmenticcqapivotview.iccqa_iccassmt_fk",
"assessmenticcqapivotview.\"iccqa_DEBRIDEMENT_DATE\"",
"assessmenticcqapivotview.\"iccqa_DEBRIDEMENT_THIS_VISIT\"",
"assessmenticcqapivotview.\"iccqa_DEBRIDEMENT_TYPE\"",
"assessmenticcqapivotview.\"iccqa_DEPTH_CM\"",
"assessmenticcqapivotview.\"iccqa_DEPTH_DESCRIPTION\"",
"assessmenticcqapivotview.\"iccqa_DOES_PATIENT_HAVE_PAIN_ASSOCIATED_WITH_THIS_WOUND\"",
"assessmenticcqapivotview.\"iccqa_DRAIN_PRESENT\"",
"assessmenticcqapivotview.\"iccqa_DRAIN_TYPE\"",
"assessmenticcqapivotview.\"iccqa_EDGE_SURROUNDING_TISSUE_MACERATION\"",
"assessmenticcqapivotview.\"iccqa_EDGES\"",
"assessmenticcqapivotview.\"iccqa_EPITHELIALIZATION\"",
"assessmenticcqapivotview.\"iccqa_EXUDATE_AMOUNT\"",
"assessmenticcqapivotview.\"iccqa_EXUDATE_TYPE\"",
"assessmenticcqapivotview.\"iccqa_GRANULATION_TISSUE\"",
"assessmenticcqapivotview.\"iccqa_INDICATE_OTHER_TYPE_OF_WOUND_CLOSURE\"",
"assessmenticcqapivotview.\"iccqa_INDICATE_TYPE\"",
"assessmenticcqapivotview.\"iccqa_INDICATE_WOUND_CLOSURE\"",
"assessmenticcqapivotview.\"iccqa_IS_THIS_A_CLOSED_SURGICAL_WOUND_OR_SUSPECTED_DEEP_TISSUE\"",
"assessmenticcqapivotview.\"iccqa_LENGTH_CM\"",
"assessmenticcqapivotview.\"iccqa_MEASUREMENTS_TAKEN\"",
"assessmenticcqapivotview.\"iccqa_NECROTIC_TISSUE_AMOUNT\"",
"assessmenticcqapivotview.\"iccqa_NECROTIC_TISSUE_TYPE\"",
"assessmenticcqapivotview.\"iccqa_ODOR\"",
"assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_DEBRIDEMENT_TYPE\"",
"assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_DRAIN_TYPE\"",
"assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_PAIN_INTERVENTIONS\"",
"assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_PAIN_QUALITY\"",
"assessmenticcqapivotview.\"iccqa_OTHER_COMMENTS_REGARDING_REASON_MEASUREMENTS_NOT_TAKEN\"",
"assessmenticcqapivotview.\"iccqa_PAIN_FREQUENCY\"",
"assessmenticcqapivotview.\"iccqa_PAIN_INTERVENTIONS\"",
"assessmenticcqapivotview.\"iccqa_PAIN_QUALITY\"",
"assessmenticcqapivotview.\"iccqa_PERIPHERAL_TISSUE_EDEMA\"",
"assessmenticcqapivotview.\"iccqa_PERIPHERAL_TISSUE_INDURATION\"",
"assessmenticcqapivotview.\"iccqa_REASON_MEASUREMENTS_NOT_TAKEN\"",
"assessmenticcqapivotview.\"iccqa_RESPONSE_TO_PAIN_INTERVENTIONS\"",
"assessmenticcqapivotview.\"iccqa_SHAPE\"",
"assessmenticcqapivotview.\"iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION\"",
"assessmenticcqapivotview.\"iccqa_SKIN_COLOR_SURROUNDING_WOUND\"",
"assessmenticcqapivotview.\"iccqa_STATE\"",
"assessmenticcqapivotview.\"iccqa_SURFACE_AREA_SQ_CM\"",
"assessmenticcqapivotview.\"iccqa_TOTAL_NECROTIC_TISSUE_ESCHAR\"",
"assessmenticcqapivotview.\"iccqa_TOTAL_NECROTIC_TISSUE_SLOUGH\"",
"assessmenticcqapivotview.\"iccqa_TUNNELING\"",
"assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_12_3_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_3_6_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_6_9_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_TUNNELING_SIZE_CM_LOCATION_9_12_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_UNDERMINING\"",
"assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_12_3_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_3_6_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_6_9_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_UNDERMINING_SIZE_CM_LOCATION_9_12_O_CLOCK\"",
"assessmenticcqapivotview.\"iccqa_WIDTH_CM\"",
"assessmenticcqapivotview.\"iccqa_WOUND_PAIN_LEVEL_WHERE_0_NO_PAIN_AND_10_WORST_POS\""
],
"Plans": [
{
"Node Type": "Aggregate",
"Strategy": "Hashed",
"Partial Mode": "Simple",
"Parent Relationship": "Subquery",
"Parallel Aware": false,
"Startup Cost": 1785087.62,
"Total Cost": 1785098.62,
"Plan Rows": 200,
"Plan Width": 1260,
"Output": [
"t.iccqa_iccassmt_fk",
"NULL::text",
"tilda.todate((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEBRIDEMENT DATE'::text)))::character varying, NULL::date)",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEBRIDEMENT THIS VISIT'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEBRIDEMENT TYPE'::text))",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEPTH (CM)'::text)))::character varying, NULL::real)",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DEPTH DESCRIPTION'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?'::text))",
"tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DRAIN PRESENT'::text)))::character varying, NULL::integer)",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'DRAIN TYPE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EDGE / SURROUNDING TISSUE - MACERATION'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EDGES'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EPITHELIALIZATION'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EXUDATE AMOUNT'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'EXUDATE TYPE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'GRANULATION TISSUE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'INDICATE OTHER TYPE OF WOUND CLOSURE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'INDICATE TYPE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'INDICATE WOUND CLOSURE'::text))",
"tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?'::text)))::character varying, NULL::integer)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'LENGTH (CM)'::text)))::character varying, NULL::real)",
"tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'MEASUREMENTS TAKEN'::text)))::character varying, NULL::integer)",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'NECROTIC TISSUE AMOUNT'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'NECROTIC TISSUE TYPE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'ODOR'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING DEBRIDEMENT TYPE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING DRAIN TYPE'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING PAIN INTERVENTIONS'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING PAIN QUALITY'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PAIN FREQUENCY'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PAIN INTERVENTIONS'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PAIN QUALITY'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PERIPHERAL TISSUE EDEMA'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'PERIPHERAL TISSUE INDURATION'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'REASON MEASUREMENTS NOT TAKEN'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'RESPONSE TO PAIN INTERVENTIONS'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SHAPE'::text))",
"tilda.toint((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SIGNS AND SYMPTOMS OF INFECTION'::text)))::character varying, NULL::integer)",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SKIN COLOR SURROUNDING WOUND'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'STATE'::text))",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'SURFACE AREA (SQ CM)'::text)))::character varying, NULL::real)",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TOTAL NECROTIC TISSUE ESCHAR'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TOTAL NECROTIC TISSUE SLOUGH'::text))",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING'::text))",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK'::text)))::character varying, NULL::real)",
"max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING'::text))",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'WIDTH (CM)'::text)))::character varying, NULL::real)",
"tilda.tofloat((max(t.iccqar_ans_val) FILTER (WHERE ((t.iccqar_ques_code)::text = 'WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\"'::text)))::character varying, NULL::real)"
],
"Group Key": [
"t.iccqa_iccassmt_fk"
],
"Planned Partitions": 0,
"Plans": [
{
"Node Type": "Aggregate",
"Strategy": "Hashed",
"Partial Mode": "Simple",
"Parent Relationship": "InitPlan",
"Subplan Name": "CTE t",
"Parallel Aware": false,
"Startup Cost": 1360804.75,
"Total Cost": 1374830.63,
"Plan Rows": 1402588,
"Plan Width": 56,
"Output": [
"assessmenticcqa_raw.iccqar_iccassmt_fk",
"assessmenticcqa_raw.iccqar_ques_code",
"max((assessmenticcqa_raw.iccqar_ans_val)::text)"
],
"Group Key": [
"assessmenticcqa_raw.iccqar_iccassmt_fk",
"assessmenticcqa_raw.iccqar_ques_code"
],
"Planned Partitions": 0,
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Relation Name": "assessmenticcqa_raw",
"Schema": "public",
"Alias": "assessmenticcqa_raw",
"Startup Cost": 0,
"Total Cost": 1256856.62,
"Plan Rows": 13859750,
"Plan Width": 38,
"Output": [
"assessmenticcqa_raw.iccqar_iccassmt_fk",
"assessmenticcqa_raw.iccqar_ques_code",
"assessmenticcqa_raw.iccqar_ques_type",
"assessmenticcqa_raw.iccqar_ans_val",
"assessmenticcqa_raw.created",
"assessmenticcqa_raw.\"lastUpdated\"",
"assessmenticcqa_raw.deleted"
],
"Filter": "((assessmenticcqa_raw.iccqar_ques_code)::text = ANY ('{\"DEBRIDEMENT DATE\",\"DEBRIDEMENT THIS VISIT\",\"DEBRIDEMENT TYPE\",\"DEPTH (CM)\",\"DEPTH DESCRIPTION\",\"DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?\",\"DRAIN PRESENT\",\"DRAIN TYPE\",\"EDGE / SURROUNDING TISSUE - MACERATION\",EDGES,EPITHELIALIZATION,\"EXUDATE AMOUNT\",\"EXUDATE TYPE\",\"GRANULATION TISSUE\",\"INDICATE OTHER TYPE OF WOUND CLOSURE\",\"INDICATE TYPE\",\"INDICATE WOUND CLOSURE\",\"IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?\",\"LENGTH (CM)\",\"MEASUREMENTS TAKEN\",\"NECROTIC TISSUE AMOUNT\",\"NECROTIC TISSUE TYPE\",ODOR,\"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE\",\"OTHER COMMENTS REGARDING DRAIN TYPE\",\"OTHER COMMENTS REGARDING PAIN INTERVENTIONS\",\"OTHER COMMENTS REGARDING PAIN QUALITY\",\"OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN\",\"PAIN FREQUENCY\",\"PAIN INTERVENTIONS\",\"PAIN QUALITY\",\"PERIPHERAL TISSUE EDEMA\",\"PERIPHERAL TISSUE INDURATION\",\"REASON MEASUREMENTS NOT TAKEN\",\"RESPONSE TO PAIN INTERVENTIONS\",SHAPE,\"SIGNS AND SYMPTOMS OF INFECTION\",\"SKIN COLOR SURROUNDING WOUND\",STATE,\"SURFACE AREA (SQ CM)\",\"TOTAL NECROTIC TISSUE ESCHAR\",\"TOTAL NECROTIC TISSUE SLOUGH\",TUNNELING,\"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK\",\"TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK\",\"TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK\",\"TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK\",UNDERMINING,\"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK\",\"UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK\",\"UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK\",\"UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK\",\"WIDTH (CM)\",\"WOUND PAIN LEVEL, WHERE 0 = \\\"NO PAIN\\\" AND 10 = \\\"WORST POSSIBLE PAIN\\\"\"}'::text[]))"
}
]
},
{
"Node Type": "CTE Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"CTE Name": "t",
"Alias": "t",
"Startup Cost": 0,
"Total Cost": 28051.76,
"Plan Rows": 1402588,
"Plan Width": 552,
"Output": [
"t.iccqa_iccassmt_fk",
"t.iccqar_ques_code",
"t.iccqar_ans_val"
]
}
]
}
]
},
"Settings": {
"version": "13.3",
"effective_cache_size": "52GB",
"from_collapse_limit": "24",
"jit": "off",
"jit_above_cost": "2e+08",
"jit_inline_above_cost": "5e+08",
"jit_optimize_above_cost": "5e+08",
"join_collapse_limit": "24",
"max_parallel_workers": "20",
"max_parallel_workers_per_gather": "8",
"random_page_cost": "1.1",
"temp_buffers": "4GB",
"work_mem": "384MB"
},
"Planning Time": 0.784
}
]
I am out of my wits as to what is causing such a massive slowdown and how I could fix it.
Any idea out there?
Thank you!
Laurent Hasson
On Wed, Jul 21, 2021 at 06:50:58PM +0000, ldh@laurent-hasson.com wrote:
> The plans are pretty much identical too. I checked line by line and couldn't see anything much different (note that I have a view over this query). Here is the V13 version of the plan:
> I am out of my wits as to what is causing such a massive slowdown and how I could fix it.
>
> Any idea out there?

Could you send the "explain (analyze,buffers,settings)" for the query on the v11 and v13 instances? Or a link to the execution plans pasted into explain.depesz.com.

https://wiki.postgresql.org/wiki/Slow_Query_Questions#EXPLAIN_.28ANALYZE.2C_BUFFERS.29.2C_not_just_EXPLAIN

It might be good to check, using a copy of your data, that there's no regression between 11.2 and 11.12.

--
Justin
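(For reference, a sketch of the invocation Justin is asking for, run against the view named in the plan; the SETTINGS option only exists on v12 and later, so it has to be dropped on the v11 instance:)

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM assessmenticcqapivotview;  -- v13
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM assessmenticcqapivotview;            -- v11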
-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Wednesday, July 21, 2021 15:15
To: ldh@laurent-hasson.com
Cc: pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

Could you send the "explain (analyze,buffers,settings)" for the query on the v11 and v13 instances? Or a link to the execution plans pasted into explain.depesz.com.

It might be good to check, using a copy of your data, that there's no regression between 11.2 and 11.12.

--
Justin

My apologies... I thought this is what I had attached in my original email from PGADMIN. In any case, I reran from the command line and here are the two plans.

V11.2 explain (analyze,buffers,COSTS,TIMING)
========================================
HashAggregate  (cost=1758361.62..1758372.62 rows=200 width=1260) (actual time=80545.907..161176.867 rows=720950 loops=1)
  Group Key: t.iccqa_iccassmt_fk
  Buffers: shared hit=8 read=170093 written=23, temp written=82961
  CTE t
    ->  HashAggregate  (cost=1338668.50..1352428.93 rows=1376043 width=56) (actual time=23669.075..32038.977 rows=13821646 loops=1)
          Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
          Buffers: shared read=170084 written=23
          ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1236517.01 rows=13620198 width=38) (actual time=0.081..10525.487 rows=13821646 loops=1)
                Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
                Rows Removed by Filter: 169940
                Buffers: shared read=170084 written=23
  ->  CTE Scan on t  (cost=0.00..27520.86 rows=1376043 width=552) (actual time=23669.081..39393.726 rows=13821646 loops=1)
        Buffers: shared read=170084 written=23, temp written=82961
Planning Time: 6.160 ms
Execution Time: 161,942.304 ms

V13.3 explain (analyze,buffers,COSTS,TIMING,SETTINGS)
======================================================
HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=81053.437..1699800.741 rows=722853 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 5  Memory Usage: 284737kB  Disk Usage: 600000kB
  Buffers: shared hit=20 read=169851, temp read=185258 written=305014
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=24967.655..47587.401 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 21  Memory Usage: 393273kB  Disk Usage: 683448kB
        Buffers: shared read=169851, temp read=110477 written=174216
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.104..12406.726 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '384MB'
Planning:
  Buffers: shared hit=203 read=20
Planning Time: 52.820 ms
Execution Time: 1,700,228.424 ms

As you can see, the V13.3 execution is about 10x slower.

It may be hard for me to create a whole copy of the database on 11.12 and check that environment by itself. I'd want to do it on the same machine to control variables, and I don't have much extra disk space at the moment.

Thank you,
Laurent.
From: Michael Lewis <mlewis@entrata.com>
Sent: Wednesday, July 21, 2021 18:12
To: ldh@laurent-hasson.com
Cc: pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I'm not seeing the value of the CTE. Why not access assessmenticcqa_raw directly in the main query and only do GROUP BY once? Do you have many values in iccqar_ques_code which are not used in this query?

Yes, there are close to 600 different values, and we are picking up only a small number of them. And by the way, this is a classic case where the query could be folded as a sub-select in a join clause; I tried this as well, with the same results.

Thank you,
Laurent.
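A rough sketch of the single-pass shape Michael is suggesting, with only the first three of the 55 FILTERed aggregates shown and the tilda.* conversion functions omitted (illustrative only, not the exact query that was tested):

SELECT iccqar_iccassmt_fk AS iccqa_iccassmt_fk,
       max(iccqar_ans_val) FILTER (WHERE iccqar_ques_code = 'DEBRIDEMENT DATE')       AS "iccqa_DEBRIDEMENT_DATE",
       max(iccqar_ans_val) FILTER (WHERE iccqar_ques_code = 'DEBRIDEMENT THIS VISIT') AS "iccqa_DEBRIDEMENT_THIS_VISIT",
       max(iccqar_ans_val) FILTER (WHERE iccqar_ques_code = 'DEBRIDEMENT TYPE')       AS "iccqa_DEBRIDEMENT_TYPE"
       -- ... one FILTERed max() per question code, 55 in total
FROM assessmenticcqa_raw
WHERE iccqar_ques_code IN ('DEBRIDEMENT DATE', 'DEBRIDEMENT THIS VISIT', 'DEBRIDEMENT TYPE')  -- full 55-value list in practice
GROUP BY iccqar_iccassmt_fk;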
On Wed, Jul 21, 2021 at 4:19 PM ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> As you can see, the V13.3 execution is about 10x slower.
>
> It may be hard for me to create a whole copy of the database on 11.12 and check that environment by itself. I'd want to do it on the same machine to control variables, and I don't have much extra disk space at the moment.

I imagine that this has something to do with the fact that the hash aggregate spills to disk in Postgres 13.

You might try increasing hash_mem_multiplier from its default of 1.0, to 2.0 or even 4.0. That way you'd be able to use 2x or 4x more memory for executor nodes that are based on hashing (hash join and hash aggregate), without also affecting other kinds of nodes, which are typically much less sensitive to memory availability. This is very similar to increasing work_mem, except that it is better targeted.

It might even make sense to *decrease* work_mem and increase hash_mem_multiplier even further than 4.0. That approach is more aggressive, though, so I wouldn't use it until it actually proved necessary.

--
Peter Geoghegan

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> My apologies... I thought this is what I had attached in my original email from PGADMIN. In any case, I reran from the command line and here are the two plans.

So the pain seems to be coming in with the upper hash aggregation, which is spilling to disk because work_mem of '384MB' is nowhere near enough.

The v11 explain doesn't show any batching there, which makes me suspect that it was using a larger value of work_mem. (There could also be some edge effect that is making v13 use a bit more memory for the same number of tuples, which could lead it to spill when v11 had managed to scrape by without doing so.)

So the first thing I'd try is seeing if setting work_mem to 1GB or so improves matters.

The other thing that's notable is that v13 has collapsed out the CTE that used to sit between the two levels of hashagg. Now I don't know of any reason that that wouldn't be a strict improvement, but if the work_mem theory doesn't pan out then that's something that'd deserve a closer look. Does marking the WITH as WITH MATERIALIZED change anything about v13's performance?

			regards, tom lane
-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie>
Sent: Wednesday, July 21, 2021 19:34
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I imagine that this has something to do with the fact that the hash aggregate spills to disk in Postgres 13.

You might try increasing hash_mem_multiplier from its default of 1.0, to 2.0 or even 4.0. That way you'd be able to use 2x or 4x more memory for executor nodes that are based on hashing (hash join and hash aggregate), without also affecting other kinds of nodes, which are typically much less sensitive to memory availability. This is very similar to increasing work_mem, except that it is better targeted.

It might even make sense to *decrease* work_mem and increase hash_mem_multiplier even further than 4.0. That approach is more aggressive, though, so I wouldn't use it until it actually proved necessary.

--
Peter Geoghegan

So how is this happening? I mean, it's the exact same query, it looks like the same plan to me, it's the same data on the exact same VM, etc... Why is that behavior so different?

As soon as I can, I'll check if perhaps the hash_mem_multiplier is somehow set differently between the two setups. That would be my first guess, but absent that, this looks like very different behavior across those 2 versions.

Thank you,
Laurent.

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> From: Peter Geoghegan <pg@bowt.ie>
>> I imagine that this has something to do with the fact that the hash aggregate spills to disk in Postgres 13.

> So how is this happening? I mean, it's the exact same query, looks like the same plan to me, it's the same data on the exact same VM etc... Why is that behavior so different?

What Peter's pointing out is that v11 never spilled hashagg hash tables to disk, period, no matter how big they got (possibly leading to out-of-memory situations or swapping, but evidently you have enough RAM to have avoided that sort of trouble). I'd momentarily forgotten that, but I think he's dead on about that explaining the difference.

As he says, messing with hash_mem_multiplier would be a more targeted fix than increasing work_mem across the board.

			regards, tom lane
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, July 21, 2021 19:36
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

So the pain seems to be coming in with the upper hash aggregation, which is spilling to disk because work_mem of '384MB' is nowhere near enough. The v11 explain doesn't show any batching there, which makes me suspect that it was using a larger value of work_mem. So the first thing I'd try is seeing if setting work_mem to 1GB or so improves matters. The other thing that's notable is that v13 has collapsed out the CTE that used to sit between the two levels of hashagg. Does marking the WITH as WITH MATERIALIZED change anything about v13's performance?

			regards, tom lane

Hello Tom (and Peter)!

Thanks for all this info. I created 3 versions of this query: CTE MATERIALIZED, CTE NOT MATERIALIZED, and no CTE (select directly in a sub join). Only very minor changes in the final execution time (seconds).

I'll try the following later this evening:
- set work_mem to 1GB
- play with hash_mem_multiplier as per Peter's suggestions, although he did suggest trying to be more aggressive with it and lowering work_mem... so I'll play with those 2 variables.

Thank you,
Laurent.
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, July 21, 2021 19:43
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

What Peter's pointing out is that v11 never spilled hashagg hash tables to disk, period, no matter how big they got (possibly leading to out-of-memory situations or swapping, but evidently you have enough RAM to have avoided that sort of trouble). I'd momentarily forgotten that, but I think he's dead on about that explaining the difference.

As he says, messing with hash_mem_multiplier would be a more targeted fix than increasing work_mem across the board.

			regards, tom lane

OK, got it! That sounds and smells good. Will try later tonight or tomorrow and report back.

Thank you!
Laurent.
-----Original Message-----
From: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Sent: Wednesday, July 21, 2021 19:46
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: RE: Big performance slowdown from 11.2 to 13.3

OK, got it! That sounds and smells good. Will try later tonight or tomorrow and report back.

------------------------------------------------------------------------------------------------------------------------------------------------------

Hello all,

Seems like no cigar ☹ See the plan pasted below. I changed the conf as follows:
- hash_mem_multiplier = '2'
- work_mem = '1GB'

I tried a few other configurations, i.e., 512MB/4 and 256MB/8, with similar results.

Also, you mentioned previously that the hash was spilling to disk? How are you seeing this in the plans? What should I be looking for on my end when playing around with parameters to see the intended effect?

Thank you,
Laurent.
HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=70844.078..1554843.323 rows=722853 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=14 read=169854, temp read=15777 written=27588
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=23370.026..33839.347 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB
        Buffers: shared read=169851, temp read=15777 written=27588
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.072..10906.894 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '2', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=186 read=37
Planning Time: 3.667 ms
Execution Time: 1555300.746 ms
On Thu, 22 Jul 2021 at 16:37, ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> Seems like no cigar ☹ See the plan pasted below. I changed the conf as follows:
> - hash_mem_multiplier = '2'
> - work_mem = '1GB'

> Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB

You might want to keep going higher with hash_mem_multiplier until you see no "Disk Usage" there. As mentioned, v11 didn't spill to disk and just used all the memory it pleased. That was a bit dangerous as it could result in OOM, so it was fixed.

David
OK. Will do another round of testing.

-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com>
Sent: Thursday, July 22, 2021 00:44
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

You might want to keep going higher with hash_mem_multiplier until you see no "Disk Usage" there. As mentioned, v11 didn't spill to disk and just used all the memory it pleased. That was a bit dangerous as it could result in OOM, so it was fixed.

David
-----Original Message-----
From: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Sent: Thursday, July 22, 2021 09:37
To: David Rowley <dgrowleyml@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: RE: Big performance slowdown from 11.2 to 13.3

OK. Will do another round of testing.

------------------------------------------------------------------------------------------------------------------------------------------------------

Hello all,

So, I went possibly nuclear, and still no cigar. Something's not right.
- hash_mem_multiplier = '10'
- work_mem = '1GB'

The results are:
  Batches: 5  Memory Usage: 2,449,457kB  Disk Usage: 105,936kB
  Execution Time: 1,837,126.766 ms

It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of the configuration. More importantly:
- I am not understanding how spilling 100MB to disk (which seems low to me and should be fast on our SSD) causes the query to slow down by a factor of 10.
- It seems at the very least that memory consumption on 11 was more moderate? This process of ours was running several of these types of queries concurrently, and I don't think I ever saw the machine go over 40GB in memory usage.
HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=84860.629..1836583.909 rows=722853 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=46 read=169822, temp read=13144 written=23035
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=27890.422..39975.074 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2449457kB  Disk Usage: 105936kB
        Buffers: shared hit=32 read=169819, temp read=13144 written=23035
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.053..13623.310 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
              Rows Removed by Filter: 171680
              Buffers: shared hit=32 read=169819
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '10', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=3
Planning Time: 1.038 ms
Execution Time: 1837126.766 ms

Thank you,

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> So, I went possibly nuclear, and still no cigar. Something's not right.
> - hash_mem_multiplier = '10'
> - work_mem = '1GB'
> The results are
> Batches: 5  Memory Usage: 2,449,457kB  Disk Usage: 105,936kB
> Execution Time: 1,837,126.766 ms
> It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of configuration.

That is ... weird. Maybe you have found a bug in the spill-to-disk logic; it's quite new after all. Can you extract a self-contained test case that behaves this way?

			regards, tom lane
I wrote:
> "ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
>> It's still spilling to disk and seems to cap at 2.5GB of memory usage in spite of configuration.

> That is ... weird.

Oh: see get_hash_mem:

	hash_mem = (double) work_mem * hash_mem_multiplier;

	/*
	 * guc.c enforces a MAX_KILOBYTES limitation on work_mem in order to
	 * support the assumption that raw derived byte values can be stored in
	 * 'long' variables.  The returned hash_mem value must also meet this
	 * assumption.
	 *
	 * We clamp the final value rather than throw an error because it should
	 * be possible to set work_mem and hash_mem_multiplier independently.
	 */
	if (hash_mem < MAX_KILOBYTES)
		return (int) hash_mem;

	return MAX_KILOBYTES;

So basically, we now have a hard restriction that hashaggs can't use more than INT_MAX kilobytes, or approximately 2.5GB, and this use case is getting eaten alive by that restriction. Seems like we need to do something about that.

			regards, tom lane
On Fri, 23 Jul 2021 at 03:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So basically, we now have a hard restriction that hashaggs can't use
> more than INT_MAX kilobytes, or approximately 2.5GB, and this use case
> is getting eaten alive by that restriction. Seems like we need to
> do something about that.

Hmm, math check?

postgres=# select pg_size_pretty(power(2,31)::numeric*1024);
 pg_size_pretty
----------------
 2048 GB
(1 row)

David
On Thu, Jul 22, 2021 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That is ... weird. Maybe you have found a bug in the spill-to-disk logic;
> it's quite new after all. Can you extract a self-contained test case that
> behaves this way?

I wonder if this has something to do with the way that the input data is clustered. I recall noticing that that could significantly alter the behavior of HashAggs as of Postgres 13.

--
Peter Geoghegan
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, July 22, 2021 11:57
To: ldh@laurent-hasson.com
Cc: David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

So basically, we now have a hard restriction that hashaggs can't use more than INT_MAX kilobytes, or approximately 2.5GB, and this use case is getting eaten alive by that restriction. Seems like we need to do something about that.

------------------------------------------------------------------------------------------------------------------------------------------------------

Hello!

Ah... int vs long then? I tried even more (multiplier=16) and this seems to be definitely the case.

Is it fair then to deduce that the total memory usage would be 2,400,305kB + 126,560kB? Is this what, under the covers, V11 is consuming more or less?

Is it also expected that a spill-over of just 100MB (on top of 2.4GB of memory consumption) would cause the query to collapse like this? I am still not visualizing in my head how that would happen. 100MB just seems so small, and our SSD is fast.

Generating a dataset would take me a lot of time. This is a clinical database, so I cannot reuse the current table. I would have to entirely mock the use case and create a dummy dataset from scratch.
HashAggregate  (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=94618.303..1795311.542 rows=722853 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=14 read=169854, temp read=15777 written=27588
  ->  HashAggregate  (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=30753.022..45384.558 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB
        Buffers: shared read=169851, temp read=15777 written=27588
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.110..14342.258 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '16', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=186 read=37
Planning Time: 55.709 ms
Execution Time: 1795921.717 ms

Thank you,
Laurent.
On Fri, 23 Jul 2021 at 04:14, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jul 22, 2021 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That is ... weird. Maybe you have found a bug in the spill-to-disk logic;
> > it's quite new after all. Can you extract a self-contained test case that
> > behaves this way?
>
> I wonder if this has something to do with the way that the input data
> is clustered. I recall noticing that that could significantly alter
> the behavior of HashAggs as of Postgres 13.

Isn't it more likely to be reaching the group limit rather than the memory limit?

	if (input_groups * hashentrysize < hash_mem * 1024L)
	{
		if (num_partitions != NULL)
			*num_partitions = 0;
		*mem_limit = hash_mem * 1024L;
		*ngroups_limit = *mem_limit / hashentrysize;
		return;
	}

There are 55 aggregates on a varchar(255). I think hashentrysize is pretty big. If it was 255*55 then only 765591 groups fit in the 10GB of memory.

David
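(A quick arithmetic check of David's estimate; the 255*55 bytes per entry is his assumption about hashentrysize, not a measured value:)

SELECT floor(10::numeric * 1024 * 1024 * 1024 / (255 * 55)) AS groups_that_fit_in_10gb;
-- returns 765591, matching the figure above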
-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie>
Sent: Thursday, July 22, 2021 12:14
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jeff Davis <pgsql@j-davis.com>; ldh@laurent-hasson.com; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I wonder if this has something to do with the way that the input data is clustered. I recall noticing that that could significantly alter the behavior of HashAggs as of Postgres 13.

--
Peter Geoghegan

I could execute that test and re-cluster against the index. But I believe that's already done? Let me check.

Thank you,
Laurent.
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 23 Jul 2021 at 03:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So basically, we now have a hard restriction that hashaggs can't use
>> more than INT_MAX kilobytes, or approximately 2.5GB, and this use case
>> is getting eaten alive by that restriction. Seems like we need to
>> do something about that.

> Hmm, math check?

Yeah, I should have said "2GB plus palloc slop". It doesn't surprise me a bit that we seem to be eating another 20% on top of the nominal limit.

I think the right fix here is to remove the cap, which will require changing get_hash_mem to return double, and then maybe some cascading changes --- I've not looked at its callers.

			regards, tom lane
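(A small illustration of the clamp being discussed, assuming MAX_KILOBYTES is INT_MAX / 1024 as defined in guc.h; it shows why the sessions above topped out near 2GB even with work_mem = '1GB' and hash_mem_multiplier = 10:)

SELECT pg_size_pretty(1024::numeric * 1024 * 1024)          AS work_mem_1gb,        -- 1024 MB
       pg_size_pretty(10 * 1024::numeric * 1024 * 1024)     AS requested_hash_mem,  -- 10 GB (work_mem * hash_mem_multiplier)
       pg_size_pretty((2147483647 / 1024)::numeric * 1024)  AS clamped_hash_mem;    -- roughly 2 GB (MAX_KILOBYTES)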
On Thu, Jul 22, 2021 at 04:16:34PM +0000, ldh@laurent-hasson.com wrote:
> Is it fair then to deduce that the total memory usage would be 2,400,305kB + 126,560kB? Is this what under the covers V11 is consuming, more or less?

It might be helpful to know how much RAM v11 is using.

Could you run the query with log_executor_stats=on; client_min_messages=debug;

The interesting part is this:
!       7808 kB max resident size

--
Justin
-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com>
Sent: Thursday, July 22, 2021 12:18
To: Peter Geoghegan <pg@bowt.ie>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Jeff Davis <pgsql@j-davis.com>; ldh@laurent-hasson.com; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Fri, 23 Jul 2021 at 04:14, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jul 22, 2021 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That is ... weird. Maybe you have found a bug in the spill-to-disk
> > logic; it's quite new after all. Can you extract a self-contained
> > test case that behaves this way?
>
> I wonder if this has something to do with the way that the input data
> is clustered. I recall noticing that that could significantly alter
> the behavior of HashAggs as of Postgres 13.

Isn't it more likely to be reaching the group limit rather than the memory limit?

    if (input_groups * hashentrysize < hash_mem * 1024L)
    {
        if (num_partitions != NULL)
            *num_partitions = 0;
        *mem_limit = hash_mem * 1024L;
        *ngroups_limit = *mem_limit / hashentrysize;
        return;
    }

There are 55 aggregates on a varchar(255). I think hashentrysize is pretty big. If it was 255*55 then only 765591 groups fit in the 10GB of memory.

David

---------------------------------------------------------------------------------------------------------------------

Hello,

So, FYI.... The query I shared is actually a simpler use case of ours 😊 We do have a similar pivot query over 600 columns to create a large flat table for analysis on an even larger table. It takes about 15mn to run on V11, with strong CPU usage and no particular memory usage spike that I can detect via Task Manager. We have been pushing PG hard to simplify the workflows of our analysts and data scientists downstream.

Thank you,
Laurent.
I wrote:
> I think the right fix here is to remove the cap, which will require
> changing get_hash_mem to return double, and then maybe some cascading
> changes --- I've not looked at its callers.

Or, actually, returning size_t would likely make the most sense. We'd fold the 1024L multiplier in here too instead of doing that at the callers.

			regards, tom lane
-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Thursday, July 22, 2021 12:23
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 04:16:34PM +0000, ldh@laurent-hasson.com wrote:
> Is it fair then to deduce that the total memory usage would be 2,400,305kB + 126,560kB? Is this what under the covers V11 is consuming, more or less?

It might be helpful to know how much RAM v11 is using.

Could you run the query with log_executor_stats=on; client_min_messages=debug;

The interesting part is this:
!       7808 kB max resident size

--
Justin

-------------------------------------------

Hello Justin,

> log_executor_stats=on; client_min_messages=debug;

Would the results then come in EXPLAIN or would I need to pick something up from the logs?

Thank you,
Laurent.
On Thu, Jul 22, 2021 at 04:30:00PM +0000, ldh@laurent-hasson.com wrote:
> Hello Justin,
>
> > log_executor_stats=on; client_min_messages=debug;
>
> Would the results then come in EXPLAIN or would I need to pick something up from the logs?

If you're running with psql, and client_min_messages=debug, then it'll show up as a debug message to the client:

ts=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1;
SET
SET
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.000011 s user, 0.000209 s system, 0.000219 s elapsed
!       [0.040608 s user, 0.020304 s system total]
!       7808 kB max resident size
...

It can but doesn't have to use "explain" - that just avoids showing the query output, since it's not what's interesting here.

--
Justin
On Thu, Jul 22, 2021 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I should have said "2GB plus palloc slop". It doesn't surprise
> me a bit that we seem to be eating another 20% on top of the nominal
> limit.

MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been for many years. The function get_hash_mem() returns a work_mem-style int that callers refer to as hash_mem -- the convention is that callers pretend that there is a work_mem style GUC (called hash_mem) that they must access by calling get_hash_mem().

I don't see how it's possible for get_hash_mem() to be unable to return a hash_mem value that could be represented by work_mem directly. MAX_KILOBYTES is an annoyingly low limit on Windows, where sizeof(long) is 4. But that's nothing new.

--
Peter Geoghegan
On Thu, Jul 22, 2021 at 09:36:02AM -0700, Peter Geoghegan wrote:
> I don't see how it's possible for get_hash_mem() to be unable to
> return a hash_mem value that could be represented by work_mem
> directly. MAX_KILOBYTES is an annoyingly low limit on Windows, where
> sizeof(long) is 4. But that's nothing new.

Oh. So the problem seems to be that:
1) In v13, HashAgg now obeys work_mem*hash_mem_multiplier;
2) Under Windows, work_mem is limited to 2GB.

--
Justin
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 22, 2021 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I should have said "2GB plus palloc slop". It doesn't surprise
>> me a bit that we seem to be eating another 20% on top of the nominal
>> limit.

> MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been
> for many years.

Right. The point here is that before v13, hash aggregation was not subject to the work_mem limit, nor any related limit. If you did an aggregation requiring more than 2GB-plus-slop, it would work just fine as long as your machine had enough RAM. Now, the performance sucks and there is no knob you can turn to fix it. That's unacceptable in my book.

			regards, tom lane
On Thu, Jul 22, 2021 at 9:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right. The point here is that before v13, hash aggregation was not
> subject to the work_mem limit, nor any related limit. If you did an
> aggregation requiring more than 2GB-plus-slop, it would work just fine
> as long as your machine had enough RAM. Now, the performance sucks and
> there is no knob you can turn to fix it. That's unacceptable in my book.

Oh! That makes way more sense.

I suspect David's theory about hash_agg_set_limits()'s ngroup limit is correct. It certainly seems like a good starting point.

--
Peter Geoghegan
Justin Pryzby <pryzby@telsasoft.com> writes:
> Oh. So the problem seems to be that:
> 1) In v13, HashAgg now obeys work_mem*hash_mem_multiplier;
> 2) Under Windows, work_mem is limited to 2GB.

And more to the point, work_mem*hash_mem_multiplier is *also* limited to 2GB. We didn't think that through very carefully. The point of the hash_mem_multiplier feature was to allow hash aggregation to still consume more than the work_mem limit, but we failed to free it from this 2GB limit. You're right though that this is Windows-only; on machines with 64-bit "long" there's less of a problem.

			regards, tom lane
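(To make the ceiling being discussed concrete: work_mem's own maximum, MAX_KILOBYTES, is roughly 2GB expressed in kB on a platform with a 4-byte "long", and in 13.0-13.3 the work_mem * hash_mem_multiplier product is clamped to that same value. The arithmetic below is only an illustration of the effect, not the actual get_hash_mem() code; the 2097151 figure assumes MAX_KILOBYTES = INT_MAX / 1024.)

    -- work_mem = 1GB (1048576 kB) and hash_mem_multiplier = 16 would suggest 16GB,
    -- but on 13.0-13.3/Windows the product is clamped to roughly 2GB:
    SELECT least(1048576::bigint * 16, 2097151) AS effective_hash_mem_kb;
    --  effective_hash_mem_kb
    -- -----------------------
    --                2097151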
On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
> correct. It certainly seems like a good starting point.

I also suspect that if Laurent set work_mem and/or hash_mem_multiplier *extremely* aggressively, then eventually the hash agg would be in-memory. And without actually using all that much memory.

I'm not suggesting that that is a sensible resolution to Laurent's complaint. I'm just pointing out that it's probably not fundamentally impossible to make the hash agg avoid spilling through tuning these GUCs. At least I see no evidence of that right now.

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> I also suspect that if Laurent set work_mem and/or hash_mem_multiplier
> *extremely* aggressively, then eventually the hash agg would be
> in-memory. And without actually using all that much memory.

No, he already tried, upthread. The trouble is that he's on a Windows machine, so get_hash_mem is quasi-artificially constraining the product to 2GB. And he needs it to be a bit more than that. Whether the constraint is hitting at the ngroups stage or it's related to actual memory consumption isn't that relevant.

What I'm wondering about is whether it's worth putting in a solution for this issue in isolation, or whether we ought to embark on the long-ignored project of getting rid of use of "long" for any memory-size-related computations. There would be no chance of back-patching something like the latter into v13, though.

			regards, tom lane
I did try 2000MB work_mem and 16 multiplier 😊 It seems to plateau at 2GB no matter what.

This is what the explain had:

HashAggregate (cost=1774568.21..1774579.21 rows=200 width=1260) (actual time=94618.303..1795311.542 rows=722853 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=14 read=169854, temp read=15777 written=27588
  ->  HashAggregate (cost=1360804.75..1374830.63 rows=1402588 width=56) (actual time=30753.022..45384.558 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126560kB
        Buffers: shared read=169851, temp read=15777 written=27588
        ->  Seq Scan on assessmenticcqa_raw (cost=0.00..1256856.62 rows=13859750 width=38) (actual time=0.110..14342.258 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
              Rows Removed by Filter: 171680
              Buffers: shared read=169851
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '16', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '2000MB'
Planning:
  Buffers: shared hit=186 read=37
Planning Time: 55.709 ms
Execution Time: 1795921.717 ms

-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie>
Sent: Thursday, July 22, 2021 13:05
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: David Rowley <dgrowleyml@gmail.com>; ldh@laurent-hasson.com; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 9:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I suspect David's theory about hash_agg_set_limits()'s ngroup limit is
> correct. It certainly seems like a good starting point.
I also suspect that if Laurent set work_mem and/or hash_mem_multiplier *extremely* aggressively, then eventually the hash agg would be in-memory. And without actually using all that much memory.

I'm not suggesting that that is a sensible resolution to Laurent's complaint. I'm just pointing out that it's probably not fundamentally impossible to make the hash agg avoid spilling through tuning these GUCs. At least I see no evidence of that right now.

--
Peter Geoghegan
-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Thursday, July 22, 2021 12:36
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 04:30:00PM +0000, ldh@laurent-hasson.com wrote:
> Hello Justin,
>
> > log_executor_stats=on; client_min_messages=debug;
>
> Would the results then come in EXPLAIN or would I need to pick something up from the logs?

If you're running with psql, and client_min_messages=debug, then it'll show up as a debug message to the client:

ts=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1;
SET
SET
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.000011 s user, 0.000209 s system, 0.000219 s elapsed
!       [0.040608 s user, 0.020304 s system total]
!       7808 kB max resident size
...

It can but doesn't have to use "explain" - that just avoids showing the query output, since it's not what's interesting here.

--
Justin

-------------------------------------------------------------------------------------

Justin,

I tried this but am not seeing the max resident size data in the output.

Pepper=# SET log_executor_stats=on; SET client_min_messages=debug; explain analyze SELECT 1;
SET
SET
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.000000 s user, 0.000000 s system, 0.000109 s elapsed
!       [494.640625 s user, 19.171875 s system total]
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
 Planning Time: 0.041 ms
 Execution Time: 0.012 ms
(3 rows)

For my query:

LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       169.625000 s user, 5.843750 s system, 175.490088 s elapsed
!       [494.640625 s user, 19.171875 s system total]

HashAggregate (cost=1764285.18..1764296.18 rows=200 width=1260) (actual time=86323.813..174737.442 rows=723659 loops=1)
  Group Key: t.iccqa_iccassmt_fk
  Buffers: shared hit=364 read=170293, temp written=83229
  CTE t
    ->  HashAggregate (cost=1343178.39..1356985.17 rows=1380678 width=56) (actual time=22594.053..32519.573 rows=13865785 loops=1)
          Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
          Buffers: shared hit=364 read=170293
          ->  Seq Scan on assessmenticcqa_raw (cost=0.00..1240682.76 rows=13666084 width=38) (actual time=0.170..10714.598 rows=13865785 loops=1)
                Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
                Rows Removed by Filter: 172390
                Buffers: shared hit=364 read=170293
  ->  CTE Scan on t (cost=0.00..27613.56 rows=1380678 width=552) (actual time=22594.062..40248.874 rows=13865785 loops=1)
        Buffers: shared hit=364 read=170293, temp written=83229
Planning Time: 0.728 ms
Execution Time: 175482.904 ms
(15 rows)
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, July 22, 2021 12:42
To: Peter Geoghegan <pg@bowt.ie>
Cc: David Rowley <dgrowleyml@gmail.com>; ldh@laurent-hasson.com; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 22, 2021 at 9:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I should have said "2GB plus palloc slop". It doesn't surprise
>> me a bit that we seem to be eating another 20% on top of the nominal
>> limit.

> MAX_KILOBYTES is the max_val for the work_mem GUC itself, and has been
> for many years.

Right. The point here is that before v13, hash aggregation was not subject to the work_mem limit, nor any related limit. If you did an aggregation requiring more than 2GB-plus-slop, it would work just fine as long as your machine had enough RAM. Now, the performance sucks and there is no knob you can turn to fix it. That's unacceptable in my book.

			regards, tom lane

-------------------------------------------------------

Hello all,

As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused issues elsewhere though. Are we doing things that are so far out of the normal? Are we early in pushing V13 to full production? 😊 Doing analytics with pivoted tables with hundreds of columns is not uncommon in our world.
As for the three other requests from the team:

Clustering:
==========================

I re-clustered the table on the index that drives the pivot logic but I didn't see any change:

cluster verbose assessmenticcqa_raw using assessmenticcqa_raw_idx_iccqar_assmt_ques;

HashAggregate (cost=1774465.36..1774476.36 rows=200 width=1260) (actual time=80848.591..1763443.586 rows=722853 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1277985kB
  Buffers: shared hit=369 read=169577, temp read=15780 written=27584
  ->  HashAggregate (cost=1360748.50..1374772.80 rows=1402430 width=56) (actual time=25475.554..38256.923 rows=13852618 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 5  Memory Usage: 2400305kB  Disk Usage: 126552kB
        Buffers: shared hit=352 read=169577, temp read=15780 written=27584
        ->  Seq Scan on assessmenticcqa_raw (cost=0.00..1256812.09 rows=13858188 width=38) (actual time=0.085..11914.135 rows=13852618 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
              Rows Removed by Filter: 171680
              Buffers: shared hit=352 read=169577
Settings: effective_cache_size = '52GB', from_collapse_limit = '24', hash_mem_multiplier = '4', jit = 'off', jit_above_cost = '2e+08', jit_inline_above_cost = '5e+08', jit_optimize_above_cost = '5e+08', join_collapse_limit = '24', max_parallel_workers = '20', max_parallel_workers_per_gather = '8', random_page_cost = '1.1', temp_buffers = '4GB', work_mem = '1GB'
Planning:
  Buffers: shared hit=100 read=1
Planning Time: 1.663 ms
Execution Time: 1763967.567 ms

More debugging on V11:
==========================

LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       169.625000 s user, 5.843750 s system, 175.490088 s elapsed
!       [494.640625 s user, 19.171875 s system total]

HashAggregate (cost=1764285.18..1764296.18 rows=200 width=1260) (actual time=86323.813..174737.442 rows=723659 loops=1)
  Group Key: t.iccqa_iccassmt_fk
  Buffers: shared hit=364 read=170293, temp written=83229
  CTE t
    ->  HashAggregate (cost=1343178.39..1356985.17 rows=1380678 width=56) (actual time=22594.053..32519.573 rows=13865785 loops=1)
          Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
          Buffers: shared hit=364 read=170293
          ->  Seq Scan on assessmenticcqa_raw (cost=0.00..1240682.76 rows=13666084 width=38) (actual time=0.170..10714.598 rows=13865785 loops=1)
                Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
                Rows Removed by Filter: 172390
                Buffers: shared hit=364 read=170293
  ->  CTE Scan on t (cost=0.00..27613.56 rows=1380678 width=552) (actual time=22594.062..40248.874 rows=13865785 loops=1)
        Buffers: shared hit=364 read=170293, temp written=83229
Planning Time: 0.728 ms
Execution Time: 175482.904 ms
(15 rows)

Patch on V13?
==========================

Maybe there can be a patch on V13 and then a longer-term effort afterwards? As it is, I have no way to deploy V13 as this is a hard regression for us.

Thank you,
Laurent.
On Thu, Jul 22, 2021 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> No, he already tried, upthread. The trouble is that he's on a Windows
> machine, so get_hash_mem is quasi-artificially constraining the product
> to 2GB. And he needs it to be a bit more than that. Whether the
> constraint is hitting at the ngroups stage or it's related to actual
> memory consumption isn't that relevant.

Somehow I missed that part.

> What I'm wondering about is whether it's worth putting in a solution
> for this issue in isolation, or whether we ought to embark on the
> long-ignored project of getting rid of use of "long" for any
> memory-size-related computations. There would be no chance of
> back-patching something like the latter into v13, though.

+1. Even if we assume that Windows is a low priority platform, in the long run it'll be easier to make it more like every other platform.

The use of "long" is inherently suspect to me. It signals that the programmer wants something wider than "int", even if the standard doesn't actually require that "long" be wider. This seems to contradict what we know to be true for Postgres, which is that in general it's unsafe to assume that long is int64. It's not just work_mem related calculations. There is also code like logtape.c, which uses long for block numbers -- that also exposes us to risk on Windows.

By requiring int64 be used instead of long, we don't actually increase risk for non-Windows platforms to any significant degree. I'm pretty sure that "long" means int64 on non-Windows 64-bit platforms anyway.

--
Peter Geoghegan
On Thu, Jul 22, 2021 at 05:26:26PM +0000, ldh@laurent-hasson.com wrote:
> I tried this but am not seeing the max resident size data in the output.

Oh. Apparently, that's not supported under windows..

#if defined(HAVE_GETRUSAGE)
	appendStringInfo(&str, "!\t%ld kB max resident size\n",
-----Original Message-----
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Thursday, July 22, 2021 13:29
To: ldh@laurent-hasson.com
Cc: Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>; Peter Geoghegan <pg@bowt.ie>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 05:26:26PM +0000, ldh@laurent-hasson.com wrote:
> I tried this but am not seeing the max resident size data in the output.

Oh. Apparently, that's not supported under windows..

#if defined(HAVE_GETRUSAGE)
	appendStringInfo(&str, "!\t%ld kB max resident size\n",

----------------------------

Hello,

Damn... I know Windows is a lower priority, and this is yet another issue, but in Healthcare, Windows is so prevalent everywhere...

Thank you,
Laurent.
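(Since getrusage is not available on Windows, one thing that does still work there for this particular problem is watching the hash aggregate's spill files. A sketch, with illustrative settings only; note that log_temp_files requires superuser, and the LOG messages show up in the psql session once client_min_messages is lowered, the same way Justin's executor-stats example does.)

    SET log_temp_files = 0;           -- log every temporary file created, regardless of size
    SET client_min_messages = log;    -- have the LOG lines sent to the client as well
    -- then re-run the aggregate and look for lines like:
    -- LOG:  temporary file: path "base/pgsql_tmp/...", size ...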
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Jul 22, 2021 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What I'm wondering about is whether it's worth putting in a solution
>> for this issue in isolation, or whether we ought to embark on the
>> long-ignored project of getting rid of use of "long" for any
>> memory-size-related computations. There would be no chance of
>> back-patching something like the latter into v13, though.

> By requiring int64 be used instead of long, we don't actually increase
> risk for non-Windows platforms to any significant degree. I'm pretty
> sure that "long" means int64 on non-Windows 64-bit platforms anyway.

Well, what we really ought to be using is size_t (a/k/a Size), at least for memory-space-related calculations. I don't have an opinion right now about what logtape.c ought to use. I do agree that avoiding "long" altogether would be a good ultimate goal.

In the short term though, the question is whether we want to regard this hashagg issue as something we need a fix for in v13/v14. The fact that it's Windows-only makes it slightly less pressing in my mind, but it's still a regression that some people are going to hit.

			regards, tom lane
On Thu, Jul 22, 2021 at 10:33 AM ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> Damn... I know Windows is a lower priority, and this is yet another issue, but in Healthcare, Windows is so prevalent everywhere...

To be clear, I didn't actually say that. I said that it doesn't matter either way, as far as addressing this long standing "int64 vs long" issue goes.

--
Peter Geoghegan
-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie>
Sent: Thursday, July 22, 2021 13:36
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; David Rowley <dgrowleyml@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

On Thu, Jul 22, 2021 at 10:33 AM ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
> Damn... I know Windows is a lower priority, and this is yet another issue, but in Healthcare, Windows is so prevalent everywhere...

To be clear, I didn't actually say that. I said that it doesn't matter either way, as far as addressing this long standing "int64 vs long" issue goes.

--
Peter Geoghegan

Yes, agreed Peter... The "lower priority" issue was mentioned, but not in terms of the applicability of the fix overall. Personally, I would prefer going the size_t route vs int/long/int64 in C/C++. Of course, as a user, I'd love a patch on V13 and something cleaner in V14.

Thanks,
Laurent.
On Thu, Jul 22, 2021 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, what we really ought to be using is size_t (a/k/a Size), at least
> for memory-space-related calculations. I don't have an opinion right
> now about what logtape.c ought to use. I do agree that avoiding "long"
> altogether would be a good ultimate goal.

I assume that we often use "long" in contexts where a signed integer type is required. Maybe this is not true in the case of the work_mem style calculations. But I know that it works that way in logtape.c, where -1 is a sentinel value. We already use int64 (not size_t) in tuplesort.c for roughly the same reason: LACKMEM() needs to work with negative values, to handle certain edge cases.

> In the short term though, the question is whether we want to regard this
> hashagg issue as something we need a fix for in v13/v14. The fact that
> it's Windows-only makes it slightly less pressing in my mind, but it's
> still a regression that some people are going to hit.

True. I worry about the potential for introducing new bugs on Windows by backpatching a fix for this. Technically this restriction existed in every other work_mem consumer on Windows. Of course this won't matter much to users like Laurent.

--
Peter Geoghegan
On 2021-Jul-22, ldh@laurent-hasson.com wrote:
> Yes, agreed Peter... The "lower priority" issue was mentioned, but not
> in terms of the applicability of the fix overall. Personally, I would
> prefer going the size_t route vs int/long/int64 in C/C++. Of course,
> as a user, I'd love a patch on V13 and something cleaner in V14.

Just to clarify our terminology here. "A patch" means any kind of change to the source code, regardless of its cleanliness or applicability to versions deemed stable. You can have one patch which is an ugly hack for a stable version that can't have invasive changes, and another patch which is a clean, more maintainable version of a totally different fix for the version in development. We use the same word, "patch", for both types of source code changes.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
I am not sure I understand this parameter well enough but it’s with a default value right now of 1000. I have read Robert’s post (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html) and could play with those parameters, but unsure whether what you are describing will unlock this 2GB limit.
From: Vijaykumar Jain <vijaykumarjain.github@gmail.com>
Sent: Thursday, July 22, 2021 16:32
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3
Just asking, I may be completely wrong.
is this query parallel safe?
can we force parallel workers, by setting low parallel_setup_cost or otherwise to make use of scatter gather and Partial HashAggregate(s)?
I am just assuming more workers doing things in parallel, would require less disk spill per hash aggregate (or partial hash aggregate ?) and the scatter gather at the end.
I did some runs in my demo environment, not with the same query, some group by aggregates with around 25M rows, and it showed reasonable results, not too off.
this was pg14 on ubuntu.
I am not sure I understand this parameter well enough but it’s with a default value right now of 1000. I have read Robert’s post (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html) and could play with those parameters, but unsure whether what you are describing will unlock this 2GB limit.
From: Vijaykumar Jain <vijaykumarjain.github@gmail.com>
Sent: Friday, July 23, 2021 10:45
To: ldh@laurent-hasson.com
Cc: Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3
On Fri, 23 Jul 2021 at 03:06, ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:
I am not sure I understand this parameter well enough but it’s with a default value right now of 1000. I have read Robert’s post (http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html) and could play with those parameters, but unsure whether what you are describing will unlock this 2GB limit.
Yeah, maybe I was digressing, and this possibly cannot get around the Windows bottleneck.
Although the query is different, the steps were:
1) use system default, work_mem = 4MB, parallel_setup_cost = 1000
-- runs the query in parallel, no disk spill as work_mem suff.for my data
postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates group by year_actual, month_name, day_name) select max(date),year_actual from cte group by year_actual;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=931227.78..932398.85 rows=200 width=8) (actual time=7850.214..7855.848 rows=51 loops=1)
Group Key: dates.year_actual
-> Finalize GroupAggregate (cost=931227.78..932333.85 rows=4200 width=28) (actual time=7850.075..7855.611 rows=4201 loops=1)
Group Key: dates.year_actual, dates.month_name, dates.day_name
-> Gather Merge (cost=931227.78..932207.85 rows=8400 width=28) (actual time=7850.069..7854.008 rows=11295 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=930227.76..930238.26 rows=4200 width=28) (actual time=7846.419..7846.551 rows=3765 loops=3)
Sort Key: dates.year_actual, dates.month_name, dates.day_name
Sort Method: quicksort Memory: 391kB
Worker 0: Sort Method: quicksort Memory: 392kB
Worker 1: Sort Method: quicksort Memory: 389kB
-> Partial HashAggregate (cost=929933.00..929975.00 rows=4200 width=28) (actual time=7841.979..7842.531 rows=3765 loops=3)
Group Key: dates.year_actual, dates.month_name, dates.day_name
Batches: 1 Memory Usage: 721kB
Worker 0: Batches: 1 Memory Usage: 721kB
Worker 1: Batches: 1 Memory Usage: 721kB
-> Parallel Seq Scan on dates (cost=0.00..820355.00 rows=10957800 width=28) (actual time=3.347..4779.784 rows=8766240 loops=3)
Planning Time: 0.133 ms
Execution Time: 7855.958 ms
(20 rows)
-- set work_mem to a very low value, to spill to disk and compare the spill in parallel vs serial
postgres=# set work_mem TO 64; --
SET
postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates group by year_actual, month_name, day_name) select max(date),year_actual from cte group by year_actual;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=2867778.00..2868949.07 rows=200 width=8) (actual time=18116.529..18156.972 rows=51 loops=1)
Group Key: dates.year_actual
-> Finalize GroupAggregate (cost=2867778.00..2868884.07 rows=4200 width=28) (actual time=18116.421..18156.729 rows=4201 loops=1)
Group Key: dates.year_actual, dates.month_name, dates.day_name
-> Gather Merge (cost=2867778.00..2868758.07 rows=8400 width=28) (actual time=18116.412..18155.136 rows=11126 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=2866777.98..2866788.48 rows=4200 width=28) (actual time=17983.836..17984.981 rows=3709 loops=3)
Sort Key: dates.year_actual, dates.month_name, dates.day_name
Sort Method: external merge Disk: 160kB
Worker 0: Sort Method: external merge Disk: 168kB
Worker 1: Sort Method: external merge Disk: 160kB
-> Partial HashAggregate (cost=2566754.38..2866423.72 rows=4200 width=28) (actual time=10957.390..17976.250 rows=3709 loops=3)
Group Key: dates.year_actual, dates.month_name, dates.day_name
Planned Partitions: 4 Batches: 21 Memory Usage: 93kB Disk Usage: 457480kB
Worker 0: Batches: 21 Memory Usage: 93kB Disk Usage: 473056kB
Worker 1: Batches: 21 Memory Usage: 93kB Disk Usage: 456792kB
-> Parallel Seq Scan on dates (cost=0.00..820355.00 rows=10957800 width=28) (actual time=1.042..5893.803 rows=8766240 loops=3)
Planning Time: 0.142 ms
Execution Time: 18195.973 ms
(20 rows)
postgres=# set parallel_setup_cost TO 1000000000; -- make sure it never uses parallel, check disk spill (much more than when parallel workers used)
SET
postgres=# explain analyze with cte as (select month_name, day_name, year_actual, max(date) date from dimensions.dates group by year_actual, month_name, day_name) select max(date),year_actual from cte group by year_actual;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=5884624.58..5884658.08 rows=200 width=8) (actual time=35462.340..35463.142 rows=51 loops=1)
Group Key: cte.year_actual
-> Sort (cost=5884624.58..5884635.08 rows=4200 width=8) (actual time=35462.325..35462.752 rows=4201 loops=1)
Sort Key: cte.year_actual
Sort Method: external merge Disk: 80kB
-> Subquery Scan on cte (cost=5165122.70..5884312.33 rows=4200 width=8) (actual time=21747.139..35461.371 rows=4201 loops=1)
-> HashAggregate (cost=5165122.70..5884270.33 rows=4200 width=28) (actual time=21747.138..35461.140 rows=4201 loops=1)
Group Key: dates.year_actual, dates.month_name, dates.day_name
Planned Partitions: 4 Batches: 21 Memory Usage: 93kB Disk Usage: 1393192kB
-> Seq Scan on dates (cost=0.00..973764.20 rows=26298720 width=28) (actual time=0.005..10698.392 rows=26298721 loops=1)
Planning Time: 0.124 ms
Execution Time: 35548.514 ms
(12 rows)
I was thinking that trying to make the query run in parallel would reduce disk I/O per worker and maybe speed up the aggregate, especially if it runs around 1 hour.
Of course, this was just trying things; maybe I am trying to override the optimizer, but I just wanted to understand the cost and resource differences when forcing custom plans.
I also tried with enable_sort off and enable_hashagg off (it only got worse, so not sharing as it would derail the thread).
Again, ignore if it does not make sense :)
Hello,
OK, that makes sense. I have some limited time to test those additional scenarios, but they make sense. I’ll see what I can do. The query on 11 takes under 5mn, and 50mn+ on 13.
Thank you,
Laurent.
"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes: > As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great performancecompared to alternatives like SQLServer. The customers we work with typically have data centers and are overwhelminglyWindows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs Windowswas still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused issueselsewhere though. Maybe it has, but you're the first to report the problem, or at least the first to report it with enough detail to trace the cause. I've pushed a fix that removes the artificial restriction on work_mem times hash_mem_multiplier; it will be in next month's 13.4 release. You'll still need to increase hash_mem_multiplier to get satisfactory performance on your workload, but at least it'll be possible to do that. regards, tom lane
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Sunday, July 25, 2021 14:08
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused issues elsewhere though.

Maybe it has, but you're the first to report the problem, or at least the first to report it with enough detail to trace the cause.

I've pushed a fix that removes the artificial restriction on work_mem times hash_mem_multiplier; it will be in next month's 13.4 release.

You'll still need to increase hash_mem_multiplier to get satisfactory performance on your workload, but at least it'll be possible to do that.

			regards, tom lane

---------------------------------------------------------------------------------------------------------------------

Hello Tom,

So happy to help in whatever capacity I can 😊 This is fantastic news! Can't wait to try 13.4 asap and will report back.

Thank you,
Laurent.
Tom,

One question that popped up in my head. hash_mem_multiplier is an upper bound, right: it doesn't reserve memory ahead of time, correct? So there is no reason for me to spend undue amounts of time fine-tuning this parameter? If I have work_mem at 521MB, then I can set hash_mem_multiplier to 8 and should be OK. This doesn't mean that every query will consume 4GB of memory.

Thank you,
Laurent.

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Sunday, July 25, 2021 14:08
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes:
> As a user of PG, we have taken pride in the last few years in tuning the heck out of the system and getting great performance compared to alternatives like SQLServer. The customers we work with typically have data centers and are overwhelmingly Windows shops: we won the battle to deploy a complex operational system on PG vs SQLServer, but Linux vs Windows was still a bridge too far for many. I am surprised that this limitation introduced after V11 hasn't caused issues elsewhere though.

Maybe it has, but you're the first to report the problem, or at least the first to report it with enough detail to trace the cause.

I've pushed a fix that removes the artificial restriction on work_mem times hash_mem_multiplier; it will be in next month's 13.4 release.

You'll still need to increase hash_mem_multiplier to get satisfactory performance on your workload, but at least it'll be possible to do that.

			regards, tom lane
"ldh@laurent-hasson.com" <ldh@laurent-hasson.com> writes: > One question that popped up in my head. hash_mem_multiplier is an upper-bound right: it doesn't reserve memory ahead oftime correct? So there is no reason for me to spend undue amounts of time fine-tuning this parameter? If I have work_memto 521MB, then I can set hash_mem_multiplier to 8 and should be OK. This doesn't mean that every query will consume4GB of memory. Yeah, I wouldn't sweat over the specific value. The pre-v13 behavior was effectively equivalent to hash_mem_multiplier = infinity, so if you weren't having any OOM problems before, just crank it up. regards, tom lane
I wrote:
> Yeah, I wouldn't sweat over the specific value. The pre-v13 behavior
> was effectively equivalent to hash_mem_multiplier = infinity, so if
> you weren't having any OOM problems before, just crank it up.

Oh, wait, scratch that: the old executor's behavior is accurately described by that statement, but the planner's is not. The planner will not pick a hashagg plan if it guesses that the hash table would exceed the configured limit (work_mem before, now work_mem times hash_mem_multiplier). So raising hash_mem_multiplier to the moon might bias the v13 planner to pick hashagg plans in cases where earlier versions would not have. This doesn't describe your immediate problem, but it might be a reason to not just set the value as high as you can.

BTW, this also suggests that the planner is underestimating the amount of memory needed for the hashagg, both before and after. That might be something to investigate at some point.

			regards, tom lane
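(A cheap way to see the planner-side effect Tom describes is to run plain EXPLAIN, without ANALYZE, under different budgets and watch whether the inner aggregate flips between HashAggregate and Sort plus GroupAggregate. This sketch reuses the table from this thread; the multiplier values are arbitrary.)

    SET hash_mem_multiplier = 1;
    EXPLAIN SELECT iccqar_iccassmt_fk, iccqar_ques_code, max(iccqar_ans_val)
    FROM assessmenticcqa_raw GROUP BY 1, 2;

    SET hash_mem_multiplier = 8;
    EXPLAIN SELECT iccqar_iccassmt_fk, iccqar_ques_code, max(iccqar_ans_val)
    FROM assessmenticcqa_raw GROUP BY 1, 2;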
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 27, 2021 23:31
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I wrote:
> Yeah, I wouldn't sweat over the specific value. The pre-v13 behavior
> was effectively equivalent to hash_mem_multiplier = infinity, so if
> you weren't having any OOM problems before, just crank it up.

Oh, wait, scratch that: the old executor's behavior is accurately described by that statement, but the planner's is not. The planner will not pick a hashagg plan if it guesses that the hash table would exceed the configured limit (work_mem before, now work_mem times hash_mem_multiplier). So raising hash_mem_multiplier to the moon might bias the v13 planner to pick hashagg plans in cases where earlier versions would not have. This doesn't describe your immediate problem, but it might be a reason to not just set the value as high as you can.

BTW, this also suggests that the planner is underestimating the amount of memory needed for the hashagg, both before and after. That might be something to investigate at some point.

			regards, tom lane

-------------------------------------------------------

This is very useful to know... all things I'll get to test after 13.4 is released. I'll report back when I am able to.

Thank you,
Laurent.
Hello all...

I upgraded to 13.4 and re-ran the various versions of the query. I think I get a mixed bag of results. In the spirit of closing this thread, I'll start with the fantastic news and then start a new thread with the other issue I have isolated.

It looks like the query I submitted initially is now performing very well!!! It's funny but it looks like the query is consuming a tiny bit over 2.5GB, so my use case was literally a hair above the memory limit that we had discovered. V13.4 takes 54s vs V11.2's 74s!! This is quite amazing to have such a performance gain! 25% better.

The query was as follows:

select "iccqa_iccassmt_fk"
     , MAX("iccqar_ans_val") as "iccqar_ans_val"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEBRIDEMENT DATE'))::text as "iccqa_DEBRIDEMENT_DATE"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEBRIDEMENT THIS VISIT'))::text as "iccqa_DEBRIDEMENT_THIS_VISIT"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEBRIDEMENT TYPE'))::text as "iccqa_DEBRIDEMENT_TYPE"
     , (MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)'))::text as "iccqa_DEPTH_CM"
     , ... 50 more columns
from (
    -- 'A pivoted view of ICC QA assessments'
    select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
         , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code"    -- The question long code from the meta-data.
         , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val"   -- The official answer (if applicable) from the meta-data.
    from VNAHGEDW_FACTS.AssessmentICCQA_Raw
    where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEBRIDEMENT DATE', 'DEBRIDEMENT THIS VISIT', 'DEBRIDEMENT TYPE', 'DEPTH (CM)',
                                       ... 50 more values)
    group by 1, 2
) T
group by 1
;

The plans are:

V13.4 - 54s

HashAggregate (cost=1486844.46..1486846.46 rows=200 width=1764) (actual time=50714.016..53813.049 rows=677899 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 1196065kB
  Buffers: shared hit=158815
  ->  Finalize HashAggregate (cost=1100125.42..1113234.54 rows=1310912 width=56) (actual time=14487.522..20498.241 rows=12926549 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 1  Memory Usage: 2572305kB
        Buffers: shared hit=158815
        ->  Gather (cost=382401.10..1050966.22 rows=6554560 width=56) (actual time=2891.177..6614.288 rows=12926549 loops=1)
              Workers Planned: 5
              Workers Launched: 5
              Buffers: shared hit=158815
              ->  Partial HashAggregate (cost=381401.10..394510.22 rows=1310912 width=56) (actual time=2790.736..3680.249 rows=2154425 loops=6)
                    Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
                    Batches: 1  Memory Usage: 417809kB
                    Buffers: shared hit=158815
                    Worker 0:  Batches: 1  Memory Usage: 401425kB
                    Worker 1:  Batches: 1  Memory Usage: 409617kB
                    Worker 2:  Batches: 1  Memory Usage: 393233kB
                    Worker 3:  Batches: 1  Memory Usage: 385041kB
                    Worker 4:  Batches: 1  Memory Usage: 393233kB
                    ->  Parallel Seq Scan on assessmenticcqa_raw (cost=0.00..362006.30 rows=2585974 width=38) (actual time=0.042..1600.138 rows=2154425 loops=6)
                          Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
                          Rows Removed by Filter: 30428
                          Buffers: shared hit=158815
Planning:
  Buffers: shared hit=3
Planning Time: 0.552 ms
Execution Time: 54241.152 ms

V11.2 - 74s

HashAggregate (cost=1629249.48..1629251.48 rows=200 width=1764) (actual time=68833.447..73384.374 rows=742896 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Buffers: shared read=173985
  ->  Finalize HashAggregate (cost=1205455.43..1219821.33 rows=1436590 width=56) (actual time=19441.489..28630.297 rows=14176024 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Buffers: shared read=173985
        ->  Gather (cost=418922.41..1151583.31 rows=7182950 width=56) (actual time=3698.235..8445.971 rows=14176024 loops=1)
              Workers Planned: 5
              Workers Launched: 5
              Buffers: shared read=173985
              ->  Partial HashAggregate (cost=417922.41..432288.31 rows=1436590 width=56) (actual time=3559.562..4619.406 rows=2362671 loops=6)
                    Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
                    Buffers: shared read=173985
                    ->  Parallel Seq Scan on assessmenticcqa_raw (cost=0.00..396656.48 rows=2835457 width=38) (actual time=0.261..1817.102 rows=2362671 loops=6)
                          Filter: ((iccqar_ques_code)::text = ANY ('{"DEBRIDEMENT DATE","DEBRIDEMENT THIS VISIT","DEBRIDEMENT TYPE","DEPTH (CM)","DEPTH DESCRIPTION","DOES PATIENT HAVE PAIN ASSOCIATED WITH THIS WOUND?","DRAIN PRESENT","DRAIN TYPE","EDGE / SURROUNDING TISSUE - MACERATION",EDGES,EPITHELIALIZATION,"EXUDATE AMOUNT","EXUDATE TYPE","GRANULATION TISSUE","INDICATE OTHER TYPE OF WOUND CLOSURE","INDICATE TYPE","INDICATE WOUND CLOSURE","IS THIS A CLOSED SURGICAL WOUND OR SUSPECTED DEEP TISSUE INJURY?","LENGTH (CM)","MEASUREMENTS TAKEN","NECROTIC TISSUE AMOUNT","NECROTIC TISSUE TYPE",ODOR,"OTHER COMMENTS REGARDING DEBRIDEMENT TYPE","OTHER COMMENTS REGARDING DRAIN TYPE","OTHER COMMENTS REGARDING PAIN INTERVENTIONS","OTHER COMMENTS REGARDING PAIN QUALITY","OTHER COMMENTS REGARDING REASON MEASUREMENTS NOT TAKEN","PAIN FREQUENCY","PAIN INTERVENTIONS","PAIN QUALITY","PERIPHERAL TISSUE EDEMA","PERIPHERAL TISSUE INDURATION","REASON MEASUREMENTS NOT TAKEN","RESPONSE TO PAIN INTERVENTIONS",SHAPE,"SIGNS AND SYMPTOMS OF INFECTION","SKIN COLOR SURROUNDING WOUND",STATE,"SURFACE AREA (SQ CM)","TOTAL NECROTIC TISSUE ESCHAR","TOTAL NECROTIC TISSUE SLOUGH",TUNNELING,"TUNNELING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","TUNNELING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK",UNDERMINING,"UNDERMINING SIZE(CM)/LOCATION - 12 - 3 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 3 - 6 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 6 - 9 O''CLOCK","UNDERMINING SIZE(CM)/LOCATION - 9 - 12 O''CLOCK","WIDTH (CM)","WOUND PAIN LEVEL, WHERE 0 = \"NO PAIN\" AND 10 = \"WORST POSSIBLE PAIN\""}'::text[]))
                          Rows Removed by Filter: 31608
                          Buffers: shared read=173985
Planning Time: 22.673 ms
Execution Time: 74110.779 ms

Thank you so much to the whole team for this great work! I'll send a separate email with the other issue I think I have isolated.

Thank you!
Laurent Hasson

-----Original Message-----
From: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Sent: Wednesday, July 28, 2021 16:13
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: RE: Big performance slowdown from 11.2 to 13.3

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 27, 2021 23:31
To: ldh@laurent-hasson.com
Cc: Peter Geoghegan <pg@bowt.ie>; David Rowley <dgrowleyml@gmail.com>; Justin Pryzby <pryzby@telsasoft.com>; pgsql-performance@postgresql.org
Subject: Re: Big performance slowdown from 11.2 to 13.3

I wrote:
> Yeah, I wouldn't sweat over the specific value. The pre-v13 behavior
> was effectively equivalent to hash_mem_multiplier = infinity, so if
> you weren't having any OOM problems before, just crank it up.
Oh, wait, scratch that: the old executor's behavior is accurately described by that statement, but the planner's is not. The planner will not pick a hashagg plan if it guesses that the hash table would exceed the configured limit (work_mem before, now work_mem times hash_mem_multiplier). So raising hash_mem_multiplier to the moon might bias the v13 planner to pick hashagg plans in cases where earlier versions would not have. This doesn't describe your immediate problem, but it might be a reason to not just set the value as high as you can.

BTW, this also suggests that the planner is underestimating the amount of memory needed for the hashagg, both before and after. That might be something to investigate at some point.

			regards, tom lane

This is very useful to know... all things I'll get to test after 13.4 is released. I'll report back when I am able to.

Thank you,
Laurent.