Home > mailing lists

Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 - Mailing list pgsql-performance

From	ldh@laurent-hasson.com
Subject	Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Date	August 21, 2021 07:56:34
Msg-id	MN2PR15MB2560D77DFB58BD893E2EAAAD85C29@MN2PR15MB2560.namprd15.prod.outlook.com Whole thread Raw
Responses	RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
List	pgsql-performance

Tree view

Hello all,

I think I have identified a major performance issue between V11.2 and 13.4 with respect to exception handling in UDFs. I have the following simplified query that pivots data and makes use of a UDF to convert data to a specific type, in this case, float:

select "iccqa_iccassmt_fk"

, Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null) as "iccqa_DEPTH_CM"

, Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null) as "iccqa_LENGTH_CM"

, Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null) as "iccqa_WIDTH_CM"

, Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null) as "iccqa_DRAIN_PRESENT"

, Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null) as "iccqa_MEASUREMENTS_TAKEN"

, Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null) as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"

from (

-- 'A pivoted view of ICC QA assessments'

select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.

, VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.

, max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data.

from VNAHGEDW_FACTS.AssessmentICCQA_Raw

where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'

, 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'

) group by 1, 2

) T

group by 1

;

The UDF is simple as follows:

CREATE OR REPLACE FUNCTION TILDA.toFloat(str varchar, val real)

RETURNS real AS $$

BEGIN

RETURN case when str is null then val else str::real end;

EXCEPTION WHEN OTHERS THEN

RETURN val;

END;

$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

It works as a coalesce but with a conversion. I think I have identified some large performance difference with the exception handling. It so happens that with the last 3 columns ('DRAIN PRESENT', 'MEASUREMENTS TAKEN' and 'SIGNS AND SYMPTOMS OF INFECTION'), the data is VERY dirty. There is a mix of 0/1, YES/NO, and other mistyped stuff. This means these 3 columns throw lots of exceptions in the UDF. To illustrate, I simply break this into 2 queries.

select "iccqa_iccassmt_fk"

, (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"

, (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"

, (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"

-- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"

-- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"

-- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"

from (

-- 'A pivoted view of ICC QA assessments'

select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.

, VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.

, max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data.

from VNAHGEDW_FACTS.AssessmentICCQA_Raw

where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'

, 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'

)

group by 1, 2

) T

group by 1

;

The performance is as expected.

HashAggregate (cost=448463.70..448467.20 rows=200 width=16) (actual time=6760.797..9585.397 rows=677899 loops=1)

Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk

Batches: 1 Memory Usage: 147489kB

Buffers: shared hit=158815

-> HashAggregate (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4576.514..5460.770 rows=2374628 loops=1)

Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code

Batches: 1 Memory Usage: 368657kB

Buffers: shared hit=158815

-> Seq Scan on assessmenticcqa_raw (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.033..3298.544 rows=2374628 loops=1)

Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))

Rows Removed by Filter: 10734488

Buffers: shared hit=158815

Planning:

Buffers: shared hit=3

Planning Time: 0.198 ms

Execution Time: 9678.120 ms

However, once we switch with the three “bad” columns, the results fall apart.

select "iccqa_iccassmt_fk"

-- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"

-- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"

-- , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"

, (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"

, (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"

, (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"

from (

-- 'A pivoted view of ICC QA assessments'

select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.

, VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.

, max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data.

from VNAHGEDW_FACTS.AssessmentICCQA_Raw

where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'

, 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'

)

group by 1, 2

) T

group by 1

;

The performance falls apart. It is a huge performance difference from ~10s to ~11mn and the only difference that I can think of is that the data is dirty which causes the exception path to be taken. The explain is:

HashAggregate (cost=448463.70..448467.20 rows=200 width=16) (actual time=6672.921..696753.080 rows=677899 loops=1)

Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk

Batches: 1 Memory Usage: 131105kB

Buffers: shared hit=158815

-> HashAggregate (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4574.918..5446.022 rows=2374628 loops=1)

Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code

Batches: 1 Memory Usage: 368657kB

Buffers: shared hit=158815

-> Seq Scan on assessmenticcqa_raw (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.032..3300.616 rows=2374628 loops=1)

Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))

Rows Removed by Filter: 10734488

Buffers: shared hit=158815

Planning:

Buffers: shared hit=3

Planning Time: 0.201 ms

Execution Time: 696868.845 ms

Now, on V11.2, the explain is:

HashAggregate (cost=492171.36..492174.86 rows=200 width=16) (actual time=19322.522..50556.738 rows=743723 loops=1)

Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk

Buffers: shared hit=11 read=174155 dirtied=13

-> HashAggregate (cost=445458.43..457915.21 rows=1245678 width=56) (actual time=16260.015..17575.088 rows=2601088 loops=1)

Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code

Buffers: shared read=174155 dirtied=13

-> Seq Scan on assessmenticcqa_raw (cost=0.00..425803.93 rows=2620600 width=38) (actual time=0.126..14425.239 rows=2601088 loops=1)

Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))

Rows Removed by Filter: 11778360

Buffers: shared read=174155 dirtied=13

Planning Time: 36.121 ms

Execution Time: 50730.255 ms

So, we are seeing two issues:

I think exception handling is significantly slower between V11.2 and v13.4. I see almost a 14x difference from 50s to 700s.
Comparing the two queries on V11.2, the difference is 13s vs 50s. So even on V11.2, the exception handling adds a significant overhead which I was not expecting.

I’ll be happy to update my test cases and share additional info if needed.

Thank you,

Laurent Hasson.

pgsql-performance by date:

From: "ldh@laurent-hasson.com"
Date: 21 August 2021, 05:42:38
Subject: RE: Big performance slowdown from 11.2 to 13.3

From: "ldh@laurent-hasson.com"
Date: 21 August 2021, 08:55:37
Subject: RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 - Mailing list pgsql-performance

Previous

Next