Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 - Mailing list pgsql-performance

From ldh@laurent-hasson.com
Subject Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Date
Msg-id MN2PR15MB2560D77DFB58BD893E2EAAAD85C29@MN2PR15MB2560.namprd15.prod.outlook.com
Whole thread Raw
Responses RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
List pgsql-performance

Hello all,

 

I think I have identified a major performance issue between V11.2 and 13.4 with respect to exception handling in UDFs. I have the following simplified query that pivots data and makes use of a UDF to convert data to a specific type, in this case, float:

 

 

select "iccqa_iccassmt_fk"

     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null) as "iccqa_DEPTH_CM"

     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null) as "iccqa_LENGTH_CM"

     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null) as "iccqa_WIDTH_CM"

     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null) as "iccqa_DRAIN_PRESENT"

     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null) as "iccqa_MEASUREMENTS_TAKEN"

     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null) as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"

from  (

-- 'A pivoted view of ICC QA assessments'

select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.

     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.

     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data.

  from VNAHGEDW_FACTS.AssessmentICCQA_Raw

where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'

                                                               , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'

                                                               ) group by 1, 2

) T

     group by 1

;

 

 

The UDF is simple as follows:

 

 

CREATE OR REPLACE FUNCTION TILDA.toFloat(str varchar, val real)

RETURNS real AS $$

BEGIN

  RETURN case when str is null then val else str::real end;

EXCEPTION WHEN OTHERS THEN

  RETURN val;

END;

$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

 

 

 

It works as a coalesce but with a conversion. I think I have identified some large performance difference with the exception handling. It so happens that with the last 3 columns ('DRAIN PRESENT', 'MEASUREMENTS TAKEN' and 'SIGNS AND SYMPTOMS OF INFECTION'), the data is VERY dirty. There is a mix of 0/1, YES/NO, and other mistyped stuff. This means these 3 columns throw lots of exceptions in the UDF. To illustrate, I simply break this into 2 queries.

 

 

 

select "iccqa_iccassmt_fk"

     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"

     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"

     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"

--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"

--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"

--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"

from  (

-- 'A pivoted view of ICC QA assessments'

select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.

     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.

     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data.

  from VNAHGEDW_FACTS.AssessmentICCQA_Raw

where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'

                                                               , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'

                                                               )

group by 1, 2

) T

     group by 1

;

 

 

The performance is as expected.

 

 

HashAggregate  (cost=448463.70..448467.20 rows=200 width=16) (actual time=6760.797..9585.397 rows=677899 loops=1)

  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk

  Batches: 1  Memory Usage: 147489kB

  Buffers: shared hit=158815

  ->  HashAggregate  (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4576.514..5460.770 rows=2374628 loops=1)

        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code

        Batches: 1  Memory Usage: 368657kB

        Buffers: shared hit=158815

        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.033..3298.544 rows=2374628 loops=1)

              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))

              Rows Removed by Filter: 10734488

              Buffers: shared hit=158815

Planning:

  Buffers: shared hit=3

Planning Time: 0.198 ms

Execution Time: 9678.120 ms

 

 

 

However, once we switch with the three “bad” columns, the results fall apart.

 

 

 

select "iccqa_iccassmt_fk"

--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"

--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"

--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"

     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"

     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"

     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"

from  (

-- 'A pivoted view of ICC QA assessments'

select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.

     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.

     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer, if applicable) from the meta-data.

  from VNAHGEDW_FACTS.AssessmentICCQA_Raw

where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'

                                                               , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'

                                                               )

group by 1, 2

) T

     group by 1

;

 

 

 

The performance falls apart. It is a huge performance difference from ~10s to ~11mn and the only difference that I can think of is that the data is dirty which causes the exception path to be taken. The explain is:

 

 

HashAggregate  (cost=448463.70..448467.20 rows=200 width=16) (actual time=6672.921..696753.080 rows=677899 loops=1)

  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk

  Batches: 1  Memory Usage: 131105kB

  Buffers: shared hit=158815

  ->  HashAggregate  (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4574.918..5446.022 rows=2374628 loops=1)

        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code

        Batches: 1  Memory Usage: 368657kB

        Buffers: shared hit=158815

        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.032..3300.616 rows=2374628 loops=1)

              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))

              Rows Removed by Filter: 10734488

              Buffers: shared hit=158815

Planning:

  Buffers: shared hit=3

Planning Time: 0.201 ms

Execution Time: 696868.845 ms

 

 

 

Now, on V11.2, the explain is:

 

 

HashAggregate  (cost=492171.36..492174.86 rows=200 width=16) (actual time=19322.522..50556.738 rows=743723 loops=1)

  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk

  Buffers: shared hit=11 read=174155 dirtied=13

  ->  HashAggregate  (cost=445458.43..457915.21 rows=1245678 width=56) (actual time=16260.015..17575.088 rows=2601088 loops=1)

        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code

        Buffers: shared read=174155 dirtied=13

        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..425803.93 rows=2620600 width=38) (actual time=0.126..14425.239 rows=2601088 loops=1)

              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))

              Rows Removed by Filter: 11778360

              Buffers: shared read=174155 dirtied=13

Planning Time: 36.121 ms

Execution Time: 50730.255 ms

 

 

 

So, we are seeing two issues:

  • I think exception handling is significantly slower between V11.2 and v13.4. I see almost a 14x difference from 50s to 700s.
  • Comparing the two queries on V11.2, the difference is 13s vs 50s. So even on V11.2, the exception handling adds a significant overhead which I was not expecting.

 

I’ll be happy to update my test cases and share additional info if needed.

 

Thank you,

Laurent Hasson.

 

pgsql-performance by date:

Previous
From: "ldh@laurent-hasson.com"
Date:
Subject: RE: Big performance slowdown from 11.2 to 13.3
Next
From: "ldh@laurent-hasson.com"
Date:
Subject: RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4