RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 - Mailing list pgsql-performance

From: ldh@laurent-hasson.com
Subject: RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Msg-id: MN2PR15MB2560AE2A878FBBE88DCFF33385C29@MN2PR15MB2560.namprd15.prod.outlook.com
In response to: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 ("ldh@laurent-hasson.com" <ldh@laurent-hasson.com>)
Responses: Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
List: pgsql-performance

OK… I apologize for the long email before. Right after I sent it, I thought of a much simpler use case that illustrates the issue without depending on any special data I have access to, and without the complex pivoting. It’s as raw as I can make it.

I create a table with 1M rows and 2 columns. Column “a” is a random string, while column “b” is a random integer cast to a string. Then I use a UDF that converts strings to floats and handles an exception if the incoming string is not parsable as a float. Then I do a simple select of each column. In the “a” case, the UDF throws and catches lots of exceptions. In the “b” case, the conversion is clean and no exceptions are thrown.

create table sampletest (a varchar, b varchar);

insert into sampletest (a, b)
select substr(md5(random()::text), 0, 15), (100000000*random())::integer::varchar
from generate_series(1,1000000);

CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
RETURNS real AS $$
BEGIN
  RETURN case when str is null then val else str::real end;
EXCEPTION WHEN OTHERS THEN
  RETURN val;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

select MAX(toFloat(a, null)) as "a" from sampletest;
select MAX(toFloat(b, null)) as "b" from sampletest;

On purpose, I am doing a max(toFloat) instead of toFloat(max) to exercise the UDF 1M times.
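(For contrast, the one-call form would look like the following; the UDF runs once on the already-aggregated value instead of once per row, so the exception overhead would be invisible:)

```sql
-- For contrast only: the UDF is invoked a single time here,
-- on MAX(a), rather than 1M times, once per row.
select toFloat(MAX(a), null) as "a" from sampletest;
```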

 

 

V13.4 “a” scenario (exceptions)
-------------------------------------------------------------
Aggregate  (cost=14778.40..14778.41 rows=1 width=4) (actual time=774098.537..774098.538 rows=1 loops=1)
  Buffers: shared hit=6373
  ->  Seq Scan on sampletest  (cost=0.00..11975.60 rows=560560 width=32) (actual time=0.011..285.458 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning Time: 0.066 ms
Execution Time: 774,098.563 ms

V13.4 “b” scenario (no exceptions)
-------------------------------------------------------------
Aggregate  (cost=14778.40..14778.41 rows=1 width=4) (actual time=1510.200..1510.201 rows=1 loops=1)
  Buffers: shared hit=6385
  ->  Seq Scan on sampletest  (cost=0.00..11975.60 rows=560560 width=32) (actual time=0.024..115.196 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning:
  Buffers: shared hit=26
Planning Time: 0.361 ms
Execution Time: 1,530.659 ms

V11.2 “a” scenario (exceptions)
-------------------------------------------------------------
Aggregate  (cost=21658.00..21658.01 rows=1 width=4) (actual time=26528.286..26528.286 rows=1 loops=1)
  Buffers: shared hit=6393
  ->  Seq Scan on sampletest  (cost=0.00..16562.00 rows=1019200 width=15) (actual time=0.037..190.633 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning Time: 1.182 ms
Execution Time: 26,530.492 ms

V11.2 “b” scenario (no exceptions)
-------------------------------------------------------------
Aggregate  (cost=21658.00..21658.01 rows=1 width=4) (actual time=1856.116..1856.116 rows=1 loops=1)
  Buffers: shared hit=6370
  ->  Seq Scan on sampletest  (cost=0.00..16562.00 rows=1019200 width=8) (actual time=0.014..88.152 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning Time: 0.098 ms
Execution Time: 1,856.152 ms

Summary:

  • Scenario V11.2/a: 26.6s
  • Scenario V11.2/b: 1.9s
  • Scenario V13.4/a: 774.1s
  • Scenario V13.4/b: 1.5s

 

Conclusion:

  • The no-exception scenario performs about 20% better on 13.4 vs 11.2 (nice for a straight scan!)
  • On 11.2, exceptions add an overhead of over 14x (1.9s vs 26.6s). I did not expect exceptions to add such a large overhead. Why is that?
  • Between 11.2 and 13.4, the exceptions scenario “a” performs about 30x slower (774.1s vs 26.6s).
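As an aside, if the exception machinery itself turns out to be the cost, one possible workaround (a hypothetical sketch, not something I have benchmarked) is to validate the string up front with a regex so the EXCEPTION block is never entered:

```sql
-- Hypothetical exception-free variant (name is made up for illustration):
-- guard with a regex instead of catching the cast error.
-- Note the pattern is simplistic: it rejects forms like '.5' or 'Infinity'.
CREATE OR REPLACE FUNCTION toFloatNoExc(str varchar, val real)
RETURNS real AS $$
BEGIN
  IF str IS NULL OR str !~ '^[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$' THEN
    RETURN val;
  END IF;
  RETURN str::real;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;
```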

 

Thank you!

Laurent Hasson.

From: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Sent: Saturday, August 21, 2021 03:57
To: pgsql-performance@postgresql.org
Subject: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

 

Hello all,

 

I think I have identified a major performance issue between V11.2 and 13.4 with respect to exception handling in UDFs. I have the following simplified query that pivots data and makes use of a UDF to convert data to a specific type, in this case, float:

select "iccqa_iccassmt_fk"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null) as "iccqa_DEPTH_CM"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null) as "iccqa_LENGTH_CM"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null) as "iccqa_WIDTH_CM"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null) as "iccqa_DRAIN_PRESENT"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null) as "iccqa_MEASUREMENTS_TAKEN"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null) as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"
from  (
-- 'A pivoted view of ICC QA assessments'
select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.
     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer (if applicable) from the meta-data.
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'
                                                               , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'
                                                               )
group by 1, 2
) T
     group by 1
;

The UDF is simple, as follows:

CREATE OR REPLACE FUNCTION TILDA.toFloat(str varchar, val real)
RETURNS real AS $$
BEGIN
  RETURN case when str is null then val else str::real end;
EXCEPTION WHEN OTHERS THEN
  RETURN val;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

It works like COALESCE, but with a conversion. I think I have identified a large performance difference tied to the exception handling. It so happens that with the last 3 columns ('DRAIN PRESENT', 'MEASUREMENTS TAKEN' and 'SIGNS AND SYMPTOMS OF INFECTION'), the data is VERY dirty: a mix of 0/1, YES/NO, and other mistyped values. This means these 3 columns throw lots of exceptions in the UDF. To illustrate, I simply break this into 2 queries.

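A couple of illustrative calls (assuming the Tilda.toFloat definition above): a clean numeric string takes the normal path, while a dirty value fails the ::real cast and is caught:

```sql
select Tilda.toFloat('2.5', null);  -- cast succeeds: returns 2.5, no exception raised
select Tilda.toFloat('YES', null);  -- cast raises invalid_text_representation; caught, returns NULL
```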
select "iccqa_iccassmt_fk"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"
from  (
-- 'A pivoted view of ICC QA assessments'
select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.
     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer (if applicable) from the meta-data.
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'
                                                               , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'
                                                               )
group by 1, 2
) T
     group by 1
;

The performance is as expected.

HashAggregate  (cost=448463.70..448467.20 rows=200 width=16) (actual time=6760.797..9585.397 rows=677899 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 147489kB
  Buffers: shared hit=158815
  ->  HashAggregate  (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4576.514..5460.770 rows=2374628 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 1  Memory Usage: 368657kB
        Buffers: shared hit=158815
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.033..3298.544 rows=2374628 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))
              Rows Removed by Filter: 10734488
              Buffers: shared hit=158815
Planning:
  Buffers: shared hit=3
Planning Time: 0.198 ms
Execution Time: 9678.120 ms

However, once we switch to the three “bad” columns, the results fall apart.

select "iccqa_iccassmt_fk"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as "iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"
from  (
-- 'A pivoted view of ICC QA assessments'
select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as "iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as "iccqar_ques_code" -- The question long code from the meta-data.
     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as "iccqar_ans_val" -- The official answer (if applicable) from the meta-data.
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 'LENGTH (CM)', 'WIDTH (CM)'
                                                               , 'DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'
                                                               )
group by 1, 2
) T
     group by 1
;

The performance falls apart: a huge difference, from ~10s to ~11min. The only change I can think of is that the data is dirty, which causes the exception path to be taken. The explain is:

HashAggregate  (cost=448463.70..448467.20 rows=200 width=16) (actual time=6672.921..696753.080 rows=677899 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 131105kB
  Buffers: shared hit=158815
  ->  HashAggregate  (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4574.918..5446.022 rows=2374628 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 1  Memory Usage: 368657kB
        Buffers: shared hit=158815
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.032..3300.616 rows=2374628 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))
              Rows Removed by Filter: 10734488
              Buffers: shared hit=158815
Planning:
  Buffers: shared hit=3
Planning Time: 0.201 ms
Execution Time: 696868.845 ms

Now, on V11.2, the explain is:

HashAggregate  (cost=492171.36..492174.86 rows=200 width=16) (actual time=19322.522..50556.738 rows=743723 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Buffers: shared hit=11 read=174155 dirtied=13
  ->  HashAggregate  (cost=445458.43..457915.21 rows=1245678 width=56) (actual time=16260.015..17575.088 rows=2601088 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Buffers: shared read=174155 dirtied=13
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..425803.93 rows=2620600 width=38) (actual time=0.126..14425.239 rows=2601088 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))
              Rows Removed by Filter: 11778360
              Buffers: shared read=174155 dirtied=13
Planning Time: 36.121 ms
Execution Time: 50730.255 ms

So, we are seeing two issues:

- Exception handling appears significantly slower on V13.4 than on V11.2: almost a 14x difference, from 50s to 700s.
- Comparing the two queries on V11.2, the difference is 13s vs 50s, so even on V11.2 the exception handling adds a significant overhead that I was not expecting.
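If it helps narrow this down, the per-call overhead can be isolated from the scan entirely with a bare loop (a sketch; the iteration count is arbitrary, and timing would be via psql's \timing):

```sql
DO $$
DECLARE
  x real;
BEGIN
  -- Forces the EXCEPTION block on every iteration; compare against a
  -- parsable input like '1.5' to measure the handler's cost in isolation.
  FOR i IN 1..1000000 LOOP
    x := Tilda.toFloat('not a number', null);
  END LOOP;
END;
$$;
```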

 

I’ll be happy to update my test cases and share additional info if needed.

 

Thank you,

Laurent Hasson.

 
