Thread: BUG #18950: pgsql function that worked in Postgresql 16 does not return in Postgresql 17

The following bug has been logged on the website:

Bug reference:      18950
Logged by:          Lowell Hought
Email address:      lowell.hought@gmail.com
PostgreSQL version: 17.5
Operating system:   Red Hat Enterprise Linux release 8.10 (Ootpa)
Description:

A pgsql function that returns Table is relatively simple.  It accepts two
arguments; a date and an interval.  It uses these to arguments in a query
that returns a result set.  This function worked in Postgresql 16.1 and
Postgresql 16.9.  However, it does not work in versions 17.0 nor in 17.5
Here is my build sequence for version 17.5, the last version I tested.
./configure --prefix=/usr/local/pgsql/17/ --with-openssl --with-ldap
--with-systemd --with-libxml --with-libxslt
gmake world
gmake check : # All 222 tests passed.
sudo gmake install-world
su root
mkdir /usr/local/pgsql/17/data
chown postgres /usr/local/pgsql/17/data
su postgres
/usr/local/pgsql/17/bin/initdb -D /usr/local/pgsql/17/data
Success. You can now start the database server using:
    /usr/local/pgsql/17/bin/pg_ctl -D /usr/local/pgsql/17/data -l logfile
start
/usr/local/pgsql/17/bin/pg_ctl -D /usr/local/pgsql/17/data -l
/usr/local/pgsql/17/data/logfile start
/usr/local/pgsql/17/bin/createdb test
/usr/local/pgsql/17/bin/psql test
psql (17.5)
Type "help" for help.
Restore from backup:
/usr/local/pgsql/17/bin/psql  -d test -Upostgres -f /bak/db/db_backup.sql
Then I run psql and execute this command:
SELECT * FROM report.GetReportPoolTrainees(CURRENT_DATE, '1 year');
The result is that even after 12 hours the function does not return.  There
is no response whatsoever.  I tested this both on Redhat 8.10 and on Red Hat
Enterprise Linux release 8.10 (Ootpa) and on Rocky Linux release 8.7 (Green
Obsidian) with identical results.  On both systems the function call works
as expected in version 16, but in version 17 it never returns.
Another interesting fact: I run the query contained in within the function,
it executes as expected and returns the expected results.  It is only when
called via the function that it fails.
Here is the function definition:
CREATE OR REPLACE FUNCTION report.GetReportPoolTrainees(
venddate DATE, vtimespan INTERVAL
) RETURNS TABLE (
        facility_key INT,                       facility_code TEXT,
traineeid INT,
        ndc_emp_id INT,                         facility_eod DATE,
hire_status TEXT,
        trainee_start_date DATE,        devstatus_date DATE,    status TEXT,
        status_days INT
)AS $$
DECLARE
plenddate DATE;
plinterval INTERVAL;
BEGIN
        IF venddate IS NULL THEN
                plenddate = CURRENT_DATE;
        ELSE
                plenddate = venddate;
        END IF;
        IF vtimespan IS NULL THEN
                SELECT value::INTERVAL INTO plinterval FROM
ntd.site_settings WHERE variable = 'PPT_DATA_YEARS';
                IF NOT FOUND THEN
                        plinterval = '10 years';
                END IF;
        ELSE
                plinterval = vtimespan;
        END IF;
        RETURN QUERY
        WITH t AS (
                SELECT DISTINCT td.facility_key, td.traineeid,
td.ndc_emp_id, td.facility_eod,
                        td.tia_code AS hire_status, td.trainee_start_date,
td.devstatus_date, td.status, td.status_days
                  FROM report.vw_training_details td
        ),
        f AS (
                SELECT *
                  FROM ntd.facility_dim fd
                  JOIN (
                        SELECT hft.facility_key,
MAX(facility_type_start_date) AS facility_type_start_date
                          FROM ntd.history_facility_type hft
                          GROUP BY hft.facility_key
                        )DT1 USING(facility_key)
        )
        SELECT DT1.facility_key, DT1.facility_code, t.traineeid,
t.ndc_emp_id,
                t.facility_eod, t.hire_status, t.trainee_start_date,
t.devstatus_date, t.status, t.status_days
          FROM t
          JOIN (SELECT l.facility_key, l.facility_code,
                                   CASE
                                   WHEN facility_type_start_date IS NOT NULL
AND facility_type_start_date > l.last_devstatus_date - plinterval
                                                THEN
facility_type_start_date
                                   ELSE l.last_devstatus_date - plinterval
                                   END AS last_devstatus_date
                          FROM (SELECT t.facility_key, f.facility_code,
MAX(t.devstatus_date) AS last_devstatus_date
                                          FROM t
                                          LEFT JOIN f USING(facility_key)
                                         WHERE (t.status = 'Completed' OR
t.status = 'Did Not Complete')
                                           AND t.devstatus_date <= plenddate
                                         GROUP BY t.facility_key,
f.facility_code
                                )l
                          LEFT JOIN (SELECT pef.facility_key,
MAX(facility_type_start_date) AS facility_type_start_date
                                                   FROM
ntd.ppt_exception_facilities pef
                                                  WHERE
facility_type_start_date < CURRENT_DATE
                                                  GROUP BY pef.facility_key
                                ) e USING(facility_key)
                )DT1 USING(facility_key)
         WHERE t.devstatus_date <= plenddate
           AND t.devstatus_date >= DT1.last_devstatus_date
        ;
END;
$$ LANGUAGE plpgsql;
This is the first time I have ever encountered a bug and so I have no
experience with reporting it.  I understand I need to send the underlying
table structures but not sure if I do it here or some other way.  I do not
see an option to attach files.
I love Postgres!  It is a terrific database and has served me well for about
20 years.


On Sat, 2025-06-07 at 16:08 +0000, PG Bug reporting form wrote:
> PostgreSQL version: 17.5
> Operating system:   Red Hat Enterprise Linux release 8.10 (Ootpa)
>
> A pgsql function that returns Table is relatively simple.  It accepts two
> arguments; a date and an interval.  It uses these to arguments in a query
> that returns a result set.  This function worked in Postgresql 16.1 and
> Postgresql 16.9.  However, it does not work in versions 17.0 nor in 17.5
>
> [function definition]
>
> This is the first time I have ever encountered a bug and so I have no
> experience with reporting it.

Your function essentially is running a query.
If you run that query outside of the function, does it finish on time?
If not, you have a simpler problem to tackle.

It need not necessarily be a bug if some queries perform worse in a later
PostgreSQL version.  There are too many factors involved in the execution
of a complicated query.  To give help, we'd need at least the EXPLAIN
output from the query and the EXPLAIN (ANALYZE, BUFFERS) output from the
query on the old version.

Yours,
Laurenz Albe



If I run the query outside of the function it works as expected.  But the function never returns.

Lowell

> On Jun 7, 2025, at 7:27 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Sat, 2025-06-07 at 16:08 +0000, PG Bug reporting form wrote:
>> PostgreSQL version: 17.5
>> Operating system:   Red Hat Enterprise Linux release 8.10 (Ootpa)
>>
>> A pgsql function that returns Table is relatively simple.  It accepts two
>> arguments; a date and an interval.  It uses these to arguments in a query
>> that returns a result set.  This function worked in Postgresql 16.1 and
>> Postgresql 16.9.  However, it does not work in versions 17.0 nor in 17.5
>>
>> [function definition]
>>
>> This is the first time I have ever encountered a bug and so I have no
>> experience with reporting it.
>
> Your function essentially is running a query.
> If you run that query outside of the function, does it finish on time?
> If not, you have a simpler problem to tackle.
>
> It need not necessarily be a bug if some queries perform worse in a later
> PostgreSQL version.  There are too many factors involved in the execution
> of a complicated query.  To give help, we'd need at least the EXPLAIN
> output from the query and the EXPLAIN (ANALYZE, BUFFERS) output from the
> query on the old version.
>
> Yours,
> Laurenz Albe



Lowell Hought <lowell.hought@gmail.com> writes:
> If I run the query outside of the function it works as expected.  But the function never returns.

This isn't that surprising either: a query inside a function is
often translated into a "generic" plan that doesn't depend on
specific parameter values, typically sacrificing runtime to
avoid repeated planning.  Does it get any better if you do
"set plan_cache_mode = force_custom_plan" before running the
function?

            regards, tom lane



Yes!  I just changed the parameter as  you suggested and the query returned as expected.
So I guess something changed between version 16 and version 17?  Perhaps the default for that setting?

Lowell

On Sat, Jun 7, 2025 at 9:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lowell Hought <lowell.hought@gmail.com> writes:
> If I run the query outside of the function it works as expected.  But the function never returns.

This isn't that surprising either: a query inside a function is
often translated into a "generic" plan that doesn't depend on
specific parameter values, typically sacrificing runtime to
avoid repeated planning.  Does it get any better if you do
"set plan_cache_mode = force_custom_plan" before running the
function?

                        regards, tom lane
Wait, I spoke too soon.  When I just ran the query I was on version 16.  Give me a moment to shut down 16 and fire up 17 and I will try it again.

Lowell

On Sat, Jun 7, 2025 at 9:57 PM Lowell Hought <lowell.hought@gmail.com> wrote:
Yes!  I just changed the parameter as  you suggested and the query returned as expected.
So I guess something changed between version 16 and version 17?  Perhaps the default for that setting?

Lowell

On Sat, Jun 7, 2025 at 9:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lowell Hought <lowell.hought@gmail.com> writes:
> If I run the query outside of the function it works as expected.  But the function never returns.

This isn't that surprising either: a query inside a function is
often translated into a "generic" plan that doesn't depend on
specific parameter values, typically sacrificing runtime to
avoid repeated planning.  Does it get any better if you do
"set plan_cache_mode = force_custom_plan" before running the
function?

                        regards, tom lane
Lowell Hought <lowell.hought@gmail.com> writes:
> Yes!  I just changed the parameter as  you suggested and the query returned
> as expected.
> So I guess something changed between version 16 and version 17?  Perhaps
> the default for that setting?

No, that default has not changed.  What apparently happened is that
cost estimates or something changed enough to persuade the planner
to use a generic plan that's considerably inferior to what it was
choosing before.  There's nowhere near enough info in your report
to pin it down more closely than that.

Just to eliminate the obvious --- you did run ANALYZE after the
upgrade, right?  If you had any nondefault settings of
default_statistics_target or similar planner parameters,
did you make sure they got transferred to the new installation?

            regards, tom lane



Changing that parameter had no effect on the version 17 instance.  The query itself produces results in a matter of a second or so, but the function still hangs and does not return.

Lowell

On Sat, Jun 7, 2025 at 9:59 PM Lowell Hought <lowell.hought@gmail.com> wrote:
Wait, I spoke too soon.  When I just ran the query I was on version 16.  Give me a moment to shut down 16 and fire up 17 and I will try it again.

Lowell

On Sat, Jun 7, 2025 at 9:57 PM Lowell Hought <lowell.hought@gmail.com> wrote:
Yes!  I just changed the parameter as  you suggested and the query returned as expected.
So I guess something changed between version 16 and version 17?  Perhaps the default for that setting?

Lowell

On Sat, Jun 7, 2025 at 9:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lowell Hought <lowell.hought@gmail.com> writes:
> If I run the query outside of the function it works as expected.  But the function never returns.

This isn't that surprising either: a query inside a function is
often translated into a "generic" plan that doesn't depend on
specific parameter values, typically sacrificing runtime to
avoid repeated planning.  Does it get any better if you do
"set plan_cache_mode = force_custom_plan" before running the
function?

                        regards, tom lane
My initial response that changing the parameter worked was mistaken.  I forgot to shut down version 16 so the function call happened on that instance.  I now shut down 16, fired up 17 and set the parameter, and the behavior did not change.  The query itself returns results in about 1 second.  The function never returns at all, it just sits there.

Lowell


On Sat, Jun 7, 2025 at 10:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lowell Hought <lowell.hought@gmail.com> writes:
> Yes!  I just changed the parameter as  you suggested and the query returned
> as expected.
> So I guess something changed between version 16 and version 17?  Perhaps
> the default for that setting?

No, that default has not changed.  What apparently happened is that
cost estimates or something changed enough to persuade the planner
to use a generic plan that's considerably inferior to what it was
choosing before.  There's nowhere near enough info in your report
to pin it down more closely than that.

Just to eliminate the obvious --- you did run ANALYZE after the
upgrade, right?  If you had any nondefault settings of
default_statistics_target or similar planner parameters,
did you make sure they got transferred to the new installation?

                        regards, tom lane
Lowell Hought <lowell.hought@gmail.com> writes:
> Changing that parameter had no effect on the version 17 instance.  The
> query itself produces results in a matter of a second or so, but the
> function still hangs and does not return.

Darn.  But I'm still pretty sure that the problem is an undesirable
change of plan, and that we don't have enough information to say
more than that.  Any chance you could extract a self-contained
test case?

            regards, tom lane



I can try.  I am not sure how to go about that.  I did not see on the bug report page where I could upload files, and I am afraid the file size of the tables needed might be too large for email.  The entire database when written to an sql dump file is about 20 GB, so not terribly large.  I could attempt to dump the schema definition in one file and then the underlying tables in another.  Would that work?  Or would you also need the files for the function and any views the query relies upon?

Lowell

On Sat, Jun 7, 2025 at 10:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lowell Hought <lowell.hought@gmail.com> writes:
> Changing that parameter had no effect on the version 17 instance.  The
> query itself produces results in a matter of a second or so, but the
> function still hangs and does not return.

Darn.  But I'm still pretty sure that the problem is an undesirable
change of plan, and that we don't have enough information to say
more than that.  Any chance you could extract a self-contained
test case?

                        regards, tom lane
Lowell Hought <lowell.hought@gmail.com> writes:
> I can try.  I am not sure how to go about that.  I did not see on the bug
> report page where I could upload files, and I am afraid the file size of
> the tables needed might be too large for email.

No, uploading stuff to that webform doesn't work.  But at this point
we're just conversing on the pgsql-bugs mailing list, so anything you
can squeeze into email is fine.  Having said that, nobody likes
multi-gigabyte emails.

> The entire database when
> written to an sql dump file is about 20 GB, so not terribly large.  I could
> attempt to dump the schema definition in one file and then the underlying
> tables in another.  Would that work?  Or would you also need the files for
> the function and any views the query relies upon?

Yeah, we'd need all the moving parts.

Usually people with this kind of problem don't want to expose their
data anyway, for privacy and/or legal reasons.  So what I'd suggest
is trying to create some little script that generates fake data
that's close enough to trigger the problem.  Then you just need to
provide that script and the DDL and function definitions.

            regards, tom lane



I will work on that over the next few days.  I really appreciate you and your team!

Lowell


On Sat, Jun 7, 2025 at 10:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lowell Hought <lowell.hought@gmail.com> writes:
> I can try.  I am not sure how to go about that.  I did not see on the bug
> report page where I could upload files, and I am afraid the file size of
> the tables needed might be too large for email.

No, uploading stuff to that webform doesn't work.  But at this point
we're just conversing on the pgsql-bugs mailing list, so anything you
can squeeze into email is fine.  Having said that, nobody likes
multi-gigabyte emails.

> The entire database when
> written to an sql dump file is about 20 GB, so not terribly large.  I could
> attempt to dump the schema definition in one file and then the underlying
> tables in another.  Would that work?  Or would you also need the files for
> the function and any views the query relies upon?

Yeah, we'd need all the moving parts.

Usually people with this kind of problem don't want to expose their
data anyway, for privacy and/or legal reasons.  So what I'd suggest
is trying to create some little script that generates fake data
that's close enough to trigger the problem.  Then you just need to
provide that script and the DDL and function definitions.

                        regards, tom lane