Thread: Consequence of changes to CTE's in 12
Hi,
I realise this is probably an edge case, but would appreciate some advice or suggestions.
I have a table that has rows to be processed:
postgres=# create table lock_test (id uuid primary key default gen_random_uuid(), lock_id bigint);
CREATE TABLE
postgres=# insert into lock_test (lock_id) values (10),(10),(20),(30),(30),(30);
INSERT 0 6
CREATE TABLE
postgres=# insert into lock_test (lock_id) values (10),(10),(20),(30),(30),(30);
INSERT 0 6
postgres=#* select * from lock_test;
id | lock_id
--------------------------------------+---------
326a2d34-ecec-4c01-94bb-40f43f244d40 | 10
8ed1d680-6304-4fb2-a47c-9427c6d48622 | 10
04482ba1-7193-4e7f-a507-71fe6a351781 | 20
34003468-e959-4c7b-a48c-97195c43982e | 30
2d20394b-c79b-4867-8d0a-72044c370543 | 30
bef7b880-e7a0-4c07-8eab-182c9c1bd33a | 30
(6 rows)
--------------------------------------+---------
326a2d34-ecec-4c01-94bb-40f43f244d40 | 10
8ed1d680-6304-4fb2-a47c-9427c6d48622 | 10
04482ba1-7193-4e7f-a507-71fe6a351781 | 20
34003468-e959-4c7b-a48c-97195c43982e | 30
2d20394b-c79b-4867-8d0a-72044c370543 | 30
bef7b880-e7a0-4c07-8eab-182c9c1bd33a | 30
(6 rows)
My business rule says I need to process rows by lock_id in descending order of the number of rows. In my test data, that would mean rows with a lock_id of 30 would be processed first.
If another 'processor' wakes up while lock_id 30 is being processed, it moves on to lock_id 10, etc.
My pre-12 solution was a view something like this:
postgres=# create or replace view lock_test_v
as
with g as (
select lock_id, count(*) as n_rows
from lock_test
group by lock_id
order by n_rows desc
), l as (
select lock_id
from g
where pg_try_advisory_xact_lock(lock_id)
limit 1)
select t.*
from lock_test as t
join l on t.lock_id = l.lock_id
;
CREATE VIEW
CREATE VIEW
This works fine, and only creates one advisory lock (or zero) when querying the view:
postgres=# begin;
BEGIN
postgres=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
(0 rows)
postgres=#* select * from lock_test_v;
id | lock_id
--------------------------------------+---------
34003468-e959-4c7b-a48c-97195c43982e | 30
2d20394b-c79b-4867-8d0a-72044c370543 | 30
bef7b880-e7a0-4c07-8eab-182c9c1bd33a | 30
(3 rows)
postgres=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
0 | 30
(1 row)
BEGIN
postgres=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
(0 rows)
postgres=#* select * from lock_test_v;
id | lock_id
--------------------------------------+---------
34003468-e959-4c7b-a48c-97195c43982e | 30
2d20394b-c79b-4867-8d0a-72044c370543 | 30
bef7b880-e7a0-4c07-8eab-182c9c1bd33a | 30
(3 rows)
postgres=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
0 | 30
(1 row)
However in 12, the same view returns the same data, but generates multiple advisory locks:
sns_publisher=# begin;
BEGIN
sns_publisher=#* select version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 12.5 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.2.1_pre1) 10.2.1 20201203, 64-bit
(1 row)
sns_publisher=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
(0 rows)
sns_publisher=#* select * from lock_test_v;
id | lock_id
--------------------------------------+---------
1a9f3f77-fcdc-4779-8fd9-30f274825e15 | 30
ac670997-9c23-44da-8eb8-e055f02a5f19 | 30
b5f939ac-7c7d-4975-811a-9af26aaa3a31 | 30
(3 rows)
sns_publisher=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
0 | 20
0 | 30
0 | 10
(3 rows)
BEGIN
sns_publisher=#* select version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 12.5 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.2.1_pre1) 10.2.1 20201203, 64-bit
(1 row)
sns_publisher=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
(0 rows)
sns_publisher=#* select * from lock_test_v;
id | lock_id
--------------------------------------+---------
1a9f3f77-fcdc-4779-8fd9-30f274825e15 | 30
ac670997-9c23-44da-8eb8-e055f02a5f19 | 30
b5f939ac-7c7d-4975-811a-9af26aaa3a31 | 30
(3 rows)
sns_publisher=#* select classid, objid from pg_locks where locktype = 'advisory' and pid = pg_backend_pid();
classid | objid
---------+-------
0 | 20
0 | 30
0 | 10
(3 rows)
If I use 'as materialized' for my 'g' cte, I get the same outcome as with pre-12 versions.
My 'dilemma' is that this functionality is packaged and the database it is bundled into could be running on a pre-12 version or 12+. Is there any way I can rewrite my view to achieve the same outcome (i.e. only creating 0 or 1 advisory locks) regardless of the server version? I realise I could have two installation scripts but if it is installed into a pre-12 DB and that DB is subsequently upgraded to 12+, my behaviour is broken.
Any suggestions greatly appreciated.
Steve
This functionality seems more a candidate for a set-returning function rather than a view, but I like my views to be side effect free and read only. It would be trivial to implement in plpgsql I believe.
If you move the limit 1 to the first CTE, does it not give you the same behavior in both versions?
On Thu, Feb 11, 2021 at 5:07 PM Steve Baldwin <steve.baldwin@gmail.com> wrote:
My 'dilemma' is that this functionality is packaged and the database it is bundled into could be running on a pre-12 version or 12+. Is there any way I can rewrite my view to achieve the same outcome (i.e. only creating 0 or 1 advisory locks) regardless of the server version? I realise I could have two installation scripts but if it is installed into a pre-12 DB and that DB is subsequently upgraded to 12+, my behaviour is broken.
Pretty sure you will need to choose a location in which to make the installation behave version-dependently. Within PostgreSQL itself that would be most easily done by writing a pl/pgsql function that conditionally adds the MATERIALIZED indicator on the textual representation of the query before executing it. You can hide that function call within a view if desired.
David J.
Michael Lewis <mlewis@entrata.com> writes: > If you move the limit 1 to the first CTE, does it not give you the same > behavior in both versions? Not sure if that's exactly the same, but certainly adding a traditional optimization fence (OFFSET 0) to the first CTE should do the trick. regards, tom lane
Thanks all. The fact that this is a view is not really relevant. I only bundled as a view here to make testing simpler. The underlying query still behaves differently pre-12 and 12+.
Is there a chance that the query optimiser should 'notice' the pg_try_advisory_xact_lock function, and not be so clever when it sees it?
It makes me wonder what other queries we might have that are inadvertently relying on the default materializing behaviour of pre-12.
Steve
On Fri, Feb 12, 2021 at 11:24 AM Michael Lewis <mlewis@entrata.com> wrote:
This functionality seems more a candidate for a set-returning function rather than a view, but I like my views to be side effect free and read only. It would be trivial to implement in plpgsql I believe.If you move the limit 1 to the first CTE, does it not give you the same behavior in both versions?
Steve Baldwin <steve.baldwin@gmail.com> writes: > Is there a chance that the query optimiser should 'notice' the > pg_try_advisory_xact_lock function, and not be so clever when it sees it? The general policy with respect to volatile functions in WHERE quals is "here be dragons". You don't have enough control over when a WHERE clause will be evaluated to be sure about what the semantics will be; and we don't want to tie the optimizer's hands to the extent that would be needed to make it fully predictable. In this particular case, you can make it fairly safe by making sure there are optimization fences both above and below where the WHERE clause is. You have one above from the LIMIT 1, but (with the new interpretation of CTEs) not one below it. Adding a fence -- either OFFSET 0 or LIMIT ALL -- to the first CTE should fix it in a reasonably version-independent fashion. regards, tom lane
Thanks Tom. This optimization fences concept is a new one to me, so great to know about.
This does indeed give me a nice version-independent solution, and make me a very happy camper ;-)
Steve
On Fri, Feb 12, 2021 at 11:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Steve Baldwin <steve.baldwin@gmail.com> writes:
> Is there a chance that the query optimiser should 'notice' the
> pg_try_advisory_xact_lock function, and not be so clever when it sees it?
The general policy with respect to volatile functions in WHERE quals is
"here be dragons". You don't have enough control over when a WHERE clause
will be evaluated to be sure about what the semantics will be; and we
don't want to tie the optimizer's hands to the extent that would be needed
to make it fully predictable.
In this particular case, you can make it fairly safe by making sure there
are optimization fences both above and below where the WHERE clause is.
You have one above from the LIMIT 1, but (with the new interpretation of
CTEs) not one below it. Adding a fence -- either OFFSET 0 or LIMIT ALL --
to the first CTE should fix it in a reasonably version-independent
fashion.
regards, tom lane