Thread: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17812 Logged by: Zhenglong Li Email address: smartkeyerror@gmail.com PostgreSQL version: 15.2 Operating system: Ubuntu 20.04 Description: Step to reproduce the behavior We need 2 sessions to reproduce this bug. Firstly, we create a simple view that just fetches all the data from table1. And then we start a transaction and lock table1 with AccessExclusive Mode in Read Committed Transaction Isolation Level. After that, we try to use CTAS to create a temp table table2 using the data from table1 in session2, and this will be blocked by AccessExclusive Lock. Finally, we insert some data into table1 in session1 and commit it, session2 will continue, but there is no data in table2. ```sql --------[ Sessions 1 ]-------- show transaction_isolation; transaction_isolation ----------------------- read committed DROP TABLE IF EXISTS table1 CASCADE; CREATE TABLE table1 ( zahl integer, upd_dat timestamp without time zone ); CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1; BEGIN; LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE; TRUNCATE TABLE table1; --------[ Sessions 2 ]-------- show transaction_isolation; transaction_isolation ----------------------- read committed DROP TABLE IF EXISTS table2 CASCADE; CREATE TEMP TABLE table2 AS select zahl,upd_dat from view1; -- this will hang now waiting for a lock form session1 --------[ Sessions 1 ]-------- SELECT clock_timestamp(); INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM generate_series(1,10) a(i); COMMIT; --------[ Sessions 2 ]-------- SELECT * FROM table2 limit 10; zahl | upd_dat ------+--------- (0 rows) ``` But when we used a table, there will have data: ```sql --------[ Sessions 1 ]-------- DROP TABLE IF EXISTS table1 CASCADE; CREATE TABLE table1 ( zahl integer, upd_dat timestamp without time zone ); CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1; BEGIN; LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE; TRUNCATE TABLE table1; --------[ Sessions 2 ]-------- DROP TABLE IF EXISTS table2 CASCADE; CREATE TEMP TABLE table2 AS select zahl,upd_dat from table1; -- this will hang now waiting for a lock form session1 --------[ Sessions 1 ]-------- SELECT clock_timestamp(); INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM generate_series(1,10) a(i); COMMIT; --------[ Sessions 2 ]-------- SELECT * FROM table2 limit 10; zahl | upd_dat ------+---------------------------- 1 | 2023-02-28 03:10:15.743068 2 | 2023-02-28 03:10:15.743196 3 | 2023-02-28 03:10:15.743203 4 | 2023-02-28 03:10:15.743206 5 | 2023-02-28 03:10:15.743208 6 | 2023-02-28 03:10:15.74321 7 | 2023-02-28 03:10:15.743212 8 | 2023-02-28 03:10:15.743214 9 | 2023-02-28 03:10:15.743216 10 | 2023-02-28 03:10:15.743218 (10 rows) ```
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Julien Rouhaud
Date:
Hi, On Tue, Feb 28, 2023 at 03:12:38AM +0000, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 17812 > Logged by: Zhenglong Li > Email address: smartkeyerror@gmail.com > PostgreSQL version: 15.2 > Operating system: Ubuntu 20.04 > Description: > > Step to reproduce the behavior > > We need 2 sessions to reproduce this bug. > > Firstly, we create a simple view that just fetches all the data from > table1. > > And then we start a transaction and lock table1 with AccessExclusive Mode in > Read Committed Transaction Isolation Level. > > After that, we try to use CTAS to create a temp table table2 using the data > from table1 in session2, and this will be blocked by AccessExclusive Lock. > > Finally, we insert some data into table1 in session1 and commit it, session2 > will continue, but there is no data in table2. > > ```sql > [...] > TRUNCATE TABLE table1; > [...] This is not a bug, this is a documented corner case. In postgres TRUNCATE is not fully MVCC, see https://www.postgresql.org/docs/current/sql-truncate.html and https://www.postgresql.org/docs/current/mvcc-caveats.html: "TRUNCATE is not MVCC-safe. After truncation, the table will appear empty to concurrent transactions, if they are using a snapshot taken before the truncation occurred". I guess the difference between referencing the table rather than the view is that the query get stuck at execution time rather than planning time, meaning that you do get a snapshot older than the INSERT in the session 1. Change the TRUNCATE with a DELETE and you will get the same behavior for both cases.
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Keyerror Smart
Date:
But when I removed the TRUNCATE STATEMENT in session1, session2 will still not have data:
```sql
--------[ Sessions1 ]--------
DROP TABLE IF EXISTS table1 CASCADE;
CREATE TABLE table1 (
zahl integer,
upd_dat timestamp without time zone
);
CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1;
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
--------[ Sessions 2 ]--------
DROP TABLE IF EXISTS table2 CASCADE;
CREATE TEMP TABLE table2 AS select zahl,upd_dat from view1;
-- this will hang now waiting for a lock form session1
--------[ Sessions 1 ]--------
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM generate_series(1,10) a(i);
COMMIT;
--------[ Sessions 2 ]--------
SELECT * FROM table2 limit 10;
zahl | upd_dat
------+---------
(0 rows)
CREATE TABLE table1 (
zahl integer,
upd_dat timestamp without time zone
);
CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1;
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
--------[ Sessions 2 ]--------
DROP TABLE IF EXISTS table2 CASCADE;
CREATE TEMP TABLE table2 AS select zahl,upd_dat from view1;
-- this will hang now waiting for a lock form session1
--------[ Sessions 1 ]--------
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM generate_series(1,10) a(i);
COMMIT;
--------[ Sessions 2 ]--------
SELECT * FROM table2 limit 10;
zahl | upd_dat
------+---------
(0 rows)
```
So I think we can not blame the TRUNCATE is not fully MVCC.
Julien Rouhaud <rjuju123@gmail.com> 于2023年2月28日周二 21:52写道:
Hi,
On Tue, Feb 28, 2023 at 03:12:38AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 17812
> Logged by: Zhenglong Li
> Email address: smartkeyerror@gmail.com
> PostgreSQL version: 15.2
> Operating system: Ubuntu 20.04
> Description:
>
> Step to reproduce the behavior
>
> We need 2 sessions to reproduce this bug.
>
> Firstly, we create a simple view that just fetches all the data from
> table1.
>
> And then we start a transaction and lock table1 with AccessExclusive Mode in
> Read Committed Transaction Isolation Level.
>
> After that, we try to use CTAS to create a temp table table2 using the data
> from table1 in session2, and this will be blocked by AccessExclusive Lock.
>
> Finally, we insert some data into table1 in session1 and commit it, session2
> will continue, but there is no data in table2.
>
> ```sql
> [...]
> TRUNCATE TABLE table1;
> [...]
This is not a bug, this is a documented corner case. In postgres TRUNCATE is
not fully MVCC, see
https://www.postgresql.org/docs/current/sql-truncate.html and
https://www.postgresql.org/docs/current/mvcc-caveats.html:
"TRUNCATE is not MVCC-safe. After truncation, the table will appear empty to
concurrent transactions, if they are using a snapshot taken before the
truncation occurred".
I guess the difference between referencing the table rather than the view is
that the query get stuck at execution time rather than planning time, meaning
that you do get a snapshot older than the INSERT in the session 1. Change the
TRUNCATE with a DELETE and you will get the same behavior for both cases.
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Julien Rouhaud
Date:
On Wed, Mar 01, 2023 at 10:01:29AM +0800, Keyerror Smart wrote: > But when I removed the TRUNCATE STATEMENT in session1, session2 will still > not have data: > > ```sql > --------[ Sessions1 ]-------- > DROP TABLE IF EXISTS table1 CASCADE; > > CREATE TABLE table1 ( > zahl integer, > upd_dat timestamp without time zone > ); > > CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1; > > BEGIN; > LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE; > > --------[ Sessions 2 ]-------- > DROP TABLE IF EXISTS table2 CASCADE; > > CREATE TEMP TABLE table2 AS select zahl,upd_dat from view1; > -- this will hang now waiting for a lock form session1 > > --------[ Sessions 1 ]-------- > INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM > generate_series(1,10) a(i); > COMMIT; > > --------[ Sessions 2 ]-------- > SELECT * FROM table2 limit 10; > zahl | upd_dat > ------+--------- > (0 rows) > ``` > > So I think we can not blame the TRUNCATE is not fully MVCC. Right, I was testing with a plain SELECT instead of CREATE TABLE AS. This is however again due to when the lock is actually conflicting, leading to different snapshot acquisition time. For CTAS referencing the view, the lock is conflicting during execution, so it has a snapshot that sees the data as they were before that transaction, as required by ACID rules. The CTAS referencing the table conflicts during parse analysis, so when the lock is finally acquired and the parse analysis is done, it then acquires a new snapshot for execution, which now sees the transaction as committed and thus the newly added rows. I still don't think this is a bug, however I'm not sure if/where those details are documented.
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Keyerror Smart
Date:
OK, I get it. Thanks for your explanation.
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
"David G. Johnston"
Date:
On Tue, Feb 28, 2023 at 8:02 PM Keyerror Smart <smartkeyerror@gmail.com> wrote:
OK, I get it. Thanks for your explanation.
You may find the following to be informative as well.
In short this seems to fall into "maybe we can do better in the future" - though it seems that future has not yet come to fruition.
To my knowledge this dynamic isn't called out anywhere though it seems almost obvious once pointed out.
There still seems to be something off here:
> unfortunately it causes execution to use a snapshot that has been acquired before locking any of the tables mentioned in the query
I went down the path that the planner should be the one realizing that the view-using query has a lock it cannot grab - on the rewritten query's introduction of the new table - and it should have blocked in the planner, resulting in the acquisition of the execution-time snapshot in that case as well.
Apparently the planner gets to bypass a check (that blocks) that parse analysis has to perform. Thus locking of tables does happen for the ones named in the query text itself?
David J.
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes: > You may find the following to be informative as well. A general comment on these examples is that we don't intend MVCC guarantees to hold with respect to tables that are created/dropped by other sessions midway through your transaction. The reason can be understood from this example: Session 1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT * FROM table1; -- establishes session1's snapshot Session 2: DROP TABLE table2; CREATE TABLE table2 ... Session 1: SELECT * FROM table2; Strict MVCC semantics would require that session 1 now see the old contents of table2, which implies basically that we can't ever drop tables --- at the point of the DROP, there was absolutely nothing indicating that session 1 would take any interest in table2, but after the fact it wants to know about it. That's not how it works; if DROP is able to obtain exclusive lock on table2, that table is toast, immediately. (I think that if you use SERIALIZABLE level, you might get a serialization failure from the "SELECT * FROM table2", or if you don't then perhaps it'd be reasonable to fix it so you do. But under no circumstances are we going to sit on the data in table2 on the off chance that some transaction might want it later.) Like Julien, I'm not entirely sure how well this is documented. There is plenty of detail in [1] though. regards, tom lane [1] https://www.postgresql.org/docs/current/mvcc.html
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
"David G. Johnston"
Date:
On Tue, Feb 28, 2023 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> You may find the following to be informative as well.
A general comment on these examples is that we don't intend MVCC
guarantees to hold with respect to tables that are created/dropped
by other sessions midway through your transaction.
That isn't what is in question here though...but also my comment regarding planning seems incomplete...
table1 and view1 already exist, table1 is empty
view1 is select * from table1
Given that Session 1 does:
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
-- sessions 2-5 now issue their commands
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM generate_series(1,10) a(i);
COMMIT;
The command and result from Sessions 2-5 are below:
Session 2:
select * from view1;
10 Rows
Session 3:
select * from table1;
10 Rows
Session 4:
create table tbl2 as select * from view1;
SELECT 0
Session 5:
create table tbl3 as select * from table1;
SELECT 10
Why is it OK for session 4 to be different here?
The argument that it got blocked after getting an execution snapshot that doesn't include the insertion into table1, when all the other cases were apparently blocked somewhere before then, doesn't sit well. In particular, the difference between it (Session 4) and Session 2. That one sent the result rows to a newly created temporary table and the other to the client doesn't seem like it should be affecting/affected-by MVCC.
It is unclear to me whether you were instead talking about other sessions dropping tables as another way of saying "ACCESS EXCLUSIVE" in which case at what lock level should this anomaly go away, and does it? (I haven't checked).
David J.
Sessions 2 and 4:
postgres=# select * from view1;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:33:28.628112
2 | 2023-03-01 04:33:28.628227
3 | 2023-03-01 04:33:28.628229
4 | 2023-03-01 04:33:28.62823
5 | 2023-03-01 04:33:28.62823
6 | 2023-03-01 04:33:28.62823
7 | 2023-03-01 04:33:28.628231
8 | 2023-03-01 04:33:28.628231
9 | 2023-03-01 04:33:28.628232
postgres=# create table tbl2 as select * from view1;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:33:28.628112
2 | 2023-03-01 04:33:28.628227
3 | 2023-03-01 04:33:28.628229
4 | 2023-03-01 04:33:28.62823
5 | 2023-03-01 04:33:28.62823
6 | 2023-03-01 04:33:28.62823
7 | 2023-03-01 04:33:28.628231
8 | 2023-03-01 04:33:28.628231
9 | 2023-03-01 04:33:28.628232
postgres=# create table tbl2 as select * from view1;
SELECT 0
postgres=# select * from tbl2;
zahl | upd_dat
------+---------
(0 rows)
postgres=# select * from tbl2;
zahl | upd_dat
------+---------
(0 rows)
Sessions 3 and 5:
postgres=# select * from table1;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:33:28.628112
2 | 2023-03-01 04:33:28.628227
3 | 2023-03-01 04:33:28.628229
4 | 2023-03-01 04:33:28.62823
5 | 2023-03-01 04:33:28.62823
6 | 2023-03-01 04:33:28.62823
7 | 2023-03-01 04:33:28.628231
8 | 2023-03-01 04:33:28.628231
postgres=# create table tbl3 as select * from table1;
SELECT 10
postgres=# select * from tbl3;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:36:00.762788
2 | 2023-03-01 04:36:00.762911
3 | 2023-03-01 04:36:00.762913
4 | 2023-03-01 04:36:00.762914
5 | 2023-03-01 04:36:00.762915
6 | 2023-03-01 04:36:00.762915
7 | 2023-03-01 04:36:00.762915
8 | 2023-03-01 04:36:00.762916
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:33:28.628112
2 | 2023-03-01 04:33:28.628227
3 | 2023-03-01 04:33:28.628229
4 | 2023-03-01 04:33:28.62823
5 | 2023-03-01 04:33:28.62823
6 | 2023-03-01 04:33:28.62823
7 | 2023-03-01 04:33:28.628231
8 | 2023-03-01 04:33:28.628231
postgres=# create table tbl3 as select * from table1;
SELECT 10
postgres=# select * from tbl3;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:36:00.762788
2 | 2023-03-01 04:36:00.762911
3 | 2023-03-01 04:36:00.762913
4 | 2023-03-01 04:36:00.762914
5 | 2023-03-01 04:36:00.762915
6 | 2023-03-01 04:36:00.762915
7 | 2023-03-01 04:36:00.762915
8 | 2023-03-01 04:36:00.762916
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes: > Session 4: > create table tbl2 as select * from view1; > SELECT 0 > Why is it OK for session 4 to be different here? Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird IIRC, and maybe it's missing a step where we should update the active snapshot. > It is unclear to me whether you were instead talking about other sessions > dropping tables as another way of saying "ACCESS EXCLUSIVE" in which case > at what lock level should this anomaly go away, and does it? The originally-proposed tests seemed to all involve either TRUNCATE or DROP TABLE, which are outside what I consider to be our MVCC guarantees. Your example here does seem a bit strange though. Views generally ought not have different semantics from writing out the view query in-line. regards, tom lane
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Julien Rouhaud
Date:
On Wed, Mar 01, 2023 at 01:53:22AM -0500, Tom Lane wrote: > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > Session 4: > > create table tbl2 as select * from view1; > > SELECT 0 > > > Why is it OK for session 4 to be different here? > > Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird > IIRC, and maybe it's missing a step where we should update the > active snapshot. I think it comes from this chunk in ExecCreateTableAs(): rewritten = QueryRewrite(query); /* SELECT should never rewrite to more or less than one SELECT query */ if (list_length(rewritten) != 1) elog(ERROR, "unexpected rewrite result for %s", is_matview ? "CREATE MATERIALIZED VIEW" : "CREATE TABLE AS SELECT"); query = linitial_node(Query, rewritten); Assert(query->commandType == CMD_SELECT); /* plan the query */ plan = pg_plan_query(query, pstate->p_sourcetext, CURSOR_OPT_PARALLEL_OK, params); /* * Use a snapshot with an updated command ID to ensure this query sees * results of any previously executed queries. (This could only * matter if the planner executed an allegedly-stable function that * changed the database contents, but let's do it anyway to be * parallel to the EXPLAIN code path.) */ PushCopiedSnapshot(GetActiveSnapshot()); UpdateActiveSnapshotCommandId(); /* Create a QueryDesc, redirecting output to our tuple receiver */ queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext, GetActiveSnapshot(), InvalidSnapshot, dest, params, queryEnv, 0); It seems to clearly be in contradiction with our usual rule that planning and execution should have a different snapshot.
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
"David G. Johnston"
Date:
On Tuesday, February 28, 2023, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> It is unclear to me whether you were instead talking about other sessions
> dropping tables as another way of saying "ACCESS EXCLUSIVE" in which case
> at what lock level should this anomaly go away, and does it?
The originally-proposed tests seemed to all involve either TRUNCATE
or DROP TABLE,
The only purpose of those was to produce a self-contained script that could be run repeatedly. The commands were outside the transaction flow demonstrating the issue.
David J.
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes: > On Wed, Mar 01, 2023 at 01:53:22AM -0500, Tom Lane wrote: >> Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird >> IIRC, and maybe it's missing a step where we should update the >> active snapshot. > I think it comes from this chunk in ExecCreateTableAs(): Hm. There are a lot of places that do this: > PushCopiedSnapshot(GetActiveSnapshot()); > UpdateActiveSnapshotCommandId(); rather than PushActiveSnapshot(GetTransactionSnapshot()); which is what would have the effect of noticing changes from other sessions. Digging into the history of that, I found this commit: Author: Tom Lane <tgl@sss.pgh.pa.us> Branch: master Release: REL9_1_BR [c0b007603] 2011-02-28 23:28:06 -0500 Rearrange snapshot handling to make rule expansion more consistent. With this patch, portals, SQL functions, and SPI all agree that there should be only a CommandCounterIncrement between the queries that are generated from a single SQL command by rule expansion. Fetching a whole new snapshot now happens only between original queries. This is equivalent to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the best choice since it eliminates one source of concurrency hazards for rules. The patch should also make things marginally faster by reducing the number of snapshot push/pop operations. So I guess it's intended behavior that we don't notice other-session changes between steps of a single command. Whether that rule should apply to CREATE TABLE AS is perhaps debatable --- but I see several other places that are doing it exactly like this, so it's not like CREATE TABLE AS is alone in its folly. I'm pretty hesitant to change this without substantial thought. regards, tom lane
Re: BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
From
"David G. Johnston"
Date:
On Wed, Mar 1, 2023 at 8:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Wed, Mar 01, 2023 at 01:53:22AM -0500, Tom Lane wrote:
>> Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird
>> IIRC, and maybe it's missing a step where we should update the
>> active snapshot.
> I think it comes from this chunk in ExecCreateTableAs():
Hm. There are a lot of places that do this:
> PushCopiedSnapshot(GetActiveSnapshot());
> UpdateActiveSnapshotCommandId();
rather than
PushActiveSnapshot(GetTransactionSnapshot());
which is what would have the effect of noticing changes from other
sessions. Digging into the history of that, I found this commit:
Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_1_BR [c0b007603] 2011-02-28 23:28:06 -0500
Rearrange snapshot handling to make rule expansion more consistent.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion.
[...]
So I guess it's intended behavior that we don't notice other-session
changes between steps of a single command. Whether that rule should
apply to CREATE TABLE AS is perhaps debatable --- but I see several
other places that are doing it exactly like this, so it's not like
CREATE TABLE AS is alone in its folly.
I'm pretty hesitant to change this without substantial thought.
So both Sessions 4 (view) and 5 (table) make it into tcop/postgres.c and into parse analysis.
The view session gets past there because it is a utility command whose body is parsable. postgres.c tcop invokes the executor (with a new snapshot but still of the empty table) for the utility command CTAS passing the parsable query along. CTAS performs rewrite and blocks there while holding the empty table snapshot, and when it continues on to execution CTAS doesn't pull forward the snapshot.
The table session fails to get past parse analysis while the lock is held and once it does postgres.c tcop assigns a new snapshot which does contain the data in table1, then invokes the CTAS utility.
To get parity in behavior either CTAS would need to assign a new snapshot OR postgres.c tcop would need to perform rewriting on the query under the CTAS prior to beginning execution of CTAS, since that rewriting would then block the process before the new snapshot is assigned. (That is all in a mostly questioning, not authoritative, tone
The non-CTAS view query gets blocked on rewrite while the non-CTAS table query gets blocked in parse analysis. When unblocked, postgres.c tcop assigns both a new snapshot just prior to execution.
I experimented with putting the create temp table into a function body then calling the function - same results.
I agree this does seem like a poor risk/reward on the fixing side, especially absent a concrete live use case problem. I am curious what led to this discovery.
In terms of documentation; "query began" is the terminology we use but as demonstrated here, from an end-user's perspective, and in the presence of locking, the point at which the query began seems undefined. If it was when I send the statement to the server then none of the four sessions should be producing results, instead I get three that seem to violate the read-committed rule. Which moves expectations to "begins retrieving rows" in which case the CTAS+View combination, getting stuck in between, when the CTAS+table one doesn't, indeed seems odd and maybe should at least be addressed, even in general terms.
David J.