Thread: SERIALIZABLE with parallel query
Hi hackers, Currently we don't generate parallel plans in SERIALIZABLE. What problems need to be solved to be able to do that? I'm probably steamrolling over a ton of subtleties and assumptions here, but it occurred to me that a first step might be something like this: 1. Hand the leader's SERIALIZABLEXACT to workers. 2. Make sure that workers don't release predicate locks. 3. Make sure that the leader doesn't release predicate locks until after the workers have finished accessing the leader's SERIALIZABLEXACT. I think this is already the case. See attached 5 minute hack. Need to audit predicate.c for cases where MySerializableXact might be modified without suitable locking, and probably sprinkle assertions all over the place that workers don't reach certain places etc. I wonder what horrible things might happen as a result of workers running with a SERIALIZABLEXACT that contains the leader's vxid and other such things. I'd love to figure all this out in time for one of the later CFs in this cycle. Any thoughts? -- Thomas Munro http://www.enterprisedb.com
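To make step 1 a bit more concrete, here is a minimal sketch of the kind of interface I have in mind; the handle type and the function names are placeholders and not necessarily what the attached hack does. The idea is that the leader exports its SERIALIZABLEXACT (which lives in shared memory) through the parallel DSM machinery, and each worker adopts it instead of registering its own:

    /* Sketch only, in predicate.c; names are illustrative. */
    typedef void *SerializableXactHandle;

    /* Called by the leader while setting up the parallel context. */
    SerializableXactHandle
    ShareSerializableXact(void)
    {
        return MySerializableXact;  /* raw pointer is usable in workers: shmem */
    }

    /* Called by each worker at startup, before executing the plan. */
    void
    AttachSerializableXact(SerializableXactHandle handle)
    {
        if (handle != InvalidSerializableXact)
            MySerializableXact = (SERIALIZABLEXACT *) handle;
    }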
Attachment
On Tue, Nov 8, 2016 at 1:51 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Currently we don't generate parallel plans in SERIALIZABLE. What > problems need to be solved to be able to do that? FWIW, parallel CREATE INDEX works at SERIALIZABLE isolation level by specially asking the parallel infrastructure to not care. I think that this works fine, given the limited scope of the problem, but it would be nice to have that confirmed. -- Peter Geoghegan
On Wed, Nov 9, 2016 at 10:51 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Need to audit predicate.c for cases where > MySerializableXact might be modified without suitable locking, The only thing I see along those lines is that CheckForSerializableConflictOut() and CheckForSerializableConflictIn() access SxactIsDoomed(MySerializableXact) without any locking, but if that's OK in the non-parallel case it should also be OK in a worker. I guess this is an opportunistic early error path that doesn't mind seeing data from the past without worrying about cache coherency, on the basis that it will be checked again in PreCommit_CheckForSerializationFailure(). > I wonder what horrible things might happen > as a result of workers running with a SERIALIZABLEXACT that contains > the leader's vxid and other such things. What is the consequence of that vxid? What other complications could be involved here? Here's a rebased patch that updates the documentation and adds a test case to show a serialization failure being detected when one of the queries runs entirely in a parallel worker. -- Thomas Munro http://www.enterprisedb.com
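For reference, that early error path has roughly the following shape (paraphrased from memory rather than quoted from predicate.c, with the errdetail trimmed); the flag is read without holding SerializableXactHashLock, so a stale value only delays the error until the definitive commit-time check:

    /* Paraphrased sketch of the unlocked early-exit check. */
    if (SxactIsDoomed(MySerializableXact))
        ereport(ERROR,
                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                 errmsg("could not serialize access due to read/write dependencies among transactions"),
                 errhint("The transaction might succeed if retried.")));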
Attachment
On Wed, Nov 9, 2016 at 12:34 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Nov 8, 2016 at 1:51 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> Currently we don't generate parallel plans in SERIALIZABLE. What >> problems need to be solved to be able to do that? > > FWIW, parallel CREATE INDEX works at SERIALIZABLE isolation level by > specially asking the parallel infrastructure to not care. I think that > this works fine, given the limited scope of the problem, but it would > be nice to have that confirmed. I don't see any problem with it, but it'd be nicer to get rid of the restriction so your change isn't needed. -- Thomas Munro http://www.enterprisedb.com
On Tue, Nov 8, 2016 at 4:51 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Currently we don't generate parallel plans in SERIALIZABLE. What > problems need to be solved to be able to do that? I'm probably > steamrolling over a ton of subtleties and assumptions here, but it > occurred to me that a first step might be something like this: > > 1. Hand the leader's SERIALIZABLEXACT to workers. > 2. Make sure that workers don't release predicate locks. > 3. Make sure that the leader doesn't release predicate locks until > after the workers have finished accessing the leader's > SERIALIZABLEXACT. I think this is already the case. What happens if the workers exit at the end of the query and the leader then goes on and executes more queries? Will the worker-acquired predicate locks be retained or will they go away when the leader exits? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 16, 2017 at 2:58 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Nov 8, 2016 at 4:51 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> Currently we don't generate parallel plans in SERIALIZABLE. What >> problems need to be solved to be able to do that? I'm probably >> steamrolling over a ton of subtleties and assumptions here, but it >> occurred to me that a first step might be something like this: >> >> 1. Hand the leader's SERIALIZABLEXACT to workers. >> 2. Make sure that workers don't release predicate locks. >> 3. Make sure that the leader doesn't release predicate locks until >> after the workers have finished accessing the leader's >> SERIALIZABLEXACT. I think this is already the case. > > What happens if the workers exit at the end of the query and the > leader then goes on and executes more queries? Will the > worker-acquired predicate locks be retained or will they go away when > the leader exits? All predicate locks last at least until ReleasePredicateLocks() is run after ProcReleaseLocks(), and sometimes longer. Although ReleasePredicateLocks() runs in workers too, this patch makes it return without doing anything. I suppose someone could say that ReleasePredicateLocks() should at least run hash_destroy(LocalPredicateLockHash) and set LocalPredicateLockHash to NULL in workers. This sort of thing could be important if we start reusing worker processes. I'll do that in the next version. The predicate locks themselves consist of state in shared memory, and those created by workers are indistinguishable from those created by the leader process. Having multiple workers and the leader all creating predicate locks linked to the same SERIALIZABLEXACT is *almost* OK, because most relevant shmem state is protected by locks already in all paths (with the exception of the DOOMED flag already mentioned, which seems to follow a "notice me as soon as possible" philosophy, to avoid putting locking into the CheckForSerializableConflict(In|Out) paths, with a definitive check at commit time). But... there is a problem with the management of the linked list of predicate locks held by a transaction. The locks themselves are covered by partition locks, but the links are not, and that previous patch broke the assumption that they could never be changed by another process. Specifically, DeleteChildTargetLocks() assumes it can walk MySerializableXact->predicateLocks and throw away locks that are covered by a new lock (ie throw away tuple locks because a covering page lock has been acquired) without let or hindrance until it needs to modify the locks themselves. That assumption doesn't hold up with that last patch and will require a new kind of mutual exclusion. I wonder if the solution is to introduce an LWLock into each SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent workers from stepping on each others' toes during lock cleanup. An alternative would be to start taking SerializablePredicateLockListLock in exclusive rather than shared mode, but that seems unnecessarily coarse. I have a patch that implements the above but I'm still figuring out how to test it, and I'll need to do a bit more poking around for other similar assumptions before I post a new version. I tried to find any way that LocalPredicateLockHash could create problems, but it's effectively a cache with false-negatives-but-never-false-positives semantics.
In cache-miss scenarios we look in shmem data structures and are prepared to find that our SERIALIZABLEXACT already has the predicate lock even though there was a cache miss in LocalPredicateLockHash. That works because our SERIALIZABLEXACT's address is part of the tag, and it's stable across backends. Random observation: The global variable MyXactDidWrite would probably need to move into shared memory when parallel workers eventually learn to write. -- Thomas Munro http://www.enterprisedb.com
On Thu, Feb 16, 2017 at 6:19 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Specifically, DeleteChildTargetLocks() assumes it can walk > MySerializableXact->predicateLocks and throw away locks that are > covered by a new lock (ie throw away tuple locks because a covering > page lock has been acquired) without let or hindrance until it needs > to modify the locks themselves. That assumption doesn't hold up with > that last patch and will require a new kind of mutual exclusion. I > wonder if the solution is to introduce an LWLock into each > SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent > workers from stepping on each others' toes during lock cleanup. An > alternative would be to start taking SerializablePredicateLockListLock > in exclusive rather than shared mode, but that seems unnecessarily > coarse. Here is a patch to do that, for discussion. It adds an LWLock to each SERIALIZABLEXACT, and acquires it after SerializablePredicateListLock and before any predicate lock partition lock. It doesn't bother with that if not in parallel mode, or in the cases where SerializablePredicateListLock is held exclusively. This prevents parallel query workers and leader from stepping on each others' toes when manipulating the predicate list. The case in CheckTargetForConflictsIn is theoretical for now since we don't support writing in parallel query yet. The case in CreatePredicateLock is reachable by running a simple parallel sequential scan. The case in DeleteChildTargetLocks is for when we've acquired a new predicate lock that covers finer grained locks which can be dropped; that is reachable the same way again. I don't think it's required in ReleaseOneSerializableXact since it was already called in several places with an sxact other than the caller's, and deals with finished transactions. -- Thomas Munro http://www.enterprisedb.com
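To spell out the resulting lock ordering, the list-manipulating paths end up looking roughly like this; this is a sketch of the approach rather than the patch text, and the per-transaction lock's field name (perXactPredicateListLock) and the targettaghash variable are illustrative:

    /* Sketch of the lock ordering only; names are illustrative. */
    LWLock     *partitionLock;

    LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
    if (IsInParallelMode())
        LWLockAcquire(&MySerializableXact->perXactPredicateListLock, LW_EXCLUSIVE);
    partitionLock = PredicateLockHashPartitionLock(targettaghash);
    LWLockAcquire(partitionLock, LW_EXCLUSIVE);

    /* ... create the new lock and/or prune MySerializableXact->predicateLocks ... */

    LWLockRelease(partitionLock);
    if (IsInParallelMode())
        LWLockRelease(&MySerializableXact->perXactPredicateListLock);
    LWLockRelease(SerializablePredicateLockListLock);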
Attachment
On Tue, Feb 21, 2017 at 12:55 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Feb 16, 2017 at 6:19 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> Specifically, DeleteChildTargetLocks() assumes it can walk >> MySerializableXact->predicateLocks and throw away locks that are >> covered by a new lock (ie throw away tuple locks because a covering >> page lock has been acquired) without let or hindrance until it needs >> to modify the locks themselves. That assumption doesn't hold up with >> that last patch and will require a new kind of mutual exclusion. I >> wonder if the solution is to introduce an LWLock into each >> SERIALIZABLEXACT object, so DeleteChildTargetLocks() can prevent >> workers from stepping on each others' toes during lock cleanup. An >> alternative would be to start taking SerializablePredicateLockListLock >> in exclusive rather than shared mode, but that seems unnecessarily >> coarse. > > Here is a patch to do that, for discussion. It adds an LWLock to each > SERIALIZABLEXACT, and acquires it after SerializablePredicateListLock > and before any predicate lock partition lock. It doesn't bother with > that if not in parallel mode, or in the cases where > SerializablePredicateListLock is held exclusively. This prevents > parallel query workers and leader from stepping on each others' toes > when manipulating the predicate list. > > The case in CheckTargetForConflictsIn is theoretical for now since we > don't support writing in parallel query yet. The case in > CreatePredicateLock is reachable by running a simple parallel > sequential scan. The case in DeleteChildTargetLocks is for when we've > acquired a new predicate lock that covers finer grained locks which > can be dropped; that is reachable the same way again. I don't think > it's required in ReleaseOneSerializableXact since it was already > called in several places with an sxact other than the caller's, and > deals with finished transactions. I don't think I know enough about the serializable code to review this, or at least not quickly, but it seems very cool if it works. Have you checked what effect it has on shared memory consumption? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Feb 22, 2017 at 2:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I don't think I know enough about the serializable code to review > this, or at least not quickly, but it seems very cool if it works. > Have you checked what effect it has on shared memory consumption? I'm not sure how to test that. Kevin, can you provide any pointers to the test workloads you guys used when developing SSI? In theory shmem usage shouldn't change, since the predicate locks are shared by the cooperating backends. It might be able to use a bit more by creating finer grain locks in worker A that are already covered by coarse grained locks acquired by worker B or something like that, but it seems unlikely if they tend to scan disjoint sets of pages. Here is a rebased patch. I should explain the included isolation test. It's almost the same as the SERIALIZABLE variant that I submitted for https://commitfest.postgresql.org/13/949/. That is a useful test here because it's a serialisation anomaly that involves a read-only query. In this version we run that query (s3r) in a parallel worker, and the query plan comes out like this:

 Gather  (cost=1013.67..1013.87 rows=2 width=64)
   Workers Planned: 1
   Single Copy: true
   ->  Sort  (cost=13.67..13.67 rows=2 width=64)
         Sort Key: id
         ->  Bitmap Heap Scan on bank_account  (cost=8.32..13.66 rows=2 width=64)
               Recheck Cond: (id = ANY ('{X,Y}'::text[]))
               ->  Bitmap Index Scan on bank_account_pkey  (cost=0.00..8.32 rows=2 width=0)
                     Index Cond: (id = ANY ('{X,Y}'::text[]))

A dangerous cycle is detected, confirming that reads done by the worker participate correctly in predicate locking and conflict detection:

step s2wx: UPDATE bank_account SET balance = -11 WHERE id = 'X';
ERROR:  could not serialize access due to read/write dependencies among transactions

It's probably too late for this WIP patch to get the kind of review and testing it needs for PostgreSQL 10. That's OK, but I think it might be a strategically good idea to get parallel SSI support (whether with this or some other approach) into the tree before people start showing up with parallel write patches. -- Thomas Munro http://www.enterprisedb.com
Attachment
Hi, On 2017-03-11 15:19:23 +1300, Thomas Munro wrote: > Here is a rebased patch. It seems that this patch is still undergoing development, review and performance evaluation. Therefore it seems like it'd be a bad idea to try to get this into v10. Any arguments against moving this to the next CF? - Andres
On Tue, Apr 4, 2017 at 6:41 AM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2017-03-11 15:19:23 +1300, Thomas Munro wrote: >> Here is a rebased patch. > > It seems that this patch is still undergoing development, review and > performance evaluation. Therefore it seems like it'd be a bad idea to > try to get this into v10. Any arguments against moving this to the next > CF? None, and done. It would be good to get some feedback from Kevin on whether this is a reasonable approach, but considering that these data structures may finish up being redesigned as part of the GSoC project[1], it may be best to wait and see where that goes before doing anything. I'll follow developments there, and if this patch remains relevant I'll plan to do some more work on it including testing (possibly with the RUBiS benchmark from Kevin and Dan's paper since it seems the most likely to be able to really use parallelism) for PG11 CF1. [1] https://wiki.postgresql.org/wiki/GSoC_2017#Eliminate_O.28N.5E2.29_scaling_from_rw-conflict_tracking_in_serializable_transactions -- Thomas Munro http://www.enterprisedb.com
On Fri, Mar 10, 2017 at 8:19 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Feb 22, 2017 at 2:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I don't think I know enough about the serializable code to review >> this, or at least not quickly, but it seems very cool if it works. >> Have you checked what effect it has on shared memory consumption? > > I'm not sure how to test that. Kevin, can you provide any pointers to > the test workloads you guys used when developing SSI? During development there was first and foremost the set of tests which wound up implemented in the isolation testing environment developed by Heikki, although for most of the development cycle these tests were run by a python tool written by Markus Wanner (which was not as fast as Heikki's C-based tool, but provided a lot more detail in case of failure -- so it was very nice to have until we got down to polishing things). The final stress test to chase out race conditions and the performance benchmarks were all run by Dan Ports on big test machines at MIT. I'm not sure how much I can say to elaborate on what is in section 8 of this paper: http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf I seem to remember that he had some saturation run versions of the "DBT-2++" tests, modified to monitor for serialization anomalies missed by the implementation, which he sometimes ran for days at a time on MIT testing machines. There were some very narrow race conditions uncovered by this testing, which at high concurrency on a 16 core machine might hit a particular problem less often than once per day. I also remember that both Anssi Kääriäinen and Yamamoto Takahashi did a lot of valuable testing of the patch and found problems that we had missed. Perhaps they can describe their testing or make other suggestions. -- Kevin Grittner
On Tue, Apr 4, 2017 at 8:25 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > ... but considering that these data structures may > finish up being redesigned as part of the GSoC project[1], it may be > best to wait and see where that goes before doing anything. I'll > follow developments there, and if this patch remains relevant I'll > plan to do some more work on it including testing (possibly with the > RUBiS benchmark from Kevin and Dan's paper since it seems the most > likely to be able to really use parallelism) for PG11 CF1. I've been keeping one eye on the GSoC project. That patch changes the inConflicts and outConflicts data structures, but not the locking protocol. This patch works by introducing per-SERIALIZABLEXACT locking in the places where the code currently assumes that the current backend is the only one that could modify a shared data structure (namely MySerializableXact->predicateLocks), so that MySerializableXact can be shared with workers. There doesn't seem to be any incompatibility or dependency so far, so here's a rebased patch. Testing needed. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > [ssi-parallel-v5.patch] Rebased. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Fri, Sep 1, 2017 at 5:11 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> [ssi-parallel-v5.patch] > > Rebased. Rebased again. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Tue, Sep 19, 2017 at 11:42 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Fri, Sep 1, 2017 at 5:11 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> [ssi-parallel-v5.patch]
>
> Rebased.
Rebased again.
During testing of this patch, I found a behavior difference with
parallel query support, while experimenting with the test case
provided in the patch.
Note that I tested the V6 patch, and I don't think that this version
contains any fixes other than the rebase.
Test steps:
CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
Session -1:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM bank_account WHERE id = 'Y';
Session -2:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET max_parallel_workers_per_gather = 2;
SET force_parallel_mode = on;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set enable_indexscan = off;
set enable_bitmapscan = off;
SELECT balance FROM bank_account WHERE id = 'X';
Session -1:
update bank_account set balance = 10 where id = 'X';
Session -2:
update bank_account set balance = 10 where id = 'Y';
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.
Without the parallel query for the select statement in session-2,
the update statement in session-2 succeeds.
Regards,
Hari Babu
Fujitsu Australia
On Tue, Sep 19, 2017 at 1:47 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > During testing of this patch, I found some behavior difference > with the support of parallel query, while experimenting with the provided > test case in the patch. > > But I tested the V6 patch, and I don't think that this version contains > any fixes other than rebase. > > Test steps: > > CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL); > INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0); > > Session -1: > > BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; > SELECT balance FROM bank_account WHERE id = 'Y'; > > Session -2: > > BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; > SET max_parallel_workers_per_gather = 2; > SET force_parallel_mode = on; > set parallel_setup_cost = 0; > set parallel_tuple_cost = 0; > set min_parallel_table_scan_size = 0; > set enable_indexscan = off; > set enable_bitmapscan = off; > > SELECT balance FROM bank_account WHERE id = 'X'; > > Session -1: > > update bank_account set balance = 10 where id = 'X'; > > Session -2: > > update bank_account set balance = 10 where id = 'Y'; > ERROR: could not serialize access due to read/write dependencies among > transactions > DETAIL: Reason code: Canceled on identification as a pivot, during write. > HINT: The transaction might succeed if retried. > > Without the parallel query of select statement in session-2, > the update statement in session-2 is passed. Hi Haribabu, Thanks for looking at this! Yeah. The difference seems to be that session 2 chooses a Parallel Seq Scan instead of an Index Scan when you flip all those GUCs into parallelism-is-free mode. Seq Scan takes a table-level predicate lock (see heap_beginscan_internal()). But if you continue your example in non-parallel mode (patched or unpatched), you'll find that only one of those transactions can commit successfully. Using the fancy notation in the papers about this stuff where w1[x=42] means "write by transaction 1 on object x with value 42", let's see if there is an apparent sequential order of these transactions that makes sense:

Actual order:     r1[Y=0] r2[X=0] w1[X=10] w2[Y=10] ... some commit order ...
Apparent order A: r2[X=0] w2[Y=10] c2 r1[Y=0*] w1[X=10] c1 (*nonsense)
Apparent order B: r1[Y=0] w1[X=10] c1 r2[X=0*] w2[Y=10] c2 (*nonsense)

Both potential commit orders are nonsensical. I think what happened in your example was that a Seq Scan allowed the SSI algorithm to reject a transaction sooner. Instead of r2[X=0], the executor sees r2[X=0,Y=0] (we scanned the whole table, as if we read all objects, in this case X and Y, even though we only asked to read X). Then the SSI algorithm is able to detect a "dangerous structure" at w2[Y=10], instead of later at commit time. So I don't think this indicates a problem with the patch. -- Thomas Munro http://www.enterprisedb.com
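For the record, that table-level SIREAD lock comes from the scan start-up path, roughly as below (paraphrasing heap_beginscan_internal() from memory). Index and bitmap scans instead take page/tuple-level SIREAD locks as they visit data, which is why the choice of plan changes which reads appear to conflict:

    /* Rough paraphrase of the relevant part of heap_beginscan_internal(). */
    if (!is_bitmapscan)
        PredicateLockRelation(relation, snapshot);  /* relation-level SIREAD lock */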
On Thu, Sep 21, 2017 at 4:13 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Sep 19, 2017 at 1:47 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> During testing of this patch, I found some behavior difference
> with the support of parallel query, while experimenting with the provided
> test case in the patch.
>
> But I tested the V6 patch, and I don't think that this version contains
> any fixes other than rebase.
>
> Test steps:
>
> CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
> INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
>
> Session -1:
>
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT balance FROM bank_account WHERE id = 'Y';
>
> Session -2:
>
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SET max_parallel_workers_per_gather = 2;
> SET force_parallel_mode = on;
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
> set min_parallel_table_scan_size = 0;
> set enable_indexscan = off;
> set enable_bitmapscan = off;
>
> SELECT balance FROM bank_account WHERE id = 'X';
>
> Session -1:
>
> update bank_account set balance = 10 where id = 'X';
>
> Session -2:
>
> update bank_account set balance = 10 where id = 'Y';
> ERROR: could not serialize access due to read/write dependencies among
> transactions
> DETAIL: Reason code: Canceled on identification as a pivot, during write.
> HINT: The transaction might succeed if retried.
>
> Without the parallel query of select statement in session-2,
> the update statement in session-2 is passed.
Hi Thomas,
Yeah. The difference seems to be that session 2 chooses a Parallel
Seq Scan instead of an Index Scan when you flip all those GUCs into
parallelism-is-free mode. Seq Scan takes a table-level predicate lock
(see heap_beginscan_internal()). But if you continue your example in
non-parallel mode (patched or unpatched), you'll find that only one of
those transactions can commit successfully.
Yes, that's correct. Only one commit can be successful.
Using the fancy notation in the papers about this stuff where w1[x=42]
means "write by transaction 1 on object x with value 42", let's see if
there is an apparent sequential order of these transactions that makes
sense:
Actual order: r1[Y=0] r2[X=0] w1[X=10] w2[Y=10] ... some commit order ...
Apparent order A: r2[X=0] w2[Y=10] c2 r1[Y=0*] w1[X=10] c1 (*nonsense)
Apparent order B: r1[Y=0] w1[X=10] c1 r2[X=0*] w2[Y=10] c2 (*nonsense)
Both potential commit orders are nonsensical. I think what happened
in your example was that a Seq Scan allowed the SSI algorithm to
reject a transaction sooner. Instead of r2[X=0], the executor sees
r2[X=0,Y=0] (we scanned the whole table, as if we read all objects, in
this case X and Y, even though we only asked to read X). Then the SSI
algorithm is able to detect a "dangerous structure" at w2[Y=10],
instead of later at commit time.
Thanks for explaining in more detail; now I understand a bit more
about serialization.
After I tuned the GUCs to use a (non-parallel) sequential scan, I still
don't get the error for the update operation in session-2 the way the
parallel sequential scan generated it, and it even takes many more
commands until, or unless, S1 commits.
I am just thinking that with a parallel sequential scan under
serializable isolation, the user has lost control over which session
gets to commit. I may be thinking of a rare scenario that would never
happen in practice.
I will continue my review on the latest patch and share any updates.
Regards,
Hari Babu
Fujitsu Australia
On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > After I tune the GUC to go with sequence scan, still I am not getting the > error > in the session-2 for update operation like it used to generate an error for > parallel > sequential scan, and also it even takes some many commands until unless the > S1 > commits. Hmm. Then this requires more explanation because I don't expect a difference. I did some digging and realised that the error detail message "Reason code: Canceled on identification as a pivot, during write." was reached in a code path that requires SxactIsPrepared(writer) and also MySerializableXact == writer, which means that the process believes it is committing. Clearly something is wrong. After some more digging I realised that ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls CommitTransaction() which calls PreCommit_CheckForSerializationFailure(). Since the worker is connected to the leader's SERIALIZABLEXACT, that finishes up being marked as preparing to commit (not true!), and then the leader gets confused during that write, causing a serialization failure to be raised sooner (though I can't explain why it should be raised then anyway; that's another topic). Oops. I think the fix here is just not to do that in a worker (the worker's CommitTransaction() doesn't really mean what it says). Here's a version with a change that makes that conditional. This way your test case behaves the same as non-parallel mode. > I will continue my review on the latest patch and share any updates. Thanks! -- Thomas Munro http://www.enterprisedb.com
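The conditional amounts to something like this in the commit path (a sketch of the idea rather than the exact hunk):

    /*
     * Sketch: in CommitTransaction(), skip the serializable pre-commit check
     * in a parallel worker.  The worker's "commit" doesn't end the leader's
     * (shared) SERIALIZABLEXACT; the real check happens when the leader commits.
     */
    if (!IsParallelWorker())
        PreCommit_CheckForSerializationFailure();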
Attachment
On Mon, Sep 25, 2017 at 6:57 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> After I tune the GUC to go with sequence scan, still I am not getting the
> error
> in the session-2 for update operation like it used to generate an error for
> parallel
> sequential scan, and also it even takes some many commands until unless the
> S1
> commits.
Hmm. Then this requires more explanation because I don't expect a
difference. I did some digging and realised that the error detail
message "Reason code: Canceled on identification as a pivot, during
write." was reached in a code path that requires
SxactIsPrepared(writer) and also MySerializableXact == writer, which
means that the process believes it is committing. Clearly something
is wrong. After some more digging I realised that
ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
CommitTransaction() which calls
PreCommit_CheckForSerializationFailure(). Since the worker is
connected to the leader's SERIALIZABLEXACT, that finishes up being
marked as preparing to commit (not true!), and then the leader get
confused during that write, causing a serialization failure to be
raised sooner (though I can't explain why it should be raised then
anyway, but that's another topic). Oops. I think the fix here is
just not to do that in a worker (the worker's CommitTransaction()
doesn't really mean what it says).
Here's a version with a change that makes that conditional. This way
your test case behaves the same as non-parallel mode.
With the latest patch, I didn't find any problems.
> I will continue my review on the latest patch and share any updates.
Thanks!
The patch looks good, and I don't have any comments on the code.
The test that the patch adds does not generate a true parallelism
scenario; I feel it is better to change it to a test that can
generate a parallel sequential/index/bitmap scan.
Regards,
Hari Babu
Fujitsu Australia
On Tue, Sep 26, 2017 at 4:41 PM, Haribabu Kommi wrote:
>
>
> On Mon, Sep 25, 2017 at 6:57 PM, Thomas Munro <
> thomas.munro@enterprisedb.com> wrote:
>
>> On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
>> wrote:
>> > After I tune the GUC to go with sequence scan, still I am not getting
>> the
>> > error
>> > in the session-2 for update operation like it used to generate an error
>> for
>> > parallel
>> > sequential scan, and also it even takes some many commands until unless
>> the
>> > S1
>> > commits.
>>
>> Hmm. Then this requires more explanation because I don't expect a
>> difference. I did some digging and realised that the error detail
>> message "Reason code: Canceled on identification as a pivot, during
>> write." was reached in a code path that requires
>> SxactIsPrepared(writer) and also MySerializableXact == writer, which
>> means that the process believes it is committing. Clearly something
>> is wrong. After some more digging I realised that
>> ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
>> CommitTransaction() which calls
>> PreCommit_CheckForSerializationFailure(). Since the worker is
>> connected to the leader's SERIALIZABLEXACT, that finishes up being
>> marked as preparing to commit (not true!), and then the leader get
>> confused during that write, causing a serialization failure to be
>> raised sooner (though I can't explain why it should be raised then
>> anyway, but that's another topic). Oops. I think the fix here is
>> just not to do that in a worker (the worker's CommitTransaction()
>> doesn't really mean what it says).
>>
>> Here's a version with a change that makes that conditional. This way
>> your test case behaves the same as non-parallel mode.
>>
>
> The patch looks good, and I don't have any comments for the code.
> The test that is going to add by the patch is not generating a true
> parallelism scenario, I feel it is better to change the test that can
> generate a parallel sequence/index/bitmap scan.
>
The latest patch is good. It lacks a test that verifies the serializable
support with actual parallel workers, so if it gets broken, it will be
difficult to know.
Regards,
Hari Babu
Fujitsu Australia
On Fri, Nov 24, 2017 at 1:06 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > The latest patch is good. It lacks a test that verifies the serialize > support with actual parallel workers, so in case if it broken, it is > difficult to know. Could this question be answered? The patch still applies so I am moving it to next CF. -- Michael
On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Nov 24, 2017 at 1:06 PM, Haribabu Kommi > <kommi.haribabu@gmail.com> wrote: >> The latest patch is good. It lacks a test that verifies the serialize >> support with actual parallel workers, so in case if it broken, it is >> difficult to know. > > Could this question be answered? The patch still applies so I am > moving it to next CF. Thanks. The answer is: It does run queries in two different backends, proving that different backends associated with the same session are correctly detecting conflicts and enabling the SSI algorithm to work. But yeah, Haribabu is right that it doesn't ever cause them to run simultaneously in a way that would cause the new locking to contend (or break if the locking code is incorrect). I have been unable to think of a good way to do that in a regression or isolation test so far. -- Thomas Munro http://www.enterprisedb.com
On Thu, Nov 30, 2017 at 2:44 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Could this question be answered? The patch still applies so I am >> moving it to next CF. Rebased, 'cause it broke. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Fri, Dec 8, 2017 at 11:33 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Thu, Nov 30, 2017 at 2:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Could this question be answered? The patch still applies so I am
>> moving it to next CF.
Rebased, 'cause it broke.
Thanks for explaining the problem with generating an isolation test to
exercise the serializable parallel query support.
The committer can decide whether the existing test is fine to be part of
the test suite or should be removed; other than that everything is fine,
so I am moving the patch into "ready for committer" state.
Regards,
Hari Babu
Fujitsu Australia
On Wed, Dec 13, 2017 at 2:09 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > Thanks for explaining the problem in generating an isolation test to > test the serialize parallel query. > > Committer can decide whether existing test is fine to part of the test suite > or remove it, other than that everything is fine. so I am moving the patch > into "ready for committer" state. Thank you! I will try to find a good benchmark that will really exercise parallel query + SSI. -- Thomas Munro http://www.enterprisedb.com
On Wed, Dec 13, 2017 at 5:30 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Dec 13, 2017 at 2:09 PM, Haribabu Kommi > <kommi.haribabu@gmail.com> wrote: >> Thanks for explaining the problem in generating an isolation test to >> test the serialize parallel query. >> >> Committer can decide whether existing test is fine to part of the test suite >> or remove it, other than that everything is fine. so I am moving the patch >> into "ready for committer" state. > > Thank you! I will try to find a good benchmark that will really > exercise parallel query + SSI. This started crashing some time yesterday with an assertion failure in the isolation tests after commit 2badb5af landed. Reordering of code in parallel.c confused patch's fuzz heuristics leading SetSerializableXact() to be called too soon. Here's a fix for that. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Wed, Jan 24, 2018 at 7:39 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > This started crashing some time yesterday with an assertion failure in > the isolation tests after commit 2badb5af landed. Reordering of code > in parallel.c confused patch's fuzz heuristics leading > SetSerializableXact() to be called too soon. Here's a fix for that. I took a look at this today and thought it might be OK to commit, modulo a few minor issues: (1) you didn't document the new tranche and (2) I prefer to avoid if (blah) { Assert(thing) } in favor of Assert(!blah || thing). But then I got a little bit concerned about ReleasePredicateLocks(). Suppose that in the middle of a read-only transaction, SXACT_FLAG_RO_SAFE becomes true. The next call to SerializationNeededForRead in each process will call ReleasePredicateLocks. In the workers, this won't do anything, so we'll just keep coming back there. But in the leader, we'll go ahead do all that stuff, including MySerializableXact = InvalidSerializableXact. But in the workers, it's still set. Maybe that's OK, but I'm not sure that it's OK... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 26, 2018 at 4:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I took a look at this today and thought it might be OK to commit, Thank you for looking at this! > modulo a few minor issues: (1) you didn't document the new tranche and Fixed. > (2) I prefer to avoid if (blah) { Assert(thing) } in favor of > Assert(!blah || thing). Done. > But then I got a little bit concerned about ReleasePredicateLocks(). > Suppose that in the middle of a read-only transaction, > SXACT_FLAG_RO_SAFE becomes true. The next call to > SerializationNeededForRead in each process will call > ReleasePredicateLocks. In the workers, this won't do anything, so > we'll just keep coming back there. But in the leader, we'll go ahead > do all that stuff, including MySerializableXact = > InvalidSerializableXact. But in the workers, it's still set. Maybe > that's OK, but I'm not sure that it's OK... Ouch. Yeah. It's not OK. If the leader gives up its SERIALIZABLEXACT object early due to that safe-read-only optimisation, the workers are left with a dangling pointer to a SERIALIZABLEXACT object that has been pushed onto FinishedSerializableTransactions. From there it will move to PredXact->availableTransactions and might be handed out to some other transaction, so it is not safe to retain a pointer to that object. The best solution I have come up with so far is to add a reference count to SERIALIZABLEXACT. I toyed with putting the refcount into the DSM instead, but then I ran into problems making that work when you have a query with multiple Gather nodes. Since the refcount is in SERIALIZABLEXACT I also had to add a generation counter so that I could detect the case where you try to attach too late (the leader has already errored out, the refcount has reached 0 and the SERIALIZABLEXACT object has been recycled). The attached is a draft patch only, needing some testing and polish. Brickbats, better ideas? FWIW I also considered a couple of other ideas: (1) Keeping the object alive on the FinishedSerializableTransactions list until the leader's transaction is finished seems broken because we need to be able to spill that list to the SLRU at any time, and if we somehow made them sticky we could run out of memory. (2) Anything involving the leader having sole control of the object lifetime seems problematic... well, it might work if you disabled the SXACT_FLAG_RO_SAFE optimisation so that ReleasePredicateLocks() always happens after all workers have finished, but that seems like cheating. PS I noticed that for BecomeLockGroupMember() we say "If we can't join the lock group, the leader has gone away, so just exit quietly" but for various other similar things we spew errors (most commonly seen one being "ERROR: could not map dynamic shared memory segment"). Intentional? -- Thomas Munro http://www.enterprisedb.com
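In outline, the draft does something like the following; the field names (refcount, generation) and the handle struct are invented here for illustration, so treat it as a sketch of the idea rather than the patch itself:

    /* Hypothetical sketch of the refcount + generation scheme. */
    typedef struct SerializableXactHandle
    {
        SERIALIZABLEXACT *sxact;
        uint64      generation;     /* generation observed by the leader */
    } SerializableXactHandle;

    /* Leader, while setting up the parallel context: pin the object. */
    LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    handle.sxact = MySerializableXact;
    handle.generation = MySerializableXact->generation;
    MySerializableXact->refcount++;
    LWLockRelease(SerializableXactHashLock);

    /* Worker, at startup: attach, unless the object has been recycled. */
    LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    if (handle.sxact->generation != handle.generation)
    {
        LWLockRelease(SerializableXactHashLock);
        ereport(ERROR,
                (errmsg("leader's serializable transaction has already ended")));
    }
    handle.sxact->refcount++;
    MySerializableXact = handle.sxact;
    LWLockRelease(SerializableXactHashLock);

    /*
     * ReleasePredicateLocks() then decrements refcount, and only the backend
     * that drops it to zero really releases the SERIALIZABLEXACT and bumps
     * its generation before it goes back onto the available list.
     */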
Attachment
On Fri, Feb 23, 2018 at 1:54 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > The attached is a draft patch only, needing some testing and polish. > Brickbats, better ideas? Note, that version is broken for multiple Gather nodes, but that's fixable. Comments on the general idea welcome. -- Thomas Munro http://www.enterprisedb.com
On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:> > The best solution I have come up with so far is to add a reference > count to SERIALIZABLEXACT. I toyed with putting the refcount into the > DSM instead, but then I ran into problems making that work when you > have a query with multiple Gather nodes. Since the refcount is in > SERIALIZABLEXACT I also had to add a generation counter so that I > could detect the case where you try to attach too late (the leader has > already errored out, the refcount has reached 0 and the > SERIALIZABLEXACT object has been recycled). I don't know whether that's safe or not. It certainly sounds like it's solving one category of problem, but is that the only issue? If some backends haven't noticed that we're safe, they might keep acquiring SIREAD locks or doing other manipulations of shared state, which maybe could cause confusion. I haven't looked into this deeply enough to understand whether there's actually a possibility of trouble there, but I can't rule it out off-hand. One approach is to just disable this optimization for parallel query. Being able to use SERIALIZABLE with parallel query is better than not being able to do it, even if some optimizations are not applied in that case. Of course making the optimizations work is better, but we've got to be sure we're doing it right. > PS I noticed that for BecomeLockGroupMember() we say "If we can't > join the lock group, the leader has gone away, so just exit quietly" > but for various other similar things we spew errors (most commonly > seen one being "ERROR: could not map dynamic shared memory segment"). > Intentional? I suppose I thought that if we failed to map the dynamic shared memory segment, it might be down to any one of several causes; whereas if we fail to join the lock group, it must be because the leader has already exited. There might be a flaw in that thinking, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 22, 2018 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro >> PS I noticed that for BecomeLockGroupMember() we say "If we can't >> join the lock group, the leader has gone away, so just exit quietly" >> but for various other similar things we spew errors (most commonly >> seen one being "ERROR: could not map dynamic shared memory segment"). >> Intentional? > > I suppose I thought that if we failed to map the dynamic shared memory > segment, it might be down to any one of several causes; whereas if we > fail to join the lock group, it must be because the leader has already > exited. There might be a flaw in that thinking, though. > By the way, in which case leader can exit early? As of now, we do wait for workers to end both before the query is finished or in error cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Feb 23, 2018 at 3:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Feb 22, 2018 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro >>> PS I noticed that for BecomeLockGroupMember() we say "If we can't >>> join the lock group, the leader has gone away, so just exit quietly" >>> but for various other similar things we spew errors (most commonly >>> seen one being "ERROR: could not map dynamic shared memory segment"). >>> Intentional? >> >> I suppose I thought that if we failed to map the dynamic shared memory >> segment, it might be down to any one of several causes; whereas if we >> fail to join the lock group, it must be because the leader has already >> exited. There might be a flaw in that thinking, though. >> > > By the way, in which case leader can exit early? As of now, we do > wait for workers to end both before the query is finished or in error > cases. create table foo as select generate_series(1, 10)::int a; alter table foo set (parallel_workers = 2); set parallel_setup_cost = 0; set parallel_tuple_cost = 0; select count(a / 0) from foo; That reliably gives me: ERROR: division by zero [from leader] ERROR: could not map dynamic shared memory segment [from workers] I thought this was coming from resource manager cleanup, but you're right: that happens after we wait for all workers to finish. Perhaps this is a race within DestroyParallelContext() itself: when it is called by AtEOXact_Parallel() during an abort, it asks the postmaster to SIGTERM the workers, then it immediately detaches from the DSM segment, and then it waits for the worker to start up. The workers unblock signals before the they try to attach to the DSM segment, but they don't CHECK_FOR_INTERRUPTS before they try to attach (and even if they did it wouldn't solve nothing). I don't like the error much, though at least the root cause error is logged first. I don't immediately see how BecomeLockGroupMember() could have the same kind of problem though, for the reason you said: the leader waits for the workers to finish, so I'm not sure in which circumstances it would cease to be the lock group leader while the workers are still running. -- Thomas Munro http://www.enterprisedb.com
On Fri, Feb 23, 2018 at 8:48 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Fri, Feb 23, 2018 at 3:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Thu, Feb 22, 2018 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro >>>> PS I noticed that for BecomeLockGroupMember() we say "If we can't >>>> join the lock group, the leader has gone away, so just exit quietly" >>>> but for various other similar things we spew errors (most commonly >>>> seen one being "ERROR: could not map dynamic shared memory segment"). >>>> Intentional? >>> >>> I suppose I thought that if we failed to map the dynamic shared memory >>> segment, it might be down to any one of several causes; whereas if we >>> fail to join the lock group, it must be because the leader has already >>> exited. There might be a flaw in that thinking, though. >>> >> >> By the way, in which case leader can exit early? As of now, we do >> wait for workers to end both before the query is finished or in error >> cases. > > create table foo as select generate_series(1, 10)::int a; > alter table foo set (parallel_workers = 2); > set parallel_setup_cost = 0; > set parallel_tuple_cost = 0; > select count(a / 0) from foo; > > That reliably gives me: > ERROR: division by zero [from leader] > ERROR: could not map dynamic shared memory segment [from workers] > > I thought this was coming from resource manager cleanup, but you're > right: that happens after we wait for all workers to finish. Perhaps > this is a race within DestroyParallelContext() itself: when it is > called by AtEOXact_Parallel() during an abort, it asks the postmaster > to SIGTERM the workers, then it immediately detaches from the DSM > segment, and then it waits for the worker to start up. > I guess you mean to say worker waits to shutdown/exit. Why would it wait for startup at that stage? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Feb 23, 2018 at 7:56 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Feb 23, 2018 at 8:48 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Fri, Feb 23, 2018 at 3:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> By the way, in which case leader can exit early? As of now, we do >>> wait for workers to end both before the query is finished or in error >>> cases. >> >> create table foo as select generate_series(1, 10)::int a; >> alter table foo set (parallel_workers = 2); >> set parallel_setup_cost = 0; >> set parallel_tuple_cost = 0; >> select count(a / 0) from foo; >> >> That reliably gives me: >> ERROR: division by zero [from leader] >> ERROR: could not map dynamic shared memory segment [from workers] >> >> I thought this was coming from resource manager cleanup, but you're >> right: that happens after we wait for all workers to finish. Perhaps >> this is a race within DestroyParallelContext() itself: when it is >> called by AtEOXact_Parallel() during an abort, it asks the postmaster >> to SIGTERM the workers, then it immediately detaches from the DSM >> segment, and then it waits for the worker to start up. >> > > I guess you mean to say worker waits to shutdown/exit. Why would it > wait for startup at that stage? Right, I meant to say shutdown/exit. -- Thomas Munro http://www.enterprisedb.com
On Fri, Feb 23, 2018 at 6:05 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote:> >> The best solution I have come up with so far is to add a reference >> count to SERIALIZABLEXACT. I toyed with putting the refcount into the >> DSM instead, but then I ran into problems making that work when you >> have a query with multiple Gather nodes. Since the refcount is in >> SERIALIZABLEXACT I also had to add a generation counter so that I >> could detect the case where you try to attach too late (the leader has >> already errored out, the refcount has reached 0 and the >> SERIALIZABLEXACT object has been recycled). > > I don't know whether that's safe or not. It certainly sounds like > it's solving one category of problem, but is that the only issue? If > some backends haven't noticed that we're safe, they might keep > acquiring SIREAD locks or doing other manipulations of shared state, > which maybe could cause confusion. I haven't looked into this deeply > enough to understand whether there's actually a possibility of trouble > there, but I can't rule it out off-hand. After some testing, I think the refcount approach could be made to work, but it seems quite complicated and there are some weird edge cases that showed up that started to make it look like more trouble than it was worth. One downside of refcounts is that you never get to free the SERIALIZABLEXACT until the end of the transaction with parallel_leader_participation = off. I'm testing another version that is a lot simpler: like v10, it relies on the knowledge that the leader's transaction will always end after the workers have finished, but it handles the RO_SAFE optimisation by keeping the SERIALIZABLEXACT alive but freeing its locks etc. More soon.
On Sat, Feb 24, 2018 at 12:04 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I'm testing another version that is a lot simpler: like v10, it relies > on the knowledge that the leader's transaction will always end after > the workers have finished, but it handles the RO_SAFE optimisation by > keeping the SERIALIZABLEXACT alive but freeing its locks etc. More > soon. I've now broken it into two patches. Patch 0001 is like my original patch with some minor improvements, except that it now disables the RO_SAFE optimisation completely in parallel mode. In other words, it's the stupidest fix possible to the problem you flagged up. I think the main questions to answer about the 0001 patch are whether this new locking protocol is sufficient, whether anything bad could happen as a result of lock escalation/transfer, and whether the underlying assumption about the SERIALIZABLEXACT's lifetime holds true (that the leader will never call ReleasePredicateLocks() while a worker is still running). There are a couple of easy incremental improvements that could be made on top of that patch, but I didn't make them because I'm trying to be conservative in the hope of landing at least the basic feature in PostgreSQL 11. Namely: 1. We could still return false if we see SXACT_FLAG_RO_SAFE in SerializationNeededForRead() (we just couldn't call ReleasePredicateLocks()). 2. We could set MySerializableXact to InvalidSerializableXact in worker backends so at least they'd benefit from the optimisation (we just couldn't do that in the leader or it'd leak resources). Patch 0002 aims a bit higher than those ideas. I wanted to make sure that the leader wouldn't arbitrarily miss out on the optimisation, and I also suspect that the optimisation might be contagious in the sense that actually releasing sooner might cause the RO_SAFE flag to be set on *other* transactions sooner. Patch 0002 works like this: The first backend to observe the RO_SAFE flag 'partially releases' the SERIALIZABLEXACT, so that the SERIALIZABLEXACT itself remains valid. (The concept of 'partial release' already existed, but I'm using it in a new way.) All backends clear their MySerializableXact variable so that they drop to faster SI in their own time. The leader keeps a copy of it in SavedSerializableXact, so that it can fully release it at the end of the transaction when we know that no other backend has a reference to it. These patches survive hammering with a simple test that generates a mixture of read only and read write parallel queries that hit the interesting case (attached; this test helped me understand that the refcount scheme I considered was going to be hard). I haven't personally tried to measure the value of the optimisation (though I'm pretty sure it exists, based on the VLDB paper and the knowledge that REPEATABLE READ (what the optimisation effectively gives you) just has to be faster than SERIALIZABLE 'cause I've seen all that code you get to not run!). I'd like to propose the 0001 patch for now, but keep the 0002 patch back for a bit as it's very new and needs more feedback, if possible from Kevin and others involved in the SSI project. Of course their input on the 0001 patch is also super welcome. -- Thomas Munro http://www.enterprisedb.com
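Concretely, the effect of 0001 on SerializationNeededForRead() is roughly the following (a sketch, not the patch text):

    /*
     * Sketch: in parallel mode, don't act on SXACT_FLAG_RO_SAFE at all,
     * because ReleasePredicateLocks() would tear down a SERIALIZABLEXACT
     * that other backends in the same transaction still point to.
     * Correct, just slower.
     */
    if (!IsInParallelMode() && SxactIsROSafe(MySerializableXact))
    {
        ReleasePredicateLocks(false);
        return false;
    }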
Attachment
On Mon, Feb 26, 2018 at 6:37 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I've now broken it into two patches. Rebased. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Wed, Feb 28, 2018 at 11:35 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Mon, Feb 26, 2018 at 6:37 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> I've now broken it into two patches. > > Rebased.

+SerializableXactHandle
+ShareSerializableXact(void)
+{
+    Assert(!IsParallelWorker());
+
+    return MySerializableXact;
+}

Uh, how's that OK? There's no rule that you can't create a ParallelContext in a worker. Parallel query currently doesn't, so it probably won't happen, but burying an assertion to that effect in the predicate locking code doesn't seem nice. Is "sxact" really the best (i.e. clearest) name we can come up with for the lock tranche? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 8, 2018 at 10:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: > +SerializableXactHandle > +ShareSerializableXact(void) > +{ > + Assert(!IsParallelWorker()); > + > + return MySerializableXact; > +} > > Uh, how's that OK? There's no rule that you can't create a > ParallelContext in a worker. Parallel query currently doesn't, so it > probably won't happen, but burying an assertion to that effect in the > predicate locking code doesn't seem nice. Hmm. I suppose you could have a PARALLEL SAFE function that itself launches parallel workers explicitly (not via parallel query), and they should inherit the same SERIALIZABLEXACT from their parent and that should all just work. > Is "sxact" really the best (i.e. clearest) name we can come up with > for the lock tranche? Yeah, needs a better name. I have some lingering uncertainty about this patch and we're out of time, so I moved it to PG12 CF1. Thanks Haribabu, Robert, Amit for the reviews and comments so far. -- Thomas Munro http://www.enterprisedb.com
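As an illustration only, the worker-side counterpart of ShareSerializableXact() could be as simple as adopting the leader's pointer, since the predicate lock table lives in shared memory. The AttachSerializableXact() name below is an assumption (whatever the parallel machinery calls when it restores worker state), as is the reuse of predicate.c's MyXactDidWrite flag:

/*
 * Sketch only: adopt the leader's SERIALIZABLEXACT in a parallel worker.
 */
void
AttachSerializableXact(SerializableXactHandle handle)
{
    Assert(MySerializableXact == InvalidSerializableXact);

    MySerializableXact = (SERIALIZABLEXACT *) handle;
    if (MySerializableXact != InvalidSerializableXact)
        MyXactDidWrite = false;     /* this backend hasn't written anything yet */
}

With that shape, dropping the Assert(!IsParallelWorker()) from ShareSerializableXact() would let a worker that launches its own workers simply pass the same pointer along, matching the "should all just work" expectation above.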
On Fri, Mar 30, 2018 at 2:56 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Mar 8, 2018 at 10:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> +SerializableXactHandle >> +ShareSerializableXact(void) >> +{ >> + Assert(!IsParallelWorker()); >> + >> + return MySerializableXact; >> +} >> >> Uh, how's that OK? There's no rule that you can't create a >> ParallelContext in a worker. Parallel query currently doesn't, so it >> probably won't happen, but burying an assertion to that effect in the >> predicate locking code doesn't seem nice. > > Hmm. I suppose you could have a PARALLEL SAFE function that itself > launches parallel workers explicitly (not via parallel query), and > they should inherit the same SERIALIZABLEXACT from their parent and > that should all just work. > >> Is "sxact" really the best (i.e. clearest) name we can come up with >> for the lock tranche? > > Yeah, needs a better name. > > I have some lingering uncertainty about this patch and we're out of > time, so I moved it to PG12 CF1. Thanks Haribabu, Robert, Amit for > the reviews and comments so far. > I'd like to test and review this patches but they seem to conflict with current HEAD. Could you please rebase them? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I'd like to test and review this patches but they seem to conflict > with current HEAD. Could you please rebase them? Hi Sawada-san, Thanks! Rebased and attached. The only changes are: the LWLock tranche is now shown as "serializable_xact" instead of "sxact" (hmm, LWLock tranches have lower_case_names_with_underscores, but individual LWLocks have CamelCaseName...), and ShareSerializableXact() no longer does Assert(!IsParallelWorker()). These changes are based on the last feedback from Robert. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Fri, Jun 29, 2018 at 7:28 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> I'd like to test and review this patches but they seem to conflict >> with current HEAD. Could you please rebase them? > > Hi Sawada-san, > > Thanks! Rebased and attached. The only changes are: the LWLock > tranche is now shown as "serializable_xact" instead of "sxact" (hmm, > LWLock tranches have lower_case_names_with_underscores, but individual > LWLocks have CamelCaseName...), and ShareSerializableXact() no longer > does Assert(!IsParallelWorker()). These changes are based on the last > feedback from Robert. > Thank you! Will look at patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jul 2, 2018 at 3:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Jun 29, 2018 at 7:28 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> I'd like to test and review this patches but they seem to conflict >>> with current HEAD. Could you please rebase them? >> >> Hi Sawada-san, >> >> Thanks! Rebased and attached. The only changes are: the LWLock >> tranche is now shown as "serializable_xact" instead of "sxact" (hmm, >> LWLock tranches have lower_case_names_with_underscores, but individual >> LWLocks have CamelCaseName...), and ShareSerializableXact() no longer >> does Assert(!IsParallelWorker()). These changes are based on the last >> feedback from Robert. >> > > Thank you! Will look at patches. > I looked at this patches. The latest patch can build without any errors and warnings and pass all regression tests. I don't see critical bugs but there are random comments. + /* + * If the leader in a parallel query earler stashed a partially + * released SERIALIZABLEXACT for final clean-up at end of transaction + * (because workers might still have been accessing it), then it's + * time to restore it. + */ There is a typo. s/earler/earlier/ ---- Should we add test to check if write-skew[1] anomaly doesn't happen even in parallel mode? ---- - * We aren't acquiring lightweight locks for the predicate lock or lock + * We acquire an LWLock in the case of parallel mode, because worker + * backends have access to the leader's SERIALIZABLEXACT. Otherwise, + * we aren't acquiring lightweight locks for the predicate lock or lock There are LWLock and lightweight lock. Maybe it's better to unify the spelling. ---- @@ -2231,9 +2234,12 @@ PrepareTransaction(void) /* * Mark serializable transaction as complete for predicate locking * purposes. This should be done as late as we can put it and still allow - * errors to be raised for failure patterns found at commit. + * errors to be raised for failure patterns found at commit. This is not + * appropriate for parallel workers however, because we aren't committing + * the leader's transaction and its serializable state will live on. */ - PreCommit_CheckForSerializationFailure(); + if (!IsParallelWorker()) + PreCommit_CheckForSerializationFailure(); This code assumes that parallel workers could prepare transaction. Is that expected behavior of parallel query? There is an assertion !IsInParallelMode() at the beginning of that function though. ---- + /* + * If the transaction is committing, but it has been partially released + * already, then treat this as a roll back. It was marked as rolled back. + */ + if (isCommit && SxactIsPartiallyReleased(MySerializableXact)) + isCommit = false; + Isn't it better to add an assertion to check if MySerializableXact->flags has SXACT_FLAG_ROLLED_BACK flag for safety? [1] https://en.wikipedia.org/wiki/Snapshot_isolation#Definition Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
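The assertion being requested could be as small as the following sketch inside ReleasePredicateLocks(), assuming predicate.c's existing SxactIsRolledBack() macro; the expectation is that the partial-release path has already marked the SERIALIZABLEXACT as rolled back:

    /*
     * If the transaction is committing, but it has been partially released
     * already, then treat this as a rollback; the earlier release should
     * already have marked the SERIALIZABLEXACT as rolled back.
     */
    if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
    {
        Assert(SxactIsRolledBack(MySerializableXact));
        isCommit = false;
    }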
After reviewing the thread and the current two patches, I agree with Masahiko Sawada plus preferring one adjustment to the coding: I would prefer to break out the majority of the ReleasePredicateLocks function to a static ReleasePredicateLocksMain (or similar) function and eliminating the goto. The optimization in patch 0002 is important. Previous benchmarks showed a fairly straightforward pgbench test scaled as well as REPEATABLE READ when it was present, but performance fell off up to 20% as the scale increased without it. I will spend a few more days in testing and review, but figured I should pass along "first impressions" now. On Tue, Jul 10, 2018 at 8:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Jul 2, 2018 at 3:12 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Jun 29, 2018 at 7:28 PM, Thomas Munro > > <thomas.munro@enterprisedb.com> wrote: > >> On Thu, Jun 28, 2018 at 7:55 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>> I'd like to test and review this patches but they seem to conflict > >>> with current HEAD. Could you please rebase them? > >> > >> Hi Sawada-san, > >> > >> Thanks! Rebased and attached. The only changes are: the LWLock > >> tranche is now shown as "serializable_xact" instead of "sxact" (hmm, > >> LWLock tranches have lower_case_names_with_underscores, but individual > >> LWLocks have CamelCaseName...), and ShareSerializableXact() no longer > >> does Assert(!IsParallelWorker()). These changes are based on the last > >> feedback from Robert. > >> > > > > Thank you! Will look at patches. > > > > I looked at this patches. The latest patch can build without any > errors and warnings and pass all regression tests. I don't see > critical bugs but there are random comments. > > + /* > + * If the leader in a parallel query earler stashed a partially > + * released SERIALIZABLEXACT for final clean-up at end > of transaction > + * (because workers might still have been accessing > it), then it's > + * time to restore it. > + */ > > There is a typo. > s/earler/earlier/ > > ---- > Should we add test to check if write-skew[1] anomaly doesn't happen > even in parallel mode? > > ---- > - * We aren't acquiring lightweight locks for the predicate lock or lock > + * We acquire an LWLock in the case of parallel mode, because worker > + * backends have access to the leader's SERIALIZABLEXACT. Otherwise, > + * we aren't acquiring lightweight locks for the predicate lock or lock > > There are LWLock and lightweight lock. Maybe it's better to unify the spelling. > > ---- > @@ -2231,9 +2234,12 @@ PrepareTransaction(void) > /* > * Mark serializable transaction as complete for predicate locking > * purposes. This should be done as late as we can put it and > still allow > - * errors to be raised for failure patterns found at commit. > + * errors to be raised for failure patterns found at commit. > This is not > + * appropriate for parallel workers however, because we aren't > committing > + * the leader's transaction and its serializable state will live on. > */ > - PreCommit_CheckForSerializationFailure(); > + if (!IsParallelWorker()) > + PreCommit_CheckForSerializationFailure(); > > This code assumes that parallel workers could prepare transaction. Is > that expected behavior of parallel query? There is an assertion > !IsInParallelMode() at the beginning of that function though. > > ---- > + /* > + * If the transaction is committing, but it has been partially released > + * already, then treat this as a roll back. It was marked as rolled back. 
> + */ > + if (isCommit && SxactIsPartiallyReleased(MySerializableXact)) > + isCommit = false; > + > > Isn't it better to add an assertion to check if > MySerializableXact->flags has SXACT_FLAG_ROLLED_BACK flag for safety? > > [1] https://en.wikipedia.org/wiki/Snapshot_isolation#Definition > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
On Wed, Sep 19, 2018 at 04:50:40PM -0500, Kevin Grittner wrote: > I will spend a few more days in testing and review, but figured I > should pass along "first impressions" now. Kevin, it seems that this patch is waiting on your input. I have moved it to the next CF for now. -- Michael
Attachment
On Thu, Sep 20, 2018 at 9:50 AM Kevin Grittner <kgrittn@gmail.com> wrote: > On Tue, Jul 10, 2018 at 8:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I looked at this patches. The latest patch can build without any > > errors and warnings and pass all regression tests. I don't see > > critical bugs but there are random comments. Thanks for the review! And sorry for my delayed response. Here is a rebased patch, with changes as requested. I have replies also for Kevin, see further down. > > + /* > > + * If the leader in a parallel query earler stashed a partially > > + * released SERIALIZABLEXACT for final clean-up at end > > of transaction > > + * (because workers might still have been accessing > > it), then it's > > + * time to restore it. > > + */ > > > > There is a typo. > > s/earler/earlier/ Fixed. > > ---- > > Should we add test to check if write-skew[1] anomaly doesn't happen > > even in parallel mode? I suppose we could find another one of the existing specs that shows write-skew and adapt it to run a read-only part of the transaction in a parallel worker, but what would it prove that the proposed new test doesn't prove already? > > ---- > > - * We aren't acquiring lightweight locks for the predicate lock or lock > > + * We acquire an LWLock in the case of parallel mode, because worker > > + * backends have access to the leader's SERIALIZABLEXACT. Otherwise, > > + * we aren't acquiring lightweight locks for the predicate lock or lock > > > > There are LWLock and lightweight lock. Maybe it's better to unify the spelling. Done. > > ---- > > @@ -2231,9 +2234,12 @@ PrepareTransaction(void) > > /* > > * Mark serializable transaction as complete for predicate locking > > * purposes. This should be done as late as we can put it and > > still allow > > - * errors to be raised for failure patterns found at commit. > > + * errors to be raised for failure patterns found at commit. > > This is not > > + * appropriate for parallel workers however, because we aren't > > committing > > + * the leader's transaction and its serializable state will live on. > > */ > > - PreCommit_CheckForSerializationFailure(); > > + if (!IsParallelWorker()) > > + PreCommit_CheckForSerializationFailure(); > > > > This code assumes that parallel workers could prepare transaction. Is > > that expected behavior of parallel query? There is an assertion > > !IsInParallelMode() at the beginning of that function though. You are right. I made a change exactly like this in CommitTransaction(), where it is necessary, but then somehow I managed to apply that hunk to the identical code in PrepareTransaction() also, where it is not necessary. Fixed. > > ---- > > + /* > > + * If the transaction is committing, but it has been partially released > > + * already, then treat this as a roll back. It was marked as rolled back. > > + */ > > + if (isCommit && SxactIsPartiallyReleased(MySerializableXact)) > > + isCommit = false; > > + > > > > Isn't it better to add an assertion to check if > > MySerializableXact->flags has SXACT_FLAG_ROLLED_BACK flag for safety? That can't hurt. Added. I don't really like the fact that the flag contradicts reality here... but it was already established that the read-only safe optimisation calls ReleasePredicateLocks(false) which behaves like a rollback and marks the SERIALIZABLEXACT that way. I don't have a better idea right now. 
On Thu, Sep 20, 2018 at 9:50 AM Kevin Grittner <kgrittn@gmail.com> wrote: > After reviewing the thread and the current two patches, I agree with > Masahiko Sawada plus preferring one adjustment to the coding: I would > prefer to break out the majority of the ReleasePredicateLocks function > to a static ReleasePredicateLocksMain (or similar) function and > eliminating the goto. Hi Kevin, Thanks for the review. How about moving that bit of local-cleanup code from the end of the function into a new static function ReleasePredicateLocksLocal(), so that we can call it and then return, instead of the evil "goto"? Done that way in the attached. > The optimization in patch 0002 is important. Previous benchmarks > showed a fairly straightforward pgbench test scaled as well as > REPEATABLE READ when it was present, but performance fell off up to > 20% as the scale increased without it. +1 > I will spend a few more days in testing and review, but figured I > should pass along "first impressions" now. Thanks! -- Thomas Munro http://www.enterprisedb.com
Attachment
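A rough sketch of that shape (not necessarily the exact code in the attached patches): the backend-local cleanup moves into its own helper, and the early-exit paths call it and return instead of jumping to a label. LocalPredicateLockHash, MySerializableXact and MyXactDidWrite are predicate.c's existing backend-local state:

/*
 * Sketch: clean up only this backend's local state, leaving the shared
 * SERIALIZABLEXACT alone.
 */
static void
ReleasePredicateLocksLocal(void)
{
    MySerializableXact = InvalidSerializableXact;
    MyXactDidWrite = false;

    /* Drop the per-transaction, backend-local lock table. */
    if (LocalPredicateLockHash != NULL)
    {
        hash_destroy(LocalPredicateLockHash);
        LocalPredicateLockHash = NULL;
    }
}

An early-exit branch in ReleasePredicateLocks() can then finish with "ReleasePredicateLocksLocal(); return;" rather than the goto.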
On Tue, Oct 2, 2018 at 4:53 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Thanks for the review! And sorry for my delayed response. Here is a > rebased patch, with changes as requested. Rebased. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Mon, Oct 8, 2018 at 9:40 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Rebased. It applies and builds clean, it passed make world with cassert and TAP tests, and I can't see any remaining flaws. This is true both of just the 0001 v16 patch and that with 0002 v16 applied on top of it. It would be great if someone with a big test machine could stress test and benchmark this versus current production versions. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
On Thu, Oct 11, 2018 at 10:15 AM Kevin Grittner <kgrittn@gmail.com> wrote: > It applies and builds clean, it passed make world with cassert and TAP > tests, and I can't see any remaining flaws. This is true both of just > the 0001 v16 patch and that with 0002 v16 applied on top of it. Thanks. I'd like to commit this soon. > It would be great if someone with a big test machine could stress test > and benchmark this versus current production versions. Hmm. I can't compare it with current production versions directly since SERIALIZABLE + parallel query wasn't possible before. I could compare it against lower isolation levels or non-parallel query, but those tests don't seem to tell us anything we don't already know: SERIALIZABLE slows some stuff down, parallel query speeds some stuff up. As for stress-testing, most benchmarks are either good for testing parallelism (TPC-H etc) but don't do any writes, or good for testing writes (TPC-B etc) but don't do any parallelism. I'm going to experiment with the "SIBENCH" approach from the Cahill paper and see where that leads. -- Thomas Munro https://enterprisedb.com
On Mon, Mar 4, 2019 at 10:17 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Thu, Oct 11, 2018 at 10:15 AM Kevin Grittner <kgrittn@gmail.com> wrote: > > It applies and builds clean, it passed make world with cassert and TAP > > tests, and I can't see any remaining flaws. This is true both of just > > the 0001 v16 patch and that with 0002 v16 applied on top of it. > > Thanks. I'd like to commit this soon. I did a round of testing under load and some printf-debugging to convince myself that the SXACT_FLAG_RO_SAFE handling really is exercised by serializable-parallel-2.spec and behaves as expected, along with some more testing by hand, and then pushed this. To generate load I used a knock-off of sibench[1], run as e.g. ./petit-sibench --rows 10000 --threads 8 --ssi, against a server running with -c min_parallel_table_scan_size=128kB -c parallel_setup_cost=0 -c max_worker_processes=16 -c max_parallel_workers=16. [1] https://github.com/macdice/petit-sibench/blob/master/petit-sibench.c -- Thomas Munro https://enterprisedb.com