Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl |
Date | |
Msg-id | 22842.1455474783@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel
processes from deadl
Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > First, the overall concept here is that processes can either be a > member of a lock group or a member of no lock group. Check. > Second, I can review the data structure changes and the associated > invariants. The data structure additions seemed relatively straightforward, though I did wonder why you added groupLeader to PROCLOCK. Why is that not redundant with tag.myProc->lockGroupLeader? If it isn't redundant, it's inadequately documented. If it is, seems better to lose it. Also, isn't the comment on this PGPROC field incorrect: PGPROC *lockGroupLeader; /* lock group leader, if I'm a follower */ ie shouldn't you s/follower/member/ ? > ... the lock manager lock that protects the fields in a > given PGPROC is chosen by taking pgprocno modulo the number of lock > manager partitions. pgprocno of the specific PGPROC, or of the group leader? If it's the former, I'm pretty suspicious that there are deadlock and/or linked-list-corruption hazards here. If it's the latter, at least the comments around this are misleading. > Each PROCLOCK now has a new groupLeader flag. While a PGPROC's > lockGroupLeader may be NULL if that process is not involved in group > locking, a PROCLOCK's groupLeader always has a non-NULL value; it > points back to the PGPROC that acquired the lock. This is not what the comment on it says; and your prose explanation here implies that it should actually be equal to tag.myProc, or else you are using some definition of "acquired the lock" that I don't follow at all. There could be lots of PGPROCs that have acquired a lock in one sense or another; which one do you mean? > With respect to pg_locks - and for that matter also pg_stat_activity - > I think you are right that improvement is needed. This is really the core of my concern at the moment. I think that isolationtester is probably outright broken for any situation where the queries-under-test are being parallel executed, and so will be any other client that's trying to identify who blocks whom from pg_locks. > The simplest thing we could do to make that easier is, in > pg_stat_activity, have parallel workers advertise the PID that > launched them in a new field; and in pg_locks, have members of a lock > group advertise the leader's PID in a new field. That would be simple for us, but it would break every existing client-side query that tries to identify blockers/blockees; and not only would those queries need work but they would become very substantially more complex and slower (probably at least 4-way joins not 2-way). We already know that isolationtester's query has performance problems in the buildfarm. I think more thought is needed here, and that this area should be considered a release blocker. It's certainly a blocker for any thought of turning parallel query on by default. > I hope this helps and please let me know what more you think I should do. At the least you need to make another pass over the comments and README addition. I highlighted above a few critical inaccuracies, and it's not hard to find a lot of less-critical typos in the comments in that commit. Transposing some of what you've written here into the README would likely be a good thing too. regards, tom lane
pgsql-hackers by date: