Thread: [HACKERS] Moving relation extension locks out of heavyweight lock manager
[HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
Hi all, Currently, the relation extension lock is implemented using the heavyweight lock manager, and almost all functions (except for brin_page_cleanup) that use LockRelationForExtension take it in ExclusiveLock mode. But it doesn't actually need multiple lock modes, deadlock detection, or any of the other functionality that the heavyweight lock manager provides. I think it's enough to use something like LWLock. So I'd like to propose changing relation extension lock management so that it works using LWLocks instead. The attached draft patch makes relation extension locks use LWLocks rather than the heavyweight lock manager, backed by a shared hash table storing the relation extension lock information. The basic idea is that we add a hash table in shared memory for relation extension locks, where each hash entry is an LWLock struct. Whenever a process wants to acquire a relation extension lock, it looks up the appropriate LWLock entry in the hash table and acquires it. The process can remove a hash entry when unlocking it if nobody else is holding or waiting for it. This work would be helpful not only for existing workloads but also for future work such as parallel utility commands, which are discussed on other threads[1]. At least for parallel vacuum, this feature helps solve an issue that the implementation of parallel vacuum has. I ran pgbench for 10 min three times (scale factor 5000); here is the performance measurement result.
clients TPS(HEAD) TPS(Patched)
4 2092.612 2031.277
8 3153.732 3046.789
16 4562.072 4625.419
32 6439.391 6479.526
64 7767.364 7779.636
100 7917.173 7906.567
* 16 core Xeon E5620 2.4GHz
* 32 GB RAM
* ioDrive
With the current implementation, it seems there is no performance degradation so far. Please give me feedback.
[1]
* Block level parallel vacuum WIP <https://www.postgresql.org/message-id/CAD21AoD1xAqp4zK-Vi1cuY3feq2oO8HcpJiz32UDUfe0BE31Xw%40mail.gmail.com>
* CREATE TABLE with parallel workers, 10.0? <https://www.postgresql.org/message-id/CAFBoRzeoDdjbPV4riCE%2B2ApV%2BY8nV4HDepYUGftm5SuKWna3rQ%40mail.gmail.com>
* utility commands benefiting from parallel plan <https://www.postgresql.org/message-id/CAJrrPGcY3SZa40vU%2BR8d8dunXp9JRcFyjmPn2RF9_4cxjHd7uA%40mail.gmail.com>
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
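To make the idea above concrete, here is a minimal C sketch of a shared hash table whose entries carry an LWLock, keyed by relation. All names (RelExtLockTag, RelExtLockEntry, RelExtLockHash, RelExtLockAcquireSketch) and the tranche choice are hypothetical and are not taken from the attached patch; the shared-memory setup, partition locking, and entry removal are omitted.

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/hsearch.h"

/* Hash key: which relation is being extended. */
typedef struct RelExtLockTag
{
	Oid			dbOid;
	Oid			relOid;
} RelExtLockTag;

/* One entry per relation currently being extended. */
typedef struct RelExtLockEntry
{
	RelExtLockTag tag;			/* hash key, must be first */
	LWLock		lock;			/* taken in exclusive mode to extend */
	int			nholders;		/* entry can be removed when this drops to 0 */
} RelExtLockEntry;

static HTAB *RelExtLockHash;	/* shared-memory hash table, created at startup */

static void
RelExtLockAcquireSketch(Oid dbOid, Oid relOid)
{
	RelExtLockTag tag;
	RelExtLockEntry *entry;
	bool		found;

	tag.dbOid = dbOid;
	tag.relOid = relOid;

	/* The real code must hold a partition lock while touching the table. */
	entry = (RelExtLockEntry *) hash_search(RelExtLockHash, &tag,
											HASH_ENTER, &found);
	if (!found)
	{
		/* hypothetical tranche id; a real patch would register its own tranche */
		LWLockInitialize(&entry->lock, LWTRANCHE_FIRST_USER_DEFINED);
		entry->nholders = 0;
	}
	entry->nholders++;

	LWLockAcquire(&entry->lock, LW_EXCLUSIVE);
}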
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Currently, the relation extension lock is implemented using > heavyweight lock manager and almost functions (except for > brin_page_cleanup) using LockRelationForExntesion use it with > ExclusiveLock mode. But actually it doesn't need multiple lock modes > or deadlock detection or any of the other functionality that the > heavyweight lock manager provides. I think It's enough to use > something like LWLock. So I'd like to propose to change relation > extension lock management so that it works using LWLock instead. That's not a good idea because it'll make the code that executes while holding that lock noninterruptible. Possibly something based on condition variables would work better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> ... I'd like to propose to change relation >> extension lock management so that it works using LWLock instead. > That's not a good idea because it'll make the code that executes while > holding that lock noninterruptible. Is that really a problem? We typically only hold it over one kernel call, which ought to be noninterruptible anyway. Also, the CheckpointLock is held for far longer, and we've not heard complaints about that one. I'm slightly suspicious of the claim that we don't need deadlock detection. There are places that e.g. touch FSM while holding this lock. It might be all right but it needs close review, not just an assertion that it's not a problem. regards, tom lane
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Thu, May 11, 2017 at 6:09 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > This work would be helpful not only for existing workload but also > future works like some parallel utility commands, which is discussed > on other threads[1]. At least for parallel vacuum, this feature helps > to solve issue that the implementation of parallel vacuum has. > > I ran pgbench for 10 min three times(scale factor is 5000), here is a > performance measurement result. > > clients TPS(HEAD) TPS(Patched) > 4 2092.612 2031.277 > 8 3153.732 3046.789 > 16 4562.072 4625.419 > 32 6439.391 6479.526 > 64 7767.364 7779.636 > 100 7917.173 7906.567 > > * 16 core Xeon E5620 2.4GHz > * 32 GB RAM > * ioDrive > > In current implementation, it seems there is no performance degradation so far. > I think it is good to check pgbench, but we should also test bulk loads, as this lock is stressed under such workloads. Some of the tests we did when improving bulk-load performance can be found in an e-mail [1]. [1] - https://www.postgresql.org/message-id/CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Fri, May 12, 2017 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> ... I'd like to propose to change relation >>> extension lock management so that it works using LWLock instead. > >> That's not a good idea because it'll make the code that executes while >> holding that lock noninterruptible. > > Is that really a problem? We typically only hold it over one kernel call, > which ought to be noninterruptible anyway. > During parallel bulk load operations, I think we hold it over multiple kernel calls. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Sat, May 13, 2017 at 8:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, May 11, 2017 at 6:09 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> This work would be helpful not only for existing workload but also >> future works like some parallel utility commands, which is discussed >> on other threads[1]. At least for parallel vacuum, this feature helps >> to solve issue that the implementation of parallel vacuum has. >> >> I ran pgbench for 10 min three times(scale factor is 5000), here is a >> performance measurement result. >> >> clients TPS(HEAD) TPS(Patched) >> 4 2092.612 2031.277 >> 8 3153.732 3046.789 >> 16 4562.072 4625.419 >> 32 6439.391 6479.526 >> 64 7767.364 7779.636 >> 100 7917.173 7906.567 >> >> * 16 core Xeon E5620 2.4GHz >> * 32 GB RAM >> * ioDrive >> >> In current implementation, it seems there is no performance degradation so far. >> > > I think it is good to check pgbench, but we should do tests of the > bulk load as this lock is stressed during such a workload. Some of > the tests we have done when we have improved the performance of bulk > load can be found in an e-mail [1]. > Thank you for sharing. I've run measurements using the two test scripts attached to that thread. Here are the results.
* Copy test script
Client HEAD Patched
4 452.60 455.53
8 561.74 561.09
16 592.50 592.21
32 602.53 599.53
64 605.01 606.42
* Insert test script
Client HEAD Patched
4 159.04 158.44
8 169.41 169.69
16 177.11 178.14
32 182.14 181.99
64 182.11 182.73
It seems there is no performance degradation so far. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Sat, May 13, 2017 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, May 12, 2017 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> ... I'd like to propose to change relation >>>> extension lock management so that it works using LWLock instead. >> >>> That's not a good idea because it'll make the code that executes while >>> holding that lock noninterruptible. >> >> Is that really a problem? We typically only hold it over one kernel call, >> which ought to be noninterruptible anyway. > > During parallel bulk load operations, I think we hold it over multiple > kernel calls. We do. Also, RelationGetNumberOfBlocks() is not necessarily only one kernel call, no? Nor is vm_extend. Also, it's not just the backend doing the filesystem operation that's non-interruptible, but also any waiters, right? Maybe this isn't a big problem, but it does seem to me that it would be better to avoid it if we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, May 17, 2017 at 1:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, May 13, 2017 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Fri, May 12, 2017 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>> ... I'd like to propose to change relation >>>>> extension lock management so that it works using LWLock instead. >>> >>>> That's not a good idea because it'll make the code that executes while >>>> holding that lock noninterruptible. >>> >>> Is that really a problem? We typically only hold it over one kernel call, >>> which ought to be noninterruptible anyway. >> >> During parallel bulk load operations, I think we hold it over multiple >> kernel calls. > > We do. Also, RelationGetNumberOfBlocks() is not necessarily only one > kernel call, no? Nor is vm_extend. Yeah, these functions could make more than one kernel call while holding the extension lock. > Also, it's not just the backend doing the filesystem operation that's > non-interruptible, but also any waiters, right? > > Maybe this isn't a big problem, but it does seem to be that it would > be better to avoid it if we can. > I agree with changing it to be interruptible, for more safety. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, May 17, 2017 at 1:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Fri, May 12, 2017 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Robert Haas <robertmhaas@gmail.com> writes: >>>>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>>> ... I'd like to propose to change relation >>>>>> extension lock management so that it works using LWLock instead. >>>> >>>>> That's not a good idea because it'll make the code that executes while >>>>> holding that lock noninterruptible. >>>> >>>> Is that really a problem? We typically only hold it over one kernel call, >>>> which ought to be noninterruptible anyway. >>> >>> During parallel bulk load operations, I think we hold it over multiple >>> kernel calls. >> >> We do. Also, RelationGetNumberOfBlocks() is not necessarily only one >> kernel call, no? Nor is vm_extend. > > Yeah, these functions could call more than one kernel calls while > holding extension lock. > >> Also, it's not just the backend doing the filesystem operation that's >> non-interruptible, but also any waiters, right? >> >> Maybe this isn't a big problem, but it does seem to be that it would >> be better to avoid it if we can. >> > > I agree to change it to be interruptible for more safety. > Attached is an updated version of the patch. To use a lock mechanism that is similar to LWLock but interruptible, I introduced a new lock manager for extension locks. A lot of the code, especially locking and unlocking, is inspired by LWLock, but it uses condition variables to wait for the lock. The other parts are unchanged from the previous patch. This is still a PoC patch and lacks documentation. The following is the measurement result with the same test scripts I used before.
* Copy test script
HEAD Patched
4 436.6 436.1
8 561.8 561.8
16 580.7 579.4
32 588.5 597.0
64 596.1 599.0
* Insert test script
HEAD Patched
4 156.5 156.0
8 167.0 167.9
16 176.2 175.6
32 181.1 181.0
64 181.5 183.0
Since I replaced the heavyweight lock with a lightweight lock, I expected performance to improve slightly over HEAD, but the result was almost the same. I'll continue to look into it in more detail. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
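The interruptible waiting described above can be sketched with an atomic lock word plus a condition variable. The names below (RelExtLock, RELEXT_LOCK_BIT, the *Sketch functions) and the wait event are placeholders and are not taken from the attached patch; how the RelExtLock is found in shared memory is left out.

#include "postgres.h"
#include "pgstat.h"
#include "port/atomics.h"
#include "storage/condition_variable.h"

#define RELEXT_LOCK_BIT ((uint32) 1)	/* bit 0 set => lock is held */

typedef struct RelExtLock
{
	pg_atomic_uint32 state;		/* lock word, manipulated with atomics */
	ConditionVariable cv;		/* waiters sleep here, interruptibly */
} RelExtLock;

static void
RelExtLockAcquireSketch(RelExtLock *lock)
{
	ConditionVariablePrepareToSleep(&lock->cv);
	for (;;)
	{
		uint32		expected = 0;

		/* Try to flip the lock bit from 0 to 1 atomically. */
		if (pg_atomic_compare_exchange_u32(&lock->state, &expected,
										   RELEXT_LOCK_BIT))
			break;

		/*
		 * Someone else holds the lock.  Unlike waiting inside LWLockAcquire,
		 * ConditionVariableSleep() services interrupts while we wait.
		 * WAIT_EVENT_PG_SLEEP is just a placeholder wait event here.
		 */
		ConditionVariableSleep(&lock->cv, WAIT_EVENT_PG_SLEEP);
	}
	ConditionVariableCancelSleep();
}

static void
RelExtLockReleaseSketch(RelExtLock *lock)
{
	pg_atomic_write_u32(&lock->state, 0);
	ConditionVariableBroadcast(&lock->cv);	/* wake anyone waiting to extend */
}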
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Thu, Jun 22, 2017 at 12:03 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Wed, May 17, 2017 at 1:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> On Fri, May 12, 2017 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> Robert Haas <robertmhaas@gmail.com> writes: >>>>>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>>>> ... I'd like to propose to change relation >>>>>>> extension lock management so that it works using LWLock instead. >>>>> >>>>>> That's not a good idea because it'll make the code that executes while >>>>>> holding that lock noninterruptible. >>>>> >>>>> Is that really a problem? We typically only hold it over one kernel call, >>>>> which ought to be noninterruptible anyway. >>>> >>>> During parallel bulk load operations, I think we hold it over multiple >>>> kernel calls. >>> >>> We do. Also, RelationGetNumberOfBlocks() is not necessarily only one >>> kernel call, no? Nor is vm_extend. >> >> Yeah, these functions could call more than one kernel calls while >> holding extension lock. >> >>> Also, it's not just the backend doing the filesystem operation that's >>> non-interruptible, but also any waiters, right? >>> >>> Maybe this isn't a big problem, but it does seem to be that it would >>> be better to avoid it if we can. >>> >> >> I agree to change it to be interruptible for more safety. >> > > Attached updated version patch. To use the lock mechanism similar to > LWLock but interrupt-able, I introduced new lock manager for extension > lock. A lot of code especially locking and unlocking, is inspired by > LWLock but it uses the condition variables to wait for acquiring lock. > Other part is not changed from previous patch. This is still a PoC > patch, lacks documentation. The following is the measurement result > with test script same as I used before. > > * Copy test script > HEAD Patched > 4 436.6 436.1 > 8 561.8 561.8 > 16 580.7 579.4 > 32 588.5 597.0 > 64 596.1 599.0 > > * Insert test script > HEAD Patched > 4 156.5 156.0 > 8 167.0 167.9 > 16 176.2 175.6 > 32 181.1 181.0 > 64 181.5 183.0 > > Since I replaced heavyweight lock with lightweight lock I expected the > performance slightly improves from HEAD but it was almost same result. > I'll continue to look at more detail. > The previous patch conflicts with current HEAD, I rebased the patch to current HEAD. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Thomas Munro
Date:
On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > The previous patch conflicts with current HEAD, I rebased the patch to > current HEAD. Hi Masahiko-san, FYI this doesn't build anymore. I think it's just because the wait event enumerators were re-alphabetised in pgstat.h: ../../../../src/include/pgstat.h:820:2: error: redeclaration of enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ WAIT_EVENT_LOGICAL_SYNC_DATA, ^ ../../../../src/include/pgstat.h:806:2: note: previous definition of ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here WAIT_EVENT_LOGICAL_SYNC_DATA, ^ ../../../../src/include/pgstat.h:821:2: error: redeclaration of enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE, ^ ../../../../src/include/pgstat.h:807:2: note: previous definition of ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE, ^ -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Thomas Munro
Date:
On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> The previous patch conflicts with current HEAD, I rebased the patch to >> current HEAD. > > Hi Masahiko-san, Hi Sawada-san, I have just learned from a colleague who is knowledgeable about Japanese customs and kind enough to correct me that the appropriate term of address for our colleagues in Japan on this mailing list is <lastname>-san. I was confused about that -- apologies for my clumsiness. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Sep 8, 2017 at 8:25 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> The previous patch conflicts with current HEAD, I rebased the patch to >>> current HEAD. >> >> Hi Masahiko-san, > > Hi Sawada-san, > > I have just learned from a colleague who is knowledgeable about > Japanese customs and kind enough to correct me that the appropriate > term of address for our colleagues in Japan on this mailing list is > <lastname>-san. I was confused about that -- apologies for my > clumsiness. Don't worry about it, either is ok. In Japan there is a custom of writing <lastname>-san but <firstname>-san is also not incorrect :-) (also I think it's hard to distinguish between last name and first name of Japanese name). Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> The previous patch conflicts with current HEAD, I rebased the patch to >> current HEAD. > > Hi Masahiko-san, > > FYI this doesn't build anymore. I think it's just because the wait > event enumerators were re-alphabetised in pgstat.h: > > ../../../../src/include/pgstat.h:820:2: error: redeclaration of > enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ > WAIT_EVENT_LOGICAL_SYNC_DATA, > ^ > ../../../../src/include/pgstat.h:806:2: note: previous definition of > ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here > WAIT_EVENT_LOGICAL_SYNC_DATA, > ^ > ../../../../src/include/pgstat.h:821:2: error: redeclaration of > enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ > WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE, > ^ > ../../../../src/include/pgstat.h:807:2: note: previous definition of > ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here > WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE, > ^ > Thank you for the information! Attached rebased patch. -- Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Sep 8, 2017 at 4:32 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> The previous patch conflicts with current HEAD, I rebased the patch to >>> current HEAD. >> >> Hi Masahiko-san, >> >> FYI this doesn't build anymore. I think it's just because the wait >> event enumerators were re-alphabetised in pgstat.h: >> >> ../../../../src/include/pgstat.h:820:2: error: redeclaration of >> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ >> WAIT_EVENT_LOGICAL_SYNC_DATA, >> ^ >> ../../../../src/include/pgstat.h:806:2: note: previous definition of >> ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here >> WAIT_EVENT_LOGICAL_SYNC_DATA, >> ^ >> ../../../../src/include/pgstat.h:821:2: error: redeclaration of >> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ >> WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE, >> ^ >> ../../../../src/include/pgstat.h:807:2: note: previous definition of >> ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here >> WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE, >> ^ >> > > Thank you for the information! Attached rebased patch. > Since the previous patch conflicts with current HEAD, I attached the updated patch for next CF. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Since the previous patch conflicts with current HEAD, I attached the > updated patch for next CF. I think we should back up here and ask ourselves a couple of questions: 1. What are we trying to accomplish here? 2. Is this the best way to accomplish it? To the first question, the problem as I understand it is as follows: Heavyweight locks don't conflict between members of a parallel group. However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. Currently, those cases don't arise, because parallel operations are strictly read-only (except for inserts by the leader into a just-created table, when only one member of the group can be taking the lock anyway). However, once we allow writes, they become possible, so some solution is needed. To the second question, there are a couple of ways we could fix this. First, we could continue to allow these locks to be taken in the heavyweight lock manager, but make them conflict even between members of the same lock group. This is, however, complicated. A significant problem (or so I think) is that the deadlock detector logic, which is already quite hard to test, will become even more complicated, since wait edges between members of a lock group need to exist at some times and not other times. Moreover, to the best of my knowledge, the increased complexity would have no benefit, because it doesn't look to me like we ever take any other heavyweight lock while holding one of these four kinds of locks. Therefore, no deadlock can occur: if we're waiting for one of these locks, the process that holds it is not waiting for any other heavyweight lock. This gives rise to a second idea: move these locks out of the heavyweight lock manager and handle them with separate code that does not have deadlock detection and doesn't need as many lock modes. I think that idea is basically sound, although it's possibly not the only sound idea. However, that makes me wonder whether we shouldn't be a bit more aggressive with this patch: why JUST relation extension locks? Why not all four types of locks listed above? Actually, tuple locks are a bit sticky, because they have four lock modes. The other three kinds are very similar -- all you can do is "take it" (implicitly, in exclusive mode), "try to take it" (again, implicitly, in exclusive mode), or "wait for it to be released" (i.e. share lock and then release). Another idea is to try to handle those three types and leave the tuple locking problem for another day. I suggest that a good thing to do more or less immediately, regardless of when this patch ends up being ready, would be to insert an assertion that LockAcquire() is never called while holding a lock of one of these types. If that assertion ever fails, then the whole theory that these lock types don't need deadlock detection is wrong, and we'd like to find out about that sooner or later. On the details of the patch, it appears that RelExtLockAcquire() executes the wait-for-lock code with the partition lock held, and then continues to hold the partition lock for the entire time that the relation extension lock is held. That not only makes all code that runs while holding the lock non-interruptible but makes a lot of the rest of this code pointless. How is any of this atomics code going to be reached by more than one process at the same time if the entire bucket is exclusive-locked? 
I would guess that the concurrency is not very good here for the same reason. Of course, just releasing the bucket lock wouldn't be right either, because then ext_lock might go away while we've got a pointer to it, which wouldn't be good. I think you could make this work if each lock had both a locker count and a pin count, and the object can only be removed when the pin_count is 0. So the lock algorithm would look like this: - Acquire the partition LWLock. - Find the item of interest, creating it if necessary. If out of memory for more elements, sweep through the table and reclaim 0-pin-count entries, then retry. - Increment the pin count. - Attempt to acquire the lock atomically; if we succeed, release the partition lock and return. - If this was a conditional-acquire, then decrement the pin count, release the partition lock and return. - Release the partition lock. - Sleep on the condition variable until we manage to atomically acquire the lock. The unlock algorithm would just decrement the pin count and, if the resulting value is non-zero, broadcast on the condition variable. Although I think this will work, I'm not sure this is actually a great algorithm. Every lock acquisition has to take and release the partition lock, use at least two more atomic ops (to take the pin and the lock), and search a hash table. I don't think that's going to be staggeringly fast. Maybe it's OK. It's not that much worse, possibly not any worse, than what the main lock manager does now. However, especially if we implement a solution specific to relation locks, it seems like it would be better if we could somehow optimize based on the facts that (1) many relation locks will not conflict and (2) it's very common for the same backend to take and release the same extension lock over and over again. I don't have a specific proposal right now. Whatever we end up with, I think we should write some kind of a test harness to benchmark the number of acquire/release cycles per second that we can do with the current relation extension lock system vs. the proposed new system. Ideally, we'd be faster, since we're proposing a more specialized mechanism. But at least we should not be slower. pgbench isn't a good test because the relation extension lock will barely be taken let alone contended; we need to check something like parallel copies into the same table to see any effect. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
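A rough C sketch of the acquire/release algorithm outlined above, with a pinnable hash entry, an atomic lock word, and a condition variable. All names are hypothetical; the out-of-memory reclaim sweep and the conditional-acquire variant are omitted, and the partition lock and hash table are assumed to be set up elsewhere.

#include "postgres.h"
#include "pgstat.h"
#include "port/atomics.h"
#include "storage/condition_variable.h"
#include "storage/lwlock.h"
#include "utils/hsearch.h"

typedef struct RelExtLockTag
{
	Oid			dbOid;
	Oid			relOid;
} RelExtLockTag;

typedef struct RelExtLockEntry
{
	RelExtLockTag tag;			/* hash key */
	pg_atomic_uint32 pin_count;	/* entry may be reclaimed only at zero */
	pg_atomic_uint32 lock;		/* 0 = free, 1 = held */
	ConditionVariable cv;
} RelExtLockEntry;

static HTAB *RelExtLockHash;	/* shared, partitioned hash table */

static RelExtLockEntry *
RelExtLockAcquireSketch(RelExtLockTag tag, LWLock *partition_lock)
{
	RelExtLockEntry *entry;
	bool		found;
	uint32		expected = 0;

	LWLockAcquire(partition_lock, LW_EXCLUSIVE);
	entry = (RelExtLockEntry *) hash_search(RelExtLockHash, &tag,
											HASH_ENTER, &found);
	if (!found)
	{
		pg_atomic_init_u32(&entry->pin_count, 0);
		pg_atomic_init_u32(&entry->lock, 0);
		ConditionVariableInit(&entry->cv);
	}
	/* Pin first, so the entry cannot vanish once we drop the partition lock. */
	pg_atomic_fetch_add_u32(&entry->pin_count, 1);

	if (pg_atomic_compare_exchange_u32(&entry->lock, &expected, 1))
	{
		LWLockRelease(partition_lock);
		return entry;			/* fast path: lock taken */
	}
	LWLockRelease(partition_lock);

	/* Slow path: sleep until we manage to take the lock atomically. */
	ConditionVariablePrepareToSleep(&entry->cv);
	for (;;)
	{
		expected = 0;
		if (pg_atomic_compare_exchange_u32(&entry->lock, &expected, 1))
			break;
		ConditionVariableSleep(&entry->cv, WAIT_EVENT_PG_SLEEP);	/* placeholder event */
	}
	ConditionVariableCancelSleep();
	return entry;
}

static void
RelExtLockReleaseSketch(RelExtLockEntry *entry)
{
	pg_atomic_write_u32(&entry->lock, 0);
	/* fetch_sub returns the value before subtraction */
	if (pg_atomic_fetch_sub_u32(&entry->pin_count, 1) > 1)
		ConditionVariableBroadcast(&entry->cv);	/* others are pinned: wake them */
}

The point of the pin is that the entry cannot be reclaimed between dropping the partition lock and sleeping on the condition variable, which is the window the message above is worried about.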
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Since the previous patch conflicts with current HEAD, I attached the >> updated patch for next CF. > > I think we should back up here and ask ourselves a couple of questions: Thank you for summarizing of the purpose and discussion of this patch. > 1. What are we trying to accomplish here? > > 2. Is this the best way to accomplish it? > > To the first question, the problem as I understand it as follows: > Heavyweight locks don't conflict between members of a parallel group. > However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, > LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. Currently, those cases > don't arise, because parallel operations are strictly read-only > (except for inserts by the leader into a just-created table, when only > one member of the group can be taking the lock anyway). However, once > we allow writes, they become possible, so some solution is needed. > > To the second question, there are a couple of ways we could fix this. > First, we could continue to allow these locks to be taken in the > heavyweight lock manager, but make them conflict even between members > of the same lock group. This is, however, complicated. A significant > problem (or so I think) is that the deadlock detector logic, which is > already quite hard to test, will become even more complicated, since > wait edges between members of a lock group need to exist at some times > and not other times. Moreover, to the best of my knowledge, the > increased complexity would have no benefit, because it doesn't look to > me like we ever take any other heavyweight lock while holding one of > these four kinds of locks. Therefore, no deadlock can occur: if we're > waiting for one of these locks, the process that holds it is not > waiting for any other heavyweight lock. This gives rise to a second > idea: move these locks out of the heavyweight lock manager and handle > them with separate code that does not have deadlock detection and > doesn't need as many lock modes. I think that idea is basically > sound, although it's possibly not the only sound idea. I'm on the same page. > > However, that makes me wonder whether we shouldn't be a bit more > aggressive with this patch: why JUST relation extension locks? Why > not all four types of locks listed above? Actually, tuple locks are a > bit sticky, because they have four lock modes. The other three kinds > are very similar -- all you can do is "take it" (implicitly, in > exclusive mode), "try to take it" (again, implicitly, in exclusive > mode), or "wait for it to be released" (i.e. share lock and then > release). Another idea is to try to handle those three types and > leave the tuple locking problem for another day. > > I suggest that a good thing to do more or less immediately, regardless > of when this patch ends up being ready, would be to insert an > insertion that LockAcquire() is never called while holding a lock of > one of these types. If that assertion ever fails, then the whole > theory that these lock types don't need deadlock detection is wrong, > and we'd like to find out about that sooner or later. I understood. I'll check that first. If this direction has no problem and we changed these three locks so that it uses new lock mechanism, we'll not be able to use these locks at the same time. 
Since it also means that we impose a limitation to the future we should think carefully about it. We can implement the deadlock detection mechanism for it again but it doesn't make sense. > > On the details of the patch, it appears that RelExtLockAcquire() > executes the wait-for-lock code with the partition lock held, and then > continues to hold the partition lock for the entire time that the > relation extension lock is held. That not only makes all code that > runs while holding the lock non-interruptible but makes a lot of the > rest of this code pointless. How is any of this atomics code going to > be reached by more than one process at the same time if the entire > bucket is exclusive-locked? I would guess that the concurrency is not > very good here for the same reason. Of course, just releasing the > bucket lock wouldn't be right either, because then ext_lock might go > away while we've got a pointer to it, which wouldn't be good. I think > you could make this work if each lock had both a locker count and a > pin count, and the object can only be removed when the pin_count is 0. > So the lock algorithm would look like this: > > - Acquire the partition LWLock. > - Find the item of interest, creating it if necessary. If out of > memory for more elements, sweep through the table and reclaim > 0-pin-count entries, then retry. > - Increment the pin count. > - Attempt to acquire the lock atomically; if we succeed, release the > partition lock and return. > - If this was a conditional-acquire, then decrement the pin count, > release the partition lock and return. > - Release the partition lock. > - Sleep on the condition variable until we manage to atomically > acquire the lock. > > The unlock algorithm would just decrement the pin count and, if the > resulting value is non-zero, broadcast on the condition variable. Thank you for the suggestion! > Although I think this will work, I'm not sure this is actually a great > algorithm. Every lock acquisition has to take and release the > partition lock, use at least two more atomic ops (to take the pin and > the lock), and search a hash table. I don't think that's going to be > staggeringly fast. Maybe it's OK. It's not that much worse, possibly > not any worse, than what the main lock manager does now. However, > especially if we implement a solution specific to relation locks, it > seems like it would be better if we could somehow optimize based on > the facts that (1) many relation locks will not conflict and (2) it's > very common for the same backend to take and release the same > extension lock over and over again. I don't have a specific proposal > right now. Yeah, we can optimize based on the purpose of the solution. In either case I should answer the above question first. > > Whatever we end up with, I think we should write some kind of a test > harness to benchmark the number of acquire/release cycles per second > that we can do with the current relation extension lock system vs. the > proposed new system. Ideally, we'd be faster, since we're proposing a > more specialized mechanism. But at least we should not be slower. > pgbench isn't a good test because the relation extension lock will > barely be taken let alone contended; we need to check something like > parallel copies into the same table to see any effect. > I did a benchmark using a custom script that always updates the primary key (disabling HOT updates). But parallel copies into the same tale would also be good. Thank you. 
Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, Oct 30, 2017 at 3:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Since the previous patch conflicts with current HEAD, I attached the >>> updated patch for next CF. >> >> I think we should back up here and ask ourselves a couple of questions: > > Thank you for summarizing of the purpose and discussion of this patch. > >> 1. What are we trying to accomplish here? >> >> 2. Is this the best way to accomplish it? >> >> To the first question, the problem as I understand it as follows: >> Heavyweight locks don't conflict between members of a parallel group. >> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, >> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. Currently, those cases >> don't arise, because parallel operations are strictly read-only >> (except for inserts by the leader into a just-created table, when only >> one member of the group can be taking the lock anyway). However, once >> we allow writes, they become possible, so some solution is needed. >> >> To the second question, there are a couple of ways we could fix this. >> First, we could continue to allow these locks to be taken in the >> heavyweight lock manager, but make them conflict even between members >> of the same lock group. This is, however, complicated. A significant >> problem (or so I think) is that the deadlock detector logic, which is >> already quite hard to test, will become even more complicated, since >> wait edges between members of a lock group need to exist at some times >> and not other times. Moreover, to the best of my knowledge, the >> increased complexity would have no benefit, because it doesn't look to >> me like we ever take any other heavyweight lock while holding one of >> these four kinds of locks. Therefore, no deadlock can occur: if we're >> waiting for one of these locks, the process that holds it is not >> waiting for any other heavyweight lock. This gives rise to a second >> idea: move these locks out of the heavyweight lock manager and handle >> them with separate code that does not have deadlock detection and >> doesn't need as many lock modes. I think that idea is basically >> sound, although it's possibly not the only sound idea. > > I'm on the same page. > >> >> However, that makes me wonder whether we shouldn't be a bit more >> aggressive with this patch: why JUST relation extension locks? Why >> not all four types of locks listed above? Actually, tuple locks are a >> bit sticky, because they have four lock modes. The other three kinds >> are very similar -- all you can do is "take it" (implicitly, in >> exclusive mode), "try to take it" (again, implicitly, in exclusive >> mode), or "wait for it to be released" (i.e. share lock and then >> release). Another idea is to try to handle those three types and >> leave the tuple locking problem for another day. >> >> I suggest that a good thing to do more or less immediately, regardless >> of when this patch ends up being ready, would be to insert an >> insertion that LockAcquire() is never called while holding a lock of >> one of these types. If that assertion ever fails, then the whole >> theory that these lock types don't need deadlock detection is wrong, >> and we'd like to find out about that sooner or later. > > I understood. I'll check that first. 
I've checked whether LockAcquire is called while holding a lock of one of the four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. In summary, I think that we cannot move all four lock types together out of the heavyweight lock manager, but we can move just the relation extension lock with some tricks. Here are the details of the survey.
* LOCKTAG_RELATION_EXTENSION
There is a path where LockRelationForExtension() can be called while already holding another relation extension lock. In brin_getinsertbuffer(), we acquire a relation extension lock for an index relation and may initialize a new buffer (brin_initialize_empty_new_buffer()). While initializing the new buffer, we call RecordPageWithFreeSpace(), which can eventually call fsm_readbuf(rel, addr, true), where the third argument is "extend". We can deal with this by keeping a list (or local hash) of acquired locks and skipping the acquisition if we already hold the lock (a rough sketch of this idea follows at the end of this message). For the other call paths calling LockRelationForExtension, I don't see any problem.
* LOCKTAG_PAGE, LOCKTAG_TUPLE, LOCKTAG_SPECULATIVE_INSERTION
There are paths where we can acquire a relation extension lock while holding one of these locks. For LOCKTAG_PAGE, in ginInsertCleanup() we acquire a page lock for the meta page and process the pending list, which can acquire a relation extension lock for an index relation. For LOCKTAG_TUPLE, in heap_update() we acquire a tuple lock and can call RelationGetBufferForTuple(). For LOCKTAG_SPECULATIVE_INSERTION, in ExecInsert() we acquire a speculative insertion lock and call heap_insert() and ExecInsertIndexTuples(). The operations called while holding each of these lock types can acquire a relation extension lock.
Also, the following is the list of places where we call LockAcquire() with these four lock types (results of git grep for each function). I've checked based on this list.
* LockRelationForExtension()
contrib/bloom/blutils.c: LockRelationForExtension(index, ExclusiveLock);
contrib/pgstattuple/pgstattuple.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/brin/brin_pageops.c: LockRelationForExtension(idxrel, ShareLock);
src/backend/access/brin/brin_pageops.c: LockRelationForExtension(irel, ExclusiveLock);
src/backend/access/brin/brin_revmap.c: LockRelationForExtension(irel, ExclusiveLock);
src/backend/access/gin/ginutil.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/gin/ginvacuum.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/gin/ginvacuum.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/gist/gistutil.c: LockRelationForExtension(r, ExclusiveLock);
src/backend/access/gist/gistvacuum.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/gist/gistvacuum.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/heap/hio.c: LockRelationForExtension(relation, ExclusiveLock);
src/backend/access/heap/hio.c: LockRelationForExtension(relation, ExclusiveLock);
src/backend/access/heap/visibilitymap.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/nbtree/nbtpage.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/nbtree/nbtree.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/spgist/spgutils.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/spgist/spgvacuum.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/commands/vacuumlazy.c: LockRelationForExtension(onerel, ExclusiveLock);
src/backend/storage/freespace/freespace.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/storage/lmgr/lmgr.c:LockRelationForExtension(Relation relation, LOCKMODE lockmode)
* ConditionalLockRelationForExtension
src/backend/access/heap/hio.c: else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
src/backend/storage/lmgr/lmgr.c:ConditionalLockRelationForExtension(Relation relation, LOCKMODE lockmode)
* LockPage
src/backend/access/gin/ginfast.c: LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
* ConditionalLockPage
src/backend/access/gin/ginfast.c: if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
* LockTuple
src/backend/access/heap/heapam.c: LockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
* ConditionalLockTuple
src/backend/access/heap/heapam.c: ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
src/backend/storage/lmgr/lmgr.c:ConditionalLockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode)
* SpeculativeInsertionLockAcquire
src/backend/executor/nodeModifyTable.c: specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
> If this direction has no problem > and we changed these three locks so that it uses new lock mechanism, > we'll not be able to use these locks at the same time. Since it also > means that we impose a limitation to the future we should think > carefully about it. We can implement the deadlock detection mechanism > for it again but it doesn't make sense. > >> >> On the details of the patch, it appears that RelExtLockAcquire() >> executes the wait-for-lock code with the partition lock held, and then >> continues to hold the partition lock for the entire time that the >> relation extension lock is held. That not only makes all code that >> runs while holding the lock non-interruptible but makes a lot of the >> rest of this code pointless. 
How is any of this atomics code going to >> be reached by more than one process at the same time if the entire >> bucket is exclusive-locked? I would guess that the concurrency is not >> very good here for the same reason. Of course, just releasing the >> bucket lock wouldn't be right either, because then ext_lock might go >> away while we've got a pointer to it, which wouldn't be good. I think >> you could make this work if each lock had both a locker count and a >> pin count, and the object can only be removed when the pin_count is 0. >> So the lock algorithm would look like this: >> >> - Acquire the partition LWLock. >> - Find the item of interest, creating it if necessary. If out of >> memory for more elements, sweep through the table and reclaim >> 0-pin-count entries, then retry. >> - Increment the pin count. >> - Attempt to acquire the lock atomically; if we succeed, release the >> partition lock and return. >> - If this was a conditional-acquire, then decrement the pin count, >> release the partition lock and return. >> - Release the partition lock. >> - Sleep on the condition variable until we manage to atomically >> acquire the lock. >> >> The unlock algorithm would just decrement the pin count and, if the >> resulting value is non-zero, broadcast on the condition variable. > > Thank you for the suggestion! > >> Although I think this will work, I'm not sure this is actually a great >> algorithm. Every lock acquisition has to take and release the >> partition lock, use at least two more atomic ops (to take the pin and >> the lock), and search a hash table. I don't think that's going to be >> staggeringly fast. Maybe it's OK. It's not that much worse, possibly >> not any worse, than what the main lock manager does now. However, >> especially if we implement a solution specific to relation locks, it >> seems like it would be better if we could somehow optimize based on >> the facts that (1) many relation locks will not conflict and (2) it's >> very common for the same backend to take and release the same >> extension lock over and over again. I don't have a specific proposal >> right now. > > Yeah, we can optimize based on the purpose of the solution. In either > case I should answer the above question first. > >> >> Whatever we end up with, I think we should write some kind of a test >> harness to benchmark the number of acquire/release cycles per second >> that we can do with the current relation extension lock system vs. the >> proposed new system. Ideally, we'd be faster, since we're proposing a >> more specialized mechanism. But at least we should not be slower. >> pgbench isn't a good test because the relation extension lock will >> barely be taken let alone contended; we need to check something like >> parallel copies into the same table to see any effect. >> > > I did a benchmark using a custom script that always updates the > primary key (disabling HOT updates). But parallel copies into the same > tale would also be good. Thank you. > Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
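A minimal sketch of the "skip it if we already hold it" idea from the LOCKTAG_RELATION_EXTENSION paragraph above. The backend-local variables and function names are hypothetical, and LockRelationForExtension()/UnlockRelationForExtension() merely stand in for whatever the new lock mechanism ends up providing.

#include "postgres.h"
#include "storage/lmgr.h"
#include "storage/lockdefs.h"
#include "utils/rel.h"

/* Backend-local memory of the relation extension lock we currently hold. */
static Oid	held_relext_relid = InvalidOid;
static int	held_relext_count = 0;

static void
RelExtLockAcquireReentrant(Relation rel)
{
	Oid			relid = RelationGetRelid(rel);

	if (held_relext_relid == relid)
	{
		/* Already held, e.g. fsm_readbuf() reached during an extension. */
		held_relext_count++;
		return;
	}

	LockRelationForExtension(rel, ExclusiveLock);	/* stand-in for the new lock */
	held_relext_relid = relid;
	held_relext_count = 1;
}

static void
RelExtLockReleaseReentrant(Relation rel)
{
	Assert(held_relext_relid == RelationGetRelid(rel));

	if (--held_relext_count > 0)
		return;					/* still held by an outer caller */

	UnlockRelationForExtension(rel, ExclusiveLock);
	held_relext_relid = InvalidOid;
}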
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> I suggest that a good thing to do more or less immediately, regardless >>> of when this patch ends up being ready, would be to insert an >>> insertion that LockAcquire() is never called while holding a lock of >>> one of these types. If that assertion ever fails, then the whole >>> theory that these lock types don't need deadlock detection is wrong, >>> and we'd like to find out about that sooner or later. >> >> I understood. I'll check that first. > > I've checked whether LockAcquire is called while holding a lock of one > of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, > LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summary, I think that > we cannot move these four lock types together out of heavy-weight > lock, but can move only the relation extension lock with tricks. > > Here is detail of the survey. Thanks for these details, but I'm not sure I fully understand. > * LOCKTAG_RELATION_EXTENSION > There is a path that LockRelationForExtension() could be called while > holding another relation extension lock. In brin_getinsertbuffer(), we > acquire a relation extension lock for a index relation and could > initialize a new buffer (brin_initailize_empty_new_buffer()). During > initializing a new buffer, we call RecordPageWithFreeSpace() which > eventually can call fsm_readbuf(rel, addr, true) where the third > argument is "extend". We can process this problem by having the list > (or local hash) of acquired locks and skip acquiring the lock if > already had. For other call paths calling LockRelationForExtension, I > don't see any problem. Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock? Basically, what matters here in the end is whether we can articulate a deadlock-proof rule around the order in which these locks are acquired. The simplest such rule would be "you can only acquire one lock of any of these types at a time, and you can't subsequently acquire a heavyweight lock". But a more complicated rule would be OK too, e.g. "you can acquire as many heavyweight locks as you want, and after that you can optionally acquire one page, tuple, or speculative token lock, and after that you can acquire a relation extension lock". The latter rule, although more complex, is still deadlock-proof, because the heavyweight locks still use the deadlock detector, and the rest has a consistent order of lock acquisition that precludes one backend taking A then B while another backend takes B then A. I'm not entirely clear whether your survey leads us to a place where we can articulate such a deadlock-proof rule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Nov 8, 2017 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> I suggest that a good thing to do more or less immediately, regardless >>>> of when this patch ends up being ready, would be to insert an >>>> insertion that LockAcquire() is never called while holding a lock of >>>> one of these types. If that assertion ever fails, then the whole >>>> theory that these lock types don't need deadlock detection is wrong, >>>> and we'd like to find out about that sooner or later. >>> >>> I understood. I'll check that first. >> >> I've checked whether LockAcquire is called while holding a lock of one >> of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, >> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summary, I think that >> we cannot move these four lock types together out of heavy-weight >> lock, but can move only the relation extension lock with tricks. >> >> Here is detail of the survey. > > Thanks for these details, but I'm not sure I fully understand. > >> * LOCKTAG_RELATION_EXTENSION >> There is a path that LockRelationForExtension() could be called while >> holding another relation extension lock. In brin_getinsertbuffer(), we >> acquire a relation extension lock for a index relation and could >> initialize a new buffer (brin_initailize_empty_new_buffer()). During >> initializing a new buffer, we call RecordPageWithFreeSpace() which >> eventually can call fsm_readbuf(rel, addr, true) where the third >> argument is "extend". We can process this problem by having the list >> (or local hash) of acquired locks and skip acquiring the lock if >> already had. For other call paths calling LockRelationForExtension, I >> don't see any problem. > > Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock? No, I meant that fsm_readbuf(rel,addr,true) can itself acquire a relation extension lock. So it's not a problem. > Basically, what matters here in the end is whether we can articulate a > deadlock-proof rule around the order in which these locks are > acquired. You're right; my survey was not enough to make a decision. As far as acquiring these four lock types goes, there are two kinds of call paths that acquire one of these locks while holding another. One is acquiring a relation extension lock and then acquiring the relation extension lock for the same relation again. As explained before, this can be resolved by remembering the lock we hold (perhaps remembering only the last one is enough). The other is acquiring either a tuple lock, a page lock, or a speculative insertion lock and then acquiring a relation extension lock. In this second case, we always acquire the two locks in the same order: one of the three lock types first, then the extension lock. So it's not a problem if we apply the rule that we disallow acquiring these three lock types while holding any relation extension lock. Also, as far as I surveyed, there is no path that acquires a relation lock while holding any of the other three lock types. > The simplest such rule would be "you can only acquire one > lock of any of these types at a time, and you can't subsequently > acquire a heavyweight lock". But a more complicated rule would be OK > too, e.g. "you can acquire as many heavyweight locks as you want, and > after that you can optionally acquire one page, tuple, or speculative > token lock, and after that you can acquire a relation extension lock". > The latter rule, although more complex, is still deadlock-proof, > because the heavyweight locks still use the deadlock detector, and the > rest has a consistent order of lock acquisition that precludes one > backend taking A then B while another backend takes B then A. I'm not > entirely clear whether your survey leads us to a place where we can > articulate such a deadlock-proof rule. As for the interaction between these four lock types and heavyweight locks, there obviously are call paths that acquire any of the four lock types while holding a heavyweight lock. In reverse, there is also a call path where we acquire a heavyweight lock while holding one of the four lock types. The call path I found is that in heap_delete we acquire a tuple lock and call XactLockTableWait or MultiXactIdWait, which can eventually acquire LOCKTAG_TRANSACTION in order to wait for the concurrent transactions to finish. But IIUC, since these functions acquire the lock on the concurrent transaction's transaction id, deadlocks don't happen. However, there might be other similar call paths that I'm missing. For example, we do some operations that might acquire heavyweight locks other than LOCKTAG_TRANSACTION while holding a page lock (in ginInsertCleanup) or a speculative insertion lock (in nodeModifyTable). In summary, I think we can adopt the following rules in order to move the four lock types out of the heavyweight lock manager.
1. Do not acquire a tuple lock, a page lock, or a speculative insertion lock while holding a relation extension lock.
2. Do not acquire any heavyweight lock except LOCKTAG_TRANSACTION while holding any of these four lock types.
Also, I'm concerned that this imposes rules on developers that are difficult to check statically. We can put several assertions into the source code, but it's hard to exercise all possible paths with regression tests. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
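A rough sketch of the kind of assertion mentioned in the last paragraph, enforcing rule 2. The counter, its name, and where it would be maintained are hypothetical; a real check would presumably live at the top of LockAcquire() itself.

#include "postgres.h"
#include "storage/lock.h"

/*
 * Hypothetical backend-local counter of how many relation extension, page,
 * tuple, or speculative-token locks this backend currently holds; their
 * acquire/release paths would maintain it.
 */
static int	nRelExtStyleLocksHeld = 0;

static void
AssertSafeHeavyweightLockAcquire(const LOCKTAG *locktag)
{
	/*
	 * Rule 2 above: while holding any of the four lock types, the only
	 * heavyweight lock we may take is a transaction lock (as done by
	 * XactLockTableWait(), for example).
	 */
	Assert(nRelExtStyleLocksHeld == 0 ||
		   locktag->locktag_type == LOCKTAG_TRANSACTION);
}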
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Wed, Nov 8, 2017 at 9:40 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Speaking of the acquiring these four lock types and heavy weight lock, > there obviously is a call path to acquire any of four lock types while > holding a heavy weight lock. In reverse, there also is a call path > that we acquire a heavy weight lock while holding any of four lock > types. The call path I found is that in heap_delete we acquire a tuple > lock and call XactLockTableWait or MultiXactIdWait which eventually > could acquire LOCKTAG_TRANSACTION in order to wait for the concurrent > transactions finish. But IIUC since these functions acquire the lock > for the concurrent transaction's transaction id, deadlocks doesn't > happen. No, that's not right. Now that you mention it, I realize that tuple locks can definitely cause deadlocks. Example:

setup:
rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table bar (a int, b text);
CREATE TABLE
rhaas=# insert into foo values (1, 'hoge');
INSERT 0 1

session 1:
rhaas=# begin;
BEGIN
rhaas=# update foo set b = 'hogehoge' where a = 1;
UPDATE 1

session 2:
rhaas=# begin;
BEGIN
rhaas=# update foo set b = 'quux' where a = 1;

session 3:
rhaas=# begin;
BEGIN
rhaas=# lock bar;
LOCK TABLE
rhaas=# update foo set b = 'blarfle' where a = 1;

back to session 1:
rhaas=# select * from bar;
ERROR: deadlock detected
LINE 1: select * from bar;
^
DETAIL: Process 88868 waits for AccessShareLock on relation 16391 of database 16384; blocked by process 88845.
Process 88845 waits for ExclusiveLock on tuple (0,1) of relation 16385 of database 16384; blocked by process 88840.
Process 88840 waits for ShareLock on transaction 1193; blocked by process 88868.
HINT: See server log for query details.

So what I said before was wrong: we definitely cannot exclude tuple locks from deadlock detection. However, we might be able to handle the problem in another way: introduce a separate, parallel-query specific mechanism to avoid having two participants try to update and/or delete the same tuple at the same time - e.g. advertise the BufferTag + offset within the page in DSM, and if somebody else already has that same combination advertised, wait until they no longer do. That shouldn't ever deadlock, because the other worker shouldn't be able to find itself waiting for us while it's busy updating a tuple. After some further study, speculative insertion locks look problematic too. I'm worried about the code path ExecInsert() [which takes the speculative insertion lock] -> heap_insert -> heap_prepare_insert -> toast_insert_or_update -> toast_save_datum -> heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock). That sure looks like we can end up waiting for a relation lock while holding a speculative insertion lock, which seems to mean that speculative insertion locks are subject to at least theoretical deadlock hazards as well. Note that even if we were guaranteed to be holding the lock on the toast relation already at this point, it wouldn't fix the problem, because we might still have to build or refresh a relcache entry at this point, which could end up scanning (and thus locking) system catalogs. Any syscache lookup can theoretically take a lock, even though most of the time it doesn't, and thus taking a lock that has been removed from the deadlock detector (or, say, an lwlock) and then performing a syscache lookup with it held is not OK. So I don't think we can remove speculative insertion locks from the deadlock detector either. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes: > No, that's not right. Now that you mention it, I realize that tuple > locks can definitely cause deadlocks. Example: Yeah. Foreign-key-related tuple locks are another rich source of examples. > ... So I don't > think we can remove speculative insertion locks from the deadlock > detector either. That scares me too. I think that relation extension can safely be transferred to some lower-level mechanism, because what has to be done while holding the lock is circumscribed and below the level of database operations (which might need other locks). These other ideas seem a lot riskier. (But see recent conversation where I discouraged Alvaro from holding extension locks across BRIN summarization activity. We'll need to look and make sure that nobody else has had creative ideas like that.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
Thank you for pointing that out, and for the comments. On Fri, Nov 10, 2017 at 12:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> No, that's not right. Now that you mention it, I realize that tuple >> locks can definitely cause deadlocks. Example: > > Yeah. Foreign-key-related tuple locks are another rich source of > examples. > >> ... So I don't >> think we can remove speculative insertion locks from the deadlock >> detector either. > > That scares me too. I think that relation extension can safely > be transferred to some lower-level mechanism, because what has to > be done while holding the lock is circumscribed and below the level > of database operations (which might need other locks). These other > ideas seem a lot riskier. > > (But see recent conversation where I discouraged Alvaro from holding > extension locks across BRIN summarization activity. We'll need to look > and make sure that nobody else has had creative ideas like that.) > It seems that we should focus on transferring only relation extension locks as a first step. Page locks would also be safe, but that might require some fundamental changes related to fast insertion, which is discussed in another thread[1]. Also, in this case I think it's better to focus on relation extension locks so that we can optimize the lower-level lock mechanism for them. So I'll update the patch based on the comments I got from Robert before. [1] https://www.postgresql.org/message-id/CAD21AoBLUSyiYKnTYtSAbC%2BF%3DXDjiaBrOUEGK%2BzUXdQ8owfPKw%40mail.gmail.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Tue, Nov 14, 2017 at 4:36 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for pointing out and comments. > > On Fri, Nov 10, 2017 at 12:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> No, that's not right. Now that you mention it, I realize that tuple >>> locks can definitely cause deadlocks. Example: >> >> Yeah. Foreign-key-related tuple locks are another rich source of >> examples. >> >>> ... So I don't >>> think we can remove speculative insertion locks from the deadlock >>> detector either. >> >> That scares me too. I think that relation extension can safely >> be transferred to some lower-level mechanism, because what has to >> be done while holding the lock is circumscribed and below the level >> of database operations (which might need other locks). These other >> ideas seem a lot riskier. >> >> (But see recent conversation where I discouraged Alvaro from holding >> extension locks across BRIN summarization activity. We'll need to look >> and make sure that nobody else has had creative ideas like that.) >> > > It seems that we should focus on transferring only relation extension > locks as a first step. The page locks would also be safe but it might > require some fundamental changes related to fast insertion, which is > discussed on other thread[1]. Also in this case I think it's better to > focus on relation extension locks so that we can optimize the > lower-level lock mechanism for it. > > So I'll update the patch based on the comment I got from Robert before. > Attached is an updated version of the patch. I've moved only relation extension locks out of the heavyweight lock manager, as per the discussion so far. I've done a write-heavy benchmark on my laptop: loading 24kB of data into one table using COPY with 1 client for 10 seconds. The throughput of the patched version is 10% better than current HEAD. The results of 5 runs are as follows.

----- PATCHED -----
tps = 178.791515 (excluding connections establishing)
tps = 176.522693 (excluding connections establishing)
tps = 168.705442 (excluding connections establishing)
tps = 158.158009 (excluding connections establishing)
tps = 161.145709 (excluding connections establishing)

----- HEAD -----
tps = 147.079803 (excluding connections establishing)
tps = 149.079540 (excluding connections establishing)
tps = 149.082275 (excluding connections establishing)
tps = 148.255376 (excluding connections establishing)
tps = 145.542552 (excluding connections establishing)

Also, I've done a micro-benchmark: calling LockRelationForExtension and UnlockRelationForExtension in a tight loop in order to measure the number of lock/unlock cycles per second. The result is:

PATCHED = 3.95892e+06 (cycles/sec)
HEAD = 1.15284e+06 (cycles/sec)

The patched version is 3 times faster than current HEAD. Attached are the updated patch and the function I used for the micro-benchmark. Please review them. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
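For reference, a minimal sketch of what such a micro-benchmark function could look like is below. This is only an illustration, not the attached function; the function name, loop count and packaging as an SQL-callable C function are assumptions.

#include "postgres.h"
#include "fmgr.h"
#include "access/heapam.h"
#include "storage/lmgr.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(relextlock_bench);

/* Measure relation extension lock/unlock cycles per second for a relation. */
Datum
relextlock_bench(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	int			loops = 1000000;	/* arbitrary iteration count */
	Relation	rel;
	TimestampTz start;
	TimestampTz end;
	long		secs;
	int			usecs;
	double		elapsed;
	int			i;

	rel = relation_open(relid, AccessShareLock);

	start = GetCurrentTimestamp();
	for (i = 0; i < loops; i++)
	{
		LockRelationForExtension(rel, ExclusiveLock);
		UnlockRelationForExtension(rel, ExclusiveLock);
	}
	end = GetCurrentTimestamp();

	TimestampDifference(start, end, &secs, &usecs);
	elapsed = secs + usecs / 1000000.0;

	relation_close(rel, AccessShareLock);

	PG_RETURN_FLOAT8(loops / elapsed);	/* lock/unlock cycles per second */
}

Usage would be something along the lines of CREATE FUNCTION relextlock_bench(oid) RETURNS float8 AS 'MODULE_PATHNAME' LANGUAGE C STRICT; followed by SELECT relextlock_bench('some_table'::regclass);.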
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Mon, Nov 20, 2017 at 5:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Attached updated version patch. I've moved only relation extension > locks out of heavy-weight lock as per discussion so far. > > I've done a write-heavy benchmark on my laptop; loading 24kB data to > one table using COPY by 1 client, for 10 seconds. The through-put of > patched is 10% better than current HEAD. The result of 5 times is the > following. > > ----- PATCHED ----- > tps = 178.791515 (excluding connections establishing) > tps = 176.522693 (excluding connections establishing) > tps = 168.705442 (excluding connections establishing) > tps = 158.158009 (excluding connections establishing) > tps = 161.145709 (excluding connections establishing) > > ----- HEAD ----- > tps = 147.079803 (excluding connections establishing) > tps = 149.079540 (excluding connections establishing) > tps = 149.082275 (excluding connections establishing) > tps = 148.255376 (excluding connections establishing) > tps = 145.542552 (excluding connections establishing) > > Also I've done a micro-benchmark; calling LockRelationForExtension and > UnlockRelationForExtension tightly in order to measure the number of > lock/unlock cycles per second. The result is, > PATCHED = 3.95892e+06 (cycles/sec) > HEAD = 1.15284e+06 (cycles/sec) > The patched is 3 times faster than current HEAD. > > Attached updated patch and the function I used for micro-benchmark. > Please review it. That's a nice speed-up. How about a preliminary patch that asserts that we never take another heavyweight lock while holding a relation extension lock? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
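For context, the kind of assertion being proposed could be as simple as the sketch below; the counter name and its exact placement are assumptions, not taken from any posted patch.

/* lmgr.c (hypothetical): count of extension locks we currently hold,
 * incremented in LockRelationForExtension() and decremented in
 * UnlockRelationForExtension(), exported so lock.c can assert on it. */
int			RelationExtensionLockHeldCount = 0;

/* lock.c, near the top of LockAcquireExtended(): once we hold a relation
 * extension lock, the only heavyweight lock we may request is another
 * relation extension lock (the brin/FSM re-acquisition case noted earlier). */
Assert(RelationExtensionLockHeldCount == 0 ||
	   locktag->locktag_type == LOCKTAG_RELATION_EXTEND);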
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Nov 22, 2017 at 5:25 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Nov 20, 2017 at 5:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached updated version patch. I've moved only relation extension >> locks out of heavy-weight lock as per discussion so far. >> >> I've done a write-heavy benchmark on my laptop; loading 24kB data to >> one table using COPY by 1 client, for 10 seconds. The through-put of >> patched is 10% better than current HEAD. The result of 5 times is the >> following. >> >> ----- PATCHED ----- >> tps = 178.791515 (excluding connections establishing) >> tps = 176.522693 (excluding connections establishing) >> tps = 168.705442 (excluding connections establishing) >> tps = 158.158009 (excluding connections establishing) >> tps = 161.145709 (excluding connections establishing) >> >> ----- HEAD ----- >> tps = 147.079803 (excluding connections establishing) >> tps = 149.079540 (excluding connections establishing) >> tps = 149.082275 (excluding connections establishing) >> tps = 148.255376 (excluding connections establishing) >> tps = 145.542552 (excluding connections establishing) >> >> Also I've done a micro-benchmark; calling LockRelationForExtension and >> UnlockRelationForExtension tightly in order to measure the number of >> lock/unlock cycles per second. The result is, >> PATCHED = 3.95892e+06 (cycles/sec) >> HEAD = 1.15284e+06 (cycles/sec) >> The patched is 3 times faster than current HEAD. >> >> Attached updated patch and the function I used for micro-benchmark. >> Please review it. > > That's a nice speed-up. > > How about a preliminary patch that asserts that we never take another > heavyweight lock while holding a relation extension lock? > Agreed. Also, since we disallow holding extension locks on more than one relation at once, I'll add an assertion for that as well. I think we no longer need to pass the lock level to UnlockRelationForExtension(). Now that the relation extension lock is simple, we can release it in the mode in which it was acquired, as LWLocks do. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Nov 22, 2017 at 11:32 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Nov 22, 2017 at 5:25 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Nov 20, 2017 at 5:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Attached updated version patch. I've moved only relation extension >>> locks out of heavy-weight lock as per discussion so far. >>> >>> I've done a write-heavy benchmark on my laptop; loading 24kB data to >>> one table using COPY by 1 client, for 10 seconds. The through-put of >>> patched is 10% better than current HEAD. The result of 5 times is the >>> following. >>> >>> ----- PATCHED ----- >>> tps = 178.791515 (excluding connections establishing) >>> tps = 176.522693 (excluding connections establishing) >>> tps = 168.705442 (excluding connections establishing) >>> tps = 158.158009 (excluding connections establishing) >>> tps = 161.145709 (excluding connections establishing) >>> >>> ----- HEAD ----- >>> tps = 147.079803 (excluding connections establishing) >>> tps = 149.079540 (excluding connections establishing) >>> tps = 149.082275 (excluding connections establishing) >>> tps = 148.255376 (excluding connections establishing) >>> tps = 145.542552 (excluding connections establishing) >>> >>> Also I've done a micro-benchmark; calling LockRelationForExtension and >>> UnlockRelationForExtension tightly in order to measure the number of >>> lock/unlock cycles per second. The result is, >>> PATCHED = 3.95892e+06 (cycles/sec) >>> HEAD = 1.15284e+06 (cycles/sec) >>> The patched is 3 times faster than current HEAD. >>> >>> Attached updated patch and the function I used for micro-benchmark. >>> Please review it. >> >> That's a nice speed-up. >> >> How about a preliminary patch that asserts that we never take another >> heavyweight lock while holding a relation extension lock? >> > > Agreed. Also, since we disallow to holding more than one locks of > different relations at once I'll add an assertion for it as well. > > I think we no longer need to pass the lock level to > UnloclRelationForExtension(). Now that relation extension lock will be > simple we can release the lock in the mode that we used to acquire > like LWLock. > Attached latest patch incorporated all comments so far. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Sun, Nov 26, 2017 at 9:33 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Attached latest patch incorporated all comments so far. Please review it. I think you only need RelExtLockReleaseAllI() where we currently have LockReleaseAll(DEFAULT_LOCKMETHOD, ...) not where we have LockReleaseAll(USER_LOCKMETHOD, ...). That's because relation extension locks use the default lock method, not USER_LOCKMETHOD. You need to update the table of wait events in the documentation. Please be sure to actually build the documentation afterwards and make sure it looks OK. Maybe the wait event name should be RelationExtensionLock rather than just RelationExtension; we are not waiting for the extension itself. You have a typo/thinko in lmgr/README: confliction is not a word. Maybe you mean "When conflicts occur, lock waits are implemented using condition variables." Instead of having shared and exclusive locks, how about just having exclusive locks and introducing a new primitive operation that waits for the lock to be free and returns without acquiring it? That is essentially what brin_pageops.c is doing by taking and releasing the shared lock, and it's the only caller that takes anything but an exclusive lock. This seems like it would permit a considerable simplification of the locking mechanism, since there would then be only two possible states: 1 (locked) and 0 (not locked). In RelExtLockAcquire, I can't endorse this sort of coding: + if (relid == held_relextlock.lock->relid && + lockmode == held_relextlock.mode) + { + held_relextlock.nLocks++; + return true; + } + else + Assert(false); /* cannot happen */ Either convert the Assert() to an elog(), or change the if-statement to an Assert() of the same condition. I'd probably vote for the first one. As it is, if that Assert(false) is ever hit, chaos will (maybe) ensue. Let's make sure we nip any such problems in the bud. "successed" is not a good variable name; that's not an English word. + /* Could not got the lock, return iff in conditional locking */ + if (mustwait && conditional) Comment contradicts code. The comment is right; the code need not test mustwait, as that's already been done. The way this is hooked into the shared-memory initialization stuff looks strange in a number of ways: - Apparently, you're initializing enough space for as many relation extension locks as the size of the main heavyweight lock table, but that seems like overkill. I'm not sure how much space we actually need for relation extension locks, but I bet it's a lot less than we need for regular heavyweight locks. - The error message emitted when you run out of space also claims that you can fix the issue by raising max_pred_locks_per_transaction, but that has no effect on the size of the main lock table or this table. - The changes to LockShmemSize() suppose that the hash table elements have a size equal to the size of an LWLock, but the actual size is sizeof(RELEXTLOCK). - I don't really know why the code for this should be daisy-chained off of the lock.c code instead of being called from CreateSharedMemoryAndSemaphores() just like (almost) all of the other subsystems. This code ignores the existence of multiple databases; RELEXTLOCK contains a relid, but no database OID. That's easy enough to fix, but it actually causes no problem unless, by bad luck, you have two relations with the same OID in different databases that are both being rapidly extended at the same time -- and even then, it's only a performance problem, not a correctness problem. 
In fact, I wonder if we shouldn't go further: instead of creating these RELEXTLOCK structures dynamically, let's just have a fixed number of them, say 1024. When we get a request to take a lock, hash <dboid, reloid> and take the result modulo 1024; lock the RELEXTLOCK at that offset in the array. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
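A sketch of the fixed-array mapping suggested here might look like the following; the names mirror ones that show up later in the thread, but the details are illustrative only.

#include "utils/hsearch.h"		/* tag_hash(); header location varies by version */

#define N_RELEXTLOCK_ENTS 1024

typedef struct RelExtLockTag
{
	Oid		dbid;				/* database OID (0 could serve for shared relations) */
	Oid		relid;				/* relation OID */
} RelExtLockTag;

/* Map a <database, relation> pair onto one of the fixed lock slots. */
static inline uint32
RelExtLockTargetTagToIndex(const RelExtLockTag *tag)
{
	return tag_hash(tag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS;
}

/* usage: relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)]; */

Two relations that happen to hash to the same slot simply share it, which can cause some false contention but never a correctness problem, as noted above.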
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Michael Paquier
Date:
On Wed, Nov 29, 2017 at 5:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Nov 26, 2017 at 9:33 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached latest patch incorporated all comments so far. Please review it. > > I think you only need RelExtLockReleaseAllI() where we currently have > LockReleaseAll(DEFAULT_LOCKMETHOD, ...) not where we have > LockReleaseAll(USER_LOCKMETHOD, ...). That's because relation > extension locks use the default lock method, not USER_LOCKMETHOD. Latest review is fresh. I am moving this to next CF with "waiting on author" as status. -- Michael
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Thu, Nov 30, 2017 at 10:52 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Nov 29, 2017 at 5:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sun, Nov 26, 2017 at 9:33 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Attached latest patch incorporated all comments so far. Please review it. >> >> I think you only need RelExtLockReleaseAllI() where we currently have >> LockReleaseAll(DEFAULT_LOCKMETHOD, ...) not where we have >> LockReleaseAll(USER_LOCKMETHOD, ...). That's because relation >> extension locks use the default lock method, not USER_LOCKMETHOD. > > Latest review is fresh. I am moving this to next CF with "waiting on > author" as status. Thank you Michael-san, I'll submit an updated patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Nov 29, 2017 at 5:33 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Nov 26, 2017 at 9:33 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached latest patch incorporated all comments so far. Please review it. > > I think you only need RelExtLockReleaseAllI() where we currently have > LockReleaseAll(DEFAULT_LOCKMETHOD, ...) not where we have > LockReleaseAll(USER_LOCKMETHOD, ...). That's because relation > extension locks use the default lock method, not USER_LOCKMETHOD. Fixed. > You need to update the table of wait events in the documentation. > Please be sure to actually build the documentation afterwards and make > sure it looks OK. Maybe the way event name should be > RelationExtensionLock rather than just RelationExtension; we are not > waiting for the extension itself. Fixed. I added both a new wait_event and a new wait_event_type for the relext lock. I also checked that the documentation builds. > > You have a typo/thinko in lmgr/README: confliction is not a word. > Maybe you mean "When conflicts occur, lock waits are implemented using > condition variables." Fixed. > > Instead of having shared and exclusive locks, how about just having > exclusive locks and introducing a new primitive operation that waits > for the lock to be free and returns without acquiring it? That is > essentially what brin_pageops.c is doing by taking and releasing the > shared lock, and it's the only caller that takes anything but an > exclusive lock. This seems like it would permit a considerable > simplification of the locking mechanism, since there would then be > only two possible states: 1 (locked) and 0 (not locked). I think it's a good idea. With this change, the concurrency of executing brin_page_cleanup() decreases, but since brin_page_cleanup() is so far called only during vacuum, that's no problem. I think we can handle the code in vacuumlazy.c in the same manner as well. I've changed the patch so that it has only exclusive locks and introduces a WaitForRelationExtensionLockToBeFree() function to wait for the lock to be free. Also, now that we got rid of shared locks, I gathered the lock state and pin count into an atomic uint32. > In RelExtLockAcquire, I can't endorse this sort of coding: > > + if (relid == held_relextlock.lock->relid && > + lockmode == held_relextlock.mode) > + { > + held_relextlock.nLocks++; > + return true; > + } > + else > + Assert(false); /* cannot happen */ > > Either convert the Assert() to an elog(), or change the if-statement > to an Assert() of the same condition. I'd probably vote for the first > one. As it is, if that Assert(false) is ever hit, chaos will (maybe) > ensue. Let's make sure we nip any such problems in the bud. Agreed, fixed. > > "successed" is not a good variable name; that's not an English word. Fixed. > + /* Could not got the lock, return iff in conditional locking */ > + if (mustwait && conditional) > > Comment contradicts code. The comment is right; the code need not > test mustwait, as that's already been done. Fixed. > The way this is hooked into the shared-memory initialization stuff > looks strange in a number of ways: > > - Apparently, you're making initialize enough space for as many > relation extension locks as the save of the main heavyweight lock > table, but that seems like overkill. I'm not sure how much space we > actually need for relation extension locks, but I bet it's a lot less > than we need for regular heavyweight locks. Agreed. 
The maximum number of relext locks is the number of relations in a database cluster; it's not related to the number of clients. Currently NLOCKENTS() counts the number of locks including relation extension locks. One idea is to introduce a new GUC to control the memory size, although the total memory size for locks would increase. Probably we can make it behave similarly to max_pred_locks_per_relation. Or, in order not to change the total memory size for locks even after moving them out of the heavyweight lock manager, we can divide NLOCKENTS() between heavyweight locks and relation extension locks (for example, 80% for heavyweight locks and 20% for relation extension locks). But the latter would make parameter tuning hard. I'd vote for the first one to keep it simple. Any ideas? This part is not fixed in the patch yet. > - The error message emitted when you run out of space also claims that > you can fix the issue by raising max_pred_locks_per_transaction, but > that has no effect on the size of the main lock table or this table. Fixed. > - The changes to LockShmemSize() suppose that the hash table elements > have a size equal to the size of an LWLock, but the actual size is > sizeof(RELEXTLOCK). Fixed. > - I don't really know why the code for this should be daisy-chained > off of the lock.c code inside of being called from > CreateSharedMemoryAndSemaphores() just like (almost) all of the other > subsystems. Fixed. > > This code ignores the existence of multiple databases; RELEXTLOCK > contains a relid, but no database OID. That's easy enough to fix, but > it actually causes no problem unless, by bad luck, you have two > relations with the same OID in different databases that are both being > rapidly extended at the same time -- and even then, it's only a > performance problem, not a correctness problem. In fact, I wonder if > we shouldn't go further: instead of creating these RELEXTLOCK > structures dynamically, let's just have a fixed number of them, say > 1024. When we get a request to take a lock, hash <dboid, reloid> and > take the result modulo 1024; lock the RELEXTLOCK at that offset in the > array. > Attached is the latest patch incorporating the comments, except for the fix of the memory size for relext locks. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
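To make the described shape concrete, a rough sketch of what each entry might now contain is below; the field names and bit layout are illustrative, drawn from the discussion rather than copied from the attached patch.

#include "port/atomics.h"
#include "storage/condition_variable.h"

/* Illustrative layout of the packed state word. */
#define RELEXT_LOCK_BIT			((uint32) 1 << 25)			/* somebody holds the lock */
#define RELEXT_WAIT_COUNT_MASK	((uint32) ((1 << 24) - 1))	/* pin/waiter count */

typedef struct RELEXTLOCK
{
	Oid					dbid;	/* database OID (per the review above) */
	Oid					relid;	/* relation OID */
	pg_atomic_uint32	state;	/* lock bit plus pin/waiter count, packed */
	ConditionVariable	cv;		/* conflicting lockers sleep here */
} RELEXTLOCK;

Acquisition then becomes a compare-and-swap on state, and release clears the lock bit and broadcasts on cv if the waiter count is nonzero.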
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Nov 30, 2017 at 6:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> This code ignores the existence of multiple databases; RELEXTLOCK >> contains a relid, but no database OID. That's easy enough to fix, but >> it actually causes no problem unless, by bad luck, you have two >> relations with the same OID in different databases that are both being >> rapidly extended at the same time -- and even then, it's only a >> performance problem, not a correctness problem. In fact, I wonder if >> we shouldn't go further: instead of creating these RELEXTLOCK >> structures dynamically, let's just have a fixed number of them, say >> 1024. When we get a request to take a lock, hash <dboid, reloid> and >> take the result modulo 1024; lock the RELEXTLOCK at that offset in the >> array. > > Attached the latest patch incorporated comments except for the fix of > the memory size for relext lock. It doesn't do anything about the comment of mine quoted above. Since it's only possible to hold one relation extension lock at a time, we don't really need the hash table here at all. We can just have an array of 1024 or so locks and map every <db,relid> pair onto one of them by hashing. The worst thing we'll get is some false contention, but that doesn't seem awful, and it would permit considerable further simplification of this code -- and maybe make it faster in the process, because we'd no longer need the hash table, or the pin count, or the extra LWLocks that protect the hash table. All we would have is atomic operations manipulating the lock state, which seems like it would be quite a lot faster and simpler. BTW, I think RelExtLockReleaseAll is broken because it shouldn't HOLD_INTERRUPTS(); I also think it's kind of silly to loop here when we know we can only hold one lock. Maybe RelExtLockRelease can take bool force and do if (force) held_relextlock.nLocks = 0; else held_relextlock.nLocks--. Or, better yet, have the caller adjust that value and then only call RelExtLockRelease() if we needed to release the lock in shared memory. That avoids needless branching. On a related note, is there any point in having both held_relextlock.nLocks and num_held_relextlocks? I think RelationExtensionLock should be a new type of IPC wait event, rather than a whole new category. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Dec 1, 2017 at 10:26 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Dec 1, 2017 at 3:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Nov 30, 2017 at 6:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> This code ignores the existence of multiple databases; RELEXTLOCK >>>> contains a relid, but no database OID. That's easy enough to fix, but >>>> it actually causes no problem unless, by bad luck, you have two >>>> relations with the same OID in different databases that are both being >>>> rapidly extended at the same time -- and even then, it's only a >>>> performance problem, not a correctness problem. In fact, I wonder if >>>> we shouldn't go further: instead of creating these RELEXTLOCK >>>> structures dynamically, let's just have a fixed number of them, say >>>> 1024. When we get a request to take a lock, hash <dboid, reloid> and >>>> take the result modulo 1024; lock the RELEXTLOCK at that offset in the >>>> array. >>> >>> Attached the latest patch incorporated comments except for the fix of >>> the memory size for relext lock. >> >> It doesn't do anything about the comment of mine quoted above. > > Sorry I'd missed the comment. > >> Since it's only possible to hold one relation extension lock at a time, we >> don't really need the hash table here at all. We can just have an >> array of 1024 or so locks and map every <db,relid> pair on to one of >> them by hashing. The worst thing we'll get it some false contention, >> but that doesn't seem awful, and it would permit considerable further >> simplification of this code -- and maybe make it faster in the >> process, because we'd no longer need the hash table, or the pin count, >> or the extra LWLocks that protect the hash table. All we would have >> is atomic operations manipulating the lock state, which seems like it >> would be quite a lot faster and simpler. > > Agreed. With this change, we will have an array of the struct that has > lock state and cv. The lock state has the wait count as well as the > status of lock. > >> BTW, I think RelExtLockReleaseAll is broken because it shouldn't >> HOLD_INTERRUPTS(); I also think it's kind of silly to loop here when >> we know we can only hold one lock. Maybe RelExtLockRelease can take >> bool force and do if (force) held_relextlock.nLocks = 0; else >> held_relextlock.nLocks--. Or, better yet, have the caller adjust that >> value and then only call RelExtLockRelease() if we needed to release >> the lock in shared memory. That avoids needless branching. > > Agreed. I'd vote for the latter. > >> On a >> related note, is there any point in having both held_relextlock.nLocks >> and num_held_relextlocks? > > num_held_relextlocks is actually unnecessary, will be removed. > >> I think RelationExtensionLock should be a new type of IPC wait event, >> rather than a whole new category. > > Hmm, I thought the wait event types of IPC seems related to events > that communicates to other processes for the same purpose, for example > parallel query, sync repli etc. On the other hand, the relation > extension locks are one kind of the lock mechanism. That's way I added > a new category. But maybe it can be fit to the IPC wait event. > Attached updated patch. I've done a performance measurement again on the same configuration as before since the acquiring/releasing procedures have been changed. 
----- PATCHED -----
tps = 162.579320 (excluding connections establishing)
tps = 162.144352 (excluding connections establishing)
tps = 160.659403 (excluding connections establishing)
tps = 161.213995 (excluding connections establishing)
tps = 164.560460 (excluding connections establishing)

----- HEAD -----
tps = 157.738645 (excluding connections establishing)
tps = 146.178575 (excluding connections establishing)
tps = 143.788961 (excluding connections establishing)
tps = 144.886594 (excluding connections establishing)
tps = 145.496337 (excluding connections establishing)

* micro-benchmark
PATCHED = 1.61757e+07 (cycles/sec)
HEAD = 1.48685e+06 (cycles/sec)

The patched version is 10 times faster than current HEAD. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Fri, Dec 1, 2017 at 10:14 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > The patched is 10 times faster than current HEAD. Nifty. The first hunk in monitoring.sgml looks unnecessary. The second hunk breaks the formatting of the documentation; you need to adjust the "morerows" value from 9 to 8 here: <entry morerows="9"><literal>Lock</literal></entry> And similarly make this one 18: <entry morerows="17"><literal>IPC</literal></entry> +* Relation extension locks. The relation extension lock manager is +specialized in relation extensions. In PostgreSQL 11 relation extension +lock has been moved out of regular lock. It's similar to regular locks +but doesn't have full dead lock detection, group locking and multiple +lock modes. When conflicts occur, lock waits are implemented using +condition variables. Higher up, it says that "Postgres uses four types of interprocess locks", but because you added this, it's now a list of five items. I suggest moving the section on relation locks to the end and rewriting the text here as follows: Only one process can extend a relation at a time; we use a specialized lock manager for this purpose, which is much simpler than the regular lock manager. It is similar to the lightweight lock mechanism, but is even simpler because there is only one lock mode and only one lock can be taken at a time. A process holding a relation extension lock is interruptible, unlike a process holding an LWLock. +#define RelExtLockTargetTagToIndex(relextlock_tag) \ + (tag_hash((const void *) relextlock_tag, sizeof(RelExtLockTag)) \ + % N_RELEXTLOCK_ENTS) How about using a static inline function for this? +#define SET_RELEXTLOCK_TAG(locktag, d, r) \ + ((locktag).dbid = (d), \ + (locktag).relid = (r)) How about getting rid of this and just doing the assignments instead? +#define RELEXT_VAL_LOCK ((uint32) ((1 << 25))) +#define RELEXT_LOCK_MASK ((uint32) ((1 << 25))) It seems confusing to have two macros for the same value and an almost-interchangeable purpose. Maybe just call it RELEXT_LOCK_BIT? +RelationExtensionLockWaiterCount(Relation relation) Hmm. This is sort of problematic, because with the new design we have no guarantee that the return value is actually accurate. I don't think that's a functional problem, but the optics aren't great. + if (held_relextlock.nLocks > 0) + { + RelExtLockRelease(held_relextlock.relid, true); + } Excess braces. +int +RelExtLockHoldingLockCount(void) +{ + return held_relextlock.nLocks; +} Maybe IsAnyRelationExtensionLockHeld(), returning bool? + /* If the lock is held by me, no need to wait */ If we already hold the lock, no need to wait. + * Luckily if we're trying to acquire the same lock as what we + * had held just before, we don't need to get the entry from the + * array by hashing. We're not trying to acquire a lock here. "If the last relation extension lock we touched is the same one for which we now need to wait, we can use our cached pointer to the lock instead of recomputing it." + registered_wait_list = true; Isn't it really registered_wait_count? The only list here is encapsulated in the CV. + /* Before retuning, decrement the wait count if we had been waiting */ retuning -> returning, but I'd rewrite this as "Release any wait count we hold." + * Acquire the relation extension lock. If we're trying to acquire the same + * lock as what already held, we just increment nLock locally and return + * without touching the RelExtLock array. "Acquire a relation extension lock." 
I think you can forget the rest of this; it duplicates comments in the function body. + * Since we don't support dead lock detection for relation extension + * lock and don't control the order of lock acquisition, it cannot not + * happen that trying to take a new lock while holding an another lock. Since we don't do deadlock detection, caller must not try to take a new relation extension lock while already holding them. + if (relid == held_relextlock.relid) + { + held_relextlock.nLocks++; + return true; + } + else + elog(ERROR, + "cannot acquire relation extension locks for multiple relations at the same"); I'd prefer if (relid != held_relextlock.relid) elog(ERROR, ...) to save a level of indentation for the rest. + * If we're trying to acquire the same lock as what we just released + * we don't need to get the entry from the array by hashing. we expect + * to happen this case because it's a common case in acquisition of + * relation extension locks. "If the last relation extension lock we touched is the same one for we now need to acquire, we can use our cached pointer to the lock instead of recomputing it. This is likely to be a common case in practice." + /* Could not got the lock, return iff in conditional locking */ "locking conditionally" + ConditionVariableSleep(&(relextlock->cv), WAIT_EVENT_RELATION_EXTENSION_LOCK); Break line at comma + /* Decrement wait count if we had been waiting */ "Release any wait count we hold." + /* Always return true if not conditional lock */ "We got the lock!" + /* If force releasing, release all locks we're holding */ + if (force) + held_relextlock.nLocks = 0; + else + held_relextlock.nLocks--; + + Assert(held_relextlock.nLocks >= 0); + + /* Return if we're still holding the lock even after computation */ + if (held_relextlock.nLocks > 0) + return; I thought you were going to have the caller adjust nLocks? + /* Get RelExtLock entry from the array */ + SET_RELEXTLOCK_TAG(tag, MyDatabaseId, relid); + relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)]; This seems to make no sense in RelExtLockRelease -- isn't the cache guaranteed valid? + /* Wake up waiters if there is someone looking at this lock */ "If there may be waiters, wake them up." + * We allow to take a relation extension lock after took a + * heavy-weight lock. However, since we don't have dead lock + * detection mechanism between heavy-weight lock and relation + * extension lock it's not allowed taking an another heavy-weight + * lock while holding a relation extension lock. "Relation extension locks don't participate in deadlock detection, so make sure we don't try to acquire a heavyweight lock while holding one." + /* Release relation extension locks */ "If we hold a relation extension lock, release it." +/* Number of partitions the shared relation extension lock tables are divided into */ +#define LOG2_NUM_RELEXTLOCK_PARTITIONS 4 +#define NUM_RELEXTLOCK_PARTITIONS (1 << LOG2_NUM_RELEXTLOCK_PARTITIONS) Dead code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
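Putting the reviewer's suggested test together, the attempt-to-acquire step would presumably end up looking something like this sketch (names follow the patch fragments quoted above; the retry loop is an assumption):

/*
 * Try to set RELEXT_LOCK_BIT.  Returns true if the caller must wait
 * because somebody else holds the lock, false if we acquired it.
 */
static bool
RelExtLockAttemptToAcquire(RELEXTLOCK *relextlock)
{
	uint32		oldstate = pg_atomic_read_u32(&relextlock->state);

	for (;;)
	{
		if ((oldstate & RELEXT_LOCK_BIT) != 0)
			return true;		/* lock is taken; caller must wait */

		if (pg_atomic_compare_exchange_u32(&relextlock->state,
										   &oldstate,
										   oldstate | RELEXT_LOCK_BIT))
			return false;		/* we got the lock */

		/* CAS failed; oldstate was refreshed with the current value, so loop. */
	}
}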
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Fri, Dec 1, 2017 at 1:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > [ lots of minor comments ] When I took a break from sitting at the computer, I realized that I think this has a more serious problem: won't it permanently leak reference counts if someone hits ^C or an error occurs while the lock is held? I think it will -- it probably needs to do cleanup at the places where we do LWLockReleaseAll() that includes decrementing the shared refcount if necessary, rather than doing cleanup at the places we release heavyweight locks. I might be wrong about the details here -- this is off the top of my head. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
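If that is the case, the cleanup probably ends up looking something like the sketch below, called from the error-recovery paths that also run LWLockReleaseAll(); the field names follow the patch as discussed in this thread, but the exact shape is a guess.

/*
 * Error/interrupt cleanup: forget any relation extension lock we hold,
 * and drop any waiter count we advertised before being interrupted out
 * of the sleep loop, so nothing leaks in shared memory.
 */
void
RelExtLockReleaseAll(void)
{
	if (held_relextlock.nLocks > 0)
	{
		held_relextlock.nLocks = 0;
		RelExtLockRelease(held_relextlock.relid);
	}
	else if (held_relextlock.waiting)
	{
		/* We never got the lock; just remove ourselves from the wait count. */
		pg_atomic_fetch_sub_u32(&held_relextlock.lock->state, 1);
		held_relextlock.waiting = false;
	}
}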
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Sat, Dec 2, 2017 at 3:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 1, 2017 at 10:14 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> The patched is 10 times faster than current HEAD. > > Nifty. Thank you for your dedicated reviewing the patch. > The first hunk in monitoring.sgml looks unnecessary. You meant the following hunk? diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 8d461c8..7aa7981 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -669,8 +669,8 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser Heavyweight locks, also known as lock manager locks or simply locks, primarily protect SQL-visible objects such as tables. However, they are also used to ensure mutual exclusion for certain internal - operations such as relation extension. <literal>wait_event</literal> will - identify the type of lock awaited. + operations such as waiting for a transaction to finish. + <literal>wait_event</literal> will identify the type of lock awaited. </para> </listitem> <listitem> I think that since the extension locks are no longer a part of heavyweight locks we should change the explanation. > The second hunk breaks the formatting of the documentation; you need > to adjust the "morerows" value from 9 to 8 here: > > <entry morerows="9"><literal>Lock</literal></entry> > > And similarly make this one 18: > > <entry morerows="17"><literal>IPC</literal></entry> Fixed. > +* Relation extension locks. The relation extension lock manager is > +specialized in relation extensions. In PostgreSQL 11 relation extension > +lock has been moved out of regular lock. It's similar to regular locks > +but doesn't have full dead lock detection, group locking and multiple > +lock modes. When conflicts occur, lock waits are implemented using > +condition variables. > > Higher up, it says that "Postgres uses four types of interprocess > locks", but because you added this, it's now a list of five items. Fixed. > I suggest moving the section on relation locks to the end and > rewriting the text here as follows: Only one process can extend a > relation at a time; we use a specialized lock manager for this > purpose, which is much simpler than the regular lock manager. It is > similar to the lightweight lock mechanism, but is ever simpler because > there is only one lock mode and only one lock can be taken at a time. > A process holding a relation extension lock is interruptible, unlike a > process holding an LWLock. Agreed and fixed. > +#define RelExtLockTargetTagToIndex(relextlock_tag) \ > + (tag_hash((const void *) relextlock_tag, sizeof(RelExtLockTag)) \ > + % N_RELEXTLOCK_ENTS) > > How about using a static inline function for this? Fixed. > +#define SET_RELEXTLOCK_TAG(locktag, d, r) \ > + ((locktag).dbid = (d), \ > + (locktag).relid = (r)) > > How about getting rid of this and just doing the assignments instead? Fixed. > +#define RELEXT_VAL_LOCK ((uint32) ((1 << 25))) > +#define RELEXT_LOCK_MASK ((uint32) ((1 << 25))) > > It seems confusing to have two macros for the same value and an > almost-interchangeable purpose. Maybe just call it RELEXT_LOCK_BIT? Fixed. > > +RelationExtensionLockWaiterCount(Relation relation) > > Hmm. This is sort of problematic, because with then new design we > have no guarantee that the return value is actually accurate. I don't > think that's a functional problem, but the optics aren't great. Yeah, with this patch we could overestimate it and then add extra blocks to the relation. 
Since the number of extra blocks is capped at 512 I think it would not become serious problem. > + if (held_relextlock.nLocks > 0) > + { > + RelExtLockRelease(held_relextlock.relid, true); > + } > > Excess braces. Fixed. > > +int > +RelExtLockHoldingLockCount(void) > +{ > + return held_relextlock.nLocks; > +} > > Maybe IsAnyRelationExtensionLockHeld(), returning bool? Fixed. > + /* If the lock is held by me, no need to wait */ > > If we already hold the lock, no need to wait. Fixed. > + * Luckily if we're trying to acquire the same lock as what we > + * had held just before, we don't need to get the entry from the > + * array by hashing. > > We're not trying to acquire a lock here. "If the last relation > extension lock we touched is the same one for which we now need to > wait, we can use our cached pointer to the lock instead of recomputing > it." Fixed. > + registered_wait_list = true; > > Isn't it really registered_wait_count? The only list here is > encapsulated in the CV. Changed to "waiting". > > + /* Before retuning, decrement the wait count if we had been waiting */ > > returning -> returning, but I'd rewrite this as "Release any wait > count we hold." Fixed. > + * Acquire the relation extension lock. If we're trying to acquire the same > + * lock as what already held, we just increment nLock locally and return > + * without touching the RelExtLock array. > > "Acquire a relation extension lock." I think you can forget the rest > of this; it duplicates comments in the function body. Fixed. > + * Since we don't support dead lock detection for relation extension > + * lock and don't control the order of lock acquisition, it cannot not > + * happen that trying to take a new lock while holding an another lock. > > Since we don't do deadlock detection, caller must not try to take a > new relation extension lock while already holding them. Fixed. > > + if (relid == held_relextlock.relid) > + { > + held_relextlock.nLocks++; > + return true; > + } > + else > + elog(ERROR, > + "cannot acquire relation extension locks for > multiple relations at the same"); > > I'd prefer if (relid != held_relextlock.relid) elog(ERROR, ...) to > save a level of indentation for the rest. Fixed. > > + * If we're trying to acquire the same lock as what we just released > + * we don't need to get the entry from the array by hashing. we expect > + * to happen this case because it's a common case in acquisition of > + * relation extension locks. > > "If the last relation extension lock we touched is the same one for we > now need to acquire, we can use our cached pointer to the lock instead > of recomputing it. This is likely to be a common case in practice." Fixed. > > + /* Could not got the lock, return iff in conditional locking */ > > "locking conditionally" Fixed. > + ConditionVariableSleep(&(relextlock->cv), > WAIT_EVENT_RELATION_EXTENSION_LOCK); > Break line at comma > Fixed. > + /* Decrement wait count if we had been waiting */ > > "Release any wait count we hold." Fixed. > + /* Always return true if not conditional lock */ > > "We got the lock!" Fixed. > + /* If force releasing, release all locks we're holding */ > + if (force) > + held_relextlock.nLocks = 0; > + else > + held_relextlock.nLocks--; > + > + Assert(held_relextlock.nLocks >= 0); > + > + /* Return if we're still holding the lock even after computation */ > + if (held_relextlock.nLocks > 0) > + return; > > I thought you were going to have the caller adjust nLocks? 
Yeah, I was going to change that, but since we always release either one lock or all relext locks, I thought it'd be better to pass a bool rather than an int. > + /* Get RelExtLock entry from the array */ > + SET_RELEXTLOCK_TAG(tag, MyDatabaseId, relid); > + relextlock = &RelExtLockArray[RelExtLockTargetTagToIndex(&tag)]; > > This seems to make no sense in RelExtLockRelease -- isn't the cache > guaranteed valid? Right, fixed. > > + /* Wake up waiters if there is someone looking at this lock */ > > "If there may be waiters, wake them up." Fixed. > + * We allow to take a relation extension lock after took a > + * heavy-weight lock. However, since we don't have dead lock > + * detection mechanism between heavy-weight lock and relation > + * extension lock it's not allowed taking an another heavy-weight > + * lock while holding a relation extension lock. > > "Relation extension locks don't participate in deadlock detection, so > make sure we don't try to acquire a heavyweight lock while holding > one." Fixed. > + /* Release relation extension locks */ > > "If we hold a relation extension lock, release it." Fixed. > +/* Number of partitions the shared relation extension lock tables are > divided into */ > +#define LOG2_NUM_RELEXTLOCK_PARTITIONS 4 > +#define NUM_RELEXTLOCK_PARTITIONS (1 << LOG2_NUM_RELEXTLOCK_PARTITIONS) > > Dead code. Fixed. > When I took a break from sitting at the computer, I realized that I > think this has a more serious problem: won't it permanently leak > reference counts if someone hits ^C or an error occurs while the lock > is held? I think it will -- it probably needs to do cleanup at the > places where we do LWLockReleaseAll() that includes decrementing the > shared refcount if necessary, rather than doing cleanup at the places > we release heavyweight locks. > I might be wrong about the details here -- this is off the top of my head. Good catch. It can leak reference counts if someone hits ^C or an error occurs while waiting. Fixed in the latest patch. But since RelExtLockReleaseAll() is called even in such situations, I think we don't need to change the place where all the relext locks are released. We just moved it from the heavyweight lock code. Am I missing something? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Fri, Dec 8, 2017 at 3:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> The first hunk in monitoring.sgml looks unnecessary. > > You meant the following hunk? > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > index 8d461c8..7aa7981 100644 > --- a/doc/src/sgml/monitoring.sgml > +++ b/doc/src/sgml/monitoring.sgml > @@ -669,8 +669,8 @@ postgres 27093 0.0 0.0 30096 2752 ? > Ss 11:34 0:00 postgres: ser > Heavyweight locks, also known as lock manager locks or simply locks, > primarily protect SQL-visible objects such as tables. However, > they are also used to ensure mutual exclusion for certain internal > - operations such as relation extension. > <literal>wait_event</literal> will > - identify the type of lock awaited. > + operations such as waiting for a transaction to finish. > + <literal>wait_event</literal> will identify the type of lock awaited. > </para> > </listitem> > <listitem> > > I think that since the extension locks are no longer a part of > heavyweight locks we should change the explanation. Yes, you are right. >> +RelationExtensionLockWaiterCount(Relation relation) >> >> Hmm. This is sort of problematic, because with then new design we >> have no guarantee that the return value is actually accurate. I don't >> think that's a functional problem, but the optics aren't great. > > Yeah, with this patch we could overestimate it and then add extra > blocks to the relation. Since the number of extra blocks is capped at > 512 I think it would not become serious problem. How about renaming it EstimateNumberOfExtensionLockWaiters? >> + /* If force releasing, release all locks we're holding */ >> + if (force) >> + held_relextlock.nLocks = 0; >> + else >> + held_relextlock.nLocks--; >> + >> + Assert(held_relextlock.nLocks >= 0); >> + >> + /* Return if we're still holding the lock even after computation */ >> + if (held_relextlock.nLocks > 0) >> + return; >> >> I thought you were going to have the caller adjust nLocks? > > Yeah, I was supposed to change so but since we always release either > one lock or all relext locks I thought it'd better to pass a bool > rather than an int. I don't see why you need to pass either one. The caller can set held_relextlock.nLocks either with -- or = 0, and then call RelExtLockRelease() only if the resulting value is 0. >> When I took a break from sitting at the computer, I realized that I >> think this has a more serious problem: won't it permanently leak >> reference counts if someone hits ^C or an error occurs while the lock >> is held? I think it will -- it probably needs to do cleanup at the >> places where we do LWLockReleaseAll() that includes decrementing the >> shared refcount if necessary, rather than doing cleanup at the places >> we release heavyweight locks. >> I might be wrong about the details here -- this is off the top of my head. > > Good catch. It can leak reference counts if someone hits ^C or an > error occurs while waiting. Fixed in the latest patch. But since > RelExtLockReleaseAll() is called even when such situations I think we > don't need to change the place where releasing the all relext lock. We > just moved it from heavyweight locks. Am I missing something? Hmm, that might be an OK way to handle it. I don't see a problem off the top of my head. It might be clearer to rename it to RelExtLockCleanup() though, since it is not just releasing the lock but also any wait count we hold. +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. 
*/ +#define RELEXT_WAIT_COUNT_MASK ((uint32) ((1 << 24) - 1)) Let's drop the comment here and instead add a StaticAssertStmt() that checks this. I am slightly puzzled, though. If I read this correctly, bits 0-23 are used for the waiter count, bit 24 is always 0, bit 25 indicates the presence or absence of an exclusive lock, and bits 26+ are always 0. That seems slightly odd. Shouldn't we either use the highest available bit for the locker (bit 31) or the lowest one (bit 24)? The former seems better, in case MAX_BACKENDS changes later. We could make RELEXT_WAIT_COUNT_MASK bigger too, just in case. + /* Make a lock tag */ + tag.dbid = MyDatabaseId; + tag.relid = relid; What about shared relations? I bet we need to use 0 in that case. Otherwise, if backends in two different databases try to extend the same shared relation at the same time, we'll (probably) fail to notice that they conflict. + * To avoid unnecessary recomputations of the hash code, we try to do this + * just once per function, and then pass it around as needed. we can + * extract the index number of RelExtLockArray. This is just a copy-and-paste from lock.c, but actually we have a more sophisticated scheme here. I think you can just drop this comment altogether, really. + return (tag_hash((const void *) locktag, sizeof(RelExtLockTag)) + % N_RELEXTLOCK_ENTS); I would drop the outermost set of parentheses. Is the cast to (const void *) really doing anything? + "cannot acquire relation extension locks for multiple relations at the same"); cannot simultaneously acquire more than one distinct relation lock? As you have it, you'd have to add the word "time" at the end, but my version is shorter. + /* Sleep until the lock is released */ Really, there's no guarantee that the lock will be released when we wake up. I think just /* Sleep until something happens, then recheck */ + lock_free = (oldstate & RELEXT_LOCK_BIT) == 0; + if (lock_free) + desired_state += RELEXT_LOCK_BIT; + + if (pg_atomic_compare_exchange_u32(&relextlock->state, + &oldstate, desired_state)) + { + if (lock_free) + return false; + else + return true; + } Hmm. If the lock is not free, we attempt to compare-and-swap anyway, but then return false? Why not just lock_free = (oldstate & RELEXT_LOCK_BIT) == 0; if (!lock_free) return true; if (pg_atomic_compare_exchange(&relextlock->state, &oldstate, oldstate | RELEXT_LOCK_BIT)) return false; + Assert(IsAnyRelationExtensionLockHeld() == 0); Since this is return bool now, it should just be Assert(!IsAnyRelationExtensionLockHeld()). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
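Concretely, the suggested compile-time check, together with a layout that keeps the lock bit at the top of the word, might look like this sketch (constants are illustrative):

/* Low bits count waiters; the topmost bit marks the lock as held. */
#define RELEXT_WAIT_COUNT_MASK	((uint32) ((1 << 24) - 1))
#define RELEXT_LOCK_BIT			((uint32) 1 << 31)

/* e.g. in the shared-memory init function:
 * MAX_BACKENDS is 2^23 - 1, so the waiter count can never overflow the mask. */
StaticAssertStmt(MAX_BACKENDS <= RELEXT_WAIT_COUNT_MASK,
				 "RELEXT_WAIT_COUNT_MASK too small for MAX_BACKENDS");

And for shared relations, the tag would be built with dbid = InvalidOid (0), so that backends in different databases map the same shared relation onto the same slot.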
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Sat, Dec 9, 2017 at 2:00 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 8, 2017 at 3:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> The first hunk in monitoring.sgml looks unnecessary. >> >> You meant the following hunk? >> >> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml >> index 8d461c8..7aa7981 100644 >> --- a/doc/src/sgml/monitoring.sgml >> +++ b/doc/src/sgml/monitoring.sgml >> @@ -669,8 +669,8 @@ postgres 27093 0.0 0.0 30096 2752 ? >> Ss 11:34 0:00 postgres: ser >> Heavyweight locks, also known as lock manager locks or simply locks, >> primarily protect SQL-visible objects such as tables. However, >> they are also used to ensure mutual exclusion for certain internal >> - operations such as relation extension. >> <literal>wait_event</literal> will >> - identify the type of lock awaited. >> + operations such as waiting for a transaction to finish. >> + <literal>wait_event</literal> will identify the type of lock awaited. >> </para> >> </listitem> >> <listitem> >> >> I think that since the extension locks are no longer a part of >> heavyweight locks we should change the explanation. > > Yes, you are right. > >>> +RelationExtensionLockWaiterCount(Relation relation) >>> >>> Hmm. This is sort of problematic, because with then new design we >>> have no guarantee that the return value is actually accurate. I don't >>> think that's a functional problem, but the optics aren't great. >> >> Yeah, with this patch we could overestimate it and then add extra >> blocks to the relation. Since the number of extra blocks is capped at >> 512 I think it would not become serious problem. > > How about renaming it EstimateNumberOfExtensionLockWaiters? Agreed, fixed. > >>> + /* If force releasing, release all locks we're holding */ >>> + if (force) >>> + held_relextlock.nLocks = 0; >>> + else >>> + held_relextlock.nLocks--; >>> + >>> + Assert(held_relextlock.nLocks >= 0); >>> + >>> + /* Return if we're still holding the lock even after computation */ >>> + if (held_relextlock.nLocks > 0) >>> + return; >>> >>> I thought you were going to have the caller adjust nLocks? >> >> Yeah, I was supposed to change so but since we always release either >> one lock or all relext locks I thought it'd better to pass a bool >> rather than an int. > > I don't see why you need to pass either one. The caller can set > held_relextlock.nLocks either with -- or = 0, and then call > RelExtLockRelease() only if the resulting value is 0. Fixed. > >>> When I took a break from sitting at the computer, I realized that I >>> think this has a more serious problem: won't it permanently leak >>> reference counts if someone hits ^C or an error occurs while the lock >>> is held? I think it will -- it probably needs to do cleanup at the >>> places where we do LWLockReleaseAll() that includes decrementing the >>> shared refcount if necessary, rather than doing cleanup at the places >>> we release heavyweight locks. >>> I might be wrong about the details here -- this is off the top of my head. >> >> Good catch. It can leak reference counts if someone hits ^C or an >> error occurs while waiting. Fixed in the latest patch. But since >> RelExtLockReleaseAll() is called even when such situations I think we >> don't need to change the place where releasing the all relext lock. We >> just moved it from heavyweight locks. Am I missing something? > > Hmm, that might be an OK way to handle it. I don't see a problem off > the top of my head. 
It might be clearer to rename it to > RelExtLockCleanup() though, since it is not just releasing the lock > but also any wait count we hold. Yeah, it seems better. Fixed. > +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */ > +#define RELEXT_WAIT_COUNT_MASK ((uint32) ((1 << 24) - 1)) > > Let's drop the comment here and instead add a StaticAssertStmt() that > checks this. Fixed. I added StaticAssertStmt() to InitRelExtLocks(). > > I am slightly puzzled, though. If I read this correctly, bits 0-23 > are used for the waiter count, bit 24 is always 0, bit 25 indicates > the presence or absence of an exclusive lock, and bits 26+ are always > 0. That seems slightly odd. Shouldn't we either use the highest > available bit for the locker (bit 31) or the lowest one (bit 24)? The > former seems better, in case MAX_BACKENDS changes later. We could > make RELEXT_WAIT_COUNT_MASK bigger too, just in case. I agree with the former. Fixed. > + /* Make a lock tag */ > + tag.dbid = MyDatabaseId; > + tag.relid = relid; > > What about shared relations? I bet we need to use 0 in that case. > Otherwise, if backends in two different databases try to extend the > same shared relation at the same time, we'll (probably) fail to notice > that they conflict. > You're right. I changed it so that we set invalidOId to tag.dbid if the relation is shared relation. > + * To avoid unnecessary recomputations of the hash code, we try to do this > + * just once per function, and then pass it around as needed. we can > + * extract the index number of RelExtLockArray. > > This is just a copy-and-paste from lock.c, but actually we have a more > sophisticated scheme here. I think you can just drop this comment > altogether, really. Fixed. > > + return (tag_hash((const void *) locktag, sizeof(RelExtLockTag)) > + % N_RELEXTLOCK_ENTS); > > I would drop the outermost set of parentheses. Is the cast to (const > void *) really doing anything? > Fixed. > + "cannot acquire relation extension locks for > multiple relations at the same"); > > cannot simultaneously acquire more than one distinct relation lock? > As you have it, you'd have to add the word "time" at the end, but my > version is shorter. I wanted to mean, cannot acquire relation extension locks for multiple relations at the "time". Fixed. > > + /* Sleep until the lock is released */ > > Really, there's no guarantee that the lock will be released when we > wake up. I think just /* Sleep until something happens, then recheck > */ Fixed. > + lock_free = (oldstate & RELEXT_LOCK_BIT) == 0; > + if (lock_free) > + desired_state += RELEXT_LOCK_BIT; > + > + if (pg_atomic_compare_exchange_u32(&relextlock->state, > + &oldstate, desired_state)) > + { > + if (lock_free) > + return false; > + else > + return true; > + } > > Hmm. If the lock is not free, we attempt to compare-and-swap anyway, > but then return false? Why not just lock_free = (oldstate & > RELEXT_LOCK_BIT) == 0; if (!lock_free) return true; if > (pg_atomic_compare_exchange(&relextlock->state, &oldstate, oldstate | > RELEXT_LOCK_BIT)) return false; Fixed. > > + Assert(IsAnyRelationExtensionLockHeld() == 0); > > Since this is return bool now, it should just be > Assert(!IsAnyRelationExtensionLockHeld()). Fixed. Attached updated version patch. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
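To make the tag and hash discussion above easier to follow, here is a rough sketch of how a relation could be mapped to a slot in the fixed-size lock array, including the shared-relation case just mentioned. It only approximates the patch (headers beyond the obvious ones are omitted), and RelExtLockIndex is a hypothetical helper name.

#include "postgres.h"
#include "miscadmin.h"          /* MyDatabaseId */
#include "utils/rel.h"

#define N_RELEXTLOCK_ENTS 1024

typedef struct RelExtLockTag
{
    Oid         dbid;           /* InvalidOid for shared relations */
    Oid         relid;
} RelExtLockTag;

static inline uint32
RelExtLockTargetTagToIndex(const RelExtLockTag *locktag)
{
    return tag_hash(locktag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS;
}

static uint32
RelExtLockIndex(Relation relation)
{
    RelExtLockTag tag;

    /* Shared relations must hash identically from every database. */
    tag.dbid = relation->rd_rel->relisshared ? InvalidOid : MyDatabaseId;
    tag.relid = RelationGetRelid(relation);

    return RelExtLockTargetTagToIndex(&tag);
}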
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Sun, Dec 10, 2017 at 11:51 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Attached updated version patch. Please review it. I went over this today; please find attached an updated version which I propose to commit. Changes: - Various formatting fixes, including running pgindent. - Various comment updates. - Make RELEXT_WAIT_COUNT_MASK equal RELEXT_LOCK_BIT - 1 rather than some unnecessarily smaller number. - In InitRelExtLocks, don't bother using mul_size; we already know it won't overflow, because we did the same thing in RelExtLockShmemSize. - When we run into an error trying to release a lock, log it as a WARNING and don't mark it as translatable. Follows lock.c. An ERROR here probably just recurses infinitely. - Don't bother passing OID to RelExtLockRelease. - Reorder functions a bit for (IMHO) better clarity. - Make UnlockRelationForExtension just use a single message for both failure modes. They are closely-enough related that I think that's fine. - Make WaitForRelationExtensionLockToBeFree complain if we already hold an extension lock. - In RelExtLockCleanup, clear held_relextlock.waiting. This would've made for a nasty bug. - Also in that function, assert that we don't hold both a lock and a wait count. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
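For what it's worth, the last two items could end up looking roughly like the sketch below. This is reconstructed from the discussion, not taken from the attached patch; the held_relextlock fields and RelExtLockRelease() follow the fragments quoted earlier, and the wait count is assumed to live in the low bits of the state word.

static void
RelExtLockCleanup(void)
{
    /* We can be holding the lock, or waiting for it, but never both. */
    Assert(!(held_relextlock.nLocks > 0 && held_relextlock.waiting));

    if (held_relextlock.nLocks > 0)
    {
        held_relextlock.nLocks = 0;
        RelExtLockRelease(held_relextlock.lock);
    }
    else if (held_relextlock.waiting)
    {
        /* Interrupted while waiting: give back our wait count. */
        pg_atomic_fetch_sub_u32(&held_relextlock.lock->state, 1);
        held_relextlock.waiting = false;
    }
}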
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2017-12-11 15:15:50 -0500, Robert Haas wrote: > +* Relation extension locks. Only one process can extend a relation at > +a time; we use a specialized lock manager for this purpose, which is > +much simpler than the regular lock manager. It is similar to the > +lightweight lock mechanism, but is even simpler because there is only > +one lock mode and only one lock can be taken at a time. A process holding > +a relation extension lock is interruptible, unlike a process holding an > +LWLock. > +/*------------------------------------------------------------------------- > + * > + * extension_lock.c > + * Relation extension lock manager > + * > + * This specialized lock manager is used only for relation extension > + * locks. Unlike the heavyweight lock manager, it doesn't provide > + * deadlock detection or group locking. Unlike lwlock.c, extension lock > + * waits are interruptible. Unlike both systems, there is only one lock > + * mode. > + * > + * False sharing is possible. We have a fixed-size array of locks, and > + * every database OID/relation OID combination is mapped to a slot in > + * the array. Therefore, if two processes try to extend relations that > + * map to the same array slot, they will contend even though it would > + * be OK to let both proceed at once. Since these locks are typically > + * taken only for very short periods of time, this doesn't seem likely > + * to be a big problem in practice. If it is, we could make the array > + * bigger. For me "very short periods of time" and journaled metadata-changing filesystem operations don't quite mesh. Language lawyering aside, this seems quite likely to bite us down the road. It's imo perfectly fine to say that there's only a limited number of file extension locks, but that there's a far from negligible chance of conflict even without the array being full doesn't seem nice. Think this needs to use some open-addressing-like conflict handling or something similar. Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Mon, Dec 11, 2017 at 3:25 PM, Andres Freund <andres@anarazel.de> wrote: > For me "very short periods of time" and journaled metadatachanging > filesystem operations don't quite mesh. Language lawyering aside, this > seems quite likely to bite us down the road. > > It's imo perfectly fine to say that there's only a limited number of > file extension locks, but that there's a far from neglegible chance of > conflict even without the array being full doesn't seem nice. Think this > needs use some open addressing like conflict handling or something > alike. I guess we could consider that, but I'm not really convinced that it's solving a real problem. Right now, you start having meaningful chance of lock-manager lock contention when the number of concurrent processes in the system requesting heavyweight locks is still in the single digits, because there are only 16 lock-manager locks. With this, there are effectively 1024 partitions. Now I realize you're going to point out, not wrongly, that we're contending on the locks themselves rather than the locks protecting the locks, and that this makes everything worse because the hold time is much longer. Fair enough. On the other hand, what workload would actually be harmed? I think you basically have to imagine a lot of relations being extended simultaneously, like a parallel bulk load, and an underlying filesystem which performs individual operations slowly but scales really well. I'm slightly skeptical that's how real-world filesystems behave. It might be a good idea, though, to test how parallel bulk loading behaves with this patch applied, maybe even after reducing N_RELEXTLOCK_ENTS to simulate an unfortunate number of collisions. This isn't a zero-sum game. If we add collision resolution, we're going to slow down the ordinary uncontended case; the bookkeeping will get significantly more complicated. That is only worth doing if the current behavior produces pathological cases on workloads that are actually somewhat realistic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Tue, Dec 12, 2017 at 5:15 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Dec 10, 2017 at 11:51 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Attached updated version patch. Please review it. > > I went over this today; please find attached an updated version which > I propose to commit. > > Changes: > > - Various formatting fixes, including running pgindent. > > - Various comment updates. > > - Make RELEXT_WAIT_COUNT_MASK equal RELEXT_LOCK_BIT - 1 rather than > some unnecessarily smaller number. > > - In InitRelExtLocks, don't bother using mul_size; we already know it > won't overflow, because we did the same thing in RelExtLockShmemSize. > > - When we run into an error trying to release a lock, log it as a > WARNING and don't mark it as translatable. Follows lock.c. An ERROR > here probably just recurses infinitely. > > - Don't bother passing OID to RelExtLockRelease. > > - Reorder functions a bit for (IMHO) better clarity. > > - Make UnlockRelationForExtension just use a single message for both > failure modes. They are closely-enough related that I think that's > fine. > > - Make WaitForRelationExtensionLockToBeFree complain if we already > hold an extension lock. > > - In RelExtLockCleanup, clear held_relextlock.waiting. This would've > made for a nasty bug. > > - Also in that function, assert that we don't hold both a lock and a wait count. > Thank you for updating the patch. Here is two minor comments. + * we acquire the same relation extension lock repeatedly. nLocks is 0 is the + * number of times we've acquired that lock; Should it be "nLocks is the number of times we've acquired that lock:"? + /* Remember lock held by this backend */ + held_relextlock.relid = relid; + held_relextlock.lock = relextlock; + held_relextlock.nLocks = 1; We set held_relextlock.relid and held_relextlock.lock again. Can we remove them? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
On 2017-12-11 15:55:42 -0500, Robert Haas wrote: > On Mon, Dec 11, 2017 at 3:25 PM, Andres Freund <andres@anarazel.de> wrote: > > For me "very short periods of time" and journaled metadatachanging > > filesystem operations don't quite mesh. Language lawyering aside, this > > seems quite likely to bite us down the road. > > > > It's imo perfectly fine to say that there's only a limited number of > > file extension locks, but that there's a far from neglegible chance of > > conflict even without the array being full doesn't seem nice. Think this > > needs use some open addressing like conflict handling or something > > alike. > > I guess we could consider that, but I'm not really convinced that it's > solving a real problem. Right now, you start having meaningful chance > of lock-manager lock contention when the number of concurrent > processes in the system requesting heavyweight locks is still in the > single digits, because there are only 16 lock-manager locks. With > this, there are effectively 1024 partitions. > > Now I realize you're going to point out, not wrongly, that we're > contending on the locks themselves rather than the locks protecting > the locks, and that this makes everything worse because the hold time > is much longer. Indeed. > Fair enough. On the other hand, what workload would actually be > harmed? I think you basically have to imagine a lot of relations > being extended simultaneously, like a parallel bulk load, and an > underlying filesystem which performs individual operations slowly but > scales really well. I'm slightly skeptical that's how real-world > filesystems behave. Or just two independent relations on two different filesystems. > It might be a good idea, though, to test how parallel bulk loading > behaves with this patch applied, maybe even after reducing > N_RELEXTLOCK_ENTS to simulate an unfortunate number of collisions. Yea, that sounds like a good plan. Measure two COPYs to relations on different filesystems, reduce N_RELEXTLOCK_ENTS to 1, and measure performance. Then increase the concurrency of the copies to each relation. > This isn't a zero-sum game. If we add collision resolution, we're > going to slow down the ordinary uncontended case; the bookkeeping will > get significantly more complicated. That is only worth doing if the > current behavior produces pathological cases on workloads that are > actually somewhat realistic. Yea, measuring sounds like a good plan. Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Mon, Dec 11, 2017 at 4:10 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for updating the patch. Here is two minor comments. > > + * we acquire the same relation extension lock repeatedly. nLocks is 0 is the > + * number of times we've acquired that lock; > > Should it be "nLocks is the number of times we've acquired that lock:"? Yes. > + /* Remember lock held by this backend */ > + held_relextlock.relid = relid; > + held_relextlock.lock = relextlock; > + held_relextlock.nLocks = 1; > > We set held_relextlock.relid and held_relextlock.lock again. Can we remove them? Yes. Can you also try the experiment Andres mentions: "Measure two COPYs to relations on different filesystems, reduce N_RELEXTLOCK_ENTS to 1, and measure performance. Then increase the concurrency of the copies to each relation." We want to see whether and how much this regresses performance in that case. It simulates the case of a hash collision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Dec 13, 2017 at 12:42 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 11, 2017 at 4:10 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Thank you for updating the patch. Here is two minor comments. >> >> + * we acquire the same relation extension lock repeatedly. nLocks is 0 is the >> + * number of times we've acquired that lock; >> >> Should it be "nLocks is the number of times we've acquired that lock:"? > > Yes. > >> + /* Remember lock held by this backend */ >> + held_relextlock.relid = relid; >> + held_relextlock.lock = relextlock; >> + held_relextlock.nLocks = 1; >> >> We set held_relextlock.relid and held_relextlock.lock again. Can we remove them? > > Yes. > > Can you also try the experiment Andres mentions: "Measure two COPYs to > relations on different filesystems, reduce N_RELEXTLOCK_ENTS to 1, and > measure performance. Yes. I'll measure the performance on such environment. > Then increase the concurrency of the copies to > each relation." We want to see whether and how much this regresses > performance in that case. It simulates the case of a hash collision. > When we add extra blocks on a relation do we access to the disk? I guess we just call lseek and write and don't access to the disk. If so the performance degradation regression might not be much. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
On 2017-12-13 16:02:45 +0900, Masahiko Sawada wrote: > When we add extra blocks on a relation do we access to the disk? I > guess we just call lseek and write and don't access to the disk. If so > the performance degradation regression might not be much. Usually changes in the file size require the filesystem to perform metadata operations, which in turn requires journaling on most FSs. Which'll often result in synchronous disk writes. Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Dec 13, 2017 at 4:30 PM, Andres Freund <andres@anarazel.de> wrote: > On 2017-12-13 16:02:45 +0900, Masahiko Sawada wrote: >> When we add extra blocks on a relation do we access to the disk? I >> guess we just call lseek and write and don't access to the disk. If so >> the performance degradation regression might not be much. > > Usually changes in the file size require the filesystem to perform > metadata operations, which in turn requires journaling on most > FSs. Which'll often result in synchronous disk writes. > Thank you. I understood the reason why this measurement should use two different filesystems. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Dec 13, 2017 at 5:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Dec 13, 2017 at 4:30 PM, Andres Freund <andres@anarazel.de> wrote: >> On 2017-12-13 16:02:45 +0900, Masahiko Sawada wrote: >>> When we add extra blocks on a relation do we access to the disk? I >>> guess we just call lseek and write and don't access to the disk. If so >>> the performance degradation regression might not be much. >> >> Usually changes in the file size require the filesystem to perform >> metadata operations, which in turn requires journaling on most >> FSs. Which'll often result in synchronous disk writes. >> > > Thank you. I understood the reason why this measurement should use two > different filesystems. > Here is the result. I've measured the through-put with some cases on my virtual machine. Each client loads 48k file to each different relations located on either xfs filesystem or ext4 filesystem, for 30 sec. Case 1: COPYs to relations on different filessystems(xfs and ext4) and N_RELEXTLOCK_ENTS is 1024 clients = 2, avg = 296.2068 clients = 5, avg = 372.0707 clients = 10, avg = 389.8850 clients = 50, avg = 428.8050 Case 2: COPYs to relations on different filessystems(xfs and ext4) and N_RELEXTLOCK_ENTS is 1 clients = 2, avg = 294.3633 clients = 5, avg = 358.9364 clients = 10, avg = 383.6945 clients = 50, avg = 424.3687 And the result of current HEAD is following. clients = 2, avg = 284.9976 clients = 5, avg = 356.1726 clients = 10, avg = 375.9856 clients = 50, avg = 429.5745 In case2, the through-put got decreased compare to case 1 but it seems to be almost same as current HEAD. Because the speed of acquiring and releasing extension lock got x10 faster than current HEAD as I mentioned before, the performance degradation may not have gotten decreased than I expected even in case 2. Since my machine doesn't have enough resources the result of clients = 50 might not be a valid result. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Dec 14, 2017 at 5:45 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Here is the result. > I've measured the through-put with some cases on my virtual machine. > Each client loads 48k file to each different relations located on > either xfs filesystem or ext4 filesystem, for 30 sec. > > Case 1: COPYs to relations on different filessystems(xfs and ext4) and > N_RELEXTLOCK_ENTS is 1024 > > clients = 2, avg = 296.2068 > clients = 5, avg = 372.0707 > clients = 10, avg = 389.8850 > clients = 50, avg = 428.8050 > > Case 2: COPYs to relations on different filessystems(xfs and ext4) and > N_RELEXTLOCK_ENTS is 1 > > clients = 2, avg = 294.3633 > clients = 5, avg = 358.9364 > clients = 10, avg = 383.6945 > clients = 50, avg = 424.3687 > > And the result of current HEAD is following. > > clients = 2, avg = 284.9976 > clients = 5, avg = 356.1726 > clients = 10, avg = 375.9856 > clients = 50, avg = 429.5745 > > In case2, the through-put got decreased compare to case 1 but it seems > to be almost same as current HEAD. Because the speed of acquiring and > releasing extension lock got x10 faster than current HEAD as I > mentioned before, the performance degradation may not have gotten > decreased than I expected even in case 2. > Since my machine doesn't have enough resources the result of clients = > 50 might not be a valid result. I have to admit that result is surprising to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Sun, Dec 17, 2017 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Dec 14, 2017 at 5:45 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Here is the result. >> I've measured the through-put with some cases on my virtual machine. >> Each client loads 48k file to each different relations located on >> either xfs filesystem or ext4 filesystem, for 30 sec. >> >> Case 1: COPYs to relations on different filessystems(xfs and ext4) and >> N_RELEXTLOCK_ENTS is 1024 >> >> clients = 2, avg = 296.2068 >> clients = 5, avg = 372.0707 >> clients = 10, avg = 389.8850 >> clients = 50, avg = 428.8050 >> >> Case 2: COPYs to relations on different filessystems(xfs and ext4) and >> N_RELEXTLOCK_ENTS is 1 >> >> clients = 2, avg = 294.3633 >> clients = 5, avg = 358.9364 >> clients = 10, avg = 383.6945 >> clients = 50, avg = 424.3687 >> >> And the result of current HEAD is following. >> >> clients = 2, avg = 284.9976 >> clients = 5, avg = 356.1726 >> clients = 10, avg = 375.9856 >> clients = 50, avg = 429.5745 >> >> In case2, the through-put got decreased compare to case 1 but it seems >> to be almost same as current HEAD. Because the speed of acquiring and >> releasing extension lock got x10 faster than current HEAD as I >> mentioned before, the performance degradation may not have gotten >> decreased than I expected even in case 2. >> Since my machine doesn't have enough resources the result of clients = >> 50 might not be a valid result. > > I have to admit that result is surprising to me. > I think the environment I used for performance measurement did not have enough resources. I will do the same benchmark on an another environment to see if it was a valid result, and will share it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, Dec 18, 2017 at 2:04 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Sun, Dec 17, 2017 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Dec 14, 2017 at 5:45 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Here is the result. >>> I've measured the through-put with some cases on my virtual machine. >>> Each client loads 48k file to each different relations located on >>> either xfs filesystem or ext4 filesystem, for 30 sec. >>> >>> Case 1: COPYs to relations on different filessystems(xfs and ext4) and >>> N_RELEXTLOCK_ENTS is 1024 >>> >>> clients = 2, avg = 296.2068 >>> clients = 5, avg = 372.0707 >>> clients = 10, avg = 389.8850 >>> clients = 50, avg = 428.8050 >>> >>> Case 2: COPYs to relations on different filessystems(xfs and ext4) and >>> N_RELEXTLOCK_ENTS is 1 >>> >>> clients = 2, avg = 294.3633 >>> clients = 5, avg = 358.9364 >>> clients = 10, avg = 383.6945 >>> clients = 50, avg = 424.3687 >>> >>> And the result of current HEAD is following. >>> >>> clients = 2, avg = 284.9976 >>> clients = 5, avg = 356.1726 >>> clients = 10, avg = 375.9856 >>> clients = 50, avg = 429.5745 >>> >>> In case2, the through-put got decreased compare to case 1 but it seems >>> to be almost same as current HEAD. Because the speed of acquiring and >>> releasing extension lock got x10 faster than current HEAD as I >>> mentioned before, the performance degradation may not have gotten >>> decreased than I expected even in case 2. >>> Since my machine doesn't have enough resources the result of clients = >>> 50 might not be a valid result. >> >> I have to admit that result is surprising to me. >> > > I think the environment I used for performance measurement did not > have enough resources. I will do the same benchmark on an another > environment to see if it was a valid result, and will share it. > I did performance measurement on an different environment where has 4 cores and physically separated two disk volumes. Also I've change the benchmarking so that COPYs load only 300 integer tuples which are not fit within single page, and changed tables to unlogged tables to observe the overhead of locking/unlocking relext locks. Case 1: COPYs to relations on different filessystems(xfs and ext4) and N_RELEXTLOCK_ENTS is 1024 clients = 1, avg = 3033.8933 clients = 2, avg = 5992.9077 clients = 4, avg = 8055.9515 clients = 8, avg = 8468.9306 clients = 16, avg = 7718.6879 Case 2: COPYs to relations on different filessystems(xfs and ext4) and N_RELEXTLOCK_ENTS is 1 clients = 1, avg = 3012.4993 clients = 2, avg = 5854.9966 clients = 4, avg = 7380.6082 clients = 8, avg = 7091.8367 clients = 16, avg = 7573.2904 And the result of current HEAD is following. clients = 1, avg = 2962.2416 clients = 2, avg = 5856.9774 clients = 4, avg = 7561.1376 clients = 8, avg = 7252.0192 clients = 16, avg = 7916.7651 As per the above results, compared with current HEAD the through-put of case 1 got increased up to 17%. On the other hand, the through-put of case 2 got decreased 2%~5%. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Dec 19, 2017 at 5:52 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Mon, Dec 18, 2017 at 2:04 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Sun, Dec 17, 2017 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> >>> I have to admit that result is surprising to me. >> >> I think the environment I used for performance measurement did not >> have enough resources. I will do the same benchmark on an another >> environment to see if it was a valid result, and will share it. >> > I did performance measurement on an different environment where has 4 > cores and physically separated two disk volumes. Also I've change the > benchmarking so that COPYs load only 300 integer tuples which are not > fit within single page, and changed tables to unlogged tables to > observe the overhead of locking/unlocking relext locks. I ran same test as asked by Robert it was just an extension of tests [1] pointed by Amit Kapila, Machine : cthulhu ------------------------ Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 8 NUMA node(s): 8 Vendor ID: GenuineIntel CPU family: 6 Model: 47 Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz Stepping: 2 CPU MHz: 1064.000 CPU max MHz: 2129.0000 CPU min MHz: 1064.0000 BogoMIPS: 4266.59 Virtualization: VT-x Hypervisor vendor: vertical Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 24576K NUMA node0 CPU(s): 0-7,64-71 NUMA node1 CPU(s): 8-15,72-79 NUMA node2 CPU(s): 16-23,80-87 NUMA node3 CPU(s): 24-31,88-95 NUMA node4 CPU(s): 32-39,96-103 NUMA node5 CPU(s): 40-47,104-111 NUMA node6 CPU(s): 48-55,112-119 NUMA node7 CPU(s): 56-63,120-127 It has 2 discs with different filesytem as below /dev/mapper/vg_mag-data2 ext4 5.1T 3.6T 1.2T 76% /mnt/data-mag2 /dev/mapper/vg_mag-data1 xfs 5.1T 1.6T 3.6T 31% /mnt/data-mag I have created 2 tables each one on above filesystem. test_size_copy.sh --> automated script to run copy test. copy_script1, copy_script2 -> copy pg_bench script's used by test_size_copy.sh to load to 2 different tables. To run above copy_scripts in parallel I have run it with equal weights as below. ./pgbench -c $threads -j $threads -f copy_script1@1 -f copy_script2@1 -T 120 postgres >> test_results.txt Results : ----------- Clients HEAD-TPS --------- --------------- 1 84.460734 2 121.359035 4 175.886335 8 268.764828 16 369.996667 32 439.032756 64 482.185392 Clients N_RELEXTLOCK_ENTS = 1024 %diff with DEAD ---------------------------------------------------------------------------------- 1 87.165777 3.20272258112273 2 131.094037 8.02165409439848 4 181.667104 3.2866504381935 8 267.412856 -0.503031594595423 16 376.118671 1.65461058058666 32 460.756357 4.94805927419228 64 492.723975 2.18558736428913 Not much of an improvement from HEAD Clients N_RELEXTLOCK_ENTS = 1 %diff with HEAD ----------------------------------------------------------------------------- 1 86.288574 2.16412990206786 2 131.398667 8.27266960387414 4 168.681079 -4.09654109854526 8 245.841999 -8.52895416806549 16 321.972147 -12.9797169226933 32 375.783299 -14.4065462395703 64 360.134531 -25.3120196142317 So in case of N_RELEXTLOCK_ENTS = 1 we can see regression as high 25%. ? [1]https://www.postgresql.org/message-id/CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA%40mail.gmail.com -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Tue, Jan 2, 2018 at 1:09 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > So in case of N_RELEXTLOCK_ENTS = 1 we can see regression as high 25%. ? So now the question is: what do these results mean for this patch? I think that the chances of someone simultaneously bulk-loading 16 or more relations that all happen to hash to the same relation extension lock bucket is pretty darn small. Most people aren't going to be running 16 bulk loads at the same time in the first place, and if they are, then there's a good chance that at least some of those loads are either actually to the same relation, or that many or all of the loads are targeting the same filesystem and the bottleneck will occur at that level, or that the loads are to relations which hash to different buckets. Now, if we want to reduce the chances of hash collisions, we could boost the default value of N_RELEXTLOCK_ENTS to 2048 or 4096. However, if we take the position that no hash collision probability is low enough and that we must eliminate all chance of false collisions, except perhaps when the table is full, then we have to make this locking mechanism a whole lot more complicated. We can no longer compute the location of the lock we need without first taking some other kind of lock that protects the mapping from {db_oid, rel_oid} -> {memory address of the relevant lock}. We can no longer cache the location where we found the lock last time so that we can retake it. If we do that, we're adding extra cycles and extra atomics and extra code that can harbor bugs to every relation extension to guard against something which I'm not sure is really going to happen. Something that's 3-8% faster in a case that occurs all the time and as much as 25% slower in a case that virtually never arises seems like it might be a win overall. However, it's quite possible that I'm not seeing the whole picture here. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Jan 5, 2018 at 1:39 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jan 2, 2018 at 1:09 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> So in case of N_RELEXTLOCK_ENTS = 1 we can see regression as high 25%. ? Thank you for the performance measurement! > So now the question is: what do these results mean for this patch? > > I think that the chances of someone simultaneously bulk-loading 16 or > more relations that all happen to hash to the same relation extension > lock bucket is pretty darn small. Most people aren't going to be > running 16 bulk loads at the same time in the first place, and if they > are, then there's a good chance that at least some of those loads are > either actually to the same relation, or that many or all of the loads > are targeting the same filesystem and the bottleneck will occur at > that level, or that the loads are to relations which hash to different > buckets. Now, if we want to reduce the chances of hash collisions, we > could boost the default value of N_RELEXTLOCK_ENTS to 2048 or 4096. > > However, if we take the position that no hash collision probability is > low enough and that we must eliminate all chance of false collisions, > except perhaps when the table is full, then we have to make this > locking mechanism a whole lot more complicated. We can no longer > compute the location of the lock we need without first taking some > other kind of lock that protects the mapping from {db_oid, rel_oid} -> > {memory address of the relevant lock}. We can no longer cache the > location where we found the lock last time so that we can retake it. > If we do that, we're adding extra cycles and extra atomics and extra > code that can harbor bugs to every relation extension to guard against > something which I'm not sure is really going to happen. Something > that's 3-8% faster in a case that occurs all the time and as much as > 25% slower in a case that virtually never arises seems like it might > be a win overall. > > However, it's quite possible that I'm not seeing the whole picture > here. Thoughts? > I agree that the chances of the case where through-put got worse is pretty small and we can get performance improvement in common cases. Also, we could mistakenly overestimate the number of blocks we need to add by false collisions. Thereby the performance might got worse and we extend a relation more than necessary but I think the chances are small. Considering the further parallel operations (e.g. parallel loading, parallel index creation etc) multiple processes will be taking a relext lock of the same relation. Thinking of that, the benefit of this patch that improves the speeds of acquiring/releasing the lock would be effective. In short I personally think the current patch is simple and the result is not a bad. But If community cannot accept these degradations we have to deal with the problem. For example, we could make the length of relext lock array configurable by users. That way, users can reduce the possibility of collisions. Or we could improve the relext lock manager to eliminate false collision by changing it to a open-addressing hash table. The code would get complex but false collisions don't happen unless the array is not full. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Sun, Jan 7, 2018 at 11:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Jan 5, 2018 at 1:39 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jan 2, 2018 at 1:09 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >>> So in case of N_RELEXTLOCK_ENTS = 1 we can see regression as high 25%. ? > > Thank you for the performance measurement! > >> So now the question is: what do these results mean for this patch? >> >> I think that the chances of someone simultaneously bulk-loading 16 or >> more relations that all happen to hash to the same relation extension >> lock bucket is pretty darn small. Most people aren't going to be >> running 16 bulk loads at the same time in the first place, and if they >> are, then there's a good chance that at least some of those loads are >> either actually to the same relation, or that many or all of the loads >> are targeting the same filesystem and the bottleneck will occur at >> that level, or that the loads are to relations which hash to different >> buckets. Now, if we want to reduce the chances of hash collisions, we >> could boost the default value of N_RELEXTLOCK_ENTS to 2048 or 4096. >> >> However, if we take the position that no hash collision probability is >> low enough and that we must eliminate all chance of false collisions, >> except perhaps when the table is full, then we have to make this >> locking mechanism a whole lot more complicated. We can no longer >> compute the location of the lock we need without first taking some >> other kind of lock that protects the mapping from {db_oid, rel_oid} -> >> {memory address of the relevant lock}. We can no longer cache the >> location where we found the lock last time so that we can retake it. >> If we do that, we're adding extra cycles and extra atomics and extra >> code that can harbor bugs to every relation extension to guard against >> something which I'm not sure is really going to happen. Something >> that's 3-8% faster in a case that occurs all the time and as much as >> 25% slower in a case that virtually never arises seems like it might >> be a win overall. >> >> However, it's quite possible that I'm not seeing the whole picture >> here. Thoughts? >> > > I agree that the chances of the case where through-put got worse is > pretty small and we can get performance improvement in common cases. > Also, we could mistakenly overestimate the number of blocks we need to > add by false collisions. Thereby the performance might got worse and > we extend a relation more than necessary but I think the chances are > small. Considering the further parallel operations (e.g. parallel > loading, parallel index creation etc) multiple processes will be > taking a relext lock of the same relation. Thinking of that, the > benefit of this patch that improves the speeds of acquiring/releasing > the lock would be effective. > > In short I personally think the current patch is simple and the result > is not a bad. But If community cannot accept these degradations we > have to deal with the problem. For example, we could make the length > of relext lock array configurable by users. That way, users can reduce > the possibility of collisions. Or we could improve the relext lock > manager to eliminate false collision by changing it to a > open-addressing hash table. The code would get complex but false > collisions don't happen unless the array is not full. > On second thought, perhaps we should also do performance measurement with the patch that uses HTAB instead a fixed array. 
Probably the performance with that patch will be equal to or slightly better than current HEAD, hopefully not worse. In addition, if the performance degradation from false collisions doesn't happen, or we can avoid it by increasing a GUC parameter, I think it's better than the current fixed-array approach. Thoughts? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
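Since the HTAB idea comes up here, below is a very rough sketch of what that alternative could look like. Every name in it is hypothetical (including RelExtLockMappingLock, which would have to be a newly registered LWLock), and it deliberately ignores partitioning and entry reclamation, which is exactly the extra complexity being weighed against the fixed array.

typedef struct RelExtLockEntry
{
    RelExtLockTag tag;          /* hash key; must be first */
    pg_atomic_uint32 state;     /* lock bit + waiter count, as before */
} RelExtLockEntry;

static HTAB *RelExtLockHash;
static LWLock *RelExtLockMappingLock;   /* hypothetical, protects the mapping */

static void
InitRelExtLockHash(void)
{
    HASHCTL     info;

    memset(&info, 0, sizeof(info));
    info.keysize = sizeof(RelExtLockTag);
    info.entrysize = sizeof(RelExtLockEntry);

    RelExtLockHash = ShmemInitHash("Relation Extension Lock Hash",
                                   N_RELEXTLOCK_ENTS, N_RELEXTLOCK_ENTS,
                                   &info, HASH_ELEM | HASH_BLOBS);
}

static RelExtLockEntry *
GetRelExtLockEntry(const RelExtLockTag *tag)
{
    RelExtLockEntry *entry;
    bool        found;

    /* Unlike the fixed array, every lookup needs the mapping lock. */
    LWLockAcquire(RelExtLockMappingLock, LW_EXCLUSIVE);
    entry = (RelExtLockEntry *) hash_search(RelExtLockHash, tag,
                                            HASH_ENTER, &found);
    if (!found)
        pg_atomic_init_u32(&entry->state, 0);
    LWLockRelease(RelExtLockMappingLock);

    return entry;
}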
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2018-01-04 11:39:40 -0500, Robert Haas wrote: > On Tue, Jan 2, 2018 at 1:09 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > > So in case of N_RELEXTLOCK_ENTS = 1 we can see regression as high 25%. ? > > So now the question is: what do these results mean for this patch? > I think that the chances of someone simultaneously bulk-loading 16 or > more relations that all happen to hash to the same relation extension > lock bucket is pretty darn small. I'm not convinced that that's true. Especially with partitioning in the mix. Also, the birthday paradox and all that make collisions not that unlikely. And you really don't need a 16-way conflict to feel pain, you'll imo feel it earlier. I think bumping up the size a bit would make that less likely. Not sure it actually addresses the issue. > However, if we take the position that no hash collision probability is > low enough and that we must eliminate all chance of false collisions, > except perhaps when the table is full, then we have to make this > locking mechanism a whole lot more complicated. We can no longer > compute the location of the lock we need without first taking some > other kind of lock that protects the mapping from {db_oid, rel_oid} -> > {memory address of the relevant lock}. Hm, that's not necessarily true, is it? While not trivial, it also doesn't seem impossible? Greetings, Andres Freund
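Purely as a back-of-the-envelope illustration of the two probabilities being argued about here (this is a standalone program, not PostgreSQL code): for k concurrent loaders and 1024 slots it prints both the chance that all of them land in the same slot and the birthday-paradox chance that at least two of them share a slot.

#include <stdio.h>
#include <math.h>

int
main(void)
{
    const double N = 1024.0;    /* N_RELEXTLOCK_ENTS */

    for (int k = 2; k <= 64; k *= 2)
    {
        double  p_no_pair = 1.0;

        for (int i = 1; i < k; i++)
            p_no_pair *= (N - i) / N;

        printf("k=%2d  all in one slot: %.3g  any two sharing a slot: %.3f\n",
               k, pow(1.0 / N, k - 1), 1.0 - p_no_pair);
    }
    return 0;
}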
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Mar 1, 2018 at 2:17 PM, Andres Freund <andres@anarazel.de> wrote: >> However, if we take the position that no hash collision probability is >> low enough and that we must eliminate all chance of false collisions, >> except perhaps when the table is full, then we have to make this >> locking mechanism a whole lot more complicated. We can no longer >> compute the location of the lock we need without first taking some >> other kind of lock that protects the mapping from {db_oid, rel_oid} -> >> {memory address of the relevant lock}. > > Hm, that's not necessarily true, is it? Wile not trivial, it also > doesn't seem impossible? You can't both store every lock at a fixed address and at the same time put locks at a different address if the one they would have used is already occupied. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
On 2018-03-01 15:37:17 -0500, Robert Haas wrote: > On Thu, Mar 1, 2018 at 2:17 PM, Andres Freund <andres@anarazel.de> wrote: > >> However, if we take the position that no hash collision probability is > >> low enough and that we must eliminate all chance of false collisions, > >> except perhaps when the table is full, then we have to make this > >> locking mechanism a whole lot more complicated. We can no longer > >> compute the location of the lock we need without first taking some > >> other kind of lock that protects the mapping from {db_oid, rel_oid} -> > >> {memory address of the relevant lock}. > > > > Hm, that's not necessarily true, is it? Wile not trivial, it also > > doesn't seem impossible? > > You can't both store every lock at a fixed address and at the same > time put locks at a different address if the one they would have used > is already occupied. Right, but why does that require a lock? Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Mar 1, 2018 at 3:40 PM, Andres Freund <andres@anarazel.de> wrote: >> You can't both store every lock at a fixed address and at the same >> time put locks at a different address if the one they would have used >> is already occupied. > > Right, but why does that require a lock? Maybe I'm being dense here but ... how could it not? If the lock for relation X is always at pointer P, then I can compute the address for the lock and assume it will be there, because that's where it *always is*. If the lock for relation X can be at any of various addresses depending on other system activity, then I cannot assume that an address that I compute for it remains valid except for so long as I hold a lock strong enough to keep it from being moved. Concretely, I imagine that if you put the lock at different addresses at different times, you would implement that by reclaiming unused entries to make room for new entries that you need to allocate. So if I hold the lock at 0x1000, I can probably it will assume it will stay there for as long as I hold it. But the instant I release it, even for a moment, somebody might garbage-collect the entry and reallocate it for something else. Now the next time I need it it will be elsewhere. I'll have to search for it, I presume, while holding some analogue of the buffer-mapping lock. In the patch as proposed, that's not needed. Once you know that the lock for relation 123 is at 0x1000, you can just keep locking it at that same address without checking anything, which is quite appealing given that the same backend extending the same relation many times in a row is a pretty common pattern. If you have a clever idea how to make this work with as few atomic operations as the current patch uses while at the same time reducing the possibility of contention, I'm all ears. But I don't see how to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
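A sketch of the fast path being described, pieced together from the fragments quoted earlier in the thread (held_relextlock, RelExtLockIndex and so on); RelExtLockAcquire stands in for the attempt-then-wait loop, and the whole thing is an approximation rather than the patch itself.

void
LockRelationForExtension(Relation relation)
{
    Oid         relid = RelationGetRelid(relation);

    if (held_relextlock.nLocks > 0)
    {
        /* Only one extension lock can be held at a time. */
        Assert(held_relextlock.relid == relid);
        held_relextlock.nLocks++;
        return;
    }

    /*
     * Slots never move, so the address cached from the last extension of
     * this relation is still valid; recompute only for a different relid.
     */
    if (held_relextlock.lock == NULL || held_relextlock.relid != relid)
    {
        held_relextlock.relid = relid;
        held_relextlock.lock = &RelExtLockArray[RelExtLockIndex(relation)];
    }

    RelExtLockAcquire(held_relextlock.lock);
    held_relextlock.nLocks = 1;
}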
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Michael Paquier
Date:
On Thu, Mar 01, 2018 at 04:01:28PM -0500, Robert Haas wrote: > If you have a clever idea how to make this work with as few atomic > operations as the current patch uses while at the same time reducing > the possibility of contention, I'm all ears. But I don't see how to > do that. This thread has no activity since the beginning of the commit fest, and it seems that it would be hard to reach something committable for v11, so I am marking it as returned with feedback. -- Michael
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Mar 30, 2018 at 4:43 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Mar 01, 2018 at 04:01:28PM -0500, Robert Haas wrote: >> If you have a clever idea how to make this work with as few atomic >> operations as the current patch uses while at the same time reducing >> the possibility of contention, I'm all ears. But I don't see how to >> do that. > > This thread has no activity since the beginning of the commit fest, and > it seems that it would be hard to reach something committable for v11, > so I am marking it as returned with feedback. Thank you. The probability of performance degradation can be reduced by increasing N_RELEXTLOCK_ENTS. But as Robert mentioned, while keeping the implementation fast and simple, acquiring the lock with just a few atomic operations, it's hard to improve or even preserve the current performance in all cases. I was thinking that this patch would be needed for parallel DML operations and vacuum, but if the community cannot accept this approach it might be better to mark it as "Rejected" and then I should reconsider the design of parallel vacuum. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Tue, Apr 10, 2018 at 5:40 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > The probability of performance degradation can be reduced by > increasing N_RELEXTLOCK_ENTS. But as Robert mentioned, while keeping > fast and simple implementation like acquiring lock by a few atomic > operation it's hard to improve or at least keep the current > performance on all cases. I was thinking that this patch is necessary > by parallel DML operations and vacuum but if the community cannot > accept this approach it might be better to mark it as "Rejected" and > then I should reconsider the design of parallel vacuum. I'm sorry that I didn't get time to work further on this during the CommitFest. In terms of moving forward, I'd still like to hear what Andres has to say about the comments I made on March 1st. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, Apr 11, 2018 at 1:40 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Apr 10, 2018 at 5:40 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> The probability of performance degradation can be reduced by >> increasing N_RELEXTLOCK_ENTS. But as Robert mentioned, while keeping >> fast and simple implementation like acquiring lock by a few atomic >> operation it's hard to improve or at least keep the current >> performance on all cases. I was thinking that this patch is necessary >> by parallel DML operations and vacuum but if the community cannot >> accept this approach it might be better to mark it as "Rejected" and >> then I should reconsider the design of parallel vacuum. > > I'm sorry that I didn't get time to work further on this during the > CommitFest. Never mind. There was a lot of items especially at the last CommitFest. > In terms of moving forward, I'd still like to hear what > Andres has to say about the comments I made on March 1st. Yeah, agreed. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Never mind. There was a lot of items especially at the last CommitFest. > >> In terms of moving forward, I'd still like to hear what >> Andres has to say about the comments I made on March 1st. > > Yeah, agreed. $ ping -n andres.freund Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 Request timeout for icmp_seq 3 Request timeout for icmp_seq 4 ^C --- andres.freund ping statistics --- 6 packets transmitted, 0 packets received, 100.0% packet loss Meanwhile, https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru shows that this patch has some benefits for other cases, which is a point in favor IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Never mind. There was a lot of items especially at the last CommitFest. >> >>> In terms of moving forward, I'd still like to hear what >>> Andres has to say about the comments I made on March 1st. >> >> Yeah, agreed. > > $ ping -n andres.freund > Request timeout for icmp_seq 0 > Request timeout for icmp_seq 1 > Request timeout for icmp_seq 2 > Request timeout for icmp_seq 3 > Request timeout for icmp_seq 4 > ^C > --- andres.freund ping statistics --- > 6 packets transmitted, 0 packets received, 100.0% packet loss > > Meanwhile, https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru > shows that this patch has some benefits for other cases, which is a > point in favor IMHO. Thank you for sharing. That's good to know. Andres pointed out the performance degradation due to hash collision when multiple loading. I think the point is that it happens at where users don't know. Therefore even if we make N_RELEXTLOCK_ENTS configurable parameter, since users don't know the hash collision they don't know when they should tune it. So it's just an idea but how about adding an SQL-callable function that returns the estimated number of lock waiters of the given relation? Since user knows how many processes are loading to the relation, if a returned value by the function is greater than the expected value user can know hash collision and will be able to start to consider to increase N_RELEXTLOCK_ENTS. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
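To make the suggestion concrete, something like the hypothetical SQL-callable function below (none of this exists in the patch or in core) could read the waiter-count bits out of the slot a relation maps to. Note that because of hash collisions it can only ever be an estimate of waiters on that slot, not on the relation itself, which is part of what makes tuning on top of it awkward.

PG_FUNCTION_INFO_V1(pg_relation_extension_lock_waiters);

Datum
pg_relation_extension_lock_waiters(PG_FUNCTION_ARGS)
{
    Oid         relid = PG_GETARG_OID(0);
    Relation    rel;
    uint32      state;

    rel = relation_open(relid, AccessShareLock);
    state = pg_atomic_read_u32(&RelExtLockArray[RelExtLockIndex(rel)].state);
    relation_close(rel, AccessShareLock);

    /* Low bits of the state word hold the (possibly shared) waiter count. */
    PG_RETURN_INT32((int32) (state & RELEXT_WAIT_COUNT_MASK));
}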
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Apr 26, 2018 at 2:10 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for sharing. That's good to know. > > Andres pointed out the performance degradation due to hash collision > when multiple loading. I think the point is that it happens at where > users don't know. Therefore even if we make N_RELEXTLOCK_ENTS > configurable parameter, since users don't know the hash collision they > don't know when they should tune it. > > So it's just an idea but how about adding an SQL-callable function > that returns the estimated number of lock waiters of the given > relation? Since user knows how many processes are loading to the > relation, if a returned value by the function is greater than the > expected value user can know hash collision and will be able to start > to consider to increase N_RELEXTLOCK_ENTS. I don't think that's a very useful suggestion. Changing N_RELEXTLOCK_ENTS requires a recompile, which is going to be impractical for most users. Even if we made it a GUC, we don't want users to have to tune stuff like this. If we actually think this is going to be a problem, we'd probably better rethink the design. I think the real question is whether the scenario is common enough to worry about. In practice, you'd have to be extremely unlucky to be doing many bulk loads at the same time that all happened to hash to the same bucket. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2018-04-26 15:08:24 -0400, Robert Haas wrote: > I don't think that's a very useful suggestion. Changing > N_RELEXTLOCK_ENTS requires a recompile, which is going to be > impractical for most users. Even if we made it a GUC, we don't want > users to have to tune stuff like this. If we actually think this is > going to be a problem, we'd probably better rethink the design. Agreed. > I think the real question is whether the scenario is common enough to > worry about. In practice, you'd have to be extremely unlucky to be > doing many bulk loads at the same time that all happened to hash to > the same bucket. With a bunch of parallel bulkloads into partitioned tables that really doesn't seem that unlikely? Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: >> I think the real question is whether the scenario is common enough to >> worry about. In practice, you'd have to be extremely unlucky to be >> doing many bulk loads at the same time that all happened to hash to >> the same bucket. > > With a bunch of parallel bulkloads into partitioned tables that really > doesn't seem that unlikely? It increases the likelihood of collisions, but probably decreases the number of cases where the contention gets really bad. For example, suppose each table has 100 partitions and you are bulk-loading 10 of them at a time. It's virtually certain that you will have some collisions, but the amount of contention within each bucket will remain fairly low because each backend spends only 1% of its time in the bucket corresponding to any given partition. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
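To put rough numbers on that intuition, a small standalone C program, assuming uniform hashing of concurrently extended relations into N_RELEXTLOCK_ENTS = 1024 slots (the value discussed elsewhere in this thread), estimates the chance of any collision and the expected number of colliding pairs:

#include <stdio.h>

#define N_RELEXTLOCK_ENTS 1024  /* assumed slot count */

int
main(void)
{
    int k_values[] = {10, 32, 100, 300};

    for (int i = 0; i < 4; i++)
    {
        int     k = k_values[i];
        double  p_all_distinct = 1.0;

        /* birthday-style product: all k relations land in distinct slots */
        for (int j = 1; j < k; j++)
            p_all_distinct *= 1.0 - (double) j / N_RELEXTLOCK_ENTS;

        /* expected number of relation pairs sharing a slot: C(k,2) / N */
        double  expected_pairs = (double) k * (k - 1) / 2.0 / N_RELEXTLOCK_ENTS;

        printf("k=%3d  P(at least one collision)=%.3f  expected colliding pairs=%.2f\n",
               k, 1.0 - p_all_distinct, expected_pairs);
    }
    return 0;
}

Under these assumptions, ten concurrent loads collide only a few percent of the time, while a few hundred concurrently extended relations make some collision essentially certain; that is consistent with both positions above, since even then each individual bucket is typically shared by only a couple of backends.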
RE: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
"Alex Ignatov"
Date:
-----Original Message----- From: Robert Haas <robertmhaas@gmail.com> Sent: Thursday, April 26, 2018 10:25 PM To: Andres Freund <andres@anarazel.de> Cc: Masahiko Sawada <sawada.mshk@gmail.com>; Michael Paquier <michael@paquier.xyz>; Mithun Cy <mithun.cy@enterprisedb.com>;Tom Lane <tgl@sss.pgh.pa.us>; Thomas Munro <thomas.munro@enterprisedb.com>; Amit Kapila <amit.kapila16@gmail.com>;PostgreSQL-development <pgsql-hackers@postgresql.org> Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: >> I think the real question is whether the scenario is common enough to >> worry about. In practice, you'd have to be extremely unlucky to be >> doing many bulk loads at the same time that all happened to hash to >> the same bucket. > > With a bunch of parallel bulkloads into partitioned tables that really > doesn't seem that unlikely? It increases the likelihood of collisions, but probably decreases the number of cases where the contention gets really bad. For example, suppose each table has 100 partitions and you are bulk-loading 10 of them at a time. It's virtually certainthat you will have some collisions, but the amount of contention within each bucket will remain fairly low becauseeach backend spends only 1% of its time in the bucket corresponding to any given partition. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Hello! I want to try to test this patch on 302(704 ht) core machine. Patching on master (commit 81256cd05f0745353c6572362155b57250a0d2a0) is ok but got some error while compiling : gistvacuum.c: In function ‘gistvacuumcleanup’: gistvacuum.c:92:3: error: too many arguments to function ‘LockRelationForExtension’ LockRelationForExtension(rel, ExclusiveLock); ^ In file included from gistvacuum.c:21:0: ../../../../src/include/storage/extension_lock.h:30:13: note: declared here extern void LockRelationForExtension(Relation relation); ^ gistvacuum.c:95:3: error: too many arguments to function ‘UnlockRelationForExtension’ UnlockRelationForExtension(rel, ExclusiveLock); ^ In file included from gistvacuum.c:21:0: ../../../../src/include/storage/extension_lock.h:31:13: note: declared here extern void UnlockRelationForExtension(Relation relation); -- Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
RE: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
"Alex Ignatov"
Date:
-- Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -----Original Message----- From: Alex Ignatov <a.ignatov@postgrespro.ru> Sent: Monday, May 21, 2018 6:00 PM To: 'Robert Haas' <robertmhaas@gmail.com>; 'Andres Freund' <andres@anarazel.de> Cc: 'Masahiko Sawada' <sawada.mshk@gmail.com>; 'Michael Paquier' <michael@paquier.xyz>; 'Mithun Cy' <mithun.cy@enterprisedb.com>;'Tom Lane' <tgl@sss.pgh.pa.us>; 'Thomas Munro' <thomas.munro@enterprisedb.com>; 'Amit Kapila'<amit.kapila16@gmail.com>; 'PostgreSQL-development' <pgsql-hackers@postgresql.org> Subject: RE: [HACKERS] Moving relation extension locks out of heavyweight lock manager -----Original Message----- From: Robert Haas <robertmhaas@gmail.com> Sent: Thursday, April 26, 2018 10:25 PM To: Andres Freund <andres@anarazel.de> Cc: Masahiko Sawada <sawada.mshk@gmail.com>; Michael Paquier <michael@paquier.xyz>; Mithun Cy <mithun.cy@enterprisedb.com>;Tom Lane <tgl@sss.pgh.pa.us>; Thomas Munro <thomas.munro@enterprisedb.com>; Amit Kapila <amit.kapila16@gmail.com>;PostgreSQL-development <pgsql-hackers@postgresql.org> Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: >> I think the real question is whether the scenario is common enough to >> worry about. In practice, you'd have to be extremely unlucky to be >> doing many bulk loads at the same time that all happened to hash to >> the same bucket. > > With a bunch of parallel bulkloads into partitioned tables that really > doesn't seem that unlikely? It increases the likelihood of collisions, but probably decreases the number of cases where the contention gets really bad. For example, suppose each table has 100 partitions and you are bulk-loading 10 of them at a time. It's virtually certainthat you will have some collisions, but the amount of contention within each bucket will remain fairly low becauseeach backend spends only 1% of its time in the bucket corresponding to any given partition. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Hello! I want to try to test this patch on 302(704 ht) core machine. Patching on master (commit 81256cd05f0745353c6572362155b57250a0d2a0) is ok but got some error while compiling : gistvacuum.c: In function ‘gistvacuumcleanup’: gistvacuum.c:92:3: error: too many arguments to function ‘LockRelationForExtension’ LockRelationForExtension(rel, ExclusiveLock); ^ In file included from gistvacuum.c:21:0: ../../../../src/include/storage/extension_lock.h:30:13: note: declared here extern void LockRelationForExtension(Relationrelation); ^ gistvacuum.c:95:3: error: too many arguments to function ‘UnlockRelationForExtension’ UnlockRelationForExtension(rel, ExclusiveLock); ^ In file included from gistvacuum.c:21:0: ../../../../src/include/storage/extension_lock.h:31:13: note: declared here extern void UnlockRelationForExtension(Relationrelation); Sorry, forgot to mention that patch version is extension-lock-v12.patch -- Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
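For anyone else building the patch against a newer master: the errors come from call sites added or changed after the patch was produced, because the patched API drops the lock-mode argument. A sketch of the shape of the adjustment, based only on the declarations quoted in the compiler output (the real gistvacuum.c context is omitted):

#include "postgres.h"
#include "utils/rel.h"
#include "storage/extension_lock.h"     /* the patch's header, per the error output */

/*
 * Illustrative call site only: the patched LockRelationForExtension() and
 * UnlockRelationForExtension() take just the relation, so two-argument
 * calls such as the ones in gistvacuumcleanup() need to lose their
 * ExclusiveLock argument.
 */
void
example_adjusted_call_site(Relation rel)
{
    /* was: LockRelationForExtension(rel, ExclusiveLock); */
    LockRelationForExtension(rel);

    /* ... work that must be done while holding the extension lock ... */

    /* was: UnlockRelationForExtension(rel, ExclusiveLock); */
    UnlockRelationForExtension(rel);
}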
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Tue, May 22, 2018 at 12:05 AM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote: > > > -- > Alex Ignatov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > > -----Original Message----- > From: Alex Ignatov <a.ignatov@postgrespro.ru> > Sent: Monday, May 21, 2018 6:00 PM > To: 'Robert Haas' <robertmhaas@gmail.com>; 'Andres Freund' <andres@anarazel.de> > Cc: 'Masahiko Sawada' <sawada.mshk@gmail.com>; 'Michael Paquier' <michael@paquier.xyz>; 'Mithun Cy' <mithun.cy@enterprisedb.com>;'Tom Lane' <tgl@sss.pgh.pa.us>; 'Thomas Munro' <thomas.munro@enterprisedb.com>; 'Amit Kapila'<amit.kapila16@gmail.com>; 'PostgreSQL-development' <pgsql-hackers@postgresql.org> > Subject: RE: [HACKERS] Moving relation extension locks out of heavyweight lock manager > > > > > -----Original Message----- > From: Robert Haas <robertmhaas@gmail.com> > Sent: Thursday, April 26, 2018 10:25 PM > To: Andres Freund <andres@anarazel.de> > Cc: Masahiko Sawada <sawada.mshk@gmail.com>; Michael Paquier <michael@paquier.xyz>; Mithun Cy <mithun.cy@enterprisedb.com>;Tom Lane <tgl@sss.pgh.pa.us>; Thomas Munro <thomas.munro@enterprisedb.com>; Amit Kapila <amit.kapila16@gmail.com>;PostgreSQL-development <pgsql-hackers@postgresql.org> > Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: >>> I think the real question is whether the scenario is common enough to >>> worry about. In practice, you'd have to be extremely unlucky to be >>> doing many bulk loads at the same time that all happened to hash to >>> the same bucket. >> >> With a bunch of parallel bulkloads into partitioned tables that really >> doesn't seem that unlikely? > > It increases the likelihood of collisions, but probably decreases the number of cases where the contention gets reallybad. > > For example, suppose each table has 100 partitions and you are bulk-loading 10 of them at a time. It's virtually certainthat you will have some collisions, but the amount of contention within each bucket will remain fairly low becauseeach backend spends only 1% of its time in the bucket corresponding to any given partition. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company > > Hello! > I want to try to test this patch on 302(704 ht) core machine. > > Patching on master (commit 81256cd05f0745353c6572362155b57250a0d2a0) is ok but got some error while compiling : Thank you for reporting. Attached an rebased patch with current HEAD. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Konstantin Knizhnik
Date:
On 26.04.2018 09:10, Masahiko Sawada wrote: > On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> Never mind. There was a lot of items especially at the last CommitFest. >>> >>>> In terms of moving forward, I'd still like to hear what >>>> Andres has to say about the comments I made on March 1st. >>> Yeah, agreed. >> $ ping -n andres.freund >> Request timeout for icmp_seq 0 >> Request timeout for icmp_seq 1 >> Request timeout for icmp_seq 2 >> Request timeout for icmp_seq 3 >> Request timeout for icmp_seq 4 >> ^C >> --- andres.freund ping statistics --- >> 6 packets transmitted, 0 packets received, 100.0% packet loss >> >> Meanwhile, https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru >> shows that this patch has some benefits for other cases, which is a >> point in favor IMHO. > Thank you for sharing. That's good to know. > > Andres pointed out the performance degradation due to hash collision > when multiple loading. I think the point is that it happens at where > users don't know. Therefore even if we make N_RELEXTLOCK_ENTS > configurable parameter, since users don't know the hash collision they > don't know when they should tune it. > > So it's just an idea but how about adding an SQL-callable function > that returns the estimated number of lock waiters of the given > relation? Since user knows how many processes are loading to the > relation, if a returned value by the function is greater than the > expected value user can know hash collision and will be able to start > to consider to increase N_RELEXTLOCK_ENTS. > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center > We in PostgresProc were faced with lock extension contention problem at two more customers and tried to use this patch (v13) to address this issue. Unfortunately replacing heavy lock with lwlock couldn't completely eliminate contention, now most of backends are blocked on conditional variable: 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 #1 0x00000000007024ee in WaitEventSetWait () #2 0x0000000000718fa6 in ConditionVariableSleep () #3 0x000000000071954d in RelExtLockAcquire () #4 0x00000000004ba99d in RelationGetBufferForTuple () #5 0x00000000004b3f18 in heap_insert () #6 0x00000000006109c8 in ExecInsert () #7 0x0000000000611a49 in ExecModifyTable () #8 0x00000000005ef97a in standard_ExecutorRun () #9 0x000000000072440a in ProcessQuery () #10 0x0000000000724631 in PortalRunMulti () #11 0x00000000007250ec in PortalRun () #12 0x0000000000721287 in exec_simple_query () #13 0x0000000000722532 in PostgresMain () #14 0x000000000047a9eb in ServerLoop () #15 0x00000000006b9fe9 in PostmasterMain () #16 0x000000000047b431 in main () Obviously there is nothing surprising here: if a lot of processes try to acquire the same exclusive lock, then high contention is expected. I just want to notice that this patch is not able to completely eliminate the problem with large number of concurrent inserts to the same table. Second problem we observed was even more critical: if backed is granted relation extension lock and then got some error before releasing this lock, then abort of the current transaction doesn't release this lock (unlike heavy weight lock) and the relation is kept locked. 
So database is actually stalled and server has to be restarted. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2018-06-04 16:47:29 +0300, Konstantin Knizhnik wrote: > We in PostgresProc were faced with lock extension contention problem at two > more customers and tried to use this patch (v13) to address this issue. > Unfortunately replacing heavy lock with lwlock couldn't completely eliminate > contention, now most of backends are blocked on conditional variable: > > 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 > #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 > #1 0x00000000007024ee in WaitEventSetWait () > #2 0x0000000000718fa6 in ConditionVariableSleep () > #3 0x000000000071954d in RelExtLockAcquire () That doesn't necessarily mean that the postgres code is at fault here. It's entirely possible that the filesystem or storage is the bottleneck. Could you briefly describe workload & hardware? > Second problem we observed was even more critical: if backed is granted > relation extension lock and then got some error before releasing this lock, > then abort of the current transaction doesn't release this lock (unlike > heavy weight lock) and the relation is kept locked. > So database is actually stalled and server has to be restarted. That obviously needs to be fixed... Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, Jun 4, 2018 at 10:47 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > On 26.04.2018 09:10, Masahiko Sawada wrote: >> >> On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> >> wrote: >>> >>> On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> >>> wrote: >>>> >>>> Never mind. There was a lot of items especially at the last CommitFest. >>>> >>>>> In terms of moving forward, I'd still like to hear what >>>>> Andres has to say about the comments I made on March 1st. >>>> >>>> Yeah, agreed. >>> >>> $ ping -n andres.freund >>> Request timeout for icmp_seq 0 >>> Request timeout for icmp_seq 1 >>> Request timeout for icmp_seq 2 >>> Request timeout for icmp_seq 3 >>> Request timeout for icmp_seq 4 >>> ^C >>> --- andres.freund ping statistics --- >>> 6 packets transmitted, 0 packets received, 100.0% packet loss >>> >>> Meanwhile, >>> https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru >>> shows that this patch has some benefits for other cases, which is a >>> point in favor IMHO. >> >> Thank you for sharing. That's good to know. >> >> Andres pointed out the performance degradation due to hash collision >> when multiple loading. I think the point is that it happens at where >> users don't know. Therefore even if we make N_RELEXTLOCK_ENTS >> configurable parameter, since users don't know the hash collision they >> don't know when they should tune it. >> >> So it's just an idea but how about adding an SQL-callable function >> that returns the estimated number of lock waiters of the given >> relation? Since user knows how many processes are loading to the >> relation, if a returned value by the function is greater than the >> expected value user can know hash collision and will be able to start >> to consider to increase N_RELEXTLOCK_ENTS. >> >> Regards, >> >> -- >> Masahiko Sawada >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION >> NTT Open Source Software Center >> > We in PostgresProc were faced with lock extension contention problem at two > more customers and tried to use this patch (v13) to address this issue. > Unfortunately replacing heavy lock with lwlock couldn't completely eliminate > contention, now most of backends are blocked on conditional variable: > > 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 > #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 > #1 0x00000000007024ee in WaitEventSetWait () > #2 0x0000000000718fa6 in ConditionVariableSleep () > #3 0x000000000071954d in RelExtLockAcquire () > #4 0x00000000004ba99d in RelationGetBufferForTuple () > #5 0x00000000004b3f18 in heap_insert () > #6 0x00000000006109c8 in ExecInsert () > #7 0x0000000000611a49 in ExecModifyTable () > #8 0x00000000005ef97a in standard_ExecutorRun () > #9 0x000000000072440a in ProcessQuery () > #10 0x0000000000724631 in PortalRunMulti () > #11 0x00000000007250ec in PortalRun () > #12 0x0000000000721287 in exec_simple_query () > #13 0x0000000000722532 in PostgresMain () > #14 0x000000000047a9eb in ServerLoop () > #15 0x00000000006b9fe9 in PostmasterMain () > #16 0x000000000047b431 in main () > > Obviously there is nothing surprising here: if a lot of processes try to > acquire the same exclusive lock, then high contention is expected. > I just want to notice that this patch is not able to completely eliminate > the problem with large number of concurrent inserts to the same table. 
> > Second problem we observed was even more critical: if backed is granted > relation extension lock and then got some error before releasing this lock, > then abort of the current transaction doesn't release this lock (unlike > heavy weight lock) and the relation is kept locked. > So database is actually stalled and server has to be restarted. > Thank you for reporting. Regarding the second problem, I tried to reproduce that bug with the latest version of the patch (v13) but could not. When a transaction aborts, we call ResourceOwnerRelease()->ResourceOwnerReleaseInternal()->ProcReleaseLocks()->RelExtLockCleanup(), which clears any relext lock bits we are holding or waiting on. If we raised an error after adding a relext lock bit but before incrementing its holding count then the relext lock would remain, but I couldn't find code that raises an error between them. Could you please share concrete reproduction steps for the database stall, if possible? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
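As a toy model of the cleanup being described (the type, fields, and function below are invented for illustration; the real patch keeps this state in shared memory with atomics and a condition variable), the abort path has to undo both a held lock and a pending wait:

#include <stdbool.h>
#include <stdio.h>

/* Invented stand-in for one relation extension lock slot. */
typedef struct ToyRelExtLockSlot
{
    int     holders;        /* backends currently holding the lock */
    int     waiters;        /* backends currently sleeping on it */
} ToyRelExtLockSlot;

/* What one backend believes about its own involvement with a slot. */
typedef struct ToyBackendState
{
    ToyRelExtLockSlot *slot;    /* slot we touched, or NULL */
    bool    held;               /* we incremented holders */
    bool    waiting;            /* we incremented waiters */
} ToyBackendState;

/*
 * Toy counterpart of the RelExtLockCleanup() call described above: invoked
 * from the abort path so that an error raised while holding, or while
 * waiting for, the extension lock never leaves the slot permanently
 * claimed by this backend.
 */
void
toy_relextlock_cleanup(ToyBackendState *my)
{
    if (my->slot == NULL)
        return;
    if (my->held)
        my->slot->holders--;
    if (my->waiting)
        my->slot->waiters--;
    my->held = my->waiting = false;
    my->slot = NULL;
}

int
main(void)
{
    ToyRelExtLockSlot slot = {1, 2};
    ToyBackendState me = {&slot, true, false};

    toy_relextlock_cleanup(&me);    /* simulate a transaction abort */
    printf("holders=%d waiters=%d\n", slot.holders, slot.waiters);
    return 0;
}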
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Konstantin Knizhnik
Date:
On 04.06.2018 21:42, Andres Freund wrote:
Hi, On 2018-06-04 16:47:29 +0300, Konstantin Knizhnik wrote: We in PostgresProc were faced with lock extension contention problem at two more customers and tried to use this patch (v13) to address this issue. Unfortunately replacing heavy lock with lwlock couldn't completely eliminate contention, now most of backends are blocked on conditional variable: 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 #1 0x00000000007024ee in WaitEventSetWait () #2 0x0000000000718fa6 in ConditionVariableSleep () #3 0x000000000071954d in RelExtLockAcquire () That doesn't necessarily mean that the postgres code is at fault here. It's entirely possible that the filesystem or storage is the bottleneck. Could you briefly describe workload & hardware?
The workload is a combination of inserts and selects.
It looks like the shared locks obtained by selects cause starvation of inserts trying to get the exclusive relation extension lock.
The problem is fixed by the fair lwlock patch implemented by Alexander Korotkov. This patch prevents granting of a shared lock if the wait queue is not empty.
Maybe we should use this patch or find some other way to prevent starvation of writers on relation extension locks for such workloads.
Second problem we observed was even more critical: if backed is granted relation extension lock and then got some error before releasing this lock, then abort of the current transaction doesn't release this lock (unlike heavy weight lock) and the relation is kept locked. So database is actually stalled and server has to be restarted. That obviously needs to be fixed...
Sorry, looks like the problem is more obscure than I expected.
What we have observed is that all backends are blocked in lwlock (sorry stack trace is not complete):
#0 0x00007ff5a9c566d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7ff3c57b9b38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 #1 do_futex_wait (sem=sem@entry=0x7ff3c57b9b38, abstime=0x0) at sem_waitcommon.c:111 #2 0x00007ff5a9c567c8 in __new_sem_wait_slow (sem=sem@entry=0x7ff3c57b9b38, abstime=0x0) at sem_waitcommon.c:181 #3 0x00007ff5a9c56839 in __new_sem_wait (sem=sem@entry=0x7ff3c57b9b38) at sem_wait.c:42 #4 0x000056290c901582 in PGSemaphoreLock (sema=0x7ff3c57b9b38) at pg_sema.c:310 #5 0x000056290c97923c in LWLockAcquire (lock=0x7ff3c7038c64, mode=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:1233 It happened after an error in a disk write operation. Unfortunately we do not have core files and are not able to reproduce the problem. All LW locks should be cleared by LWLockReleaseAll but ... for some reason it doesn't happen. We will continue the investigation and try to reproduce the problem. I will let you know if we find the cause of the problem.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Konstantin Knizhnik
Date:
On 05.06.2018 07:22, Masahiko Sawada wrote: > On Mon, Jun 4, 2018 at 10:47 PM, Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> >> On 26.04.2018 09:10, Masahiko Sawada wrote: >>> On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> >>> wrote: >>>> On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> >>>> wrote: >>>>> Never mind. There was a lot of items especially at the last CommitFest. >>>>> >>>>>> In terms of moving forward, I'd still like to hear what >>>>>> Andres has to say about the comments I made on March 1st. >>>>> Yeah, agreed. >>>> $ ping -n andres.freund >>>> Request timeout for icmp_seq 0 >>>> Request timeout for icmp_seq 1 >>>> Request timeout for icmp_seq 2 >>>> Request timeout for icmp_seq 3 >>>> Request timeout for icmp_seq 4 >>>> ^C >>>> --- andres.freund ping statistics --- >>>> 6 packets transmitted, 0 packets received, 100.0% packet loss >>>> >>>> Meanwhile, >>>> https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru >>>> shows that this patch has some benefits for other cases, which is a >>>> point in favor IMHO. >>> Thank you for sharing. That's good to know. >>> >>> Andres pointed out the performance degradation due to hash collision >>> when multiple loading. I think the point is that it happens at where >>> users don't know. Therefore even if we make N_RELEXTLOCK_ENTS >>> configurable parameter, since users don't know the hash collision they >>> don't know when they should tune it. >>> >>> So it's just an idea but how about adding an SQL-callable function >>> that returns the estimated number of lock waiters of the given >>> relation? Since user knows how many processes are loading to the >>> relation, if a returned value by the function is greater than the >>> expected value user can know hash collision and will be able to start >>> to consider to increase N_RELEXTLOCK_ENTS. >>> >>> Regards, >>> >>> -- >>> Masahiko Sawada >>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION >>> NTT Open Source Software Center >>> >> We in PostgresProc were faced with lock extension contention problem at two >> more customers and tried to use this patch (v13) to address this issue. >> Unfortunately replacing heavy lock with lwlock couldn't completely eliminate >> contention, now most of backends are blocked on conditional variable: >> >> 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 >> #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 >> #1 0x00000000007024ee in WaitEventSetWait () >> #2 0x0000000000718fa6 in ConditionVariableSleep () >> #3 0x000000000071954d in RelExtLockAcquire () >> #4 0x00000000004ba99d in RelationGetBufferForTuple () >> #5 0x00000000004b3f18 in heap_insert () >> #6 0x00000000006109c8 in ExecInsert () >> #7 0x0000000000611a49 in ExecModifyTable () >> #8 0x00000000005ef97a in standard_ExecutorRun () >> #9 0x000000000072440a in ProcessQuery () >> #10 0x0000000000724631 in PortalRunMulti () >> #11 0x00000000007250ec in PortalRun () >> #12 0x0000000000721287 in exec_simple_query () >> #13 0x0000000000722532 in PostgresMain () >> #14 0x000000000047a9eb in ServerLoop () >> #15 0x00000000006b9fe9 in PostmasterMain () >> #16 0x000000000047b431 in main () >> >> Obviously there is nothing surprising here: if a lot of processes try to >> acquire the same exclusive lock, then high contention is expected. >> I just want to notice that this patch is not able to completely eliminate >> the problem with large number of concurrent inserts to the same table. 
>> >> Second problem we observed was even more critical: if backed is granted >> relation extension lock and then got some error before releasing this lock, >> then abort of the current transaction doesn't release this lock (unlike >> heavy weight lock) and the relation is kept locked. >> So database is actually stalled and server has to be restarted. >> > Thank you for reporting. > > Regarding the second problem, I tried to reproduce that bug with > latest version patch (v13) but could not. When transaction aborts, we > call ReousrceOwnerRelease()->ResourceOwnerReleaseInternal()->ProcReleaseLocks()->RelExtLockCleanup() > and clear either relext lock bits we are holding or waiting. If we > raise an error after we added a relext lock bit but before we > increment its holding count then the relext lock is remained, but I > couldn't see the code raises an error between them. Could you please > share the concrete reproduction Sorry, my original guess that LW-locks are not released in case of transaction abort is not correct. There was really situation when all backends were blocked in relation extension lock and looks like it happens after disk write error, but as far as it happens at customer's site, we have no time for investigation and not able to reproduce this problem locally. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Alexander Korotkov
Date:
On Tue, Jun 5, 2018 at 12:48 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > Workload is combination of inserts and selects. > Looks like shared locks obtained by select cause starvation of inserts, trying to get exclusive relation extension lock. > The problem is fixed by fair lwlock patch, implemented by Alexander Korotkov. This patch prevents granting of shared lock if wait queue is not empty. > May be we should use this patch or find some other way to prevent starvation of writers on relation extension locks for such workloads. The fair lwlock patch really did fix starvation of exclusive lwlock waiters. But that starvation happens not on the relation extension lock – selects don't get a shared relation extension lock. The real issue there was not the relation extension lock itself, but the time spent inside this lock. It appears that buffer replacement happening inside the relation extension lock is affected by starvation on exclusive buffer mapping lwlocks and buffer content lwlocks, caused by many concurrent shared lockers. So, the fair lwlock patch has no direct influence on the relation extension lock, which is naturally not even an lwlock... I'll post the fair lwlock patch in a separate thread. It requires detailed consideration and benchmarking, because there is a risk of regression on specific workloads. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
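For reference, the gating rule quoted above (do not grant a shared lock while the wait queue is non-empty) is essentially a writer-preference policy. Below is a self-contained pthread sketch of that policy, purely an illustration and not the actual LWLock or fair-lwlock code; fields are initialized with PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER and zeroes.

#include <pthread.h>

/* Toy reader/writer lock with writer preference. */
typedef struct FairRWLock
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int     readers;            /* active shared holders */
    int     writer_active;      /* 0 or 1 */
    int     writers_waiting;    /* queued exclusive waiters */
} FairRWLock;

void
fair_rwlock_acquire_shared(FairRWLock *lk)
{
    pthread_mutex_lock(&lk->mutex);
    /* block not only on an active writer but also on waiting writers */
    while (lk->writer_active || lk->writers_waiting > 0)
        pthread_cond_wait(&lk->cond, &lk->mutex);
    lk->readers++;
    pthread_mutex_unlock(&lk->mutex);
}

void
fair_rwlock_release_shared(FairRWLock *lk)
{
    pthread_mutex_lock(&lk->mutex);
    if (--lk->readers == 0)
        pthread_cond_broadcast(&lk->cond);  /* let a waiting writer in */
    pthread_mutex_unlock(&lk->mutex);
}

void
fair_rwlock_acquire_exclusive(FairRWLock *lk)
{
    pthread_mutex_lock(&lk->mutex);
    lk->writers_waiting++;
    while (lk->writer_active || lk->readers > 0)
        pthread_cond_wait(&lk->cond, &lk->mutex);
    lk->writers_waiting--;
    lk->writer_active = 1;
    pthread_mutex_unlock(&lk->mutex);
}

void
fair_rwlock_release_exclusive(FairRWLock *lk)
{
    pthread_mutex_lock(&lk->mutex);
    lk->writer_active = 0;
    pthread_cond_broadcast(&lk->cond);      /* wake both readers and writers */
    pthread_mutex_unlock(&lk->mutex);
}

The regression risk mentioned above is visible in this toy: under a steady stream of exclusive requests, writers_waiting rarely reaches zero, so shared lockers can become the starved side on read-heavy workloads.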
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Tue, Jun 5, 2018 at 6:47 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > On 05.06.2018 07:22, Masahiko Sawada wrote: >> >> On Mon, Jun 4, 2018 at 10:47 PM, Konstantin Knizhnik >> <k.knizhnik@postgrespro.ru> wrote: >>> >>> >>> On 26.04.2018 09:10, Masahiko Sawada wrote: >>>> >>>> On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> >>>> wrote: >>>>> >>>>> On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada >>>>> <sawada.mshk@gmail.com> >>>>> wrote: >>>>>> >>>>>> Never mind. There was a lot of items especially at the last >>>>>> CommitFest. >>>>>> >>>>>>> In terms of moving forward, I'd still like to hear what >>>>>>> Andres has to say about the comments I made on March 1st. >>>>>> >>>>>> Yeah, agreed. >>>>> >>>>> $ ping -n andres.freund >>>>> Request timeout for icmp_seq 0 >>>>> Request timeout for icmp_seq 1 >>>>> Request timeout for icmp_seq 2 >>>>> Request timeout for icmp_seq 3 >>>>> Request timeout for icmp_seq 4 >>>>> ^C >>>>> --- andres.freund ping statistics --- >>>>> 6 packets transmitted, 0 packets received, 100.0% packet loss >>>>> >>>>> Meanwhile, >>>>> >>>>> https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru >>>>> shows that this patch has some benefits for other cases, which is a >>>>> point in favor IMHO. >>>> >>>> Thank you for sharing. That's good to know. >>>> >>>> Andres pointed out the performance degradation due to hash collision >>>> when multiple loading. I think the point is that it happens at where >>>> users don't know. Therefore even if we make N_RELEXTLOCK_ENTS >>>> configurable parameter, since users don't know the hash collision they >>>> don't know when they should tune it. >>>> >>>> So it's just an idea but how about adding an SQL-callable function >>>> that returns the estimated number of lock waiters of the given >>>> relation? Since user knows how many processes are loading to the >>>> relation, if a returned value by the function is greater than the >>>> expected value user can know hash collision and will be able to start >>>> to consider to increase N_RELEXTLOCK_ENTS. >>>> >>>> Regards, >>>> >>>> -- >>>> Masahiko Sawada >>>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION >>>> NTT Open Source Software Center >>>> >>> We in PostgresProc were faced with lock extension contention problem at >>> two >>> more customers and tried to use this patch (v13) to address this issue. 
>>> Unfortunately replacing heavy lock with lwlock couldn't completely >>> eliminate >>> contention, now most of backends are blocked on conditional variable: >>> >>> 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 >>> #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 >>> #1 0x00000000007024ee in WaitEventSetWait () >>> #2 0x0000000000718fa6 in ConditionVariableSleep () >>> #3 0x000000000071954d in RelExtLockAcquire () >>> #4 0x00000000004ba99d in RelationGetBufferForTuple () >>> #5 0x00000000004b3f18 in heap_insert () >>> #6 0x00000000006109c8 in ExecInsert () >>> #7 0x0000000000611a49 in ExecModifyTable () >>> #8 0x00000000005ef97a in standard_ExecutorRun () >>> #9 0x000000000072440a in ProcessQuery () >>> #10 0x0000000000724631 in PortalRunMulti () >>> #11 0x00000000007250ec in PortalRun () >>> #12 0x0000000000721287 in exec_simple_query () >>> #13 0x0000000000722532 in PostgresMain () >>> #14 0x000000000047a9eb in ServerLoop () >>> #15 0x00000000006b9fe9 in PostmasterMain () >>> #16 0x000000000047b431 in main () >>> >>> Obviously there is nothing surprising here: if a lot of processes try to >>> acquire the same exclusive lock, then high contention is expected. >>> I just want to notice that this patch is not able to completely eliminate >>> the problem with large number of concurrent inserts to the same table. >>> >>> Second problem we observed was even more critical: if backed is granted >>> relation extension lock and then got some error before releasing this >>> lock, >>> then abort of the current transaction doesn't release this lock (unlike >>> heavy weight lock) and the relation is kept locked. >>> So database is actually stalled and server has to be restarted. >>> >> Thank you for reporting. >> >> Regarding the second problem, I tried to reproduce that bug with >> latest version patch (v13) but could not. When transaction aborts, we >> call >> ReousrceOwnerRelease()->ResourceOwnerReleaseInternal()->ProcReleaseLocks()->RelExtLockCleanup() >> and clear either relext lock bits we are holding or waiting. If we >> raise an error after we added a relext lock bit but before we >> increment its holding count then the relext lock is remained, but I >> couldn't see the code raises an error between them. Could you please >> share the concrete reproduction > > > Sorry, my original guess that LW-locks are not released in case of > transaction abort is not correct. > There was really situation when all backends were blocked in relation > extension lock and looks like it happens after disk write error, You're saying that it is not correct that LWlock are not released but it's correct that all backends were blocked in relext lock, but in other your mail you're saying something opposite. Which is correct? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Konstantin Knizhnik
Date:
On 05.06.2018 13:29, Masahiko Sawada wrote: > On Tue, Jun 5, 2018 at 6:47 PM, Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: >> >> On 05.06.2018 07:22, Masahiko Sawada wrote: >>> On Mon, Jun 4, 2018 at 10:47 PM, Konstantin Knizhnik >>> <k.knizhnik@postgrespro.ru> wrote: >>>> >>>> On 26.04.2018 09:10, Masahiko Sawada wrote: >>>>> On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> >>>>> wrote: >>>>>> On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada >>>>>> <sawada.mshk@gmail.com> >>>>>> wrote: >>>>>>> Never mind. There was a lot of items especially at the last >>>>>>> CommitFest. >>>>>>> >>>>>>>> In terms of moving forward, I'd still like to hear what >>>>>>>> Andres has to say about the comments I made on March 1st. >>>>>>> Yeah, agreed. >>>>>> $ ping -n andres.freund >>>>>> Request timeout for icmp_seq 0 >>>>>> Request timeout for icmp_seq 1 >>>>>> Request timeout for icmp_seq 2 >>>>>> Request timeout for icmp_seq 3 >>>>>> Request timeout for icmp_seq 4 >>>>>> ^C >>>>>> --- andres.freund ping statistics --- >>>>>> 6 packets transmitted, 0 packets received, 100.0% packet loss >>>>>> >>>>>> Meanwhile, >>>>>> >>>>>> https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru >>>>>> shows that this patch has some benefits for other cases, which is a >>>>>> point in favor IMHO. >>>>> Thank you for sharing. That's good to know. >>>>> >>>>> Andres pointed out the performance degradation due to hash collision >>>>> when multiple loading. I think the point is that it happens at where >>>>> users don't know. Therefore even if we make N_RELEXTLOCK_ENTS >>>>> configurable parameter, since users don't know the hash collision they >>>>> don't know when they should tune it. >>>>> >>>>> So it's just an idea but how about adding an SQL-callable function >>>>> that returns the estimated number of lock waiters of the given >>>>> relation? Since user knows how many processes are loading to the >>>>> relation, if a returned value by the function is greater than the >>>>> expected value user can know hash collision and will be able to start >>>>> to consider to increase N_RELEXTLOCK_ENTS. >>>>> >>>>> Regards, >>>>> >>>>> -- >>>>> Masahiko Sawada >>>>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION >>>>> NTT Open Source Software Center >>>>> >>>> We in PostgresProc were faced with lock extension contention problem at >>>> two >>>> more customers and tried to use this patch (v13) to address this issue. 
>>>> Unfortunately replacing heavy lock with lwlock couldn't completely >>>> eliminate >>>> contention, now most of backends are blocked on conditional variable: >>>> >>>> 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 >>>> #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6 >>>> #1 0x00000000007024ee in WaitEventSetWait () >>>> #2 0x0000000000718fa6 in ConditionVariableSleep () >>>> #3 0x000000000071954d in RelExtLockAcquire () >>>> #4 0x00000000004ba99d in RelationGetBufferForTuple () >>>> #5 0x00000000004b3f18 in heap_insert () >>>> #6 0x00000000006109c8 in ExecInsert () >>>> #7 0x0000000000611a49 in ExecModifyTable () >>>> #8 0x00000000005ef97a in standard_ExecutorRun () >>>> #9 0x000000000072440a in ProcessQuery () >>>> #10 0x0000000000724631 in PortalRunMulti () >>>> #11 0x00000000007250ec in PortalRun () >>>> #12 0x0000000000721287 in exec_simple_query () >>>> #13 0x0000000000722532 in PostgresMain () >>>> #14 0x000000000047a9eb in ServerLoop () >>>> #15 0x00000000006b9fe9 in PostmasterMain () >>>> #16 0x000000000047b431 in main () >>>> >>>> Obviously there is nothing surprising here: if a lot of processes try to >>>> acquire the same exclusive lock, then high contention is expected. >>>> I just want to notice that this patch is not able to completely eliminate >>>> the problem with large number of concurrent inserts to the same table. >>>> >>>> Second problem we observed was even more critical: if backed is granted >>>> relation extension lock and then got some error before releasing this >>>> lock, >>>> then abort of the current transaction doesn't release this lock (unlike >>>> heavy weight lock) and the relation is kept locked. >>>> So database is actually stalled and server has to be restarted. >>>> >>> Thank you for reporting. >>> >>> Regarding the second problem, I tried to reproduce that bug with >>> latest version patch (v13) but could not. When transaction aborts, we >>> call >>> ReousrceOwnerRelease()->ResourceOwnerReleaseInternal()->ProcReleaseLocks()->RelExtLockCleanup() >>> and clear either relext lock bits we are holding or waiting. If we >>> raise an error after we added a relext lock bit but before we >>> increment its holding count then the relext lock is remained, but I >>> couldn't see the code raises an error between them. Could you please >>> share the concrete reproduction >> >> Sorry, my original guess that LW-locks are not released in case of >> transaction abort is not correct. >> There was really situation when all backends were blocked in relation >> extension lock and looks like it happens after disk write error, > You're saying that it is not correct that LWlock are not released but > it's correct that all backends were blocked in relext lock, but in > other your mail you're saying something opposite. Which is correct? I am sorry for confusion. I have not investigated core files myself and just share information received from our engineer. Looks like this problem may be related with relation extension locks at all. Sorry for false alarm.
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
On 2018-06-05 13:09:08 +0300, Alexander Korotkov wrote: > On Tue, Jun 5, 2018 at 12:48 PM Konstantin Knizhnik > <k.knizhnik@postgrespro.ru> wrote: > > Workload is combination of inserts and selects. > > Looks like shared locks obtained by select cause starvation of inserts, trying to get exclusive relation extension lock. > > The problem is fixed by fair lwlock patch, implemented by Alexander Korotkov. This patch prevents granting of shared lock if wait queue is not empty. > > May be we should use this patch or find some other way to prevent starvation of writers on relation extension locks for such workloads. > > Fair lwlock patch really fixed starvation of exclusive lwlock waiters. > But that starvation happens not on relation extension lock – selects > don't get shared relation extension lock. The real issue there was > not relation extension lock itself, but the time spent inside this > lock. Yea, that makes a lot more sense to me. > It appears that buffer replacement happening inside relation > extension lock is affected by starvation on exclusive buffer mapping > lwlocks and buffer content lwlocks, caused by many concurrent shared > lockers. So, fair lwlock patch have no direct influence to relation > extension lock, which is naturally not even lwlock... Yea, that makes sense. I wonder how much the fix here is to "pre-clear" a victim buffer, and how much is a saner buffer replacement implementation (either by going away from O(NBuffers), or by having a queue of clean victim buffers like my bgwriter replacement). > I'll post fair lwlock path in a separate thread. It requires detailed > consideration and benchmarking, because there is a risk of regression > on specific workloads. I bet that doing it naively will regress massively in a number of cases. Greetings, Andres Freund
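As a rough illustration of the "queue of clean victim buffers" idea (a toy model only, not Andres's bgwriter replacement and not PostgreSQL's buffer manager): a background process keeps a small ring of already written-back buffer IDs topped up, and a backend extending a relation pops one instead of searching for and cleaning a victim while holding the extension lock.

#include <pthread.h>
#include <stdbool.h>

#define VICTIM_QUEUE_SIZE 64    /* arbitrary size for illustration */

/* Toy ring of pre-cleaned victim buffer IDs, protected by a mutex. */
typedef struct CleanVictimQueue
{
    pthread_mutex_t mutex;
    int     bufs[VICTIM_QUEUE_SIZE];
    int     head;               /* next slot to pop */
    int     count;              /* number of queued victims */
} CleanVictimQueue;

/*
 * Called by a bgwriter-like process after it has written out and
 * invalidated a buffer, so the cleaning work happens outside any relation
 * extension lock.  Returns false if the ring is already full.
 */
bool
victim_queue_push(CleanVictimQueue *q, int buf_id)
{
    bool    ok = false;

    pthread_mutex_lock(&q->mutex);
    if (q->count < VICTIM_QUEUE_SIZE)
    {
        q->bufs[(q->head + q->count) % VICTIM_QUEUE_SIZE] = buf_id;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->mutex);
    return ok;
}

/*
 * Called by a backend that needs a free buffer for a newly extended page.
 * Returns -1 if the ring is empty, in which case the caller falls back to
 * the ordinary (slower) victim search.
 */
int
victim_queue_pop(CleanVictimQueue *q)
{
    int     buf_id = -1;

    pthread_mutex_lock(&q->mutex);
    if (q->count > 0)
    {
        buf_id = q->bufs[q->head];
        q->head = (q->head + 1) % VICTIM_QUEUE_SIZE;
        q->count--;
    }
    pthread_mutex_unlock(&q->mutex);
    return buf_id;
}

The claimed benefit would come from moving the write-back of dirty victims out from under the relation extension lock; whether that or a different replacement strategy helps more is exactly the open question in the message above.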
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Alexander Korotkov
Date:
On Tue, Jun 5, 2018 at 4:02 PM Andres Freund <andres@anarazel.de> wrote: > On 2018-06-05 13:09:08 +0300, Alexander Korotkov wrote: > > It appears that buffer replacement happening inside relation > > extension lock is affected by starvation on exclusive buffer mapping > > lwlocks and buffer content lwlocks, caused by many concurrent shared > > lockers. So, fair lwlock patch have no direct influence to relation > > extension lock, which is naturally not even lwlock... > > Yea, that makes sense. I wonder how much the fix here is to "pre-clear" > a victim buffer, and how much is a saner buffer replacement > implementation (either by going away from O(NBuffers), or by having a > queue of clean victim buffers like my bgwriter replacement). The particular thing I observed in our environment is BufferAlloc() waiting for hours on a buffer partition lock. Increasing NUM_BUFFER_PARTITIONS didn't give any significant help. It appears that a very hot page (the root page of some frequently used index) resides in that partition, so this partition was continuously under shared lock. So, in order to resolve this without changing LWLock, we should probably move our buffer hash table to something lockless. > > I'll post fair lwlock path in a separate thread. It requires detailed > > consideration and benchmarking, because there is a risk of regression > > on specific workloads. > > I bet that doing it naively will regress massively in a number of cases. Yes, I suspect the same. However, I tend to think that something is wrong with LWLock itself. It seems to be the only one of our locks that can subject some lockers to almost infinite starvation under certain workloads. In contrast, even our SpinLock gives all the waiting processes nearly the same chances to acquire it. So, I think the idea of improving LWLock in this respect deserves at least further investigation. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Jun 5, 2018 at 7:35 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Jun 5, 2018 at 4:02 PM Andres Freund <andres@anarazel.de> wrote:
> On 2018-06-05 13:09:08 +0300, Alexander Korotkov wrote:
> > It appears that buffer replacement happening inside relation
> > extension lock is affected by starvation on exclusive buffer mapping
> > lwlocks and buffer content lwlocks, caused by many concurrent shared
> > lockers. So, fair lwlock patch have no direct influence to relation
> > extension lock, which is naturally not even lwlock...
>
> Yea, that makes sense. I wonder how much the fix here is to "pre-clear"
> a victim buffer, and how much is a saner buffer replacement
> implementation (either by going away from O(NBuffers), or by having a
> queue of clean victim buffers like my bgwriter replacement).
The particular thing I observed on our environment is BufferAlloc()
waiting hours on buffer partition lock. Increasing NUM_BUFFER_PARTITIONS
didn't give any significant help. It appears that very hot page (root page of
some frequently used index) reside on that partition, so this partition was
continuously under shared lock. So, in order to resolve without changing
LWLock, we probably should move our buffers hash table to something
lockless.
I think Robert's chash stuff [1] might be helpful to reduce the contention you are seeing.
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: >>> I think the real question is whether the scenario is common enough to >>> worry about. In practice, you'd have to be extremely unlucky to be >>> doing many bulk loads at the same time that all happened to hash to >>> the same bucket. >> >> With a bunch of parallel bulkloads into partitioned tables that really >> doesn't seem that unlikely? > > It increases the likelihood of collisions, but probably decreases the > number of cases where the contention gets really bad. > > For example, suppose each table has 100 partitions and you are > bulk-loading 10 of them at a time. It's virtually certain that you > will have some collisions, but the amount of contention within each > bucket will remain fairly low because each backend spends only 1% of > its time in the bucket corresponding to any given partition. > I share another result of a performance evaluation between current HEAD and current HEAD with the v13 patch (N_RELEXTLOCK_ENTS = 1024).

Type of table: normal table, unlogged table
Number of child tables: 16, 64 (all tables are located on the same tablespace)
Number of clients: 32
Number of trials: 100
Duration: 180 seconds for each trial

The hardware spec of the server is Intel Xeon 2.4GHz (HT 160 cores), 256GB RAM, NVMe SSD 1.5TB. Each client loads 10kB of random data across all partitioned tables. Here is the result.

 childs |   type   | target  |  avg_tps   | diff with HEAD
--------+----------+---------+------------+------------------
     16 | normal   | HEAD    |   1643.833 |
     16 | normal   | Patched |  1619.5404 |         0.985222
     16 | unlogged | HEAD    |  9069.3543 |
     16 | unlogged | Patched |  9368.0263 |         1.032932
     64 | normal   | HEAD    |   1598.698 |
     64 | normal   | Patched |  1587.5906 |         0.993052
     64 | unlogged | HEAD    |  9629.7315 |
     64 | unlogged | Patched | 10208.2196 |         1.060073
(8 rows)

For normal tables, loading tps decreased 1% ~ 2% with this patch whereas it increased 3% ~ 6% for unlogged tables. There were collisions at 0 ~ 5 relation extension lock slots between 2 relations in the 64 child tables case, but it didn't seem to affect the tps. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Michael Paquier
Date:
On Wed, Jun 06, 2018 at 07:03:47PM +0530, Amit Kapila wrote: > I think Robert's chash stuff [1] might be helpful to reduce the contention > you are seeing. Latest patch available does not apply, so I moved it to next CF. The thread has died a bit as well... -- Michael
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dmitry Dolgov
Date:
> On Mon, Oct 1, 2018 at 8:54 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jun 06, 2018 at 07:03:47PM +0530, Amit Kapila wrote: > > I think Robert's chash stuff [1] might be helpful to reduce the contention > > you are seeing. > > Latest patch available does not apply, so I moved it to next CF. The > thread has died a bit as well... Unfortunately, the patch still needs to be rebased. Could you do this? Are there any plans for the patch?
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Fri, Nov 30, 2018 at 1:17 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Mon, Oct 1, 2018 at 8:54 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Wed, Jun 06, 2018 at 07:03:47PM +0530, Amit Kapila wrote: > > > I think Robert's chash stuff [1] might be helpful to reduce the contention > > > you are seeing. > > > > Latest patch available does not apply, so I moved it to next CF. The > > thread has died a bit as well... > > Unfortunately, the patch still needs to be rebased. Could you do this? Are there > any plans for the patch? I have a plan, but it's a future plan. This patch was intended for the parallel vacuum patch. As I mentioned in that thread [1], I'm focusing only on parallel index vacuum, which does not require the relation extension lock improvements for now. Therefore, I'd like to withdraw this patch and reactivate it when we need this enhancement. So I think we can mark it as 'Returned with feedback'. [1] https://www.postgresql.org/message-id/CAD21AoDhAutvKbQ37Btf4taMVbQaOaSvOpxpLgu814T1-OqYGg%40mail.gmail.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: > >>> I think the real question is whether the scenario is common enough to > >>> worry about. In practice, you'd have to be extremely unlucky to be > >>> doing many bulk loads at the same time that all happened to hash to > >>> the same bucket. > >> > >> With a bunch of parallel bulkloads into partitioned tables that really > >> doesn't seem that unlikely? > > > > It increases the likelihood of collisions, but probably decreases the > > number of cases where the contention gets really bad. > > > > For example, suppose each table has 100 partitions and you are > > bulk-loading 10 of them at a time. It's virtually certain that you > > will have some collisions, but the amount of contention within each > > bucket will remain fairly low because each backend spends only 1% of > > its time in the bucket corresponding to any given partition. > > > > I share another result of performance evaluation between current HEAD > and current HEAD with v13 patch(N_RELEXTLOCK_ENTS = 1024). > > Type of table: normal table, unlogged table > Number of child tables : 16, 64 (all tables are located on the same tablespace) > Number of clients : 32 > Number of trials : 100 > Duration: 180 seconds for each trials > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB > RAM, NVMe SSD 1.5TB. > Each clients load 10kB random data across all partitioned tables. > > Here is the result. > > childs | type | target | avg_tps | diff with HEAD > --------+----------+---------+------------+------------------ > 16 | normal | HEAD | 1643.833 | > 16 | normal | Patched | 1619.5404 | 0.985222 > 16 | unlogged | HEAD | 9069.3543 | > 16 | unlogged | Patched | 9368.0263 | 1.032932 > 64 | normal | HEAD | 1598.698 | > 64 | normal | Patched | 1587.5906 | 0.993052 > 64 | unlogged | HEAD | 9629.7315 | > 64 | unlogged | Patched | 10208.2196 | 1.060073 > (8 rows) > > For normal tables, loading tps decreased 1% ~ 2% with this patch > whereas it increased 3% ~ 6% for unlogged tables. There were > collisions at 0 ~ 5 relation extension lock slots between 2 relations > in the 64 child tables case but it didn't seem to affect the tps. > AFAIU, this resembles the workload that Andres was worried about. I think we should once run this test in a different environment, but considering this to be correct and repeatable, where do we go with this patch especially when we know it improves many workloads [1] as well. We know that on a pathological case constructed by Mithun [2], this causes regression as well. I am not sure if the test done by Mithun really mimics any real-world workload as he has tested by making N_RELEXTLOCK_ENTS = 1 to hit the worst case. Sawada-San, if you have a script or data for the test done by you, then please share it so that others can also try to reproduce it. [1] - https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9%40postgrespro.ru [2] - https://www.postgresql.org/message-id/CAD__Oug52j%3DDQMoP2b%3DVY7wZb0S9wMNu4irXOH3-ZjFkzWZPGg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, Feb 3, 2020 at 8:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: > > >>> I think the real question is whether the scenario is common enough to > > >>> worry about. In practice, you'd have to be extremely unlucky to be > > >>> doing many bulk loads at the same time that all happened to hash to > > >>> the same bucket. > > >> > > >> With a bunch of parallel bulkloads into partitioned tables that really > > >> doesn't seem that unlikely? > > > > > > It increases the likelihood of collisions, but probably decreases the > > > number of cases where the contention gets really bad. > > > > > > For example, suppose each table has 100 partitions and you are > > > bulk-loading 10 of them at a time. It's virtually certain that you > > > will have some collisions, but the amount of contention within each > > > bucket will remain fairly low because each backend spends only 1% of > > > its time in the bucket corresponding to any given partition. > > > > > > > I share another result of performance evaluation between current HEAD > > and current HEAD with v13 patch(N_RELEXTLOCK_ENTS = 1024). > > > > Type of table: normal table, unlogged table > > Number of child tables : 16, 64 (all tables are located on the same tablespace) > > Number of clients : 32 > > Number of trials : 100 > > Duration: 180 seconds for each trials > > > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB > > RAM, NVMe SSD 1.5TB. > > Each clients load 10kB random data across all partitioned tables. > > > > Here is the result. > > > > childs | type | target | avg_tps | diff with HEAD > > --------+----------+---------+------------+------------------ > > 16 | normal | HEAD | 1643.833 | > > 16 | normal | Patched | 1619.5404 | 0.985222 > > 16 | unlogged | HEAD | 9069.3543 | > > 16 | unlogged | Patched | 9368.0263 | 1.032932 > > 64 | normal | HEAD | 1598.698 | > > 64 | normal | Patched | 1587.5906 | 0.993052 > > 64 | unlogged | HEAD | 9629.7315 | > > 64 | unlogged | Patched | 10208.2196 | 1.060073 > > (8 rows) > > > > For normal tables, loading tps decreased 1% ~ 2% with this patch > > whereas it increased 3% ~ 6% for unlogged tables. There were > > collisions at 0 ~ 5 relation extension lock slots between 2 relations > > in the 64 child tables case but it didn't seem to affect the tps. > > > > AFAIU, this resembles the workload that Andres was worried about. I > think we should once run this test in a different environment, but > considering this to be correct and repeatable, where do we go with > this patch especially when we know it improves many workloads [1] as > well. We know that on a pathological case constructed by Mithun [2], > this causes regression as well. I am not sure if the test done by > Mithun really mimics any real-world workload as he has tested by > making N_RELEXTLOCK_ENTS = 1 to hit the worst case. > > Sawada-San, if you have a script or data for the test done by you, > then please share it so that others can also try to reproduce it. Unfortunately the environment I used for performance verification is no longer available. I agree to run this test in a different environment. I've attached the rebased version patch. I'm measuring the performance with/without patch, so will share the results. 
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Wed, 5 Feb 2020 at 12:07, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Feb 3, 2020 at 8:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote:
> > > >>> I think the real question is whether the scenario is common enough to
> > > >>> worry about. In practice, you'd have to be extremely unlucky to be
> > > >>> doing many bulk loads at the same time that all happened to hash to
> > > >>> the same bucket.
> > > >>
> > > >> With a bunch of parallel bulkloads into partitioned tables that really
> > > >> doesn't seem that unlikely?
> > > >
> > > > It increases the likelihood of collisions, but probably decreases the
> > > > number of cases where the contention gets really bad.
> > > >
> > > > For example, suppose each table has 100 partitions and you are
> > > > bulk-loading 10 of them at a time. It's virtually certain that you
> > > > will have some collisions, but the amount of contention within each
> > > > bucket will remain fairly low because each backend spends only 1% of
> > > > its time in the bucket corresponding to any given partition.
> > > >
> > >
> > > I share another result of performance evaluation between current HEAD
> > > and current HEAD with v13 patch(N_RELEXTLOCK_ENTS = 1024).
> > >
> > > Type of table: normal table, unlogged table
> > > Number of child tables : 16, 64 (all tables are located on the same tablespace)
> > > Number of clients : 32
> > > Number of trials : 100
> > > Duration: 180 seconds for each trials
> > >
> > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB
> > > RAM, NVMe SSD 1.5TB.
> > > Each clients load 10kB random data across all partitioned tables.
> > >
> > > Here is the result.
> > >
> > > childs | type | target | avg_tps | diff with HEAD
> > > --------+----------+---------+------------+------------------
> > > 16 | normal | HEAD | 1643.833 |
> > > 16 | normal | Patched | 1619.5404 | 0.985222
> > > 16 | unlogged | HEAD | 9069.3543 |
> > > 16 | unlogged | Patched | 9368.0263 | 1.032932
> > > 64 | normal | HEAD | 1598.698 |
> > > 64 | normal | Patched | 1587.5906 | 0.993052
> > > 64 | unlogged | HEAD | 9629.7315 |
> > > 64 | unlogged | Patched | 10208.2196 | 1.060073
> > > (8 rows)
> > >
> > > For normal tables, loading tps decreased 1% ~ 2% with this patch
> > > whereas it increased 3% ~ 6% for unlogged tables. There were
> > > collisions at 0 ~ 5 relation extension lock slots between 2 relations
> > > in the 64 child tables case but it didn't seem to affect the tps.
> > >
> >
> > AFAIU, this resembles the workload that Andres was worried about. I
> > think we should once run this test in a different environment, but
> > considering this to be correct and repeatable, where do we go with
> > this patch especially when we know it improves many workloads [1] as
> > well. We know that on a pathological case constructed by Mithun [2],
> > this causes regression as well. I am not sure if the test done by
> > Mithun really mimics any real-world workload as he has tested by
> > making N_RELEXTLOCK_ENTS = 1 to hit the worst case.
> >
> > Sawada-San, if you have a script or data for the test done by you,
> > then please share it so that others can also try to reproduce it.
>
> Unfortunately the environment I used for performance verification is
> no longer available.
>
> I agree to run this test in a different environment. I've attached the
> rebased version patch. I'm measuring the performance with/without
> patch, so will share the results.
>
Thanks Sawada-san for the patch.
For the last few days, I have been reading this thread and reviewing the v13 patch. To debug and test, I did a re-base of the v13 patch. I compared my re-based patch and the v14 patch. I think the ordering of header files is not alphabetical in the v14 patch. (I haven't reviewed the v14 patch fully because, before reviewing, I wanted to test false sharing.) While debugging, I didn't notice any hang or lock related issue.
I did some testing to check false sharing (bulk insert, COPY data, bulk insert into partitioned tables). Below is the testing summary.
Test setup(Bulk insert into partition tables):
autovacuum=off
shared_buffers=512MB -c max_wal_size=20GB -c checkpoint_timeout=12min
Basically, I created a table with 13 partitions. Using pgbench, I inserted bulk data. I used the pgbench command below:
./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1 -f insert3.sql@1 -f insert4.sql@1 postgres
I took scripts from previous mails and modified them. For reference, I am attaching the test scripts. I tested with the default 1024 slots (N_RELEXTLOCK_ENTS = 1024).
Clients HEAD (tps) With v14 patch (tps) %change (time: 180s)
1 92.979796 100.877446 +8.49 %
32 392.881863 388.470622 -1.12 %
56 551.753235 528.018852 -4.30 %
60 648.273767 653.251507 +0.76 %
64 645.975124 671.322140 +3.92 %
66 662.728010 673.399762 +1.61 %
70 647.103183 660.694914 +2.10 %
74 648.824027 676.487622 +4.26 %
From the above results, we can see that in most cases TPS is slightly increased with the v14 patch. I am still testing and will post further results.
I want to test the extension lock by blocking use of the FSM (use_fsm=false in the code). I think that if we block use of the FSM, the load on the extension lock will increase. Is this the correct way to test?
Please let me know if you have any specific testing scenarios.
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Thu, Feb 6, 2020 at 1:57 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Wed, 5 Feb 2020 at 12:07, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Feb 3, 2020 at 8:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > > > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote: > > > > >>> I think the real question is whether the scenario is common enough to > > > > >>> worry about. In practice, you'd have to be extremely unlucky to be > > > > >>> doing many bulk loads at the same time that all happened to hash to > > > > >>> the same bucket. > > > > >> > > > > >> With a bunch of parallel bulkloads into partitioned tables that really > > > > >> doesn't seem that unlikely? > > > > > > > > > > It increases the likelihood of collisions, but probably decreases the > > > > > number of cases where the contention gets really bad. > > > > > > > > > > For example, suppose each table has 100 partitions and you are > > > > > bulk-loading 10 of them at a time. It's virtually certain that you > > > > > will have some collisions, but the amount of contention within each > > > > > bucket will remain fairly low because each backend spends only 1% of > > > > > its time in the bucket corresponding to any given partition. > > > > > > > > > > > > > I share another result of performance evaluation between current HEAD > > > > and current HEAD with v13 patch(N_RELEXTLOCK_ENTS = 1024). > > > > > > > > Type of table: normal table, unlogged table > > > > Number of child tables : 16, 64 (all tables are located on the same tablespace) > > > > Number of clients : 32 > > > > Number of trials : 100 > > > > Duration: 180 seconds for each trials > > > > > > > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB > > > > RAM, NVMe SSD 1.5TB. > > > > Each clients load 10kB random data across all partitioned tables. > > > > > > > > Here is the result. > > > > > > > > childs | type | target | avg_tps | diff with HEAD > > > > --------+----------+---------+------------+------------------ > > > > 16 | normal | HEAD | 1643.833 | > > > > 16 | normal | Patched | 1619.5404 | 0.985222 > > > > 16 | unlogged | HEAD | 9069.3543 | > > > > 16 | unlogged | Patched | 9368.0263 | 1.032932 > > > > 64 | normal | HEAD | 1598.698 | > > > > 64 | normal | Patched | 1587.5906 | 0.993052 > > > > 64 | unlogged | HEAD | 9629.7315 | > > > > 64 | unlogged | Patched | 10208.2196 | 1.060073 > > > > (8 rows) > > > > > > > > For normal tables, loading tps decreased 1% ~ 2% with this patch > > > > whereas it increased 3% ~ 6% for unlogged tables. There were > > > > collisions at 0 ~ 5 relation extension lock slots between 2 relations > > > > in the 64 child tables case but it didn't seem to affect the tps. > > > > > > > > > > AFAIU, this resembles the workload that Andres was worried about. I > > > think we should once run this test in a different environment, but > > > considering this to be correct and repeatable, where do we go with > > > this patch especially when we know it improves many workloads [1] as > > > well. We know that on a pathological case constructed by Mithun [2], > > > this causes regression as well. I am not sure if the test done by > > > Mithun really mimics any real-world workload as he has tested by > > > making N_RELEXTLOCK_ENTS = 1 to hit the worst case. 
> > > > > > Sawada-San, if you have a script or data for the test done by you, > > > then please share it so that others can also try to reproduce it. > > > > Unfortunately the environment I used for performance verification is > > no longer available. > > > > I agree to run this test in a different environment. I've attached the > > rebased version patch. I'm measuring the performance with/without > > patch, so will share the results. > > > > Thanks Sawada-san for patch. > > From last few days, I was reading this thread and was reviewing v13 patch. To debug and test, I did re-base of v13 patch.I compared my re-based patch and v14 patch. I think, ordering of header files is not alphabetically in v14 patch.(I haven't reviewed v14 patch fully because before review, I wanted to test false sharing). While debugging, I didn'tnoticed any hang or lock related issue. > > I did some testing to test false sharing(bulk insert, COPY data, bulk insert into partitions tables). Below is the testingsummary. > > Test setup(Bulk insert into partition tables): > autovacuum=off > shared_buffers=512MB -c max_wal_size=20GB -c checkpoint_timeout=12min > > Basically, I created a table with 13 partitions. Using pgbench, I inserted bulk data. I used below pgbench command: > ./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1 -f insert3.sql@1 -f insert4.sql@1 postgres > > I took scripts from previews mails and modified. For reference, I am attaching test scripts. I tested with default 1024slots(N_RELEXTLOCK_ENTS = 1024). > > Clients HEAD (tps) With v14 patch (tps) %change (time: 180s) > 1 92.979796 100.877446 +8.49 % > 32 392.881863 388.470622 -1.12 % > 56 551.753235 528.018852 -4.30 % > 60 648.273767 653.251507 +0.76 % > 64 645.975124 671.322140 +3.92 % > 66 662.728010 673.399762 +1.61 % > 70 647.103183 660.694914 +2.10 % > 74 648.824027 676.487622 +4.26 % > > From above results, we can see that in most cases, TPS is slightly increased with v14 patch. I am still testing and willpost my results. > The number at 56 and 74 client count seem slightly suspicious. Can you please repeat those tests? Basically, I am not able to come up with a theory why at 56 clients the performance with the patch is a bit lower and then at 74 it is higher. > I want to test extension lock by blocking use of fsm(use_fsm=false in code). I think, if we block use of fsm, then loadwill increase into extension lock. Is this correct way to test? > Hmm, I think instead of directly hacking the code, you might want to use the operation (probably cluster or vacuum full) where we set HEAP_INSERT_SKIP_FSM. I think along with this you can try with unlogged tables because that might stress the extension lock. In the above test, you might want to test with a higher number of partitions (say up to 100) as well. Also, see if you want to use the Copy command. > Please let me know if you have any specific testing scenario. > Can you test the scenario mentioned by Konstantin Knizhnik [1] where this patch has shown significant gain? You might want to use a higher core count machine to test it. One thing we can do is to somehow measure the collisions on each bucket. [1] - https://www.postgresql.org/message-id/ef81da49-d491-db86-3ef6-5138d091fe91%40postgrespro.ru -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Type of table: normal table, unlogged table > Number of child tables : 16, 64 (all tables are located on the same tablespace) > Number of clients : 32 > Number of trials : 100 > Duration: 180 seconds for each trials > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB > RAM, NVMe SSD 1.5TB. > Each clients load 10kB random data across all partitioned tables. > > Here is the result. > > childs | type | target | avg_tps | diff with HEAD > --------+----------+---------+------------+------------------ > 16 | normal | HEAD | 1643.833 | > 16 | normal | Patched | 1619.5404 | 0.985222 > 16 | unlogged | HEAD | 9069.3543 | > 16 | unlogged | Patched | 9368.0263 | 1.032932 > 64 | normal | HEAD | 1598.698 | > 64 | normal | Patched | 1587.5906 | 0.993052 > 64 | unlogged | HEAD | 9629.7315 | > 64 | unlogged | Patched | 10208.2196 | 1.060073 > (8 rows) > > For normal tables, loading tps decreased 1% ~ 2% with this patch > whereas it increased 3% ~ 6% for unlogged tables. There were > collisions at 0 ~ 5 relation extension lock slots between 2 relations > in the 64 child tables case but it didn't seem to affect the tps. > How did you measure the collisions in this test? I think it is better if Mahendra can also use the same technique in measuring that count. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Thu, 6 Feb 2020 at 13:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Type of table: normal table, unlogged table > > Number of child tables : 16, 64 (all tables are located on the same tablespace) > > Number of clients : 32 > > Number of trials : 100 > > Duration: 180 seconds for each trials > > > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB > > RAM, NVMe SSD 1.5TB. > > Each clients load 10kB random data across all partitioned tables. > > > > Here is the result. > > > > childs | type | target | avg_tps | diff with HEAD > > --------+----------+---------+------------+------------------ > > 16 | normal | HEAD | 1643.833 | > > 16 | normal | Patched | 1619.5404 | 0.985222 > > 16 | unlogged | HEAD | 9069.3543 | > > 16 | unlogged | Patched | 9368.0263 | 1.032932 > > 64 | normal | HEAD | 1598.698 | > > 64 | normal | Patched | 1587.5906 | 0.993052 > > 64 | unlogged | HEAD | 9629.7315 | > > 64 | unlogged | Patched | 10208.2196 | 1.060073 > > (8 rows) > > > > For normal tables, loading tps decreased 1% ~ 2% with this patch > > whereas it increased 3% ~ 6% for unlogged tables. There were > > collisions at 0 ~ 5 relation extension lock slots between 2 relations > > in the 64 child tables case but it didn't seem to affect the tps. > > > > How did you measure the collisions in this test? I think it is better > if Mahendra can also use the same technique in measuring that count. > I created a SQL function that returns the hash value of the lock tag, which is tag_hash(locktag, sizeof(RelExtLockTag)) % N_RELEXTLOCK_ENTS, and examined the hash values of all tables. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
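For readers who want to reproduce this measurement, a minimal sketch of such a helper is shown below as a C-language extension function. It assumes the patch's RelExtLockTag is keyed by database OID and relation OID and that N_RELEXTLOCK_ENTS is 1024; the function name and field names are illustrative guesses, not taken from the patch.

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"          /* MyDatabaseId */
#include "common/hashfn.h"      /* tag_hash(); utils/hashutils.h on older branches */

PG_MODULE_MAGIC;

#define N_RELEXTLOCK_ENTS 1024

/* Assumed layout of the patch's lock tag: database OID plus relation OID. */
typedef struct RelExtLockTag
{
    Oid         dbid;
    Oid         relid;
} RelExtLockTag;

PG_FUNCTION_INFO_V1(relext_lock_slot);

/* Return the relation extension lock slot that the given relation hashes to. */
Datum
relext_lock_slot(PG_FUNCTION_ARGS)
{
    Oid         relid = PG_GETARG_OID(0);
    RelExtLockTag tag;

    memset(&tag, 0, sizeof(tag));       /* keep padding bytes out of the hash */
    tag.dbid = MyDatabaseId;
    tag.relid = relid;

    PG_RETURN_INT32(tag_hash(&tag, sizeof(tag)) % N_RELEXTLOCK_ENTS);
}

After exposing it with CREATE FUNCTION relext_lock_slot(oid) RETURNS integer AS 'MODULE_PATHNAME', 'relext_lock_slot' LANGUAGE C STRICT, grouping the relations of interest by this value shows how many of them fall into each slot, which appears to be how the collision counts in this thread were gathered.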
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Thu, 6 Feb 2020 at 09:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Feb 6, 2020 at 1:57 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > On Wed, 5 Feb 2020 at 12:07, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Feb 3, 2020 at 8:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > > > > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote:
> > > > > >>> I think the real question is whether the scenario is common enough to
> > > > > >>> worry about. In practice, you'd have to be extremely unlucky to be
> > > > > >>> doing many bulk loads at the same time that all happened to hash to
> > > > > >>> the same bucket.
> > > > > >>
> > > > > >> With a bunch of parallel bulkloads into partitioned tables that really
> > > > > >> doesn't seem that unlikely?
> > > > > >
> > > > > > It increases the likelihood of collisions, but probably decreases the
> > > > > > number of cases where the contention gets really bad.
> > > > > >
> > > > > > For example, suppose each table has 100 partitions and you are
> > > > > > bulk-loading 10 of them at a time. It's virtually certain that you
> > > > > > will have some collisions, but the amount of contention within each
> > > > > > bucket will remain fairly low because each backend spends only 1% of
> > > > > > its time in the bucket corresponding to any given partition.
> > > > > >
> > > > >
> > > > > I share another result of performance evaluation between current HEAD
> > > > > and current HEAD with v13 patch(N_RELEXTLOCK_ENTS = 1024).
> > > > >
> > > > > Type of table: normal table, unlogged table
> > > > > Number of child tables : 16, 64 (all tables are located on the same tablespace)
> > > > > Number of clients : 32
> > > > > Number of trials : 100
> > > > > Duration: 180 seconds for each trials
> > > > >
> > > > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB
> > > > > RAM, NVMe SSD 1.5TB.
> > > > > Each clients load 10kB random data across all partitioned tables.
> > > > >
> > > > > Here is the result.
> > > > >
> > > > > childs | type | target | avg_tps | diff with HEAD
> > > > > --------+----------+---------+------------+------------------
> > > > > 16 | normal | HEAD | 1643.833 |
> > > > > 16 | normal | Patched | 1619.5404 | 0.985222
> > > > > 16 | unlogged | HEAD | 9069.3543 |
> > > > > 16 | unlogged | Patched | 9368.0263 | 1.032932
> > > > > 64 | normal | HEAD | 1598.698 |
> > > > > 64 | normal | Patched | 1587.5906 | 0.993052
> > > > > 64 | unlogged | HEAD | 9629.7315 |
> > > > > 64 | unlogged | Patched | 10208.2196 | 1.060073
> > > > > (8 rows)
> > > > >
> > > > > For normal tables, loading tps decreased 1% ~ 2% with this patch
> > > > > whereas it increased 3% ~ 6% for unlogged tables. There were
> > > > > collisions at 0 ~ 5 relation extension lock slots between 2 relations
> > > > > in the 64 child tables case but it didn't seem to affect the tps.
> > > > >
> > > >
> > > > AFAIU, this resembles the workload that Andres was worried about. I
> > > > think we should once run this test in a different environment, but
> > > > considering this to be correct and repeatable, where do we go with
> > > > this patch especially when we know it improves many workloads [1] as
> > > > well. We know that on a pathological case constructed by Mithun [2],
> > > > this causes regression as well. I am not sure if the test done by
> > > > Mithun really mimics any real-world workload as he has tested by
> > > > making N_RELEXTLOCK_ENTS = 1 to hit the worst case.
> > > >
> > > > Sawada-San, if you have a script or data for the test done by you,
> > > > then please share it so that others can also try to reproduce it.
> > >
> > > Unfortunately the environment I used for performance verification is
> > > no longer available.
> > >
> > > I agree to run this test in a different environment. I've attached the
> > > rebased version patch. I'm measuring the performance with/without
> > > patch, so will share the results.
> > >
> >
> > Thanks Sawada-san for patch.
> >
> > From last few days, I was reading this thread and was reviewing v13 patch. To debug and test, I did re-base of v13 patch. I compared my re-based patch and v14 patch. I think, ordering of header files is not alphabetically in v14 patch. (I haven't reviewed v14 patch fully because before review, I wanted to test false sharing). While debugging, I didn't noticed any hang or lock related issue.
> >
> > I did some testing to test false sharing(bulk insert, COPY data, bulk insert into partitions tables). Below is the testing summary.
> >
> > Test setup(Bulk insert into partition tables):
> > autovacuum=off
> > shared_buffers=512MB -c max_wal_size=20GB -c checkpoint_timeout=12min
> >
> > Basically, I created a table with 13 partitions. Using pgbench, I inserted bulk data. I used below pgbench command:
> > ./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1 -f insert3.sql@1 -f insert4.sql@1 postgres
> >
> > I took scripts from previews mails and modified. For reference, I am attaching test scripts. I tested with default 1024 slots(N_RELEXTLOCK_ENTS = 1024).
> >
> > Clients HEAD (tps) With v14 patch (tps) %change (time: 180s)
> > 1 92.979796 100.877446 +8.49 %
> > 32 392.881863 388.470622 -1.12 %
> > 56 551.753235 528.018852 -4.30 %
> > 60 648.273767 653.251507 +0.76 %
> > 64 645.975124 671.322140 +3.92 %
> > 66 662.728010 673.399762 +1.61 %
> > 70 647.103183 660.694914 +2.10 %
> > 74 648.824027 676.487622 +4.26 %
> >
> > From above results, we can see that in most cases, TPS is slightly increased with v14 patch. I am still testing and will post my results.
> >
>
> The number at 56 and 74 client count seem slightly suspicious. Can
> you please repeat those tests? Basically, I am not able to come up
> with a theory why at 56 clients the performance with the patch is a
> bit lower and then at 74 it is higher.
Okay. I will repeat test.
>
> > I want to test extension lock by blocking use of fsm(use_fsm=false in code). I think, if we block use of fsm, then load will increase into extension lock. Is this correct way to test?
> >
>
> Hmm, I think instead of directly hacking the code, you might want to
> use the operation (probably cluster or vacuum full) where we set
> HEAP_INSERT_SKIP_FSM. I think along with this you can try with
> unlogged tables because that might stress the extension lock.
Okay. I will test.
>
> In the above test, you might want to test with a higher number of
> partitions (say up to 100) as well. Also, see if you want to use the
> Copy command.
Okay. I will test.
>
> > Please let me know if you have any specific testing scenario.
> >
>
> Can you test the scenario mentioned by Konstantin Knizhnik [1] where
> this patch has shown significant gain? You might want to use a higher
> core count machine to test it.
I followed Konstantin Knizhnik's steps and tested inserts on a high core count machine. Below is the test summary:
Test setup:
autovacuum = off
max_connections = 1000
My testing machine:
$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191
create table test (i int, md5 text);
insert.sql:
begin;
insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
end;
pgbench command:
./pgbench postgres -c 1000 -j 36 -T 180 -P 10 -f insert.sql >> results.txt
I tested with 1000 clients. Below are the tps numbers:
TPS on HEAD:
Run 1) : 608.908721
Run 2) : 599.962863
Run 3) : 606.378819
Run 4) : 607.174076
Run 5) : 598.531958
TPS with v14 patch: ( N_RELEXTLOCK_ENTS = 1024)
Run 1) : 649.488472
Run 2) : 657.902261
Run 3) : 654.478580
Run 4) : 648.085126
Run 5) : 647.171482
%change = +7.10 %
Apart from the above test, I did some more tests (N_RELEXTLOCK_ENTS = 1024):
1) bulk insert into 1 table for T = 180s and 3600s, clients = 100 and 1000, table = logged and unlogged
2) COPY command
3) bulk load into a table having 13 partitions
In all the cases, I can see 4-9% improvement in TPS as compared to HEAD.
@Konstantin Knizhnik, if you remember, please let me know how much tps gain was observed in your insert test. Is it close to my results?
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Sat, 8 Feb 2020 at 00:27, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Thu, 6 Feb 2020 at 09:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Feb 6, 2020 at 1:57 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Wed, 5 Feb 2020 at 12:07, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Mon, Feb 3, 2020 at 8:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > > > > > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <andres@anarazel.de> wrote:
> > > > > > >>> I think the real question is whether the scenario is common enough to
> > > > > > >>> worry about. In practice, you'd have to be extremely unlucky to be
> > > > > > >>> doing many bulk loads at the same time that all happened to hash to
> > > > > > >>> the same bucket.
> > > > > > >>
> > > > > > >> With a bunch of parallel bulkloads into partitioned tables that really
> > > > > > >> doesn't seem that unlikely?
> > > > > > >
> > > > > > > It increases the likelihood of collisions, but probably decreases the
> > > > > > > number of cases where the contention gets really bad.
> > > > > > >
> > > > > > > For example, suppose each table has 100 partitions and you are
> > > > > > > bulk-loading 10 of them at a time. It's virtually certain that you
> > > > > > > will have some collisions, but the amount of contention within each
> > > > > > > bucket will remain fairly low because each backend spends only 1% of
> > > > > > > its time in the bucket corresponding to any given partition.
> > > > > > >
> > > > > >
> > > > > > I share another result of performance evaluation between current HEAD
> > > > > > and current HEAD with v13 patch(N_RELEXTLOCK_ENTS = 1024).
> > > > > >
> > > > > > Type of table: normal table, unlogged table
> > > > > > Number of child tables : 16, 64 (all tables are located on the same tablespace)
> > > > > > Number of clients : 32
> > > > > > Number of trials : 100
> > > > > > Duration: 180 seconds for each trials
> > > > > >
> > > > > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB
> > > > > > RAM, NVMe SSD 1.5TB.
> > > > > > Each clients load 10kB random data across all partitioned tables.
> > > > > >
> > > > > > Here is the result.
> > > > > >
> > > > > > childs | type | target | avg_tps | diff with HEAD
> > > > > > --------+----------+---------+------------+------------------
> > > > > > 16 | normal | HEAD | 1643.833 |
> > > > > > 16 | normal | Patched | 1619.5404 | 0.985222
> > > > > > 16 | unlogged | HEAD | 9069.3543 |
> > > > > > 16 | unlogged | Patched | 9368.0263 | 1.032932
> > > > > > 64 | normal | HEAD | 1598.698 |
> > > > > > 64 | normal | Patched | 1587.5906 | 0.993052
> > > > > > 64 | unlogged | HEAD | 9629.7315 |
> > > > > > 64 | unlogged | Patched | 10208.2196 | 1.060073
> > > > > > (8 rows)
> > > > > >
> > > > > > For normal tables, loading tps decreased 1% ~ 2% with this patch
> > > > > > whereas it increased 3% ~ 6% for unlogged tables. There were
> > > > > > collisions at 0 ~ 5 relation extension lock slots between 2 relations
> > > > > > in the 64 child tables case but it didn't seem to affect the tps.
> > > > > >
> > > > >
> > > > > AFAIU, this resembles the workload that Andres was worried about. I
> > > > > think we should once run this test in a different environment, but
> > > > > considering this to be correct and repeatable, where do we go with
> > > > > this patch especially when we know it improves many workloads [1] as
> > > > > well. We know that on a pathological case constructed by Mithun [2],
> > > > > this causes regression as well. I am not sure if the test done by
> > > > > Mithun really mimics any real-world workload as he has tested by
> > > > > making N_RELEXTLOCK_ENTS = 1 to hit the worst case.
> > > > >
> > > > > Sawada-San, if you have a script or data for the test done by you,
> > > > > then please share it so that others can also try to reproduce it.
> > > >
> > > > Unfortunately the environment I used for performance verification is
> > > > no longer available.
> > > >
> > > > I agree to run this test in a different environment. I've attached the
> > > > rebased version patch. I'm measuring the performance with/without
> > > > patch, so will share the results.
> > > >
> > >
> > > Thanks Sawada-san for patch.
> > >
> > > From last few days, I was reading this thread and was reviewing v13 patch. To debug and test, I did re-base of v13 patch. I compared my re-based patch and v14 patch. I think, ordering of header files is not alphabetically in v14 patch. (I haven't reviewed v14 patch fully because before review, I wanted to test false sharing). While debugging, I didn't noticed any hang or lock related issue.
> > >
> > > I did some testing to test false sharing(bulk insert, COPY data, bulk insert into partitions tables). Below is the testing summary.
> > >
> > > Test setup(Bulk insert into partition tables):
> > > autovacuum=off
> > > shared_buffers=512MB -c max_wal_size=20GB -c checkpoint_timeout=12min
> > >
> > > Basically, I created a table with 13 partitions. Using pgbench, I inserted bulk data. I used below pgbench command:
> > > ./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1 -f insert3.sql@1 -f insert4.sql@1 postgres
> > >
> > > I took scripts from previews mails and modified. For reference, I am attaching test scripts. I tested with default 1024 slots(N_RELEXTLOCK_ENTS = 1024).
> > >
> > > Clients HEAD (tps) With v14 patch (tps) %change (time: 180s)
> > > 1 92.979796 100.877446 +8.49 %
> > > 32 392.881863 388.470622 -1.12 %
> > > 56 551.753235 528.018852 -4.30 %
> > > 60 648.273767 653.251507 +0.76 %
> > > 64 645.975124 671.322140 +3.92 %
> > > 66 662.728010 673.399762 +1.61 %
> > > 70 647.103183 660.694914 +2.10 %
> > > 74 648.824027 676.487622 +4.26 %
> > >
> > > From above results, we can see that in most cases, TPS is slightly increased with v14 patch. I am still testing and will post my results.
> > >
> >
> > The number at 56 and 74 client count seem slightly suspicious. Can
> > you please repeat those tests? Basically, I am not able to come up
> > with a theory why at 56 clients the performance with the patch is a
> > bit lower and then at 74 it is higher.
>
> Okay. I will repeat test.
I re-tested on a different machine because on the previous machine the results were inconsistent.
My testing machine:
$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191
./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1 -f insert3.sql@1 -f insert4.sql@1 postgres
Clients HEAD(tps) With v14 patch(tps) %change (time: 180s)
1 41.491486 41.375532 -0.27%
32 335.138568 330.028739 -1.52%
56 353.783930 360.883710 +2.00%
60 341.741925 359.028041 +5.05%
64 338.521730 356.511423 +5.13%
66 339.838921 352.761766 +3.80%
70 339.305454 353.658425 +4.23%
74 332.016217 348.809042 +5.05%
From the above results, it seems that there is very little regression with the patch (+-5%), which could just be run-to-run variation.
> >
> > > I want to test extension lock by blocking use of fsm(use_fsm=false in code). I think, if we block use of fsm, then load will increase into extension lock. Is this correct way to test?
> > >
> >
> > Hmm, I think instead of directly hacking the code, you might want to
> > use the operation (probably cluster or vacuum full) where we set
> > HEAP_INSERT_SKIP_FSM. I think along with this you can try with
> > unlogged tables because that might stress the extension lock.
>
> Okay. I will test.
I tested with unlogged tables also. There also I was getting 3-6% gain in tps.
> >
> > In the above test, you might want to test with a higher number of
> > partitions (say up to 100) as well. Also, see if you want to use the
> > Copy command.
>
> Okay. I will test.
I tested with 500, 1000, and 2000 partitions. I observed a max +5% regression in the tps and there was no performance degradation.
For example:
I created a table with 2000 partitions and then I checked false sharing.
Slot Number | Slot Freq. | Slot Number | Slot Freq. | Slot Number | Slot Freq. |
156 | 13 | 973 | 11 | 446 | 10 |
627 | 13 | 52 | 10 | 488 | 10 |
782 | 12 | 103 | 10 | 501 | 10 |
812 | 12 | 113 | 10 | 701 | 10 |
192 | 11 | 175 | 10 | 737 | 10 |
221 | 11 | 235 | 10 | 754 | 10 |
367 | 11 | 254 | 10 | 781 | 10 |
546 | 11 | 314 | 10 | 790 | 10 |
814 | 11 | 419 | 10 | 833 | 10 |
917 | 11 | 424 | 10 | 888 | 10 |
From the above table, we can see that a total of 13 child tables fall in the same bucket (slot 156), so I did bulk-loading only in those 13 child tables to check the tps under false sharing, but I noticed that there was no performance degradation.
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Mon, Feb 10, 2020 at 10:28 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Sat, 8 Feb 2020 at 00:27, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Thu, 6 Feb 2020 at 09:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > The numbers at 56 and 74 client count seem slightly suspicious. Can > > > you please repeat those tests? Basically, I am not able to come up > > > with a theory why at 56 clients the performance with the patch is a > > > bit lower and then at 74 it is higher. > > > > Okay. I will repeat the test. > > I re-tested on a different machine because on the previous machine the results were inconsistent. > Thanks for doing detailed tests. > My testing machine: > $ lscpu > Architecture: ppc64le > Byte Order: Little Endian > CPU(s): 192 > On-line CPU(s) list: 0-191 > Thread(s) per core: 8 > Core(s) per socket: 1 > Socket(s): 24 > NUMA node(s): 4 > Model: IBM,8286-42A > L1d cache: 64K > L1i cache: 32K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-47 > NUMA node1 CPU(s): 48-95 > NUMA node2 CPU(s): 96-143 > NUMA node3 CPU(s): 144-191 > > ./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1 -f insert3.sql@1 -f insert4.sql@1 postgres
> Clients   HEAD (tps)    With v14 patch (tps)   %change (time: 180s)
> 1         41.491486     41.375532              -0.27%
> 32        335.138568    330.028739             -1.52%
> 56        353.783930    360.883710             +2.00%
> 60        341.741925    359.028041             +5.05%
> 64        338.521730    356.511423             +5.13%
> 66        339.838921    352.761766             +3.80%
> 70        339.305454    353.658425             +4.23%
> 74        332.016217    348.809042             +5.05%
> From the above results, it seems that there is very little regression with the patch (+-5%) that can be run-to-run variation. > Hmm, I don't see a 5% regression, rather it is a performance gain of ~5% with the patch? When we use 'regression', that indicates that performance (TPS) is reduced with the patch, but I don't see that in the above numbers. Kindly clarify. > > > > > > > > > I want to test the extension lock by blocking use of the fsm (use_fsm=false in code). I think, if we block use of the fsm, then load will increase on the extension lock. Is this a correct way to test? > > > > > > > > > > Hmm, I think instead of directly hacking the code, you might want to > > > use the operation (probably cluster or vacuum full) where we set > > > HEAP_INSERT_SKIP_FSM. I think along with this you can try with > > > unlogged tables because that might stress the extension lock. > > > > Okay. I will test. > > I tested with unlogged tables also. There also I was getting a 3-6% gain in tps. > > > > > > > > > In the above test, you might want to test with a higher number of > > > partitions (say up to 100) as well. Also, see if you want to use the > > > Copy command. > > > > Okay. I will test. > > I tested with 500, 1000, 2000 partitions. I observed a max +5% regress in the tps and there was no performance degradation. > Again, I am not sure if you see a performance dip here. I think your usage of the word 'regression' is not correct or at least confusing. > For example: > I created a table with 2000 partitions and then I checked false sharing.
> Slot Number  Slot Freq.    Slot Number  Slot Freq.    Slot Number  Slot Freq.
> 156          13            973          11            446          10
> 627          13            52           10            488          10
> 782          12            103          10            501          10
> 812          12            113          10            701          10
> 192          11            175          10            737          10
> 221          11            235          10            754          10
> 367          11            254          10            781          10
> 546          11            314          10            790          10
> 814          11            419          10            833          10
> 917          11            424          10            888          10
> From the above table, we can see that a total of 13 child tables fall into the same bucket (slot 156), so I did bulk-loading only in those 13 child tables to check the tps impact of false sharing, but I noticed that there was no performance degradation. > Okay. Is it possible to share these numbers and scripts? Thanks for doing the detailed tests for this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Wed, Feb 5, 2020 at 12:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > Unfortunately the environment I used for performance verification is > no longer available. > > I agree to run this test in a different environment. I've attached the > rebased version patch. I'm measuring the performance with/without > patch, so will share the results. > Did you get a chance to run these tests? Lately, Mahendra has done a lot of performance testing of this patch and shared his results. I don't see much downside with the patch, rather there is a performance increase of 3-9% in various scenarios. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
I took a brief look through this patch. I agree with the fundamental idea that we shouldn't need to use the heavyweight lock manager for relation extension, since deadlock is not a concern and no backend should ever need to hold more than one such lock at once. But it feels to me like this particular solution is rather seriously overengineered. I would like to suggest that we do something similar to Robert Haas' excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, that is, * Create some predetermined number N of LWLocks for relation extension. * When we want to extend some relation R, choose one of those locks (say, R's relfilenode number mod N) and lock it. 1. As long as all backends agree on the relation-to-lock mapping, this provides full security against concurrent extensions of the same relation. 2. Occasionally a backend will be blocked when it doesn't need to be, because of false sharing of a lock between two relations that need to be extended at the same time. But as long as N is large enough (and I doubt that it needs to be very large), that will be a negligible penalty. 3. Aside from being a lot simpler than the proposed extension_lock.c, this approach involves absolutely negligible overhead beyond the raw LWLockAcquire and LWLockRelease calls. I suspect therefore that in typical noncontended cases it will be faster. It also does not require any new resource management overhead, thus eliminating this patch's small but real penalty on transaction exit/cleanup. We'd need to do a bit of performance testing to choose a good value for N. I think that with N comparable to MaxBackends, the odds of false sharing being a problem would be quite negligible ... but it could be that we could get away with a lot less than that. regards, tom lane
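To make the suggested scheme concrete, here is a minimal C sketch, not actual PostgreSQL source: RelExtLWLockArray, N_RELEXT_LOCKS, and the two wrapper functions are invented names, and the shared-memory allocation and tranche registration for the array are omitted.

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/rel.h"

#define N_RELEXT_LOCKS 1024             /* the "N" to be settled by benchmarking */

static LWLock *RelExtLWLockArray;       /* assumed to be allocated in shared memory */

/* Every backend computes the same mapping, so mod-N hashing is sufficient. */
static inline LWLock *
RelExtLockFor(Relation rel)
{
    return &RelExtLWLockArray[rel->rd_node.relNode % N_RELEXT_LOCKS];
}

static void
LockRelationForExtensionSketch(Relation rel)
{
    LWLockAcquire(RelExtLockFor(rel), LW_EXCLUSIVE);
}

static void
UnlockRelationForExtensionSketch(Relation rel)
{
    LWLockRelease(RelExtLockFor(rel));
}

Two relations whose relfilenode numbers collide modulo N simply share a lock, which is the false-sharing case of point 2 above.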
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Wed, 12 Feb 2020 at 00:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I took a brief look through this patch. I agree with the fundamental > idea that we shouldn't need to use the heavyweight lock manager for > relation extension, since deadlock is not a concern and no backend > should ever need to hold more than one such lock at once. But it feels > to me like this particular solution is rather seriously overengineered. > I would like to suggest that we do something similar to Robert Haas' > excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > that is, > > * Create some predetermined number N of LWLocks for relation extension. My original proposal used LWLocks and hash tables for relation extension, but there was a discussion that using LWLocks is not good because they are not interruptible[1]. For this reason, and because we don't need two lock levels (shared, exclusive) for the relation extension lock, we ended up implementing a dedicated lock manager for extension locks. I think we will have that problem again if we use LWLocks. Regards, [1] https://www.postgresql.org/message-id/CA%2BTgmoZnWYQvmeqeGyY%2B0j-Tfmx8cTzRadfxJQwK9A-nCQ7GkA%40mail.gmail.com -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2020-02-11 08:01:34 +0530, Amit Kapila wrote: > I don't see much downside with the patch, rather there is a > performance increase of 3-9% in various scenarios. As I wrote in [1] I started to look at this patch. My problem with it is that it just seems like the wrong direction architecturally to me. There are two main aspects to this: 1) It basically builds another, more lightweight but less capable, lock manager that can lock more objects than we can have distinct locks for. It is faster because it uses *one* hashtable without conflict handling, because it has fewer lock modes, and because it doesn't support detecting deadlocks. And probably some other things. 2) A lot of the contention around file extension comes from us doing multiple expensive things under one lock (determining current relation size, searching victim buffer, extending file), and in tiny increments (growing a 1TB table by 8kb). This patch doesn't address that at all. I've focused on 1) in the email referenced above ([1]). Here I'll focus on 2). To quantify my concerns I instrumented postgres to measure the time for various operations that are part of extending a file (all per process). The hardware is a pretty fast nvme, with unlogged tables, on a 20/40 core/threads machine. The workload is copying a scale 10 pgbench_accounts into an unindexed, unlogged table using pgbench. Here are the instrumentations for various client counts, when just measuring 20s:
1 client:
LOG: extension time: lock wait: 0.00 lock held: 3.19 filesystem: 1.29 buffersearch: 1.58
2 clients:
LOG: extension time: lock wait: 0.47 lock held: 2.99 filesystem: 1.24 buffersearch: 1.43
LOG: extension time: lock wait: 0.60 lock held: 3.05 filesystem: 1.23 buffersearch: 1.50
4 clients:
LOG: extension time: lock wait: 3.92 lock held: 2.69 filesystem: 1.10 buffersearch: 1.29
LOG: extension time: lock wait: 4.40 lock held: 2.02 filesystem: 0.81 buffersearch: 0.93
LOG: extension time: lock wait: 3.86 lock held: 2.59 filesystem: 1.06 buffersearch: 1.22
LOG: extension time: lock wait: 4.00 lock held: 2.65 filesystem: 1.08 buffersearch: 1.26
8 clients:
LOG: extension time: lock wait: 6.94 lock held: 1.74 filesystem: 0.70 buffersearch: 0.80
LOG: extension time: lock wait: 7.16 lock held: 1.81 filesystem: 0.73 buffersearch: 0.82
LOG: extension time: lock wait: 6.93 lock held: 1.95 filesystem: 0.80 buffersearch: 0.89
LOG: extension time: lock wait: 7.08 lock held: 1.87 filesystem: 0.76 buffersearch: 0.86
LOG: extension time: lock wait: 6.95 lock held: 1.95 filesystem: 0.80 buffersearch: 0.89
LOG: extension time: lock wait: 6.88 lock held: 2.01 filesystem: 0.83 buffersearch: 0.93
LOG: extension time: lock wait: 6.94 lock held: 2.02 filesystem: 0.82 buffersearch: 0.93
LOG: extension time: lock wait: 7.02 lock held: 1.95 filesystem: 0.80 buffersearch: 0.89
16 clients:
LOG: extension time: lock wait: 10.37 lock held: 0.88 filesystem: 0.36 buffersearch: 0.39
LOG: extension time: lock wait: 10.53 lock held: 0.90 filesystem: 0.37 buffersearch: 0.40
LOG: extension time: lock wait: 10.72 lock held: 1.01 filesystem: 0.42 buffersearch: 0.45
LOG: extension time: lock wait: 10.45 lock held: 1.25 filesystem: 0.52 buffersearch: 0.55
LOG: extension time: lock wait: 10.66 lock held: 0.94 filesystem: 0.38 buffersearch: 0.41
LOG: extension time: lock wait: 10.50 lock held: 1.27 filesystem: 0.53 buffersearch: 0.56
LOG: extension time: lock wait: 10.53 lock held: 1.19 filesystem: 0.49 buffersearch: 0.53
LOG: extension time: lock wait: 10.57 lock held: 1.22 filesystem: 0.50
buffersearch: 0.53 LOG: extension time: lock wait: 10.72 lock held: 1.17 filesystem: 0.48 buffersearch: 0.52 LOG: extension time: lock wait: 10.67 lock held: 1.32 filesystem: 0.55 buffersearch: 0.58 LOG: extension time: lock wait: 10.95 lock held: 0.92 filesystem: 0.38 buffersearch: 0.40 LOG: extension time: lock wait: 10.81 lock held: 1.24 filesystem: 0.51 buffersearch: 0.56 LOG: extension time: lock wait: 10.62 lock held: 1.27 filesystem: 0.53 buffersearch: 0.56 LOG: extension time: lock wait: 11.14 lock held: 0.94 filesystem: 0.38 buffersearch: 0.41 LOG: extension time: lock wait: 11.20 lock held: 0.96 filesystem: 0.39 buffersearch: 0.42 LOG: extension time: lock wait: 10.75 lock held: 1.41 filesystem: 0.58 buffersearch: 0.63 0.88 + 0.90 + 1.01 + 1.25 + 0.94 + 1.27 + 1.19 + 1.22 + 1.17 + 1.32 + 0.92 + 1.24 + 1.27 + 0.94 + 0.96 + 1.41 in *none* of these cases the drive gets even close to being saturated. Like not even 1/3. If you consider the total time with the lock held, and the total time of the test, it becomes very quickly obvious that pretty quickly we spend the majority of the total time with the lock held. client count 1: 3.18/20 = 0.16 client count 2: 6.04/20 = 0.30 client count 4: 9.95/20 = 0.50 client count 8: 15.30/20 = 0.76 client count 16: 17.89/20 = 0.89 In other words, the reason that relation extension scales terribly isn't, to a significant degree, because the locking is slow. It's because we hold locks for the majority of the benchmark's time starting even at just 4 clients. Focusing on making the locking faster is just optimizing for the wrong thing. Amdahl's law will just restrict the benefits to a pretty small amount. Looking at a CPU time profile (i.e. it'll not include the time spent waiting for a lock, once sleeping in the kernel) for time spent within RelationGetBufferForTuple(): - 19.16% 0.29% postgres postgres [.] RelationGetBufferForTuple - 18.88% RelationGetBufferForTuple - 13.18% ReadBufferExtended - ReadBuffer_common + 5.02% mdextend + 4.77% FlushBuffer.part.0 + 0.61% BufTableLookup 0.52% __memset_avx2_erms + 1.65% PageInit - 1.18% LockRelationForExtension - 1.16% LockAcquireExtended - 1.07% WaitOnLock - 1.01% ProcSleep - 0.88% WaitLatchOrSocket 0.52% WaitEventSetWait 0.65% RecordAndGetPageWithFreeSpace the same workload using an assert enabled build, to get a simpler to interpret profile: - 13.28% 0.19% postgres postgres [.] RelationGetBufferForTuple - 13.09% RelationGetBufferForTuple - 8.35% RelationAddExtraBlocks - 7.67% ReadBufferBI - 7.54% ReadBufferExtended - 7.52% ReadBuffer_common - 3.64% BufferAlloc + 2.39% FlushBuffer + 0.27% BufTableLookup + 0.24% BufTableDelete + 0.15% LWLockAcquire 0.14% StrategyGetBuffer + 0.13% BufTableHashCode - 2.96% smgrextend + mdextend + 0.52% __memset_avx2_erms + 0.14% smgrnblocks 0.11% __GI___clock_gettime (inlined) + 0.57% RecordPageWithFreeSpace - 1.23% RecordAndGetPageWithFreeSpace - 1.03% fsm_set_and_search + 0.50% fsm_readbuf + 0.20% LockBuffer + 0.18% UnlockReleaseBuffer 0.11% fsm_set_avail 0.19% fsm_search - 0.86% ReadBufferBI - 0.72% ReadBufferExtended - ReadBuffer_common - 0.58% BufferAlloc + 0.20% BufTableLookup + 0.10% LWLockAcquire + 0.81% PageInit - 0.67% LockRelationForExtension - 0.67% LockAcquire - LockAcquireExtended + 0.60% WaitOnLock Which, I think, pretty clearly shows a few things: 1) It's crucial to move acquiring a victim buffer to the outside of the extension lock, as for copy acquiring the victim buffer will commonly cause a buffer having to be written out, due to the ringbuffer. 
This is even more crucial when using a logged table, as the writeout will then often also trigger a WAL flush. While doing so will sometimes add a round of acquiring the buffer mapping locks, having to do the FlushBuffer while holding the extension lock is a huge problem. This'd also move a good bit of the cost of finding (i.e. clock sweep / ringbuffer replacement) and invalidating the old buffer mapping out of the lock. 2) We need to make smgrwrite more efficient; it is costing a lot of time. A small additional experiment shows the cost of doing 8kb writes: I wrote a small program that just iteratively writes a 32GB file:
pwrite using 8kb blocks: 0.24user 17.88system 0:18.16 elapsed 99%CPU
pwrite using 128kb blocks: 0.00user 16.71system 0:17.01 elapsed 98%CPU
pwrite using 256kb blocks: 0.00user 15.95system 0:16.03 elapsed 99%CPU
pwritev() using 16 8kb blocks to write 128kb at once: 0.02user 15.94system 0:16.09 elapsed 99%CPU
pwritev() using 32 8kb blocks to write 256kb at once: 0.01user 14.90system 0:14.93 elapsed 99%CPU
pwritev() using 128 8kb blocks to write 1MB at once: 0.00user 13.96system 0:13.96 elapsed 99%CPU
if I instead just use posix_fallocate() with 8kb blocks: 0.28user 23.49system 0:23.78elapsed 99%CPU (0avgtext+0avgdata 1212maxresident)k 0inputs+0outputs (0major+66minor)pagefaults 0swaps
if I instead just use posix_fallocate() with 32 8kb blocks: 0.01user 1.18system 0:01.19elapsed 99%CPU (0avgtext+0avgdata 1200maxresident)k 0inputs+0outputs (0major+67minor)pagefaults 0swaps
Obviously fallocate doesn't quite have the same behaviour, and may incur a bit higher overhead for a later write. Using a version that instead uses O_DIRECT + async IO, I get (but only when also doing posix_fallocate in larger chunks):
0.05user 5.53system 0:12.53 elapsed 44%CPU
So we get considerably higher write throughput, at a considerably lower CPU usage (because DMA replaces the CPU doing a memcpy()). So it looks like extending the file with posix_fallocate() might be a winner, but only if we actually can do so in larger chunks than 8kb at once. Alternatively it could be worthwhile to rejigger things so we don't extend the files with zeroes once, just to then immediately overwrite them with actual content. For some users it's probably possible to pre-generate a page with contents when extending the file (would need fiddling with block numbers etc). 3) We should move the PageInit() that's currently done with the extension lock held, to the outside. Since we get the buffer with RBM_ZERO_AND_LOCK these days, that should be safe. Also, we don't need to zero the entire buffer both in RelationGetBufferForTuple()'s PageInit(), and in ReadBuffer_common() before calling smgrextend(). Greetings, Andres Freund [1] https://www.postgresql.org/message-id/20200211042229.msv23badgqljrdg2%40alap3.anarazel.de
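For reference, a standalone sketch of the kind of write experiment described above (not the program that produced those numbers): it extends a file with zero-filled 8kb blocks submitted as 128kb pwritev() batches; the file name, file size, and batch size are arbitrary assumptions.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLCKSZ            8192
#define BLOCKS_PER_WRITE  16                     /* 16 x 8kb = 128kb per syscall */
#define TARGET_SIZE       ((off_t) 1 << 30)      /* 1GB, small enough for a quick run */

int
main(void)
{
    static char  block[BLCKSZ];                  /* zero-filled, like smgrextend() */
    struct iovec iov[BLOCKS_PER_WRITE];
    off_t        off = 0;
    int          fd = open("extend_test.dat", O_CREAT | O_TRUNC | O_WRONLY, 0644);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    for (int i = 0; i < BLOCKS_PER_WRITE; i++)
    {
        iov[i].iov_base = block;
        iov[i].iov_len = BLCKSZ;
    }
    while (off < TARGET_SIZE)
    {
        /* one syscall extends the file by 16 blocks instead of 16 separate writes */
        ssize_t written = pwritev(fd, iov, BLOCKS_PER_WRITE, off);

        if (written < 0)
        {
            perror("pwritev");
            return 1;
        }
        off += written;
    }
    close(fd);
    return 0;
}

Swapping the loop body for one pwrite() per 8kb block, or for posix_fallocate() calls of various sizes, reproduces the other comparisons in the list above.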
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Tue, 11 Feb 2020 at 11:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 5, 2020 at 12:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Unfortunately the environment I used for performance verification is > > no longer available. > > > > I agree to run this test in a different environment. I've attached the > > rebased version patch. I'm measuring the performance with/without > > patch, so will share the results. > > > > Did you get a chance to run these tests? Lately, Mahendra has done a > lot of performance testing of this patch and shared his results. I > don't see much downside with the patch, rather there is a performance > increase of 3-9% in various scenarios. I've done performance tests on my laptop while changing the number of partitions. 4 clients concurrently insert 32 tuples into randomly selected partitions in a transaction. Therefore, changing the number of partitions also changes the contention on the relation extension lock. All tables are unlogged tables and N_RELEXTLOCK_ENTS is 1024. Here are my test results:
* HEAD
nchilds = 64 tps = 33135
nchilds = 128 tps = 31249
nchilds = 256 tps = 29356
* Patched
nchilds = 64 tps = 32057
nchilds = 128 tps = 32426
nchilds = 256 tps = 29483
The performance has been slightly improved by the patch in two cases. I've also attached the shell script I used to test. When I set N_RELEXTLOCK_ENTS to 1 so that all relation extension locks conflict, the result is:
nchilds = 64 tps = 30887
nchilds = 128 tps = 30015
nchilds = 256 tps = 27837
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Wed, Feb 12, 2020 at 7:36 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 12 Feb 2020 at 00:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > I took a brief look through this patch. I agree with the fundamental > > idea that we shouldn't need to use the heavyweight lock manager for > > relation extension, since deadlock is not a concern and no backend > > should ever need to hold more than one such lock at once. But it feels > > to me like this particular solution is rather seriously overengineered. > > I would like to suggest that we do something similar to Robert Haas' > > excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > > that is, > > > > * Create some predetermined number N of LWLocks for relation extension. > > My original proposal used LWLocks and hash tables for relation > extension but there was a discussion that using LWLocks is not good > because it's not interruptible[1]. Because of this reason and that we > don't need to have two lock level (shared, exclusive) for relation > extension lock we ended up with implementing dedicated lock manager > for extension lock. I think we will have that problem if we use LWLocks. > Hmm, but we use LWLocks for (a) WALWrite/Flush (see the usage of WALWriteLock), (b) writing the shared buffer contents (see io_in_progress lock and its usage in FlushBuffer) and might be for few other similar stuff. Many times those take more time than extending a block in relation especially when we combine the WAL write for multiple commits. So, if this is a problem for relation extension lock, then the same thing holds true there also. Now, there are cases like when we extend the relation with multiple blocks, finding victim buffer under this lock, etc. where this can be also equally or more costly, but I think we can improve some of those cases (some of this is even pointed by Andres in his email) if we agree on a fundamental idea of using LWLocks as proposed by Tom. I am not telling that we implement Tom's idea without weighing its pros and cons, but it has an appeal due to its simplicity. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Wed, Feb 12, 2020 at 10:24 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-02-11 08:01:34 +0530, Amit Kapila wrote: > > I don't see much downside with the patch, rather there is a > > performance increase of 3-9% in various scenarios. > > As I wrote in [1] I started to look at this patch. My problem with it is > that it just seems like the wrong direction architecturally to > me. There are two main aspects to this: > > 1) It basically builds another, more lightweight but less capable, > lock manager that can lock more objects than we can have distinct > locks for. It is faster because it uses *one* hashtable without > conflict handling, because it has fewer lock modes, and because it > doesn't support detecting deadlocks. And probably some other things. > > 2) A lot of the contention around file extension comes from us doing > multiple expensive things under one lock (determining current > relation size, searching victim buffer, extending file), and in tiny > increments (growing a 1TB table by 8kb). This patch doesn't address > that at all. > It seems to me both points try to address the performance angle of the patch, but here our actual intention was to make this lock block among parallel workers so that we can implement/improve some of the parallel write operations (like vacuuming the heap or index in parallel, parallel bulk load, etc.). Both are independently worth accomplishing, but not w.r.t. parallel writes. Here, we were doing some benchmarking to see that we haven't regressed performance in any case. > I've focused on 1) in the email referenced above ([1]). Here I'll focus > on 2). > > > > Which, I think, pretty clearly shows a few things: > I agree with all your observations below. > 1) It's crucial to move acquiring a victim buffer to the outside of the > extension lock, as for copy acquiring the victim buffer will commonly > cause a buffer having to be written out, due to the ringbuffer. This > is even more crucial when using a logged table, as the writeout will > then often also trigger a WAL flush. > > While doing so will sometimes add a round of acquiring the buffer > mapping locks, having to do the FlushBuffer while holding the > extension lock is a huge problem. > > This'd also move a good bit of the cost of finding (i.e. clock sweep > / ringbuffer replacement) and invalidating the old buffer mapping out > of the lock. > I think this is mostly because of the way the code is currently arranged to extend a block via the ReadBuffer* API. IIUC, currently the main operations under the relation extension lock are as follows:
a. get the block number for extension via smgrnblocks
b. find a victim buffer
c. associate the buffer with the block number found in step a
d. initialize the block with zeros
e. write the block
f. PageInit
I think if we can rearrange things such that steps b and c can be done after e or f, then we don't need to hold the extension lock to find the victim buffer, as sketched below. > 2) We need to make smgrwrite more efficient; it is costing a lot of > time. A small additional experiment shows the cost of doing 8kb > writes: > > I wrote a small program that just iteratively writes a 32GB file: > .. > > > So it looks like extending the file with posix_fallocate() might be a > winner, but only if we actually can do so in larger chunks than 8kb > at once. > A good experiment, and it sounds worth doing. > > > 3) We should move the PageInit() that's currently done with the > extension lock held, to the outside.
Since we get the buffer with > RBM_ZERO_AND_LOCK these days, that should be safe. Also, we don't > need to zero the entire buffer both in RelationGetBufferForTuple()'s > PageInit(), and in ReadBuffer_common() before calling smgrextend(). > Agreed. I feel all three are independent improvements and can be done separately. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
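As a rough illustration of the rearrangement discussed above, here is hedged pseudocode only: the helpers marked "hypothetical" do not exist, and error handling, smgr opening, and buffer locking are omitted. The point is simply that the victim-buffer search (b) and the buffer-mapping update (c) move out from under the extension lock, leaving only the block-number handout and the physical extension serialized.

static Buffer
extend_relation_sketch(Relation rel)
{
    Buffer      buf;
    BlockNumber blockno;

    /* (b) find a victim buffer before taking the extension lock */
    buf = reserve_victim_buffer();                 /* hypothetical */

    LockRelationForExtension(rel, ExclusiveLock);

    /* (a) hand out the new block number */
    blockno = smgrnblocks(rel->rd_smgr, MAIN_FORKNUM);

    /* (d) + (e) zero-fill and write the new block on disk */
    extend_block_on_disk(rel, blockno);            /* hypothetical */

    UnlockRelationForExtension(rel, ExclusiveLock);

    /* (c) associate the reserved buffer with the new block ... */
    install_buffer_mapping(buf, rel, blockno);     /* hypothetical */

    /* ... and (f) initialize the page, both after the lock is released */
    PageInit(BufferGetPage(buf), BLCKSZ, 0);

    return buf;
}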
Amit Kapila <amit.kapila16@gmail.com> writes: > On Wed, Feb 12, 2020 at 7:36 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: >> On Wed, 12 Feb 2020 at 00:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I would like to suggest that we do something similar to Robert Haas' >>> excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, >> My original proposal used LWLocks and hash tables for relation >> extension but there was a discussion that using LWLocks is not good >> because it's not interruptible[1]. > Hmm, but we use LWLocks for (a) WALWrite/Flush (see the usage of > WALWriteLock), (b) writing the shared buffer contents (see > io_in_progress lock and its usage in FlushBuffer) and might be for few > other similar stuff. Many times those take more time than extending a > block in relation especially when we combine the WAL write for > multiple commits. So, if this is a problem for relation extension > lock, then the same thing holds true there also. Yeah. I would say a couple more things: * I see no reason to think that a relation extension lock would ever be held long enough for noninterruptibility to be a real issue. Our expectations for query cancel response time are in the tens to hundreds of msec anyway. * There are other places where an LWLock can be held for a *long* time, notably the CheckpointLock. If we do think this is an issue, we could devise a way to not insist on noninterruptibility. The easiest fix is just to do a matching RESUME_INTERRUPTS after getting the lock and HOLD_INTERRUPTS again before releasing it; though maybe it'd be worth offering some slightly cleaner way. Point here is that LWLockAcquire only does that because it's useful to the majority of callers, not because it's graven in stone that it must be like that. In general, if we think there are issues with LWLock, it seems to me we'd be better off to try to fix them, not to invent a whole new single-purpose lock manager that we'll have to debug and maintain. I do not see anything about this problem that suggests that that would provide a major win. As Andres has noted, there are lots of other aspects of it that are likely to be more useful to spend effort on. regards, tom lane
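A minimal sketch of that pattern, assuming some LWLock protecting relation extension (the function and lock names are illustrative, not existing code): LWLockAcquire() holds off interrupts on behalf of its caller, so the sketch cancels that for the potentially slow critical section and restores the expected state just before LWLockRelease(), which finishes with its own RESUME_INTERRUPTS().

#include "postgres.h"
#include "miscadmin.h"
#include "storage/lwlock.h"

static void
extend_with_interrupts_allowed(LWLock *relext_lock)
{
    LWLockAcquire(relext_lock, LW_EXCLUSIVE);
    RESUME_INTERRUPTS();        /* re-enable cancel/die while the lock is held */

    /* ... potentially slow work: find a victim buffer, write the new block ... */
    CHECK_FOR_INTERRUPTS();

    HOLD_INTERRUPTS();          /* re-establish the state LWLockRelease() expects */
    LWLockRelease(relext_lock);
}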
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Wed, Feb 12, 2020 at 10:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > On Wed, Feb 12, 2020 at 7:36 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > >> On Wed, 12 Feb 2020 at 00:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> I would like to suggest that we do something similar to Robert Haas' > >>> excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > > >> My original proposal used LWLocks and hash tables for relation > >> extension but there was a discussion that using LWLocks is not good > >> because it's not interruptible[1]. > > > Hmm, but we use LWLocks for (a) WALWrite/Flush (see the usage of > > WALWriteLock), (b) writing the shared buffer contents (see > > io_in_progress lock and its usage in FlushBuffer) and might be for few > > other similar stuff. Many times those take more time than extending a > > block in relation especially when we combine the WAL write for > > multiple commits. So, if this is a problem for relation extension > > lock, then the same thing holds true there also. > > Yeah. I would say a couple more things: > > * I see no reason to think that a relation extension lock would ever > be held long enough for noninterruptibility to be a real issue. Our > expectations for query cancel response time are in the tens to > hundreds of msec anyway. > > * There are other places where an LWLock can be held for a *long* time, > notably the CheckpointLock. If we do think this is an issue, we could > devise a way to not insist on noninterruptibility. The easiest fix > is just to do a matching RESUME_INTERRUPTS after getting the lock and > HOLD_INTERRUPTS again before releasing it; though maybe it'd be worth > offering some slightly cleaner way. > Yeah, this sounds like the better answer for noninterruptibility aspect of this design. One idea that occurred to me was to pass a parameter to LWLOCK acquire/release APIs to indicate whether to hold/resume interrupts, but I don't know if that is any better than doing it at the required place. I am not sure if all places are careful whether they really want to hold interrupts, so if we provide a new parameter at least new users of API will think about it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Feb 11, 2020 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I took a brief look through this patch. I agree with the fundamental > idea that we shouldn't need to use the heavyweight lock manager for > relation extension, since deadlock is not a concern and no backend > should ever need to hold more than one such lock at once. But it feels > to me like this particular solution is rather seriously overengineered. > I would like to suggest that we do something similar to Robert Haas' > excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > that is, > > * Create some predetermined number N of LWLocks for relation extension. > * When we want to extend some relation R, choose one of those locks > (say, R's relfilenode number mod N) and lock it. > I am imagining something along the lines of BufferIOLWLockArray (here it will be RelExtLWLockArray). The size (N) could be MaxBackends or some percentage of it (depending on testing), and indexing into the array could be as suggested (R's relfilenode number mod N). We need to initialize this during shared memory initialization. Then, to extend the relation with multiple blocks at a time (as we do in RelationAddExtraBlocks), we can either use the already-proven group clear xid technique (see ProcArrayGroupClearXid) or have an additional state in the RelExtLWLockArray which will keep the count of waiters (as done in the latest patch of Sawada-san [1]). We might want to experiment with both approaches and see which yields better results. [1] - https://www.postgresql.org/message-id/CAD21AoADkWhkLEB_%3DkjLZeZ_ML9_hSQqNBWz%2Bd821QHf%3DO9LJQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
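A hedged sketch of the array-entry layout being discussed (all names here are assumptions): each slot pairs an LWLock with a waiter counter, so the backend holding the lock can extend by extra blocks when others are queued, roughly mirroring the lock-waiter heuristic RelationAddExtraBlocks uses today.

#include "postgres.h"
#include "port/atomics.h"
#include "storage/lwlock.h"

#define N_RELEXT_LOCKS 1024                 /* sizing to be settled by testing */

typedef struct RelExtLockEntry
{
    LWLock           lock;                  /* serializes extension of the mapped relations */
    pg_atomic_uint32 nwaiters;              /* backends currently waiting on this slot */
} RelExtLockEntry;

static RelExtLockEntry *RelExtLWLockArray;  /* assumed to be allocated in shared memory */

/* How many blocks to add at once: scale with the number of waiters. */
static int
ExtraBlocksToExtend(Oid relfilenode)
{
    RelExtLockEntry *entry = &RelExtLWLockArray[relfilenode % N_RELEXT_LOCKS];
    uint32           waiters = pg_atomic_read_u32(&entry->nwaiters);

    return Min(512, (int) waiters * 20);    /* heuristic borrowed from RelationAddExtraBlocks */
}

Waiters would increment nwaiters before sleeping on the LWLock and decrement it afterwards; the counter only has to be approximately right, since it merely tunes how aggressively the holder pre-extends.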
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Thu, Feb 13, 2020 at 9:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 11, 2020 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > I took a brief look through this patch. I agree with the fundamental > > idea that we shouldn't need to use the heavyweight lock manager for > > relation extension, since deadlock is not a concern and no backend > > should ever need to hold more than one such lock at once. But it feels > > to me like this particular solution is rather seriously overengineered. > > I would like to suggest that we do something similar to Robert Haas' > > excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > > that is, > > > > * Create some predetermined number N of LWLocks for relation extension. > > * When we want to extend some relation R, choose one of those locks > > (say, R's relfilenode number mod N) and lock it. > > > > I am imagining something on the lines of BufferIOLWLockArray (here it > will be RelExtLWLockArray). The size (N) could MaxBackends or some > percentage of it (depending on testing) and indexing into an array > could be as suggested (R's relfilenode number mod N). We need to > initialize this during shared memory initialization. Then, to extend > the relation with multiple blocks at-a-time (as we do in > RelationAddExtraBlocks), we can either use the already proven > technique of group clear xid mechanism (see ProcArrayGroupClearXid) or > have an additional state in the RelExtLWLockArray which will keep the > count of waiters (as done in latest patch of Sawada-san [1]). We > might want to experiment with both approaches and see which yields > better results. IMHO, in this case, there is no point in using the "group clear" type of mechanism mainly for two reasons 1) It will unnecessarily make PGPROC structure heavy. 2) For our case, we don't need any specific pieces of information from other waiters, we just need the count. Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Thu, 13 Feb 2020 at 09:46, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 11, 2020 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > I took a brief look through this patch. I agree with the fundamental > > idea that we shouldn't need to use the heavyweight lock manager for > > relation extension, since deadlock is not a concern and no backend > > should ever need to hold more than one such lock at once. But it feels > > to me like this particular solution is rather seriously overengineered. > > I would like to suggest that we do something similar to Robert Haas' > > excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > > that is, > > > > * Create some predetermined number N of LWLocks for relation extension. > > * When we want to extend some relation R, choose one of those locks > > (say, R's relfilenode number mod N) and lock it. > > > > I am imagining something on the lines of BufferIOLWLockArray (here it > will be RelExtLWLockArray). The size (N) could MaxBackends or some > percentage of it (depending on testing) and indexing into an array > could be as suggested (R's relfilenode number mod N). We need to > initialize this during shared memory initialization. Then, to extend > the relation with multiple blocks at-a-time (as we do in > RelationAddExtraBlocks), we can either use the already proven > technique of group clear xid mechanism (see ProcArrayGroupClearXid) or > have an additional state in the RelExtLWLockArray which will keep the > count of waiters (as done in latest patch of Sawada-san [1]). We > might want to experiment with both approaches and see which yields > better results. Thanks all for the suggestions. I have started working on the implementation based on the suggestion. I will post a patch for this in few days. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Thu, 13 Feb 2020 at 13:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 11, 2020 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > I took a brief look through this patch. I agree with the fundamental > > idea that we shouldn't need to use the heavyweight lock manager for > > relation extension, since deadlock is not a concern and no backend > > should ever need to hold more than one such lock at once. But it feels > > to me like this particular solution is rather seriously overengineered. > > I would like to suggest that we do something similar to Robert Haas' > > excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > > that is, > > > > * Create some predetermined number N of LWLocks for relation extension. > > * When we want to extend some relation R, choose one of those locks > > (say, R's relfilenode number mod N) and lock it. > > > > I am imagining something along the lines of BufferIOLWLockArray (here it > will be RelExtLWLockArray). The size (N) could be MaxBackends or some > percentage of it (depending on testing) and indexing into the array > could be as suggested (R's relfilenode number mod N). I'm not sure it's good that the contention on the LWLock slots depends on MaxBackends, because it means that the larger MaxBackends is, the less the LWLock slots conflict, even if the same number of backends are actually connected. Normally we don't want to increase MaxBackends unnecessarily, for security reasons. In the current patch we defined a fixed-length array for extension locks, but I agree that we need to determine which approach is best depending on testing. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Fri, Feb 14, 2020 at 11:42 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 13 Feb 2020 at 13:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Feb 11, 2020 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > I took a brief look through this patch. I agree with the fundamental > > > idea that we shouldn't need to use the heavyweight lock manager for > > > relation extension, since deadlock is not a concern and no backend > > > should ever need to hold more than one such lock at once. But it feels > > > to me like this particular solution is rather seriously overengineered. > > > I would like to suggest that we do something similar to Robert Haas' > > > excellent hack (daa7527af) for the !HAVE_SPINLOCK case in lmgr/spin.c, > > > that is, > > > > > > * Create some predetermined number N of LWLocks for relation extension. > > > * When we want to extend some relation R, choose one of those locks > > > (say, R's relfilenode number mod N) and lock it. > > > > > > > I am imagining something on the lines of BufferIOLWLockArray (here it > > will be RelExtLWLockArray). The size (N) could MaxBackends or some > > percentage of it (depending on testing) and indexing into an array > > could be as suggested (R's relfilenode number mod N). > > I'm not sure it's good that the contention of LWLock slot depends on > MaxBackends. Because it means that the more MaxBackends is larger, the > less the LWLock slot conflicts, even if the same number of backends > actually connecting. Normally we don't want to increase unnecessarily > MaxBackends for security reasons. In the current patch we defined a > fixed length of array for extension lock but I agree that we need to > determine what approach is the best depending on testing. > I think MaxBackends will generally limit the number of different relations that can simultaneously extend, but maybe tables with many partitions might change the situation. You are right that some tests might suggest a good number, let Mahendra write a patch and then we can test it. Do you have any better idea? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > I think MaxBackends will generally limit the number of different > relations that can simultaneously extend, but maybe tables with many > partitions might change the situation. You are right that some tests > might suggest a good number, let Mahendra write a patch and then we > can test it. Do you have any better idea? In the first place, there certainly isn't more than one extension happening at a time per backend, else the entire premise of this thread is wrong. Handwaving about partitions won't change that. In the second place, it's ludicrous to expect that the underlying platform/filesystem can support an infinite number of concurrent file-extension operations. At some level (e.g. where disk blocks are handed out, or where a record of the operation is written to a filesystem journal) it's quite likely that things are bottlenecked down to *one* such operation at a time per filesystem. So I'm not that concerned about occasional false-sharing limiting our ability to issue concurrent requests. There are probably worse restrictions at lower levels. regards, tom lane
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Wed, Feb 12, 2020 at 11:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah. I would say a couple more things: > > * I see no reason to think that a relation extension lock would ever > be held long enough for noninterruptibility to be a real issue. Our > expectations for query cancel response time are in the tens to > hundreds of msec anyway. I don't agree, because (1) the time to perform a relation extension on a busy system can be far longer than that and (2) if the disk is failing, then it can be *really* long, or indefinite. > * There are other places where an LWLock can be held for a *long* time, > notably the CheckpointLock. If we do think this is an issue, we could > devise a way to not insist on noninterruptibility. The easiest fix > is just to do a matching RESUME_INTERRUPTS after getting the lock and > HOLD_INTERRUPTS again before releasing it; though maybe it'd be worth > offering some slightly cleaner way. Point here is that LWLockAcquire > only does that because it's useful to the majority of callers, not > because it's graven in stone that it must be like that. That's an interesting idea, but it doesn't make the lock acquisition itself interruptible, which seems pretty important to me in this case. I wonder if we could have an LWLockAcquireInterruptibly() or some such that allows the lock acquisition itself to be interruptible. I think that would require some rejiggering but it might be doable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 12, 2020 at 11:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> * I see no reason to think that a relation extension lock would ever >> be held long enough for noninterruptibility to be a real issue. Our >> expectations for query cancel response time are in the tens to >> hundreds of msec anyway. > I don't agree, because (1) the time to perform a relation extension on > a busy system can be far longer than that and (2) if the disk is > failing, then it can be *really* long, or indefinite. I remain unconvinced ... wouldn't both of those claims apply to any disk I/O request? Are we going to try to ensure that no I/O ever happens while holding an LWLock, and if so how? (Again, CheckpointLock is a counterexample, which has been that way for decades without reported problems. But actually I think buffer I/O locks are an even more direct counterexample.) >> * There are other places where an LWLock can be held for a *long* time, >> notably the CheckpointLock. If we do think this is an issue, we could >> devise a way to not insist on noninterruptibility. The easiest fix >> is just to do a matching RESUME_INTERRUPTS after getting the lock and >> HOLD_INTERRUPTS again before releasing it; though maybe it'd be worth >> offering some slightly cleaner way. Point here is that LWLockAcquire >> only does that because it's useful to the majority of callers, not >> because it's graven in stone that it must be like that. > That's an interesting idea, but it doesn't make the lock acquisition > itself interruptible, which seems pretty important to me in this case. Good point: if you think the contained operation might run too long to suit you, then you don't want other backends to be stuck behind it for the same amount of time. > I wonder if we could have an LWLockAcquireInterruptibly() or some such > that allows the lock acquisition itself to be interruptible. I think > that would require some rejiggering but it might be doable. Yeah, I had the impression from a brief look at LWLockAcquire that it was itself depending on not throwing errors partway through. But with careful and perhaps-a-shade-slower coding, we could probably make a version that didn't require that. regards, tom lane
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Fri, Feb 14, 2020 at 10:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I remain unconvinced ... wouldn't both of those claims apply to any disk > I/O request? Are we going to try to ensure that no I/O ever happens > while holding an LWLock, and if so how? (Again, CheckpointLock is a > counterexample, which has been that way for decades without reported > problems. But actually I think buffer I/O locks are an even more > direct counterexample.) Yes, that's a problem. I proposed a patch a few years ago that replaced the buffer I/O locks with condition variables, and I think that's a good idea for lots of reasons, including this one. I never quite got around to pushing that through to commit, but I think we should do that. Aside from fixing this problem, it also prevents certain scenarios where we can currently busy-loop. I do realize that we're unlikely to ever solve this problem completely, but I don't think that should discourage us from making incremental progress. Just as debuggability is a sticking point for you, what I'm going to call operate-ability is a sticking point for me. My work here at EnterpriseDB exposes me on a fairly regular basis to real broken systems, and I'm therefore really sensitive to the concerns that people have when trying to recover after a system has become, for one reason or another, really broken. Interruptibility may not be the #1 concern in that area, but it's very high on the list. EnterpriseDB customers, as a rule, *really* hate being told to restart the database because one session is stuck. It causes a lot of disruption for them and the person who does the restart gets yelled at by their boss, and maybe their bosses boss and the boss above that. It means that their whole application, which may be mission-critical, is down until the database finishes restarting, and that is not always a quick process, especially after an immediate shutdown. I don't think we can ever make everything that can get stuck interruptible, but the more we can do the better. The work you and others have done over the years to add CHECK_FOR_INTERRUPTS() to more places pays real dividends. Making sessions that are blocked on disk I/O interruptible in at least some of the more common cases would be a huge win. Other people may well have different experiences, but my experience is that the disk deciding to conk out for a while or just respond very very slowly is a very common problem even (and sometimes especially) on very expensive hardware. Obviously that's not great and you're in lots of trouble, but being able to hit ^C and get control back significantly improves your chances of being able to understand what has happened and recover from it. > > That's an interesting idea, but it doesn't make the lock acquisition > > itself interruptible, which seems pretty important to me in this case. > > Good point: if you think the contained operation might run too long to > suit you, then you don't want other backends to be stuck behind it for > the same amount of time. Right. > > I wonder if we could have an LWLockAcquireInterruptibly() or some such > > that allows the lock acquisition itself to be interruptible. I think > > that would require some rejiggering but it might be doable. > > Yeah, I had the impression from a brief look at LWLockAcquire that > it was itself depending on not throwing errors partway through. > But with careful and perhaps-a-shade-slower coding, we could probably > make a version that didn't require that. 
Yeah, that was my thought, too, but I didn't study it that carefully, so somebody would need to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
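A hedged sketch of the condition-variable pattern mentioned here, not the actual buffer I/O patch: the waiting side sleeps in ConditionVariableSleep(), which services interrupts, and the backend finishing the I/O broadcasts. The IoState struct, its flag (which in reality would be protected by the buffer header lock), and the use of the generic PG_WAIT_EXTENSION wait-event class are illustrative assumptions.

#include "postgres.h"
#include "pgstat.h"
#include "storage/condition_variable.h"

typedef struct IoState
{
    bool              io_in_progress;
    ConditionVariable io_done_cv;
} IoState;

/* waiter: block until the in-progress I/O finishes, but stay interruptible */
static void
WaitForIoSketch(IoState *st)
{
    ConditionVariablePrepareToSleep(&st->io_done_cv);
    while (st->io_in_progress)
        ConditionVariableSleep(&st->io_done_cv, PG_WAIT_EXTENSION);
    ConditionVariableCancelSleep();
}

/* I/O issuer: mark completion and wake every waiter */
static void
FinishIoSketch(IoState *st)
{
    st->io_in_progress = false;
    ConditionVariableBroadcast(&st->io_done_cv);
}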
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2020-02-12 11:53:49 -0500, Tom Lane wrote: > In general, if we think there are issues with LWLock, it seems to me > we'd be better off to try to fix them, not to invent a whole new > single-purpose lock manager that we'll have to debug and maintain. My impression is that what's being discussed here is doing exactly that, except with s/lwlock/heavyweight locks/. We're basically replacing the lock.c lock mapping table with an ad-hoc implementation, and now we're also reinventing interruptibility etc. I still find the performance arguments pretty ludicrous, to be honest; I think the numbers I posted about how much time we spend with the locks held bear that out. I have a bit more understanding for the parallel worker arguments, but only a bit: I think if we develop a custom solution for the extension lock, we're just going to end up having to develop another custom solution for a bunch of other types of locks. It seems quite likely that we'll also end up wanting TUPLE, SPECULATIVE, and PAGE type locks that we don't want to share between leader & workers. IMO the right thing here is to extend lock.c so we can better represent whether certain types of lockmethods (& levels ?) are [not] to be shared. Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2020-02-14 09:42:40 -0500, Tom Lane wrote: > In the second place, it's ludicrous to expect that the underlying > platform/filesystem can support an infinite number of concurrent > file-extension operations. At some level (e.g. where disk blocks > are handed out, or where a record of the operation is written to > a filesystem journal) it's quite likely that things are bottlenecked > down to *one* such operation at a time per filesystem. That's probably true to some degree from a theoretical POV, but I think it's so far from where we are at, that it's effectively wrong. I can concurrently extend a few files at close to 10GB/s on a set of fast devices below a *single* filesystem. Whereas postgres bottlenecks far far before this. Given that a lot of today's storage has latencies in the 10-100s of microseconds, a journal flush doesn't necessarily cause that much serialization - and OS journals do group commit like things too. Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Fri, Feb 14, 2020 at 11:40 AM Andres Freund <andres@anarazel.de> wrote: > IMO the right thing here is to extend lock.c so we can better represent > whether certain types of lockmethods (& levels ?) are [not] to be > shared. The part that I find awkward about that is the whole thing with the deadlock detector. The deadlock detection code is old, crufty, complex, and very difficult to test (or at least I have found it so). A bug that I introduced when inventing group locking took like 5 years for somebody to find. One way of looking at the requirement that we have here is that certain kinds of locks need to be exempted from group locking. Basically, these are because they are a lower-level concept: a lock on a relation is more of a "logical" concept, and you hold the lock until eoxact, whereas a lock on an extend the relation is more of a "physical" concept, and you give it up as soon as you are done. Page locks are like relation extension locks in this regard. Unlike locks on SQL-level objects, these should not be shared between members of a lock group. Now, if it weren't for the deadlock detector, that would be easy enough. But figuring out what to do with the deadlock detector seems really painful to me. I wonder if there's some way we can make an end run around that problem. For instance, if we could make (and enforce) a coding rule that you cannot acquire a heavyweight lock while holding a relation extension or page lock, then maybe we could somehow teach the deadlock detector to just ignore those kinds of locks, and teach the lock acquisition machinery that they conflict between lock group members. On the other hand, I think you might also be understating the differences between these kinds of locks and other heavyweight locks. I suspect that the reason why we use lwlocks for buffers and heavyweight locks here is because there are a conceptually infinite number of relations, and lwlocks can't handle that. The only mechanism we currently have that does handle that is the heavyweight lock mechanism, and from my point of view, somebody just beat it with a stick to make it fit this application. But the fact that it has been made to fit does not mean that it is really fit for purpose. We use 2 of 9 lock levels, we don't need deadlock detection, we need different behavior when group locking is in use, we release locks right away rather than at eoxact. I don't think it's crazy to think that those differences are significant enough to justify having a separate mechanism, even if the one that is currently on the table is not exactly what we want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
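A hedged sketch of how the coding rule suggested here could be enforced; the flag and helpers are invented names, not existing code. The relation-extension/page lock path would set the flag right after acquiring its lock and clear it on release, and every other heavyweight lock acquisition would assert that the flag is unset, so the deadlock detector never needs to reason about these locks.

#include "postgres.h"

static bool RelExtOrPageLockHeld = false;

/* called by the relation-extension/page lock code after acquiring its lock */
static inline void
MarkRelExtOrPageLockHeld(void)
{
    RelExtOrPageLockHeld = true;
}

/* called by the relation-extension/page lock code after releasing its lock */
static inline void
MarkRelExtOrPageLockReleased(void)
{
    RelExtOrPageLockHeld = false;
}

/* would run at the top of every other heavyweight lock acquisition */
static inline void
AssertNoRelExtOrPageLockHeld(void)
{
    Assert(!RelExtOrPageLockHeld);
}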
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2020-02-14 12:08:45 -0500, Robert Haas wrote: > On Fri, Feb 14, 2020 at 11:40 AM Andres Freund <andres@anarazel.de> wrote: > > IMO the right thing here is to extend lock.c so we can better represent > > whether certain types of lockmethods (& levels ?) are [not] to be > > shared. > > The part that I find awkward about that is the whole thing with the > deadlock detector. The deadlock detection code is old, crufty, > complex, and very difficult to test (or at least I have found it so). > A bug that I introduced when inventing group locking took like 5 years > for somebody to find. Oh, I agree, lock.c and surrounding code is pretty crufty. Doubtful that just building up a largely parallel piece of infrastructure next to it is a good answer though. > One way of looking at the requirement that we have here is that > certain kinds of locks need to be exempted from group locking. > Basically, these are because they are a lower-level concept: a lock on > a relation is more of a "logical" concept, and you hold the lock until > eoxact, whereas a lock on an extend the relation is more of a > "physical" concept, and you give it up as soon as you are done. Page > locks are like relation extension locks in this regard. Unlike locks > on SQL-level objects, these should not be shared between members of a > lock group. > > Now, if it weren't for the deadlock detector, that would be easy > enough. But figuring out what to do with the deadlock detector seems > really painful to me. I wonder if there's some way we can make an end > run around that problem. For instance, if we could make (and enforce) > a coding rule that you cannot acquire a heavyweight lock while holding > a relation extension or page lock, then maybe we could somehow teach > the deadlock detector to just ignore those kinds of locks, and teach > the lock acquisition machinery that they conflict between lock group > members. Yea, that seems possible. I'm not really sure it's needed however? As long as you're not teaching the locking mechanism new tricks that influence the wait graph, why would the deadlock detector care? That's quite different from the group locking case, where you explicitly needed to teach it something fairly fundamental. It might still be a good idea independently to add the rule & enforce that acquire heavyweight locks while holding certain classes of locks is not allowed. > On the other hand, I think you might also be understating the > differences between these kinds of locks and other heavyweight locks. > I suspect that the reason why we use lwlocks for buffers and > heavyweight locks here is because there are a conceptually infinite > number of relations, and lwlocks can't handle that. Right. For me that's *the* fundamental service that lock.c delivers. And it's the fundamental bit this thread so far largely has been focusing on. > The only mechanism we currently have that does handle that is the > heavyweight lock mechanism, and from my point of view, somebody just > beat it with a stick to make it fit this application. But the fact > that it has been made to fit does not mean that it is really fit for > purpose. We use 2 of 9 lock levels, we don't need deadlock detection, > we need different behavior when group locking is in use, we release > locks right away rather than at eoxact. I don't think it's crazy to > think that those differences are significant enough to justify having > a separate mechanism, even if the one that is currently on the table > is not exactly what we want. 
Isn't that mostly true to varying degrees for the majority of lock types in lock.c? Sure, perhaps historically that's a misuse of lock.c, but it's been pretty ingrained by now. I just don't see where leaving out any of these features is going to give us fundamental advantages justifying a different locking infrastructure. E.g. not needing to support "conceptually infinite" number of relations IMO does provide a fundamental advantage - no need for a mapping. I'm not yet seeing anything equivalent for the extension vs. lock.c style lock case. Greetings, Andres Freund
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Fri, Feb 14, 2020 at 1:07 PM Andres Freund <andres@anarazel.de> wrote: > Yea, that seems possible. I'm not really sure it's needed however? As > long as you're not teaching the locking mechanism new tricks that > influence the wait graph, why would the deadlock detector care? That's > quite different from the group locking case, where you explicitly needed > to teach it something fairly fundamental. Well, you have to teach it that locks of certain types conflict even if they are in the same group, and that bleeds over pretty quickly into the whole area of deadlock detection, because lock waits are the edges in the graph that the deadlock detector processes. > It might still be a good idea independently to add the rule & enforce > that acquire heavyweight locks while holding certain classes of locks is > not allowed. I think that's absolutely essential, if we're going to continue using the main lock manager for this. I remain somewhat unconvinced that doing so is the best way forward, but it is *a* way forward. > Right. For me that's *the* fundamental service that lock.c delivers. And > it's the fundamental bit this thread so far largely has been focusing > on. For me, the deadlock detection is the far more complicated and problematic bit. > Isn't that mostly true to varying degrees for the majority of lock types > in lock.c? Sure, perhaps historically that's a misuse of lock.c, but > it's been pretty ingrained by now. I just don't see where leaving out > any of these features is going to give us fundamental advantages > justifying a different locking infrastructure. I think the group locking + deadlock detection things are more fundamental than you might be crediting, but I agree that having parallel mechanisms has its own set of pitfalls. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Fri, Feb 14, 2020 at 8:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > I think MaxBackends will generally limit the number of different > > relations that can simultaneously extend, but maybe tables with many > > partitions might change the situation. You are right that some tests > > might suggest a good number, let Mahendra write a patch and then we > > can test it. Do you have any better idea? > > In the first place, there certainly isn't more than one extension > happening at a time per backend, else the entire premise of this > thread is wrong. Handwaving about partitions won't change that. > Having more number of partitions theoretically increases the chances of false-sharing with the same number of concurrent sessions. For ex. two sessions operating on two relations vs. two sessions working on two relations with 100 partitions each would increase the chances of false-sharing. Sawada-San and Mahendra have done many tests on different systems and some monitoring with the previous patch that with a decent number of fixed slots (1024), the false-sharing was very less and even if it was there the effect was close to nothing. So, in short, this is not the point to worry about, but to ensure that we don't create any significant regressions in this area. > In the second place, it's ludicrous to expect that the underlying > platform/filesystem can support an infinite number of concurrent > file-extension operations. At some level (e.g. where disk blocks > are handed out, or where a record of the operation is written to > a filesystem journal) it's quite likely that things are bottlenecked > down to *one* such operation at a time per filesystem. So I'm not > that concerned about occasional false-sharing limiting our ability > to issue concurrent requests. There are probably worse restrictions > at lower levels. > Agreed and what we have observed during the tests is what you have said in this paragraph. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
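To make the fixed-slot idea discussed above concrete, here is a minimal sketch of how a relation could be mapped onto a fixed array of LWLocks. Everything here is illustrative: N_RELEXTLOCK_ENTS, RelExtLockArray and RelExtLockFor are invented names, not code from any posted patch, and a collision between two relations only causes the false-sharing discussed above, never incorrect results.

#include "postgres.h"
#include "common/hashfn.h"
#include "storage/lwlock.h"

#define N_RELEXTLOCK_ENTS 1024		/* fixed number of slots */

/* Array of padded LWLocks, allocated in shared memory at startup. */
static LWLockPadded *RelExtLockArray;

/*
 * Map (database, relation) onto one of the fixed slots.  Two different
 * relations may hash to the same slot; that only serializes their
 * extensions, it cannot cause wrong behavior.
 */
static inline LWLock *
RelExtLockFor(Oid dbid, Oid relid)
{
	uint32		hash;

	hash = hash_combine(murmurhash32((uint32) dbid),
						murmurhash32((uint32) relid));
	return &RelExtLockArray[hash % N_RELEXTLOCK_ENTS].lock;
}

Acquisition would then be a plain LWLockAcquire(RelExtLockFor(...), LW_EXCLUSIVE), which is what removes the group-locking and deadlock-detector questions, at the cost Andres raises of building a second mechanism next to lock.c.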
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Fri, Feb 14, 2020 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Feb 12, 2020 at 11:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > That's an interesting idea, but it doesn't make the lock acquisition > > itself interruptible, which seems pretty important to me in this case. > > Good point: if you think the contained operation might run too long to > suit you, then you don't want other backends to be stuck behind it for > the same amount of time. > It is not clear to me why we should add that as a requirement for this patch when other places like WALWriteLock, etc. have similar coding patterns and we haven't heard a ton of complaints about making it interruptable or if there are then I am not aware. > > I wonder if we could have an LWLockAcquireInterruptibly() or some such > > that allows the lock acquisition itself to be interruptible. I think > > that would require some rejiggering but it might be doable. > > Yeah, I had the impression from a brief look at LWLockAcquire that > it was itself depending on not throwing errors partway through. > But with careful and perhaps-a-shade-slower coding, we could probably > make a version that didn't require that. > If this becomes a requirement to move this patch, then surely we can do that. BTW, what exactly we need to ensure for it? Is it something on the lines of ensuring that in error path the state of the lock is cleared? Are we worried that interrupt handler might do something which will change the state of lock we are acquiring? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2020-02-14 13:34:03 -0500, Robert Haas wrote: > On Fri, Feb 14, 2020 at 1:07 PM Andres Freund <andres@anarazel.de> wrote: > > Yea, that seems possible. I'm not really sure it's needed however? As > > long as you're not teaching the locking mechanism new tricks that > > influence the wait graph, why would the deadlock detector care? That's > > quite different from the group locking case, where you explicitly needed > > to teach it something fairly fundamental. > > Well, you have to teach it that locks of certain types conflict even > if they are in the same group, and that bleeds over pretty quickly > into the whole area of deadlock detection, because lock waits are the > edges in the graph that the deadlock detector processes. Shouldn't this *theretically* be doable with changes mostly localized to lock.c, by not using proc->lockGroupLeader but proc for lock types that don't support group locking? I do see that deadlock.c largely looks at ->lockGroupLeader, but that kind of doesn't seem right to me. > > It might still be a good idea independently to add the rule & enforce > > that acquire heavyweight locks while holding certain classes of locks is > > not allowed. > > I think that's absolutely essential, if we're going to continue using > the main lock manager for this. I remain somewhat unconvinced that > doing so is the best way forward, but it is *a* way forward. Seems like we should build this part independently of the lock.c/new infra piece. > > Right. For me that's *the* fundamental service that lock.c delivers. And > > it's the fundamental bit this thread so far largely has been focusing > > on. > > For me, the deadlock detection is the far more complicated and problematic bit. > > > Isn't that mostly true to varying degrees for the majority of lock types > > in lock.c? Sure, perhaps historically that's a misuse of lock.c, but > > it's been pretty ingrained by now. I just don't see where leaving out > > any of these features is going to give us fundamental advantages > > justifying a different locking infrastructure. > > I think the group locking + deadlock detection things are more > fundamental than you might be crediting, but I agree that having > parallel mechanisms has its own set of pitfalls. It's possible. But I'm also hesitant to believe that we'll not need other lock types that conflict between leader/worker, but that still need deadlock detection. The more work we want to parallelize, the more likely that imo will become. Greetings, Andres Freund
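As a rough illustration of the localized change Andres is describing (not a patch; the helper names LockTagSupportsGroupLocking and LockConflictOwner are invented here), the ownership decision could be funnelled through something like this, so that lock types exempt from group locking fall back to the individual backend:

#include "postgres.h"
#include "storage/lock.h"
#include "storage/proc.h"

/* Invented helper: which lock tags participate in group locking. */
static inline bool
LockTagSupportsGroupLocking(const LOCKTAG *tag)
{
	return tag->locktag_type != LOCKTAG_RELATION_EXTEND &&
		   tag->locktag_type != LOCKTAG_PAGE;
}

/*
 * Invented helper: the PGPROC to treat as the "owner" for conflict checks.
 * For exempted tags we return the backend itself rather than its lock group
 * leader, so two members of the same parallel group still conflict with each
 * other on these locks.
 */
static inline PGPROC *
LockConflictOwner(PGPROC *proc, const LOCKTAG *tag)
{
	if (proc->lockGroupLeader == NULL || !LockTagSupportsGroupLocking(tag))
		return proc;
	return proc->lockGroupLeader;
}

LockCheckConflicts() and the deadlock code would consult such a helper instead of reading proc->lockGroupLeader directly.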
Andres Freund <andres@anarazel.de> writes: > On 2020-02-14 13:34:03 -0500, Robert Haas wrote: >> I think the group locking + deadlock detection things are more >> fundamental than you might be crediting, but I agree that having >> parallel mechanisms has its own set of pitfalls. > It's possible. But I'm also hesitant to believe that we'll not need > other lock types that conflict between leader/worker, but that still > need deadlock detection. The more work we want to parallelize, the more > likely that imo will become. Yeah. The concept that leader and workers can't conflict seems to me to be dependent, in a very fundamental way, on the assumption that we only need to parallelize read-only workloads. I don't think that's going to have a long half-life. regards, tom lane
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Mon, Feb 17, 2020 at 2:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > > On 2020-02-14 13:34:03 -0500, Robert Haas wrote: > >> I think the group locking + deadlock detection things are more > >> fundamental than you might be crediting, but I agree that having > >> parallel mechanisms has its own set of pitfalls. > > > It's possible. But I'm also hesitant to believe that we'll not need > > other lock types that conflict between leader/worker, but that still > > need deadlock detection. The more work we want to parallelize, the more > > likely that imo will become. > > Yeah. The concept that leader and workers can't conflict seems to me > to be dependent, in a very fundamental way, on the assumption that > we only need to parallelize read-only workloads. I don't think that's > going to have a long half-life. > Surely, someday, we need to solve that problem. But it is not clear when because if we see the operations for which we want to solve the relation extension lock problem doesn't require that. For example, for a parallel copy or further improving parallel vacuum to allow multiple workers to scan and process the heap and individual index, we don't need to change anything in group locking as far as I understand. Now, for parallel deletes/updates, I think it will depend on how we choose to parallelize those operations. I mean if we decide that each worker will work on an independent set of pages like we do for a sequential scan, we again might not need to change the group locking unless I am missing something which is possible. I think till we know the real need for changing group locking, going in the direction of what Tom suggested to use an array of LWLocks [1] to address the problems in hand is a good idea. It is not very clear to me that are we thinking to give up on Tom's idea [1] and change group locking even though it is not clear or at least nobody has proposed an idea/patch which requires that? Or are we thinking that we can do what Tom suggested for relation extension lock and also plan to change group locking for future parallel operations that might require it? [1] - https://www.postgresql.org/message-id/19443.1581435793%40sss.pgh.pa.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Andres Freund
Date:
Hi, On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > I think till we know the real need for changing group locking, going > in the direction of what Tom suggested to use an array of LWLocks [1] > to address the problems in hand is a good idea. -many I think that building yet another locking subsystem is the entirely wrong idea - especially when there's imo no convincing architectural reasons to do so. > It is not very clear to me that are we thinking to give up on Tom's > idea [1] and change group locking even though it is not clear or at > least nobody has proposed an idea/patch which requires that? Or are > we thinking that we can do what Tom suggested for relation extension > lock and also plan to change group locking for future parallel > operations that might require it? What I'm advocating is that extension locks should continue to go through lock.c. And yes, that requires some changes to group locking, but I still don't see why they'd be complicated. And if there's concerns about the cost of lock.c, I outlined a pretty long list of improvements that'll help everyone, and I showed that the locking itself isn't actually a large fraction of the scalability issues that extension has. Regards, Andres
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > > I think till we know the real need for changing group locking, going > > in the direction of what Tom suggested to use an array of LWLocks [1] > > to address the problems in hand is a good idea. > > -many > > I think that building yet another locking subsystem is the entirely > wrong idea - especially when there's imo no convincing architectural > reasons to do so. > Hmm, AFAIU, it will be done by having an array of LWLocks which we do at other places as well (like BufferIO locks). I am not sure if we can call it as new locking subsystem, but if we decide to continue using lock.c and change group locking then I think we can do that as well, see my comments below regarding that. > > > It is not very clear to me that are we thinking to give up on Tom's > > idea [1] and change group locking even though it is not clear or at > > least nobody has proposed an idea/patch which requires that? Or are > > we thinking that we can do what Tom suggested for relation extension > > lock and also plan to change group locking for future parallel > > operations that might require it? > > What I'm advocating is that extension locks should continue to go > through lock.c. And yes, that requires some changes to group locking, > but I still don't see why they'd be complicated. > Fair position, as per initial analysis, I think if we do below three things, it should work out without changing to a new way of locking for relation extension or page type locks. a. As per the discussion above, ensure in code we will never try to acquire another heavy-weight lock after acquiring relation extension or page type locks (probably by having Asserts in code or maybe some other way). b. Change lock.c so that group locking is not considered for these two lock types. For ex. in LockCheckConflicts, along with the check (if (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), we also check lock->tag and call it a conflict for these two locks. c. The deadlock detector can ignore checking these two types of locks because point (a) ensures that those won't lead to deadlock. One idea could be that FindLockCycleRecurseMember just ignores these two types of locks by checking the lock tag. It is possible that I might be missing something or we could achieve this some other way as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
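For points (b) and (c), the tag test itself is small; a sketch of the sort of predicate lock.c and deadlock.c could share (the function name is invented here, and this is an illustration rather than any patch under discussion):

#include "postgres.h"
#include "storage/lock.h"

/*
 * Relation extension and page locks are held only for short, self-contained
 * operations, are released long before end of transaction, and, given rule
 * (a) above, can never wait for another heavyweight lock while held.
 */
static inline bool
LockTagIsExemptFromGroupLocking(const LOCKTAG *tag)
{
	return tag->locktag_type == LOCKTAG_RELATION_EXTEND ||
		   tag->locktag_type == LOCKTAG_PAGE;
}

LockCheckConflicts() would then treat a matching tag as a conflict even when the two PROCLOCKs share a group leader, and FindLockCycleRecurseMember() could skip edges whose lock carries a matching tag, which is why point (c) is safe only in combination with point (a).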
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > > > I think till we know the real need for changing group locking, going > > > in the direction of what Tom suggested to use an array of LWLocks [1] > > > to address the problems in hand is a good idea. > > > > -many > > > > I think that building yet another locking subsystem is the entirely > > wrong idea - especially when there's imo no convincing architectural > > reasons to do so. > > > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do > at other places as well (like BufferIO locks). I am not sure if we > can call it as new locking subsystem, but if we decide to continue > using lock.c and change group locking then I think we can do that as > well, see my comments below regarding that. > > > > > > It is not very clear to me that are we thinking to give up on Tom's > > > idea [1] and change group locking even though it is not clear or at > > > least nobody has proposed an idea/patch which requires that? Or are > > > we thinking that we can do what Tom suggested for relation extension > > > lock and also plan to change group locking for future parallel > > > operations that might require it? > > > > What I'm advocating is that extension locks should continue to go > > through lock.c. And yes, that requires some changes to group locking, > > but I still don't see why they'd be complicated. > > > > Fair position, as per initial analysis, I think if we do below three > things, it should work out without changing to a new way of locking > for relation extension or page type locks. > a. As per the discussion above, ensure in code we will never try to > acquire another heavy-weight lock after acquiring relation extension > or page type locks (probably by having Asserts in code or maybe some > other way). > b. Change lock.c so that group locking is not considered for these two > lock types. For ex. in LockCheckConflicts, along with the check (if > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), > we also check lock->tag and call it a conflict for these two locks. > c. The deadlock detector can ignore checking these two types of locks > because point (a) ensures that those won't lead to deadlock. One idea > could be that FindLockCycleRecurseMember just ignores these two types > of locks by checking the lock tag. Thanks Amit for summary. Based on above 3 points, here attaching 2 patches for review. 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar) Basically this patch is for point b and c. 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch (Patch by me) This patch is for point a. After applying both the patches, make check-world is passing. We are testing both the patches and will post results. Thoughts? -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > Hi, > > > > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > > > > I think till we know the real need for changing group locking, going > > > > in the direction of what Tom suggested to use an array of LWLocks [1] > > > > to address the problems in hand is a good idea. > > > > > > -many > > > > > > I think that building yet another locking subsystem is the entirely > > > wrong idea - especially when there's imo no convincing architectural > > > reasons to do so. > > > > > > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do > > at other places as well (like BufferIO locks). I am not sure if we > > can call it as new locking subsystem, but if we decide to continue > > using lock.c and change group locking then I think we can do that as > > well, see my comments below regarding that. > > > > > > > > > It is not very clear to me that are we thinking to give up on Tom's > > > > idea [1] and change group locking even though it is not clear or at > > > > least nobody has proposed an idea/patch which requires that? Or are > > > > we thinking that we can do what Tom suggested for relation extension > > > > lock and also plan to change group locking for future parallel > > > > operations that might require it? > > > > > > What I'm advocating is that extension locks should continue to go > > > through lock.c. And yes, that requires some changes to group locking, > > > but I still don't see why they'd be complicated. > > > > > > > Fair position, as per initial analysis, I think if we do below three > > things, it should work out without changing to a new way of locking > > for relation extension or page type locks. > > a. As per the discussion above, ensure in code we will never try to > > acquire another heavy-weight lock after acquiring relation extension > > or page type locks (probably by having Asserts in code or maybe some > > other way). > > b. Change lock.c so that group locking is not considered for these two > > lock types. For ex. in LockCheckConflicts, along with the check (if > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), > > we also check lock->tag and call it a conflict for these two locks. > > c. The deadlock detector can ignore checking these two types of locks > > because point (a) ensures that those won't lead to deadlock. One idea > > could be that FindLockCycleRecurseMember just ignores these two types > > of locks by checking the lock tag. > > Thanks Amit for summary. > > Based on above 3 points, here attaching 2 patches for review. > > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar) > Basically this patch is for point b and c. > > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch > (Patch by me) > This patch is for point a. > > After applying both the patches, make check-world is passing. > > We are testing both the patches and will post results. > > Thoughts? +static void AssertAnyExtentionLockHeadByMe(void); +/* + * AssertAnyExtentionLockHeadByMe -- test whether any EXTENSION lock held by + * this backend. If any EXTENSION lock is hold by this backend, then assert + * will fail. To use this function, assert should be enabled. + */ +void AssertAnyExtentionLockHeadByMe() +{ Some minor observations on 0002. 1. 
static is missing in the function definition. 2. The function name should start on a new line after the function return type in the function definition, as per the PG guidelines. +void AssertAnyExtentionLockHeadByMe() ->
void
AssertAnyExtentionLockHeadByMe()
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
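Put together, the two style comments amount to roughly the following shape (keeping the patch's original identifier; the body is condensed to a comment here since only the declaration style is at issue):

static void AssertAnyExtentionLockHeadByMe(void);

/*
 * AssertAnyExtentionLockHeadByMe -- assert that this backend currently holds
 * no relation extension lock.
 */
static void
AssertAnyExtentionLockHeadByMe(void)
{
	/* walk LockMethodLocalHash and Assert on any LOCKTAG_RELATION_EXTEND entry */
}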
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Wed, 4 Mar 2020 at 12:03, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > Hi, > > > > > > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > > > > > I think till we know the real need for changing group locking, going > > > > > in the direction of what Tom suggested to use an array of LWLocks [1] > > > > > to address the problems in hand is a good idea. > > > > > > > > -many > > > > > > > > I think that building yet another locking subsystem is the entirely > > > > wrong idea - especially when there's imo no convincing architectural > > > > reasons to do so. > > > > > > > > > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do > > > at other places as well (like BufferIO locks). I am not sure if we > > > can call it as new locking subsystem, but if we decide to continue > > > using lock.c and change group locking then I think we can do that as > > > well, see my comments below regarding that. > > > > > > > > > > > > It is not very clear to me that are we thinking to give up on Tom's > > > > > idea [1] and change group locking even though it is not clear or at > > > > > least nobody has proposed an idea/patch which requires that? Or are > > > > > we thinking that we can do what Tom suggested for relation extension > > > > > lock and also plan to change group locking for future parallel > > > > > operations that might require it? > > > > > > > > What I'm advocating is that extension locks should continue to go > > > > through lock.c. And yes, that requires some changes to group locking, > > > > but I still don't see why they'd be complicated. > > > > > > > > > > Fair position, as per initial analysis, I think if we do below three > > > things, it should work out without changing to a new way of locking > > > for relation extension or page type locks. > > > a. As per the discussion above, ensure in code we will never try to > > > acquire another heavy-weight lock after acquiring relation extension > > > or page type locks (probably by having Asserts in code or maybe some > > > other way). > > > b. Change lock.c so that group locking is not considered for these two > > > lock types. For ex. in LockCheckConflicts, along with the check (if > > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), > > > we also check lock->tag and call it a conflict for these two locks. > > > c. The deadlock detector can ignore checking these two types of locks > > > because point (a) ensures that those won't lead to deadlock. One idea > > > could be that FindLockCycleRecurseMember just ignores these two types > > > of locks by checking the lock tag. > > > > Thanks Amit for summary. > > > > Based on above 3 points, here attaching 2 patches for review. > > > > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar) > > Basically this patch is for point b and c. > > > > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch > > (Patch by me) > > This patch is for point a. > > > > After applying both the patches, make check-world is passing. > > > > We are testing both the patches and will post results. Hi all, I am planing to test below 3 points on v1 patch set: 1. 
We will check that the newly added assert can be hit by hacking the code (while holding an extension lock, try to take any heavyweight lock). 2. In FindLockCycleRecurseMember, for testing purposes, we can put an additional loop to check that for all relext holders, there is no outer edge. 3. Test that group members are not granted the lock for the relation extension lock (group members should conflict). Please let me know your thoughts on testing this patch. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > What I'm advocating is that extension locks should continue to go > > > through lock.c. And yes, that requires some changes to group locking, > > > but I still don't see why they'd be complicated. > > > > > > > Fair position, as per initial analysis, I think if we do below three > > things, it should work out without changing to a new way of locking > > for relation extension or page type locks. > > a. As per the discussion above, ensure in code we will never try to > > acquire another heavy-weight lock after acquiring relation extension > > or page type locks (probably by having Asserts in code or maybe some > > other way). > > b. Change lock.c so that group locking is not considered for these two > > lock types. For ex. in LockCheckConflicts, along with the check (if > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), > > we also check lock->tag and call it a conflict for these two locks. > > c. The deadlock detector can ignore checking these two types of locks > > because point (a) ensures that those won't lead to deadlock. One idea > > could be that FindLockCycleRecurseMember just ignores these two types > > of locks by checking the lock tag. > > Thanks Amit for summary. > > Based on above 3 points, here attaching 2 patches for review. > > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar) > Basically this patch is for point b and c. > > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch > (Patch by me) > This patch is for point a. > > After applying both the patches, make check-world is passing. > > We are testing both the patches and will post results. > I think we need to do detailed code review in the places where we are taking Relation Extension Lock and see whether we are acquiring another heavy-weight lock after that. It seems to me that in brin_getinsertbuffer, after acquiring Relation Extension Lock, we might again try to acquire the same lock. See brin_initialize_empty_new_buffer which is called after acquiring Relation Extension Lock, in that function, we call RecordPageWithFreeSpace and that can again try to acquire the same lock if it needs to perform fsm_extend. I think there will be similar instances in the code. I think it is fine if we again try to acquire it, but the current assertion in your patch needs to be adjusted for that. Few other minor comments on v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any: 1. Ideally, this should be the first patch as we first need to ensure that we don't take any heavy-weight locks after acquiring a relation extension lock. 2. I think it is better to add an Assert after initial error checks (after RecoveryInProgress().. check) 3. + Assert (locallock->tag.lock.locktag_type != LOCKTAG_RELATION_EXTEND || + locallock->nLocks == 0); I think it is not possible that we have an entry in LockMethodLocalHash and its value is zero. Do you see any such possibility, if not, then we might want to remove it? 4. We already have a macro for LOCALLOCK_LOCKMETHOD, can we write another one tag type? This will make the check look a bit cleaner and probably if we need to extend it in future for Page type locks, then also it will be good. 5. 
I have also tried to think of another way to check if we already hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a cheaper way than this. Basically, I think if we traverse the MyProc->myProcLocks queue, we will get this information, but that doesn't seem much cheaper than this. 6. Another possibility is to make this a runtime test-and-elog so that it can be hit in production scenarios, but I think the cost of that will be high unless we have a very simple way to write the test condition. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
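Point 4 would amount to adding a companion to the existing LOCALLOCK_LOCKMETHOD macro in lock.h, along these lines (the macro name LOCALLOCK_LOCKTAG is only a suggestion for illustration):

/* existing macro in lock.h */
#define LOCALLOCK_LOCKMETHOD(llock) ((llock).tag.lock.locktag_lockmethodid)

/* possible companion for the tag type */
#define LOCALLOCK_LOCKTAG(llock) ((LockTagType) (llock).tag.lock.locktag_type)

/* the assertion in the patch could then be written as, e.g.: */
Assert(LOCALLOCK_LOCKTAG(*locallock) != LOCKTAG_RELATION_EXTEND);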
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > What I'm advocating is that extension locks should continue to go > > > > through lock.c. And yes, that requires some changes to group locking, > > > > but I still don't see why they'd be complicated. > > > > > > > > > > Fair position, as per initial analysis, I think if we do below three > > > things, it should work out without changing to a new way of locking > > > for relation extension or page type locks. > > > a. As per the discussion above, ensure in code we will never try to > > > acquire another heavy-weight lock after acquiring relation extension > > > or page type locks (probably by having Asserts in code or maybe some > > > other way). > > > b. Change lock.c so that group locking is not considered for these two > > > lock types. For ex. in LockCheckConflicts, along with the check (if > > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), > > > we also check lock->tag and call it a conflict for these two locks. > > > c. The deadlock detector can ignore checking these two types of locks > > > because point (a) ensures that those won't lead to deadlock. One idea > > > could be that FindLockCycleRecurseMember just ignores these two types > > > of locks by checking the lock tag. > > > > Thanks Amit for summary. > > > > Based on above 3 points, here attaching 2 patches for review. > > > > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar) > > Basically this patch is for point b and c. > > > > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch > > (Patch by me) > > This patch is for point a. > > > > After applying both the patches, make check-world is passing. > > > > We are testing both the patches and will post results. > > > > I think we need to do detailed code review in the places where we are > taking Relation Extension Lock and see whether we are acquiring > another heavy-weight lock after that. It seems to me that in > brin_getinsertbuffer, after acquiring Relation Extension Lock, we > might again try to acquire the same lock. See > brin_initialize_empty_new_buffer which is called after acquiring > Relation Extension Lock, in that function, we call > RecordPageWithFreeSpace and that can again try to acquire the same > lock if it needs to perform fsm_extend. I think there will be similar > instances in the code. I think it is fine if we again try to acquire > it, but the current assertion in your patch needs to be adjusted for > that. > > Few other minor comments on > v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any: > 1. Ideally, this should be the first patch as we first need to ensure > that we don't take any heavy-weight locks after acquiring a relation > extension lock. > > 2. I think it is better to add an Assert after initial error checks > (after RecoveryInProgress().. check) > > 3. > + Assert (locallock->tag.lock.locktag_type != LOCKTAG_RELATION_EXTEND || > + locallock->nLocks == 0); > > I think it is not possible that we have an entry in > LockMethodLocalHash and its value is zero. Do you see any such > possibility, if not, then we might want to remove it? > > 4. 
We already have a macro for LOCALLOCK_LOCKMETHOD, can we write > another one tag type? This will make the check look a bit cleaner and > probably if we need to extend it in future for Page type locks, then > also it will be good. > > 5. I have also tried to think of another way to check if we already > hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a > cheaper way than this. Basically, I think if we traverse the > MyProc->myProcLocks queue, we will get this information, but that > doesn't seem much cheaper than this. I think we can maintain a flag (rel_extlock_held). And, we can set that true in LockRelationForExtension, ConditionalLockRelationForExtension functions and we can reset it in UnlockRelationForExtension or in the error path e.g. LockReleaseAll. I think, this way we will be able to elog and this will be much cheaper. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
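A sketch of the flag idea layered onto the existing lmgr.c entry points (the variable name rel_extlock_held comes from the suggestion above; where exactly it gets reset on error paths is the open question being discussed):

#include "postgres.h"
#include "storage/lmgr.h"
#include "utils/rel.h"

static bool rel_extlock_held = false;

void
LockRelationForExtension(Relation relation, LOCKMODE lockmode)
{
	LOCKTAG		tag;

	SET_LOCKTAG_RELATION_EXTEND(tag,
								relation->rd_lockInfo.lockRelId.dbId,
								relation->rd_lockInfo.lockRelId.relId);

	(void) LockAcquire(&tag, lockmode, false, false);
	rel_extlock_held = true;
}

void
UnlockRelationForExtension(Relation relation, LOCKMODE lockmode)
{
	LOCKTAG		tag;

	SET_LOCKTAG_RELATION_EXTEND(tag,
								relation->rd_lockInfo.lockRelId.dbId,
								relation->rd_lockInfo.lockRelId.relId);

	rel_extlock_held = false;
	LockRelease(&tag, lockmode, false);
}

ConditionalLockRelationForExtension() would set the flag only when the lock is actually obtained, and the error paths (LockReleaseAll(), or transaction abort as debated below) would clear it.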
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Thu, 5 Mar 2020 at 13:54, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor > > <mahi6run@gmail.com> wrote: > > > > > > On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > > > What I'm advocating is that extension locks should continue to go > > > > > through lock.c. And yes, that requires some changes to group locking, > > > > > but I still don't see why they'd be complicated. > > > > > > > > > > > > > Fair position, as per initial analysis, I think if we do below three > > > > things, it should work out without changing to a new way of locking > > > > for relation extension or page type locks. > > > > a. As per the discussion above, ensure in code we will never try to > > > > acquire another heavy-weight lock after acquiring relation extension > > > > or page type locks (probably by having Asserts in code or maybe some > > > > other way). > > > > b. Change lock.c so that group locking is not considered for these two > > > > lock types. For ex. in LockCheckConflicts, along with the check (if > > > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), > > > > we also check lock->tag and call it a conflict for these two locks. > > > > c. The deadlock detector can ignore checking these two types of locks > > > > because point (a) ensures that those won't lead to deadlock. One idea > > > > could be that FindLockCycleRecurseMember just ignores these two types > > > > of locks by checking the lock tag. > > > > > > Thanks Amit for summary. > > > > > > Based on above 3 points, here attaching 2 patches for review. > > > > > > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar) > > > Basically this patch is for point b and c. > > > > > > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch > > > (Patch by me) > > > This patch is for point a. > > > > > > After applying both the patches, make check-world is passing. > > > > > > We are testing both the patches and will post results. > > > > > Thanks Amit and Dilip for reviewing the patches. > > I think we need to do detailed code review in the places where we are > > taking Relation Extension Lock and see whether we are acquiring > > another heavy-weight lock after that. It seems to me that in > > brin_getinsertbuffer, after acquiring Relation Extension Lock, we > > might again try to acquire the same lock. See > > brin_initialize_empty_new_buffer which is called after acquiring > > Relation Extension Lock, in that function, we call > > RecordPageWithFreeSpace and that can again try to acquire the same > > lock if it needs to perform fsm_extend. I think there will be similar > > instances in the code. I think it is fine if we again try to acquire > > it, but the current assertion in your patch needs to be adjusted for > > that. I agree with you. Dilip is doing code review and he will post results. As you mentioned that while holing Relation Extension Lock, we might again try to acquire same Relation Extension Lock, so to handle this in assert I did some changes in patch and attaching patch for review. (I will test this scenario) > > > > Few other minor comments on > > v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any: > > 1. 
Ideally, this should be the first patch as we first need to ensure > > that we don't take any heavy-weight locks after acquiring a relation > > extension lock. Fixed. > > 2. I think it is better to add an Assert after initial error checks > > (after RecoveryInProgress().. check) I am not getting your point. Can you explain which type of assert you are suggesting? > > 3. > > + Assert (locallock->tag.lock.locktag_type != LOCKTAG_RELATION_EXTEND || > > + locallock->nLocks == 0); > > > > I think it is not possible that we have an entry in > > LockMethodLocalHash and its value is zero. Do you see any such > > possibility, if not, then we might want to remove it? Yes, this condition is not needed. Fixed. > > > > 4. We already have a macro for LOCALLOCK_LOCKMETHOD, can we write > > another one tag type? This will make the check look a bit cleaner and > > probably if we need to extend it in future for Page type locks, then > > also it will be good. Good point. I added the macros in this version. Here, attaching new patch set for review. Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Mahendra Singh Thalor
Date:
On Wed, 4 Mar 2020 at 12:03, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Mon, 24 Feb 2020 at 15:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > Hi, > > > > > > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > > > > > I think till we know the real need for changing group locking, going > > > > > in the direction of what Tom suggested to use an array of LWLocks [1] > > > > > to address the problems in hand is a good idea. > > > > > > > > -many > > > > > > > > I think that building yet another locking subsystem is the entirely > > > > wrong idea - especially when there's imo no convincing architectural > > > > reasons to do so. > > > > > > > > > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do > > > at other places as well (like BufferIO locks). I am not sure if we > > > can call it as new locking subsystem, but if we decide to continue > > > using lock.c and change group locking then I think we can do that as > > > well, see my comments below regarding that. > > > > > > > > > > > > It is not very clear to me that are we thinking to give up on Tom's > > > > > idea [1] and change group locking even though it is not clear or at > > > > > least nobody has proposed an idea/patch which requires that? Or are > > > > > we thinking that we can do what Tom suggested for relation extension > > > > > lock and also plan to change group locking for future parallel > > > > > operations that might require it? > > > > > > > > What I'm advocating is that extension locks should continue to go > > > > through lock.c. And yes, that requires some changes to group locking, > > > > but I still don't see why they'd be complicated. > > > > > > > > > > Fair position, as per initial analysis, I think if we do below three > > > things, it should work out without changing to a new way of locking > > > for relation extension or page type locks. > > > a. As per the discussion above, ensure in code we will never try to > > > acquire another heavy-weight lock after acquiring relation extension > > > or page type locks (probably by having Asserts in code or maybe some > > > other way). > > > b. Change lock.c so that group locking is not considered for these two > > > lock types. For ex. in LockCheckConflicts, along with the check (if > > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)), > > > we also check lock->tag and call it a conflict for these two locks. > > > c. The deadlock detector can ignore checking these two types of locks > > > because point (a) ensures that those won't lead to deadlock. One idea > > > could be that FindLockCycleRecurseMember just ignores these two types > > > of locks by checking the lock tag. > > > > Thanks Amit for summary. > > > > Based on above 3 points, here attaching 2 patches for review. > > > > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip Kumar) > > Basically this patch is for point b and c. > > > > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch > > (Patch by me) > > This patch is for point a. > > > > After applying both the patches, make check-world is passing. > > > > We are testing both the patches and will post results. > > > > Thoughts? 
> > +static void AssertAnyExtentionLockHeadByMe(void); > > +/* > + * AssertAnyExtentionLockHeadByMe -- test whether any EXTENSION lock held by > + * this backend. If any EXTENSION lock is hold by this backend, then assert > + * will fail. To use this function, assert should be enabled. > + */ > +void AssertAnyExtentionLockHeadByMe() > +{ > > Some minor observations on 0002. > 1. static is missing in a function definition. > 2. Function name should start in new line after function return type > in function definition, as per pg guideline. > +void AssertAnyExtentionLockHeadByMe() > -> > void > AssertAnyExtentionLockHeadByMe() Thanks Dilip for review. I have fixed above 2 points in v2 patch set. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Mar 5, 2020 at 2:18 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > Here, attaching new patch set for review. I was kind of assuming that the way this would work is that it would set a flag or increment a counter or something when we acquire a relation extension lock, and then reverse the process when we release it. Then the Assert could just check the flag. Walking the whole LOCALLOCK table is expensive. Also, spelling counts. This patch features "extention" multiple times, plus also "hask," "beloging," "belog," and "whle", which is an awful lot of typos for a 70-line patch. If you are using macOS, try opening the patch in TextEdit. If you are inventing a new function name, spell the words you include the same way they are spelled elsewhere. Even aside from the typo, AssertAnyExtentionLockHeadByMe() is not a very good function name. It sounds like it's asserting that we hold an extension lock, rather than that we don't, and also, that's not exactly what it checks anyway, because there's this special case for when we're acquiring a relation extension lock we already hold. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Thu, Mar 5, 2020 at 1:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 5. I have also tried to think of another way to check if we already > > hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a > > cheaper way than this. Basically, I think if we traverse the > > MyProc->myProcLocks queue, we will get this information, but that > > doesn't seem much cheaper than this. > > I think we can maintain a flag (rel_extlock_held). And, we can set > that true in LockRelationForExtension, > ConditionalLockRelationForExtension functions and we can reset it in > UnlockRelationForExtension or in the error path e.g. LockReleaseAll. > I think if we reset it in LockReleaseAll during the error path, then we need to find a way to reset it during LockReleaseCurrentOwner as that is called during Subtransaction Abort which can be tricky as we don't know if it belongs to the current owner. How about resetting in Abort(Sub)Transaction and CommitTransaction after we release locks via ResourceOwnerRelease. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Fri, Mar 6, 2020 at 2:19 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 5, 2020 at 2:18 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > Here, attaching new patch set for review. > > I was kind of assuming that the way this would work is that it would > set a flag or increment a counter or something when we acquire a > relation extension lock, and then reverse the process when we release > it. Then the Assert could just check the flag. Walking the whole > LOCALLOCK table is expensive. > I think we can keep such a flag in TopTransactionState. We free such locks after the work is done (except during error where we free them at transaction abort) rather than at transaction commit, so one might say it is better not to associate with transaction state, but not sure if there is other better place. Do you have any suggestions? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Thu, Mar 5, 2020 at 11:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > I think we can keep such a flag in TopTransactionState. We free such > locks after the work is done (except during error where we free them > at transaction abort) rather than at transaction commit, so one might > say it is better not to associate with transaction state, but not sure > if there is other better place. Do you have any suggestions? I assumed it would be a global variable in lock.c. lock.c has got to know when any lock is required or released, so I don't know why we need to involve xact.c in the bookkeeping. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Fri, Mar 6, 2020 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 5, 2020 at 1:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > 5. I have also tried to think of another way to check if we already > > > hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a > > > cheaper way than this. Basically, I think if we traverse the > > > MyProc->myProcLocks queue, we will get this information, but that > > > doesn't seem much cheaper than this. > > > > I think we can maintain a flag (rel_extlock_held). And, we can set > > that true in LockRelationForExtension, > > ConditionalLockRelationForExtension functions and we can reset it in > > UnlockRelationForExtension or in the error path e.g. LockReleaseAll. > > > > I think if we reset it in LockReleaseAll during the error path, then > we need to find a way to reset it during LockReleaseCurrentOwner as > that is called during Subtransaction Abort which can be tricky as we > don't know if it belongs to the current owner. How about resetting in > Abort(Sub)Transaction and CommitTransaction after we release locks via > ResourceOwnerRelease. I think instead of the flag we need to keep the counter because we can acquire the same relation extension lock multiple times. So basically, every time we acquire the lock we can increment the counter and while releasing we can decrement it. During an error path, I think it is fine to set it to 0 in CommitTransaction/AbortTransaction. But, I am not sure that we can set to 0 or decrement it in AbortSubTransaction because we are not sure whether we have acquired the lock under this subtransaction or not. Having said that, I think there should not be any case that we are starting the sub-transaction while holding the relation extension lock. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
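In code form, the counter variant being described might look roughly like this inside lock.c (all names are invented for illustration; the only interesting part is that grant/release of LOCKTAG_RELATION_EXTEND bump the counter and transaction end resets it):

#include "postgres.h"

static int	RelExtLockHoldCount = 0;

/* called where a LOCKTAG_RELATION_EXTEND lock is granted or released */
static void
TrackRelExtLock(bool acquired)
{
	if (acquired)
		RelExtLockHoldCount++;
	else
		RelExtLockHoldCount--;
	Assert(RelExtLockHoldCount >= 0);
}

bool
IsRelExtLockHeld(void)
{
	return RelExtLockHoldCount > 0;
}

/*
 * Called from CommitTransaction()/AbortTransaction() after
 * ResourceOwnerRelease() has dropped any locks still held in an error path.
 */
void
ResetRelExtLockHoldCount(void)
{
	RelExtLockHoldCount = 0;
}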
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Sat, Mar 7, 2020 at 9:57 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Mar 6, 2020 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Mar 5, 2020 at 1:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > 5. I have also tried to think of another way to check if we already
> > > > hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a
> > > > cheaper way than this. Basically, I think if we traverse the
> > > > MyProc->myProcLocks queue, we will get this information, but that
> > > > doesn't seem much cheaper than this.
> > >
> > > I think we can maintain a flag (rel_extlock_held). And, we can set
> > > that true in LockRelationForExtension,
> > > ConditionalLockRelationForExtension functions and we can reset it in
> > > UnlockRelationForExtension or in the error path e.g. LockReleaseAll.
> > >
> >
> > I think if we reset it in LockReleaseAll during the error path, then
> > we need to find a way to reset it during LockReleaseCurrentOwner as
> > that is called during Subtransaction Abort which can be tricky as we
> > don't know if it belongs to the current owner. How about resetting in
> > Abort(Sub)Transaction and CommitTransaction after we release locks via
> > ResourceOwnerRelease.
> I think instead of the flag we need to keep the counter because we can
> acquire the same relation extension lock multiple times. So
> basically, every time we acquire the lock we can increment the counter
> and while releasing we can decrement it. During an error path, I
> think it is fine to set it to 0 in CommitTransaction/AbortTransaction.
> But, I am not sure that we can set to 0 or decrement it in
> AbortSubTransaction because we are not sure whether we have acquired
> the lock under this subtransaction or not.
> Having said that, I think there should not be any case that we are
> starting the sub-transaction while holding the relation extension
> lock.
Right, this is exactly the point. I think we can mention this in comments to make it clear why setting it to zero is fine during subtransaction abort.
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Sat, Mar 7, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Mar 7, 2020 at 9:57 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Fri, Mar 6, 2020 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Mar 5, 2020 at 1:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > >
> > > > > 5. I have also tried to think of another way to check if we already
> > > > > hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a
> > > > > cheaper way than this. Basically, I think if we traverse the
> > > > > MyProc->myProcLocks queue, we will get this information, but that
> > > > > doesn't seem much cheaper than this.
> > > >
> > > > I think we can maintain a flag (rel_extlock_held). And, we can set
> > > > that true in LockRelationForExtension,
> > > > ConditionalLockRelationForExtension functions and we can reset it in
> > > > UnlockRelationForExtension or in the error path e.g. LockReleaseAll.
> > > >
> > >
> > > I think if we reset it in LockReleaseAll during the error path, then
> > > we need to find a way to reset it during LockReleaseCurrentOwner as
> > > that is called during Subtransaction Abort which can be tricky as we
> > > don't know if it belongs to the current owner. How about resetting in
> > > Abort(Sub)Transaction and CommitTransaction after we release locks via
> > > ResourceOwnerRelease.
> > I think instead of the flag we need to keep the counter because we can
> > acquire the same relation extension lock multiple times. So
> > basically, every time we acquire the lock we can increment the counter
> > and while releasing we can decrement it. During an error path, I
> > think it is fine to set it to 0 in CommitTransaction/AbortTransaction.
> > But, I am not sure that we can set to 0 or decrement it in
> > AbortSubTransaction because we are not sure whether we have acquired
> > the lock under this subtransaction or not.
> > Having said that, I think there should not be any case that we are
> > starting the sub-transaction while holding the relation extension
> > lock.
> Right, this is exactly the point. I think we can mention this in comments to make it clear why setting it to zero is fine during subtransaction abort.
Is there anything wrong with having an Assert during subtransaction start to indicate that we don't have a relation extension lock?
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Sat, Mar 7, 2020 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Mar 7, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Sat, Mar 7, 2020 at 9:57 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: >>> >>> On Fri, Mar 6, 2020 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >>> > >>> > On Thu, Mar 5, 2020 at 1:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: >>> > > >>> > > On Thu, Mar 5, 2020 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>> > > > >>> > > > >>> > > > 5. I have also tried to think of another way to check if we already >>> > > > hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a >>> > > > cheaper way than this. Basically, I think if we traverse the >>> > > > MyProc->myProcLocks queue, we will get this information, but that >>> > > > doesn't seem much cheaper than this. >>> > > >>> > > I think we can maintain a flag (rel_extlock_held). And, we can set >>> > > that true in LockRelationForExtension, >>> > > ConditionalLockRelationForExtension functions and we can reset it in >>> > > UnlockRelationForExtension or in the error path e.g. LockReleaseAll. >>> > > >>> > >>> > I think if we reset it in LockReleaseAll during the error path, then >>> > we need to find a way to reset it during LockReleaseCurrentOwner as >>> > that is called during Subtransaction Abort which can be tricky as we >>> > don't know if it belongs to the current owner. How about resetting in >>> > Abort(Sub)Transaction and CommitTransaction after we release locks via >>> > ResourceOwnerRelease. >>> >>> I think instead of the flag we need to keep the counter because we can >>> acquire the same relation extension lock multiple times. So >>> basically, every time we acquire the lock we can increment the counter >>> and while releasing we can decrement it. During an error path, I >>> think it is fine to set it to 0 in CommitTransaction/AbortTransaction. >>> But, I am not sure that we can set to 0 or decrement it in >>> AbortSubTransaction because we are not sure whether we have acquired >>> the lock under this subtransaction or not. >>> >>> Having said that, I think there should not be any case that we are >>> starting the sub-transaction while holding the relation extension >>> lock. >> >> >> Right, this is exactly the point. I think we can mention this in comments to make it clear why setting it to zero isfine during subtransaction abort. > > > Is there anything wrong with having an Assert during subtransaction start to indicate that we don't have a relation extensionlock? Yes, I was planning to do that. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
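A sketch of what that assertion could look like, reusing the hypothetical IsRelExtLockHeld() helper from the counter sketch above (the wrapper name is invented; in practice it would just be an Assert near the top of StartSubTransaction()):

#include "postgres.h"

static inline void
AssertNoRelExtLockAtSubXactStart(void)
{
	/*
	 * We never expect to begin a subtransaction while a relation extension
	 * lock is held; if that ever changed, resetting the counter at
	 * (sub)transaction abort would no longer be safe.
	 */
	Assert(!IsRelExtLockHeld());
}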
Dilip Kumar <dilipbalaut@gmail.com> writes: > I think instead of the flag we need to keep the counter because we can > acquire the same relation extension lock multiple times. Uh ... what? How would that not be broken usage on its face? I continue to think that we'd be better off getting all of this out of the heavyweight lock manager. There is no reason why we should need deadlock detection, or multiple holds of the same lock, or pretty much anything that LWLocks don't give you. regards, tom lane
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Sat, Mar 7, 2020 at 8:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dilip Kumar <dilipbalaut@gmail.com> writes: > > I think instead of the flag we need to keep the counter because we can > > acquire the same relation extension lock multiple times. > > Uh ... what? How would that not be broken usage on its face?

Basically, if we can ensure that while holding the relation extension lock we will not wait for any other lock, then we can ignore it in the deadlock detection path so that we don't detect a false deadlock due to the group locking mechanism. So if we are already holding the relation extension lock and trying to acquire the same lock in the same mode, it can never wait, so this is safe.

> I continue to think that we'd be better off getting all of this > out of the heavyweight lock manager. There is no reason why we > should need deadlock detection, or multiple holds of the same > lock, or pretty much anything that LWLocks don't give you.

Right, we never need deadlock detection for this lock. But I think there are quite a few cases where we have multiple holds at the same time, e.g., during RelationAddExtraBlocks: while holding the relation extension lock we try to update the block in the FSM, and the FSM might need to add an extra FSM block, which will again try to acquire the same lock. But I think the main reason for not converting it to an LWLock is that Andres has a concern about inventing a new lock mechanism, as discussed upthread [1].

[1] https://www.postgresql.org/message-id/20200220023612.c44ggploywxtlvmx%40alap3.anarazel.de

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
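To make the shape of this idea concrete, here is a minimal sketch of such a counter, maintained next to the existing lmgr.c wrappers. This is not the posted patch: the variable and helper names are invented, and the error-path reset that the thread goes on to debate is left out. A counter rather than a boolean is what lets the FSM path described above re-take the lock on the same relation without confusing the bookkeeping.

/* Hypothetical sketch only; names and placement are illustrative. */
#include "postgres.h"

#include "storage/lmgr.h"
#include "storage/lock.h"
#include "utils/rel.h"

/* How many relation extension locks this backend currently holds. */
static int	RelExtLockHoldCount = 0;

bool
IsRelExtLockHeld(void)
{
	return RelExtLockHoldCount > 0;
}

void
LockRelationForExtension(Relation relation, LOCKMODE lockmode)
{
	LOCKTAG		tag;

	SET_LOCKTAG_RELATION_EXTEND(tag,
								relation->rd_lockInfo.lockRelId.dbId,
								relation->rd_lockInfo.lockRelId.relId);

	(void) LockAcquire(&tag, lockmode, false, false);

	/* Count every hold: the FSM path can legitimately re-take this lock. */
	RelExtLockHoldCount++;
}

void
UnlockRelationForExtension(Relation relation, LOCKMODE lockmode)
{
	LOCKTAG		tag;

	SET_LOCKTAG_RELATION_EXTEND(tag,
								relation->rd_lockInfo.lockRelId.dbId,
								relation->rd_lockInfo.lockRelId.relId);

	Assert(RelExtLockHoldCount > 0);
	RelExtLockHoldCount--;

	LockRelease(&tag, lockmode, false);
}

ConditionalLockRelationForExtension would need the same increment on success, and any error path that ends in LockReleaseAll would have to reset the counter, which is exactly the wrinkle discussed in the follow-up messages.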
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, 24 Feb 2020 at 19:08, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > > > I think till we know the real need for changing group locking, going > > > in the direction of what Tom suggested to use an array of LWLocks [1] > > > to address the problems in hand is a good idea. > > > > -many > > > > I think that building yet another locking subsystem is the entirely > > wrong idea - especially when there's imo no convincing architectural > > reasons to do so. > > > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do > at other places as well (like BufferIO locks). I am not sure if we > can call it as new locking subsystem, but if we decide to continue > using lock.c and change group locking then I think we can do that as > well, see my comments below regarding that. > > > > > > It is not very clear to me that are we thinking to give up on Tom's > > > idea [1] and change group locking even though it is not clear or at > > > least nobody has proposed an idea/patch which requires that? Or are > > > we thinking that we can do what Tom suggested for relation extension > > > lock and also plan to change group locking for future parallel > > > operations that might require it? > > > > What I'm advocating is that extension locks should continue to go > > through lock.c. And yes, that requires some changes to group locking, > > but I still don't see why they'd be complicated. > > > > Fair position, as per initial analysis, I think if we do below three > things, it should work out without changing to a new way of locking > for relation extension or page type locks. > a. As per the discussion above, ensure in code we will never try to > acquire another heavy-weight lock after acquiring relation extension > or page type locks (probably by having Asserts in code or maybe some > other way). The current patch (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch) doesn't check that acquiring a heavy-weight lock after page type lock, is that right? There is the path doing that: ginInsertCleanup() holds a page lock and insert the pending list items, which might hold a relation extension lock. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Sat, Mar 7, 2020 at 9:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sat, Mar 7, 2020 at 8:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dilip Kumar <dilipbalaut@gmail.com> writes:
> > I think instead of the flag we need to keep the counter because we can
> > acquire the same relation extension lock multiple times.
>
> Uh ... what? How would that not be broken usage on its face?
Basically, if we can ensure that while holding the relation extension
lock we will not wait for any other lock, then we can ignore it in the
deadlock detection path so that we don't detect a false deadlock due
to the group locking mechanism. So if we are already holding the
relation extension lock and trying to acquire the same lock in the same
mode, it can never wait, so this is safe.
> I continue to think that we'd be better off getting all of this
> out of the heavyweight lock manager. There is no reason why we
> should need deadlock detection, or multiple holds of the same
> lock, or pretty much anything that LWLocks don't give you.
Right, we never need deadlock detection for this lock. But I think
there are quite a few cases where we have multiple holds at the same
time, e.g., during RelationAddExtraBlocks: while holding the relation
extension lock we try to update the block in the FSM, and the FSM might
need to add an extra FSM block, which will again try to acquire the
same lock. But I think the main reason for not converting it to an
LWLock is that Andres has a concern about inventing a new lock
mechanism, as discussed upthread [1].
Right, that is one point and another is that if we go via the route of converting it to LWLocks, then we also need to think of some solution for page locks that are used in ginInsertCleanup. However, if we go with the approach being pursued [1] then the page locks will be handled in a similar way as relation extension locks.
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Mon, 24 Feb 2020 at 19:08, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote:
> > > I think till we know the real need for changing group locking, going
> > > in the direction of what Tom suggested to use an array of LWLocks [1]
> > > to address the problems in hand is a good idea.
> >
> > -many
> >
> > I think that building yet another locking subsystem is the entirely
> > wrong idea - especially when there's imo no convincing architectural
> > reasons to do so.
> >
>
> Hmm, AFAIU, it will be done by having an array of LWLocks which we do
> at other places as well (like BufferIO locks). I am not sure if we
> can call it as new locking subsystem, but if we decide to continue
> using lock.c and change group locking then I think we can do that as
> well, see my comments below regarding that.
>
> >
> > > It is not very clear to me that are we thinking to give up on Tom's
> > > idea [1] and change group locking even though it is not clear or at
> > > least nobody has proposed an idea/patch which requires that? Or are
> > > we thinking that we can do what Tom suggested for relation extension
> > > lock and also plan to change group locking for future parallel
> > > operations that might require it?
> >
> > What I'm advocating is that extension locks should continue to go
> > through lock.c. And yes, that requires some changes to group locking,
> > but I still don't see why they'd be complicated.
> >
>
> Fair position, as per initial analysis, I think if we do below three
> things, it should work out without changing to a new way of locking
> for relation extension or page type locks.
> a. As per the discussion above, ensure in code we will never try to
> acquire another heavy-weight lock after acquiring relation extension
> or page type locks (probably by having Asserts in code or maybe some
> other way).
The current patch
(v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
doesn't check that acquiring a heavy-weight lock after page type lock,
is that right?
No, it should do that.
There is the path doing that: ginInsertCleanup() holds
a page lock and insert the pending list items, which might hold a
relation extension lock.
Right, I could also see that, but do you see any problem with that? I agree that Assert should cover this case, but I don't see any fundamental problem with that.
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, 9 Mar 2020 at 14:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Mon, 24 Feb 2020 at 19:08, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: >> > > >> > > Hi, >> > > >> > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: >> > > > I think till we know the real need for changing group locking, going >> > > > in the direction of what Tom suggested to use an array of LWLocks [1] >> > > > to address the problems in hand is a good idea. >> > > >> > > -many >> > > >> > > I think that building yet another locking subsystem is the entirely >> > > wrong idea - especially when there's imo no convincing architectural >> > > reasons to do so. >> > > >> > >> > Hmm, AFAIU, it will be done by having an array of LWLocks which we do >> > at other places as well (like BufferIO locks). I am not sure if we >> > can call it as new locking subsystem, but if we decide to continue >> > using lock.c and change group locking then I think we can do that as >> > well, see my comments below regarding that. >> > >> > > >> > > > It is not very clear to me that are we thinking to give up on Tom's >> > > > idea [1] and change group locking even though it is not clear or at >> > > > least nobody has proposed an idea/patch which requires that? Or are >> > > > we thinking that we can do what Tom suggested for relation extension >> > > > lock and also plan to change group locking for future parallel >> > > > operations that might require it? >> > > >> > > What I'm advocating is that extension locks should continue to go >> > > through lock.c. And yes, that requires some changes to group locking, >> > > but I still don't see why they'd be complicated. >> > > >> > >> > Fair position, as per initial analysis, I think if we do below three >> > things, it should work out without changing to a new way of locking >> > for relation extension or page type locks. >> > a. As per the discussion above, ensure in code we will never try to >> > acquire another heavy-weight lock after acquiring relation extension >> > or page type locks (probably by having Asserts in code or maybe some >> > other way). >> >> The current patch >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch) >> doesn't check that acquiring a heavy-weight lock after page type lock, >> is that right? > > > No, it should do that. > >> >> There is the path doing that: ginInsertCleanup() holds >> a page lock and insert the pending list items, which might hold a >> relation extension lock. > > > Right, I could also see that, but do you see any problem with that? I agree that Assert should cover this case, but Idon't see any fundamental problem with that. I think that could be a problem if we change the group locking so that it doesn't consider page lock type. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Mon, 9 Mar 2020 at 14:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>> >
>> > Fair position, as per initial analysis, I think if we do below three
>> > things, it should work out without changing to a new way of locking
>> > for relation extension or page type locks.
>> > a. As per the discussion above, ensure in code we will never try to
>> > acquire another heavy-weight lock after acquiring relation extension
>> > or page type locks (probably by having Asserts in code or maybe some
>> > other way).
>>
>> The current patch
>> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
>> doesn't check that acquiring a heavy-weight lock after page type lock,
>> is that right?
>
>
> No, it should do that.
>
>>
>> There is the path doing that: ginInsertCleanup() holds
>> a page lock and insert the pending list items, which might hold a
>> relation extension lock.
>
>
> Right, I could also see that, but do you see any problem with that? I agree that Assert should cover this case, but I don't see any fundamental problem with that.
I think that could be a problem if we change the group locking so that
it doesn't consider page lock type.
I might be missing something, but won't that be a problem only if there is a case where we acquire a page lock after acquiring a relation extension lock? Can you please explain the scenario you have in mind that can create a problem?
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, 9 Mar 2020 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Mon, 9 Mar 2020 at 14:16, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> > >> >> > Fair position, as per initial analysis, I think if we do below three >> >> > things, it should work out without changing to a new way of locking >> >> > for relation extension or page type locks. >> >> > a. As per the discussion above, ensure in code we will never try to >> >> > acquire another heavy-weight lock after acquiring relation extension >> >> > or page type locks (probably by having Asserts in code or maybe some >> >> > other way). >> >> >> >> The current patch >> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch) >> >> doesn't check that acquiring a heavy-weight lock after page type lock, >> >> is that right? >> > >> > >> > No, it should do that. >> > >> >> >> >> There is the path doing that: ginInsertCleanup() holds >> >> a page lock and insert the pending list items, which might hold a >> >> relation extension lock. >> > >> > >> > Right, I could also see that, but do you see any problem with that? I agree that Assert should cover this case, butI don't see any fundamental problem with that. >> >> I think that could be a problem if we change the group locking so that >> it doesn't consider page lock type. > > > I might be missing something, but won't that be a problem only when if there is a case where we acquire page lock afteracquiring a relation extension lock? Yes, you're right. Well I meant that the reason why we need to make Assert should cover page locks case is the same as the reason for extension lock type case. If we change the group locking so that it doesn't consider extension lock and change deadlock so that it doesn't make a wait edge for it, we need to ensure that the same backend doesn't acquire heavy-weight lock after holding relation extension lock. These are already done in the current patch. Similarly, if we did the similar change for page lock in the group locking and deadlock , we need to ensure the same things for page lock. But ISTM it doesn't necessarily need to support page lock for now because currently we use it only for cleanup pending list of gin index. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Mon, Mar 9, 2020 at 2:09 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Mon, 9 Mar 2020 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Mon, 9 Mar 2020 at 14:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>> >> >
>> >> > Fair position, as per initial analysis, I think if we do below three
>> >> > things, it should work out without changing to a new way of locking
>> >> > for relation extension or page type locks.
>> >> > a. As per the discussion above, ensure in code we will never try to
>> >> > acquire another heavy-weight lock after acquiring relation extension
>> >> > or page type locks (probably by having Asserts in code or maybe some
>> >> > other way).
>> >>
>> >> The current patch
>> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
>> >> doesn't check that acquiring a heavy-weight lock after page type lock,
>> >> is that right?
>> >
>> >
>> > No, it should do that.
>> >
>> >>
>> >> There is the path doing that: ginInsertCleanup() holds
>> >> a page lock and insert the pending list items, which might hold a
>> >> relation extension lock.
>> >
>> >
>> > Right, I could also see that, but do you see any problem with that? I agree that Assert should cover this case, but I don't see any fundamental problem with that.
>>
>> I think that could be a problem if we change the group locking so that
>> it doesn't consider page lock type.
>
>
> I might be missing something, but won't that be a problem only when if there is a case where we acquire page lock after acquiring a relation extension lock?
Yes, you're right.
Well I meant that the reason why we need to make Assert should cover
page locks case is the same as the reason for extension lock type
case. If we change the group locking so that it doesn't consider
extension lock and change deadlock so that it doesn't make a wait edge
for it, we need to ensure that the same backend doesn't acquire
heavy-weight lock after holding relation extension lock. These are
already done in the current patch. Similarly, if we did the similar
change for page lock in the group locking and deadlock , we need to
ensure the same things for page lock.
Agreed.
But ISTM it doesn't necessarily
need to support page lock for now because currently we use it only for
cleanup pending list of gin index.
I agree, but I think it is better to have a patch for the same even if we want to review/commit that separately. That will help us to look at how the complete solution looks.
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Mon, Mar 9, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 9, 2020 at 2:09 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Mon, 9 Mar 2020 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> >> >> On Mon, 9 Mar 2020 at 14:16, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> > >> >> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> >> > >> >> >> > Fair position, as per initial analysis, I think if we do below three >> >> >> > things, it should work out without changing to a new way of locking >> >> >> > for relation extension or page type locks. >> >> >> > a. As per the discussion above, ensure in code we will never try to >> >> >> > acquire another heavy-weight lock after acquiring relation extension >> >> >> > or page type locks (probably by having Asserts in code or maybe some >> >> >> > other way). >> >> >> >> >> >> The current patch >> >> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch) >> >> >> doesn't check that acquiring a heavy-weight lock after page type lock, >> >> >> is that right? >> >> > >> >> > >> >> > No, it should do that. >> >> > >> >> >> >> >> >> There is the path doing that: ginInsertCleanup() holds >> >> >> a page lock and insert the pending list items, which might hold a >> >> >> relation extension lock. >> >> > >> >> > >> >> > Right, I could also see that, but do you see any problem with that? I agree that Assert should cover this case,but I don't see any fundamental problem with that. >> >> >> >> I think that could be a problem if we change the group locking so that >> >> it doesn't consider page lock type. >> > >> > >> > I might be missing something, but won't that be a problem only when if there is a case where we acquire page lock afteracquiring a relation extension lock? >> >> Yes, you're right. >> >> Well I meant that the reason why we need to make Assert should cover >> page locks case is the same as the reason for extension lock type >> case. If we change the group locking so that it doesn't consider >> extension lock and change deadlock so that it doesn't make a wait edge >> for it, we need to ensure that the same backend doesn't acquire >> heavy-weight lock after holding relation extension lock. These are >> already done in the current patch. Similarly, if we did the similar >> change for page lock in the group locking and deadlock , we need to >> ensure the same things for page lock. > > > Agreed. > >> >> But ISTM it doesn't necessarily >> need to support page lock for now because currently we use it only for >> cleanup pending list of gin index. >> > > I agree, but I think it is better to have a patch for the same even if we want to review/commit that separately. Thatwill help us to look at how the complete solution looks. Please find the updated patch (summary of the changes) - Instead of searching the lock hash table for assert, it maintains a counter. - Also, handled the case where we can acquire the relation extension lock while holding the relation extension lock on the same relation. - Handled the error case. In addition to that prepared a WIP patch for handling the PageLock. First, I thought that we can use the same counter for the PageLock and the RelationExtensionLock because in assert we just need to check whether we are trying to acquire any other heavyweight lock while holding any of these locks. 
But the exceptional case where we are allowed to acquire a relation extension lock while holding any of these locks is a bit different. If we are holding a relation extension lock, then we are allowed to acquire the relation extension lock on the same relation, but not on any other relation, otherwise it can create a cycle. The same is not true for the PageLock, i.e. while holding the PageLock you can acquire the relation extension lock on any relation and that will be safe, because the relation extension lock guarantees that it will never create a cycle. However, I agree that we don't have any such cases where we want to acquire a relation extension lock on a different relation while holding the PageLock. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Mon, Feb 24, 2020 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > What I'm advocating is that extension locks should continue to go > > through lock.c. And yes, that requires some changes to group locking, > > but I still don't see why they'd be complicated. > > > > Fair position, as per initial analysis, I think if we do below three > things, it should work out without changing to a new way of locking > for relation extension or page type locks. > a. As per the discussion above, ensure in code we will never try to > acquire another heavy-weight lock after acquiring relation extension > or page type locks (probably by having Asserts in code or maybe some > other way).

I have done an analysis of the relation extension lock (which can be acquired via LockRelationForExtension or ConditionalLockRelationForExtension) and found that we don't acquire any other heavyweight lock after acquiring it. However, we do sometimes try to acquire it again in the places where we update the FSM after extension; see points (e) and (f) described below. The usage of this lock can be broadly divided into six categories, each explained as follows:

a. Where, after taking the relation extension lock, we call ReadBuffer (or one of its variants) and then LockBuffer. LockBuffer internally calls LWLock acquire or release, neither of which acquires another heavy-weight lock. It is quite obvious as well that while taking some lightweight lock, there is no reason to acquire another heavyweight lock on any object. The comments of the ReadBufferExtended API (which gets called from the variants of ReadBuffer) say that if the blknum requested is P_NEW, only one backend can call it at a time, which indicates that we don't need to acquire any heavy-weight lock inside this API. Otherwise, too, this API won't need a heavy-weight lock to read an existing block into a shared buffer, as two different backends are allowed to read the same block. I have also gone through all the functions called/used in this path to ensure that we don't use heavy-weight locks inside it.

The usage by the APIs BloomNewBuffer, GinNewBuffer, gistNewBuffer, _bt_getbuf, and SpGistNewBuffer falls in this category. Another API that falls under this category is revmap_physical_extend, which uses ReadBuffer, LockBuffer and ReleaseBuffer. The ReleaseBuffer API unpins, i.e. decrements, the reference count for the buffer and disassociates the buffer from the resource owner. None of that requires a heavy-weight lock.

b. After taking the relation extension lock, we call RelationGetNumberOfBlocks, which primarily calls file-level functions to determine the size of the file. This doesn't acquire any other heavy-weight lock after the relation extension lock. The usage by the APIs ginvacuumcleanup, gistvacuumscan, btvacuumscan, and spgvacuumscan falls in this category.

c. There is a usage in the API brin_page_cleanup(), where we just acquire and release the relation extension lock to avoid reinitializing the page. As there is no call in between acquire and release, there is no chance of acquiring another heavy-weight lock while holding the relation extension lock.

d. In fsm_extend() and vm_extend(), after acquiring the relation extension lock, we perform various file-level operations like RelationOpenSmgr, smgrexists, smgrcreate, smgrnblocks, smgrextend.
First, in theory, we don't have any heavy-weight lock other than the relation extension lock that could cover such operations, and I have verified by going through these APIs that they don't acquire any other heavy-weight lock. These APIs also call PageSetChecksumInplace, which computes a checksum of the page and sets it in the page header; that is quite straightforward and doesn't acquire any heavy-weight lock.

In vm_extend, we additionally call CacheInvalidateSmgr to send a shared-inval message to force other backends to close any smgr references they may have for the relation whose visibility map we are extending, which has no reason to acquire any heavy-weight lock. I have checked that code path as well and didn't find any heavy-weight lock call in it.

e. In brin_getinsertbuffer, we call ReadBuffer() and LockBuffer(), the usage of which is the same as what is mentioned in (a). In addition to that, it calls brin_initialize_empty_new_buffer(), which further calls RecordPageWithFreeSpace, which can again acquire the relation extension lock for the same relation. This usage is safe because we have a mechanism in the heavy-weight lock manager whereby, if we already hold a lock and a request comes for the same lock in the same mode, the lock will be granted.

f. In RelationGetBufferForTuple(), there are multiple APIs that get called and, like (e), it can try to reacquire the relation extension lock in one of those APIs. The main APIs it calls after acquiring the relation extension lock are as follows:

- GetPageWithFreeSpace: This tries to find a page in the given relation with at least the specified amount of free space. It mainly checks the FSM pages, and in one of the paths it might call fsm_extend, which can again try to acquire the relation extension lock on the same relation.

- RelationAddExtraBlocks: This adds multiple pages to a relation if there is contention on the relation extension lock. It calls RelationExtensionLockWaiterCount, which mainly checks how many lockers are waiting for the same lock, then calls ReadBufferBI, which as explained above won't require heavy-weight locks, and FSM APIs, which can acquire the relation extension lock on the same relation, but that is safe as discussed previously.

The Page locks can be acquired via LockPage and ConditionalLockPage. This lock is acquired from one place in the code, during GIN index cleanup (ginInsertCleanup). The basic idea is that it will scan the pending list and move entries into the main index. While moving entries to the main page, it might need to add a new page, which will require us to take a relation extension lock. Now, unlike the relation extension lock, after acquiring a page lock we do acquire another heavy-weight lock (the relation extension lock), but as we never acquire them in the reverse order, this is safe.

So, as per this analysis, we can add Asserts for relation extension and page locks which will indicate that they won't participate in deadlocks. It would be good if someone else can also do an independent analysis and verify my findings.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
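A sketch of what the Asserts concluded from this analysis could look like, placed in LockAcquireExtended() after the local-lock-table fast path (so a re-acquisition of the same extension lock never reaches them). IsRelExtLockHeld() and IsPageLockHeld() are placeholder names for whatever backend-local tracking is chosen; this is an illustration of the rule, not the committed code.

	/*
	 * While holding a relation extension lock, we must not request any
	 * other heavy-weight lock.  Re-acquiring the same extension lock on
	 * the same relation returns via the "already held" fast path above,
	 * so it never reaches this check.
	 */
	Assert(!IsRelExtLockHeld());

	/*
	 * While holding a page lock, the only other heavy-weight lock we ever
	 * request is a relation extension lock (ginInsertCleanup adding pages
	 * to the main index), and never in the reverse order.
	 */
	Assert(!IsPageLockHeld() ||
		   locktag->locktag_type == LOCKTAG_RELATION_EXTEND);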
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Fri, Mar 6, 2020 at 11:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > I think instead of the flag we need to keep the counter because we can > acquire the same relation extension lock multiple times. So > basically, every time we acquire the lock we can increment the counter > and while releasing we can decrement it. During an error path, I > think it is fine to set it to 0 in CommitTransaction/AbortTransaction. > But, I am not sure that we can set to 0 or decrement it in > AbortSubTransaction because we are not sure whether we have acquired > the lock under this subtransaction or not.

I think that CommitTransaction, AbortTransaction, and friends have *zero* business touching this. I think the counter - or flag - should track whether we've got a PROCLOCK entry for a relation extension lock. We either do, or we do not, and that does not change because of anything having to do with the transaction state. It changes because somebody calls LockRelease() or LockReleaseAll().

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Robert Haas
Date:
On Sat, Mar 7, 2020 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I continue to think that we'd be better off getting all of this > out of the heavyweight lock manager. There is no reason why we > should need deadlock detection, or multiple holds of the same > lock, or pretty much anything that LWLocks don't give you.

Well, that was my initial inclination too, but Andres didn't like it. I don't know whether it's better to take his advice or yours.

The one facility that we need here which the heavyweight lock facility does provide and the lightweight lock facility does not is the ability to take locks on an effectively unlimited number of distinct objects. That is, we can't have a separate LWLock for every relation, because there are ~2^32 relation OIDs per database, and ~2^32 database OIDs, and a patch that tried to allocate a tranche of 2^64 LWLocks would probably get shot down. The patch I wrote for this tried to work around this by having an array of LWLocks and hashing <DBOID, RELOID> pairs onto array slots. This produces some false sharing, though, which Andres didn't like (and I can understand his concern). We could work around that problem with a more complex design, where the LWLocks in the array do not themselves represent the right to extend the relation, but only protect the list of lockers. But at that point it starts to look like you are reinventing the whole LOCK/PROCLOCK division.

So from my point of view we've got three possible approaches here, all imperfect:

- Hash <DB, REL> pairs onto an array of LWLocks that represent the right to extend the relation. Problem: false sharing for the whole time the lock is held.

- Hash <DB, REL> pairs onto an array of LWLocks that protect a list of lockers. Problem: looks like reinventing the LOCK/PROCLOCK mechanism, which is a fair amount of complexity to be duplicating.

- Adapt the heavyweight lock manager. Problem: the code is old, complex, grotty, and doesn't need more weird special cases.

Whatever we choose, I think we ought to try to get Page locks and Relation Extension locks into the same system. They're conceptually the same kind of thing: you're not locking an SQL object, you basically want an LWLock, but you can't use an LWLock because you want to lock an OID, not a piece of shared memory, so you can't have enough LWLocks to use them in the regular way.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
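For the first of those options, a minimal sketch of what hashing <DB, REL> pairs onto a fixed array of LWLocks could look like. All names here (N_RELEXTLOCK_ENTS, RelExtLockArray, RelExtLockFor) are hypothetical, shared-memory allocation and tranche setup are omitted, and the false sharing mentioned above happens whenever two relations hash onto the same slot.

/* Hypothetical sketch of option 1: an array of LWLocks indexed by hash. */
#include "postgres.h"

#include "common/hashfn.h"
#include "storage/lwlock.h"

#define N_RELEXTLOCK_ENTS	1024	/* array size is a guess */

/* Allocated once in shared memory at startup (allocation code omitted). */
static LWLockPadded *RelExtLockArray;

static inline LWLock *
RelExtLockFor(Oid dbid, Oid relid)
{
	uint32		hash;

	/* Map the <database, relation> pair onto one of the array slots. */
	hash = hash_combine(murmurhash32((uint32) dbid),
						murmurhash32((uint32) relid));

	return &RelExtLockArray[hash % N_RELEXTLOCK_ENTS].lock;
}

/*
 * Usage:
 *		LWLockAcquire(RelExtLockFor(dbid, relid), LW_EXCLUSIVE);
 *		... extend the relation ...
 *		LWLockRelease(RelExtLockFor(dbid, relid));
 */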
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Mar 10, 2020 at 6:48 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Mar 7, 2020 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I continue to think that we'd be better off getting all of this > > out of the heavyweight lock manager. There is no reason why we > > should need deadlock detection, or multiple holds of the same > > lock, or pretty much anything that LWLocks don't give you. > > Well, that was my initial inclination too, but Andres didn't like it. > I don't know whether it's better to take his advice or yours. > > The one facility that we need here which the heavyweight lock facility > does provide and the lightweight lock facility does not is the ability > to take locks on an effectively unlimited number of distinct objects. > That is, we can't have a separate LWLock for every relation, because > there ~2^32 relation OIDs per database, and ~2^32 database OIDs, and a > patch that tried to allocate a tranche of 2^64 LWLocks would probably > get shot down. > I think if we have to follow any LWLock based design, then we also need to think about a case where if it is already acquired by the backend (say in X mode), then it should be granted if the same backend tries to acquire it in same mode (or mode that is compatible with the mode in which it is already acquired). As per my analysis above [1], we do this at multiple places for relation extension lock. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BE8Vu%3D9PYZBZvMrga0Ynz_m6jmT3G_vJv-3L1PWv9Krg%40mail.gmail.com With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Mar 10, 2020 at 8:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Please find the updated patch (summary of the changes) > - Instead of searching the lock hash table for assert, it maintains a counter. > - Also, handled the case where we can acquire the relation extension > lock while holding the relation extension lock on the same relation. > - Handled the error case. > > In addition to that prepared a WIP patch for handling the PageLock. > First, I thought that we can use the same counter for the PageLock and > the RelationExtensionLock because in assert we just need to check > whether we are trying to acquire any other heavyweight lock while > holding any of these locks. But, the exceptional case where we > allowed to acquire a relation extension lock while holding any of > these locks is a bit different. Because, if we are holding a relation > extension lock then we allowed to acquire the relation extension lock > on the same relation but it can not be any other relation otherwise it > can create a cycle. But, the same is not true with the PageLock, > i.e. while holding the PageLock you can acquire the relation extension > lock on any relation and that will be safe because the relation > extension lock guarantee that, it will never create the cycle. > However, I agree that we don't have any such cases where we want to > acquire a relation extension lock on the different relations while > holding the PageLock. > Right, today, we don't have such cases where after acquiring relation extension or page lock for a particular relation, we need to acquire any of those for other relation and I am not able to offhand think of many cases where we might have such a need in the future. The one theoretical possibility is to include fork_num in the lock tag while acquiring extension lock for fsm/vm, but that will also have the same relation. Similarly one might say it is valid to acquire extension lock in share mode after we have acquired it exclusive mode. I am not sure how much futuristic we want to make these Asserts. I feel we should cover the current possible cases (which I think will make the asserts more strict then required) and if there is a need to relax them in the future for any particular use case, then we will consider those. In general, if we consider the way Mahendra has written a patch which is to find the entry via the local hash table to check for an Assert condition, then it will be a bit easier to extend the checks if required in future as that way we have more information about the particular lock. However, it will make the check more expensive which might be okay considering that it is only for Assert enabled builds. One minor comment: /* + * We should not acquire any other lock if we are already holding the + * relation extension lock. Only exception is that if we are trying to + * acquire the relation extension lock then we can hold the relation + * extension on the same relation. + */ + Assert(!IsRelExtLockHeld() || + ((locktag->locktag_type == LOCKTAG_RELATION_EXTEND) && found)); I think you don't need the second part of the check because if we have found the lock in the local lock table, we would return before this check. I think it will catch the case where if we have an extension lock on one relation, then it won't allow us to acquire it on another relation. OTOH, it will also not allow cases where backend has relation extension lock in Exclusive mode and it tries to acquire it in Shared mode. So, not sure if it is a good idea. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Mar 10, 2020 at 6:39 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Mar 6, 2020 at 11:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I think instead of the flag we need to keep the counter because we can > > acquire the same relation extension lock multiple times. So > > basically, every time we acquire the lock we can increment the counter > > and while releasing we can decrement it. During an error path, I > > think it is fine to set it to 0 in CommitTransaction/AbortTransaction. > > But, I am not sure that we can set to 0 or decrement it in > > AbortSubTransaction because we are not sure whether we have acquired > > the lock under this subtransaction or not. > > I think that CommitTransaction, AbortTransaction, and friends have > *zero* business touching this. I think the counter - or flag - should > track whether we've got a PROCLOCK entry for a relation extension > lock. We either do, or we do not, and that does not change because of > anything have to do with the transaction state. It changes because > somebody calls LockRelease() or LockReleaseAll(). > Do we want to have a special check in the LockRelease() to identify whether we are releasing relation extension lock? If not, then how we will identify that relation extension is released and we can reset it during subtransaction abort due to error? During success paths, we know when we have released RelationExtension or Page Lock (via UnlockRelationForExtension or UnlockPage). During the top-level transaction end, we know when we have released all the locks, so that will imply that RelationExtension and or Page locks must have been released by that time. If we have no other choice, then I see a few downsides of adding a special check in the LockRelease() call: 1. Instead of resetting/decrement the variable from specific APIs like UnlockRelationForExtension or UnlockPage, we need to have it in LockRelease. It will also look odd, if set variable in LockRelationForExtension, but don't reset in the UnlockRelationForExtension variant. Now, maybe we can allow to reset it at both places if it is a flag, but not if it is a counter variable. 2. One can argue that adding extra instructions in a generic path (like LockRelease) is not a good idea, especially if those are for an Assert. I understand this won't add anything which we can measure by standard benchmarks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Wed, Mar 11, 2020 at 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 10, 2020 at 8:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > Please find the updated patch (summary of the changes) > > - Instead of searching the lock hash table for assert, it maintains a counter. > > - Also, handled the case where we can acquire the relation extension > > lock while holding the relation extension lock on the same relation. > > - Handled the error case. > > > > In addition to that prepared a WIP patch for handling the PageLock. > > First, I thought that we can use the same counter for the PageLock and > > the RelationExtensionLock because in assert we just need to check > > whether we are trying to acquire any other heavyweight lock while > > holding any of these locks. But, the exceptional case where we > > allowed to acquire a relation extension lock while holding any of > > these locks is a bit different. Because, if we are holding a relation > > extension lock then we allowed to acquire the relation extension lock > > on the same relation but it can not be any other relation otherwise it > > can create a cycle. But, the same is not true with the PageLock, > > i.e. while holding the PageLock you can acquire the relation extension > > lock on any relation and that will be safe because the relation > > extension lock guarantee that, it will never create the cycle. > > However, I agree that we don't have any such cases where we want to > > acquire a relation extension lock on the different relations while > > holding the PageLock. > > > > Right, today, we don't have such cases where after acquiring relation > extension or page lock for a particular relation, we need to acquire > any of those for other relation and I am not able to offhand think of > many cases where we might have such a need in the future. The one > theoretical possibility is to include fork_num in the lock tag while > acquiring extension lock for fsm/vm, but that will also have the same > relation. Similarly one might say it is valid to acquire extension > lock in share mode after we have acquired it exclusive mode. I am not > sure how much futuristic we want to make these Asserts. > > I feel we should cover the current possible cases (which I think will > make the asserts more strict then required) and if there is a need to > relax them in the future for any particular use case, then we will > consider those. In general, if we consider the way Mahendra has > written a patch which is to find the entry via the local hash table to > check for an Assert condition, then it will be a bit easier to extend > the checks if required in future as that way we have more information > about the particular lock. However, it will make the check more > expensive which might be okay considering that it is only for Assert > enabled builds. > > One minor comment: > /* > + * We should not acquire any other lock if we are already holding the > + * relation extension lock. Only exception is that if we are trying to > + * acquire the relation extension lock then we can hold the relation > + * extension on the same relation. > + */ > + Assert(!IsRelExtLockHeld() || > + ((locktag->locktag_type == LOCKTAG_RELATION_EXTEND) && found)); > > I think you don't need the second part of the check because if we have > found the lock in the local lock table, we would return before this > check. Right. 
> I think it will catch the case where if we have an extension
> lock on one relation, then it won't allow us to acquire it on another
> relation.

But those will be caught even if we remove the second part, right? Basically, if we have Assert(!IsRelExtLockHeld()), that means by this time you should not hold any relation extension lock. The exceptional case where we allow relation extension on the same relation will anyway not reach here. I think the second part of the Assert is just useless.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
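The "will anyway not reach here" part is the local-lock-table fast path at the top of LockAcquireExtended(); an abridged illustration of it (not the verbatim source) for readers following along:

	LOCALLOCKTAG localtag;
	LOCALLOCK  *locallock;
	bool		found;

	/* Look up (or create) this backend's local entry for the lock. */
	MemSet(&localtag, 0, sizeof(localtag));
	localtag.lock = *locktag;
	localtag.mode = lockmode;

	locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash,
										  (void *) &localtag,
										  HASH_ENTER, &found);

	if (found && locallock->nLocks > 0)
	{
		/* Already held in this mode: bump the local count and return. */
		GrantLockLocal(locallock, owner);
		return LOCKACQUIRE_ALREADY_HELD;
	}

	/* ... otherwise fall through to the shared tables (and the Assert) ... */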
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Tue, Mar 10, 2020 at 4:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 24, 2020 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <andres@anarazel.de> wrote: > > > What I'm advocating is that extension locks should continue to go > > > through lock.c. And yes, that requires some changes to group locking, > > > but I still don't see why they'd be complicated. > > > > > > > Fair position, as per initial analysis, I think if we do below three > > things, it should work out without changing to a new way of locking > > for relation extension or page type locks. > > a. As per the discussion above, ensure in code we will never try to > > acquire another heavy-weight lock after acquiring relation extension > > or page type locks (probably by having Asserts in code or maybe some > > other way). > > I have done an analysis of the relation extension lock (which can be > acquired via LockRelationForExtension or > ConditionalLockRelationForExtension) and found that we don't acquire > any other heavyweight lock after acquiring it. However, we do > sometimes try to acquire it again in the places where we update FSM > after extension, see points (e) and (f) described below. The usage of > this lock can be broadly divided into six categories and each one is > explained as follows: > > a. Where after taking the relation extension lock we call ReadBuffer > (or its variant) and then LockBuffer. The LockBuffer internally calls > either LWLock to acquire or release neither of which acquire another > heavy-weight lock. It is quite obvious as well that while taking some > lightweight lock, there is no reason to acquire another heavyweight > lock on any object. The specs/comments of ReadBufferExtended (which > gets called from variants of ReadBuffer) API says that if the blknum > requested is P_NEW, only one backend can call it at-a-time which > indicates that we don't need to acquire any heavy-weight lock inside > this API. Otherwise, also, this API won't need a heavy-weight lock to > read the existing block into shared buffer as two different backends > are allowed to read the same block. I have also gone through all the > functions called/used in this path to ensure that we don't use > heavy-weight locks inside it. > > The usage by APIs BloomNewBuffer, GinNewBuffer, gistNewBuffer, > _bt_getbuf, and SpGistNewBuffer falls in this category. Another API > that falls under this category is revmap_physical_extend which uses > ReadBuffer, LocakBuffer and ReleaseBuffer. The ReleaseBuffer API > unpins aka decrement the reference count for buffer and disassociates > a buffer from the resource owner. None of that requires heavy-weight > lock. T > > b. After taking relation extension lock, we call > RelationGetNumberOfBlocks which primarily calls file-level functions > to determine the size of the file. This doesn't acquire any other > heavy-weight lock after relation extension lock. > > The usage by APIs ginvacuumcleanup, gistvacuumscan, btvacuumscan, and > spgvacuumscan falls in this category. > > c. There is a usage in API brin_page_cleanup() where we just acquire > and release the relation extension lock to avoid reinitializing the > page. As there is no call in-between acquire and release, so there is > no chance of another heavy-weight lock acquire after having relation > extension lock. > > d. 
In fsm_extend() and vm_extend(), after acquiring relation extension > lock, we perform various file-level operations like RelationOpenSmgr, > smgrexists, smgrcreate, smgrnblocks, smgrextend. First, from theory, > we don't have any heavy-weight lock other than relation extension lock > which can cover such operations and then I have verified it by going > through these APIs that these don't acquire any other heavy-weight > lock. Then these APIs also call PageSetChecksumInplace computes a > checksum of the page and sets the same in page header which is quite > straight-forward and doesn't acquire any heavy-weight lock. > > In vm_extend, we additionally call CacheInvalidateSmgr to send a > shared-inval message to force other backends to close any smgr > references they may have for the relation for which we extending > visibility map which has no reason to acquire any heavy-weight lock. > I have checked the code path as well and I didn't find any > heavy-weight lock call in that. > > e. In brin_getinsertbuffer, we call ReadBuffer() and LockBuffer(), the > usage of which is the same as what is mentioned in (a). In addition > to that it calls brin_initialize_empty_new_buffer() which further > calls RecordPageWithFreeSpace which can again acquire relation > extension lock for same relation. This usage is safe because we have > a mechanism in heavy-weight lock manager that if we already hold a > lock and a request came for the same lock and in same mode, the lock > will be granted. > > f. In RelationGetBufferForTuple(), there are multiple APIs that get > called and like (e), it can try to reacquire the relation extension > lock in one of those APIs. The main APIs it calls after acquiring > relation extension lock are described as follows: > - GetPageWithFreeSpace: This tries to find a page in the given > relation with at least the specified amount of free space. This > mainly checks the FSM pages and in one of the paths might call > fsm_extend which can again try to acquire the relation extension lock > on the same relation. > - RelationAddExtraBlocks: This adds multiple pages in a relation if > there is contention around relation extension lock. This calls > RelationExtensionLockWaiterCount which is mainly to check how many > lockers are waiting for the same lock, then call ReadBufferBI which as > explained above won't require heavy-weight locks and FSM APIs which > can acquire Relation extension lock on the same relation, but that is > safe as discussed previously. > > The Page locks can be acquired via LockPage and ConditionalLockPage. > This is acquired from one place in the code during Gin index cleanup > (ginInsertCleanup). The basic idea is that it will scan the pending > list and move entries into the main index. While moving entries to > the main page, it might need to add a new page that will require us to > take a relation extension lock. Now, unlike relation extension lock, > after acquiring page lock, we do acquire another heavy-weight lock > (relation extension lock), but as we never acquire it in reverse > order, this is safe. > > So, as per this analysis, we can add Asserts for relation extension > and page locks which will indicate that they won't participate in > deadlocks. It would be good if someone else can also do independent > analysis and verify my findings. I have also analyzed the usage for the RelationExtensioLock and the PageLock. And, my findings are on the same lines. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Wed, Mar 11, 2020 at 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 10, 2020 at 8:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > Please find the updated patch (summary of the changes) > > - Instead of searching the lock hash table for assert, it maintains a counter. > > - Also, handled the case where we can acquire the relation extension > > lock while holding the relation extension lock on the same relation. > > - Handled the error case. > > > > In addition to that prepared a WIP patch for handling the PageLock. > > First, I thought that we can use the same counter for the PageLock and > > the RelationExtensionLock because in assert we just need to check > > whether we are trying to acquire any other heavyweight lock while > > holding any of these locks. But, the exceptional case where we > > allowed to acquire a relation extension lock while holding any of > > these locks is a bit different. Because, if we are holding a relation > > extension lock then we allowed to acquire the relation extension lock > > on the same relation but it can not be any other relation otherwise it > > can create a cycle. But, the same is not true with the PageLock, > > i.e. while holding the PageLock you can acquire the relation extension > > lock on any relation and that will be safe because the relation > > extension lock guarantee that, it will never create the cycle. > > However, I agree that we don't have any such cases where we want to > > acquire a relation extension lock on the different relations while > > holding the PageLock. > > > > Right, today, we don't have such cases where after acquiring relation > extension or page lock for a particular relation, we need to acquire > any of those for other relation and I am not able to offhand think of > many cases where we might have such a need in the future. The one > theoretical possibility is to include fork_num in the lock tag while > acquiring extension lock for fsm/vm, but that will also have the same > relation. Similarly one might say it is valid to acquire extension > lock in share mode after we have acquired it exclusive mode. I am not > sure how much futuristic we want to make these Asserts. > > I feel we should cover the current possible cases (which I think will > make the asserts more strict then required) and if there is a need to > relax them in the future for any particular use case, then we will > consider those. In general, if we consider the way Mahendra has > written a patch which is to find the entry via the local hash table to > check for an Assert condition, then it will be a bit easier to extend > the checks if required in future as that way we have more information > about the particular lock. However, it will make the check more > expensive which might be okay considering that it is only for Assert > enabled builds. > > One minor comment: > /* > + * We should not acquire any other lock if we are already holding the > + * relation extension lock. Only exception is that if we are trying to > + * acquire the relation extension lock then we can hold the relation > + * extension on the same relation. > + */ > + Assert(!IsRelExtLockHeld() || > + ((locktag->locktag_type == LOCKTAG_RELATION_EXTEND) && found)); I have fixed this in the attached patch set. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Wed, Mar 11, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 10, 2020 at 6:39 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Fri, Mar 6, 2020 at 11:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > I think instead of the flag we need to keep the counter because we can > > > acquire the same relation extension lock multiple times. So > > > basically, every time we acquire the lock we can increment the counter > > > and while releasing we can decrement it. During an error path, I > > > think it is fine to set it to 0 in CommitTransaction/AbortTransaction. > > > But, I am not sure that we can set to 0 or decrement it in > > > AbortSubTransaction because we are not sure whether we have acquired > > > the lock under this subtransaction or not. > > > > I think that CommitTransaction, AbortTransaction, and friends have > > *zero* business touching this. I think the counter - or flag - should > > track whether we've got a PROCLOCK entry for a relation extension > > lock. We either do, or we do not, and that does not change because of > > anything have to do with the transaction state. It changes because > > somebody calls LockRelease() or LockReleaseAll(). > > > > Do we want to have a special check in the LockRelease() to identify > whether we are releasing relation extension lock? If not, then how we > will identify that relation extension is released and we can reset it > during subtransaction abort due to error? During success paths, we > know when we have released RelationExtension or Page Lock (via > UnlockRelationForExtension or UnlockPage). During the top-level > transaction end, we know when we have released all the locks, so that > will imply that RelationExtension and or Page locks must have been > released by that time. > > If we have no other choice, then I see a few downsides of adding a > special check in the LockRelease() call: > > 1. Instead of resetting/decrement the variable from specific APIs like > UnlockRelationForExtension or UnlockPage, we need to have it in > LockRelease. It will also look odd, if set variable in > LockRelationForExtension, but don't reset in the > UnlockRelationForExtension variant. Now, maybe we can allow to reset > it at both places if it is a flag, but not if it is a counter > variable. > > 2. One can argue that adding extra instructions in a generic path > (like LockRelease) is not a good idea, especially if those are for an > Assert. I understand this won't add anything which we can measure by > standard benchmarks. I have just written a WIP patch for relation extension lock where instead of incrementing and decrementing the counter in LockRelationForExtension and UnlockRelationForExtension respectively. We can just set and reset the flag in LockAcquireExtended and LockRelease. So this patch appears simple to me as we are not involving the transaction APIs to set and reset the flag. However, we need to add an extra check as you have already mentioned. I think we could measure the performance and see whether it has any impact or not? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Thu, Mar 12, 2020 at 11:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have fixed this in the attached patch set. > I have modified your v4-0003-Conflict-Extension-Page-lock-in-group-member patch. The modifications are (a) Change src/backend/storage/lmgr/README to reflect new behaviour, (b) Introduce a new macro LOCK_LOCKTAG which slightly simplifies the code, (c) moved the deadlock.c check a few lines up and (d) changed a few comments. It might be better if we can move the checks related to extension and page lock in a separate API or macro. What do you think? I have also used an extension to test this patch. This is the same extension that I have used to test the group locking patch. It will allow backends to form a group as we do for parallel workers. The extension is attached to this email. Test without patch: Session-1 Create table t1(c1 int, c2 char(500)); Select become_lock_group_leader(); Insert into t1 values(generate_series(1,100),'aaa'); -- stop this after acquiring relation extension lock via GDB. Session-2 Select become_lock_group_member(); Insert into t1 values(generate_series(101,200),'aaa'); - Debug LockAcquire and found that it doesn't generate conflict for Relation Extension lock. The above experiment has shown that without patch group members can acquire relation extension lock if the group leader has that lock. After patch the second session waits for the first session to release the relation extension lock. I know this is not a perfect way to test, but it is better than nothing. I think we need to do some more testing either using this extension or some other way for extension and page locks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
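For context, the LOCK_LOCKTAG macro mentioned in (b) is presumably just an accessor added next to the existing LOCK_LOCKMETHOD one in src/include/storage/lock.h, along these lines (the attached patch is authoritative):

    #define LOCK_LOCKMETHOD(lock) ((LOCKMETHODID) (lock).tag.locktag_lockmethodid)  /* pre-existing */
    #define LOCK_LOCKTAG(lock)    ((LockTagType) (lock).tag.locktag_type)           /* new accessor */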
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Kuntal Ghosh
Date:
On Thu, Mar 12, 2020 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 11:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have fixed this in the attached patch set. > > > > I have modified your > v4-0003-Conflict-Extension-Page-lock-in-group-member patch. The > modifications are (a) Change src/backend/storage/lmgr/README to > reflect new behaviour, (b) Introduce a new macro LOCK_LOCKTAG which > slightly simplifies the code, (c) moved the deadlock.c check a few > lines up and (d) changed a few comments. > > It might be better if we can move the checks related to extension and > page lock in a separate API or macro. What do you think? > I think moving them inside a macro is a good idea. Also, I think we should move all the Assert related code inside some debugging macro similar to this: #ifdef LOCK_DEBUG .... #endif + /* + * The relation extension or page lock can never participate in actual + * deadlock cycle. See Asserts in LockAcquireExtended. So, there is + * no advantage in checking wait edges from it. + */ + if ((LOCK_LOCKTAG(*lock) == LOCKTAG_RELATION_EXTEND) || + (LOCK_LOCKTAG(*lock) == LOCKTAG_PAGE)) + return false; + Since this is true, we can also avoid these kind of locks in ExpandConstraints, right? It'll certainly reduce some complexity in topological sort. /* + * The relation extension or page lock conflict even between the group + * members. + */ + if ((LOCK_LOCKTAG(*lock) == LOCKTAG_RELATION_EXTEND) || + (LOCK_LOCKTAG(*lock) == LOCKTAG_PAGE)) + { + PROCLOCK_PRINT("LockCheckConflicts: conflicting (group)", + proclock); + return true; + } This check includes the heavyweight locks that conflict even under same parallel group. It also has another property that they can never participate in deadlock cycles. And, the number of locks under this category is likely to increase in future with new parallel features. Hence, it could be used in multiple places. Should we move the condition inside a macro and just call it from here? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Thu, Mar 12, 2020 at 7:50 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Mar 12, 2020 at 11:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I have fixed this in the attached patch set. > > > > > > > I have modified your > > v4-0003-Conflict-Extension-Page-lock-in-group-member patch. The > > modifications are (a) Change src/backend/storage/lmgr/README to > > reflect new behaviour, (b) Introduce a new macro LOCK_LOCKTAG which > > slightly simplifies the code, (c) moved the deadlock.c check a few > > lines up and (d) changed a few comments. > > > > It might be better if we can move the checks related to extension and > > page lock in a separate API or macro. What do you think? > > > I think moving them inside a macro is a good idea. Also, I think we > should move all the Assert related code inside some debugging macro > similar to this: > #ifdef LOCK_DEBUG > .... > #endif > If we move it under some macro, then those Asserts will be only enabled when that macro is defined. I think we want there Asserts to be enabled always in assert enabled build, these will be like any other Asserts in the code. What is the advantage of doing those under macro? > + /* > + * The relation extension or page lock can never participate in actual > + * deadlock cycle. See Asserts in LockAcquireExtended. So, there is > + * no advantage in checking wait edges from it. > + */ > + if ((LOCK_LOCKTAG(*lock) == LOCKTAG_RELATION_EXTEND) || > + (LOCK_LOCKTAG(*lock) == LOCKTAG_PAGE)) > + return false; > + > Since this is true, we can also avoid these kind of locks in > ExpandConstraints, right? > Yes, I had also thought about it but left it to avoid sprinkling such checks at more places than absolutely required. > It'll certainly reduce some complexity in > topological sort. > I think you mean to say TopoSort will have to look at fewer members in the wait queue, otherwise, there is nothing from the perspective of code which we can remove/change there. I think there will be hardly any chance that such locks will participate here because we take those for some work and release them (basically, they are unlike other heavyweight locks which can be released at the end). Having said that, I am not against putting those checks at the place you are suggesting, it is just that I thought that it won't be of much use. > /* > + * The relation extension or page lock conflict even between the group > + * members. > + */ > + if ((LOCK_LOCKTAG(*lock) == LOCKTAG_RELATION_EXTEND) || > + (LOCK_LOCKTAG(*lock) == LOCKTAG_PAGE)) > + { > + PROCLOCK_PRINT("LockCheckConflicts: conflicting (group)", > + proclock); > + return true; > + } > This check includes the heavyweight locks that conflict even under > same parallel group. It also has another property that they can never > participate in deadlock cycles. And, the number of locks under this > category is likely to increase in future with new parallel features. > Hence, it could be used in multiple places. Should we move the > condition inside a macro and just call it from here? > Right, this is what I have suggested upthread. Do you have any suggestions for naming such a macro or function? I could think of something like LocksConflictAmongGroupMembers or LocksNotParticipateInDeadlock. The first one suits more for its usage in LockCheckConflicts and the second in the deadlock.c code. So none of those sound perfect to me. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
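Whatever name is finally chosen, the helper being discussed would simply centralize the two locktag checks quoted above so that both deadlock.c and LockCheckConflicts() can share them; a sketch with a deliberately placeholder name:

    /*
     * Placeholder name -- not settled in the thread.  True for lock types that
     * conflict even among members of the same parallel group and that never
     * participate in a deadlock cycle (currently relation extension and page
     * locks).
     */
    #define LockTagIsGroupConflicting(lock) \
        ((LOCK_LOCKTAG(*(lock)) == LOCKTAG_RELATION_EXTEND) || \
         (LOCK_LOCKTAG(*(lock)) == LOCKTAG_PAGE))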
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Thu, Mar 12, 2020 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 11:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have fixed this in the attached patch set. > > > > I have modified your > v4-0003-Conflict-Extension-Page-lock-in-group-member patch. The > modifications are (a) Change src/backend/storage/lmgr/README to > reflect new behaviour, (b) Introduce a new macro LOCK_LOCKTAG which > slightly simplifies the code, (c) moved the deadlock.c check a few > lines up and (d) changed a few comments. Changes look fine to me. > It might be better if we can move the checks related to extension and > page lock in a separate API or macro. What do you think? I feel it looks cleaner this way as well. But, If we plan to move it to common function/macro then we should use some common name such that it can be used in FindLockCycleRecurseMember as well as in LockCheckConflicts. > I have also used an extension to test this patch. This is the same > extension that I have used to test the group locking patch. It will > allow backends to form a group as we do for parallel workers. The > extension is attached to this email. > > Test without patch: > Session-1 > Create table t1(c1 int, c2 char(500)); > Select become_lock_group_leader(); > > Insert into t1 values(generate_series(1,100),'aaa'); -- stop this > after acquiring relation extension lock via GDB. > > Session-2 > Select become_lock_group_member(); > Insert into t1 values(generate_series(101,200),'aaa'); > - Debug LockAcquire and found that it doesn't generate conflict for > Relation Extension lock. > > The above experiment has shown that without patch group members can > acquire relation extension lock if the group leader has that lock. > After patch the second session waits for the first session to release > the relation extension lock. I know this is not a perfect way to test, > but it is better than nothing. I think we need to do some more > testing either using this extension or some other way for extension > and page locks. I have also tested the same and verified it. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Fri, Mar 13, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 7:50 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > > > On Thu, Mar 12, 2020 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Mar 12, 2020 at 11:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I have fixed this in the attached patch set. > > > > > > > > > > I have modified your > > > v4-0003-Conflict-Extension-Page-lock-in-group-member patch. The > > > modifications are (a) Change src/backend/storage/lmgr/README to > > > reflect new behaviour, (b) Introduce a new macro LOCK_LOCKTAG which > > > slightly simplifies the code, (c) moved the deadlock.c check a few > > > lines up and (d) changed a few comments. > > > > > > It might be better if we can move the checks related to extension and > > > page lock in a separate API or macro. What do you think? > > > > > I think moving them inside a macro is a good idea. Also, I think we > > should move all the Assert related code inside some debugging macro > > similar to this: > > #ifdef LOCK_DEBUG > > .... > > #endif > > > > If we move it under some macro, then those Asserts will be only > enabled when that macro is defined. I think we want there Asserts to > be enabled always in assert enabled build, these will be like any > other Asserts in the code. What is the advantage of doing those under > macro? > > > + /* > > + * The relation extension or page lock can never participate in actual > > + * deadlock cycle. See Asserts in LockAcquireExtended. So, there is > > + * no advantage in checking wait edges from it. > > + */ > > + if ((LOCK_LOCKTAG(*lock) == LOCKTAG_RELATION_EXTEND) || > > + (LOCK_LOCKTAG(*lock) == LOCKTAG_PAGE)) > > + return false; > > + > > Since this is true, we can also avoid these kind of locks in > > ExpandConstraints, right? > > > > Yes, I had also thought about it but left it to avoid sprinkling such > checks at more places than absolutely required. > > > It'll certainly reduce some complexity in > > topological sort. > > > > I think you mean to say TopoSort will have to look at fewer members in > the wait queue, otherwise, there is nothing from the perspective of > code which we can remove/change there. I think there will be hardly > any chance that such locks will participate here because we take those > for some work and release them (basically, they are unlike other > heavyweight locks which can be released at the end). Having said > that, I am not against putting those checks at the place you are > suggesting, it is just that I thought that it won't be of much use. I am not sure I understand this part. Because topological sort will work on the soft edges we have created when we found the cycle, but for relation extension/page lock we are completely ignoring hard/soft edge then it will never participate in topo sort as well. Am I missing something? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Thu, Mar 12, 2020 at 3:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Mar 11, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > If we have no other choice, then I see a few downsides of adding a > > special check in the LockRelease() call: > > > > 1. Instead of resetting/decrement the variable from specific APIs like > > UnlockRelationForExtension or UnlockPage, we need to have it in > > LockRelease. It will also look odd, if set variable in > > LockRelationForExtension, but don't reset in the > > UnlockRelationForExtension variant. Now, maybe we can allow to reset > > it at both places if it is a flag, but not if it is a counter > > variable. > > > > 2. One can argue that adding extra instructions in a generic path > > (like LockRelease) is not a good idea, especially if those are for an > > Assert. I understand this won't add anything which we can measure by > > standard benchmarks. > > I have just written a WIP patch for relation extension lock where > instead of incrementing and decrementing the counter in > LockRelationForExtension and UnlockRelationForExtension respectively. > We can just set and reset the flag in LockAcquireExtended and > LockRelease. So this patch appears simple to me as we are not > involving the transaction APIs to set and reset the flag. However, we > need to add an extra check as you have already mentioned. I think we > could measure the performance and see whether it has any impact or > not? > LockAcquireExtended() { .. + if (locktag->locktag_type == LOCKTAG_RELATION_EXTEND) + IsRelationExtensionLockHeld = true; .. } Can we move this check inside a function (CheckAndSetLockHeld or something like that) as we need to add a similar thing for page lock? Also, how about moving the set and reset of these flags to GrantLockLocal and RemoveLocalLock as that will further reduce the number of places where we need to add such a check. Another thing is to see if it makes sense to have a macro like LOCALLOCK_LOCKMETHOD to get the lock tag. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
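The macro suggested at the end would presumably mirror the existing LOCALLOCK_LOCKMETHOD accessor but return the locktag type from the LOCALLOCK; a likely shape (an assumption, not the committed definition):

    /* Existing accessor in src/include/storage/lock.h: */
    #define LOCALLOCK_LOCKMETHOD(llock) ((llock).tag.lock.locktag_lockmethodid)
    /* Analogous accessor for the lock tag type discussed above: */
    #define LOCALLOCK_LOCKTAG(llock)    ((LockTagType) (llock).tag.lock.locktag_type)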
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Fri, Mar 13, 2020 at 11:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 3:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Mar 11, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > If we have no other choice, then I see a few downsides of adding a > > > special check in the LockRelease() call: > > > > > > 1. Instead of resetting/decrement the variable from specific APIs like > > > UnlockRelationForExtension or UnlockPage, we need to have it in > > > LockRelease. It will also look odd, if set variable in > > > LockRelationForExtension, but don't reset in the > > > UnlockRelationForExtension variant. Now, maybe we can allow to reset > > > it at both places if it is a flag, but not if it is a counter > > > variable. > > > > > > 2. One can argue that adding extra instructions in a generic path > > > (like LockRelease) is not a good idea, especially if those are for an > > > Assert. I understand this won't add anything which we can measure by > > > standard benchmarks. > > > > I have just written a WIP patch for relation extension lock where > > instead of incrementing and decrementing the counter in > > LockRelationForExtension and UnlockRelationForExtension respectively. > > We can just set and reset the flag in LockAcquireExtended and > > LockRelease. So this patch appears simple to me as we are not > > involving the transaction APIs to set and reset the flag. However, we > > need to add an extra check as you have already mentioned. I think we > > could measure the performance and see whether it has any impact or > > not? > > > > LockAcquireExtended() > { > .. > + if (locktag->locktag_type == LOCKTAG_RELATION_EXTEND) > + IsRelationExtensionLockHeld = true; > .. > } > > Can we move this check inside a function (CheckAndSetLockHeld or > something like that) as we need to add a similar thing for page lock? ok > Also, how about moving the set and reset of these flags to > GrantLockLocal and RemoveLocalLock as that will further reduce the > number of places where we need to add such a check. Make sense to me. Another thing is > to see if it makes sense to have a macro like LOCALLOCK_LOCKMETHOD to > get the lock tag. ok -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Kuntal Ghosh
Date:
On Fri, Mar 13, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 7:50 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > I think moving them inside a macro is a good idea. Also, I think we > > should move all the Assert related code inside some debugging macro > > similar to this: > > #ifdef LOCK_DEBUG > > .... > > #endif > > > If we move it under some macro, then those Asserts will be only > enabled when that macro is defined. I think we want there Asserts to > be enabled always in assert enabled build, these will be like any > other Asserts in the code. What is the advantage of doing those under > macro? > My concern is related to performance regression. We're using two static variables in hot-paths only for checking a few asserts. So, I'm not sure whether we should enable the same by default, specially when asserts are itself disabled. -ResetRelExtLockHeldCount() +ResetRelExtPageLockHeldCount() { RelationExtensionLockHeldCount = 0; + PageLockHeldCount = 0; +} Also, we're calling this method from frequently used functions like Commit/AbortTransaction. So, it's better these two static variables share the same cache line and reinitalize them with a single instruction. > > > /* > > + * The relation extension or page lock conflict even between the group > > + * members. > > + */ > > + if ((LOCK_LOCKTAG(*lock) == LOCKTAG_RELATION_EXTEND) || > > + (LOCK_LOCKTAG(*lock) == LOCKTAG_PAGE)) > > + { > > + PROCLOCK_PRINT("LockCheckConflicts: conflicting (group)", > > + proclock); > > + return true; > > + } > > This check includes the heavyweight locks that conflict even under > > same parallel group. It also has another property that they can never > > participate in deadlock cycles. And, the number of locks under this > > category is likely to increase in future with new parallel features. > > Hence, it could be used in multiple places. Should we move the > > condition inside a macro and just call it from here? > > > > Right, this is what I have suggested upthread. Do you have any > suggestions for naming such a macro or function? I could think of > something like LocksConflictAmongGroupMembers or > LocksNotParticipateInDeadlock. The first one suits more for its usage > in LockCheckConflicts and the second in the deadlock.c code. So none > of those sound perfect to me. > Actually, I'm not able to come up with a good suggestion. I'm trying to think of a generic name similar to strong or weak locks but with the following properties: a. Locks that don't participate in deadlock detection b. Locks that conflicts in the same parallel group -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Kuntal Ghosh
Date:
On Fri, Mar 13, 2020 at 8:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 7:50 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > > > > + /* > > > + * The relation extension or page lock can never participate in actual > > > + * deadlock cycle. See Asserts in LockAcquireExtended. So, there is > > > + * no advantage in checking wait edges from it. > > > + */ > > > + if ((LOCK_LOCKTAG(*lock) == LOCKTAG_RELATION_EXTEND) || > > > + (LOCK_LOCKTAG(*lock) == LOCKTAG_PAGE)) > > > + return false; > > > + > > > Since this is true, we can also avoid these kind of locks in > > > ExpandConstraints, right? > > I am not sure I understand this part. Because topological sort will > work on the soft edges we have created when we found the cycle, but > for relation extension/page lock we are completely ignoring hard/soft > edge then it will never participate in topo sort as well. Am I > missing something? > No, I think you're right. We only add constraints if we've detected a cycle in the graph. Hence, you don't need the check here. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Fri, Mar 13, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Mar 12, 2020 at 11:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I have fixed this in the attached patch set. > > > > > > > I have modified your > > v4-0003-Conflict-Extension-Page-lock-in-group-member patch. The > > modifications are (a) Change src/backend/storage/lmgr/README to > > reflect new behaviour, (b) Introduce a new macro LOCK_LOCKTAG which > > slightly simplifies the code, (c) moved the deadlock.c check a few > > lines up and (d) changed a few comments. > > Changes look fine to me. > Today, while looking at this patch again, I realized that there is a case where we sometimes allow group members to jump the wait queue. This is primarily to avoid creating deadlocks (see ProcSleep). Now, ideally, we don't need this for relation extension or page locks as those can never lead to deadlocks. However, the current code will give group members more priority to acquire relation extension or page locks if any one of the members has held those locks. Now, if we want, we can prevent giving group members priority for these locks, but I am not sure how important that case is. So, I have left that as it is by adding a few comments. What do you think? Additionally, I have changed/added a few more sentences in README. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
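For reference, the ProcSleep() behaviour described above is the early grant / queue jump given to a requester whose lock group already holds or waits for the lock; the comment being added presumably reads to the effect of the following (wording assumed, not taken from the attached patch):

    /*
     * In ProcSleep(): relation extension and page locks never participate in a
     * deadlock cycle, so strictly speaking there is no need to let group
     * members jump ahead in the wait queue for them.  The existing behaviour
     * is kept anyway when another member of our lock group already holds the
     * lock, since suppressing it only for these lock types does not seem worth
     * the extra code.
     */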
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Fri, Mar 13, 2020 at 2:32 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Fri, Mar 13, 2020 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Mar 12, 2020 at 7:50 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > > I think moving them inside a macro is a good idea. Also, I think we > > > should move all the Assert related code inside some debugging macro > > > similar to this: > > > #ifdef LOCK_DEBUG > > > .... > > > #endif > > > > > If we move it under some macro, then those Asserts will be only > > enabled when that macro is defined. I think we want there Asserts to > > be enabled always in assert enabled build, these will be like any > > other Asserts in the code. What is the advantage of doing those under > > macro? > > > My concern is related to performance regression. We're using two > static variables in hot-paths only for checking a few asserts. So, I'm > not sure whether we should enable the same by default, specially when > asserts are itself disabled. > -ResetRelExtLockHeldCount() > +ResetRelExtPageLockHeldCount() > { > RelationExtensionLockHeldCount = 0; > + PageLockHeldCount = 0; > +} > Also, we're calling this method from frequently used functions like > Commit/AbortTransaction. So, it's better these two static variables > share the same cache line and reinitalize them with a single > instruction. In the recent version of the patch, instead of a counter, we have done with a flag. So I think now we can just keep a single variable and we can just reset the bit in a single instruction. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
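If the two flags were ever folded into one variable as suggested here, the natural shape would be a small bit mask so both can be cleared with a single store; this is purely illustrative (all names are made up), and the posted patches keep two separate boolean flags instead:

    #define REL_EXT_LOCK_HELD   0x01    /* relation extension lock held */
    #define PAGE_LOCK_HELD      0x02    /* page lock held */

    static int  SpecialLocksHeld = 0;   /* hypothetical single flag word */

    static inline void
    ResetSpecialLocksHeld(void)
    {
        SpecialLocksHeld = 0;           /* clears both flags in one instruction */
    }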
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Fri, Mar 13, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Mar 13, 2020 at 11:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Mar 12, 2020 at 3:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Mar 11, 2020 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > If we have no other choice, then I see a few downsides of adding a > > > > special check in the LockRelease() call: > > > > > > > > 1. Instead of resetting/decrement the variable from specific APIs like > > > > UnlockRelationForExtension or UnlockPage, we need to have it in > > > > LockRelease. It will also look odd, if set variable in > > > > LockRelationForExtension, but don't reset in the > > > > UnlockRelationForExtension variant. Now, maybe we can allow to reset > > > > it at both places if it is a flag, but not if it is a counter > > > > variable. > > > > > > > > 2. One can argue that adding extra instructions in a generic path > > > > (like LockRelease) is not a good idea, especially if those are for an > > > > Assert. I understand this won't add anything which we can measure by > > > > standard benchmarks. > > > > > > I have just written a WIP patch for relation extension lock where > > > instead of incrementing and decrementing the counter in > > > LockRelationForExtension and UnlockRelationForExtension respectively. > > > We can just set and reset the flag in LockAcquireExtended and > > > LockRelease. So this patch appears simple to me as we are not > > > involving the transaction APIs to set and reset the flag. However, we > > > need to add an extra check as you have already mentioned. I think we > > > could measure the performance and see whether it has any impact or > > > not? > > > > > > > LockAcquireExtended() > > { > > .. > > + if (locktag->locktag_type == LOCKTAG_RELATION_EXTEND) > > + IsRelationExtensionLockHeld = true; > > .. > > } > > > > Can we move this check inside a function (CheckAndSetLockHeld or > > something like that) as we need to add a similar thing for page lock? > > ok Done > > > Also, how about moving the set and reset of these flags to > > GrantLockLocal and RemoveLocalLock as that will further reduce the > > number of places where we need to add such a check. > > Make sense to me. Done > > Another thing is > > to see if it makes sense to have a macro like LOCALLOCK_LOCKMETHOD to > > get the lock tag. > > ok Done Apart from that, I have also extended the solution for the page lock. And, I have also broken down the 3rd patch in two parts for relation extension and for the page lock. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Fri, Mar 13, 2020 at 3:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Mar 13, 2020 at 8:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Mar 12, 2020 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Mar 12, 2020 at 11:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I have fixed this in the attached patch set. > > > > > > > > > > I have modified your > > > v4-0003-Conflict-Extension-Page-lock-in-group-member patch. The > > > modifications are (a) Change src/backend/storage/lmgr/README to > > > reflect new behaviour, (b) Introduce a new macro LOCK_LOCKTAG which > > > slightly simplifies the code, (c) moved the deadlock.c check a few > > > lines up and (d) changed a few comments. > > > > Changes look fine to me. > > > > Today, while looking at this patch again, I realized that there is a > where we sometimes allow group members to jump the wait queue. This > is primarily to avoid creating deadlocks (see ProcSleep). Now, > ideally, we don't need this for relation extension or page locks as > those can never lead to deadlocks. However, the current code will > give group members more priority to acquire relation extension or page > locks if any one of the members has held those locks. Now, if we want > we can prevent giving group members priority for these locks, but I am > not sure how important is that case. So, I have left that as it is by > adding a few comments. What do you think? > > Additionally, I have changed/added a few more sentences in README. I have included all your changes in the latest patch set. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Apart from that, I have also extended the solution for the page lock. > And, I have also broken down the 3rd patch in two parts for relation > extension and for the page lock. > Thanks, I have made a number of cosmetic changes and written appropriate commit messages for all patches. See the attached patch series and let me know your opinion? BTW, did you get a chance to test page locks by using the extension which I have posted above or by some other way? I think it is important to test page-lock related patches now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > Apart from that, I have also extended the solution for the page lock. > > And, I have also broken down the 3rd patch in two parts for relation > > extension and for the page lock. > > > > Thanks, I have made a number of cosmetic changes and written > appropriate commit messages for all patches. See the attached patch > series and let me know your opinion? BTW, did you get a chance to test > page locks by using the extension which I have posted above or by some > other way? I think it is important to test page-lock related patches > now. I have reviewed the updated patches and looks fine to me. Apart from this I have done testing for the Page Lock using group locking extension. --Setup create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off); create index gin_test_idx on gin_test_tbl using gin (i); create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off); create index gin_test_idx1 on gin_test_tbl1 using gin (i); --session1: select become_lock_group_leader(); select gin_clean_pending_list('gin_test_idx'); --session2: select become_lock_group_member(session1_pid); select gin_clean_pending_list('gin_test_idx1'); --session3: select become_lock_group_leader(); select gin_clean_pending_list('gin_test_idx1'); --session4: select become_lock_group_member(session3_pid); select gin_clean_pending_list('gin_test_idx'); ERROR: deadlock detected DETAIL: Process 61953 waits for ExclusiveLock on page 0 of relation 16399 of database 13577; blocked by process 62197. Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of database 13577; blocked by process 61953. HINT: See server log for query details. Session1 and Session3 acquire the PageLock on two different index's meta-pages and blocked in gdb, meanwhile, their member tries to acquire the page lock as shown in the above example and it detects the deadlock which is solved after applying the patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Sun, Mar 15, 2020 at 1:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > Apart from that, I have also extended the solution for the page lock. > > > And, I have also broken down the 3rd patch in two parts for relation > > > extension and for the page lock. > > > > > > > Thanks, I have made a number of cosmetic changes and written > > appropriate commit messages for all patches. See the attached patch > > series and let me know your opinion? BTW, did you get a chance to test > > page locks by using the extension which I have posted above or by some > > other way? I think it is important to test page-lock related patches > > now. > > I have reviewed the updated patches and looks fine to me. Apart from > this I have done testing for the Page Lock using group locking > extension. > > --Setup > create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off); > create index gin_test_idx on gin_test_tbl using gin (i); > create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off); > create index gin_test_idx1 on gin_test_tbl1 using gin (i); > > --session1: > select become_lock_group_leader(); > select gin_clean_pending_list('gin_test_idx'); > > --session2: > select become_lock_group_member(session1_pid); > select gin_clean_pending_list('gin_test_idx1'); > > --session3: > select become_lock_group_leader(); > select gin_clean_pending_list('gin_test_idx1'); > > --session4: > select become_lock_group_member(session3_pid); > select gin_clean_pending_list('gin_test_idx'); > > ERROR: deadlock detected > DETAIL: Process 61953 waits for ExclusiveLock on page 0 of relation > 16399 of database 13577; blocked by process 62197. > Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of > database 13577; blocked by process 61953. > HINT: See server log for query details. > > > Session1 and Session3 acquire the PageLock on two different index's > meta-pages and blocked in gdb, meanwhile, their member tries to > acquire the page lock as shown in the above example and it detects the > deadlock which is solved after applying the patch. I have modified 0001 and 0002 slightly, Basically, instead of two function CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created a one function. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
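Based on the snippets quoted in this thread, the merged helper presumably looks roughly like the sketch below; the attached patches are authoritative, and the second parameter's name is still being discussed a little further down the thread. It is called with true when the local lock is granted (GrantLockLocal) and with false when it is removed (RemoveLocalLock):

    static inline void
    CheckAndSetLockHeld(LOCALLOCK *locallock, bool acquired)
    {
        if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_RELATION_EXTEND)
            IsRelationExtensionLockHeld = acquired;
        else if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
            IsPageLockHeld = acquired;
    }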
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Sun, Mar 15, 2020 at 1:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > Apart from that, I have also extended the solution for the page lock. > > > And, I have also broken down the 3rd patch in two parts for relation > > > extension and for the page lock. > > > > > > > Thanks, I have made a number of cosmetic changes and written > > appropriate commit messages for all patches. See the attached patch > > series and let me know your opinion? BTW, did you get a chance to test > > page locks by using the extension which I have posted above or by some > > other way? I think it is important to test page-lock related patches > > now. > > I have reviewed the updated patches and looks fine to me. Apart from > this I have done testing for the Page Lock using group locking > extension. > > --Setup > create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off); > create index gin_test_idx on gin_test_tbl using gin (i); > create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off); > create index gin_test_idx1 on gin_test_tbl1 using gin (i); > > --session1: > select become_lock_group_leader(); > select gin_clean_pending_list('gin_test_idx'); > > --session2: > select become_lock_group_member(session1_pid); > select gin_clean_pending_list('gin_test_idx1'); > > --session3: > select become_lock_group_leader(); > select gin_clean_pending_list('gin_test_idx1'); > > --session4: > select become_lock_group_member(session3_pid); > select gin_clean_pending_list('gin_test_idx'); > > ERROR: deadlock detected > DETAIL: Process 61953 waits for ExclusiveLock on page 0 of relation > 16399 of database 13577; blocked by process 62197. > Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of > database 13577; blocked by process 61953. > HINT: See server log for query details. > > > Session1 and Session3 acquire the PageLock on two different index's > meta-pages and blocked in gdb, meanwhile, their member tries to > acquire the page lock as shown in the above example and it detects the > deadlock which is solved after applying the patch. > So, in this test, you have first performed the actions from Session-1 and Session-3 (blocked them via GDB after acquiring page lock) and then performed the actions from Session-2 and Session-4, right? Though this is not a very realistic case, it proves the point that page locks don't participate in the deadlock cycle after the patch. I think we can do a few more tests that test other aspects of the patch. 1. Group members wait for page locks. If you test that the leader acquires the page lock and then member also tries to acquire the same lock on the same index, it wouldn't block before the patch, but after the patch, the member should wait for the leader to release the lock. 2. Try to hit Assert in LockAcquireExtended (a) by trying to re-acquire the page lock via the debugger, (b) try to acquire the relation extension lock after page lock and it should be allowed (after acquiring page lock, we take relation extension lock in following code path: ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have modified 0001 and 0002 slightly, Basically, instead of two > function CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created > a one function. > +CheckAndSetLockHeld(LOCALLOCK *locallock, bool value) Can we rename the parameter as lock_held, acquired or something like that so that it indicates what it intends to do and probably add a comment for that variable atop of function? There is some work left related to testing some parts of the patch and I can do some more review, but it started to look good to me, so I am planning to push this in the coming week (say by Wednesday or so) unless there are some major comments. There are primarily two parts of the patch-series (a) Assert that we don't acquire a heavyweight lock on another object after relation extension lock. (b) Allow relation extension lock to conflict among the parallel group members. On similar lines there are two patches for page locks. I think we have discussed in detail about LWLock approach and it seems that it might be tricky than we initially thought especially with some of the latest findings where we have noticed that there are multiple cases where we can try to re-acquire the relation extension lock and other things which we have discussed. Also, all of us don't agree with that idea. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Sun, Mar 15, 2020 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Mar 15, 2020 at 1:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > Apart from that, I have also extended the solution for the page lock. > > > > And, I have also broken down the 3rd patch in two parts for relation > > > > extension and for the page lock. > > > > > > > > > > Thanks, I have made a number of cosmetic changes and written > > > appropriate commit messages for all patches. See the attached patch > > > series and let me know your opinion? BTW, did you get a chance to test > > > page locks by using the extension which I have posted above or by some > > > other way? I think it is important to test page-lock related patches > > > now. > > > > I have reviewed the updated patches and looks fine to me. Apart from > > this I have done testing for the Page Lock using group locking > > extension. > > > > --Setup > > create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off); > > create index gin_test_idx on gin_test_tbl using gin (i); > > create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off); > > create index gin_test_idx1 on gin_test_tbl1 using gin (i); > > > > --session1: > > select become_lock_group_leader(); > > select gin_clean_pending_list('gin_test_idx'); > > > > --session2: > > select become_lock_group_member(session1_pid); > > select gin_clean_pending_list('gin_test_idx1'); > > > > --session3: > > select become_lock_group_leader(); > > select gin_clean_pending_list('gin_test_idx1'); > > > > --session4: > > select become_lock_group_member(session3_pid); > > select gin_clean_pending_list('gin_test_idx'); > > > > ERROR: deadlock detected > > DETAIL: Process 61953 waits for ExclusiveLock on page 0 of relation > > 16399 of database 13577; blocked by process 62197. > > Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of > > database 13577; blocked by process 61953. > > HINT: See server log for query details. > > > > > > Session1 and Session3 acquire the PageLock on two different index's > > meta-pages and blocked in gdb, meanwhile, their member tries to > > acquire the page lock as shown in the above example and it detects the > > deadlock which is solved after applying the patch. > > > > So, in this test, you have first performed the actions from Session-1 > and Session-3 (blocked them via GDB after acquiring page lock) and > then performed the actions from Session-2 and Session-4, right? Yes > Though this is not a very realistic case, it proves the point that > page locks don't participate in the deadlock cycle after the patch. I > think we can do a few more tests that test other aspects of the patch. > > 1. Group members wait for page locks. If you test that the leader > acquires the page lock and then member also tries to acquire the same > lock on the same index, it wouldn't block before the patch, but after > the patch, the member should wait for the leader to release the lock. Okay, I will test this part. > 2. Try to hit Assert in LockAcquireExtended (a) by trying to > re-acquire the page lock via the debugger, I am not sure whether it is true or not, Because, if we are holding the page lock and we try the same page lock then the lock will be granted without reaching the code path. 
However, I agree that this is not intended; instead, it is a side effect of allowing the relation extension lock while holding the same relation extension lock. So basically, the situation now is that if the lock is directly granted because we are holding the same lock, then it will not go to the assert code. IMHO, we don't need to add extra code to make it behave differently. Please let me know what your opinion is on this. > (b) try to acquire the > relation extension lock after page lock and it should be allowed > (after acquiring page lock, we take relation extension lock in > following code path: > ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer). ok -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Sun, Mar 15, 2020 at 6:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have modified 0001 and 0002 slightly, Basically, instead of two > > function CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created > > a one function. > > > > +CheckAndSetLockHeld(LOCALLOCK *locallock, bool value) > > Can we rename the parameter as lock_held, acquired or something like > that so that it indicates what it intends to do and probably add a > comment for that variable atop of function? Done -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Sun, Mar 15, 2020 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sun, Mar 15, 2020 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 1. Group members wait for page locks. If you test that the leader > > acquires the page lock and then member also tries to acquire the same > > lock on the same index, it wouldn't block before the patch, but after > > the patch, the member should wait for the leader to release the lock. > > Okay, I will test this part. > > > 2. Try to hit Assert in LockAcquireExtended (a) by trying to > > re-acquire the page lock via the debugger, > > I am not sure whether it is true or not, Because, if we are holding > the page lock and we try the same page lock then the lock will be > granted without reaching the code path. However, I agree that this is > not intended instead this is a side effect of allowing relation > extension lock while holding the same relation extension lock. So > basically, now the situation is that if the lock is directly granted > because we are holding the same lock then it will not go to the assert > code. IMHO, we don't need to add extra code to make it behave > differently. Please let me know what is your opinion on this. > I also don't think there is any reason to add code to prevent that. Actually, what I wanted to test was to somehow hit the Assert for the cases where it will actually hit if someone tomorrow tries to acquire any other type of lock. Can we mimic such a situation by hacking code (say try to acquire some other type of heavyweight lock) or in some way to hit the newly added Assert? > (b) try to acquire the > > relation extension lock after page lock and it should be allowed > > (after acquiring page lock, we take relation extension lock in > > following code path: > > ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer). > > ok > Thanks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Masahiko Sawada
Date:
On Mon, 16 Mar 2020 at 00:54, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sun, Mar 15, 2020 at 6:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I have modified 0001 and 0002 slightly, Basically, instead of two > > > function CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created > > > a one function. > > > > > > > +CheckAndSetLockHeld(LOCALLOCK *locallock, bool value) > > > > Can we rename the parameter as lock_held, acquired or something like > > that so that it indicates what it intends to do and probably add a > > comment for that variable atop of function? > > Done > I've looked at the patches and ISTM these work as expected. IsRelationExtensionLockHeld and IsPageLockHeld are used only when assertion is enabled. So how about making CheckAndSetLockHeld work only if USE_ASSERT_CHECKING to avoid overheads? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Mon, Mar 16, 2020 at 8:57 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 16 Mar 2020 at 00:54, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sun, Mar 15, 2020 at 6:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I have modified 0001 and 0002 slightly, Basically, instead of two > > > > function CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created > > > > a one function. > > > > > > > > > > +CheckAndSetLockHeld(LOCALLOCK *locallock, bool value) > > > > > > Can we rename the parameter as lock_held, acquired or something like > > > that so that it indicates what it intends to do and probably add a > > > comment for that variable atop of function? > > > > Done > > > > I've looked at the patches and ISTM these work as expected. Thanks for verifying. > IsRelationExtensionLockHeld and IsPageLockHeld are used only when > assertion is enabled. So how about making CheckAndSetLockHeld work > only if USE_ASSERT_CHECKING to avoid overheads? That makes sense to me so updated the patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Mon, Mar 16, 2020 at 8:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Mar 15, 2020 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sun, Mar 15, 2020 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > 1. Group members wait for page locks. If you test that the leader > > > acquires the page lock and then member also tries to acquire the same > > > lock on the same index, it wouldn't block before the patch, but after > > > the patch, the member should wait for the leader to release the lock. > > > > Okay, I will test this part. > > > > > 2. Try to hit Assert in LockAcquireExtended (a) by trying to > > > re-acquire the page lock via the debugger, > > > > I am not sure whether it is true or not, Because, if we are holding > > the page lock and we try the same page lock then the lock will be > > granted without reaching the code path. However, I agree that this is > > not intended instead this is a side effect of allowing relation > > extension lock while holding the same relation extension lock. So > > basically, now the situation is that if the lock is directly granted > > because we are holding the same lock then it will not go to the assert > > code. IMHO, we don't need to add extra code to make it behave > > differently. Please let me know what is your opinion on this. > > > > I also don't think there is any reason to add code to prevent that. > Actually, what I wanted to test was to somehow hit the Assert for the > cases where it will actually hit if someone tomorrow tries to acquire > any other type of lock. Can we mimic such a situation by hacking code > (say try to acquire some other type of heavyweight lock) or in some > way to hit the newly added Assert? I have hacked the code by calling another heavyweight lock and the assert is hit. > > > (b) try to acquire the > > > relation extension lock after page lock and it should be allowed > > > (after acquiring page lock, we take relation extension lock in > > > following code path: > > > ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer). I have tested this part and it works as expected i.e. assert is not hit. --test case create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off); create index gin_test_idx on gin_test_tbl using gin (i); insert into gin_test_tbl select array[1, 2, g] from generate_series(1, 20000) g; select gin_clean_pending_list('gin_test_idx'); BTW, this test is already covered by the existing gin.sql file so we don't need to add any new test. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
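For anyone wanting to reproduce that check, the hack amounts to requesting some unrelated heavyweight lock while the relation extension lock is still held, for example (throwaway debugging code, not part of any patch; the call site is only an illustration):

    /* e.g. temporarily added in hio.c, right after the extension lock is taken: */
    LockRelationForExtension(relation, ExclusiveLock);
    /* Any other heavyweight lock request should now trip the new Assert in
     * LockAcquireExtended(): */
    LockRelationOid(RelationGetRelid(relation), AccessShareLock);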
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Kuntal Ghosh
Date:
On Mon, Mar 16, 2020 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > On Mon, Mar 16, 2020 at 8:57 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > IsRelationExtensionLockHeld and IsPageLockHeld are used only when > > assertion is enabled. So how about making CheckAndSetLockHeld work > > only if USE_ASSERT_CHECKING to avoid overheads? > > That makes sense to me so updated the patch. +1 In v10-0001-Assert-that-we-don-t-acquire-a-heavyweight-lock-.patch, + * Indicate that the lock is released for a particular type of locks. s/lock is/locks are + /* Indicate that the lock is acquired for a certain type of locks. */ s/lock is/locks are In v10-0002-*.patch, + * Flag to indicate if the page lock is held by this backend. We don't + * acquire any other heavyweight lock while holding the page lock except for + * relation extension. However, these locks are never taken in reverse order + * which implies that page locks will also never participate in the deadlock + * cycle. s/while holding the page lock except for relation extension/while holding the page lock except for relation extension and page lock + * We don't acquire any other heavyweight lock while holding the page lock + * except for relation extension lock. Same as above Other than that, the patches look good to me. I've also done some testing after applying the Test-group-deadlock patch provided by Amit earlier in the thread. It works as expected. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Mon, Mar 16, 2020 at 11:56 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Mon, Mar 16, 2020 at 9:43 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Mar 16, 2020 at 8:57 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > IsRelationExtensionLockHeld and IsPageLockHeld are used only when > > > assertion is enabled. So how about making CheckAndSetLockHeld work > > > only if USE_ASSERT_CHECKING to avoid overheads? > > > > That makes sense to me so updated the patch. > +1 > > In v10-0001-Assert-that-we-don-t-acquire-a-heavyweight-lock-.patch, > > + * Indicate that the lock is released for a particular type of locks. > s/lock is/locks are Done > + /* Indicate that the lock is acquired for a certain type of locks. */ > s/lock is/locks are Done > > In v10-0002-*.patch, > > + * Flag to indicate if the page lock is held by this backend. We don't > + * acquire any other heavyweight lock while holding the page lock except for > + * relation extension. However, these locks are never taken in reverse order > + * which implies that page locks will also never participate in the deadlock > + * cycle. > s/while holding the page lock except for relation extension/while > holding the page lock except for relation extension and page lock Done > + * We don't acquire any other heavyweight lock while holding the page lock > + * except for relation extension lock. > Same as above Done > > Other than that, the patches look good to me. I've also done some > testing after applying the Test-group-deadlock patch provided by Amit > earlier in the thread. It works as expected. Thanks for testing. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Mon, Mar 16, 2020 at 3:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > + + /* + * Indicate that the lock is released for certain types of locks + */ +#ifdef USE_ASSERT_CHECKING + CheckAndSetLockHeld(locallock, false); +#endif } /* @@ -1618,6 +1666,11 @@ GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner) locallock->numLockOwners++; if (owner != NULL) ResourceOwnerRememberLock(owner, locallock); + + /* Indicate that the lock is acquired for certain types of locks. */ +#ifdef USE_ASSERT_CHECKING + CheckAndSetLockHeld(locallock, true); +#endif } There is no need to sprinkle USE_ASSERT_CHECKING at so many places; having it inside the new function is sufficient. I have changed that, added a few more comments and made minor changes. See what you think about the attached. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Dilip Kumar
Date:
On Tue, Mar 17, 2020 at 5:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 16, 2020 at 3:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > + > + /* > + * Indicate that the lock is released for certain types of locks > + */ > +#ifdef USE_ASSERT_CHECKING > + CheckAndSetLockHeld(locallock, false); > +#endif > } > > /* > @@ -1618,6 +1666,11 @@ GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner) > locallock->numLockOwners++; > if (owner != NULL) > ResourceOwnerRememberLock(owner, locallock); > + > + /* Indicate that the lock is acquired for certain types of locks. */ > +#ifdef USE_ASSERT_CHECKING > + CheckAndSetLockHeld(locallock, true); > +#endif > } > > There is no need to sprinkle USE_ASSERT_CHECKING at so many places; > having it inside the new function is sufficient. I have changed that, > added a few more comments and > made minor changes. See what you think about the attached. Your changes look fine to me. I have also verified all the tests and everything works fine. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From
Amit Kapila
Date:
On Tue, Mar 17, 2020 at 6:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Your changes look fine to me. I have also verified all the tests and > everything works fine. > I have pushed the first patch. I will push the others in the coming days. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com