Thread: -1/0 virtualtransaction
Hi -
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
----------+----------+----------+---------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
page | 16384 | 468238 | 19216 | | | | | | | -1/0 | | SIReadLock | t | f
Does anyone have any pointers on what a virtualtransaction of '-1/0' means?
I'm using SSI and an example is
----------+----------+----------+---------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
page | 16384 | 468238 | 19216 | | | | | | | -1/0 | | SIReadLock | t | f
This is incredibly hard to search for.
I see these for page, tuple and (perhaps a clue?) one relation, and I'm seeing the page and tuple locks accumulate over time. Eventually this results in an "out of shared memory" error.
Any help is appreciated.
Thanks,
Mike
Mike Beachy <mbeachy@gmail.com> writes: > Does anyone have any pointers on what a virtualtransaction of '-1/0' means? > I'm using SSI and an example is > locktype | database | relation | page | tuple | virtualxid | > transactionid | classid | objid | objsubid | virtualtransaction | pid | > mode | granted | fastpath > ----------+----------+----------+---------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+---------- > page | 16384 | 468238 | 19216 | | | > | | | | -1/0 | | SIReadLock | > t | f Hmm, that's an invalid VXID, which would ordinarily mean that nothing is holding the lock. There is a passing mention in mvcc.sgml that SIRead locks sometimes need to be held past transaction end, so maybe what you're looking at is such a lock that is no longer associated with a specific transaction. I have to disclaim knowing much of anything about the SSI code, though. regards, tom lane
On Mon, Apr 26, 2021 at 6:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm, that's an invalid VXID, which would ordinarily mean that nothing
is holding the lock. There is a passing mention in mvcc.sgml that
SIRead locks sometimes need to be held past transaction end, so maybe
what you're looking at is such a lock that is no longer associated
with a specific transaction. I have to disclaim knowing much of
anything about the SSI code, though.
The "Serializable Snapshot Isolation in PostgreSQL" paper (https://drkp.net/papers/ssi-vldb12.pdf) makes me think this is a reasonable line of inquiry.
Thanks,
Mike
On Mon, 2021-04-26 at 17:45 -0400, Mike Beachy wrote: > Does anyone have any pointers on what a virtualtransaction of '-1/0' means? > > I'm using SSI and an example is > > locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction| pid | mode | granted | fastpath > ----------+----------+----------+---------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+---------- > page | 16384 | 468238 | 19216 | | | | | | | -1/0 | | SIReadLock | t | f > > This is incredibly hard to search for. > > I see these for page, tuple and (perhaps a clue?) one relation, and I'm seeing the page and tuple locks accumulate overtime. > Eventually this results in an "out of shared memory" error. Not sure, but do you see prepared transactions in "pg_prepared_xacts"? Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Hi Laurenz -
On Tue, Apr 27, 2021 at 2:56 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Not sure, but do you see prepared transactions in "pg_prepared_xacts"?
No, the -1 in the virtualtransaction (https://www.postgresql.org/docs/11/view-pg-locks.html) for pg_prepared_xacts was another clue I saw! But, it seems more or less a dead end as I have nothing in pg_prepared_xacts.
Thanks for the idea, though.
I still need to put more effort into Tom's idea about SIReadLock hanging out after the transaction, but some evidence pointing in this direction is that I've reduced the number of db connections and found that the '-1/0' locks will eventually go away! I interpret this as the db needing to find time when no overlapping read/write transactions are present. This doesn't seem completely correct, as I don't have any long lived transactions running while these locks are hanging out. Confusion still remains, for sure.
Mike
Hi,
I am just trying to jump in, but ignore if not relevant.
when you said Eventually this results in an "out of shared memory" error
Can you rule out the below two scenarios (wrt /dev/shm too low in docker or query requesting for too many locks either due to parallellism/partition involved)
There have been multiple cases of out of shared memory i have read earlier for due to above.
also, is this repeatable (given you mention it happens and eventually lead to "out of shared memory")
I may be missing something, but i do not see a PID even though it has a lock granted on a page, was the process terminated explicitly or implicitly. ( and an orphan lingering ? )
ps auwwxx | grep postgres
I took the below from "src/test/regress/sql/tidscan.sql" to simulate SIReadLock with an orphan process (by killing the process), but it gets reaped fine for me :(
postgres=# \d tidscan
Table "public.tidscan"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
postgres=# INSERT INTO tidscan VALUES (1), (2), (3);
postgres=# BEGIN ISOLATION LEVEL SERIALIZABLE;
BEGIN
postgres=*# SELECT * FROM tidscan WHERE ctid = '(0,1)';
id
----
1
(1 row)
postgres=*# -- locktype should be 'tuple'
SELECT locktype, mode FROM pg_locks WHERE pid = pg_backend_pid() AND mode = 'SIReadLock';
locktype | mode
----------+------------
tuple | SIReadLock
(1 row)
postgres=*# -- locktype should be 'tuple'
SELECT pid, locktype, mode FROM pg_locks WHERE mode = 'SIReadLock';
pid | locktype | mode
------+----------+------------
2831 | tuple | SIReadLock
(1 row)
i thought one could attach a gdb or strace to the pid to figure out what it did before crashing.
As always, I have little knowledge on postgresql, feel free to ignore if nothing relevant.
Thanks,
Vijay
On Tue, 27 Apr 2021 at 19:55, Mike Beachy <mbeachy@gmail.com> wrote:
Hi Laurenz -On Tue, Apr 27, 2021 at 2:56 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:Not sure, but do you see prepared transactions in "pg_prepared_xacts"?No, the -1 in the virtualtransaction (https://www.postgresql.org/docs/11/view-pg-locks.html) for pg_prepared_xacts was another clue I saw! But, it seems more or less a dead end as I have nothing in pg_prepared_xacts.
Thanks for the idea, though.I still need to put more effort into Tom's idea about SIReadLock hanging out after the transaction, but some evidence pointing in this direction is that I've reduced the number of db connections and found that the '-1/0' locks will eventually go away! I interpret this as the db needing to find time when no overlapping read/write transactions are present. This doesn't seem completely correct, as I don't have any long lived transactions running while these locks are hanging out. Confusion still remains, for sure.
Mike
On Wed, Apr 28, 2021 at 2:25 AM Mike Beachy <mbeachy@gmail.com> wrote: > On Tue, Apr 27, 2021 at 2:56 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: >> >> Not sure, but do you see prepared transactions in "pg_prepared_xacts"? > > No, the -1 in the virtualtransaction (https://www.postgresql.org/docs/11/view-pg-locks.html) for pg_prepared_xacts wasanother clue I saw! But, it seems more or less a dead end as I have nothing in pg_prepared_xacts. > > Thanks for the idea, though. There is another way to get a pg_lock that shows -1/0 there: if we run out of SERIALIZABLEXACT objects, we transfer the locks of the oldest committed transaction to a single SERIALIZABLEXACT object called OldCommittedSxact, and it has an invalid virtual xid. You can see this if you recompile with TEST_SUMMARIZE_SERIAL defined to force that path, and then run three overlapping transactions like this: tx1: BEGIN; tx1: SELECT * FROM t WHERE id = 42; tx2: BEGIN; tx2: SELECT * FROM t WHERE id = 999; tx1: COMMIT; Even though it comitted, at this point we still see tx1's locks, along with tx2's. tx3: BEGIN; tx3: SELECT * FROM t WHERE id = 1234; At this point we see tx1's locks still but their vxid has changed to -1/0. > I still need to put more effort into Tom's idea about SIReadLock hanging out after the transaction, but some evidence pointingin this direction is that I've reduced the number of db connections and found that the '-1/0' locks will eventuallygo away! I interpret this as the db needing to find time when no overlapping read/write transactions are present.This doesn't seem completely correct, as I don't have any long lived transactions running while these locks are hangingout. Confusion still remains, for sure. But do you have lots of short overlapping transactions so that there is never a moment where there are zero transactions running? As mentioned, locks (and transactions, and conflicts) hang around after you commit. That's because things that your finished transaction saw can cause transactions that are still running to become uncommittable, by forming part of a cycle. The rules for when the locks can be freed change once you reach the degraded "summarised" mode, though, due to lack of bookkeeping space. Not sure of the exact details without rereading the source code 42 times with much coffee, but it's something like: summarised locks can only be freed at times when there are zero active read/write serializable transactions, because that's when "CanPartialClearThrough" advances, while in the normal case they can be freed as soon as there are no SSI snapshots that were taken before it committed, because that's when "SxactGlobalXmin" advances. The number of SERIALIZABLEXACT objects is (max_connections + max_prepared_transactions) * 10. So, you could try increasing max_connections (without increasing the actual number of connections) to see if you can get to a point where you don't see these invalid virtual xids, and then maybe it'll be able to clean up locks more aggressively.
On Fri, Apr 30, 2021 at 7:12 AM Thomas Munro <thomas.munro@gmail.com> wrote: > But do you have lots of short overlapping transactions so that there > is never a moment where there are zero transactions running? Yeah, that almost certainly explains it. Thanks very much for the explanation about the summarized locks. > The number of SERIALIZABLEXACT objects is (max_connections + > max_prepared_transactions) * 10. So, you could try increasing > max_connections (without increasing the actual number of connections) > to see if you can get to a point where you don't see these invalid > virtual xids, and then maybe it'll be able to clean up locks more > aggressively. Aha! I hadn't considered that some parameter besides max_pred_locks_per_transaction would come into play. I'll give this a shot. Thanks, Mike
In case this helps anyone else, I found a simple way to get a rough idea of what's going on, which is to run: select (select count(distinct virtualtransaction) from pg_locks) as tx_with_locks, (select count(*) from pg_stat_activity where state = 'active') as active_tx, (select count(*) from pg_locks where virtualtransaction = '-1/0') as summarized_locks; I disabled the part of my application that seemed to be causing problems with too many writes (a background cleanup task) and then triggered it from a separate process. I can see the number of transactions with locks climbing when it hits a problematic item, while the number of active transactions (of course) stays low. Mike On Fri, Apr 30, 2021 at 4:53 PM Mike Beachy <mbeachy@gmail.com> wrote: > > On Fri, Apr 30, 2021 at 7:12 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > But do you have lots of short overlapping transactions so that there > > is never a moment where there are zero transactions running? > > Yeah, that almost certainly explains it. > > Thanks very much for the explanation about the summarized locks. > > > The number of SERIALIZABLEXACT objects is (max_connections + > > max_prepared_transactions) * 10. So, you could try increasing > > max_connections (without increasing the actual number of connections) > > to see if you can get to a point where you don't see these invalid > > virtual xids, and then maybe it'll be able to clean up locks more > > aggressively. > > Aha! I hadn't considered that some parameter besides > max_pred_locks_per_transaction would come into play. I'll give this a > shot. > > Thanks, > Mike