Thread: -1/0 virtualtransaction

-1/0 virtualtransaction

From
Mike Beachy
Date:
Hi -

Does anyone have any pointers on what a virtualtransaction of '-1/0' means?

I'm using SSI and an example is 

 locktype | database | relation |  page   | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |    mode    | granted | fastpath
----------+----------+----------+---------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
 page     |    16384 |   468238 |   19216 |       |            |               |         |       |          | -1/0               |     | SIReadLock | t       | f

 This is incredibly hard to search for. 

I see these for page, tuple and (perhaps a clue?) one relation, and I'm seeing the page and tuple locks accumulate over time. Eventually this results in an "out of shared memory" error.

Any help is appreciated.

Thanks,
Mike

Re: -1/0 virtualtransaction

From
Tom Lane
Date:
Mike Beachy <mbeachy@gmail.com> writes:
> Does anyone have any pointers on what a virtualtransaction of '-1/0' means?
> I'm using SSI and an example is

>  locktype | database | relation |  page   | tuple | virtualxid |
> transactionid | classid | objid | objsubid | virtualtransaction | pid |
>  mode    | granted | fastpath
>
----------+----------+----------+---------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
>  page     |    16384 |   468238 |   19216 |       |            |
>     |         |       |          | -1/0               |     | SIReadLock |
> t       | f

Hmm, that's an invalid VXID, which would ordinarily mean that nothing
is holding the lock.  There is a passing mention in mvcc.sgml that
SIRead locks sometimes need to be held past transaction end, so maybe
what you're looking at is such a lock that is no longer associated
with a specific transaction.  I have to disclaim knowing much of
anything about the SSI code, though.

            regards, tom lane



Re: -1/0 virtualtransaction

From
Mike Beachy
Date:
On Mon, Apr 26, 2021 at 6:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm, that's an invalid VXID, which would ordinarily mean that nothing
is holding the lock.  There is a passing mention in mvcc.sgml that
SIRead locks sometimes need to be held past transaction end, so maybe
what you're looking at is such a lock that is no longer associated
with a specific transaction.  I have to disclaim knowing much of
anything about the SSI code, though.

Hmm, yeah. I had seen discussion of this "held past transaction end" but didn't know what to make of it.

The "Serializable Snapshot Isolation in PostgreSQL" paper (https://drkp.net/papers/ssi-vldb12.pdf) makes me think this is a reasonable line of inquiry.

Thanks,
Mike

Re: -1/0 virtualtransaction

From
Laurenz Albe
Date:
On Mon, 2021-04-26 at 17:45 -0400, Mike Beachy wrote:
> Does anyone have any pointers on what a virtualtransaction of '-1/0' means?
> 
> I'm using SSI and an example is 
> 
>  locktype | database | relation |  page   | tuple | virtualxid | transactionid | classid | objid | objsubid |
virtualtransaction| pid |    mode    | granted | fastpath
 
>
----------+----------+----------+---------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
>  page     |    16384 |   468238 |   19216 |       |            |               |         |       |          | -1/0
          |     | SIReadLock | t       | f
 
> 
>  This is incredibly hard to search for. 
> 
> I see these for page, tuple and (perhaps a clue?) one relation, and I'm seeing the page and tuple locks accumulate
overtime.
 
>  Eventually this results in an "out of shared memory" error.

Not sure, but do you see prepared transactions in "pg_prepared_xacts"?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: -1/0 virtualtransaction

From
Mike Beachy
Date:
Hi Laurenz -

On Tue, Apr 27, 2021 at 2:56 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Not sure, but do you see prepared transactions in "pg_prepared_xacts"?

No, the -1 in the virtualtransaction (https://www.postgresql.org/docs/11/view-pg-locks.html) for pg_prepared_xacts was another clue I saw! But, it seems more or less a dead end as I have nothing in pg_prepared_xacts.

Thanks for the idea, though.

I still need to put more effort into Tom's idea about SIReadLock hanging out after the transaction, but some evidence pointing in this direction is that I've reduced the number of db connections and found that the '-1/0' locks will eventually go away! I interpret this as the db needing to find time when no overlapping read/write transactions are present. This doesn't seem completely correct, as I don't have any long lived transactions running while these locks are hanging out. Confusion still remains, for sure.

Mike

Re: -1/0 virtualtransaction

From
Vijaykumar Jain
Date:
Hi,

I am just trying to jump in, but ignore if not relevant.

when you said    Eventually this results in an "out of shared memory" error  

Can you rule out the below two scenarios (wrt /dev/shm too low in docker or query requesting for too many locks either due to parallellism/partition involved) 
There have been multiple cases of out of shared memory i have read earlier for due to above.


also, is this repeatable (given you mention it happens and eventually lead to "out of shared memory")

I may be missing something, but i do not see a PID even though it has a lock granted on a page, was the process terminated explicitly or implicitly. ( and an orphan lingering ? )
ps auwwxx | grep postgres

I took the below from "src/test/regress/sql/tidscan.sql"  to simulate SIReadLock with an orphan process (by killing the process), but it gets reaped fine for me :(

postgres=# \d tidscan
              Table "public.tidscan"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |

postgres=# INSERT INTO tidscan VALUES (1), (2), (3);

postgres=# BEGIN ISOLATION LEVEL SERIALIZABLE;
BEGIN
postgres=*# SELECT * FROM tidscan WHERE ctid = '(0,1)';
 id
----
  1
(1 row)

postgres=*# -- locktype should be 'tuple'
SELECT locktype, mode FROM pg_locks WHERE pid = pg_backend_pid() AND mode = 'SIReadLock';
 locktype |    mode
----------+------------
 tuple    | SIReadLock
(1 row)

postgres=*# -- locktype should be 'tuple'
SELECT pid, locktype, mode FROM pg_locks WHERE mode = 'SIReadLock';
 pid  | locktype |    mode
------+----------+------------
 2831 | tuple    | SIReadLock
(1 row)

i thought one could attach a gdb or strace to the pid to figure out what it did before crashing.

As always, I have little knowledge on postgresql, feel free to ignore if nothing relevant.

Thanks,
Vijay 



On Tue, 27 Apr 2021 at 19:55, Mike Beachy <mbeachy@gmail.com> wrote:
Hi Laurenz -

On Tue, Apr 27, 2021 at 2:56 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Not sure, but do you see prepared transactions in "pg_prepared_xacts"?

No, the -1 in the virtualtransaction (https://www.postgresql.org/docs/11/view-pg-locks.html) for pg_prepared_xacts was another clue I saw! But, it seems more or less a dead end as I have nothing in pg_prepared_xacts.

Thanks for the idea, though.

I still need to put more effort into Tom's idea about SIReadLock hanging out after the transaction, but some evidence pointing in this direction is that I've reduced the number of db connections and found that the '-1/0' locks will eventually go away! I interpret this as the db needing to find time when no overlapping read/write transactions are present. This doesn't seem completely correct, as I don't have any long lived transactions running while these locks are hanging out. Confusion still remains, for sure.

Mike 

Re: -1/0 virtualtransaction

From
Thomas Munro
Date:
On Wed, Apr 28, 2021 at 2:25 AM Mike Beachy <mbeachy@gmail.com> wrote:
> On Tue, Apr 27, 2021 at 2:56 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>>
>> Not sure, but do you see prepared transactions in "pg_prepared_xacts"?
>
> No, the -1 in the virtualtransaction (https://www.postgresql.org/docs/11/view-pg-locks.html) for pg_prepared_xacts
wasanother clue I saw! But, it seems more or less a dead end as I have nothing in pg_prepared_xacts. 
>
> Thanks for the idea, though.

There is another way to get a pg_lock that shows -1/0 there: if we run
out of SERIALIZABLEXACT objects, we transfer the locks of the oldest
committed transaction to a single SERIALIZABLEXACT object called
OldCommittedSxact, and it has an invalid virtual xid.  You can see
this if you recompile with TEST_SUMMARIZE_SERIAL defined to force that
path, and then run three overlapping transactions like this:

tx1: BEGIN;
tx1: SELECT * FROM t WHERE id = 42;
tx2:   BEGIN;
tx2:   SELECT * FROM t WHERE id = 999;
tx1: COMMIT;

Even though it comitted, at this point we still see tx1's locks, along
with tx2's.

tx3:     BEGIN;
tx3:     SELECT * FROM t WHERE id = 1234;

At this point we see tx1's locks still but their vxid has changed to -1/0.

> I still need to put more effort into Tom's idea about SIReadLock hanging out after the transaction, but some evidence
pointingin this direction is that I've reduced the number of db connections and found that the '-1/0' locks will
eventuallygo away! I interpret this as the db needing to find time when no overlapping read/write transactions are
present.This doesn't seem completely correct, as I don't have any long lived transactions running while these locks are
hangingout. Confusion still remains, for sure. 

But do you have lots of short overlapping transactions so that there
is never a moment where there are zero transactions running?

As mentioned, locks (and transactions, and conflicts) hang around
after you commit.  That's because things that your finished
transaction saw can cause transactions that are still running to
become uncommittable, by forming part of a cycle.   The rules for when
the locks can be freed change once you reach the degraded "summarised"
mode, though, due to lack of bookkeeping space.  Not sure of the exact
details without rereading the source code 42 times with much coffee,
but it's something like: summarised locks can only be freed at times
when there are zero active read/write serializable transactions,
because that's when "CanPartialClearThrough" advances, while in the
normal case they can be freed as soon as there are no SSI snapshots
that were taken before it committed, because that's when
"SxactGlobalXmin" advances.

The number of SERIALIZABLEXACT objects is (max_connections +
max_prepared_transactions) * 10.  So, you could try increasing
max_connections (without increasing the actual number of connections)
to see if you can get to a point where you don't see these invalid
virtual xids, and then maybe it'll be able to clean up locks more
aggressively.



Re: -1/0 virtualtransaction

From
Mike Beachy
Date:
On Fri, Apr 30, 2021 at 7:12 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> But do you have lots of short overlapping transactions so that there
> is never a moment where there are zero transactions running?

Yeah, that almost certainly explains it.

Thanks very much for the explanation about the summarized locks.

> The number of SERIALIZABLEXACT objects is (max_connections +
> max_prepared_transactions) * 10.  So, you could try increasing
> max_connections (without increasing the actual number of connections)
> to see if you can get to a point where you don't see these invalid
> virtual xids, and then maybe it'll be able to clean up locks more
> aggressively.

Aha! I hadn't considered that some parameter besides
max_pred_locks_per_transaction would come into play.  I'll give this a
shot.

Thanks,
Mike



Re: -1/0 virtualtransaction

From
Mike Beachy
Date:
In case this helps anyone else, I found a simple way to get a rough
idea of what's going on, which is to run:

select (select count(distinct virtualtransaction) from pg_locks) as
tx_with_locks, (select count(*) from pg_stat_activity where state =
'active') as active_tx, (select count(*) from pg_locks where
virtualtransaction = '-1/0') as summarized_locks;

I disabled the part of my application that seemed to be causing
problems with too many writes (a background cleanup task) and then
triggered it from a separate process. I can see the number of
transactions with locks climbing when it hits a problematic item,
while the number of active transactions (of course) stays low.

Mike

On Fri, Apr 30, 2021 at 4:53 PM Mike Beachy <mbeachy@gmail.com> wrote:
>
> On Fri, Apr 30, 2021 at 7:12 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > But do you have lots of short overlapping transactions so that there
> > is never a moment where there are zero transactions running?
>
> Yeah, that almost certainly explains it.
>
> Thanks very much for the explanation about the summarized locks.
>
> > The number of SERIALIZABLEXACT objects is (max_connections +
> > max_prepared_transactions) * 10.  So, you could try increasing
> > max_connections (without increasing the actual number of connections)
> > to see if you can get to a point where you don't see these invalid
> > virtual xids, and then maybe it'll be able to clean up locks more
> > aggressively.
>
> Aha! I hadn't considered that some parameter besides
> max_pred_locks_per_transaction would come into play.  I'll give this a
> shot.
>
> Thanks,
> Mike