Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From	Mihail Nikalayeu
Subject	Re: Adding REPACK [concurrently]
Date	January 25 19:31:00
Msg-id	CADzfLwUEH5+LjCN+6kRfSsXwuou8rKXyVV42Wi-O_TG0360Kug@mail.gmail.com
In response to	Re: Adding REPACK [concurrently] (Mihail Nikalayeu <mihailnikalayeu@gmail.com>)
Responses	Re: Adding REPACK [concurrently]
List	pgsql-hackers

Hello, Antonin!

PART 1:

I started rebasing the MVCC-safe version on top of the multi-snapshot version and realized it is getting complex.

But what is really bad about MVCC-unsafety is that a transaction can access *incorrect* data and break some logic (or even constraints).

If we can *prevent* such data access by raising an error (which should be very infrequent), I don't see much point in achieving true MVCC-safety.

I remembered how this works with indcheckxmin for indexes, and made something similar for pg_class: it records the rewriting transaction XID and causes the executor to raise an error if a transaction with an older snapshot attempts to access the rewritten relation.

In the normal case the check is never executed, so there is no performance regression. Also, the flag is automatically cleared by VACUUM once the transaction ID is frozen.
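
To illustrate the idea only, here is a toy model in Python. It is not code from the patch: the `Relation` and `Snapshot` classes, the `rewrite_xid` field, and all function names are hypothetical. It also assumes XIDs are plain integers compared directly (no wraparound handling), and that "older snapshot" means the rewriting XID is not below the snapshot's xmin.

```python
from dataclasses import dataclass
from typing import Optional


class RelationRewrittenError(Exception):
    """Raised when a too-old snapshot touches a rewritten relation."""


@dataclass
class Snapshot:
    # Simplification: every XID below xmin is finished for this snapshot.
    xmin: int


@dataclass
class Relation:
    name: str
    relcheckxmin: bool = False          # flag in the pg_class row
    rewrite_xid: Optional[int] = None   # hypothetical: XID of the rewriter


def mark_rewritten(rel: Relation, xid: int) -> None:
    """Record the rewriting transaction (REPACK, ALTER TABLE rewrite, ...)."""
    rel.relcheckxmin = True
    rel.rewrite_xid = xid


def check_relation_access(rel: Relation, snap: Snapshot) -> None:
    """Executor-side check: refuse access instead of returning wrong data."""
    if not rel.relcheckxmin:
        return  # normal case: nothing to check, no extra cost
    if rel.rewrite_xid >= snap.xmin:
        # The rewriter may not be visible to this snapshot.
        raise RelationRewrittenError(
            f'relation "{rel.name}" was rewritten after the snapshot was taken'
        )


def vacuum_clear_flag(rel: Relation, freeze_limit: int) -> None:
    """VACUUM-side cleanup: drop the flag once the XID is old enough to freeze."""
    if rel.relcheckxmin and rel.rewrite_xid < freeze_limit:
        rel.relcheckxmin = False
        rel.rewrite_xid = None
```

In this model a snapshot with xmin 120 gets an error on a relation rewritten by XID 150, while a snapshot with xmin 200 reads it normally. After `vacuum_clear_flag` runs with a freeze limit above 150, the check goes back to the fast path.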

It also "fixes" ALTER TABLE, not only REPACK concurrently.

The attached patch contains more details (some in the commit message).

PART 2:

I have continued working with stress tests. This time I added your WIP patch to fix the LR\CLOG race.

I tested the following configurations:

1) just REPACK CONCURRENTLY - ok
2) + relcheckxmin (see PART 1) - ok
3) + worker - ok
4) + multiple snapshots - broken in multiple ways.

You can see an example run here: https://cirrus-ci.com/build/6359048020295680

Some examples:

1) 'pgbench: error: client 11 script 0 aborted in command 20 query 0: ERROR: could not read blocks 0..0 in file "base/5/16414": read only 0 of 8192 bytes

2) at /home/postgres/postgres/contrib/amcheck/t/008_repack_concurrently.pl line 51.
[15:36:37.204] # 'pgbench: error: client 5 script 0 aborted in command 28 query 0: ERROR: division by zero

3) 'pgbench: error: client 12 script 0 aborted in command 6 query 0: ERROR: cache lookup failed for relation 17400
