Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Hello, many thanks for the new version. Here's a very quick proposal
> for a new top-of-file comment on cluster.c,
The comment matches 0005, but I had to adjust it for 0004 (no background
worker there). Also, the worker writes the changes to a file rather than
tuplestore (storage/sharedfileset.h seems to me an easier way to pass the data
from one process to another) Besides that I made the following changes:
"bloat is greatly reduced" -> "bloat is eliminated"
and
"table, and to cope with" -> "table. To cope with"
> I haven't read build_relation_finish_concurrent() yet to understand how
> exactly do we do the lock upgrade, which I think is an important point
> we should address in this comment. Also not addressed is how exactly we
> handle indexes. Feel free to correct this, reword it or include any
> additional details that you think are important.
ok, I'll get back to the earlier parts of the set, including this, in the
beginning of January. Regarding indexes, one thing I've noticed recently that
they get locked in build_new_indexes(), but maybe it should happen earlier.
> (At this point we could just as well rename the file to repack.c, since
> very little of the original remains. But let's discuss that later.)
ok. Do you mean only the file or the functions as well? (I'm not going to do
that now, w/o that discussion.)
Attached here is a new version of the patch set. Its rebased and extended one
more time: 0006 is a PoC of the "snapshot resetting" technique, as discussed
elsewhere with Mihail Nikalayeu and Matthias van de Meent. The way snapshot
are generated here is different though: we need the snapshots from logical
replication's snapbuild.c, not those from procarray.c. More information is in
the commit message.
I do not insist that this should go to PG 19, just needed some confidence that
it's doable, as well as some feedback. There are no tests for this yet, but
I've played with it for a while and checked the behavior using debugger. I'm
curious to hear if the design is sound.
While working on that, I fixed some problems in 0004 and 0005 too. It
shouldn't be difficult to identify them using git, if needed.
Even if 0005 and 0006 won't land in PG19, these parts show that some
refactoring may be needed regarding the AM callback
table_relation_copy_for_cluster(). The parts 0004, 0005 and 0006 each change
the argument list. It wouldn't be perfect if both PG 19 and 20 changed the
API. I think we should reconsider which arguments are generic and which are
rather AM-specific. Maybe we should then add an opaque pointer (void *) for
the AM-specific information. REPACK could then use it to pass the
CONCURRENTLY-specific information.
I'm now going to prioritize the parts <= 0004.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com