
From Michail Nikolaev
Subject Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Msg-id CANtu0og-4pvn4+TCWH6U9ghyd7x7NBAZSgi4ZWyBZdBWH6OpWA@mail.gmail.com
In response to Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hello, Michael!

Thank you for your comments and feedback!

Yes, this patch set contains a significant amount of code, which makes it challenging to review. Some details are explained in the commit messages, and I'm doing my best to structure the patch set so that it is as committable as possible. Once all the parts are ready, I plan to write a detailed email explaining everything, including benchmark results and other relevant information.

Meanwhile, here's a quick overview of the patch structure. If you have suggestions for an alternative way to decompose it, I'd be happy to hear them.

The primary goals of the patch set are to:
    * Enable the xmin horizon to propagate freely during concurrent index builds
    * Build concurrent indexes with a single heap scan

The patch set is split into the following parts. Technically, each part could be committed separately, but all of them are required to achieve the goals.

Part 1: Stress tests
- 0001: Yes, this patch comes from another thread and is not directly required. It's included here as a single commit because it's needed for stress testing this patch set: without it, issues with concurrent reindexing and upserts cause failures.
- 0002: Yes, I agree these tests need to be refactored or moved into a separate task. I’ll address this later.

Part 2: During the first phase of a concurrent index build, reset the snapshot used for the heap scan between pages, allowing xmin to advance.
- 0003: Implements such snapshot resetting for the non-parallel, non-unique case
- 0004: Extends snapshot resetting to parallel builds
- 0005: Extends snapshot resetting to unique indexes

Part 3: Build concurrent indexes in a single heap scan
- 0006: Introduces the STIR (Short-Term Index Replacement) access method, a specialized method for auxiliary indexes during concurrent builds
- 0007: Implements the auxiliary index approach, enabling concurrent index builds to use a single heap scan.
            In short, it works like this: create an empty auxiliary STIR index to track new tuples, scan the heap and build the new index, merge the STIR tuples into the new index, and drop the auxiliary index.
- 0008: Enhances the auxiliary index approach by resetting snapshots during the merge phase, allowing xmin to propagate

Part 4: This part only makes sense once all three previous parts are committed (the other parts can be applied separately).
- 0009: Removes the PROC_IN_SAFE_IC logic, as it is no longer required

I plan to add a few more small things (optimizations) and then do some larger-scale stress testing and benchmarking. I think that without that, no one is going to spend their time on such an amount of code :)

Merry Christmas,
Mikhail.
