Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers

From shveta malik
Subject Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date
Msg-id CAJpy0uC3Ke0cxkwhqfW3tYkaOU=qr2p_80XTW7jweiYMrHuwKw@mail.gmail.com
Whole thread Raw
In response to Re: [WIP]Vertical Clustered Index (columnar store extension) - take2  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Mon, Jul 14, 2025 at 2:08 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Shveta,
>
> Thanks for your README questions.
>
> On Fri, Jul 11, 2025 at 1:46 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Thank You for working on this. I started going through the README and
> > tried running simple tests, have few concerns:
> >
> > 1)
> > I am not able to understand section 4.2 'WOS-to-ROS conversion'. When
> > whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID?
>
> Whiteout WOS remembers the tuple that needs to be deleted on the next
> WOS-to-ROS transfer. There is a TID/CRID mapping, so the intended
> meaning of “delete 4” in this diagram was  “delete the ROS data which
> has CRID 4”.
>
> > And when does delete-vector X represents?
>
> The delete vector is a bitset for knowing which records of ROS are
> marked for deletion. IIUC, the bits of the “delete vector” are what
> were previously in the “Whiteout ROS” -- i.e. the bits were set during
> the previous WOS-to-ROS transfers.
>
> Updating the delete vector bits is cheap, but it is more expensive to
> reconcile with ROS to delete the ROS data, so that happens only
> periodically when some threshold is exceeded. See README 2.5.3
> “Garbage Collection”. But, the diagram is showing the result of
> garbage collection at the same time as the WOS-to-ROS transfer.
>
> The “X” in the diagram was supposed to represent that the bit is set
> to mark the CRID 2 columns for deletion. I’ve changed this now to be
> 0’s and 1’s, which makes it consistent with the other description
> about the delete vector in “2.3.3”. e.g. 1 means "marked for delete".
>
> > I did not  get why ColA-2,
> > ColA-4 and ColB-2, ColB-4 were removed in resultant data?
>
> The record (of CRID 4) was in the “Whiteout WOS”, so during
> WOS-to-ROS transfer, the “delete vector” bit 4 would become set to
> mark CRID 4 for deletion in the ROS.
>
> The “garbage collection” (aka “deleted-rows-collection”) happens
> according to some threshold; however, this diagram shows what happens
> when the threshold is reached at the same time as the WOS-to-ROS
> transfer.
>
> e.g. So AFTER case shows result of the garbage collection as well:
>
> The “ColA-2 │ ColB-2” was removed because the delete vector bit 2 was
> already set
> The “ColA-4 │ ColB-4” was removed because the delete vector bit 4
> would become set (from having previously been in the Whiteout WOS)
>
> ~
>
> Notice the CRIDs in the before/after are different because they were
> renumbered after garbage collection.
>
> > Is the diagram complete?
>
> Yes, the diagram was complete, but hopefully it is easier to
> understand now that I have made a few minor changes to it. FYI, see
> also the PGConf.dev 2025 presentation slides [1] – it might help
> understanding to see similar information presented slightly
> differently.

Thanks for improving the diagram. Now the flow is clear.

> > 2)
> > We can make the definition consistent at both places as the first one
> > gives a feeling that rows are marked for deletion in WOS while the
> > second one says ROS.
> >
> > Whiteout WOS = Record of WOS rows marked for deletion
> > Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROS
> >
>
> Fixed. Both are now using 2nd wording.
>
> > 3)
> > It is not part of README. But please help me understand the meaning
> > and usage of this GUC in VCI context:
> > vci.max_devices: Sets the maximum device number which can be attached.
> >
>
> The WOS–to–ROS transfer can be done by background workers. IIUC, the
> patch 0002 currently includes code for inspecting Linux devices
> associated with tablespaces where any VCI indexes reside. The purpose
> of this inspection is to discover the IO load so that VCI can
> determine the best time to launch the background worker – e.g.
> launching may be delayed longer if the system is deemed too busy. The
> “vci.max_devices” puts a limit on the number of devices that can be
> handled. All this logic is inherited from the product where this VCI
> patch originated; I feel some of this may be overly complex for the
> OSS patch's first version. We may be able to simplify/remove parts of
> this logic – maybe even this GUC.

Thanks for the explanation. The concept is quite tricky though. I will
also look at the code to understand it better.

> ...
> > -------------
> >
> > Few typos in README:
> > a) Each VCI indexed column is stored as an internal relations.
> > --relations --> relation
>
> Fixed.
>
> >
> > b) Records are addresses by CRID (Columnar Record ID) instead of by TID.
> > --addresses->addressed
>
> Fixed.
>
> >
> > c) Extents can be found by ID using offsets in a column "meta-data"
> > internal relation.
> > -- by ID using offsets? Do you mean 'by using offsets' alone?
>
> I was trying to say the appropriate offset can be found using the
> extent ID as the offset-array index. I’ve reworded this in the README
> to be clearer.
>
> >
> > d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE
>
> Fixed.
>
> ~~~
>
> Please see the updated README.
>
> ======
> [1] slides -
https://www.pgevents.ca/events/pgconfdev2025/sessions/session/292/slides/98/A%20journey%20toward%20the%20columnar%20data%20store%20.pdf
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia



pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Logical replication prefetch
Next
From: Ajin Cherian
Date:
Subject: Re: 024_add_drop_pub.pl might fail due to deadlock