Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Date | |
Msg-id | CAHut+PvYQZAHcD-tK5XaobUpWoTf0Gkjx7nAA9eJq_HbPCSxCQ@mail.gmail.com Whole thread Raw |
In response to | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 (shveta malik <shveta.malik@gmail.com>) |
Responses |
Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
|
List | pgsql-hackers |
Hi Shveta, Thanks for your README questions. On Fri, Jul 11, 2025 at 1:46 PM shveta malik <shveta.malik@gmail.com> wrote: > > Thank You for working on this. I started going through the README and > tried running simple tests, have few concerns: > > 1) > I am not able to understand section 4.2 'WOS-to-ROS conversion'. When > whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID? Whiteout WOS remembers the tuple that needs to be deleted on the next WOS-to-ROS transfer. There is a TID/CRID mapping, so the intended meaning of “delete 4” in this diagram was “delete the ROS data which has CRID 4”. > And when does delete-vector X represents? The delete vector is a bitset for knowing which records of ROS are marked for deletion. IIUC, the bits of the “delete vector” are what were previously in the “Whiteout ROS” -- i.e. the bits were set during the previous WOS-to-ROS transfers. Updating the delete vector bits is cheap, but it is more expensive to reconcile with ROS to delete the ROS data, so that happens only periodically when some threshold is exceeded. See README 2.5.3 “Garbage Collection”. But, the diagram is showing the result of garbage collection at the same time as the WOS-to-ROS transfer. The “X” in the diagram was supposed to represent that the bit is set to mark the CRID 2 columns for deletion. I’ve changed this now to be 0’s and 1’s, which makes it consistent with the other description about the delete vector in “2.3.3”. e.g. 1 means "marked for delete". > I did not get why ColA-2, > ColA-4 and ColB-2, ColB-4 were removed in resultant data? The record (of CRID 4) was in the “Whiteout WOS”, so during WOS-to-ROS transfer, the “delete vector” bit 4 would become set to mark CRID 4 for deletion in the ROS. The “garbage collection” (aka “deleted-rows-collection”) happens according to some threshold; however, this diagram shows what happens when the threshold is reached at the same time as the WOS-to-ROS transfer. e.g. So AFTER case shows result of the garbage collection as well: The “ColA-2 │ ColB-2” was removed because the delete vector bit 2 was already set The “ColA-4 │ ColB-4” was removed because the delete vector bit 4 would become set (from having previously been in the Whiteout WOS) ~ Notice the CRIDs in the before/after are different because they were renumbered after garbage collection. > Is the diagram complete? Yes, the diagram was complete, but hopefully it is easier to understand now that I have made a few minor changes to it. FYI, see also the PGConf.dev 2025 presentation slides [1] – it might help understanding to see similar information presented slightly differently. > 2) > We can make the definition consistent at both places as the first one > gives a feeling that rows are marked for deletion in WOS while the > second one says ROS. > > Whiteout WOS = Record of WOS rows marked for deletion > Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROS > Fixed. Both are now using 2nd wording. > 3) > It is not part of README. But please help me understand the meaning > and usage of this GUC in VCI context: > vci.max_devices: Sets the maximum device number which can be attached. > The WOS–to–ROS transfer can be done by background workers. IIUC, the patch 0002 currently includes code for inspecting Linux devices associated with tablespaces where any VCI indexes reside. The purpose of this inspection is to discover the IO load so that VCI can determine the best time to launch the background worker – e.g. launching may be delayed longer if the system is deemed too busy. The “vci.max_devices” puts a limit on the number of devices that can be handled. All this logic is inherited from the product where this VCI patch originated; I feel some of this may be overly complex for the OSS patch's first version. We may be able to simplify/remove parts of this logic – maybe even this GUC. ... > ------------- > > Few typos in README: > a) Each VCI indexed column is stored as an internal relations. > --relations --> relation Fixed. > > b) Records are addresses by CRID (Columnar Record ID) instead of by TID. > --addresses->addressed Fixed. > > c) Extents can be found by ID using offsets in a column "meta-data" > internal relation. > -- by ID using offsets? Do you mean 'by using offsets' alone? I was trying to say the appropriate offset can be found using the extent ID as the offset-array index. I’ve reworded this in the README to be clearer. > > d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE Fixed. ~~~ Please see the updated README. ====== [1] slides - https://www.pgevents.ca/events/pgconfdev2025/sessions/session/292/slides/98/A%20journey%20toward%20the%20columnar%20data%20store%20.pdf Kind Regards, Peter Smith. Fujitsu Australia
Attachment
pgsql-hackers by date: