Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers

From	Peter Smith
Subject	Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date	July 14 11:37:41
Msg-id	CAHut+PvYQZAHcD-tK5XaobUpWoTf0Gkjx7nAA9eJq_HbPCSxCQ@mail.gmail.com
In response to	Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 (shveta malik <shveta.malik@gmail.com>)
Responses	Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
List	pgsql-hackers

Hi Shveta,

Thanks for your README questions.

On Fri, Jul 11, 2025 at 1:46 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thank You for working on this. I started going through the README and
> tried running simple tests, have few concerns:
>
> 1)
> I am not able to understand section 4.2 'WOS-to-ROS conversion'. When
> whiteout-WOS says 'delete 4', what does that mean? 4 is CRID, TXID?

The Whiteout WOS remembers the tuples that need to be deleted on the
next WOS-to-ROS transfer. There is a TID/CRID mapping, so the intended
meaning of “delete 4” in this diagram was “delete the ROS data which
has CRID 4”.

> And when does delete-vector X represents?

The delete vector is a bitset for knowing which records of ROS are
marked for deletion. IIUC, the bits of the “delete vector” are what
were previously in the “Whiteout ROS” -- i.e. the bits were set during
the previous WOS-to-ROS transfers.

Updating the delete vector bits is cheap, but reconciling them with
ROS to delete the ROS data is more expensive, so that happens only
periodically, when some threshold is exceeded (see README 2.5.3
“Garbage Collection”). The diagram, however, shows the result of
garbage collection happening at the same time as the WOS-to-ROS
transfer.

The “X” in the diagram was supposed to represent that the bit is set
to mark the CRID 2 columns for deletion. I’ve changed this now to be
0’s and 1’s, which makes it consistent with the description of the
delete vector in “2.3.3”, where 1 means “marked for delete”.
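
To make the bit semantics concrete, here is a minimal Python sketch of
a delete vector as a bitset. It is only a toy model, not the VCI code:
the class name DeleteVector, its methods, and the gc_threshold
parameter are all hypothetical, and 0-based CRIDs are assumed.

```python
class DeleteVector:
    """Toy bitset: bit i == 1 means the ROS record with CRID i is marked for delete."""

    def __init__(self, num_records: int) -> None:
        self.num_records = num_records
        self.bits = 0  # a Python int used as an arbitrary-length bitset

    def mark(self, crid: int) -> None:
        """Set the bit for one CRID (the cheap operation)."""
        if not 0 <= crid < self.num_records:
            raise IndexError(f"CRID {crid} out of range")
        self.bits |= 1 << crid

    def is_marked(self, crid: int) -> bool:
        return ((self.bits >> crid) & 1) == 1

    def marked_count(self) -> int:
        return bin(self.bits).count("1")

    def needs_gc(self, gc_threshold: int) -> bool:
        # Hypothetical policy: collect once enough records are marked.
        return self.marked_count() >= gc_threshold

    def as_string(self) -> str:
        """Render the vector as 0's and 1's, lowest CRID first."""
        return "".join(
            "1" if self.is_marked(i) else "0" for i in range(self.num_records)
        )


dv = DeleteVector(6)
dv.mark(2)              # CRID 2 is marked for delete
print(dv.as_string())   # 001000
```

Marking a record only flips one bit; nothing is removed from ROS until
a later garbage collection pass decides the threshold has been reached.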

> I did not  get why ColA-2,
> ColA-4 and ColB-2, ColB-4 were removed in resultant data?

The record (of CRID 4) was in the “Whiteout WOS”, so during
WOS-to-ROS transfer, the “delete vector” bit 4 would become set to
mark CRID 4 for deletion in the ROS.

The “garbage collection” (aka “deleted-rows-collection”) happens
according to some threshold; however, this diagram shows what happens
when the threshold is reached at the same time as the WOS-to-ROS
transfer.

So the AFTER case also shows the result of the garbage collection:

The “ColA-2 │ ColB-2” was removed because the delete vector bit 2 was
already set.
The “ColA-4 │ ColB-4” was removed because the delete vector bit 4
would become set (from having previously been in the Whiteout WOS).

~

Notice the CRIDs in the before/after are different because they were
renumbered after garbage collection.
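
The before/after behaviour described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not the patch's logic: the function name transfer_and_collect, the
six-row sample data, 0-based CRIDs, and the gc_threshold value are all
hypothetical, and the sketch ignores the new rows that a real transfer
would also move from WOS into ROS.

```python
def transfer_and_collect(ros_rows, delete_vector, whiteout_crids, gc_threshold):
    """Apply Whiteout WOS entries, then garbage-collect if the threshold is reached.

    A row's CRID is simply its index in ros_rows, so dropping rows
    renumbers the survivors automatically.
    """
    # Step 1: each whiteout entry becomes a set bit in the delete vector.
    vector = list(delete_vector)
    for crid in whiteout_crids:
        vector[crid] = 1

    # Step 2: below the (hypothetical) threshold, keep the rows and only
    # remember the marks.
    if sum(vector) < gc_threshold:
        return list(ros_rows), vector

    # Step 3: garbage collection removes marked rows; survivors get new
    # CRIDs and a fresh, all-zero delete vector.
    survivors = [row for row, bit in zip(ros_rows, vector) if bit == 0]
    return survivors, [0] * len(survivors)


before_rows = [(f"ColA-{i}", f"ColB-{i}") for i in range(6)]
before_vector = [0, 0, 1, 0, 0, 0]   # bit 2 was already set
whiteout = [4]                        # "delete 4" means delete CRID 4

after_rows, after_vector = transfer_and_collect(
    before_rows, before_vector, whiteout, gc_threshold=2
)
for new_crid, row in enumerate(after_rows):
    print(new_crid, row)
```

In this sample, the rows for CRID 2 and CRID 4 disappear, and the row
that was CRID 3 before the collection becomes CRID 2 afterwards.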

> Is the diagram complete?

Yes, the diagram was complete, but hopefully it is easier to
understand now that I have made a few minor changes to it. FYI, see
also the PGConf.dev 2025 presentation slides [1] – it might help
understanding to see similar information presented slightly
differently.

> 2)
> We can make the definition consistent at both places as the first one
> gives a feeling that rows are marked for deletion in WOS while the
> second one says ROS.
>
> Whiteout WOS = Record of WOS rows marked for deletion
> Whiteout WOS -- TID records of WOS rows that are marked for deletion on ROS
>

Fixed. Both now use the second wording.

> 3)
> It is not part of README. But please help me understand the meaning
> and usage of this GUC in VCI context:
> vci.max_devices: Sets the maximum device number which can be attached.
>

The WOS-to-ROS transfer can be done by background workers. IIUC,
patch 0002 currently includes code for inspecting Linux devices
associated with tablespaces where any VCI indexes reside. The purpose
of this inspection is to discover the IO load so that VCI can
determine the best time to launch the background worker, e.g.
launching may be delayed longer if the system is deemed too busy. The
“vci.max_devices” GUC puts a limit on the number of devices that can
be handled. All this logic is inherited from the product where this
VCI patch originated; I feel some of this may be overly complex for
the first version of the OSS patch. We may be able to simplify or
remove parts of this logic, maybe even this GUC.

...
> -------------
>
> Few typos in README:
> a) Each VCI indexed column is stored as an internal relations.
> --relations --> relation

Fixed.

>
> b) Records are addresses by CRID (Columnar Record ID) instead of by TID.
> --addresses->addressed

Fixed.

>
> c) Extents can be found by ID using offsets in a column "meta-data"
> internal relation.
> -- by ID using offsets? Do you mean 'by using offsets' alone?

I was trying to say the appropriate offset can be found using the
extent ID as the offset-array index. I’ve reworded this in the README
to be clearer.
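
As a rough illustration of that lookup, here is a tiny Python sketch.
It is not taken from the patch: the function name extent_offset, the
0-based extent IDs, and the sample offset values are hypothetical.

```python
def extent_offset(offsets, extent_id):
    """Return the stored offset for an extent, using the extent ID as the array index."""
    if not 0 <= extent_id < len(offsets):
        raise IndexError(f"no extent with ID {extent_id}")
    return offsets[extent_id]


# Hypothetical offset array, as it might be kept in a column's
# meta-data relation: one entry per extent.
offsets = [0, 4096, 8192]
print(extent_offset(offsets, 1))   # 4096
```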

>
> d) EXPLAIN ANALYSE -->EXPLAIN ANALYZE

Fixed.

~~~

Please see the updated README.

======
[1] slides -
https://www.pgevents.ca/events/pgconfdev2025/sessions/session/292/slides/98/A%20journey%20toward%20the%20columnar%20data%20store%20.pdf

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

README
