Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers

From Peter Smith
Subject Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Msg-id CAHut+PtF0Mu=QPhCyTuUJg0RuGSC7Vjr5f6rsasmr+SeMk7L2g@mail.gmail.com
In response to Re: [WIP]Vertical Clustered Index (columnar store extension) - take2  (Japin Li <japinli@hotmail.com>)
List pgsql-hackers
Hi Japin,

Thanks for your README questions.

On Fri, Jul 11, 2025 at 7:18 PM Japin Li <japinli@hotmail.com> wrote:
...
>
> 3.
> In the README, 'TID' seems to have conflicting definitions:
> Transaction ID (2.1) vs. tuple physical identifier (2.3.1).
>
> Could you confirm the intended meaning? Suggest using 'XID' for Transaction ID
> if my understanding is correct.
>

Yes, TID was meant only for the Tuple identifier. Some terms became
muddled. Hopefully, those are fixed now.

> 4.
> -1:  TID relation (maps CRID to original TID)
> -5:  TID-CRID mapping table
>
> I'm trying to understand the distinctions here. Based on the definition in
> vci_tidcrid.h, it seems plausible to use just one relation for the mapping,
> suggesting a potential redundancy.
>
> /*
>  * TID-CRID pair used for TIDCRID update list
>  */
> typedef struct vcis_tidcrid_pair_item
> {
>     ItemPointerData page_item_id;   /* TID on the original relation */
>     vcis_Crid   crid;           /* CRID */
> } vcis_tidcrid_pair_item_t;
>
> How they are different? I see the code in vci_tidcrid.c
>

AFAIK, the distinction is described by the code comments in vci_columns.h:

+/** Column ID of special column */
+#define VCI_COLUMN_ID_TID     (-1)
+#define VCI_COLUMN_ID_NULL    (-2)
+#define VCI_COLUMN_ID_DELETE  (-3)

So those are all special columns in the ROS data part. In other words,
these internal relations all hold data that is indexed by the CRID,
e.g., "Delete vector" (2.3.3) and "Null information" (2.3.4). So here,
the TID relation is the mapping from the CRID back to the original
TID.

On the other hand, the other relations...

+/**  The data below are not column-stored data.
+ * We prepare them for convenience.
+ */
+#define VCI_COLUMN_ID_TID_CRID         (-5)
+#define VCI_COLUMN_ID_TID_CRID_UPDATE  (-6)
+#define VCI_COLUMN_ID_TID_CRID_WRITE   (-7)
+#define VCI_COLUMN_ID_TID_CRID_CDR     (-8)
+#define VCI_COLUMN_ID_DATA_WOS         (-9)
+#define VCI_COLUMN_ID_WHITEOUT_WOS     (-10)

... are not "column-stored". In other words, these relations, including
the "TID-CRID mapping table" (-5), are *not* indexed by CRID.

You may be right about a potential redundancy. But right now we're
focused on making these patches ready for open source - removing dead
code to shrink the size, improving the PostgreSQL core interface, and
fixing bugs. Rewriting or optimising the logic will have to wait.
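
To make the distinction between the two groups of special column IDs
concrete, here is a minimal sketch. The macro values are copied from the
vci_columns.h excerpts above; the helper function
vci_special_column_is_crid_indexed() is hypothetical and is not part of
the patch. It only classifies the special IDs listed in this mail and
says nothing about ordinary (non-negative) column IDs.

```c
#include <stdbool.h>

/* Values copied from the vci_columns.h excerpts quoted in this mail. */
#define VCI_COLUMN_ID_TID              (-1)
#define VCI_COLUMN_ID_NULL             (-2)
#define VCI_COLUMN_ID_DELETE           (-3)
#define VCI_COLUMN_ID_TID_CRID         (-5)
#define VCI_COLUMN_ID_TID_CRID_UPDATE  (-6)
#define VCI_COLUMN_ID_TID_CRID_WRITE   (-7)
#define VCI_COLUMN_ID_TID_CRID_CDR     (-8)
#define VCI_COLUMN_ID_DATA_WOS         (-9)
#define VCI_COLUMN_ID_WHITEOUT_WOS     (-10)

/*
 * Hypothetical helper (an assumption for illustration, not in the patch).
 *
 * Returns true for the special columns of the ROS data part, whose
 * contents are indexed by CRID (-1, -2, -3).  Returns false for every
 * other ID, including the "not column-stored" relations (-5 .. -10).
 */
bool
vci_special_column_is_crid_indexed(int column_id)
{
    switch (column_id)
    {
        case VCI_COLUMN_ID_TID:     /* CRID -> original TID */
        case VCI_COLUMN_ID_NULL:    /* null information */
        case VCI_COLUMN_ID_DELETE:  /* delete vector */
            return true;
        default:
            return false;
    }
}
```

With this sketch, the TID relation (-1) is reported as CRID-indexed,
while the TID-CRID mapping table (-5) is not, which matches the
distinction described above.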


> 5.
> Typo in README.
> - Each extent can have its own independent compression dictionary or all
>   extents can share a comon dictionary
> --> s/comon/common/g
>

Fixed.

~~~

Please see the updated README that I attached in the previous post.

======
Kind Regards,
Peter Smith.
Fujitsu Australia


