Re: Refactor code around GUC default_toast_compression - Mailing list pgsql-hackers

From Chao Li
Subject Re: Refactor code around GUC default_toast_compression
Date
Msg-id 7A329D19-A0F2-4558-A391-6E82699B9BC0@gmail.com
Whole thread
In response to Re: Refactor code around GUC default_toast_compression  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers

> On May 2, 2026, at 06:43, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, May 01, 2026 at 02:44:07PM +0530, Ayush Tiwari wrote:
>> 1. The patch includes an unrelated hunk in
>>   doc/src/sgml/ref/alter_index.sgml, adding text about `ALTER INDEX ...
>>   ATTACH PARTITION`.  That looks like an accidental carry-over from another
>>   patch and shouldn't be there ig.
>
> Sorry about that.  That feels like a rebase fart.
>
>> 2. The comment in src/include/access/toast_compression.h describing
>>   default_toast_compression looks stale after this change?  It still says
>>   that the GUC value is one of the char values stored in
>>   pg_attribute.attcompression, but the patch changes it to use the new
>>   ToastCompressionGucValue enum values instead.(Maybe I'm
>>   missing something)
>
> Nope, you are missing nothing.  I was re-reading the patch and I think
> that we could just remove the whole paragraph.  Even by doing so we
> lose no information.
>
>> 3. One minor point: CompressionIdToMethod() seems to be added as a public
>>   helper, but I could not find any callers in this patch.
>
> Oops, removed.  I may have used it at some point.
>
>>   Also,
>>   pg_column_compression() still keeps its own cmid-to-name switch.  If the
>>   intent is to centralize these mappings in the registry, perhaps that code
>>   could use the new helper path as well, otherwise the unused helper may
>>   not be necessary yet (though it might be in future).
>
> Yes, this is part of the extra tweaks that would be needed when added
> a new compression method.  This part looks at a varlena pointer,
> retrieves the on-disk ID.  So this is left as-is on purpose, like the
> direct TOAST decompress business based on varlena pointers.
>
> Attached is a v2, to keep the CI happy for as long as we can use it.
> --
> Michael
> <v2-0001-Refactor-some-code-logic-around-GUC-default_toast.patch>

Overall looks good. A few small comments:

1
```
 /*
  * GUC support.
- *
- * default_toast_compression is an integer for purposes of the GUC machinery,
- * but the value is one of the char values defined below, as they appear in
- * pg_attribute.attcompression, e.g. TOAST_PGLZ_COMPRESSION.
  */
 extern PGDLLIMPORT int default_toast_compression;
```

Before this patch, default_toast_compression stores 'p'/'l'. With this patch, it is changed to store 0/1. Would it be
betterto rename this variable? 

Otherwise, a third-party extension that relies on this variable could silently misbehave. I understand that a major
releaseis allowed to change API/ABI contracts, but a build failure would be better than silent misbehavior. Or at least
weshould document this change somewhere. 

2
```
 #ifdef USE_LZ4
-#define DEFAULT_TOAST_COMPRESSION    TOAST_LZ4_COMPRESSION
+#define DEFAULT_TOAST_COMPRESSION    TOAST_LZ4_COMPRESSION_GUC
 #else
-#define DEFAULT_TOAST_COMPRESSION    TOAST_PGLZ_COMPRESSION
+#define DEFAULT_TOAST_COMPRESSION    TOAST_PGLZ_COMPRESSION_GUC
 #endif
```

Would it better to also rename DEFAULT_TOAST_COMPRESSION to DEFAULT_TOAST_COMPRESSION_GUC.

3
```
+#define TOAST_COMPRESS_PGLZ        0
+#define TOAST_COMPRESS_LZ4        1
+#define TOAST_COMPRESS_INVALID    2
```

Now TOAST_COMPRESS_PGLZ is 0, and TOAST_PGLZ_COMPRESSION is ‘p’. When they appear together in the code, it’s hard to
guesswhich is 0 and which is ‘p’. So, would it better to rename TOAST_COMPRESS_PGLZ to TOAST_PGLZ_COMPRESS_ID, and
renameTOAST_PGLZ_COMPRESSION to TOAST_PGLZ_COMPRESS_METHOD? 

4
```
     /*
-     * Call appropriate compression routine for the compression method.
+     * Translate the compression method char to the on-disk compression ID
+     * via the Method Registry, then dispatch to the appropriate compression
+     * routine.
      */
+    cmid = MethodToCompressionId(cmethod);
     switch (cmethod)
     {
         case TOAST_PGLZ_COMPRESSION:
             tmp = pglz_compress_datum((const varlena *) DatumGetPointer(value));
-            cmid = TOAST_PGLZ_COMPRESSION_ID;
             break;
         case TOAST_LZ4_COMPRESSION:
             tmp = lz4_compress_datum((const varlena *) DatumGetPointer(value));
-            cmid = TOAST_LZ4_COMPRESSION_ID;
             break;
         default:
             elog(ERROR, "invalid compression method %c", cmethod);
```

As the switch/default explicitly rejects invalid cmethod, I feel slightly better for readability to place "cmid =
MethodToCompressionId(cmethod);"after the switch clause. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Refactor code around GUC default_toast_compression
Next
From: Zhongpu Chen
Date:
Subject: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8