Hi all,
While hacking on the TOAST code, I have been annoyed more than once
with the following piece in toast_compression.h:
/*
 * Built-in compression method ID. The toast compression header will store
 * this in the first 2 bits of the raw length. These built-in compression
 * method IDs are directly mapped to the built-in compression methods.
 *
 * Don't use these values for anything other than understanding the meaning
 * of the raw bits from a varlena; in particular, if the goal is to identify
 * a compression method, use the constants TOAST_PGLZ_COMPRESSION, etc.
 * below. We might someday support more than 4 compression methods, but
 * we can never have more than 4 values in this enum, because there are
 * only 2 bits available in the places where this is stored.
 */
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_INVALID_COMPRESSION_ID = 2,
} ToastCompressionId;
This is due to the fact that we have only two bits that can be used in
va_tcinfo or va_extinfo. While looking at the addition of a new
compression method, this was causing a mess, so I have hacked up the
attached patch, which makes the addition of more compression methods
easier. The idea is centralized in toast_compression.c, with the
addition of a registry that knows about all the TOAST compression
methods and their metadata:
- Name.
- GUC enum value.
- attcompression char value.
- varatt on-disk value.
This is coupled with a set of translation routines, used in other code
paths. This also has the merit of removing TOAST_INVALID_COMPRESSION_ID
from the list of GUC values, which did not really make sense to begin
with. I don't deny that the addition of a new compression method
would require more tweaks, particularly for the decompression part,
but I think that this is a nice cleanup anyway. I have added this to
the next commit fest, to be considered for v20.
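
To make the idea a bit more concrete, here is a rough, standalone sketch
of what such a registry and its translation routines could look like.
None of the identifiers below come from the patch: the struct, the array,
the functions and the SKETCH_* macros are made up for illustration, and
the GUC numbers are placeholders. It assumes 'p' and 'l' as the
attcompression characters, and a method ID stored in the 2 bits above a
30-bit raw size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustration only: every identifier here is hypothetical and none of it
 * is taken from the patch.
 */

/* Assumed layout: 30 low bits of raw size, 2 high bits of method ID. */
#define SKETCH_EXTSIZE_BITS 30
#define SKETCH_EXTSIZE_MASK ((UINT32_C(1) << SKETCH_EXTSIZE_BITS) - 1)

typedef struct ToastMethodSketch
{
    const char *name;           /* user-visible method name */
    int         guc_value;      /* GUC enum value (placeholder numbers) */
    char        attcompression; /* char stored in pg_attribute */
    uint32_t    ondisk_id;      /* 2-bit value stored in the varlena */
} ToastMethodSketch;

/* Single place that knows about every method and its metadata. */
static const ToastMethodSketch sketch_methods[] = {
    {"pglz", 0, 'p', 0},
    {"lz4", 1, 'l', 1},
};

#define SKETCH_NMETHODS (sizeof(sketch_methods) / sizeof(sketch_methods[0]))

/* Extract the 2-bit method ID from the packed word. */
static uint32_t
sketch_tcinfo_method_id(uint32_t tcinfo)
{
    return tcinfo >> SKETCH_EXTSIZE_BITS;
}

/* Extract the 30-bit raw size from the packed word. */
static uint32_t
sketch_tcinfo_rawsize(uint32_t tcinfo)
{
    return tcinfo & SKETCH_EXTSIZE_MASK;
}

/* Translation routines: each returns NULL for an unknown input. */
static const ToastMethodSketch *
sketch_method_by_ondisk_id(uint32_t id)
{
    for (size_t i = 0; i < SKETCH_NMETHODS; i++)
        if (sketch_methods[i].ondisk_id == id)
            return &sketch_methods[i];
    return NULL;
}

static const ToastMethodSketch *
sketch_method_by_attcompression(char c)
{
    for (size_t i = 0; i < SKETCH_NMETHODS; i++)
        if (sketch_methods[i].attcompression == c)
            return &sketch_methods[i];
    return NULL;
}

static const ToastMethodSketch *
sketch_method_by_name(const char *name)
{
    for (size_t i = 0; i < SKETCH_NMETHODS; i++)
        if (strcmp(sketch_methods[i].name, name) == 0)
            return &sketch_methods[i];
    return NULL;
}
```

With a layout like this, an unknown on-disk ID or name simply yields
NULL from the lookup, so no "invalid" entry has to appear among the
user-visible values.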
Thanks,
--
Michael