cmax docs seem misleading - Mailing list pgsql-docs

From Paul A Jungwirth
Subject cmax docs seem misleading
Msg-id CA+renyWVogpNSTug5e+PTMWmTOcj8UXsAhHuHiavsVU0rzNpUQ@mail.gmail.com
List pgsql-docs
The docs for cmax say:[0]

> The command identifier within the deleting transaction, or zero.

This was true once upon a time, I think. But nowadays cmax and cmin
are the same physical field, and the user-facing system columns don't
seem to be trying to interpret it. For example:

[v19devel:5432][334102] regression=# create table pj (a int);
CREATE TABLE
[v19devel:5432][334102] regression=# begin; insert into pj values (1); insert into pj values (2); commit;
BEGIN
INSERT 0 1
INSERT 0 1
COMMIT
[v19devel:5432][334102] regression=# select ctid, xmin, xmax, cmin, cmax, * from pj;
 ctid  | xmin  | xmax | cmin | cmax | a
-------+-------+------+------+------+---
 (0,1) | 22424 |    0 |    0 |    0 | 1
 (0,2) | 22424 |    0 |    1 |    1 | 2

So here you have a non-zero cmax for a not-deleted row.

The converse isn't true either: a zero cmax doesn't mean the row is
undeleted. "Or zero" hints that deleted rows always have a non-zero
value, but 0 is also just the id of the first command in a
transaction. (Null would be a meaningful signal, but I assume we
don't want to do that.)

As far as I can tell, it is impossible to observe cmin <> cmax. From
heap_getsysattr (access/common/heaptuple.c):

        case MinCommandIdAttributeNumber:
        case MaxCommandIdAttributeNumber:

            /*
             * cmin and cmax are now both aliases for the same field, which
             * can in fact also be a combo command id.  XXX perhaps we should
             * return the "real" cmin or cmax if possible, that is if we are
             * inside the originating transaction?
             */
            result = CommandIdGetDatum(HeapTupleHeaderGetRawCommandId(tup->t_data));
            break;

So it looks like these system columns also don't look up combocids.
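
To make the aliasing concrete, here is a toy model in C. It is only a
sketch: the struct and the toy_* functions are hypothetical names invented
for illustration, not PostgreSQL's real tuple header or API. It assumes the
delete comes from a different transaction than the insert, so it ignores
combo command ids entirely. The point is just that with one stored field,
both accessors must return the same value, and a delete overwrites the
insert's command id.

```c
/* Toy model only: every name here is hypothetical, not PostgreSQL code. */
#include <assert.h>
#include <stdint.h>

typedef uint32_t ToyCommandId;

typedef struct ToyTupleHeader
{
    uint32_t     xmin;     /* inserting transaction id */
    uint32_t     xmax;     /* deleting transaction id, 0 if not deleted */
    ToyCommandId raw_cid;  /* the single field behind both cmin and cmax */
} ToyTupleHeader;

/* Insert: remember the inserting transaction and its command id. */
static void
toy_insert(ToyTupleHeader *tup, uint32_t xid, ToyCommandId cid)
{
    tup->xmin = xid;
    tup->xmax = 0;
    tup->raw_cid = cid;
}

/*
 * Delete by another transaction: the deleter's command id replaces the
 * inserter's, because there is nowhere else to store it in this model.
 */
static void
toy_delete(ToyTupleHeader *tup, uint32_t xid, ToyCommandId cid)
{
    tup->xmax = xid;
    tup->raw_cid = cid;
}

/* Both "system columns" read the same raw field, with no interpretation. */
static ToyCommandId
toy_get_cmin(const ToyTupleHeader *tup)
{
    return tup->raw_cid;
}

static ToyCommandId
toy_get_cmax(const ToyTupleHeader *tup)
{
    return tup->raw_cid;
}
```

In this model a row inserted by the second command of a transaction
reports cmin = cmax = 1 while still live, and after a delete issued as the
first command of another transaction it reports cmin = cmax = 0, which is
why neither value says anything on its own about deletion.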

I'm not interested in changing any of this, but I think we could clean
up the docs a little. The description for cmin is questionable too:

> The command identifier (starting at zero) within the inserting transaction.

That's true if the row hasn't been deleted yet, but then we overwrite the field.

Here is a patch to make both of these fields a little clearer, I
think. It could be improved further by some glossary entries
explaining what a command id is (and a combocid). Or maybe that's too
much information? And maybe we should be more drastic: combine cmin &
cmax into one entry, and explain that they are two names for the same
value, which might signify the insert cid, the delete cid, or a
combocid.

[0] https://www.postgresql.org/docs/current/ddl-system-columns.html#DDL-SYSTEM-COLUMNS-CMAX

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
