On Wed, Apr 29, 2026 at 2:10 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2026-04-29 10:07:04 +0200, Hannu Krosing wrote:
> > On Wed, Apr 29, 2026 at 5:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > I am trying to understand your idea. If we are trying to deparse from
> > > an actual system table using a snapshot, why don't we just use the
> > > WAL? I mean, the WAL should contain the actual catalog modifications
> > > it has made.
> >
> > We have the full data in the catalog and we would likely need catalog
> > queries for any change, even when de-parsing the tree.
> >
> > And we should not add the extra load on the original DDL side, just as
> > we don't for DML.
>
> That can't be a relevant cost compared to everything else.
Probably not. But unless we somehow encode "everything" at that point
we will make building different DDL decoders harder down the line.
So why not just save the normally serialised parse tree at this point
and let the decoders decide to do whatever they need.
> > At most we could just serialize the statement tree into the WAL,
> > though even that may be an overkill if we can get the change from
> > existing records.
> >
> > - insert new row in pg_class --> extract the CREATE TABLE (or INDEX, or ...)
> > - update row in pg_class or insert, update or delete a row in
> > pg_attribute --> extract ALTER TABLE
> > - except when it just updates relfilenod --> extract TRUNCATE
> > - delete row in pg_class --> DROP TABLE
> > - dml on pg_constraint --> ALTER TABLE
> >
> > ... etc
>
> That doesn't work in the general case, think of
> ALTER TABLE ... ALTER COLUMN ... TYPE foo USING (...)
>
> There's a big difference between USING(foo::int8) and USING (pg_size_bytes(foo))
> but it's nowhere visible in the WAL.
It can't be a big difference if it is not visible in the WAL.
Currently, we do treat DML exactly this way (or arguably worse).
In the following all the updates are decoded exactly the same
CREATE TABLE t(id int primary key, data text);
INSERT INTO t VALUES(1, 'one');
UPDATE t SET data='one' where id=1;
UPDATE t SET id=id;
UPDATE t SET id=10-9;
UPDATE t SET data='one';
ALL of the above get decoded as "UPDATE t SET data='one' where id=1;"
That is, we do not care how the values got there, as long as the end
result is the same.
And we do not track which fields were actually changed
The only reasons I see why we could not do the same for DDL are
1. it would be significantly more expensive to do so
or
2. we plan to fix some of that for DML as well and to start tracking
more of the intent in DML by extracting that from the statement trees.
--
Hannu