Hey hackers,
I’ve encountered a small issue in pg_dump.
It currently emits OVERRIDING SYSTEM VALUE in INSERTs for
a table that doesn't have an identity column if it used to have
a GENERATED ALWAYS AS IDENTITY column that was later dropped.
Simple repro:
CREATE TABLE demo (
    id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    fk_a BIGINT NOT NULL,
    fk_b BIGINT NOT NULL
);
INSERT INTO demo (fk_a, fk_b)
    OVERRIDING SYSTEM VALUE VALUES (1, 2);
ALTER TABLE demo DROP COLUMN id;
ALTER TABLE demo ADD PRIMARY KEY (fk_a, fk_b);
pg_dump --data-only --inserts --table=demo mydb
Expected:
INSERT INTO public.demo VALUES (1, 2);
Actual:
INSERT INTO public.demo OVERRIDING SYSTEM VALUE VALUES (1, 2);
The clause is harmless, but it's misleading and causes noisy diffs.
In CI setups that rely on deterministic dumps, this can add significant friction
to reviews. We currently work around it by first clearing the stale markers:
UPDATE pg_catalog.pg_attribute a
   SET attidentity = ''
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n
    ON n.oid = c.relnamespace
 WHERE a.attrelid = c.oid
   AND a.attisdropped
   AND a.attidentity <> ''
   AND n.nspname = 'public'
   AND c.relkind IN ('r', 'p');
Background:
After DROP COLUMN, the pg_attribute row is marked attisdropped = true,
but attidentity still has its old value ('a').
getTableAttrs() loops over all attributes (attnum > 0), including dropped
attributes, and does:
    tbinfo->needs_override = tbinfo->needs_override ||
        (tbinfo->attidentity[j] == ATTRIBUTE_IDENTITY_ALWAYS);
So a dropped column with attidentity = 'a' still flips needs_override to true,
and dumpTableData_insert() then always emits OVERRIDING SYSTEM VALUE
for the table.
The attached patch fixes this by ignoring dropped columns when setting
needs_override (i.e., checking !attisdropped).
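To make the difference concrete, here is a small standalone sketch of the two checks. It is not the patch itself: MiniTableInfo, needs_override_current(), needs_override_patched() and check_demo() are hypothetical names made up for illustration, and the struct only mimics the per-column attidentity and attisdropped arrays that pg_dump keeps. I am also assuming that an empty attidentity shows up as '\0' in that array.

```c
#include <assert.h>
#include <stdbool.h>

/* Same character the catalog stores for GENERATED ALWAYS ('a'). */
#define ATTRIBUTE_IDENTITY_ALWAYS 'a'

/* Hypothetical, stripped-down stand-in for pg_dump's per-table info. */
typedef struct MiniTableInfo
{
    int         numatts;       /* number of columns, dropped ones included */
    const char *attidentity;   /* per-column identity marker */
    const bool *attisdropped;  /* per-column "dropped" flag */
} MiniTableInfo;

/* Current behaviour: any column marked 'a' counts, even a dropped one. */
bool
needs_override_current(const MiniTableInfo *tbinfo)
{
    bool needs_override = false;

    for (int j = 0; j < tbinfo->numatts; j++)
        needs_override = needs_override ||
            (tbinfo->attidentity[j] == ATTRIBUTE_IDENTITY_ALWAYS);
    return needs_override;
}

/* Proposed behaviour: dropped columns are ignored. */
bool
needs_override_patched(const MiniTableInfo *tbinfo)
{
    bool needs_override = false;

    for (int j = 0; j < tbinfo->numatts; j++)
        needs_override = needs_override ||
            (!tbinfo->attisdropped[j] &&
             tbinfo->attidentity[j] == ATTRIBUTE_IDENTITY_ALWAYS);
    return needs_override;
}

/* "demo" after DROP COLUMN id: the dropped id column still carries 'a'. */
static const char demo_identity[] = {'a', '\0', '\0'};
static const bool demo_dropped[] = {true, false, false};
const MiniTableInfo demo = {3, demo_identity, demo_dropped};

/* A table whose identity column is still live, for comparison. */
static const char live_identity[] = {'a', '\0'};
static const bool live_dropped[] = {false, false};
const MiniTableInfo live = {2, live_identity, live_dropped};

void
check_demo(void)
{
    assert(needs_override_current(&demo));   /* stale marker triggers the clause */
    assert(!needs_override_patched(&demo));  /* dropped column is ignored */
    assert(needs_override_patched(&live));   /* live identity column still counts */
}
```

Calling check_demo() from a test program passes silently. The only behavioural change in this sketch is for tables whose sole ALWAYS identity column has been dropped.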
I also considered clearing attidentity in DROP COLUMN, but that wouldn't
address the problem for already-stale catalog entries.
This seems to have been around since identity columns were added in PostgreSQL 10.
Patch attached.
Thoughts?