Re: Validating check constraints without a table scan? - Mailing list pgsql-general
From: Torsten Förtsch
Subject: Re: Validating check constraints without a table scan?
Msg-id: CAKkG4_=pL3s0cpr4d4hWcobRcdBF0=73RMcasiSz8vM9dcubxw@mail.gmail.com
In response to: Validating check constraints without a table scan? (Philip Couling <couling@gmail.com>)
Responses:
  Re: Validating check constraints without a table scan?
  Re: Validating check constraints without a table scan?
List: pgsql-general
On Fri, Nov 15, 2024 at 9:38 AM Philip Couling <couling@gmail.com> wrote:
Is there a solid reason why adding a check constraint does not use existing indexes for validation? We are currently looking at partitioning a multi-TB table, leaving all existing data in place and simply attaching it as a partition to a new table. To prevent locking, we are trying to add an INVALID check constraint first and then validate it. I can trivially prove the invalid constraint is valid with a simple SELECT, which will use an existing index and return instantaneously. But AFAIK there's no way to mark a constraint as valid without scanning all the rows.
Most likely your query is not exactly the same as the check constraint. Think about NULL and similar.
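For instance, CHECK constraints follow SQL three-valued logic: a row only violates the constraint when the expression evaluates to FALSE, so NULL passes. A hypothetical table illustrates how a seemingly equivalent SELECT predicate diverges from the constraint's semantics:

```sql
-- CHECK (x > 0) only fails on FALSE, so a row with x IS NULL passes it.
CREATE TEMP TABLE t (x int);
INSERT INTO t VALUES (1), (NULL);

ALTER TABLE t ADD CONSTRAINT x_positive CHECK (x > 0);  -- succeeds: NULL passes

SELECT count(*) FROM t WHERE x > 0;            -- 1, not 2: the WHERE filters the NULL row out
SELECT count(*) FROM t WHERE (x > 0) IS FALSE; -- 0: no row actually violates the check
```

So a SELECT only "proves" the constraint if its predicate matches the constraint's violating set, `(expr) IS FALSE`, exactly.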
This operation is really problematic on a production database with heavy IO load. Is there a solid reason why validating check constraints cannot use existing indexes? If I can prove the constraint is valid so trivially with a SELECT, then why can Postgres not do the same (or similar)?
Here is what has worked for me many times:
1. create the check constraint as NOT VALID. From now on no new or updated row is allowed to violate it.
2. check if the constraint holds with a query on a binary replica. Make sure the query starts only when the constraint is visible on the replica.
3. on the master: UPDATE pg_constraint SET convalidated=true WHERE conname='your_constraint_name' AND conrelid='schema.table'::regclass;
How you perform step 2 is irrelevant. Checking it on a replica would simply avoid the load on the master. You just need to make sure there is no conflicting data in the table.
WARNING, you need to be damn sure of your data if you do that. But if you are, it works.
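For step 2, the verification query has to match the constraint's semantics: a row violates CHECK (expr) only when expr evaluates to FALSE (NULL passes). A minimal sketch, with placeholder names:

```sql
-- Run on the replica once the NOT VALID constraint is visible there.
-- Must return 0 before performing step 3 on the master.
SELECT count(*)
FROM schema.table
WHERE (expr) IS FALSE;   -- expr is the exact expression from CHECK (expr)
```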
Here is the procedure with which I solved the same problem for some of our multi-TB tables (PG14):
The table has a column called transaction_time, and we wanted to partition by that column. For some historical reason the column did not have a NOT NULL constraint. However, there was no way our processes could insert NULL into that column, and there was no row with NULL in that field. So, the first step was to add the NOT NULL constraint:
BEGIN;
ALTER TABLE my.table
ADD CONSTRAINT transaction_time_not_null CHECK(transaction_time IS NOT NULL) NOT VALID;
UPDATE pg_constraint
SET convalidated=true
WHERE conname = 'transaction_time_not_null'
AND conrelid = 'my.table'::REGCLASS
RETURNING conname, conrelid::REGCLASS, convalidated;
COMMIT;
Now, for cosmetic purposes, we turn the check constraint above into a normal NOT NULL constraint:
BEGIN;
SET LOCAL client_min_messages = 'debug4';
-- expecting this message
-- DEBUG: existing constraints on column "table.transaction_time" are sufficient to prove that it does not contain nulls
ALTER TABLE my.table
ALTER COLUMN transaction_time SET NOT NULL;
RESET client_min_messages;
ALTER TABLE my.table
DROP CONSTRAINT transaction_time_not_null;
COMMIT;
If you set client_min_messages to something like debug4, then the database tells you if it wants to scan the table or if existing constraints are sufficient to prove the condition.
In our case transaction_time is never in the future, and database transactions are short-lived; nothing exceeds 30 seconds. So I can trust that the effect of any action performed more than 30 seconds ago is visible in the database.
SELECT date_trunc('hour', now()+'1h 10min'::interval)::timestamp AS switch_time\gset
Next comes the actual change:
BEGIN;
-- rename the existing table
ALTER TABLE my.table RENAME TO table_old;
-- drop triggers. We will recreate them later.
DROP TRIGGER ... ON my.table_old;
DROP TRIGGER ...;
DROP TRIGGER ...;
-- create partitioned table
CREATE TABLE my.table (
LIKE my.table_old
INCLUDING DEFAULTS
INCLUDING CONSTRAINTS
)
PARTITION BY RANGE (transaction_time);
-- recreate triggers
CREATE TRIGGER ...
BEFORE DELETE ON my.table
FOR EACH ROW EXECUTE FUNCTION my.trigger_fun();
CREATE TRIGGER ...;
-- create the partition for future rows
CREATE TABLE my.table_current PARTITION OF my.table
FOR VALUES FROM (:'switch_time') TO ('infinity');
-- and some indexes and FK
ALTER TABLE my.table_current ADD PRIMARY KEY (id);
CREATE INDEX ON my.table_current (...);
CREATE INDEX ON my.table_current (...);
ALTER TABLE my.table_current
ADD CONSTRAINT fk_name
FOREIGN KEY (...)
REFERENCES ...(...) ON UPDATE ... ON DELETE ...;
-- now attach the old table as partition
ALTER TABLE my.table_old
ADD CONSTRAINT partition_boundaries
CHECK('-infinity'<=transaction_time AND transaction_time<:'switch_time')
NOT VALID;
-- for procedural reasons we know the constraint is valid. Let's make PG believe it too.
UPDATE pg_constraint
SET convalidated=true
WHERE conname='partition_boundaries'
AND conrelid='my.table_old'::REGCLASS
RETURNING conname, conrelid::REGCLASS, convalidated;
-- now attach it. We use again debug4 to check if the table scan is avoided.
SET LOCAL client_min_messages = 'debug4';
-- expecting
-- DEBUG: partition constraint for table "table_old" is implied by existing constraints
ALTER TABLE my.table ATTACH PARTITION my.table_old
FOR VALUES FROM ('-infinity') TO (:'switch_time');
RESET client_min_messages;
-- drop the now unnecessary constraint
ALTER TABLE my.table_old
DROP CONSTRAINT partition_boundaries;
COMMIT;
Once the new partition gets too full, we will use a similar procedure to adjust the partition boundary of the new partition and then create the next partition.
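A sketch of that follow-up, using the same trick (detach, add a NOT VALID boundary constraint we know holds, re-attach with a narrowed range); the `:'new_switch_time'` variable and the table names are hypothetical, and this assumes the new boundary still lies in the future so no rows fall outside it:

```sql
BEGIN;
-- detach the partition that has filled up
ALTER TABLE my.table DETACH PARTITION my.table_current;
-- add the narrowed boundary as NOT VALID, then mark it valid via
-- pg_constraint exactly as above (we know the data complies)
ALTER TABLE my.table_current
    ADD CONSTRAINT partition_boundaries
    CHECK(:'switch_time'<=transaction_time AND transaction_time<:'new_switch_time')
    NOT VALID;
-- re-attach with the narrowed upper bound; debug4 should again report
-- that the partition constraint is implied by existing constraints
ALTER TABLE my.table ATTACH PARTITION my.table_current
    FOR VALUES FROM (:'switch_time') TO (:'new_switch_time');
ALTER TABLE my.table_current
    DROP CONSTRAINT partition_boundaries;
-- create the next partition for future rows
CREATE TABLE my.table_next PARTITION OF my.table
    FOR VALUES FROM (:'new_switch_time') TO ('infinity');
COMMIT;
```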