Thread: Save failed records into auxiliary table
Hello Everyone,
I am writing to ask help about a use case I have to set up: if anyone can provide me with any suggestions, I would be really grateful.
This is my use case:
I have a table, with a primary key composed of two columns, each one, of course, with a not null constraint.
Each night a bunch of data will be loaded from an external ETL, which is out of my control.
Since I know there might be data quality issues, such as duplicated records or invalid records with null values on not-null columns, I would like to save such records, and only such records, in an auxiliary table.
Currently, I tried two solutions:
1) Trigger on insert: but I could not have the insert into the auxiliary table working, because the constraints violation triggers a rollback
2) Rule on insert, with a "DO INSTEAD" condition, which works, but, in case of invalid records, copies all the row of the transaction in the auxiliary table and not only the invalid ones.
This is what I did:
Let's assume that my table is actually name "mytable", with pk composed of (col1,col2).
The auxiliary table is mytable_failures, which has the same columns of mytable, no constraints and an additional column "fail_reason".
These are the rules:
CREATE or REPLACE RULE insert_mytable_nulls AS
ON INSERT TO mytable
where (length(trim(NEW.col1))=0 OR
length(trim(NEW.col2))=0 OR
NEW.col1 IS NULL OR
NEW.col2 IS NULL)
DO INSTEAD
insert into mytable_failures values(NEW.*,'col1 and col2 cannot be null');
CREATE or REPLACE RULE insert_mytable_pkey AS
ON INSERT TO mytable
where (select true from mytable where col1=NEW.col1 and col2=NEW.col2)
DO INSTEAD
insert into mytable_failures values(NEW.*,'Primary Key violation');
CREATE or REPLACE RULE insert_mytable_nulls AS
ON INSERT TO mytable
where (length(trim(NEW.col1))=0 OR
length(trim(NEW.col2))=0 OR
NEW.col1 IS NULL OR
NEW.col2 IS NULL)
DO INSTEAD
insert into mytable_failures values(NEW.*,'col1 and col2 cannot be null');
CREATE or REPLACE RULE insert_mytable_pkey AS
ON INSERT TO mytable
where (select true from mytable where col1=NEW.col1 and col2=NEW.col2)
DO INSTEAD
insert into mytable_failures values(NEW.*,'Primary Key violation');
If I execute the following transaction batch:
INSERT INTO mytable(
col1, col2, col3, col3, last_update)
VALUES ('UK', 'FB00004', 'en', now());
INSERT INTO mytable(
col1, col2, col3, col3, last_update)
VALUES ('UK', 'FB00005', 'en', now());
INSERT INTO mytable(
col1, col2, col3, col3, last_update)
VALUES ('UK', 'FB00004', 'en', now());
col1, col2, col3, col3, last_update)
VALUES ('UK', 'FB00004', 'en', now());
INSERT INTO mytable(
col1, col2, col3, col3, last_update)
VALUES ('UK', 'FB00005', 'en', now());
INSERT INTO mytable(
col1, col2, col3, col3, last_update)
VALUES ('UK', 'FB00004', 'en', now());
1) the first two rows are correctly inserted in mytable and the third discarded
2) incorrectly, all the three rows in the mytable_failures.
Thank you for all the help you can provide
Mimo
Am 12.11.20 um 10:00 schrieb Il Mimo di Creta: > Hello Everyone, > > I am writing to ask help about a use case I have to set up: if anyone > can provide me with any suggestions, I would be really grateful. > > This is my use case: > I have a table, with a primary key composed of two columns, each one, > of course, with a not null constraint. > Each night a bunch of data will be loaded from an external ETL, which > is out of my control. > Since I know there might be data quality issues, such as duplicated > records or invalid records with null values on not-null columns, I > would like to save such records, and only such records, in an > auxiliary table. > > Currently, I tried two solutions: > 1) Trigger on insert: but I could not have the insert into the > auxiliary table working, because the constraints violation triggers a > rollback > 2) Rule on insert, with a "DO INSTEAD" condition, which works, but, in > case of invalid records, copies all the row of the transaction in the > auxiliary table and not only the invalid ones. > > This is what I did: > > Let's assume that my table is actually name "mytable", with pk > composed of (col1,col2). > > The auxiliary table is mytable_failures, which has the same columns of > mytable, no constraints and an additional column "fail_reason". > > These are the rules: > CREATE or REPLACE RULE insert_mytable_nulls AS > ON INSERT TO mytable > where (length(trim(NEW.col1))=0 OR > length(trim(NEW.col2))=0 OR > NEW.col1 IS NULL OR > NEW.col2 IS NULL) > DO INSTEAD > insert into mytable_failures values(NEW.*,'col1 and col2 cannot be null'); > > > CREATE or REPLACE RULE insert_mytable_pkey AS > ON INSERT TO mytable > where (select true from mytable where col1=NEW.col1 and col2=NEW.col2) > DO INSTEAD > insert into mytable_failures values(NEW.*,'Primary Key violation'); > > If I execute the following transaction batch: > > INSERT INTO mytable( > col1, col2, col3, col3, last_update) > VALUES ('UK', 'FB00004', 'en', now()); > INSERT INTO mytable( > col1, col2, col3, col3, last_update) > VALUES ('UK', 'FB00005', 'en', now()); > INSERT INTO mytable( > col1, col2, col3, col3, last_update) > VALUES ('UK', 'FB00004', 'en', now()); > > 1) the first two rows are correctly inserted in mytable and the third > discarded > 2) incorrectly, all the three rows in the mytable_failures. > > Thank you for all the help you can provide > > Mimo I think this might help: https://www.postgresqltutorial.com/postgresql-upsert/ -- Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
Attachment
On Thu, Nov 12, 2020 at 2:01 AM Il Mimo di Creta <mimo.creta@gmail.com> wrote:
I would like to save such records, and only such records, in an auxiliary table.
You will probably need to resort to loading the incoming data into an unlogged (and consider temporary) staging table then executing one query to load the known good data into the main table and a second query to load the known bad data into the auxiliary table. For me this is the most direct solution generally anyway and should be used absent a performance problem necessitating optimization work.
Upsert isn't going to help you load incoming data into an auxiliary table, it can only adjust the data to avoid the conflict relative to the referenced table.
David J.
Thank you David,
I tried other solutions, but at this point I agree this looks like the most viable way to solve the problem.
Thank you for your help
On Thu, Nov 12, 2020 at 4:26 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Nov 12, 2020 at 2:01 AM Il Mimo di Creta <mimo.creta@gmail.com> wrote:I would like to save such records, and only such records, in an auxiliary table.You will probably need to resort to loading the incoming data into an unlogged (and consider temporary) staging table then executing one query to load the known good data into the main table and a second query to load the known bad data into the auxiliary table. For me this is the most direct solution generally anyway and should be used absent a performance problem necessitating optimization work.Upsert isn't going to help you load incoming data into an auxiliary table, it can only adjust the data to avoid the conflict relative to the referenced table.David J.