Thread: Separating data sets in a table
Hello,

I want to solve the following problem:

CREATE TABLE Ref ( Id int ) ;
CREATE TABLE Import ( Id int,
                      Other varchar(42),
                      Flag int,
                      Ts timestamp ) ;
CREATE TABLE Data ( Id int,
                    Other varchar(42) ) ;

The table Import will be filled by a COPY FROM statement and contains
no checks for referential integrity. The columns Id and Other have to
be moved to the table Data if the table Ref contains the Id. If not,
Flag should get a certain value to indicate that something went wrong.
Moreover, Import should only contain one representation of each data set
with equal Id and Other columns, and I would like to keep the newest one
(this is the reason for the timestamp).

I tried the following approach:

CREATE LOCAL TEMPORARY TABLE ImportOK () INHERITS (Import) ;

INSERT INTO ImportOK SELECT i.* FROM Import i
    INNER JOIN Ref r ON i.Id = r.Id;

DELETE FROM Import WHERE Id IN (SELECT Id FROM ImportOK) ;

The idea was that the last statement should get rid of all valid data
sets in Import. The valid data sets could then be moved to Data, and I
could afterwards check Import for duplicated data sets. Unfortunately
the last statement is so terribly slow that I can't help thinking there
must be a better way to do this. It seems like a very beginner question,
but I have no real clue how to do this right. Probably the solution has
to be approached completely differently.

Thanks for your patience

Andreas.
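The whole flow described above (deduplicate keeping the newest timestamp, move known Ids to Data, flag the rest) can be sketched end to end. This is an illustrative sketch only, using Python's sqlite3 in place of Postgres so it can run anywhere; the flag value 1 is a hypothetical error code, and Postgres syntax differs in minor details.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Ref (Id INTEGER);
CREATE TABLE Import (Id INTEGER, Other TEXT, Flag INTEGER, Ts TEXT);
CREATE TABLE Data (Id INTEGER, Other TEXT);
INSERT INTO Ref VALUES (1), (2);
INSERT INTO Import VALUES
    (1, 'a', NULL, '2002-08-20'),
    (1, 'a', NULL, '2002-08-22'),   -- duplicate of (1,'a'); newer, keep it
    (3, 'c', NULL, '2002-08-21');   -- Id 3 is unknown in Ref
""")

# 1. Keep only the newest row per (Id, Other) pair.
cur.execute("""
DELETE FROM Import
WHERE EXISTS (SELECT 1 FROM Import AS newer
              WHERE newer.Id = Import.Id
                AND newer.Other = Import.Other
                AND newer.Ts > Import.Ts)
""")

# 2. Move rows whose Id is known in Ref over to Data ...
cur.execute("""
INSERT INTO Data
SELECT i.Id, i.Other FROM Import AS i
WHERE EXISTS (SELECT 1 FROM Ref AS r WHERE r.Id = i.Id)
""")
cur.execute("""
DELETE FROM Import
WHERE EXISTS (SELECT 1 FROM Ref WHERE Ref.Id = Import.Id)
""")

# 3. ... and flag everything left behind (1 = hypothetical "unknown Id" code).
cur.execute("UPDATE Import SET Flag = 1")

print(cur.execute("SELECT Id, Other FROM Data").fetchall())    # [(1, 'a')]
print(cur.execute("SELECT Id, Flag FROM Import").fetchall())   # [(3, 1)]
```

The EXISTS-based statements here foreshadow the performance point made later in the thread: each of them can be driven by an index lookup instead of materializing the subquery.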
On Thu, 22 Aug 2002, Andreas Tille wrote:

> I want to solve the following problem:
>
> CREATE TABLE Ref ( Id int ) ;
> CREATE TABLE Import ( Id int,
>                       Other varchar(42),
>                       Flag int,
>                       Ts timestamp ) ;
> CREATE TABLE Data ( Id int,
>                     Other varchar(42) ) ;
>
> The table Import will be filled by a COPY FROM statement and contains
> no checks for referential integrity. The columns Id and Other have to
> be moved to the table Data if the table Ref contains the Id. If not
> Flag should get a certain value that something went wrong. Moreover
> Import should only contain one representation of a dataset with equal
> Id and Other column and I would like to store the newest one (this
> is the reason for the timestamp).
> ...

While I fiddled around with my previous solution I found that perhaps an
updatable view could solve my problem. Thus I tried to implement:

CREATE TABLE ViewImport ( Id int,
                          Other varchar(42),
                          Flag int,
                          Ts timestamp ) ;

CREATE RULE "_RETviewimport" AS ON SELECT TO ViewImport DO INSTEAD
    SELECT i.* FROM Import i INNER JOIN Ref r ON i.Id = r.Id;

Now I wonder whether it is possible to find a clever rule for UPDATE
which contains the JOIN I placed into this working example, to set the
Flag right away for the appropriate data sets. Any help with this
approach, or any other solution for this problem?

Kind regards

Andreas.
On Thu, 22 Aug 2002, Andreas Tille wrote:

> Hello,
>
> I want to solve the following problem:
>
> CREATE TABLE Ref ( Id int ) ;
> CREATE TABLE Import ( Id int,
>                       Other varchar(42),
>                       Flag int,
>                       Ts timestamp ) ;
> CREATE TABLE Data ( Id int,
>                     Other varchar(42) ) ;

Hello Andreas,

It might be easier to help you with your problem if you take a step back
from it. What you are describing here is an attempt at solving your
particular problem, but you don't clearly define what the larger problem
is. I get the sense that you have data you are importing on a regular
basis from outside Postgres, and you want to check it before it gets
moved into production, but I'm not exactly sure what's happening.

Could you clarify what you are trying to do, not in SQL, but just the
English version of what's happening? I think it might be easier to help
you solve your problem with that knowledge.

Thanks,

-mark

http://mark.stosberg.com/
On Sat, 24 Aug 2002, Mark Stosberg wrote:

> On Thu, 22 Aug 2002, Andreas Tille wrote:
> > Hello,
> >
> > I want to solve the following problem:
> >
> > CREATE TABLE Ref ( Id int ) ;
> > CREATE TABLE Import ( Id int,
> >                       Other varchar(42),
> >                       Flag int,
> >                       Ts timestamp ) ;
> > CREATE TABLE Data ( Id int,
> >                     Other varchar(42) ) ;
> larger problem. I get the sense that you have data you are importing on a
> regular basis from outside Postgres, and you want to check it before
> it gets moved into production, but I'm not exactly sure what's happening.

You are completely right. I do an import from an external database. The
person I obtain the data from dumps the table in a form suitable for a
"COPY FROM". The problem is that it might happen that there are some
data rows which infringe referential integrity, and then I have to ask
the data provider for the additional data sets that are referenced by
the Id mentioned above. So I have to sort out those data sets which have
no known Id in my production data.

Kind regards

Andreas.
On Sun, 25 Aug 2002, Andreas Tille wrote:

> On Sat, 24 Aug 2002, Mark Stosberg wrote:
> > On Thu, 22 Aug 2002, Andreas Tille wrote:
> > > Hello,
> > >
> > > I want to solve the following problem:
> > >
> > > CREATE TABLE Ref ( Id int ) ;
> > > CREATE TABLE Import ( Id int,
> > >                       Other varchar(42),
> > >                       Flag int,
> > >                       Ts timestamp ) ;
> > > CREATE TABLE Data ( Id int,
> > >                     Other varchar(42) ) ;
> > larger problem. I get the sense that you have data you are importing on a
> > regular basis from outside Postgres, and you want to check it before
> > it gets moved into production, but I'm not exactly sure what's happening.
>
> You are completely right. I do an import from an external database.
> The person I obtain the data from dumps the table in a form suitable
> for a "COPY FROM". The problem is that it might happen that there are
> some data rows which infringe referential integrity, and then I have
> to ask the data provider for the additional data sets that are
> referenced by the Id mentioned above. So I have to sort out those data
> sets which have no known Id in my production data.

Andreas,

Thanks for the clarification. Here's an idea about how to solve your
problem. As you are importing your data, instead of doing it all at
once, try importing it a row at a time into a table that has the RI
turned on. Check each insert to see if it's successful. If it's not
successful, then insert that row into a table that /doesn't/ have RI
(maybe "import_failures"), perhaps also including the error that
Postgres returned. (This may be stored in $DBI::errstr.) Then when you
are done, you can look in import_failures for a report of which rows
need some assistance.

If you need every row that's imported into the production table to
succeed, you can do all this inside of a transaction and roll it back if
any of the inserts fail. [ thinks for a moment. ] Of course, that would
normally roll back your inserts into import_failures too, so perhaps you
can use a second database connection to make sure those always happen.

I hope that helps. Perhaps thinking in terms of "row-at-a-time
processing" will help you solve your problem.

-mark

http://mark.stosberg.com/
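The row-at-a-time idea above can be sketched as follows. Mark's suggestion is phrased in Perl/DBI terms; this illustrative version uses Python's sqlite3 instead so it is self-contained, with a foreign key standing in for the RI check. The table and column names follow the thread; the `import_failures` layout is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # sqlite needs this to enforce RI
con.executescript("""
CREATE TABLE Ref (Id INTEGER PRIMARY KEY);
CREATE TABLE Data (Id INTEGER REFERENCES Ref(Id), Other TEXT);
CREATE TABLE import_failures (Id INTEGER, Other TEXT, error TEXT);
INSERT INTO Ref VALUES (1), (2);
""")

rows = [(1, 'a'), (3, 'c'), (2, 'b')]   # (3, 'c') violates the FK

for id_, other in rows:
    try:
        # One insert per row, so each RI failure is caught individually.
        con.execute("INSERT INTO Data VALUES (?, ?)", (id_, other))
    except sqlite3.IntegrityError as e:
        # Failed rows land in import_failures together with the error text.
        con.execute("INSERT INTO import_failures VALUES (?, ?, ?)",
                    (id_, other, str(e)))

print(con.execute("SELECT Id FROM Data ORDER BY Id").fetchall())  # [(1,), (2,)]
print(con.execute("SELECT Id FROM import_failures").fetchall())   # [(3,)]
```

As the message notes, if the inserts run inside one transaction, a rollback would discard the failure log too; writing the failures through a second connection avoids that.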
On Sun, 25 Aug 2002, Mark Stosberg wrote:

> Thanks for the clarification. Here's an idea about how to solve your
> problem. As you are importing your data, instead of doing it all at
> once, try importing it a row at a time into a table that has the RI
> turned on. Check each insert to see if it's successful. If it's not
> successful, then insert that row into a table that /doesn't/ have RI
> (maybe "import_failures"), perhaps also including the error that
> Postgres returned. (This may be stored in $DBI::errstr.) Then when you
> are done, you can look in import_failures for a report of which rows
> need some assistance. If you need every row that's imported into the
> production table to succeed, you can do all this inside of a
> transaction and roll it back if any of the inserts fail. [ thinks for
> a moment. ] Of course, that would normally roll back your inserts into
> import_failures too, so perhaps you can use a second database
> connection to make sure those always happen.
>
> I hope that helps. Perhaps thinking in terms of "row-at-a-time
> processing" will help you solve your problem.

Well, for sure this might be an option, but as I said I receive the data
in the dump format appropriate for "COPY <tablename> FROM <file>". Would
you really suggest that I split those data sets into single lines?
Moreover I'm not sure how to catch the error messages of failed COPY
statements. I thought that importing all data and handling them
afterwards would be a good idea, and it is just my lack of SQL knowledge
which prevents me from finding a clever solution to sort the stuff out.

Kind regards

Andreas.
On Mon, 2002-08-26 at 07:26, Andreas Tille wrote:

> Well, for sure this might be an option, but as I said I receive the data
> in the dump format appropriate for "COPY <tablename> FROM <file>". Would
> you really suggest that I split those data sets into single lines?
> Moreover I'm not sure how to catch the error messages of failed
> COPY statements.

How about this approach:

Create a temporary table (no constraints):

    CREATE TEMP TABLE temptable AS
        (SELECT * FROM tablename LIMIT 1);
    DELETE FROM temptable;

Copy all data into the temporary table:

    COPY temptable FROM 'filepath';

Select from the temporary table all items that satisfy the constraints,
insert them into the real table and delete them from the temporary
table:

    BEGIN;
    INSERT INTO tablename (SELECT * FROM temptable WHERE ...);
    DELETE FROM temptable WHERE ...;
    COMMIT;

All good data should now be in place. The temporary table should now
contain only those items that do not satisfy the constraints for the
real table.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                            http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"Seeing then that all these things shall be dissolved, what manner of
 persons ought ye to be? You ought to live holy and godly lives as you
 look forward to the day of God and speed its coming."
                                                      II Peter 3:11,12
On 26 Aug 2002, Oliver Elphick wrote:

> Create a temporary table (no constraints):
>
>     CREATE TEMP TABLE temptable AS
>         (SELECT * FROM tablename LIMIT 1);
>     DELETE FROM temptable;
>
> Copy all data into the temporary table:
>
>     COPY temptable FROM 'filepath';

Up to this point I have no problems. The difference is that I have to
mark the invalid data sets with a flag which represents a "reason" why
the data were invalid. That's why I use an additional flag column in the
table, and I do not use a temporary table, because I have to store the
"history" of invalid data (to bother the provider of the data to fix it).

> Select from the temporary table all items that satisfy the
> constraints, insert them into the real table and delete them from
> the temporary table:
>
>     BEGIN;
>     INSERT INTO tablename (SELECT * FROM temptable WHERE ...);
>     DELETE FROM temptable WHERE ...;
>     COMMIT;
>
> All good data should now be in place. The temporary table should
> now contain only those items that do not satisfy the constraints for
> the real table.

This was my first attempt here. The problem I have is that I need a JOIN
to a further table, and I got errors from the parser which made me guess
that joins are not allowed in INSERT statements ... at least I do not
know how to do it right, if it should be possible. That's why I had the
idea to first set a certain flag and then do the insert of all data sets
where the flag says OK.

Kind regards

Andreas.
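The flag-then-insert plan described here works without any join in the INSERT at all: an UPDATE with a NOT EXISTS subquery marks the bad rows, and the INSERT then filters on the flag. A minimal sketch, using sqlite3 purely for illustration (the thread is about Postgres, where the same statements are valid); the flag value 1 is a hypothetical "no matching Ref row" code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Ref (Id INTEGER);
CREATE TABLE Import (Id INTEGER, Other TEXT, Flag INTEGER);
CREATE TABLE Data (Id INTEGER, Other TEXT);
INSERT INTO Ref VALUES (1);
INSERT INTO Import VALUES (1, 'a', 0), (3, 'c', 0);
""")

# Flag rows whose Id is unknown in Ref (1 = hypothetical error code) ...
con.execute("""
UPDATE Import SET Flag = 1
WHERE NOT EXISTS (SELECT 1 FROM Ref WHERE Ref.Id = Import.Id)
""")

# ... then move everything still flagged OK into Data; no join needed here.
con.execute("INSERT INTO Data SELECT Id, Other FROM Import WHERE Flag = 0")

print(con.execute("SELECT Id, Flag FROM Import").fetchall())  # [(1, 0), (3, 1)]
print(con.execute("SELECT Id FROM Data").fetchall())          # [(1,)]
```

The flagged rows stay in Import, which matches the requirement of keeping a history of invalid data for the provider.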
On Mon, 2002-08-26 at 10:46, Andreas Tille wrote:

> On 26 Aug 2002, Oliver Elphick wrote:
>
> > Select from the temporary table all items that satisfy the
> > constraints, insert them into the real table and delete them from
> > the temporary table:
> >
> >     BEGIN;
> >     INSERT INTO tablename (SELECT * FROM temptable WHERE ...);
> >     DELETE FROM temptable WHERE ...;
> >     COMMIT;
> >
> > All good data should now be in place. The temporary table should
> > now contain only those items that do not satisfy the constraints for
> > the real table.
>
> This was my first attempt here. The problem I have is that I need a
> JOIN to a further table, and I got errors from the parser which made
> me guess that joins are not allowed in INSERT statements ... at least
> I do not know how to do it right, if it should be possible.

A natural join seems to work fine:

bray=# insert into junk (select b.* from batch as b, product as p
       where b.product = p.id and p.eancode is not null);
INSERT 0 7552

I don't have any earlier messages from this thread; what was the syntax
you were trying to use?

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                            http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"Seeing then that all these things shall be dissolved, what manner of
 persons ought ye to be? You ought to live holy and godly lives as you
 look forward to the day of God and speed its coming."
                                                      II Peter 3:11,12
"Andreas Tille" <tillea@rki.de> wrote in news message
Pine.LNX.4.44.0208221126490.1478-100000@wr-linux02.rki.ivbb.bund.de...

...
> I tried to do the following approach:
>
> CREATE LOCAL TEMPORARY TABLE ImportOK () INHERITS (Import) ;
>
> INSERT INTO ImportOK SELECT * FROM Import i
>     INNER JOIN Ref r ON i.Id = r.Id;
>
> DELETE FROM Import WHERE Id IN (SELECT Id FROM ImportOK) ;
...
> Unfortunately the latest statement is so terribly slow that I can't
> imagine that there is a better way to do this.

You must use EXISTS if you work with big tables: EXISTS can use indexes,
while IN (subquery) is evaluated with temporary tables.

-- this changes nothing for IN:
CREATE INDEX import_id_index ON import(id);
CREATE INDEX import_ok_id_index ON import_ok(id);

-- slow:
-- DELETE FROM import WHERE id IN (SELECT id FROM import_ok);

DELETE FROM import
WHERE EXISTS (
    SELECT id FROM import_ok AS ok
    WHERE ok.id = import.id
);
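The IN and EXISTS forms delete exactly the same rows; the difference raised above is purely one of execution strategy in the PostgreSQL versions of that era (7.x), where `IN (subquery)` was not rewritten to use the index. A runnable illustration of the equivalence, using sqlite3 as a stand-in (sqlite plans both forms similarly, so this shows only the result, not the speed difference):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE import (id INTEGER, other TEXT);
CREATE TABLE import_ok (id INTEGER);
CREATE INDEX import_ok_id_index ON import_ok(id);
INSERT INTO import VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO import_ok VALUES (1), (3);
""")

# Same effect as: DELETE FROM import WHERE id IN (SELECT id FROM import_ok)
# but written with a correlated EXISTS, which an index lookup can drive.
con.execute("""
DELETE FROM import
WHERE EXISTS (SELECT 1 FROM import_ok AS ok WHERE ok.id = import.id)
""")

# Only the row with no match in import_ok survives.
print(con.execute("SELECT id FROM import").fetchall())   # [(2,)]
```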