Thread: Tips/Hacks to create minial DB from the execution of several (simple) SQL requests.
Tips/Hacks to create minial DB from the execution of several (simple) SQL requests.
From
Daniel Shane
Date:
Hi all! I have an interesting problem here that I think could be of interest to everyone. I in the process of writing test casesfor our applications and there is one problem I am facing. To be able to test correctly, I need to create a small database(a sample if you want) from a very large one so that I can run some tests on a subset of the data. Sometimes you are asked to do this but know nothing about the database in advance (ugh!). I could create several queries and build it myself by trial and error, but I was wondering if a more general approach couldbe elaborated. For my case, the testing does not write in the database and queries are simple in nature (they do not use count() or anythingthat needs any whole table to work). Here are some solutions that I have come up with : a) Run the main program A on the large database, however, I will restrict its operation to only a subset of the data. If I create a MockConnection, I could save all the queries as text strings and serialize the result set that postgres returnedand use this logging to re-run the program without connection to the real database. I would simply return the serializedresult set if I find a match for the query. b) Take the source code of Postgres and add some tweaks in there so that it logs every table/row that was needed in the outputresultSet and build a seperate minimal DB from that. With this options, database B (the minimalist one) would not have any constraints. On the other hand, it may be very difficultfor me to add this logging in the source code. Of course, option B seems much more interesting, since a small optimization in the programs queries would probably stillwork while in option a) it would fail immediately. I was wondering if maybe there was something else I could do to solve this problem in a general way? Daniel Shane
Re: Tips/Hacks to create minial DB from the execution of several (simple) SQL requests.
From
Scott Bailey
Date:
Daniel Shane wrote: > Hi all! > > I have an interesting problem here that I think could be of interest to everyone. I in the process of writing test casesfor our applications and there is one problem I am facing. To be able to test correctly, I need to create a small database(a sample if you want) from a very large one so that I can run some tests on a subset of the data. > > Sometimes you are asked to do this but know nothing about the database in advance (ugh!). > > I could create several queries and build it myself by trial and error, but I was wondering if a more general approach couldbe elaborated. > > For my case, the testing does not write in the database and queries are simple in nature (they do not use count() or anythingthat needs any whole table to work). > > Here are some solutions that I have come up with : > > a) Run the main program A on the large database, however, I will restrict its operation to only a subset of the data. > > If I create a MockConnection, I could save all the queries as text strings and serialize the result set that postgres returnedand use this logging to re-run the program without connection to the real database. I would simply return the serializedresult set if I find a match for the query. > > b) Take the source code of Postgres and add some tweaks in there so that it logs every table/row that was needed in theoutput resultSet and build a seperate minimal DB from that. > > With this options, database B (the minimalist one) would not have any constraints. On the other hand, it may be very difficultfor me to add this logging in the source code. > > > Of course, option B seems much more interesting, since a small optimization in the programs queries would probably stillwork while in option a) it would fail immediately. > > I was wondering if maybe there was something else I could do to solve this problem in a general way? > > Daniel Shane I'd create a test schema, set the search path on your test user to just that schema. And you could create the tables something like so: CREATE TABLE test.foo AS SELECT * FROM public.foo LIMIT 1000; Scott
Re: Tips/Hacks to create minial DB from the execution of several (simple) SQL requests.
From
Daniel Shane
Date:
Hi Scott! The problem is that my test database has several tables with many links between them, so I have no idea which 1000 rows toget from which table. The only thing I can do is run the program that connects to that database and tell it to run on asample of the database. I can get a log of all the queries that are executed, but I was wondering if there was a more general solution where I coulduse a "modified/hacked" postgres driver and catch all the rows of all the tables that were accessed during those queries. I could then simply insert them into the test database and in theory my program should run the same if I used it insteadof the real one (assuming its configure to run on the same sample). Daniel Shane >>>QUOTE I'd create a test schema, set the search path on your test user to just that schema. And you could create the tables something like so: CREATE TABLE test.foo AS SELECT * FROM public.foo LIMIT 1000; Scott <<<QUOTE Daniel Shane wrote: > Hi all! > > I have an interesting problem here that I think could be of interest to everyone. I in the process of writing test casesfor our applications and there is one problem I am facing. To be able to test correctly, I need to create a small database(a sample if you want) from a very large one so that I can run some tests on a subset of the data. > > Sometimes you are asked to do this but know nothing about the database in advance (ugh!). > > I could create several queries and build it myself by trial and error, but I was wondering if a more general approach couldbe elaborated. > > ...
Re: Tips/Hacks to create minial DB from the execution of several (simple) SQL requests.
From
Scott Bailey
Date:
Daniel Shane wrote: > Hi Scott! > > The problem is that my test database has several tables with many links between them, so I have no idea which 1000 rowsto get from which table. The only thing I can do is run the program that connects to that database and tell it to runon a sample of the database. > > I can get a log of all the queries that are executed, but I was wondering if there was a more general solution where Icould use a "modified/hacked" postgres driver and catch all the rows of all the tables that were accessed during those queries. > > I could then simply insert them into the test database and in theory my program should run the same if I used it insteadof the real one (assuming its configure to run on the same sample). > > Daniel Shane Yeah, still a much easier way. If there are many links between the tables, then related records would typically be inserted either in the same transaction or soon there after. You can use the xmin columns to copy related rows from all tables. Scott