Hi all!
I have an interesting problem here that I think could be of interest to everyone. I am in the process of writing test
cases for our applications, and there is one problem I am facing. To be able to test correctly, I need to create a small
database (a sample, if you like) from a very large one so that I can run tests on a subset of the data.
Sometimes you are asked to do this but know nothing about the database in advance (ugh!).
I could create several queries and build it myself by trial and error, but I was wondering whether a more general
approach could be worked out.
For my case, the testing does not write to the database, and the queries are simple in nature (they do not use count()
or anything else that needs a whole table to work).
Here are some solutions that I have come up with:
a) Run the main program A on the large database, but restrict its operation to only a subset of the data.
If I create a MockConnection, I could save every query as a text string and serialize the result set that Postgres
returned, then use this log to re-run the program without a connection to the real database. I would simply return
the serialized result set whenever I find a match for the query.
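To make option a) concrete, here is a rough sketch of the record/replay idea in Python. All the class names are my own inventions, and I use an in-memory SQLite database as a stand-in for the real Postgres connection just so the example is self-contained; the same wrapper would sit around a psycopg2 cursor in practice:

```python
import pickle
import sqlite3

class RecordingCursor:
    """Wraps a real DB cursor; logs each query and its full result set."""
    def __init__(self, real_cursor, log):
        self._cur = real_cursor
        self._log = log                       # dict: (sql, params) -> rows

    def execute(self, sql, params=()):
        self._cur.execute(sql, params)
        rows = self._cur.fetchall()
        self._log[(sql, tuple(params))] = rows
        self._rows = rows

    def fetchall(self):
        return self._rows

class ReplayCursor:
    """Answers queries from the recorded log; no database needed."""
    def __init__(self, log):
        self._log = log

    def execute(self, sql, params=()):
        try:
            self._rows = self._log[(sql, tuple(params))]
        except KeyError:
            raise RuntimeError("no recorded result for query: %s" % sql)

    def fetchall(self):
        return self._rows

# --- demo: record against a throwaway SQLite DB standing in for Postgres ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

log = {}
rec = RecordingCursor(conn.cursor(), log)
rec.execute("SELECT name FROM users WHERE id = ?", (1,))

blob = pickle.dumps(log)              # serialize the query log to disk/bytes

replay = ReplayCursor(pickle.loads(blob))
replay.execute("SELECT name FROM users WHERE id = ?", (1,))
print(replay.fetchall())              # the real DB is no longer needed
```

The obvious weakness (as noted below) is that the match is on the literal query text, so any change to the queries breaks the replay.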
b) Take the Postgres source code and add some tweaks so that it logs every table/row that was needed to produce the
output result set, and build a separate minimal database from that.
With this option, database B (the minimalist one) would not have any constraints. On the other hand, it may be very
difficult for me to add this logging to the source code.
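A lighter-weight approximation of option b), assuming you can enable Postgres's built-in statement logging (log_statement = 'all' in postgresql.conf) rather than patching the source: scrape the table names out of the logged statements and dump just those tables into the small database. This only gets you down to the table level, not the row level, the regex is deliberately naive, and the sample log lines are made up for illustration:

```python
import re

# Postgres writes lines like "...LOG:  statement: SELECT ..." when
# log_statement = 'all' is set.
LOG_LINE = re.compile(r"statement:\s*(.*)", re.IGNORECASE)
TABLES = re.compile(r"\b(?:from|join)\s+([a-zA-Z_][a-zA-Z0-9_.]*)", re.IGNORECASE)

def tables_used(log_lines):
    """Return the set of table names referenced by logged statements."""
    found = set()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            found.update(t.lower() for t in TABLES.findall(m.group(1)))
    return found

sample_log = [
    "2024-01-01 LOG:  statement: SELECT * FROM orders o JOIN customers c ON o.cid = c.id",
    "2024-01-01 LOG:  statement: SELECT name FROM products WHERE id = 7",
]
print(sorted(tables_used(sample_log)))
# each discovered table could then be exported with pg_dump --table=<name>
```

A real SQL parser would of course be safer than a regex here, since things like subqueries and quoted identifiers will trip this up.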
Of course, option b) seems much more interesting, since a small optimization to the program's queries would probably
still work, while with option a) it would fail immediately.
I was wondering whether there is something else I could do to solve this problem in a general way?
Daniel Shane