Thread: Creating files with test data
I am developing a complex multi-tenant application on PostgreSQL 16 on Linux. During development I would like to be able to enter test data into various related tables at any given time for testing purposes.

While this can certainly be accomplished with predefined CTE INSERT statements in an SQL file, I would prefer something more flexible. I am thinking of using a markdown file as the source for these CTE INSERT statements: a markdown file with a certain structure, where a given heading level, bullet level etc. corresponds to a specific table and column.

After entering my test data into the markdown file for a given test scenario, I would then run an awk script or similar to create an SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...

I hope the above outline is understandable and am interested in comments and thoughts on my approach. Thanks.
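For concreteness, the kind of chained CTE INSERT I have in mind looks roughly like this (table and column names are made up):

    -- The parent row's generated key is captured with RETURNING and
    -- threaded into the child rows, covering the 1:N case.
    WITH new_parent AS (
        INSERT INTO parents (id, name)
        VALUES (gen_random_uuid(), 'Scenario 1 parent')
        RETURNING id
    )
    INSERT INTO children (parent_id, position, body)
    SELECT p.id, c.position, c.body
    FROM new_parent AS p
    CROSS JOIN (VALUES (1, 'first child'), (2, 'second child'))
           AS c(position, body);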
On 3/10/25 11:09 AM, H wrote:
> I am developing a complex multi-tenant application on PostgreSQL 16 on Linux. During development I would like to be able to enter test data into various related tables at any given time for testing purposes.
>
> While this can certainly be accomplished with predefined CTE INSERT statements in an SQL file, I would prefer something more flexible. I am thinking of using a markdown file as the source for these CTE INSERT statements: a markdown file with a certain structure, where a given heading level, bullet level etc. corresponds to a specific table and column.

Why CTE INSERTs?

> After entering my test data into the markdown file for a given test scenario, I would then run an awk script or similar to create an SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...
>
> I hope the above outline is understandable and am interested in comments and thoughts on my approach.
>
> Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com
On March 10, 2025 2:26:48 PM GMT-04:00, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 3/10/25 11:09 AM, H wrote:
>> [...]
>
> Why CTE INSERTs?

There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.
On Mon, Mar 10, 2025 at 12:17 PM H <agents@meddatainc.com> wrote:
> There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.
I just reserve half of the number space of bigint, the negatives, for test data and assign known IDs as part of the test data specification.
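A minimal sketch of that scheme, with made-up table names:

    -- Test fixtures use hand-assigned negative bigint IDs, so every
    -- foreign key is known up front and can never collide with the
    -- positive IDs the application generates.
    INSERT INTO invoices (id, customer) VALUES (-1, 'Test customer');
    INSERT INTO invoice_lines (id, invoice_id, amount)
    VALUES (-10, -1, 100.00),
           (-11, -1, 250.00);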
David J.
On Mon, 10 Mar 2025 at 19:17, H <agents@meddatainc.com> wrote:
...
> After entering my test data into the markdown file for a given test scenario, I would then run an awk script or similar to create an SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...
> I hope the above outline is understandable and am interested in comments and thoughts on my approach.

I do not think MD would be a good source format. Being a developer, I would recommend taking your favorite scripting language (perl / python / lua / js, whatever), building a big object (which can be easily helped by some mini-functions forming a little DSL) and then spitting SQL from it (for development this is normally better than using whatever DB connection your language has, as it leaves an understandable SQL script).

I have done this with perl for some projects: built a driver which defined several helper functions, then dofile("xx.dat") which returned a big hash, and then ran a series of loops on the result to write the SQL in whatever order was necessary.

Francisco Olarte.
On Mon, 10 Mar 2025 at 20:18, H <agents@meddatainc.com> wrote:
> There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

Random? Then they CAN collide. ;->

Do the IDs need to be random or are they just the default value?

I've encountered a similar problem with sequences and have bypassed it by using explicit values in the test data, i.e. inserting an invoice with ID 1000 followed by its lines with IDs 100010, 100011, ..., and then setting the start values for the sequences at 99999999. This helped me a lot in debugging. You could do the same depending on what domain your random IDs are in.

Francisco Olarte.
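A minimal sketch of this explicit-ID approach (table and sequence names are invented):

    -- Explicit, recognizable IDs for the fixtures; the sequences are
    -- then moved well past them so application inserts cannot collide.
    INSERT INTO invoices (id, customer) VALUES (1000, 'Acme');
    INSERT INTO invoice_lines (id, invoice_id, amount)
    VALUES (100010, 1000, 10.00),
           (100011, 1000, 20.00);
    SELECT setval('invoices_id_seq', 99999999);
    SELECT setval('invoice_lines_id_seq', 99999999);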
On March 10, 2025 3:27:02 PM GMT-04:00, Francisco Olarte <folarte@peoplecall.com> wrote:
> Random? Then they CAN collide. ;->
>
> Do the IDs need to be random or are they just the default value?
> [...]

I am using UUIDs, which the application will use as well.
On March 10, 2025 3:22:41 PM GMT-04:00, Francisco Olarte <folarte@peoplecall.com> wrote:
> I do not think MD would be a good source format. Being a developer, I would recommend taking your favorite scripting language (perl / python / lua / js, whatever), building a big object (which can be easily helped by some mini-functions forming a little DSL) and then spitting SQL from it.
> [...]

The plan is to use the same format for later adding information into the finished application. By the way, all information is text, either paragraphs, sentences or single words.
On March 10, 2025 3:21:57 PM GMT-04:00, "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> I just reserve half of the number space of bigint, the negatives, for test data and assign known IDs as part of the test data specification.

The finished application will also use UUIDs, which is why I am using them now.
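The known-ID idea carries over to UUIDs as well; a sketch with invented table names and fixed UUID literals:

    -- Hand-assigned, recognizable UUID literals in the fixtures instead
    -- of gen_random_uuid(), so child rows can reference parents directly.
    INSERT INTO tenants (id, name)
    VALUES ('00000000-0000-0000-0000-000000000001', 'Test tenant');
    INSERT INTO users (id, tenant_id, name)
    VALUES ('00000000-0000-0000-0000-000000000101',
            '00000000-0000-0000-0000-000000000001', 'Test user');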
On March 10, 2025 5:09:46 PM GMT-04:00, H <agents@meddatainc.com> wrote:
> The plan is to use the same format for later adding information into the finished application. By the way, all information is text, either paragraphs, sentences or single words.

I am also contemplating using Pandoc to process the markdown file. Pandoc builds an AST which can then be processed as desired. Awk processes a file line by line, which is not ideal.
On March 10, 2025 3:15:10 PM EDT, H <agents@meddatainc.com> wrote:
> There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

I thought I should give an example of why I need this. This is /not/ the application but might be a useful example:

Imagine a very simple Wikipedia clone. You have a couple of DB tables for article subjects in a couple of levels, a table for body text, and a couple of different tables for references and for images related to the article in question. References and images are not referenced inline in the body text but at the end.

Everything is generated manually, i.e. the body text is written, various references added, and a suitable subject area in different levels chosen, all by an author.

I want the writing of this "article", with the appropriate references, images etc., to be done outside this database application. My vision is that it's done in a markdown document where a strict use of heading levels, bullet lists etc. would correlate with subject areas, body text, references, images and what-not.

A script of some kind that can parse markdown then generates the necessary SQL statements to insert all of this into the appropriate tables, all according to how the information was laid out in the markdown document.

My initial thought was to use awk but it's (awk)ward since it is strictly line/paragraph oriented. Pandoc parses markdown very well and generates an AST internally, and an output filter should then be able to generate complex SQL statements such as CTEs.

Hope that might shed some more light on my thought process.
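As a sketch, the SQL such a script might generate for one article could be a single statement; all table and column names below are invented:

    -- The article's UUID is captured once and reused for the 1:N
    -- references and images, so the whole article loads atomically.
    WITH article AS (
        INSERT INTO articles (id, subject_id, body)
        VALUES (gen_random_uuid(),
                (SELECT id FROM subjects WHERE name = 'History'),
                'Body text of the article...')
        RETURNING id
    ),
    refs AS (
        INSERT INTO article_references (article_id, citation)
        SELECT article.id, c.citation
        FROM article,
             (VALUES ('Reference 1'), ('Reference 2')) AS c(citation)
    )
    INSERT INTO article_images (article_id, path)
    SELECT article.id, i.path
    FROM article, (VALUES ('images/fig1.png')) AS i(path);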
DbVisualizer
https://www.dbvis.com/docs/ug/exporting-a-grid/#generating-test-data

Or

DBeaver
https://dbeaver.com/docs/dbeaver/Mock-Data-Generation-in-DBeaver/

might be of use
On March 11, 2025 2:59:35 AM EDT, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:
> DbVisualizer
> https://www.dbvis.com/docs/ug/exporting-a-grid/#generating-test-data
>
> Or
>
> DBeaver
> https://dbeaver.com/docs/dbeaver/Mock-Data-Generation-in-DBeaver/
>
> might be of use

I need to generate real test data, not mock data.