Thread: [GENERAL] Generating sample data
My previous databases used real client (or my own) data; now I want to generate sample data for the tables in the two applications I'm developing. My web search finds a bunch of pricey (IMO) commercial products. Are there any open source data generators that can provide sample data based on each table's schema? TIA, Rich
In the Ruby land there's a gem called faker <https://github.com/stympy/faker> that allows you to generate fake data. However, I'm not sure it can generate data based on a schema, so a little bit of scripting may be necessary. Would this approach work for you?
Yours
Greg
You could start here: http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/
I have rolled my own on occasion by just pulling some public lists of most common given names and family names and toing a full-join. Same for city, streets, etc.
-Steve
On Tue, Dec 27, 2016 at 11:23 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
My previous databases used real client (or my own) data; now I want to
generate sample data for the tables in the two applications I'm developing.
My web search finds a bunch of pricey (IMO) commercial products.
Are there any open source data generators that can provide sample data
based on each table's schema?
TIA,
Rich
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, 27 Dec 2016, Greg Navis wrote:
> In the Ruby land there's a gem called faker
> <https://github.com/stympy/faker> that allows you to generate fake data.
> However, I'm not sure it can generate data based on a schema so a little
> bit of scripting may be necessary. Would this approach work for you?
Greg,
I work in Python, not Ruby, so this might be too big of a hurdle.
Thanks,
Rich
On Tue, Dec 27, 2016 at 12:01 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> I have rolled my own on occasion by just pulling some public lists of most
> common given names and family names and toing a full-join. Same for city,
> streets, etc.
Sorry, "doing" a full-join. Which also leads to lots of fun cross-cultural names like "Muhammad Wang" and "Santiago O'Leary".
Cheers,
Steve
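The roll-your-own approach described above can be sketched in a few lines of Python. This is only an illustration: the two name lists are hypothetical stand-ins for public name lists, and the pairing is what SQL would call a cross join (every given name combined with every family name), which is assumed to be what "full-join" meant here.

```python
import itertools
import random

# Hypothetical sample lists; real ones would be loaded from public name lists.
given_names = ["Muhammad", "Santiago", "Mary", "Wei"]
family_names = ["Wang", "O'Leary", "Smith", "Garcia"]


def cross_join_names(given, family):
    """Pair every given name with every family name (a Cartesian product)."""
    return [f"{g} {f}" for g, f in itertools.product(given, family)]


all_names = cross_join_names(given_names, family_names)

# Draw a reproducible handful of rows from the combined list.
sample = random.Random(42).sample(all_names, 5)
print(len(all_names), "combinations")
print(sample)
```

The same pattern would extend to cities and streets by adding more lists to the product, at the cost of a rapidly growing number of rows.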
On Tue, 27 Dec 2016, Steve Crawford wrote:
> You could start here:
> http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/
> I have rolled my own on occasion by just pulling some public lists of most
> common given names and family names and toing a full-join. Same for city,
> streets, etc.
Steve,
Thanks very much for the URL. One application is small (7 tables), the other is three times that size (23 tables). If I need to find public domain data on the Web, I'll do that.
Much appreciated,
Rich
On 12/27/2016 12:03 PM, Rich Shepard wrote:
> On Tue, 27 Dec 2016, Greg Navis wrote:
>> In the Ruby land there's a gem called faker
>> <https://github.com/stympy/faker> that allows you to generate fake data.
>> However, I'm not sure it can generate data based on a schema so a little
>> bit of scripting may be necessary. Would this approach work for you?
> Greg,
> I work in Python, not Ruby, so this might be too big of a hurdle.
As it happens there is a Python version of the aforementioned faker:
https://pypi.python.org/pypi/Faker/0.7.7
It was I use to generate fake/sample data.
--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/27/2016 02:23 PM, Adrian Klaver wrote:
> As it happens there is a Python version of the aforementioned faker:
> https://pypi.python.org/pypi/Faker/0.7.7
> It was I use to generate fake/sample data.
Ugh. It is what I use to generate fake/sample data.
Memo to self: Do one thing at a time!
--
Adrian Klaver
adrian.klaver@aklaver.com
On 12/27/2016 12:06 PM, Rich Shepard wrote:
> Steve,
> Thanks very much for the URL. One application is small (7 tables), the
> other is three times that size (23 tables). If I need to find public domain
> data on the Web, I'll do that.
What sort of data do you want to create?
If it is the standard contact information then the previously mentioned tools are sufficient. If it is data specific to a field of study then things might get trickier.
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, 27 Dec 2016, Adrian Klaver wrote:
> What sort of data do you want to create?
Adrian,
Various text, date, and numeric values.
> If it is data specific to a field of study then things might get trickier.
It's not a common database. I'll probably need to cobble together generic data of the appropriate types myself.
Thanks,
Rich
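For generic text, date, and numeric values, a standard-library-only sketch along the following lines might be enough to start with. The column names (`label`, `sampled_on`, `amount`), the value ranges, and the helper names are all hypothetical, chosen only to show one generator per data type.

```python
import datetime
import random
import string


def random_text(rng, length=12):
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_date(rng, start, end):
    """Return a random date between start and end, inclusive."""
    span_days = (end - start).days
    return start + datetime.timedelta(days=rng.randint(0, span_days))


def random_numeric(rng, low=0.0, high=1000.0, places=2):
    """Return a random number in [low, high], rounded to `places` decimals."""
    return round(rng.uniform(low, high), places)


def make_row(rng):
    """Build one row for a hypothetical three-column table."""
    return {
        "label": random_text(rng),
        "sampled_on": random_date(
            rng, datetime.date(2016, 1, 1), datetime.date(2016, 12, 31)
        ),
        "amount": random_numeric(rng),
    }


rng = random.Random(2016)  # fixed seed so the sample data is repeatable
rows = [make_row(rng) for _ in range(5)]
for row in rows:
    print(row)
```

Each dictionary could then be passed to a parameterized INSERT through whichever database driver the application already uses.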
On Tue, 27 Dec 2016, Adrian Klaver wrote:
> As it happens there is a Python version of the aforementioned faker:
> https://pypi.python.org/pypi/Faker/0.7.7
> It was I use to generate fake/sample data.
Adrian,
Aha! That's a great start for me.
Many thanks,
Rich
On Tue, 27 Dec 2016, Adrian Klaver wrote:
> As it happens there is a Python version of the aforementioned faker:
> https://pypi.python.org/pypi/Faker/0.7.7
Adrian,
Impressive and complete. It will generate all the data I need.
Many thanks,
Rich
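As a rough idea of what using the Python Faker package looks like, here is a minimal sketch. It assumes a reasonably recent Faker release in which `Faker()` is the entry point; the column names and the `fake_rows` helper are hypothetical. The import is guarded so the snippet still loads where the third-party package is not installed.

```python
try:
    from faker import Faker  # third-party package: pip install Faker
except ImportError:
    Faker = None  # Faker is not installed; the example below is skipped


def fake_rows(count):
    """Return `count` rows of fake contact-style data as dictionaries."""
    fake = Faker()
    return [
        {
            "name": fake.name(),
            "email": fake.email(),
            "note": fake.sentence(),
            "visited": fake.date_between(start_date="-1y", end_date="today"),
            "score": fake.random_int(min=0, max=100),
        }
        for _ in range(count)
    ]


if Faker is not None:
    for row in fake_rows(3):
        print(row)
```

Faker does not read a table's schema, so mapping each column to a suitable provider method still has to be done by hand, as suggested earlier in the thread.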
----- Original Message -----
> From: "Rich Shepard" <rshepard@appl-ecosys.com>
> To: pgsql-general@postgresql.org
> Sent: Tuesday, December 27, 2016 7:23:46 PM
> Subject: Re: [GENERAL] Generating sample data
> On Tue, 27 Dec 2016, Adrian Klaver wrote:
> > As it happens there is a Python version of the aforementioned faker:
> > https://pypi.python.org/pypi/Faker/0.7.7
> Adrian,
> Impressive and complete. It will generate all the data I need.
This is kind of fun: https://github.com/bmtober/groan
I had to hunt down the original author from the 1990s, which was when I originally downloaded it from his personal web site at http://raingod.com/raingod/resources/Programming/Perl/Software/Groan/
The initial commit on that github page is the original source as provided by Mr. McIntyre. In a subsequent commit, I removed some of the original code that formatted for HTML output, leaving just plain text, and also posted an example grammar for generating fake names and strings that look like social security numbers (i.e., a U.S. taxpayer identification).
The script will generate duplicates, but you can do something like
    for n in {1..20}
    do
        groan.pl ssn.gn
    done | sort -u
to get unique source data. By defining other custom grammars, you could potentially generate all kinds of data.
--
B
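The grammar idea can be illustrated with a toy Python sketch. This is not groan's grammar syntax or its implementation; the `GRAMMAR` dictionary, the `expand` function, and the `unique_values` helper are hypothetical names used only to show random expansion of a grammar followed by de-duplication.

```python
import random

# Hypothetical toy grammar: each symbol maps to a list of alternative
# productions; anything that is not a key is treated as literal text.
GRAMMAR = {
    "ssn": [["area", "-", "group", "-", "serial"]],
    "area": [["digit", "digit", "digit"]],
    "group": [["digit", "digit"]],
    "serial": [["digit", "digit", "digit", "digit"]],
    "digit": [[d] for d in "0123456789"],
}


def expand(symbol, rng):
    """Recursively expand a symbol by picking one production at random."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    return "".join(expand(part, rng) for part in production)


def unique_values(symbol, count, rng):
    """Generate values until `count` distinct ones exist, then sort them."""
    seen = set()
    while len(seen) < count:
        seen.add(expand(symbol, rng))
    return sorted(seen)


values = unique_values("ssn", 20, random.Random(1))
print("\n".join(values))
```

Unlike the shell loop above, which runs the generator 20 times and then drops duplicates, this sketch keeps generating until it has exactly 20 distinct values.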
Hi,
Not open source, but also not pricey (IMO): Advanced Data Generator.
http://www.upscene.com/advanced_data_generator/
Generates e-mail addresses, street names, first & last names, company names, complex relationships etc.
And yes, this is our product. ;)
With regards,
Martijn Tonies
Upscene Productions
http://www.upscene.com
On Wed, 28 Dec 2016, Martijn Tonies (Upscene Productions) wrote:
> Not open source, but also not pricey (IMO): Advanced Data Generator.
> http://www.upscene.com/advanced_data_generator/
> Generates e-mail addresses, street names, first & last names, company names,
> complex relationships etc.
> And yes, this is our product. ;)
Martijn,
Thank you for making me aware of your company and product. However, after 20 years of using only F/OSS to run my business (and personal computing) needs and contributing to several open source projects along the way my preference is to use such tools.
When I get the large database application up and running I'll post it on github and turn it loose into the F/OSS world under the GPL.
Regards,
Rich