Thread: [GENERAL] Generating sample data

[GENERAL] Generating sample data

From
Rich Shepard
Date:
   My previous databases used real client (or my own) data; now I want to
generate sample data for the tables in the two applications I'm developing.
My web search finds a bunch of pricey (IMO) commercial products.

   Are there any open source data generators that can provide sample data
based on each table's schema?

TIA,

Rich




Re: [GENERAL] Generating sample data

From
Greg Navis
Date:
In Ruby land there's a gem called faker <https://github.com/stympy/faker> that allows you to generate fake data. However, I'm not sure it can generate data based on a schema, so a little bit of scripting may be necessary. Would this approach work for you?

Yours
Greg

Re: [GENERAL] Generating sample data

From
Steve Crawford
Date:
You could start here:
http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/

I have rolled my own on occasion by just pulling some public lists of most common given names and family names and toing a full-join. Same for city, streets, etc.

-Steve

On Tue, Dec 27, 2016 at 11:23 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
  My previous databases used real client (or my own) data; now I want to
generate sample data for the tables in the two applications I'm developing.
My web search finds a bunch of pricey (IMO) commercial products.

  Are there any open source data generators that can provide sample data
based on each table's schema?

TIA,

Rich




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Generating sample data

From
Rich Shepard
Date:
On Tue, 27 Dec 2016, Greg Navis wrote:

> In the Ruby land there's a gem called faker
> <https://github.com/stympy/faker> that allows you to generate fake data.
> However, I'm not sure it can generate data based on a schema so a little
> bit of scripting may be necessary. Would this approach work for you?

Greg,

   I work in Python, not Ruby, so this might be too big of a hurdle.

Thanks,

Rich


Re: [GENERAL] Generating sample data

From
Steve Crawford
Date:

On Tue, Dec 27, 2016 at 12:01 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> I have rolled my own on occasion by just pulling some public lists of most common given names and family names and toing a full-join. Same for city, streets, etc.

Sorry, "doing" a full-join. Which also leads to lots of fun cross-cultural names like "Muhammad Wang" and "Santiago O'Leary".

Cheers,
Steve
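
The roll-your-own approach Steve describes can be sketched in Python. This is a minimal sketch, assuming "full-join" means pairing every given name with every family name (what SQL calls a cross join). The name lists, function names, and seed are hypothetical placeholders; real lists would come from whatever public sources you choose.

```python
import itertools
import random

# Tiny stand-in lists (hypothetical values); in practice these would be
# loaded from public lists of common given names and family names.
GIVEN_NAMES = ["Muhammad", "Santiago", "Mary", "Wei"]
FAMILY_NAMES = ["Wang", "O'Leary", "Smith", "Garcia"]


def cross_join_names(given, family):
    """Return every given/family combination, like a SQL CROSS JOIN."""
    return [f"{g} {f}" for g, f in itertools.product(given, family)]


def sample_names(given, family, count, seed=None):
    """Pick `count` distinct full names at random from the cross join."""
    rng = random.Random(seed)
    return rng.sample(cross_join_names(given, family), count)


if __name__ == "__main__":
    for name in sample_names(GIVEN_NAMES, FAMILY_NAMES, 5, seed=1):
        print(name)
```

The same pattern should extend to cities, streets, and so on by joining additional lists.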

Re: [GENERAL] Generating sample data

From
Rich Shepard
Date:
On Tue, 27 Dec 2016, Steve Crawford wrote:

> You could start here:
> http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/

> I have rolled my own on occasion by just pulling some public lists of most
> common given names and family names and toing a full-join. Same for city,
> streets, etc.

Steve,

   Thanks very much for the URL. One application is small (7 tables), the
other is three times that size (23 tables). If I need to find public domain
data on the Web, I'll do that.

Much appreciated,

Rich


Re: [GENERAL] Generating sample data

From
Adrian Klaver
Date:
On 12/27/2016 12:03 PM, Rich Shepard wrote:
> On Tue, 27 Dec 2016, Greg Navis wrote:
>
>> In the Ruby land there's a gem called faker
>> <https://github.com/stympy/faker> that allows you to generate fake data.
>> However, I'm not sure it can generate data based on a schema so a little
>> bit of scripting may be necessary. Would this approach work for you?
>
> Greg,
>
>   I work in Python, not Ruby, so this might be too big of a hurdle.

As it happens there is a Python version of the aforementioned faker:

https://pypi.python.org/pypi/Faker/0.7.7

It was I use to generate fake/sample data.



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Generating sample data

From
Adrian Klaver
Date:
On 12/27/2016 02:23 PM, Adrian Klaver wrote:
> On 12/27/2016 12:03 PM, Rich Shepard wrote:
>> On Tue, 27 Dec 2016, Greg Navis wrote:
>>
>>> In the Ruby land there's a gem called faker
>>> <https://github.com/stympy/faker> that allows you to generate fake data.
>>> However, I'm not sure it can generate data based on a schema so a little
>>> bit of scripting may be necessary. Would this approach work for you?
>>
>> Greg,
>>
>>   I work in Python, not Ruby, so this might be too big of a hurdle.
>
> As it happens there is a Python version of the aforementioned faker:
>
> https://pypi.python.org/pypi/Faker/0.7.7
>
> It was I use to generate fake/sample data.
Ugh.

It is what I use to generate fake/sample data.

Memo to self: Do one thing at a time!



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Generating sample data

From
Adrian Klaver
Date:
On 12/27/2016 12:06 PM, Rich Shepard wrote:
> On Tue, 27 Dec 2016, Steve Crawford wrote:
>
>> You could start here:
>> http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/
>>
>
>> I have rolled my own on occasion by just pulling some public lists of
>> most
>> common given names and family names and toing a full-join. Same for city,
>> streets, etc.
>
> Steve,
>
>   Thanks very much for the URL. One application is small (7 tables), the
> other is three times that size (23 tables). If I need to find public domain
> data on the Web, I'll do that.

What sort of data do you want to create?

If it is the standard contact information then the previously mentioned
tools are sufficient.

If it is data specific to a field of study then things might get trickier.



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Generating sample data

From
Rich Shepard
Date:
On Tue, 27 Dec 2016, Adrian Klaver wrote:

> What sort of data do you want to create?

Adrian,

   Various text, date, and numeric values.

> If it is data specific to a field of study then things might get trickier.

   It's not a common database. I'll probably need to cobble together generic
data of the appropriate types myself.

Thanks,

Rich
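
For the "generic data of the appropriate types" case, here is a minimal sketch using only the Python standard library. The schema dictionary, column names, type tags, and value ranges are all hypothetical; a fuller version might read column names and types from PostgreSQL's information_schema.columns instead of hard-coding them.

```python
import datetime
import random
import string

# Hypothetical schema: column name -> simple type tag.
SCHEMA = {
    "site_name": "text",
    "sampled_on": "date",
    "reading": "numeric",
    "count": "integer",
}


def random_value(type_tag, rng):
    """Return one random value for the given (hypothetical) type tag."""
    if type_tag == "text":
        length = rng.randint(5, 12)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    if type_tag == "date":
        start = datetime.date(2000, 1, 1)
        return start + datetime.timedelta(days=rng.randint(0, 365 * 16))
    if type_tag == "numeric":
        return round(rng.uniform(0, 1000), 2)
    if type_tag == "integer":
        return rng.randint(0, 1000)
    raise ValueError(f"unsupported type: {type_tag}")


def generate_rows(schema, count, seed=None):
    """Build `count` rows as dicts, one random value per column."""
    rng = random.Random(seed)
    return [
        {col: random_value(tag, rng) for col, tag in schema.items()}
        for _ in range(count)
    ]


if __name__ == "__main__":
    for row in generate_rows(SCHEMA, 3, seed=42):
        print(row)
```

Passing a seed makes the output repeatable, which may help when the sample data feeds regression tests.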


Re: [GENERAL] Generating sample data

From
Rich Shepard
Date:
On Tue, 27 Dec 2016, Adrian Klaver wrote:

> As it happens there is a Python version of the aforementioned faker:
> https://pypi.python.org/pypi/Faker/0.7.7
> It was I use to generate fake/sample data.

Adrian,

   Aha! That's a great start for me.

Many thanks,

Rich


Re: [GENERAL] Generating sample data

From
Rich Shepard
Date:
On Tue, 27 Dec 2016, Adrian Klaver wrote:

> As it happens there is a Python version of the aforementioned faker:
> https://pypi.python.org/pypi/Faker/0.7.7

Adrian,

   Impressive and complete. It will generate all the data I need.

Many thanks,

Rich


Re: [GENERAL] Generating sample data

From
"btober@computer.org"
Date:
----- Original Message -----
> From: "Rich Shepard" <rshepard@appl-ecosys.com>
> To: pgsql-general@postgresql.org
> Sent: Tuesday, December 27, 2016 7:23:46 PM
> Subject: Re: [GENERAL] Generating sample data
>
> On Tue, 27 Dec 2016, Adrian Klaver wrote:
>
> > As it happens there is a Python version of the aforementioned faker:
> > https://pypi.python.org/pypi/Faker/0.7.7
>
> Adrian,
>
>    Impressive and complete. It will generate all the data I need.
>


This is kind of fun:


https://github.com/bmtober/groan


I had to hunt down the original author from the 1990s, which is when I originally downloaded it from his personal web
site at


http://raingod.com/raingod/resources/Programming/Perl/Software/Groan/


The initial commit on that github page is the original source as provided by Mr. McIntyre.

In a subsequent commit, I removed some of the original code that formatted for HTML output, leaving just plain text,
and also posted an example grammar for generating fake names and strings that look like social security numbers (i.e., a
U.S. taxpayer identification).

The script will generate duplicates, but you can do something like

for n in {1..20}
do
   groan.pl ssn.gn
done | sort -u

to get unique source data.

By defining other custom grammars, you could potentially generate all kinds of data.

-- B
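
For readers working in Python, the grammar idea and the de-duplication step can be sketched together. This is only an illustration: the dictionary below is a hypothetical grammar representation, not groan's grammar file format, and the function names are made up. Unlike the shell loop with sort -u, which may return fewer than 20 values, this sketch keeps generating until it has the requested number of distinct values (or gives up after max_tries).

```python
import random

# Hypothetical grammar: each symbol maps to a list of alternatives, and each
# alternative is a sequence of symbols or literal strings. This is NOT
# groan's grammar file format.
GRAMMAR = {
    "ssn": [["area", "-", "group", "-", "serial"]],
    "area": [["digit", "digit", "digit"]],
    "group": [["digit", "digit"]],
    "serial": [["digit", "digit", "digit", "digit"]],
    "digit": [[d] for d in "0123456789"],
}


def expand(symbol, grammar, rng):
    """Recursively expand a symbol; anything not in the grammar is a literal."""
    if symbol not in grammar:
        return symbol
    alternative = rng.choice(grammar[symbol])
    return "".join(expand(part, grammar, rng) for part in alternative)


def unique_values(symbol, grammar, count, seed=None, max_tries=10000):
    """Generate values until `count` distinct ones are collected, then sort."""
    rng = random.Random(seed)
    seen = set()
    tries = 0
    while len(seen) < count and tries < max_tries:
        seen.add(expand(symbol, grammar, rng))
        tries += 1
    return sorted(seen)


if __name__ == "__main__":
    for value in unique_values("ssn", GRAMMAR, 20, seed=7):
        print(value)
```

Adding more symbols and alternatives to the dictionary would produce other kinds of strings in the same way.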



Re: [GENERAL] Generating sample data

From
"Martijn Tonies (Upscene Productions)"
Date:
Hi,

Not open source, but also not pricey (IMO): Advanced Data Generator.
http://www.upscene.com/advanced_data_generator/

Generates e-mail addresses, street names, first & last names, company names,
complex relationships etc.

And yes, this is our product. ;)

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com






Re: [GENERAL] Generating sample data

From
Rich Shepard
Date:
On Wed, 28 Dec 2016, Martijn Tonies (Upscene Productions) wrote:

> Not open source, but also not pricey (IMO): Advanced Data Generator.
> http://www.upscene.com/advanced_data_generator/
>
> Generates e-mail addresses, street names, first & last names, company names,
> complex relationships etc.
>
> And yes, this is our product. ;)

Martijn,

   Thank you for making me aware of your company and product. However, after
20 years of using only F/OSS for my business (and personal computing) needs,
and contributing to several open source projects along the way, my
preference is to use such tools. When I get the large database application
up and running I'll post it on github and turn it loose into the F/OSS world
under the GPL.

Regards,

Rich