Re: Sample data generator for performance testing - Mailing list pgsql-general

From	Adrian Klaver
Subject	Re: Sample data generator for performance testing
Date	January 3, 2024 18:31:20
Msg-id	49380122-9643-43f0-bbef-bd34ab6d434d@aklaver.com
In response to	Re: Sample data generator for performance testing (arun chirappurath <arunsnmimt@gmail.com>)
List	pgsql-general

On 1/3/24 9:50 AM, arun chirappurath wrote:

On Wed, 3 Jan, 2024, 23:03 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/3/24 09:24, arun chirappurath wrote:
> Hi Adrian,
>
> Thanks for your mail.
>
> Is this for all tables in the database or a subset? Yes

Yes all tables or yes just some tables?
All tables, except some which have user details.

>
> Does it need to deal with foreign key relationships? No
>
> What are the sizes of the existing data and what size sample data do you
> want to produce? 1GB and 1GB test data.

If the source data is 1GB and the test data is 1GB then there is no
sampling, you are using the data population in its entirety.

Yes. I would like to double the load and test.

Does that mean you want to take the 1GB of your existing data and double it to 2GB while maintaining the data distribution from the original data?

Also, do we have any standard methods for sampling and generating test data?

Something like?:

https://www.postgresql.org/docs/current/sql-select.html

"TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ]

A TABLESAMPLE clause after a table_name indicates that the specified sampling_method should be used to retrieve a subset of the rows in that table. This sampling precedes the application of any other filters such as WHERE clauses. The standard PostgreSQL distribution includes two sampling methods, BERNOULLI and SYSTEM, and other sampling methods can be installed in the database via extensions. ..."

Read the rest of the documentation for TABLESAMPLE to get the details.
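
As a rough illustration of the clause quoted above (this is not from the thread itself), the Python helper below only builds the SQL text of a TABLESAMPLE query, so it runs without a database connection. The function name tablesample_query and the table name orders are hypothetical. It assumes the two built-in methods, BERNOULLI and SYSTEM, each of which takes a sampling percentage between 0 and 100.

```python
def tablesample_query(table, method="BERNOULLI", percent=10, seed=None):
    """Build a SELECT that samples roughly `percent` percent of `table`.

    Hypothetical helper for illustration; it only produces SQL text.
    """
    method = method.upper()
    # Only the two sampling methods shipped with PostgreSQL are accepted here.
    if method not in ("BERNOULLI", "SYSTEM"):
        raise ValueError("method must be BERNOULLI or SYSTEM")
    # Both built-in methods expect a percentage between 0 and 100.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + table.replace('"', '""') + '"'
    sql = f"SELECT * FROM {quoted} TABLESAMPLE {method} ({percent:g})"

    # REPEATABLE makes the sample reproducible as long as the table is unchanged.
    if seed is not None:
        sql += f" REPEATABLE ({seed})"
    return sql


# Example: a repeatable sample of about 10% of a hypothetical "orders" table.
print(tablesample_query("orders", percent=10, seed=42))
```

Per the documentation, BERNOULLI scans the whole table and picks individual rows, while SYSTEM picks whole blocks, which is faster but can give a less random sample when rows are clustered. Note that TABLESAMPLE only returns a subset of existing rows; it does not by itself generate additional data.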

>
> On Wed, 3 Jan, 2024, 22:40 Adrian Klaver, <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
> On 1/2/24 23:23, arun chirappurath wrote:
> > Hi All,
> >
> > Do we have any open source tools which can be used to create
> sample data
> > at scale from our postgres databases?
> > Which considers data distribution and randomness
>
>
>
> >
> > Regards,
> > Arun
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com
