Thread: Simulating Clog Contention

From
Simon Riggs
Date:
In order to simulate real-world clog contention, we need to use
benchmarks that deal with real world situations.

Currently, pgbench pre-loads data using COPY and executes a VACUUM so
that all hint bits are set on every row of every page of every table.
Thus, as pgbench runs it sees zero clog accesses from historical data.
As a result, clog access is minimised and the effects of clog
contention in the real world go unnoticed.
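
To see the difference this makes, one can inspect the xmin system column (a sketch only; it assumes a running cluster with an initialized pgbench database, and casts xmin to text because the xid type has no ordering operator for DISTINCT):

```sql
-- Sketch: count the distinct inserting transactions behind
-- pgbench_accounts. After the stock COPY + VACUUM initialization this
-- is a single value and every row is already hinted; after row-at-a-time
-- INSERT loading it approaches the row count, so early readers must
-- look up each distinct xid's commit status in clog.
SELECT count(DISTINCT xmin::text) AS distinct_xids,
       count(*)                   AS total_rows
FROM pgbench_accounts;
```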

The following patch adds a pgbench option -I to load data using
INSERTs, so that we can begin benchmark testing with rows that have
large numbers of distinct un-hinted transaction ids. With a database
pre-created using this we will be better able to simulate and thus
more easily measure clog contention. Note that current clog has space
for 1 million xids, so a scale factor of greater than 10 is required
to really stress the clog.
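
A quick back-of-the-envelope check of that last point (a sketch using the figures from this message: pgbench creates 100,000 pgbench_accounts rows per unit of scale factor, each loading INSERT commits as its own transaction and so consumes one xid, and clog capacity is the 1 million xids quoted above):

```shell
# Sketch: xids consumed by row-at-a-time INSERT loading per scale
# factor, versus the clog capacity quoted above. One transaction per
# inserted row means one xid per row.
rows_per_scale=100000     # pgbench_accounts rows per unit of scale factor
clog_xids=1000000         # clog capacity figure from the message above

for scale in 1 10 100; do
    xids=$((scale * rows_per_scale))
    echo "scale=$scale xids=$xids clog_exceeded=$((xids > clog_xids))"
done
```

So scale 10 just fills the quoted clog capacity, and anything larger forces clog page replacement during the run, which is why a scale factor greater than 10 is needed.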

The patch uses multiple connections to load data using a predefined
script similar to the -N or -S logic.

$ pgbench --help
pgbench is a benchmarking tool for PostgreSQL.

Usage:
  pgbench [OPTIONS]... [DBNAME]

Initialization options:
  -i           invokes initialization mode using COPY
  -I           invokes initialization mode using INSERTs
...


$ pgbench -I -c 4 -t 10000
creating tables...
filling accounts table with 100000 rows using inserts
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_branches_pkey" for table "pgbench_branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_accounts_pkey" for table "pgbench_accounts"
done.
transactions option ignored
transaction type: Load pgbench_accounts using INSERTs
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
number of transactions per client: 25000
number of transactions actually processed: 100000/100000
tps = 828.194854 (including connections establishing)
tps = 828.440330 (excluding connections establishing)

Yes, my laptop really is that slow. Contributions to improve that
situation gratefully received.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Simulating Clog Contention

From
Cédric Villemain
Date:
> $ pgbench --help
> pgbench is a benchmarking tool for PostgreSQL.
>
> Usage:
>  pgbench [OPTIONS]... [DBNAME]
>
> Initialization options:
>  -i           invokes initialization mode using COPY
>  -I           invokes initialization mode using INSERTs

Sounds useful.

what about a long extra option: --inserts like pg_dump ?
pgbench -i --inserts ...



--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


Re: Simulating Clog Contention

From
Heikki Linnakangas
Date:
On 12.01.2012 14:31, Simon Riggs wrote:
> In order to simulate real-world clog contention, we need to use
> benchmarks that deal with real world situations.
>
> Currently, pgbench pre-loads data using COPY and executes a VACUUM so
> that all hint bits are set on every row of every page of every table.
> Thus, as pgbench runs it sees zero clog accesses from historical data.
> As a result, clog access is minimised and the effects of clog
> contention in the real world go unnoticed.
>
> The following patch adds a pgbench option -I to load data using
> INSERTs, so that we can begin benchmark testing with rows that have
> large numbers of distinct un-hinted transaction ids. With a database
> pre-created using this we will be better able to simulate and thus
> more easily measure clog contention. Note that current clog has space
> for 1 million xids, so a scale factor of greater than 10 is required
> to really stress the clog.

No doubt this is handy for testing this particular area, but overall I 
feel this is too much of a one-trick pony to include in pgbench.

Alternatively, you could do something like this:

do $$
declare
  i int4;
  naccounts int4;
begin
  select count(*) into naccounts from pgbench_accounts;
  for i in 1..naccounts loop
    -- use a begin-exception block to create a new subtransaction
    begin
      update pgbench_accounts set abalance = abalance where aid = i;
    exception
      when division_by_zero then
        raise 'unexpected division by zero error';
    end;
  end loop;
end;
$$;

after initializing the pgbench database, to assign distinct xmins to all 
rows. Another idea would be to run pg_dump in --inserts mode, edit the 
dump to remove BEGIN/COMMIT from it, and restore it back.
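
That pg_dump route could be sketched along these lines (an untested sketch; it assumes a prepared database named pgbench, that the dump actually contains BEGIN/COMMIT lines to strip as described above, and that psql runs in its default autocommit mode so every INSERT commits, and thus consumes an xid, on its own):

```shell
# Rough sketch of the dump-and-reload idea above: reload every row with
# its own INSERT so each row ends up with a distinct xmin.
pg_dump --inserts --data-only --table=pgbench_accounts pgbench > accounts.sql
grep -vE '^(BEGIN|COMMIT);' accounts.sql > accounts_autocommit.sql
psql -c 'TRUNCATE pgbench_accounts' pgbench
psql -f accounts_autocommit.sql pgbench   # one commit (and one xid) per row
```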

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Simulating Clog Contention

From
Peter Geoghegan
Date:
On 19 January 2012 14:36, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> No doubt this is handy for testing this particular area, but overall I feel
> this is too much of a one-trick pony to include in pgbench.

I don't think that being conservative in accepting pgbench options is
the right way to go. It's already so easy for a non-expert to shoot
themselves in the foot that we don't do ourselves any favours by
carefully weighing the merits of an expert-orientated feature.

Have you ever read the man page for rsync? It's massive, with a huge
number of options, and rsync is supposed to be a tool that's widely
used by sysadmins, not a specialist database benchmarking tool.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: Simulating Clog Contention

From
Simon Riggs
Date:
On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 12.01.2012 14:31, Simon Riggs wrote:
>>
>> In order to simulate real-world clog contention, we need to use
>> benchmarks that deal with real world situations.
>>
>> Currently, pgbench pre-loads data using COPY and executes a VACUUM so
>> that all hint bits are set on every row of every page of every table.
>> Thus, as pgbench runs it sees zero clog accesses from historical data.
>> As a result, clog access is minimised and the effects of clog
>> contention in the real world go unnoticed.
>>
>> The following patch adds a pgbench option -I to load data using
>> INSERTs, so that we can begin benchmark testing with rows that have
>> large numbers of distinct un-hinted transaction ids. With a database
>> pre-created using this we will be better able to simulate and thus
>> more easily measure clog contention. Note that current clog has space
>> for 1 million xids, so a scale factor of greater than 10 is required
>> to really stress the clog.
>
>
> No doubt this is handy for testing this particular area, but overall I feel
> this is too much of a one-trick pony to include in pgbench.
>
> Alternatively, you could do something like this:

I think the one-trick pony is pgbench. It has exactly one starting
condition for its tests and that isn't even a real world condition.

The main point of including the option in pgbench is to have a
utility that produces an initial test condition that works the same
for everyone, so we can accept each other's benchmark results. We both
know that if someone posts that they have done $RANDOMSQL on a table
before running a test, it will just be ignored and people will say
user error. Some people will get it wrong when reproducing things and
we'll have chaos.

The patch exists as a way of testing the clog contention improvement
patches and provides a route to long term regression testing that the
solution(s) still work.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Thu, Jan 19, 2012 at 10:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> On 12.01.2012 14:31, Simon Riggs wrote:
>>>
>>> In order to simulate real-world clog contention, we need to use
>>> benchmarks that deal with real world situations.
>>>
>>> Currently, pgbench pre-loads data using COPY and executes a VACUUM so
>>> that all hint bits are set on every row of every page of every table.
>>> Thus, as pgbench runs it sees zero clog accesses from historical data.
>>> As a result, clog access is minimised and the effects of clog
>>> contention in the real world go unnoticed.
>>>
>>> The following patch adds a pgbench option -I to load data using
>>> INSERTs, so that we can begin benchmark testing with rows that have
>>> large numbers of distinct un-hinted transaction ids. With a database
>>> pre-created using this we will be better able to simulate and thus
>>> more easily measure clog contention. Note that current clog has space
>>> for 1 million xids, so a scale factor of greater than 10 is required
>>> to really stress the clog.
>>
>>
>> No doubt this is handy for testing this particular area, but overall I feel
>> this is too much of a one-trick pony to include in pgbench.
>>
>> Alternatively, you could do something like this:
>
> I think the one-trick pony is pgbench. It has exactly one starting
> condition for its tests and that isn't even a real world condition.
>
> The main point of including the option into pgbench is to have a
> utility that produces as initial test condition that works the same
> for everyone, so we can accept each others benchmark results. We both
> know that if someone posts that they have done $RANDOMSQL on a table
> before running a test, it will just be ignored and people will say
> user error. Some people will get it wrong when reproducing things and
> we'll have chaos.
>
> The patch exists as a way of testing the clog contention improvement
> patches and provides a route to long term regression testing that the
> solution(s) still work.

I agree: I think this is useful.

However, I think we should follow the precedent of some of the other
somewhat-obscure options we've added recently and have only a long
form option for this: --inserts.

Also, I don't think the behavior described here should be joined at
the hip to --inserts:

+     * We do this after a load by COPY, but before a load via INSERT
+     *
+     * This is done deliberately to ensure that no heap or index hints are
+     * set before we start running the benchmark. This emulates the case
+     * where data has arrived row at a time by INSERT, rather than being
+     * bulkloaded prior to update.

I think that's also a useful behavior, but if we're going to have it,
we should have a separate option for it, like --create-indexes-early.
Otherwise, someone who wants to (for example) test only the impact of
doing inserts vs. COPY will get misleading answers.

Finally, it's occurred to me that it would be useful to make pgbench
respect -n even in initialization mode, and the SGML doc changes imply
that this patch does that somewhere or other, but maybe only when
you're doing INSERTS and not when you're doing copy, which would be
odd; and there's no sgml doc update for -n, and no command-line help
change either.

In short, I think the reason this patch seems like it's implementing
something fairly arbitrary is that it's really three pretty good ideas
that are somewhat jumbled together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Simon Riggs
Date:
On Thu, Jan 19, 2012 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I agree: I think this is useful.
>
> However, I think we should follow the precedent of some of the other
> somewhat-obscure options we've added recently and have only a long
> form option for this: --inserts.

Yep, no problem.

> Also, I don't think the behavior described here should be joined at
> the hip to --inserts:
>
> +        * We do this after a load by COPY, but before a load via INSERT
> +        *
> +        * This is done deliberately to ensure that no heap or index hints are
> +        * set before we start running the benchmark. This emulates the case
> +        * where data has arrived row at a time by INSERT, rather than being
> +        * bulkloaded prior to update.
>
> I think that's also a useful behavior, but if we're going to have it,
> we should have a separate option for it, like --create-indexes-early.
> Otherwise, someone who wants to (for example) test only the impact of
> doing inserts vs. COPY will get misleading answers.

Creating indexes later would invalidate the test conditions I was
trying to create, so that doesn't add a useful new initialisation
mode. (Creating the indexes causes all of the hint bits to be set).

So that's just adding unrelated requirements for additional tests.
Yes, there are lots of additional tests we could get this code to
perform but we don't need to burden this patch with responsibility for
adding them, especially not when the tests mentioned don't refer to
any related patches in this commit fest and could be done at any time.
Any such change is clearly very low priority at this time.

> Finally, it's occurred to me that it would be useful to make pgbench
> respect -n even in initialization mode, and the SGML doc changes imply
> that this patch does that somewhere or other, but maybe only when
> you're doing INSERTS and not when you're doing copy, which would be
> odd; and there's no sgml doc update for -n, and no command-line help
> change either.

I'll fix the docs.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Also, I don't think the behavior described here should be joined at
>> the hip to --inserts:
>>
>> +        * We do this after a load by COPY, but before a load via INSERT
>> +        *
>> +        * This is done deliberately to ensure that no heap or index hints are
>> +        * set before we start running the benchmark. This emulates the case
>> +        * where data has arrived row at a time by INSERT, rather than being
>> +        * bulkloaded prior to update.
>>
>> I think that's also a useful behavior, but if we're going to have it,
>> we should have a separate option for it, like --create-indexes-early.
>> Otherwise, someone who wants to (for example) test only the impact of
>> doing inserts vs. COPY will get misleading answers.
>
> Creating indexes later would invalidate the test conditions I was
> trying to create, so that doesn't add a useful new initialisation
> mode. (Creating the indexes causes all of the hint bits to be set).

Right, but the point is that to address Heikki's objection that this
is a special-purpose hack, we should try to make it general, so that
it can be used by other people for other things.  For example, if the
options are separated, you can use this to measure how much slower
--inserts vs. the regular way.  But if that also changes the way
indexes are created, then you can't.  Moreover, since the
documentation mentioned only one of those two changes and not the
other, you might reasonably think that you've conducted a valid test.
We could document that --inserts changes the behavior in multiple
ways, but then the switch will end up being a bit of a misnomer, so I
think it's better to have a separate switch for each behavior someone
might want.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Simon Riggs
Date:
On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Also, I don't think the behavior described here should be joined at
>>> the hip to --inserts:
>>>
>>> +        * We do this after a load by COPY, but before a load via INSERT
>>> +        *
>>> +        * This is done deliberately to ensure that no heap or index hints are
>>> +        * set before we start running the benchmark. This emulates the case
>>> +        * where data has arrived row at a time by INSERT, rather than being
>>> +        * bulkloaded prior to update.
>>>
>>> I think that's also a useful behavior, but if we're going to have it,
>>> we should have a separate option for it, like --create-indexes-early.
>>> Otherwise, someone who wants to (for example) test only the impact of
>>> doing inserts vs. COPY will get misleading answers.
>>
>> Creating indexes later would invalidate the test conditions I was
>> trying to create, so that doesn't add a useful new initialisation
>> mode. (Creating the indexes causes all of the hint bits to be set).
>
> Right, but the point is that to address Heikki's objection that this
> is a special-purpose hack, we should try to make it general, so that
> it can be used by other people for other things.

This supports running hundreds of different tests because it creates a
useful general starting condition. It's not a special-purpose hack
because it's not a hack, nor is it special purpose.

Yes, we could have an option to run with no indexes. Or we could have
an option to run with 2 indexes as well. We could do all sorts of
things. None of that is important, because there aren't any patches in
the queue that need those tests and it's too late to do it in this
release. And if it really is important you can do it in the next
release.

If we have time to spend we should be spending it on running the patch
to test the effectiveness of other patches in the queue, not on
inventing new tests that have no relevance.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Simulating Clog Contention

From
Marti Raudsepp
Date:
On Thu, Jan 19, 2012 at 18:12, Robert Haas <robertmhaas@gmail.com> wrote:
> Right, but the point is that to address Heikki's objection that this
> is a special-purpose hack, we should try to make it general, so that
> it can be used by other people for other things.

Personally I would like to see support for more flexibility in pgbench
scripts. It would be useful to allow scripts to contain custom
initialization sections -- for scripts that want a custom schema, as
well as different ways to populate the standard schema.

Regards,
Marti


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Thu, Jan 19, 2012 at 11:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Also, I don't think the behavior described here should be joined at
>>>> the hip to --inserts:
>>>>
>>>> +        * We do this after a load by COPY, but before a load via INSERT
>>>> +        *
>>>> +        * This is done deliberately to ensure that no heap or index hints are
>>>> +        * set before we start running the benchmark. This emulates the case
>>>> +        * where data has arrived row at a time by INSERT, rather than being
>>>> +        * bulkloaded prior to update.
>>>>
>>>> I think that's also a useful behavior, but if we're going to have it,
>>>> we should have a separate option for it, like --create-indexes-early.
>>>> Otherwise, someone who wants to (for example) test only the impact of
>>>> doing inserts vs. COPY will get misleading answers.
>>>
>>> Creating indexes later would invalidate the test conditions I was
>>> trying to create, so that doesn't add a useful new initialisation
>>> mode. (Creating the indexes causes all of the hint bits to be set).
>>
>> Right, but the point is that to address Heikki's objection that this
>> is a special-purpose hack, we should try to make it general, so that
>> it can be used by other people for other things.
>
> This supports running hundreds of different tests because it creates a
> useful general starting condition. It's not a special purpose hack
> because its not a hack, nor is it special purpose.
>
> Yes, we could have an option to run with no indexes. Or we could have
> an option to run with 2 indexes as well. We could do all sorts of
> things. None of that is important, because there aren't any patches in
> the queue that need those tests and its too late to do it in this
> release. And if it really is important you can do it in the next
> release.
>
> If we have time to spend we should be spending it on running the patch
> to test the effectiveness of other patches in the queue, not on
> inventing new tests that have no relevance.

I feel I've adequately explained why it makes sense to me to separate
those options.  If you want, I'll do the work myself; it will take
less time than arguing about it.

On the other hand, if you wish to insist that we should only commit
this patch if --inserts makes multiple, unrelated, undocumented
changes to the initial test configurations, then I'll join Heikki in
voting against this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Simon Riggs
Date:
On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I feel I've adequately explained why it makes sense to me to separate
> those options.  If you want, I'll do the work myself; it will take
> less time than arguing about it.

If you have time to contribute, please use the patch as stands to test
the other patches in the CF queue.

It's more important that we measure and fix clog contention than have
a new pgbench feature with no immediate value to the next release of
Postgres.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Thu, Jan 19, 2012 at 1:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I feel I've adequately explained why it makes sense to me to separate
>> those options.  If you want, I'll do the work myself; it will take
>> less time than arguing about it.
>
> If you have time to contribute, please use the patch as stands to test
> the other patches in the CF queue.

Those things aren't mutually exclusive; whether or not I spend an hour
whacking this patch around isn't going to have any impact on how much
benchmarking I get done.   Benchmarking is mostly waiting, and I can
do other things while the tests are going.

Just to reiterate a point I've made previously, Nate Boley's test
machine was running another big job for several weeks straight, and I
haven't been able to use the machine for anything.  It seems to be
unloaded at the moment so I'll try to squeeze in some tests, but I
don't know how long it will stay that way.  It's been great to have
nearly unimpeded access to this for most of the cycle, but all good
things must (and do) come to an end.  In any event, none of this has
much impact on the offer above, which is a small amount of code that I
will be happy to attend to if you do not wish to do so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Abhijit Menon-Sen
Date:
At 2012-01-12 12:31:20 +0000, simon@2ndQuadrant.com wrote:
>
> The following patch adds a pgbench option -I to load data using
> INSERTs

This is just to confirm that the patch applies and builds and works
fine (though of course it does take a long time… pity there doesn't
seem to be any easy way to add progress indication like -i has).

I'm aware of the subsequent discussion about using only a long option,
documenting -n, and adding a knob to control index creation timing. I
don't have a useful opinion on any of those things. It's just that the
patch was marked "Needs review" and it was only while waiting for 100k
inserts to run that I thought of checking if there was any discussion
about it. Oops.

> Yes, my laptop really is that slow.

It appears to be eight times as fast as mine.

-- ams


Re: Simulating Clog Contention

From
Merlin Moncure
Date:
On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
> This is just to confirm that the patch applies and builds and works
> fine (though of course it does take a long time… pity there doesn't
> seem to be any easy way to add progress indication like -i has).

On any non server grade hardware you'd probably want to disable
synchronous_commit while loading.  FWIW, this is a great addition to
pgbench.
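
For reference, that can be done for just the benchmark database (a sketch; synchronous_commit is an ordinary user-settable GUC, so ALTER DATABASE ... SET applies it to new connections without touching postgresql.conf):

```sql
-- Sketch: skip the WAL flush wait for each tiny INSERT transaction
-- while loading, then restore the default afterwards.
ALTER DATABASE pgbench SET synchronous_commit = off;
-- ... run the INSERT-based initialization here ...
ALTER DATABASE pgbench RESET synchronous_commit;
```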

merlin


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Thu, Jan 26, 2012 at 11:41 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
>> This is just to confirm that the patch applies and builds and works
>> fine (though of course it does take a long time… pity there doesn't
>> seem to be any easy way to add progress indication like -i has).
>
> On any non server grade hardware you'd probably want to disable
> synchronous_commit while loading.  FWIW, this is a great addition to
> pgbench.

Do you object to separating out the three different things the patch
does and adding separate options for each one?  If so, why?  I find
them independently useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Merlin Moncure
Date:
On Thu, Jan 26, 2012 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 26, 2012 at 11:41 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
>>> This is just to confirm that the patch applies and builds and works
>>> fine (though of course it does take a long time… pity there doesn't
>>> seem to be any easy way to add progress indication like -i has).
>>
>> On any non server grade hardware you'd probably want to disable
>> synchronous_commit while loading.  FWIW, this is a great addition to
>> pgbench.
>
> Do you object to separating out the three different things the patch
> does and adding separate options for each one?  If so, why?  I find
> them independently useful.

I didn't take a position on that -- although superficially it seems
like more granular control is good (and you can always group options
together with a 'super option', as in cp -a) -- just making a
general comment on the usefulness of testing against records that
don't have the same xid.

merlin


Re: Simulating Clog Contention

From
Jeff Janes
Date:
On Thu, Jan 12, 2012 at 4:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> The following patch adds a pgbench option -I to load data using
> INSERTs, so that we can begin benchmark testing with rows that have
> large numbers of distinct un-hinted transaction ids. With a database
> pre-created using this we will be better able to simulate and thus
> more easily measure clog contention. Note that current clog has space
> for 1 million xids, so a scale factor of greater than 10 is required
> to really stress the clog.

Running with this patch with a non-default scale factor generates the
spurious notice:

"Scale option ignored, using pgbench_branches table count = 10"

In fact the scale option is not being ignored, because it was used to
initialize the pgbench_branches table count earlier in this same
invocation.

I think that even in normal (non-initialization) usage, this message
should be suppressed when the provided scale factor
is equal to the pgbench_branches table count.

Cheers,

Jeff


Re: Simulating Clog Contention

From
Jeff Janes
Date:
On Fri, Jan 27, 2012 at 1:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Jan 12, 2012 at 4:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> The following patch adds a pgbench option -I to load data using
>> INSERTs, so that we can begin benchmark testing with rows that have
>> large numbers of distinct un-hinted transaction ids. With a database
>> pre-created using this we will be better able to simulate and thus
>> more easily measure clog contention. Note that current clog has space
>> for 1 million xids, so a scale factor of greater than 10 is required
>> to really stress the clog.
>
> Running with this patch with a non-default scale factor generates the
> spurious notice:
>
> "Scale option ignored, using pgbench_branches table count = 10"
>
> In fact the scale option is not being ignored, because it was used to
> initialize the pgbench_branches table count earlier in this same
> invocation.
>
> I think that even in normal (non-initialization) usage, this message
> should be suppressed when the provided scale factor
> is equal to the pgbench_branches table count.

The attached patch does just that.  There is probably no reason to
warn people that we are doing what they told us to, but not for the
reason they think.

I think this change makes sense regardless of the disposition of the
thread topic.

Cheers,

Jeff

Attachment

Re: Simulating Clog Contention

From
Robert Haas
Date:
On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I think that even in normal (non-initialization) usage, this message
>> should be suppressed when the provided scale factor
>> is equal to the pgbench_branches table count.
>
> The attached patch does just that.  There is probably no reason to
> warn people that we are doing what they told us to, but not for the
> reason they think.

In my opinion, a more sensible approach than anything we're doing
right now would be to outright *reject* options that will only be
ignored.  If -s isn't supported except with -i, then trying to specify
-s without -i should just error out at the options-parsing stage,
before we even try to connect to the database.  It's not very helpful
to accept options and then ignore them, and we have many instances of
that right now: initialization-only switches are accepted and ignored
when not initializing, and run-only switches are accepted and ignored
with initializing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Jeff Janes
Date:
On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> I think that even in normal (non-initialization) usage, this message
>>> should be suppressed when the provided scale factor
>>> is equal to the pgbench_branches table count.
>>
>> The attached patch does just that.  There is probably no reason to
>> warn people that we are doing what they told us to, but not for the
>> reason they think.
>
> In my opinion, a more sensible approach than anything we're doing
> right now would be to outright *reject* options that will only be
> ignored.  If -s isn't supported except with -i, then trying to specify
> -s without -i should just error out at the options-parsing stage,
> before we even try to connect to the database.  It's not very helpful
> to accept options and then ignore them, and we have many instances of
> that right now: initialization-only switches are accepted and ignored
> when not initializing, and run-only switches are accepted and ignored
> with initializing.

I like the ability to say, effectively, "I think I previously did
the initialization with -s 40; if I actually didn't then scream at me,
and if I did then go ahead and do the pgbench run I just asked for".
But, since it does unconditionally report the scale actually used and
I just have to have the discipline to go look at the result, I can see
where this is perhaps overkill.   In my own (non-PG-related) code,
when I have tasks that have to be run in multiple phases that can get
out of sync if I am not careful, I like to be able to specify the
flags even in the "unused" invocation, so that the code can verify I
am being consistent.  Code is better at that than I am.


I'm not sure I know what all would be incompatible with what.  I could
start drawing that matrix up once the API stabilizes, but I think you
are still planning on whacking this -I option around a bit.

Cheers,

Jeff


Re: Simulating Clog Contention

From
Jeff Janes
Date:
On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
> At 2012-01-12 12:31:20 +0000, simon@2ndQuadrant.com wrote:
>>
>> The following patch adds a pgbench option -I to load data using
>> INSERTs
>
> This is just to confirm that the patch applies and builds and works
> fine (though of course it does take a long time… pity there doesn't
> seem to be any easy way to add progress indication like -i has).

I was thinking the opposite.  That -i should only print progress
indication when -d is given.  Or at least knock an order of magnitude
or two off of how often it does so.

Cheers,

Jeff


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Mon, Jan 30, 2012 at 10:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
>> At 2012-01-12 12:31:20 +0000, simon@2ndQuadrant.com wrote:
>>>
>>> The following patch adds a pgbench option -I to load data using
>>> INSERTs
>>
>> This is just to confirm that the patch applies and builds and works
>> fine (though of course it does take a long time… pity there doesn't
>> seem to be any easy way to add progress indication like -i has).
>
> I was thinking the opposite.  That -i should only print progress
> indication when -d is given.  Or at least knock an order of magnitude
> or two off of how often it does so.

I'd be all in favor of having -i emit progress reports 10x less
often; even on a laptop, the current reports are very chatty.  But I
think 100x less often might be taking it too far.

Either way, if we're going to have an option for inserts, they should
produce the same progress reports that COPY does - though possibly
more often, since I'm guessing it's likely to be way slower.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Mon, Jan 30, 2012 at 10:48 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>> I think that even in normal (non-initialization) usage, this message
>>>> should be suppressed when the provided scale factor
>>>> is equal to the pgbench_branches table count.
>>>
>>> The attached patch does just that.  There is probably no reason to
>>> warn people that we are doing what they told us to, but not for the
>>> reason they think.
>>
>> In my opinion, a more sensible approach than anything we're doing
>> right now would be to outright *reject* options that will only be
>> ignored.  If -s isn't supported except with -i, then trying to specify
>> -s without -i should just error out at the options-parsing stage,
>> before we even try to connect to the database.  It's not very helpful
>> to accept options and then ignore them, and we have many instances of
>> that right now: initialization-only switches are accepted and ignored
>> when not initializing, and run-only switches are accepted and ignored
>> with initializing.
>
> I like the ability to say, effectively, "I think I had previously done
> the initialization with -s 40; if I actually didn't, then scream at me,
> and if I did, then go ahead and do the pgbench I just asked for".
> But, since it does unconditionally report the scale actually used and
> I just have to have the discipline to go look at the result, I can see
> where this is perhaps overkill.   In my own (non-PG-related) code,
> when I have tasks that have to be run in multiple phases that can get
> out of sync if I am not careful, I like to be able to specify the
> flags even in the "unused" invocation, so that the code can verify I
> am being consistent.  Code is better at that than I am.

I guess my point is - if we're gonna have it do that, then shouldn't
we actually *error out* if the scales don't match, not just print a
warning and then charge ahead?

> I'm not sure I know what all would be incompatible with what.  I could
> start drawing that matrix up once the API stabilizes, but I think you
> are still planning on whacking this -I option around a bit.

It's mostly that everything in the "initialization options" section of
pgbench --help is incompatible with everything in the "benchmarking
options" section, and the other way around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Simulating Clog Contention

From
Robert Haas
Date:
On Mon, Jan 30, 2012 at 12:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I was thinking the opposite.  That -i should only print progress
>> indication when -d is given.  Or at least knock an order of magnitude
>> or two off of how often it does so.
>
> I'd be all in favor of having -i emit progress reports 10x less
> often; even on a laptop, the current reports are very chatty.  But I
> think 100x less often might be taking it too far.

Trivial patch for that attached.

> Either way, if we're going to have an option for inserts, they should
> produce the same progress reports that COPY does - though possibly
> more often, since I'm guessing it's likely to be way slower.

I looked at this a little more and I'm coming around to the view that
Heikki expressed originally: I think this is too much of a one-trick
pony to justify including it in pgbench.  It's an interesting hack for
testing, but the thing is that, to really make it do anything
interesting, you've got to not only use INSERT instead of COPY and
create the indexes before loading the table, BUT ALSO frob the
existing code to prevent the WAL bypass from doing its thing.  I
suppose we could have a separate option for all THREE of those
behaviors, rather than just the two I mentioned in my previous email,
but that seems over the top.  So if we're going to have this at all,
we might as well just call it --artificially-inflate-clog-pain and be
done with it.

But I think that's kind of a narrow special case that isn't really
worth catering for.  Our CLOG contention right now is not so bad that
we need to focus a major development effort on making it less, and
even if we do want to do that, there's no real evidence that a
half-hour pgbench run isn't sufficient to demonstrate the problem
perfectly adequately with the code we have right now.  After a few
minutes the frequency of hitting previously-updated rows is high
enough to measure the problem anyway.  In the process of developing
the various performance improvements we've committed for 9.2, I and
others have developed various test cases - Heikki has one he uses for
his xlog scaling patch, for example.  We can't commit all of those as
pgbench options, or we'll go nuts.  Ideally it would be nice if
pgbench were flexible enough to handle these kinds of use cases via
configuration rather than by hard-coding them.

Given all the above, I'm inclined to conclude that this is just
another special-purpose test case which we should use for testing and
that's it, as Simon already proposed upthread.  However, I do think
that we should go ahead and make -n work in initialization mode,
because I've wanted that a few times.  So, patch for that attached,
too.

(Note that no actual benchmarking is happening right now with regard
to the CLOG history patch because, as previously noted, the most
recent version does not compile.  This probably doesn't matter for
that patch hugely anyway, since that mechanism as currently designed
does not kick in until a million transactions have been processed, and
by that time you'll have quite a spread of XIDs in pgbench_accounts
anyway.  I don't believe there are any other remaining patches in this
CommitFest to which the test case would be applicable; please let me
know if I am wrong.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
