Thread: Simulating Clog Contention
In order to simulate real-world clog contention, we need to use
benchmarks that deal with real world situations.

Currently, pgbench pre-loads data using COPY and executes a VACUUM so
that all hint bits are set on every row of every page of every table.
Thus, as pgbench runs it sees zero clog accesses from historical data.
As a result, clog access is minimised and the effects of clog
contention in the real world go unnoticed.

The following patch adds a pgbench option -I to load data using
INSERTs, so that we can begin benchmark testing with rows that have
large numbers of distinct un-hinted transaction ids. With a database
pre-created using this we will be better able to simulate and thus
more easily measure clog contention. Note that current clog has space
for 1 million xids, so a scale factor of greater than 10 is required
to really stress the clog.

The patch uses multiple connections to load data using a predefined
script similar to the -N or -S logic.

$ pgbench --help
pgbench is a benchmarking tool for PostgreSQL.

Usage:
  pgbench [OPTIONS]... [DBNAME]

Initialization options:
  -i           invokes initialization mode using COPY
  -I           invokes initialization mode using INSERTs
...

$ pgbench -I -c 4 -t 10000
creating tables...
filling accounts table with 100000 rows using inserts
set primary key...
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_branches_pkey" for table "pgbench_branches"
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_accounts_pkey" for table "pgbench_accounts"
done.
transactions option ignored
transaction type: Load pgbench_accounts using INSERTs
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
number of transactions per client: 25000
number of transactions actually processed: 100000/100000
tps = 828.194854 (including connections establishing)
tps = 828.440330 (excluding connections establishing)

Yes, my laptop really is that slow. Contributions to improve that
situation gratefully received.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
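The difference between the two loading strategies is easy to see in
plain SQL. The sketch below is illustrative only -- the patch does this
from pgbench's client code, and the file path and literal values are
placeholders:

-- COPY load: a single transaction stamps every row with the same xid,
-- and the VACUUM that follows sets all the hint bits.
COPY pgbench_accounts FROM '/path/to/accounts.data';

-- INSERT load: with autocommit on, each statement commits as its own
-- transaction, so every row carries a distinct, un-hinted xmin.
INSERT INTO pgbench_accounts (aid, bid, abalance, filler) VALUES (1, 1, 0, '');
INSERT INTO pgbench_accounts (aid, bid, abalance, filler) VALUES (2, 1, 0, '');
-- ... one row per transaction, scale * 100000 rows in total.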
> $ pgbench --help
> pgbench is a benchmarking tool for PostgreSQL.
>
> Usage:
>   pgbench [OPTIONS]... [DBNAME]
>
> Initialization options:
>   -i           invokes initialization mode using COPY
>   -I           invokes initialization mode using INSERTs

Sounds useful. What about a long extra option, --inserts, like pg_dump?

  pgbench -i --inserts ...

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On 12.01.2012 14:31, Simon Riggs wrote:
> In order to simulate real-world clog contention, we need to use
> benchmarks that deal with real world situations.
>
> Currently, pgbench pre-loads data using COPY and executes a VACUUM so
> that all hint bits are set on every row of every page of every table.
> Thus, as pgbench runs it sees zero clog accesses from historical data.
> As a result, clog access is minimised and the effects of clog
> contention in the real world go unnoticed.
>
> The following patch adds a pgbench option -I to load data using
> INSERTs, so that we can begin benchmark testing with rows that have
> large numbers of distinct un-hinted transaction ids. With a database
> pre-created using this we will be better able to simulate and thus
> more easily measure clog contention. Note that current clog has space
> for 1 million xids, so a scale factor of greater than 10 is required
> to really stress the clog.

No doubt this is handy for testing this particular area, but overall I
feel this is too much of a one-trick pony to include in pgbench.

Alternatively, you could do something like this:

do $$
declare
  i int4;
  naccounts int4;
begin
  select count(*) into naccounts from pgbench_accounts;
  for i in 1..naccounts loop
    -- use a begin-exception block to create a new subtransaction
    begin
      update pgbench_accounts set abalance = abalance where aid = i;
    exception when division_by_zero then
      raise 'unexpected division by zero error';
    end;
  end loop;
end;
$$;

after initializing the pgbench database, to assign distinct xmins to
all rows.

Another idea would be to run pg_dump in --inserts mode, edit the dump
to remove BEGIN/COMMIT from it, and restore it back.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
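A quick sanity check for either approach (a generic query, not part of
Heikki's suggestion): xmin is a system column, so counting its distinct
values shows how many transaction ids the rows actually carry. The cast
works around type xid lacking the comparison operators that DISTINCT
needs:

-- Expect roughly one distinct value per row after an INSERT-style load
-- or the DO block above; only a handful after a plain COPY load.
SELECT count(DISTINCT xmin::text) FROM pgbench_accounts;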
On 19 January 2012 14:36, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> No doubt this is handy for testing this particular area, but overall I
> feel this is too much of a one-trick pony to include in pgbench.

I don't think that being conservative in accepting pgbench options is
the right way to go. It's already so easy for a non-expert to shoot
themselves in the foot that we don't do ourselves any favours by
carefully weighing the merits of an expert-orientated feature. Have you
ever read the man page for rsync? It's massive, with a huge number of
options, and rsync is supposed to be a tool that's widely used by
sysadmins, not a specialist database benchmarking tool.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 12.01.2012 14:31, Simon Riggs wrote:
>>
>> In order to simulate real-world clog contention, we need to use
>> benchmarks that deal with real world situations.
>>
>> Currently, pgbench pre-loads data using COPY and executes a VACUUM so
>> that all hint bits are set on every row of every page of every table.
>> Thus, as pgbench runs it sees zero clog accesses from historical data.
>> As a result, clog access is minimised and the effects of clog
>> contention in the real world go unnoticed.
>>
>> The following patch adds a pgbench option -I to load data using
>> INSERTs, so that we can begin benchmark testing with rows that have
>> large numbers of distinct un-hinted transaction ids. With a database
>> pre-created using this we will be better able to simulate and thus
>> more easily measure clog contention. Note that current clog has space
>> for 1 million xids, so a scale factor of greater than 10 is required
>> to really stress the clog.
>
> No doubt this is handy for testing this particular area, but overall I
> feel this is too much of a one-trick pony to include in pgbench.
>
> Alternatively, you could do something like this:

I think the one-trick pony is pgbench. It has exactly one starting
condition for its tests and that isn't even a real world condition.

The main point of including the option in pgbench is to have a utility
that produces an initial test condition that works the same for
everyone, so we can accept each other's benchmark results. We both know
that if someone posts that they have done $RANDOMSQL on a table before
running a test, it will just be ignored and people will say user error.
Some people will get it wrong when reproducing things and we'll have
chaos.

The patch exists as a way of testing the clog contention improvement
patches and provides a route to long term regression testing that the
solution(s) still work.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 19, 2012 at 10:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> On 12.01.2012 14:31, Simon Riggs wrote:
>>>
>>> In order to simulate real-world clog contention, we need to use
>>> benchmarks that deal with real world situations.
>>>
>>> Currently, pgbench pre-loads data using COPY and executes a VACUUM so
>>> that all hint bits are set on every row of every page of every table.
>>> Thus, as pgbench runs it sees zero clog accesses from historical data.
>>> As a result, clog access is minimised and the effects of clog
>>> contention in the real world go unnoticed.
>>>
>>> The following patch adds a pgbench option -I to load data using
>>> INSERTs, so that we can begin benchmark testing with rows that have
>>> large numbers of distinct un-hinted transaction ids. With a database
>>> pre-created using this we will be better able to simulate and thus
>>> more easily measure clog contention. Note that current clog has space
>>> for 1 million xids, so a scale factor of greater than 10 is required
>>> to really stress the clog.
>>
>> No doubt this is handy for testing this particular area, but overall I
>> feel this is too much of a one-trick pony to include in pgbench.
>>
>> Alternatively, you could do something like this:
>
> I think the one-trick pony is pgbench. It has exactly one starting
> condition for its tests and that isn't even a real world condition.
>
> The main point of including the option in pgbench is to have a utility
> that produces an initial test condition that works the same for
> everyone, so we can accept each other's benchmark results. We both know
> that if someone posts that they have done $RANDOMSQL on a table before
> running a test, it will just be ignored and people will say user error.
> Some people will get it wrong when reproducing things and we'll have
> chaos.
>
> The patch exists as a way of testing the clog contention improvement
> patches and provides a route to long term regression testing that the
> solution(s) still work.

I agree: I think this is useful. However, I think we should follow the
precedent of some of the other somewhat-obscure options we've added
recently and have only a long form option for this: --inserts.

Also, I don't think the behavior described here should be joined at the
hip to --inserts:

+ * We do this after a load by COPY, but before a load via INSERT
+ *
+ * This is done deliberately to ensure that no heap or index hints are
+ * set before we start running the benchmark. This emulates the case
+ * where data has arrived row at a time by INSERT, rather than being
+ * bulkloaded prior to update.

I think that's also a useful behavior, but if we're going to have it,
we should have a separate option for it, like --create-indexes-early.
Otherwise, someone who wants to (for example) test only the impact of
doing inserts vs. COPY will get misleading answers.

Finally, it's occurred to me that it would be useful to make pgbench
respect -n even in initialization mode, and the SGML doc changes imply
that this patch does that somewhere or other, but maybe only when
you're doing INSERTs and not when you're doing COPY, which would be
odd; and there's no SGML doc update for -n, and no command-line help
change either.

In short, I think the reason this patch seems like it's implementing
something fairly arbitrary is that it's really three pretty good ideas
that are somewhat jumbled together.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
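For readers following along, a rough sketch of the initialization order
the quoted comment describes -- illustrative only, since the real
sequence is driven from pgbench's C code; the column layout shown
follows the standard pgbench_accounts definition:

-- Step 1: create the table.
CREATE TABLE pgbench_accounts (
    aid      integer NOT NULL,
    bid      integer,
    abalance integer,
    filler   character(84)
);

-- Step 2: with --inserts, the primary key is added *before* the load,
-- so the index fills row by row and no heap or index hint bits get
-- set. (The stock COPY path adds the primary key after the load.)
ALTER TABLE pgbench_accounts ADD PRIMARY KEY (aid);

-- Step 3: load the rows one INSERT (one transaction) at a time.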
On Thu, Jan 19, 2012 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I agree: I think this is useful.
>
> However, I think we should follow the precedent of some of the other
> somewhat-obscure options we've added recently and have only a long
> form option for this: --inserts.

Yep, no problem.

> Also, I don't think the behavior described here should be joined at
> the hip to --inserts:
>
> + * We do this after a load by COPY, but before a load via INSERT
> + *
> + * This is done deliberately to ensure that no heap or index hints are
> + * set before we start running the benchmark. This emulates the case
> + * where data has arrived row at a time by INSERT, rather than being
> + * bulkloaded prior to update.
>
> I think that's also a useful behavior, but if we're going to have it,
> we should have a separate option for it, like --create-indexes-early.
> Otherwise, someone who wants to (for example) test only the impact of
> doing inserts vs. COPY will get misleading answers.

Creating indexes later would invalidate the test conditions I was
trying to create, so that doesn't add a useful new initialisation mode.
(Creating the indexes causes all of the hint bits to be set.) So that's
just adding unrelated requirements for additional tests.

Yes, there are lots of additional tests we could get this code to
perform, but we don't need to burden this patch with responsibility for
adding them, especially not when the tests mentioned don't refer to any
related patches in this commit fest and could be done at any time. Any
such change is clearly very low priority at this time.

> Finally, it's occurred to me that it would be useful to make pgbench
> respect -n even in initialization mode, and the SGML doc changes imply
> that this patch does that somewhere or other, but maybe only when
> you're doing INSERTs and not when you're doing COPY, which would be
> odd; and there's no SGML doc update for -n, and no command-line help
> change either.

I'll fix the docs.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Also, I don't think the behavior described here should be joined at
>> the hip to --inserts:
>>
>> + * We do this after a load by COPY, but before a load via INSERT
>> + *
>> + * This is done deliberately to ensure that no heap or index hints are
>> + * set before we start running the benchmark. This emulates the case
>> + * where data has arrived row at a time by INSERT, rather than being
>> + * bulkloaded prior to update.
>>
>> I think that's also a useful behavior, but if we're going to have it,
>> we should have a separate option for it, like --create-indexes-early.
>> Otherwise, someone who wants to (for example) test only the impact of
>> doing inserts vs. COPY will get misleading answers.
>
> Creating indexes later would invalidate the test conditions I was
> trying to create, so that doesn't add a useful new initialisation
> mode. (Creating the indexes causes all of the hint bits to be set.)

Right, but the point is that to address Heikki's objection that this is
a special-purpose hack, we should try to make it general, so that it
can be used by other people for other things. For example, if the
options are separated, you can use this to measure how much slower
--inserts is vs. the regular way. But if that also changes the way
indexes are created, then you can't. Moreover, since the documentation
mentioned only one of those two changes and not the other, you might
reasonably think that you've conducted a valid test.

We could document that --inserts changes the behavior in multiple ways,
but then the switch will end up being a bit of a misnomer, so I think
it's better to have a separate switch for each behavior someone might
want.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Also, I don't think the behavior described here should be joined at
>>> the hip to --inserts:
>>>
>>> + * We do this after a load by COPY, but before a load via INSERT
>>> + *
>>> + * This is done deliberately to ensure that no heap or index hints are
>>> + * set before we start running the benchmark. This emulates the case
>>> + * where data has arrived row at a time by INSERT, rather than being
>>> + * bulkloaded prior to update.
>>>
>>> I think that's also a useful behavior, but if we're going to have it,
>>> we should have a separate option for it, like --create-indexes-early.
>>> Otherwise, someone who wants to (for example) test only the impact of
>>> doing inserts vs. COPY will get misleading answers.
>>
>> Creating indexes later would invalidate the test conditions I was
>> trying to create, so that doesn't add a useful new initialisation
>> mode. (Creating the indexes causes all of the hint bits to be set.)
>
> Right, but the point is that to address Heikki's objection that this
> is a special-purpose hack, we should try to make it general, so that
> it can be used by other people for other things.

This supports running hundreds of different tests because it creates a
useful general starting condition. It's not a special purpose hack,
because it's not a hack, nor is it special purpose.

Yes, we could have an option to run with no indexes. Or we could have
an option to run with 2 indexes as well. We could do all sorts of
things. None of that is important, because there aren't any patches in
the queue that need those tests and it's too late to do it in this
release. And if it really is important you can do it in the next
release.

If we have time to spend we should be spending it on running the patch
to test the effectiveness of other patches in the queue, not on
inventing new tests that have no relevance.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 19, 2012 at 18:12, Robert Haas <robertmhaas@gmail.com> wrote:
> Right, but the point is that to address Heikki's objection that this
> is a special-purpose hack, we should try to make it general, so that
> it can be used by other people for other things.

Personally I would like to see support for more flexibility in pgbench
scripts. It would be useful to allow scripts to contain custom
initialization sections -- for scripts that want a custom schema, as
well as different ways to populate the standard schema.

Regards,
Marti
On Thu, Jan 19, 2012 at 11:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Also, I don't think the behavior described here should be joined at
>>>> the hip to --inserts:
>>>>
>>>> + * We do this after a load by COPY, but before a load via INSERT
>>>> + *
>>>> + * This is done deliberately to ensure that no heap or index hints are
>>>> + * set before we start running the benchmark. This emulates the case
>>>> + * where data has arrived row at a time by INSERT, rather than being
>>>> + * bulkloaded prior to update.
>>>>
>>>> I think that's also a useful behavior, but if we're going to have it,
>>>> we should have a separate option for it, like --create-indexes-early.
>>>> Otherwise, someone who wants to (for example) test only the impact of
>>>> doing inserts vs. COPY will get misleading answers.
>>>
>>> Creating indexes later would invalidate the test conditions I was
>>> trying to create, so that doesn't add a useful new initialisation
>>> mode. (Creating the indexes causes all of the hint bits to be set.)
>>
>> Right, but the point is that to address Heikki's objection that this
>> is a special-purpose hack, we should try to make it general, so that
>> it can be used by other people for other things.
>
> This supports running hundreds of different tests because it creates a
> useful general starting condition. It's not a special purpose hack,
> because it's not a hack, nor is it special purpose.
>
> Yes, we could have an option to run with no indexes. Or we could have
> an option to run with 2 indexes as well. We could do all sorts of
> things. None of that is important, because there aren't any patches in
> the queue that need those tests and it's too late to do it in this
> release. And if it really is important you can do it in the next
> release.
>
> If we have time to spend we should be spending it on running the patch
> to test the effectiveness of other patches in the queue, not on
> inventing new tests that have no relevance.

I feel I've adequately explained why it makes sense to me to separate
those options. If you want, I'll do the work myself; it will take less
time than arguing about it. On the other hand, if you wish to insist
that we should only commit this patch if --inserts makes multiple,
unrelated, undocumented changes to the initial test configurations,
then I'll join Heikki in voting against this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I feel I've adequately explained why it makes sense to me to separate
> those options. If you want, I'll do the work myself; it will take
> less time than arguing about it.

If you have time to contribute, please use the patch as it stands to
test the other patches in the CF queue. It's more important that we
measure and fix clog contention than have a new pgbench feature with no
immediate value to the next release of Postgres.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 19, 2012 at 1:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I feel I've adequately explained why it makes sense to me to separate
>> those options. If you want, I'll do the work myself; it will take
>> less time than arguing about it.
>
> If you have time to contribute, please use the patch as it stands to
> test the other patches in the CF queue.

Those things aren't mutually exclusive; whether or not I spend an hour
whacking this patch around isn't going to have any impact on how much
benchmarking I get done. Benchmarking is mostly waiting, and I can do
other things while the tests are going.

Just to reiterate a point I've made previously, Nate Boley's test
machine was running another big job for several weeks straight, and I
haven't been able to use the machine for anything. It seems to be
unloaded at the moment so I'll try to squeeze in some tests, but I
don't know how long it will stay that way. It's been great to have
nearly unimpeded access to this for most of the cycle, but all good
things must (and do) come to an end.

In any event, none of this has much impact on the offer above, which is
a small amount of code that I will be happy to attend to if you do not
wish to do so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
At 2012-01-12 12:31:20 +0000, simon@2ndQuadrant.com wrote:
>
> The following patch adds a pgbench option -I to load data using
> INSERTs

This is just to confirm that the patch applies and builds and works
fine (though of course it does take a long time… pity there doesn't
seem to be any easy way to add progress indication like -i has).

I'm aware of the subsequent discussion about using only a long option,
documenting -n, and adding a knob to control index creation timing. I
don't have a useful opinion on any of those things. It's just that the
patch was marked "Needs review" and it was only while waiting for 100k
inserts to run that I thought of checking if there was any discussion
about it. Oops.

> Yes, my laptop really is that slow.

It appears to be eight times as fast as mine.

-- ams
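One consolation, though not something the patch implements: because the
INSERT-based load commits one row per transaction, its progress is
visible from a second session, which a single-transaction COPY load
would not allow:

-- Run from another session while pgbench -I is loading; the count
-- climbs steadily because each inserted row is already committed.
SELECT count(*) FROM pgbench_accounts;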
On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
> This is just to confirm that the patch applies and builds and works
> fine (though of course it does take a long time… pity there doesn't
> seem to be any easy way to add progress indication like -i has).

On any non-server-grade hardware you'd probably want to disable
synchronous_commit while loading. FWIW, this is a great addition to
pgbench.

merlin
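A minimal sketch of that suggestion, assuming the load runs in a
session where the setting can be applied first (synchronous_commit is
an ordinary session-level setting):

-- A crash can lose only the last few "committed" rows, never corrupt
-- the database, which is harmless for a throwaway benchmark load.
SET synchronous_commit = off;
-- ... then run the row-at-a-time INSERT load in this session.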
On Thu, Jan 26, 2012 at 11:41 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
>> This is just to confirm that the patch applies and builds and works
>> fine (though of course it does take a long time… pity there doesn't
>> seem to be any easy way to add progress indication like -i has).
>
> On any non-server-grade hardware you'd probably want to disable
> synchronous_commit while loading. FWIW, this is a great addition to
> pgbench.

Do you object to separating out the three different things the patch
does and adding separate options for each one? If so, why? I find them
independently useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 26, 2012 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 26, 2012 at 11:41 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
>>> This is just to confirm that the patch applies and builds and works
>>> fine (though of course it does take a long time… pity there doesn't
>>> seem to be any easy way to add progress indication like -i has).
>>
>> On any non-server-grade hardware you'd probably want to disable
>> synchronous_commit while loading. FWIW, this is a great addition to
>> pgbench.
>
> Do you object to separating out the three different things the patch
> does and adding separate options for each one? If so, why? I find
> them independently useful.

I didn't take a position on that -- although superficially it seems
like more granular control is good (and you can always group options
together with a 'super option', as in cp -a) -- just making a general
comment on the usefulness of testing against records that don't all
have the same xid.

merlin
On Thu, Jan 12, 2012 at 4:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> The following patch adds a pgbench option -I to load data using
> INSERTs, so that we can begin benchmark testing with rows that have
> large numbers of distinct un-hinted transaction ids. With a database
> pre-created using this we will be better able to simulate and thus
> more easily measure clog contention. Note that current clog has space
> for 1 million xids, so a scale factor of greater than 10 is required
> to really stress the clog.

Running with this patch with a non-default scale factor generates the
spurious notice:

  "Scale option ignored, using pgbench_branches table count = 10"

In fact the scale option is not being ignored, because it was used to
initialize the pgbench_branches table count earlier in this same
invocation.

I think that even in normal (non-initialization) usage, this message
should be suppressed when the provided scale factor is equal to the
pgbench_branches table count.

Cheers,
Jeff
On Fri, Jan 27, 2012 at 1:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Jan 12, 2012 at 4:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> The following patch adds a pgbench option -I to load data using
>> INSERTs, so that we can begin benchmark testing with rows that have
>> large numbers of distinct un-hinted transaction ids. With a database
>> pre-created using this we will be better able to simulate and thus
>> more easily measure clog contention. Note that current clog has space
>> for 1 million xids, so a scale factor of greater than 10 is required
>> to really stress the clog.
>
> Running with this patch with a non-default scale factor generates the
> spurious notice:
>
>   "Scale option ignored, using pgbench_branches table count = 10"
>
> In fact the scale option is not being ignored, because it was used to
> initialize the pgbench_branches table count earlier in this same
> invocation.
>
> I think that even in normal (non-initialization) usage, this message
> should be suppressed when the provided scale factor is equal to the
> pgbench_branches table count.

The attached patch does just that. There is probably no reason to warn
people that we are doing what they told us to, but not for the reason
they think.

I think this change makes sense regardless of the disposition of the
thread topic.

Cheers,
Jeff
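For context, the notice comes from pgbench's standard run-time
behaviour: it derives the scale factor from the branch count rather
than trusting -s, effectively running:

-- Initialization creates exactly one pgbench_branches row per scale
-- unit, so this count is the scale factor of an initialized database.
SELECT count(*) FROM pgbench_branches;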
On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I think that even in normal (non-initialization) usage, this message
>> should be suppressed when the provided scale factor is equal to the
>> pgbench_branches table count.
>
> The attached patch does just that. There is probably no reason to
> warn people that we are doing what they told us to, but not for the
> reason they think.

In my opinion, a more sensible approach than anything we're doing right
now would be to outright *reject* options that will only be ignored. If
-s isn't supported except with -i, then trying to specify -s without -i
should just error out at the options-parsing stage, before we even try
to connect to the database. It's not very helpful to accept options and
then ignore them, and we have many instances of that right now:
initialization-only switches are accepted and ignored when not
initializing, and run-only switches are accepted and ignored when
initializing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> I think that even in normal (non-initialization) usage, this message
>>> should be suppressed when the provided scale factor is equal to the
>>> pgbench_branches table count.
>>
>> The attached patch does just that. There is probably no reason to
>> warn people that we are doing what they told us to, but not for the
>> reason they think.
>
> In my opinion, a more sensible approach than anything we're doing
> right now would be to outright *reject* options that will only be
> ignored. If -s isn't supported except with -i, then trying to specify
> -s without -i should just error out at the options-parsing stage,
> before we even try to connect to the database. It's not very helpful
> to accept options and then ignore them, and we have many instances of
> that right now: initialization-only switches are accepted and ignored
> when not initializing, and run-only switches are accepted and ignored
> when initializing.

I like the ability to say, effectively, "I think I previously did the
initialization with -s 40; if I actually didn't, then scream at me, and
if I did, then go ahead and do the pgbench run I just asked for." But,
since it does unconditionally report the scale actually used and I just
have to have the discipline to go look at the result, I can see where
this is perhaps overkill. In my own (non-PG-related) code, when I have
tasks that have to be run in multiple phases that can get out of sync
if I am not careful, I like to be able to specify the flags even in the
"unused" invocation, so that the code can verify I am being consistent.
Code is better at that than I am.

I'm not sure I know what all would be incompatible with what. I could
start drawing that matrix up once the API stabilizes, but I think you
are still planning on whacking this -I option around a bit.

Cheers,
Jeff
On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
> At 2012-01-12 12:31:20 +0000, simon@2ndQuadrant.com wrote:
>>
>> The following patch adds a pgbench option -I to load data using
>> INSERTs
>
> This is just to confirm that the patch applies and builds and works
> fine (though of course it does take a long time… pity there doesn't
> seem to be any easy way to add progress indication like -i has).

I was thinking the opposite. That -i should only print progress
indication when -d is given. Or at least knock an order of magnitude or
two off of how often it does so.

Cheers,
Jeff
On Mon, Jan 30, 2012 at 10:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
>> At 2012-01-12 12:31:20 +0000, simon@2ndQuadrant.com wrote:
>>>
>>> The following patch adds a pgbench option -I to load data using
>>> INSERTs
>>
>> This is just to confirm that the patch applies and builds and works
>> fine (though of course it does take a long time… pity there doesn't
>> seem to be any easy way to add progress indication like -i has).
>
> I was thinking the opposite. That -i should only print progress
> indication when -d is given. Or at least knock an order of magnitude
> or two off of how often it does so.

I'd be all in favor of having -i emit progress reports 10x less often;
even on a laptop, the current reports are very chatty. But I think 100x
less often might be taking it too far.

Either way, if we're going to have an option for inserts, they should
produce the same progress reports that COPY does - though possibly more
often, since I'm guessing it's likely to be way slower.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jan 30, 2012 at 10:48 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> In my opinion, a more sensible approach than anything we're doing
>> right now would be to outright *reject* options that will only be
>> ignored. If -s isn't supported except with -i, then trying to specify
>> -s without -i should just error out at the options-parsing stage,
>> before we even try to connect to the database. It's not very helpful
>> to accept options and then ignore them, and we have many instances of
>> that right now: initialization-only switches are accepted and ignored
>> when not initializing, and run-only switches are accepted and ignored
>> when initializing.
>
> I like the ability to say, effectively, "I think I previously did the
> initialization with -s 40; if I actually didn't, then scream at me,
> and if I did, then go ahead and do the pgbench run I just asked for."
> But, since it does unconditionally report the scale actually used and
> I just have to have the discipline to go look at the result, I can see
> where this is perhaps overkill. In my own (non-PG-related) code, when
> I have tasks that have to be run in multiple phases that can get out
> of sync if I am not careful, I like to be able to specify the flags
> even in the "unused" invocation, so that the code can verify I am
> being consistent. Code is better at that than I am.

I guess my point is - if we're gonna have it do that, then shouldn't we
actually *error out* if the scales don't match, not just print a
warning and then charge ahead?

> I'm not sure I know what all would be incompatible with what. I could
> start drawing that matrix up once the API stabilizes, but I think you
> are still planning on whacking this -I option around a bit.

It's mostly that everything in the "initialization options" section of
pgbench --help is incompatible with everything in the "benchmarking
options" section, and the other way around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jan 30, 2012 at 12:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I was thinking the opposite. That -i should only print progress
>> indication when -d is given. Or at least knock an order of magnitude
>> or two off of how often it does so.
>
> I'd be all in favor of having -i emit progress reports 10x less often;
> even on a laptop, the current reports are very chatty. But I think
> 100x less often might be taking it too far.

Trivial patch for that attached.

> Either way, if we're going to have an option for inserts, they should
> produce the same progress reports that COPY does - though possibly
> more often, since I'm guessing it's likely to be way slower.

I looked at this a little more and I'm coming around to the view that
Heikki expressed originally: I think this is too much of a one-trick
pony to justify including it in pgbench. It's an interesting hack for
testing, but the thing is that, to really make it do anything
interesting, you've got to not only use INSERT instead of COPY and
create the indexes before loading the table, BUT ALSO frob the existing
code to prevent the WAL bypass from doing its thing. I suppose we could
have a separate option for all THREE of those behaviors, rather than
just the two I mentioned in my previous email, but that seems over the
top. So if we're going to have this at all, we might as well just call
it --artificially-inflate-clog-pain and be done with it.

But I think that's kind of a narrow special case that isn't really
worth catering for. Our CLOG contention right now is not so bad that we
need to focus a major development effort on making it less, and even if
we do want to do that, there's no real evidence that a half-hour
pgbench run isn't sufficient to demonstrate the problem perfectly
adequately with the code we have right now. After a few minutes the
frequency of hitting previously-updated rows is high enough to measure
the problem anyway.

In the process of developing the various performance improvements we've
committed for 9.2, I and others have developed various test cases -
Heikki has one he uses for his xlog scaling patch, for example. We
can't commit all of those as pgbench options, or we'll go nuts. Ideally
it would be nice if pgbench were flexible enough to handle these kinds
of use cases via configuration rather than by hard-coding them.

Given all the above, I'm inclined to conclude that this is just another
special-purpose test case which we should use for testing and that's
it, as Simon already proposed upthread. However, I do think that we
should go ahead and make -n work in initialization mode, because I've
wanted that a few times. So, a patch for that is attached, too.

(Note that no actual benchmarking is happening right now with regard to
the CLOG history patch because, as previously noted, the most recent
version does not compile. This probably doesn't matter for that patch
hugely anyway, since that mechanism as currently designed does not kick
in until a million transactions have been processed, and by that time
you'll have quite a spread of XIDs in pgbench_accounts anyway. I don't
believe there are any other remaining patches in this CommitFest to
which the test case would be applicable; please let me know if I am
wrong.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company