Thread: pgbench-ycsb
Hello, hackers.

It might be a good idea to give users an opportunity to test their
applications with pgbench under different real-life-like loads, so that
they will be able to see what's going to happen in production.

YCSB (Yahoo! Cloud Serving Benchmark) was taken as a concept. YCSB tests
were originally designed to facilitate performance comparisons of
different cloud data serving systems, and they take into account
different application workloads, such as:
workload A - assumes that the application does a lot of reads (50%) and
updates (50%).
workload B - the case when the application does reads in 95% of cases
and updates in 5%.
workload C - models the behavior of a read-only application.
workload E - the workload of applications which in 95% of cases
request several neighboring tuples and in 5% of cases do
updates.

In the patch those workloads were implemented to be executed by pgbench:
pgbench -b ycsb-A

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Anthony,

> applications with pgbench under different real-life-like loads, so that
> they will be able to see what's going to happen in production.
>
> YCSB (Yahoo! Cloud Serving Benchmark) was taken as a concept. YCSB tests
> were originally designed to facilitate performance comparisons of
> different cloud data serving systems, and they take into account
> different application workloads, such as:
> workload A - assumes that the application does a lot of reads (50%) and
> updates (50%).
> workload B - the case when the application does reads in 95% of cases
> and updates in 5%.
> workload C - models the behavior of a read-only application.
> workload E - the workload of applications which in 95% of cases
> request several neighboring tuples and in 5% of cases do
> updates.
>
> In the patch those workloads were implemented to be executed by pgbench:
> pgbench -b ycsb-A

Could you provide a link to the specification?

I cannot find something simple, and I was kind of hoping to avoid diving
into the source code of the java tool on github :-) In particular, I'm
looking for a description of the expected underlying schema and its size
(scale) parameters.

The patch does not include any documentation, help, or tests. It should.

  + "\\set write_weight 0\n"
  + "\\set operation random(1,:total_weight)\n"
  + "\\if (:operation < :write_weight)\n"

This is dead code :-(

There is a lot of copy-paste between the cases, which should be avoided
if possible.

Note that pgbench already has builtin weight management. I'd suggest
that the implementation reuse it instead of reimplementing it within
these duplicated scripts. Maybe add simple builtins (e.g.
ycsb-read/write/...) for individual transactions and a new --load=ycsb-A
which would set the various transactions with their expected weights.

A, B, C, E... What is missing to get the D bench as well?

--
Fabien.
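As a rough illustration of the weight-based selection discussed above, the following Python sketch picks a transaction type from a table of integer weights, in the spirit of pgbench's `-b name@weight` option. The percentages come from the first message; the names `WORKLOADS` and `choose` are hypothetical, and none of this is taken from the patch or from pgbench itself.

```python
# Hypothetical weight tables for the workloads described in the thread.
# Percentages follow the first message; nothing here is pgbench code.
WORKLOADS = {
    "A": {"read": 50, "update": 50},
    "B": {"read": 95, "update": 5},
    "C": {"read": 100},
    "E": {"scan": 95, "update": 5},
}


def choose(weights, draw):
    """Map a draw in 1..sum(weights) to a transaction name.

    Each name owns a contiguous slice of the range whose length equals its
    weight, so a uniform draw selects it with probability weight / total.
    """
    total = sum(weights.values())
    if not 1 <= draw <= total:
        raise ValueError(f"draw must be in 1..{total}")
    upper = 0
    for name, weight in weights.items():
        upper += weight
        if draw <= upper:
            return name
    raise AssertionError("unreachable: draw is within the total weight")


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    mix = WORKLOADS["B"]
    total = sum(mix.values())
    counts = {name: 0 for name in mix}
    for _ in range(10_000):
        counts[choose(mix, rng.randint(1, total))] += 1
    print(counts)
```

The same arithmetic suggests why the quoted fragment was flagged: with `write_weight` set to 0 and `operation` drawn from 1 upward, a test of the form `operation < write_weight` can never succeed, so the guarded branch would presumably never run.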
> On Thu, 19 Jul 2018 at 15:36, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Anthony,
>
> > applications with pgbench under different real-life-like loads, so that
> > they will be able to see what's going to happen in production.
> >
> > YCSB (Yahoo! Cloud Serving Benchmark) was taken as a concept. YCSB tests
> > were originally designed to facilitate performance comparisons of
> > different cloud data serving systems, and they take into account
> > different application workloads, such as:
> > workload A - assumes that the application does a lot of reads (50%) and
> > updates (50%).
> > workload B - the case when the application does reads in 95% of cases
> > and updates in 5%.
> > workload C - models the behavior of a read-only application.
> > workload E - the workload of applications which in 95% of cases
> > request several neighboring tuples and in 5% of cases do
> > updates.
> >
> > In the patch those workloads were implemented to be executed by pgbench:
> > pgbench -b ycsb-A
>
> Could you provide a link to the specification?
>
> I cannot find something simple, and I was kind of hoping to avoid diving
> into the source code of the java tool on github :-) In particular, I'm
> looking for a description of the expected underlying schema and its size
> (scale) parameters.

There are description files for the different workloads, like [1] (with a
custom number of records, of course), and the schema [2]. Would this
information be enough?

[1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
[2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql
On 2018-07-19 16:50, Dmitry Dolgov wrote:
>> On Thu, 19 Jul 2018 at 15:36, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>
>> Hello Anthony,
>>
>> > applications with pgbench under different real-life-like loads, so that
>> > they will be able to see what's going to happen in production.
>> >
>> > YCSB (Yahoo! Cloud Serving Benchmark) was taken as a concept. YCSB tests
>> > were originally designed to facilitate performance comparisons of
>> > different cloud data serving systems, and they take into account
>> > different application workloads, such as:
>> > workload A - assumes that the application does a lot of reads (50%) and
>> > updates (50%).
>> > workload B - the case when the application does reads in 95% of cases
>> > and updates in 5%.
>> > workload C - models the behavior of a read-only application.
>> > workload E - the workload of applications which in 95% of cases
>> > request several neighboring tuples and in 5% of cases do
>> > updates.
>> >
>> > In the patch those workloads were implemented to be executed by pgbench:
>> > pgbench -b ycsb-A
>>
>> Could you provide a link to the specification?
>>
>> I cannot find something simple, and I was kind of hoping to avoid diving
>> into the source code of the java tool on github :-) In particular, I'm
>> looking for a description of the expected underlying schema and its size
>> (scale) parameters.
>
> There are description files for the different workloads, like [1] (with a
> custom number of records, of course), and the schema [2]. Would this
> information be enough?
>
> [1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> [2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql

Hi.
Thanks for your feedback, I'll fix it soon.

Actually, I used the article "Brian F. Cooper, Adam Silberstein, Erwin Tam,
Raghu Ramakrishnan and Russell Sears. Benchmarking Cloud Serving Systems
with YCSB. ACM Symposium on Cloud Computing (SoCC), Indianapolis, IN, USA,
2010".
It is available here:
https://github.com/brianfrankcooper/YCSB/wiki/Papers-and-Presentations

But maybe the article is more complicated than your example.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
>> Could you provide a link to the specification?
>>
>> I cannot find something simple, and I was kind of hoping to avoid diving
>> into the source code of the java tool on github :-) In particular, I'm
>> looking for a description of the expected underlying schema and its size
>> (scale) parameters.
>
> There are description files for the different workloads, like [1] (with a
> custom number of records, of course), and the schema [2]. Would this
> information be enough?
>
> [1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> [2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql

The second link is a start.

I notice that the submitted patch's transactions do not apply to this
schema, which is significantly different from the pgbench TPC-B (like)
benchmark.

The YCSB schema is key -> fields[0-9], all of them TEXT, somehow expected
to be 100 bytes each, and an update is expected to update one of these
fields.

This suggests that maybe a -i extension would be in order. Possibly

  pgbench -i -s 1 --layout={tpcb,ycsb} (or schema ?)

where "tpcb" would be the default?

I'm sceptical about using a textual primary key as it corresponds more to
NoSQL limitations than to an actual design choice. I'd be okay with INT8
as a pkey.

I find the YCSB tablename "usertable" especially unhelpful. Maybe
"pgbench_ycsb"?

--
Fabien.
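A minimal sketch of what such a table could look like, assuming the INT8 key and the `pgbench_ycsb` name suggested above. The column names `ycsb_key` and `field0`..`field9` and the helpers `ycsb_ddl` and `ycsb_row` are hypothetical; the 100-byte payload is the figure mentioned in the message and is not enforced by the generated DDL.

```python
def ycsb_ddl(table="pgbench_ycsb", n_fields=10):
    """Build a CREATE TABLE statement: one INT8 key plus n_fields TEXT columns."""
    columns = ["ycsb_key INT8 PRIMARY KEY"]
    columns += [f"field{i} TEXT" for i in range(n_fields)]
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {table} (\n  {body}\n);"


def ycsb_row(key, n_fields=10, field_bytes=100):
    """Return one row as a tuple: the key followed by fixed-size filler text."""
    filler = "x" * field_bytes  # placeholder payload, one byte per ASCII char
    return (key,) + (filler,) * n_fields


if __name__ == "__main__":
    print(ycsb_ddl())
```

Generating the statement from a field count keeps the sketch short; a real initialization step would also need to decide how the scale factor maps to the number of rows, which the thread leaves open.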
> On Sat, 21 Jul 2018 at 22:41, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> >> Could you provide a link to the specification?
> >>
> >> I cannot find something simple, and I was kind of hoping to avoid diving
> >> into the source code of the java tool on github :-) In particular, I'm
> >> looking for a description of the expected underlying schema and its size
> >> (scale) parameters.
> >
> > There are description files for the different workloads, like [1] (with a
> > custom number of records, of course), and the schema [2]. Would this
> > information be enough?
> >
> > [1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> > [2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql
>
> The second link is a start.
>
> I notice that the submitted patch's transactions do not apply to this
> schema, which is significantly different from the pgbench TPC-B (like)
> benchmark.
>
> The YCSB schema is key -> fields[0-9], all of them TEXT, somehow expected
> to be 100 bytes each, and an update is expected to update one of these
> fields.
>
> This suggests that maybe a -i extension would be in order. Possibly
>
>   pgbench -i -s 1 --layout={tpcb,ycsb} (or schema ?)
>
> where "tpcb" would be the default?
>
> I'm sceptical about using a textual primary key as it corresponds more to
> NoSQL limitations than to an actual design choice. I'd be okay with INT8
> as a pkey.
>
> I find the YCSB tablename "usertable" especially unhelpful. Maybe
> "pgbench_ycsb"?

Just to clarify - if I understand Anthony correctly, this proposal is not
about implementing YCSB exactly as it is, but more about using a zipfian
distribution for an id in the regular pgbench table structure, in
conjunction with a read/write balance, to simulate something similar to it.

And probably, instead of implementing the exact YCSB workload inside
pgbench, it makes more sense to add PostgreSQL Jsonb as one of the options
to the framework itself (I was in the middle of it a few years ago, but
then was distracted by some interesting benchmarking results).
> Just to clarify - if I understand Anthony correctly, this proposal is not
> about implementing YCSB exactly as it is, but more about using a zipfian
> distribution for an id in the regular pgbench table structure, in
> conjunction with a read/write balance, to simulate something similar to it.

Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the point
is not to implement YCSB, then do not call it YCSB :-)

Maybe there could be other simpler builtins to use non uniform
distributions: {zipf,exp,...}-{simple,select} and default values
(exp_param, zipf_param?) for the random distribution parameters.

  \set id random_zipfian(1, 100000*:scale, :zipf_param)
  \set val random(-5000, 5000)
  UPDATE pgbench_whatever ...;

Then

  pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...

> And probably, instead of implementing the exact YCSB workload inside
> pgbench, it makes more sense to add PostgreSQL Jsonb as one of the options
> to the framework itself (I was in the middle of it a few years ago, but
> then was distracted by some interesting benchmarking results).

Sure.

--
Fabien.
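To give a feel for what a Zipfian key distribution does to a workload, here is a small Python sketch of a bounded Zipf distribution over 1..n, where rank k is drawn with probability proportional to 1/k^s. It is a textbook construction for illustration only; pgbench's `random_zipfian` has its own implementation, and the names `zipf_pmf` and `zipf_sampler` are hypothetical.

```python
import bisect
import itertools
import random


def zipf_pmf(n, s):
    """Probabilities for ranks 1..n with P(k) proportional to 1 / k**s."""
    weights = [1.0 / k**s for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def zipf_sampler(n, s, rng):
    """Return a function drawing ranks in 1..n by inverse-CDF lookup."""
    cdf = list(itertools.accumulate(zipf_pmf(n, s)))

    def draw():
        # bisect finds the first rank whose cumulative probability exceeds u;
        # min() guards against float rounding at the top of the range.
        return min(bisect.bisect_right(cdf, rng.random()), n - 1) + 1

    return draw


if __name__ == "__main__":
    pmf = zipf_pmf(100_000, 1.1)
    print(f"share of accesses hitting key 1: {pmf[0]:.3f}")
    print(f"share hitting the 10 hottest keys: {sum(pmf[:10]):.3f}")
    draw = zipf_sampler(100_000, 1.1, random.Random(0))
    print("ten sample ranks:", [draw() for _ in range(10)])
```

Precomputing the cumulative table costs memory proportional to n, which is fine for a sketch but is one reason a production generator would likely use a different method.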
On 2018-07-22 16:56, Fabien COELHO wrote:
>> Just to clarify - if I understand Anthony correctly, this proposal is not
>> about implementing YCSB exactly as it is, but more about using a zipfian
>> distribution for an id in the regular pgbench table structure, in
>> conjunction with a read/write balance, to simulate something similar to
>> it.
>
> Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
> point is not to implement YCSB, then do not call it YCSB :-)
>
> Maybe there could be other simpler builtins to use non uniform
> distributions: {zipf,exp,...}-{simple,select} and default values
> (exp_param, zipf_param?) for the random distribution parameters.
>
>   \set id random_zipfian(1, 100000*:scale, :zipf_param)
>   \set val random(-5000, 5000)
>   UPDATE pgbench_whatever ...;
>
> Then
>
>   pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
>
>> And probably, instead of implementing the exact YCSB workload inside
>> pgbench, it makes more sense to add PostgreSQL Jsonb as one of the
>> options to the framework itself (I was in the middle of it a few years
>> ago, but then was distracted by some interesting benchmarking results).
>
> Sure.

Hello,
thank you for your interest. I'm still improving this idea and the patch,
and I'm very happy about the discussion we're having. It really helps.

The idea was to implement the workloads as close to YCSB as possible
using pgbench. So the schema it should be applied to is the default schema
generated by pgbench -i (pgbench_accounts).

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
>>> Just to clarify - if I understand Anthony correctly, this proposal is
>>> not about implementing YCSB exactly as it is, but more about using a
>>> zipfian distribution for an id in the regular pgbench table structure,
>>> in conjunction with a read/write balance, to simulate something similar
>>> to it.
>>
>> Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
>> point is not to implement YCSB, then do not call it YCSB :-)
>>
>> Maybe there could be other simpler builtins to use non uniform
>> distributions: {zipf,exp,...}-{simple,select} and default values
>> (exp_param, zipf_param?) for the random distribution parameters.
>>
>>   \set id random_zipfian(1, 100000*:scale, :zipf_param)
>>   \set val random(-5000, 5000)
>>   UPDATE pgbench_whatever ...;
>>
>> Then
>>
>>   pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
>>
>>> And probably, instead of implementing the exact YCSB workload inside
>>> pgbench, it makes more sense to add PostgreSQL Jsonb as one of the
>>> options to the framework itself (I was in the middle of it a few years
>>> ago, but then was distracted by some interesting benchmarking
>>> results).
>>
>> Sure.
>
> Hello,
> thank you for your interest. I'm still improving this idea and the patch,
> and I'm very happy about the discussion we're having. It really helps.
>
> The idea was to implement the workloads as close to YCSB as possible
> using pgbench.

Basically I'm against having something called YCSB if it is not YCSB ;-)

> So the schema it should be applied to is the default schema generated by
> pgbench -i (pgbench_accounts).

This is a contradiction, because the pgbench_accounts table is in no way,
even remotely, conformant to the YCSB benchmark test table.

So for me there are three possibilities:

(1) do nothing, always an option as committers may be against extending
    pgbench in this direction anyway. Personally I'm fine with having it.

(2) implement YCSB cleanly, i.e. both initialization and operations, at
    least if this is "reasonable" (i.e. it does not result in 2000 lines
    of new code). ISTM that it can be done, given that the YCSB schema is
    very simple, hence I suggested "pgbench -i --schema ycsb" to trigger a
    non-default initialization.

(3) if you are interested in demonstrating non-uniform distributions on
    pgbench_accounts, I'm also fine with it, just do so, but do *NOT* call
    it YCSB.

Also it seems that the YCSB bench uses some hashing to mix keys and avoid
having 1 as the most frequent, 2 as the second, and so on. There is a hash
function in pgbench which can be used for this (although the solution is
not perfect, as some values cannot be reached), but it is the approach
YCSB uses. Otherwise I'm planning to submit a pseudo-random permutation
function to ease this some day, provided that the size of the table stays
constant.

--
Fabien.
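The key-mixing idea can be sketched as follows: take a rank from the skewed distribution, then hash it into the key space so that the hot keys are scattered instead of being 1, 2, 3, and so on. The sketch uses SHA-256 purely as a stand-in hash; it is not pgbench's hash function, and `scatter` is a hypothetical name. Because a hash taken modulo n is not a permutation, collisions leave part of the key space unreachable, which matches the caveat above.

```python
import hashlib


def scatter(rank, n):
    """Map a rank to a key in 1..n with a stand-in hash (not pgbench's)."""
    digest = hashlib.sha256(rank.to_bytes(8, "little")).digest()
    return 1 + int.from_bytes(digest[:8], "little") % n


if __name__ == "__main__":
    n = 1000
    keys = [scatter(rank, n) for rank in range(1, n + 1)]
    print("hottest three ranks land on keys:", keys[:3])
    print("distinct keys reachable:", len(set(keys)), "of", n)
```

A true pseudo-random permutation of 1..n would reach every key exactly once, which is the improvement the message alludes to; it only stays valid while n, the table size, does not change.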
On Sun, Jul 22, 2018 at 4:42 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Basically I'm against having something called YCSB if it is not YCSB ;-)

Yep, that seems pretty clear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company