Re: pgbench-ycsb - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: pgbench-ycsb
Msg-id CA+q6zcXRqgTD-uG1snC4LiNAsBoujFHbnyZ3OL8q+1QPJso_mw@mail.gmail.com
In response to Re: pgbench-ycsb  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench-ycsb  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
> On Sat, 21 Jul 2018 at 22:41, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> >> Could you provide a link to the specification?
> >>
> >> I cannot find something simple, and I was kind of hoping to avoid diving
> >> into the source code of the java tool on github:-) In particular, I'm
> >> looking for a description of the expected underlying schema and its size
> >> (scale) parameters.
> >
> > There are the description files for different workloads, like [1], (with the
> > custom amount of records, of course) and the schema [2]. Would this
> > information be enough?
> >
> > [1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> > [2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql
>
> The second link is a start.
>
> I notice that the submitted patch transactions do not apply to this
> schema, which is significantly different from the pgbench TPC-B (like)
> benchmark.
>
> The YCSB schema is key -> fields[0-9], all of them TEXT, somehow expected
> to be 100 bytes each, and update is expected to update one of these
> fields.
>
> This suggests that maybe a -i extension would be in order. Possibly
>
>     pgbench -i -s 1 --layout={tpcb,ycsb} (or schema ?)
>
> where "tpcb" would be the default?
>
> I'm sceptical about using a textual primary key as it corresponds more to
> NoSQL limitations than to an actual design choice. I'd be okay with INT8
> as a pkey.
>
> I find the YCSB tablename "usertable" especially unhelpful. Maybe
> "pgbench_ycsb"?
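A minimal sketch of the "ycsb" layout described in the quoted points: an INT8 primary key instead of YCSB's textual key, ten TEXT fields (field0..field9) of about 100 bytes each, the table named pgbench_ycsb instead of "usertable", and an update that touches exactly one field. The column name ycsb_key and all function names are hypothetical; this is not an existing pgbench option.

```python
import random
import string

# Hypothetical "ycsb" layout for pgbench (illustrative names, not a real
# pgbench feature): INT8 key, ten TEXT fields of ~100 bytes, and a table
# named pgbench_ycsb rather than YCSB's "usertable".
FIELD_COUNT = 10
FIELD_LENGTH = 100  # characters; ASCII filler, so also 100 bytes


def create_table_sql(table: str = "pgbench_ycsb") -> str:
    """Return the DDL for the assumed key -> field0..field9 layout."""
    columns = ["    ycsb_key INT8 PRIMARY KEY"]
    columns += [f"    field{i} TEXT" for i in range(FIELD_COUNT)]
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


def random_field_value(rng: random.Random) -> str:
    """Filler value for one field: FIELD_LENGTH random lowercase letters."""
    return "".join(rng.choices(string.ascii_lowercase, k=FIELD_LENGTH))


def update_one_field(rng: random.Random, key: int, table: str = "pgbench_ycsb"):
    """YCSB-style update: overwrite exactly one randomly chosen field.

    Returns a parameterized statement plus its parameters ($1 = value, $2 = key).
    """
    field = rng.randrange(FIELD_COUNT)
    sql = f"UPDATE {table} SET field{field} = $1 WHERE ycsb_key = $2;"
    return sql, (random_field_value(rng), key)


if __name__ == "__main__":
    rng = random.Random(42)
    print(create_table_sql())
    sql, params = update_one_field(rng, key=17)
    print(sql, params[1], len(params[0]))
```

Using a parameterized statement keeps the key and the 100-byte value out of the SQL text, so the same statement shape can be prepared once per field.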

Just to clarify: if I understand Anthony correctly, this proposal is not about
implementing YCSB exactly as it is, but rather about using a Zipfian
distribution for the id in the regular pgbench table structure, combined with a
configurable read/write balance, to simulate something similar to it.
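To make that idea concrete, here is a rough sketch, not Anthony's patch: account ids are drawn from a Zipfian distribution (P(k) proportional to 1/k^s, so low ids are "hot"), and each request is a read with probability read_ratio, otherwise a balance update on the standard pgbench_accounts table, similar to the first statement of pgbench's built-in TPC-B-like script. The generator class, function names and parameter values are assumptions for illustration only.

```python
import bisect
import itertools
import random


class ZipfianGenerator:
    """Draw integers in [1, n] with P(k) proportional to 1 / k**s.

    Precomputes the cumulative weights and inverts them with a binary
    search: exact and simple, but O(n) memory, so it suits a sketch better
    than a benchmark driver for very large tables.
    """

    def __init__(self, n: int, s: float, rng: random.Random):
        if n < 1 or s <= 0:
            raise ValueError("need n >= 1 and s > 0")
        self.n = n
        self.rng = rng
        self.cdf = list(itertools.accumulate(1.0 / k ** s for k in range(1, n + 1)))

    def next(self) -> int:
        u = self.rng.random() * self.cdf[-1]
        # bisect_right finds the first k whose cumulative weight exceeds u;
        # min() guards against u rounding up to the total.
        return min(bisect.bisect_right(self.cdf, u) + 1, self.n)


def pgbench_like_workload(n_accounts, read_ratio, s, count, seed=0):
    """Yield SQL for a skewed read/write mix on pgbench_accounts.

    read_ratio is the fraction of SELECTs; the rest are single-row
    balance updates like the first statement of the TPC-B-like script.
    """
    rng = random.Random(seed)
    zipf = ZipfianGenerator(n_accounts, s, rng)
    for _ in range(count):
        aid = zipf.next()  # low aids are drawn far more often
        if rng.random() < read_ratio:
            yield f"SELECT abalance FROM pgbench_accounts WHERE aid = {aid};"
        else:
            delta = rng.randint(-5000, 5000)
            yield (f"UPDATE pgbench_accounts SET abalance = abalance + {delta} "
                   f"WHERE aid = {aid};")


if __name__ == "__main__":
    # e.g. 100000 accounts (scale 1), 50/50 read/update split, skew s = 0.99
    for stmt in pgbench_like_workload(100000, 0.5, 0.99, 5):
        print(stmt)
```

The explicit CDF table works for any s > 0, including values just below 1, at the cost of memory proportional to the number of accounts; a real driver would more likely use a constant-memory sampling method.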

And instead of implementing the exact YCSB workload inside pgbench, it probably
makes more sense to add PostgreSQL jsonb as one of the backend options in the
YCSB framework itself (I was in the middle of doing that a few years ago, but
then got distracted by some interesting benchmarking results).

