Thread: DBT-5 Stored Procedure Development (2022)
Dear all,
Please review the attached for my jerry-rigged project proposal. I am seeking to continually refactor the proposal as I can!
Thanks,
Mahesh
Attachment
On Tue, Apr 19, 2022 at 11:02 AM Mahesh Gouru <mahesh.gouru@gmail.com> wrote:
> Please review the attached for my jerry-rigged project proposal. I am seeking to continually refactor the proposal as I can!

I for one see a lot of value in this proposal. I think it would be great to revive DBT-5, since TPC-E has a number of interesting bottlenecks that we'd likely learn something from. It's particularly good at stressing concurrency control, which TPC-C really doesn't do. It's also a lot easier to run smaller benchmarks that don't require lots of storage space, but are nevertheless correct according to the spec.

--
Peter Geoghegan
Hi Mahesh,

On Tue, Apr 19, 2022 at 02:01:54PM -0400, Mahesh Gouru wrote:
> Dear all,
>
> Please review the attached for my jerry-rigged project proposal. I am
> seeking to continually refactor the proposal as I can!

My comments might be briefer than they should be, but I need to write this quickly. :)

* The 4 steps in the description aren't needed; they already exist.
* May 20: I think this should be more about reviewing the TPC-E specification rather than industry research, as we want to try to follow specification guidelines.
* June 20: Random data generation and scaling are already defined and provided by the spec.
* Aug 01: A report generator already exists, but I think time could be allocated to redoing the raw HTML generation with something like reStructuredText, something that is easier to generate with scripts and convertible into other formats with other tools.

As some of the tasks proposed are actually in place, one other task could be updating egen (the TPC-supplied code). The kit was last developed against 1.12, and 1.14 is current as of this email.

Regards,
Mark
On Tue, Apr 19, 2022 at 11:31 AM Mark Wong <markwkm@gmail.com> wrote:
> As some of the tasks proposed are actually in place, one other task could be
> updating egen (the TPC-supplied code). The kit was last developed against
> 1.12, and 1.14 is current as of this email.

As you know, I have had some false starts with using DBT5 on a modern Linux distribution. Perhaps I gave up too easily at the time, but I'm definitely still interested. Has there been work on that since?

Thanks
--
Peter Geoghegan
On Tue, Apr 19, 2022 at 05:20:50PM -0700, Peter Geoghegan wrote:
> On Tue, Apr 19, 2022 at 11:31 AM Mark Wong <markwkm@gmail.com> wrote:
> > As some of the tasks proposed are actually in place, one other task could be
> > updating egen (the TPC-supplied code). The kit was last developed against
> > 1.12, and 1.14 is current as of this email.
>
> As you know, I have had some false starts with using DBT5 on a modern
> Linux distribution. Perhaps I gave up too easily at the time, but I'm
> definitely still interested. Has there been work on that since?

I'm afraid not. I'm guessing that pulling in egen 1.14 would address that. Maybe it would make sense to put that at the top of the todo list if this project is accepted...

Regards,
Mark
On Tue, Apr 26, 2022 at 10:36 AM Mark Wong <markwkm@gmail.com> wrote:
> I'm afraid not. I'm guessing that pulling in egen 1.14 would address
> that. Maybe it would make sense to put that at the top of the todo list if
> this project is accepted...

Wouldn't it be a prerequisite here? I don't actually have any reason to prefer the old function-based code to the new stored-procedure-based code. Really, all I'm looking for is a credible implementation of TPC-E that I can use to model some aspects of OLTP performance for my own purposes.

TPC-C (which I have plenty of experience with) has only two secondary indexes (in typical configurations), and doesn't really stress concurrency control at all. Plus there are no low-cardinality indexes in TPC-C, while TPC-E has quite a few. Chances are high that I'd learn something from TPC-E, which has all of these things -- I'm really looking for bottlenecks, where Postgres does entirely the wrong thing. It's especially interesting to me as somebody that focuses on B-Tree indexing.

--
Peter Geoghegan
On Mon, May 02, 2022 at 07:14:28AM -0700, Mark Wong wrote:
> On Tue, Apr 26, 2022, 10:45 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> > On Tue, Apr 26, 2022 at 10:36 AM Mark Wong <markwkm@gmail.com> wrote:
> > > I'm afraid not. I'm guessing that pulling in egen 1.14 would address
> > > that. Maybe it would make sense to put that at the top of the todo list if
> > > this project is accepted...
> >
> > Wouldn't it be a prerequisite here? I don't actually have any reason
> > to prefer the old function-based code to the new stored-procedure-based
> > code. Really, all I'm looking for is a credible implementation
> > of TPC-E that I can use to model some aspects of OLTP performance for
> > my own purposes.
> >
> > TPC-C (which I have plenty of experience with) has only two secondary
> > indexes (in typical configurations), and doesn't really stress
> > concurrency control at all. Plus there are no low-cardinality indexes
> > in TPC-C, while TPC-E has quite a few. Chances are high that I'd learn
> > something from TPC-E, which has all of these things -- I'm really
> > looking for bottlenecks, where Postgres does entirely the wrong thing.
> > It's especially interesting to me as somebody that focuses on B-Tree
> > indexing.

I think it could be done in either order. While it's not ideal that the kit seems to work most reliably as-is on RHEL/CentOS/etc. 6, I think that could provide some confidence in getting familiar with something on a working platform. The updates to the stored functions/procedures would be the same regardless of egen version.

If we get the project slot, we can talk further about what to actually tackle first.

Regards,
Mark