Hi,
I'm planning to run TPC-DS benchmarks on PostgreSQL and wanted to ask the community about the current recommended approach.
Some background: I've been running TPC-DS on Greenplum-based databases for a long time using adapted tooling (modified queries, load scripts, etc. for the MPP environment).
Now I'd like to benchmark on PostgreSQL as well, and I'm wondering whether the community has converged on a standard tool or workflow, or whether everyone is still downloading the official TPC-DS kit and doing their own PostgreSQL adaptations.
A few specific questions:
1. Is there a commonly used, well-maintained TPC-DS toolset for PostgreSQL? I know of some TPC-DS tools for PostgreSQL, such as gregrahn/tpcds-kit on GitHub (it targets TPC-DS 2.x, while the newest TPC-DS version is 4.x), but I'm curious whether there's anything more "official" or widely adopted in the community: something that handles data generation, PostgreSQL-compatible DDL, query adaptation, and result collection out of the box.
2. TPC-DS is now at version 4.0. Which version of the specification are people currently using for PostgreSQL benchmarking? Is there a practical reason to prefer an older version (e.g., 2.x or 3.x) over the latest?
Any pointers to repos, scripts, wiki pages, or past threads would be greatly appreciated.
--
Zhang Mingli
HashData