We propose a patch that improves hybrid hash join’s performance for large multi-batch joins where the probe relation has skew.
Project name: Histojoin
Patch file: histojoin_v1.patch
This patch implements the Histojoin join algorithm as an optional feature added to the standard Hybrid Hash Join (HHJ). A flag is used to enable or disable the Histojoin features. When Histojoin is disabled, HHJ acts as normal. The Histojoin features allow HHJ to use PostgreSQL’s statistics to do skew aware partitioning. The basic idea is to keep build relation tuples in a small in-memory hash table that have join values that are frequently occurring in the probe relation. This improves performance of HHJ when multiple batches are used by 10% to 50% for skewed data sets. The performance improvements of this patch can be seen in the paper (pages 25-30) at:
http://people.ok.ubc.ca/rlawrenc/histojoin2.pdf
All generators and materials needed to verify these results can be provided.
This is a patch against the HEAD of the repository.
This patch does not contain platform specific code. It compiles and has been tested on our machines in both Windows (MSVC++) and Linux (GCC).
Currently the Histojoin feature is enabled by default and is used whenever HHJ is used and there are Most Common Value (MCV) statistics available on the probe side base relation of the join. To disable this feature simply set the enable_hashjoin_usestatmcvs flag to off in the database configuration file or at run time with the 'set' command.
One potential improvement not included in the patch is that Most Common Value (MCV) statistics are only determined when the probe relation is produced by a scan operator. There is a benefit to using MCVs even when the probe relation is not a base scan, but we were unable to determine how to find statistics from a base relation after other operators are performed.
This patch was created by Bryce Cutt as part of his work on his M.Sc. thesis.
--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of British Columbia Okanagan
E-mail: ramon.lawrence@ubc.ca