Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets - Mailing list pgsql-hackers

From Lawrence, Ramon
Subject Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Msg-id 6EEA43D22289484890D119821101B1DF2C180E@exchange20.mercury.ad.ubc.ca
In response to Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Lawrence, Ramon" <ramon.lawrence@ubc.ca>)
List pgsql-hackers
Robert,

You do not need to use qgen.exe to generate queries, as you are not
running the TPC-H benchmark test.  Attached is an example set of the 22
TPC-H queries as specified by the benchmark.

We have not tested using the TPC-H queries for this particular patch and
only use the TPC-H database as a large, skewed data set.  The simpler
queries we test involve joins of Part-Lineitem or Supplier-Lineitem such
as:

Select * from part, lineitem where p_partkey = l_partkey

OR

Select count(*) from part, lineitem where p_partkey = l_partkey

The count(*) version is usually more useful for comparisons, because
with the Select * version the generation of output tuples on the client
side (say, with pgAdmin) dominates the actual time to complete the query.

To isolate query costs, we also test using a simple server-side
function.  I have also attached the setup description.
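
As a rough illustration only (the attached setup description is the
authoritative version; the function name join_part_lineitem below is
made up for this sketch), a server-side function of this kind could
run the join inside the backend and throw away every result tuple, so
that no client transfer or rendering is included in the timing:

```sql
-- Hypothetical helper, not the function from the attachment.
-- It executes the Part-Lineitem join entirely on the server,
-- discards each joined row, and returns only the row count.
CREATE OR REPLACE FUNCTION join_part_lineitem() RETURNS bigint AS $$
DECLARE
    r record;        -- holds one joined row at a time
    n bigint := 0;   -- number of rows produced by the join
BEGIN
    FOR r IN
        SELECT * FROM part, lineitem WHERE p_partkey = l_partkey
    LOOP
        n := n + 1;  -- the row itself is never sent to the client
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql;

-- A single bigint comes back, however many tuples the join generates.
SELECT join_part_lineitem();
```

In psql, \timing can be switched on before the call to see the elapsed
time.  Unlike the count(*) query, the loop still forms every full
output tuple, which is the reason to consider something like this
alongside the count(*) version.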

I would be happy to help in any way I can.

Bryce is currently working on an updated patch according to your
suggestions.

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of
British Columbia Okanagan
E-mail: ramon.lawrence@ubc.ca


> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Robert Haas
> Sent: December 17, 2008 7:54 PM
> To: Lawrence, Ramon
> Cc: Tom Lane; pgsql-hackers@postgresql.org; Bryce Cutt
> Subject: Re: [HACKERS] Proposed Patch to Improve Performance of Multi-
> Batch Hash Join for Skewed Data Sets
>
> Dr. Lawrence:
>
> I'm still working on reviewing this patch.  I've managed to load the
> sample TPCH data from tpch1g1z.zip after changing the line endings to
> UNIX-style and chopping off the trailing vertical bars.  (If anyone is
> interested, I have the results of pg_dump | bzip2 -9 on the resulting
> database, which I would be happy to upload if someone has server
> space.  It is about 250MB.)
>
> But, I'm not sure quite what to do in terms of generating queries.
> TPCHSkew contains QGEN.EXE, but that seems to require that you provide
> template queries as input, and I'm not sure where to get the
> templates.
>
> Any suggestions?
>
> Thanks,
>
> ...Robert
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
