Problems implementing TPC-D queries - Mailing list pgsql-sql

From Paul Parker
Subject Problems implementing TPC-D queries
Date
Msg-id 199806040337.WAA09687@noel.cs.rice.edu
Whole thread Raw
List pgsql-sql

Hi,

I've been working for a while on a research project that attempts to
"parallelize" Postgres via a software DSM (more work than expected and not
actually useful for any end-user).  We want to use TPC-D to model commercial
workloads to analyze the speedup and memory behavior.  I've gotten most of the
queries to work (the inclusion of subselects in 6.3 was a big help and is
earnestly appreciated), but I'm still having problems with a couple.

Query 12 analyzes the performance of a couple of shipping methods, finding
out how many late high-priority and low-priority shipments were made using
each method.  TPC's definitional query uses CASE in an aggregate, which
isn't implemented.  One variant uses aggregates (COUNT) inside views, which
of course doesn't work.  A second variant uses temporary tables instead of
views, which I was more hopeful about.  Unfortunately, I have been unable
to use INSERT INTO to actually insert the tuples from the SELECT.  That is:


tpc_test=>     SELECT
tpc_test->         SHIPMODE, COUNT(*)
tpc_test->     FROM LATEORDRS0
tpc_test->     WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH')
tpc_test->     AND shipmode = 'RAIL'
tpc_test->     GROUP BY SHIPMODE;
shipmode  |count
----------+-----
RAIL      |   44
(1 row)

as compared to:

tpc_test=> CREATE TABLE QUICK0 (SHIPMODE CHAR(10), HIGH_LINE_COUNT INTEGER);
CREATE
tpc_test=>
tpc_test=> INSERT INTO QUICK0
tpc_test->     SELECT
tpc_test->         SHIPMODE, COUNT(*)
tpc_test->     FROM LATEORDRS0
tpc_test->     WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH')
tpc_test->     AND shipmode = 'RAIL'
tpc_test->     GROUP BY SHIPMODE;
INSERT 105206 1
tpc_test=> SELECT * FROM QUICK0;
shipmode|high_line_count
--------+---------------
        |              0


Note the types are correct (assuming count returns an integer, which is
validated below):

CREATE VIEW LATEORDRS0 AS
    SELECT
        L_SHIPMODE AS SHIPMODE, O_ORDRPRIORITY AS ORDRPRIORITY
    FROM ORDR, LINEITEM
    WHERE O_ORDRKEY = L_ORDRKEY
        AND L_SHIPMODE IN ('TRUCK','RAIL')
        AND L_COMMITDATE < L_RECEIPTDATE
        AND L_SHIPDATE < L_COMMITDATE
        AND L_RECEIPTDATE >= '1993-01-01'::DATETIME
        AND L_RECEIPTDATE < ('1993-01-01'::DATETIME + '1 YEAR'::TIMESPAN);

tpc_test-> \d lineitem

Table    = lineitem
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
...
| l_shipmode                       | char()                           |    10 |


In fact, this query works (assuming quick0 has at least one row with a
non-NULL shipmode):

insert into quick0
  select shipmode, count(*)
  from quick0
  group by shipmode ;


But this query only adds a blank and zero tuple:

insert into quick0
  select shipmode, count(*)
  from lateordrs0
  group by shipmode ;
select * from quick0;


Quite likely this is actually a bug, but I don't know SQL well enough to
pronounce it that for sure.


The other query's problem is COUNT DISTINCT (unimplemented, I believe).
I've seen this mentioned in the archives, but I haven't seen any solutions
that would work.  They mentioned you could produce a DISTINCT view, but
COUNT(*) failed on it (related to the view-aggregate problem??).  I suspect
INSERT INTO SELECT DISTINCT... might work, however even if it did, I would
need to do it for each of some arbitrary number of cases, as per the query,
which counts the number of suppliers for each part meeting some
specifications:


SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY) AS SUPPLIER_CNT
FROM PARTSUPP, PART
WHERE
    P_PARTKEY = PS_PARTKEY AND
    P_BRAND <> 'Brand#14' AND
    P_TYPE NOT LIKE 'LARGE PLATED%' AND
    P_SIZE IN (45, 22, 2, 48, 4, 49, 26, 10) AND
    PS_SUPPKEY NOT IN
        (SELECT S_SUPPKEY FROM SUPPLIER WHERE S_COMMENT LIKE
        '%Better Business Bureau%Complaints%')
GROUP BY P_BRAND, P_TYPE, P_SIZE
ORDER BY SUPPLIER_CNT DESC, P_BRAND, P_TYPE, P_SIZE;


Also, for no reason readily evident to myself, this last query takes 41 cpu
hours on a P6-150 w/ 64 M RAM running FreeBSD 2.2 over a 20 M database,
after having done vacuum analyze (which improved several queries one or two
orders of magnitude), whereas the other queries take a few cpu seconds, a
difference of about 1000x.  True, the VM size does swell to 57 M, but the
working set seems content at <30M, which the otherwise empty machine easily
gives it.

CREATE TABLE ALL_NATIONS0
    (YEAR_CREATE INTEGER,
     VOLUME FLOAT8,
     NATION CHAR(25));

INSERT INTO ALL_NATIONS0
    SELECT EXTRACT (YEAR FROM O_ORDRDATE)::INTEGER AS YEAR_SEL, L_EXTENDEDPRICE*(1-L_DISCOUNT) AS VOLUME,
        N2.N_NAME AS NATION
    FROM PART, SUPPLIER, LINEITEM, ORDR, CUSTOMER,
        NATION N1, NATION N2, REGION
    WHERE P_PARTKEY = L_PARTKEY
        AND S_SUPPKEY = L_SUPPKEY
        AND L_ORDRKEY = O_ORDRKEY
        AND O_CUSTKEY = C_CUSTKEY
        AND C_NATIONKEY = N1.N_NATIONKEY
        AND N1.N_REGIONKEY = R_REGIONKEY
        AND R_NAME = 'ASIA'
        AND S_NATIONKEY = N2.N_NATIONKEY
        AND O_ORDRDATE > ('1995-01-01'::DATETIME)
        AND O_ORDRDATE < ('1996-12-31'::DATETIME)
        AND P_TYPE = 'ECONOMY POLISHED COPPER';


Any help anyone can give is appreciated,

Paul

-----------
Paul Parker\___________________________________
pparker@cs.rice.edu   a.k.a.  "The other Paul" \_____________
Second-year grad-student seeking a decent signature quotation

pgsql-sql by date:

Previous
From: Stoeppel@oberland.net (Stoeppel)
Date:
Subject: problems with CAST :-{
Next
From: "Konstantin S. Kuznetsov"
Date:
Subject: Need ODBC driver