Re: Progress Report on Materialized Views - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Progress Report on Materialized Views
Date
Msg-id 002301c3fa6b$99d825b0$ec9387d9@LaptopDellXP
Whole thread Raw
In response to Re: Progress Report on Materialized Views  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Progress Report on Materialized Views  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>Tom Lane
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Could I suggest that your next step is to sync up with the work
being
> > done on tuning the DBT-3 query workload? As I'm sure you're aware,
that
> > is very similar to TPC-H workload, where most of the commercial
RDBMS
> > vendors utilise Materialized Views to enhance certain queries.
> 
> Oh?  As far as I can tell, TPC-H forbids use of materialized views.
> See sections 1.5.6 and 1.5.7 of the spec.  The effect of the fine
print
> seems to be that the only way you are allowed to store extra copies of
> data is as indexes over columns that are primary keys, foreign keys,
> or date columns.

Sorry, I wasn't very clear there. I'm very happy that you're looking at
this area and are able to slap my imprecision into shape so succinctly.
You are 100% right: MVs are unambiguously not allowed as part of the
TPC-H spec - what 1.5.7 shows is how useful MVs are: they've had to ban
them! My take on the reason we now have MVs in all of the major
commercial DBMS is because they weren't directly banned in the original
TPC-D spec.
[ http://www.tpc.org/tpch/spec/tpch2.1.0.pdf ]

For me, there are two issues:
i) passing the TPC-H test
ii) dealing with a real-world workload

IMHO, TPC-H is very much a real-world workload, so the difference is:
i) coping with a TPC-H style workload when you have no a priori
knowledge of what might be performed, and when: that is the heart of the
TPC-H test. I think it is important that we strive to succeed on the
pure test since there always will be many queries you can't predict.
Removing MVs from that was an important statement about the requirement
to cope with ad-hoc queries.

ii) coping with the same workload when you have learnt something about
it - i.e. you are able to tune the system over time. That's where MVs
come in. Of course, you can go too far here, so there needs to be some
judgement about what constitutes a real-world MV scenario.

For me, the issue isn't passing the test, but exceeding it.

If Jonathan's MV work can make a noticeable improvement on the DBT-3
workload, then it will be worthwhile; this is likely to be intertwined
with optimizer improvements required in other areas.

However, overall it was very cheeky of me to presume to bring you or
anybody else together on these things, so I'll butt out until I have
time to contribute personally,

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: aliases, &c in HAVING clause?
Next
From: Tom Lane
Date:
Subject: Re: Too-many-files errors on OS X