Re: Materialized views in Oracle - Mailing list pgsql-general

From Mike Christensen
Subject Re: Materialized views in Oracle
Date
Msg-id CABs1bs2cnyT-K0siX5rdZOvtWgzgr+CAxZY3CPEhPE_Q7p9O7g@mail.gmail.com
Whole thread Raw
In response to Re: Materialized views in Oracle  (Craig Ringer <ringerc@ringerc.id.au>)
Responses Re: Materialized views in Oracle
Re: Materialized views in Oracle
Re: Materialized views in Oracle
Re: Materialized views in Oracle
List pgsql-general
>> Hmm I think a materialized view you have to update yourself is called a
>> "table"..  but after dealing with the nightmare that is Oracle 11g, I
>> think it'd be much more fun going that route with triggers and
>> everything.  Yes this thread is a complete vent, and also a plea to the
>> pg guys to do materialized views right!
>
> In terms of defining "right," it'd be rather handy to know what's wrong with
> the implementation you're currently struggling with. The Oracle guys won't
> have set out to do it wrong; whatever's wrong with it won't be obvious until
> you try to use it a different way to how they envisioned it or will be an
> issue that only comes up in real-world use.
>
> It's entirely possible any future Pg support could fall into the same issue
> ... unless opportunities like this are used to learn from the experience of
> those who went before.
>
> Exactly what do you find frustrating with the mat views you're using?
>
> What's wrong with them? What's right with them? How are you trying to use
> them? What problem are you attempting to solve with them? How do you *want*
> and *expect* them to work?

Oh you've asked for it..

Lemme give you a bit of a brain dump about my day.  I have to first be
up front, I'm not a database expert by any means and I'm nowhere close
to an Oracle expert, or even an Oracle novice.  If Oracle was a
swimming pool, I would have those little floaty duck things on my
arms.  I took a contract gig for "a major cell phone company" that
happens to use Oracle, so I've been trying to figure out stuff as I
go.  I'm about 3 weeks in, so that's exactly how much Oracle
experience I have now.

One of the major components I'm working on is this reporting engine
that runs these major huge expensive queries (seriously, some of them
take 2-3 minutes to run) to generate data that gets dumped to an Excel
file for VPs to read (or make pretty charts out of and look smart in
board rooms).  I've been trying to speed some of these queries up, but
unfortunately the SQL code that generates these reports is about 4,000
lines long and is wrapped in a massive Oracle "package".  The SQL code
is not indented nor does it have any whitespace or anything, you'd
have more fun trying to debug into minimized Javascript files on
Facebook.  I've tried to run some SQL formatters on the code, and so
far any formatter will just crash.  Today, I decided to re-write one
of these reports from scratch since they wanted a bunch of changes
anyway.  The data, as you've probably imagined, is highly
un-normalized and the queries are pretty crazy, so I got the idea to
make a few strategic DB changes as well as stick this report in a
materialized view.  That way, we could just query from that, throw in
whatever filters we need and poof, instant report in like half a
second.  If done right.

Problem 1:

I spent quite a long time writing the query that I was planning on
turning into a view.  This involved doing things like a sub-query to
get the minimum date from another set of rows, and building a comma
delimited list out another nested select with the wm_concat function.
After finally getting this query to work, turning it into a MV ended
up being impossible.

Oracle has a few really cool features around materialized views.
There's a REFRESH ON COMMIT option that automatically updates the
materialized view any time the underlying data is changed (otherwise
you have to manually update it, or setup an update schedule for it).
There's also a FAST REFRESH option that will only update the data
that's changed, rather than run the entire query and re-build every
row (Doing a full refresh on every commit would be crazy expensive, so
no)..  In my opinion, doing a FAST REFRESH ON COMMIT seems to be the
only interesting feature - otherwise, there's not really a point -
just make a normal table and re-build it once a day.  Plus, they
demand these reports to be real-time data anyway.  However, if you
want both of these options (fast refresh and update on commit), Oracle
turns into a complete nightmare.

First, apparently there can be no JOINS in your query.  If you create
a view with:

SELECT A.Foo, B.Foo FROM A INNER JOIN B ON A.ID=B.ID

You'll get the cryptic error:

ORA-12054: cannot set the ON COMMIT refresh attribute for the materialized view

Now, you can do the SAME exact query without the ANSI joins:

SELECT A.Foo, B.Foo FROM A, B WHERE A.ID=B.ID

Now all of a sudden it works.  Yes, that makes a lot of sense, and
thanks for the helpful error message too.

Second: You have to create these materialized view logs for all the
underlying tables if you want FAST REFRESH.  This allows Oracle to log
the exact data that changed so it can figure out which parts of your
view table to update.  Why can't Oracle just log its own data when
there's fast refreshing views dependent on it?  You not only have to
create these logs, but you have to create them with special ROWID
parameters otherwise you get more cryptic unhelpful Oracle errors like
the one above.

Next: If you have a SUBSELECT in your query, you're screwed:

SELECT
  A,
  B,
  (SELECT MIN(DATE) FROM MyDates WHERE ID=TABLE1.ID) as EarliestDate
FROM TABLE1

If you do this, you'll get:

ORA-22818: subquery expressions not allowed here

You have to re-write this query in a weird LEFT JOIN instead.  Once
you do that, you'll also break FAST REFRESH with the cryptic error
message from above.

Basically, FAST REFRESH and ON COMMIT are extremely limited in what
you can do with them; any query more complicated than a basic SELECT *
FROM FOO seems to be asking for trouble.  If there are reasons for
these things, both the error message displayed and the Oracle
documentation make trouble shooting next to impossible.  I guess this
is why Oracle consultants make so much money.

So, let's review the actual steps required to make a FAST REFRESH ON
COMMIT view with joins:

First, take the tables:

CREATE TABLE foo(foo NUMBER, CONSTRAINT foo_pk PRIMARY KEY(foo));
CREATE TABLE bar(foo NUMBER, bar NUMBER, CONSTRAINT bar_pk PRIMARY
KEY(foo, bar));

We first have to create materialized view logs for both these guys:

CREATE MATERIALIZED VIEW LOG ON foo WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON bar WITH ROWID;

If we do not do this OR we do not use the ROWID parameter, we get
"Cannot set the ON COMMIT refresh attribute".  We now have to create
the view like this:

CREATE MATERIALIZED VIEW foo_bar
  NOLOGGING
  CACHE
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT  AS SELECT foo.foo,
                                    bar.bar,
                                    foo.ROWID AS foo_rowid,
                                    bar.ROWID AS bar_rowid
                               FROM foo, bar
                              WHERE foo.foo = bar.foo;

Notice a lot of things: No ANSI JOINs (if you use them, cryptic
error), we have to explicitly say what columns we want, we have to
return the ROWID columns of every unique table.  If we don't do any of
those things, we get "Cannot set the ON COMMIT refresh attribute."

An Oracle expert can probably get into details about why all of these
things are absolutely vital to the database engine, but discovering
these things took over 4 hours of my time today.  I'll also admit
there is this SQL function called DBMS_MVIEW.EXPLAIN_MVIEW that
apparently helps people figure out why their materialized views don't
support certain options, but I couldn't get this function to work, nor
have I gotten any of those DMBS functions to work.  I think they might
be part of some package I either don't have access to or are not
installed on my server; again, the error messages are too cryptic to
tell why.

Problem 2:

Ok so now I have a materialized view.  Since I was never able to do
the nested select with the earliest date (nor LEFT JOIN it in) no
matter how hard I tried, I decided I should stick this value on the
PROJECT table and setup a trigger to update it every time one of the
dates changed.  This should be super easy, right?  Even if you think
it's a bad approach (it might be) or you hate triggers, it should
STILL be easy..  I spent about 4 hours today trying to get the most
basic trigger working at all.

Here was my first attempt:

CREATE TRIGGER Trigger_UpdateTrainingDelivery
    AFTER DELETE OR INSERT OR UPDATE OF STARTDATE
    ON TPM_TRAININGPLAN
    FOR EACH ROW
    BEGIN
       UPDATE TPM_PROJECT SET TRAININGDELIVERYSTART = (SELECT
MIN(TP.STARTDATE) FROM TPM_TRAININGPLAN TP WHERE TP.PROJECTID =
:new.PROJECTID)
       WHERE PROJECTID = :new.PROJECTID;
    END;

So, every time a TRAININGPLAN row changes its STARTDATE field, I want
to recalc TRAININGDELIVERYSTART to be the earliest date in that set.
Should be simple.  However, I tried messing with this query syntax for
hours and got no where.  Each time I tried to create the trigger, I
got the message:

Warnings: --->
   W (1): Warning: execution completed with warning
          <---

That's right - It told me there was a warning, but wasn't actually
nice enough to tell me what the warning is.  There is a "SHOW ERRORS"
command, but it apparently requires SQLPlus and don't get me started
on that thing.  Instead, you have to query a special error table.
When I did this, there was random errors in no particular order, some
didn't even appear to be related to any of this.  There were no
timestamps on the error either, and I found the data totally useless.
However, it still created the trigger.  So then I tried to update one
of the TRAININGPLAN rows.  When I do this, I get:

>[Error] Script lines: 20-22 ------------------------
 ORA-04098: trigger 'TPMDBO.TRIGGER_UPDATETRAININGDELIVERY' is invalid
and failed re-validation
 Script line 20, statement line 1, column 7

So, basically Oracle was fine creating my trigger even though it knew
it was invalid.  Now my database is broken, and it won't even tell me
what's wrong with the trigger.

Well, I never actually did figure out how to fix my TRIGGER but I did
learn that even if I do figure out the magical trigger syntax, I'll
get nothing but "the table is mutating" errors (How X-Men, will my
table have special mutant powers too?).  Since TRAININGPLAN is in a
state of flux until the transaction is committed, it's apparently
impossible to select those rows and get the minimum date.  I don't
know if Postgres has this problem, but it seems to me the engine
should be able to tell me what the minimum date will be since the
trigger is running AFTER the rows are updated, and then commit
everything at once.  I don't see a reason for this limitation, nor do
I see a reason why Oracle can't spit out any meaningful error message
about my invalid trigger either when I create the trigger or the
trigger runs.

It seems that I'm going to have to change my middle-ware to update
this column, which will be a mess because my middle-ware is a giant
nightmare of .NET Entity classes and bad code which I did not write.
If I wrote it, it would be good code.  .NET is my specialty.  One
could argue this is the proper place to do this anyway.

In conclusion:

Overall, I've found Oracle near impossible to work with relative to
Postgres.  The documentation is hard to read, there's very few good
examples, and the tools are expensive, buggy, and crash a lot.
There's no good error messages, there's obscure limitations, and
everything is designed way more complicated than it needs to be.

I'd rather work on a PG database any time of the night or day, no
questions about it.  Maybe this message will show just how important
good error messages and solid documentation are, especially for
beginners.  Anyway, sorry for the massive email but hopefully this'll
be of use to someone, or at least make people chuckle as they
reminisce about their early days as a DBO.  Thanks for listening!

Mike

pgsql-general by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Materialized views in Oracle
Next
From: Craig Ringer
Date:
Subject: Re: Materialized views in Oracle