Re: Materialized views in Oracle - Mailing list pgsql-general
From | Mike Christensen |
---|---|
Subject | Re: Materialized views in Oracle |
Date | |
Msg-id | CABs1bs2cnyT-K0siX5rdZOvtWgzgr+CAxZY3CPEhPE_Q7p9O7g@mail.gmail.com Whole thread Raw |
In response to | Re: Materialized views in Oracle (Craig Ringer <ringerc@ringerc.id.au>) |
Responses |
Re: Materialized views in Oracle
Re: Materialized views in Oracle Re: Materialized views in Oracle Re: Materialized views in Oracle |
List | pgsql-general |
>> Hmm I think a materialized view you have to update yourself is called a >> "table".. but after dealing with the nightmare that is Oracle 11g, I >> think it'd be much more fun going that route with triggers and >> everything. Yes this thread is a complete vent, and also a plea to the >> pg guys to do materialized views right! > > In terms of defining "right," it'd be rather handy to know what's wrong with > the implementation you're currently struggling with. The Oracle guys won't > have set out to do it wrong; whatever's wrong with it won't be obvious until > you try to use it a different way to how they envisioned it or will be an > issue that only comes up in real-world use. > > It's entirely possible any future Pg support could fall into the same issue > ... unless opportunities like this are used to learn from the experience of > those who went before. > > Exactly what do you find frustrating with the mat views you're using? > > What's wrong with them? What's right with them? How are you trying to use > them? What problem are you attempting to solve with them? How do you *want* > and *expect* them to work? Oh you've asked for it.. Lemme give you a bit of a brain dump about my day. I have to first be up front, I'm not a database expert by any means and I'm nowhere close to an Oracle expert, or even an Oracle novice. If Oracle was a swimming pool, I would have those little floaty duck things on my arms. I took a contract gig for "a major cell phone company" that happens to use Oracle, so I've been trying to figure out stuff as I go. I'm about 3 weeks in, so that's exactly how much Oracle experience I have now. One of the major components I'm working on is this reporting engine that runs these major huge expensive queries (seriously, some of them take 2-3 minutes to run) to generate data that gets dumped to an Excel file for VPs to read (or make pretty charts out of and look smart in board rooms). I've been trying to speed some of these queries up, but unfortunately the SQL code that generates these reports is about 4,000 lines long and is wrapped in a massive Oracle "package". The SQL code is not indented nor does it have any whitespace or anything, you'd have more fun trying to debug into minimized Javascript files on Facebook. I've tried to run some SQL formatters on the code, and so far any formatter will just crash. Today, I decided to re-write one of these reports from scratch since they wanted a bunch of changes anyway. The data, as you've probably imagined, is highly un-normalized and the queries are pretty crazy, so I got the idea to make a few strategic DB changes as well as stick this report in a materialized view. That way, we could just query from that, throw in whatever filters we need and poof, instant report in like half a second. If done right. Problem 1: I spent quite a long time writing the query that I was planning on turning into a view. This involved doing things like a sub-query to get the minimum date from another set of rows, and building a comma delimited list out another nested select with the wm_concat function. After finally getting this query to work, turning it into a MV ended up being impossible. Oracle has a few really cool features around materialized views. There's a REFRESH ON COMMIT option that automatically updates the materialized view any time the underlying data is changed (otherwise you have to manually update it, or setup an update schedule for it). There's also a FAST REFRESH option that will only update the data that's changed, rather than run the entire query and re-build every row (Doing a full refresh on every commit would be crazy expensive, so no).. In my opinion, doing a FAST REFRESH ON COMMIT seems to be the only interesting feature - otherwise, there's not really a point - just make a normal table and re-build it once a day. Plus, they demand these reports to be real-time data anyway. However, if you want both of these options (fast refresh and update on commit), Oracle turns into a complete nightmare. First, apparently there can be no JOINS in your query. If you create a view with: SELECT A.Foo, B.Foo FROM A INNER JOIN B ON A.ID=B.ID You'll get the cryptic error: ORA-12054: cannot set the ON COMMIT refresh attribute for the materialized view Now, you can do the SAME exact query without the ANSI joins: SELECT A.Foo, B.Foo FROM A, B WHERE A.ID=B.ID Now all of a sudden it works. Yes, that makes a lot of sense, and thanks for the helpful error message too. Second: You have to create these materialized view logs for all the underlying tables if you want FAST REFRESH. This allows Oracle to log the exact data that changed so it can figure out which parts of your view table to update. Why can't Oracle just log its own data when there's fast refreshing views dependent on it? You not only have to create these logs, but you have to create them with special ROWID parameters otherwise you get more cryptic unhelpful Oracle errors like the one above. Next: If you have a SUBSELECT in your query, you're screwed: SELECT A, B, (SELECT MIN(DATE) FROM MyDates WHERE ID=TABLE1.ID) as EarliestDate FROM TABLE1 If you do this, you'll get: ORA-22818: subquery expressions not allowed here You have to re-write this query in a weird LEFT JOIN instead. Once you do that, you'll also break FAST REFRESH with the cryptic error message from above. Basically, FAST REFRESH and ON COMMIT are extremely limited in what you can do with them; any query more complicated than a basic SELECT * FROM FOO seems to be asking for trouble. If there are reasons for these things, both the error message displayed and the Oracle documentation make trouble shooting next to impossible. I guess this is why Oracle consultants make so much money. So, let's review the actual steps required to make a FAST REFRESH ON COMMIT view with joins: First, take the tables: CREATE TABLE foo(foo NUMBER, CONSTRAINT foo_pk PRIMARY KEY(foo)); CREATE TABLE bar(foo NUMBER, bar NUMBER, CONSTRAINT bar_pk PRIMARY KEY(foo, bar)); We first have to create materialized view logs for both these guys: CREATE MATERIALIZED VIEW LOG ON foo WITH ROWID; CREATE MATERIALIZED VIEW LOG ON bar WITH ROWID; If we do not do this OR we do not use the ROWID parameter, we get "Cannot set the ON COMMIT refresh attribute". We now have to create the view like this: CREATE MATERIALIZED VIEW foo_bar NOLOGGING CACHE BUILD IMMEDIATE REFRESH FAST ON COMMIT AS SELECT foo.foo, bar.bar, foo.ROWID AS foo_rowid, bar.ROWID AS bar_rowid FROM foo, bar WHERE foo.foo = bar.foo; Notice a lot of things: No ANSI JOINs (if you use them, cryptic error), we have to explicitly say what columns we want, we have to return the ROWID columns of every unique table. If we don't do any of those things, we get "Cannot set the ON COMMIT refresh attribute." An Oracle expert can probably get into details about why all of these things are absolutely vital to the database engine, but discovering these things took over 4 hours of my time today. I'll also admit there is this SQL function called DBMS_MVIEW.EXPLAIN_MVIEW that apparently helps people figure out why their materialized views don't support certain options, but I couldn't get this function to work, nor have I gotten any of those DMBS functions to work. I think they might be part of some package I either don't have access to or are not installed on my server; again, the error messages are too cryptic to tell why. Problem 2: Ok so now I have a materialized view. Since I was never able to do the nested select with the earliest date (nor LEFT JOIN it in) no matter how hard I tried, I decided I should stick this value on the PROJECT table and setup a trigger to update it every time one of the dates changed. This should be super easy, right? Even if you think it's a bad approach (it might be) or you hate triggers, it should STILL be easy.. I spent about 4 hours today trying to get the most basic trigger working at all. Here was my first attempt: CREATE TRIGGER Trigger_UpdateTrainingDelivery AFTER DELETE OR INSERT OR UPDATE OF STARTDATE ON TPM_TRAININGPLAN FOR EACH ROW BEGIN UPDATE TPM_PROJECT SET TRAININGDELIVERYSTART = (SELECT MIN(TP.STARTDATE) FROM TPM_TRAININGPLAN TP WHERE TP.PROJECTID = :new.PROJECTID) WHERE PROJECTID = :new.PROJECTID; END; So, every time a TRAININGPLAN row changes its STARTDATE field, I want to recalc TRAININGDELIVERYSTART to be the earliest date in that set. Should be simple. However, I tried messing with this query syntax for hours and got no where. Each time I tried to create the trigger, I got the message: Warnings: ---> W (1): Warning: execution completed with warning <--- That's right - It told me there was a warning, but wasn't actually nice enough to tell me what the warning is. There is a "SHOW ERRORS" command, but it apparently requires SQLPlus and don't get me started on that thing. Instead, you have to query a special error table. When I did this, there was random errors in no particular order, some didn't even appear to be related to any of this. There were no timestamps on the error either, and I found the data totally useless. However, it still created the trigger. So then I tried to update one of the TRAININGPLAN rows. When I do this, I get: >[Error] Script lines: 20-22 ------------------------ ORA-04098: trigger 'TPMDBO.TRIGGER_UPDATETRAININGDELIVERY' is invalid and failed re-validation Script line 20, statement line 1, column 7 So, basically Oracle was fine creating my trigger even though it knew it was invalid. Now my database is broken, and it won't even tell me what's wrong with the trigger. Well, I never actually did figure out how to fix my TRIGGER but I did learn that even if I do figure out the magical trigger syntax, I'll get nothing but "the table is mutating" errors (How X-Men, will my table have special mutant powers too?). Since TRAININGPLAN is in a state of flux until the transaction is committed, it's apparently impossible to select those rows and get the minimum date. I don't know if Postgres has this problem, but it seems to me the engine should be able to tell me what the minimum date will be since the trigger is running AFTER the rows are updated, and then commit everything at once. I don't see a reason for this limitation, nor do I see a reason why Oracle can't spit out any meaningful error message about my invalid trigger either when I create the trigger or the trigger runs. It seems that I'm going to have to change my middle-ware to update this column, which will be a mess because my middle-ware is a giant nightmare of .NET Entity classes and bad code which I did not write. If I wrote it, it would be good code. .NET is my specialty. One could argue this is the proper place to do this anyway. In conclusion: Overall, I've found Oracle near impossible to work with relative to Postgres. The documentation is hard to read, there's very few good examples, and the tools are expensive, buggy, and crash a lot. There's no good error messages, there's obscure limitations, and everything is designed way more complicated than it needs to be. I'd rather work on a PG database any time of the night or day, no questions about it. Maybe this message will show just how important good error messages and solid documentation are, especially for beginners. Anyway, sorry for the massive email but hopefully this'll be of use to someone, or at least make people chuckle as they reminisce about their early days as a DBO. Thanks for listening! Mike
pgsql-general by date: