Re: Design ? table vs. view? - Mailing list pgsql-general

From Rémi Cura
Subject Re: Design ? table vs. view?
Date
Msg-id CAJvUf_s=u+oF9sT+XRFiOQq4L4mYXf=KVMPi-B=Rt7NztExHfQ@mail.gmail.com
Whole thread Raw
In response to Re: Design ? table vs. view?  (John McKown <john.archie.mckown@gmail.com>)
List pgsql-general
Hey,
I guess you know all about PL/R,
the R language extension for postgres .
It is very convenient, though be carefull as sometime it crashed my server.

Cheers,
Rémi-C


2014-07-16 3:42 GMT+02:00 John McKown <john.archie.mckown@gmail.com>:
On Tue, Jul 15, 2014 at 8:46 AM, David G Johnston <david.g.johnston@gmail.com> wrote:
John McKown wrote
> I have a table which has some "raw" data in it. By "raw", I mean it is
> minimally processed from a log file. Every week, I update this table by
> processing the weekly log using awk to create a "psql script" file which
> looks similar to:
>
<snip>
> So the overhead may be quite high, because to SELECT from RUNINFO,
> PostgreSQL must realize all four views.
>
> I appreciate your thoughts on if this is OK, given that performance is
> currently acceptable. Mainly because this work is basically only done one
> a
> week, on Sundays. And I don't do it myself, it is done via a scheduler
> (not
> cron, but similar) which runs some scripts.

I would likely make "jobrun.runinfo" into a table while leaving
"jobrun.rawdata" as-is.  I would have a function that populates "runinfo"
from "rawdata" that I would call after performing the copy to "rawdata".
There would be no views - unless you desire a view interface over "runinfo"
for API or permission reasons.

In 9.4 you can (probably) make "runinfo" an explicit MATERIALIZED VIEW and
perform REFRESH command to accomplish the same thing - though I am not
particularly familiar with the mechanics of that feature.

David J.


Being the indecisive nut that I am, I am going to do both <grin/>. I will keep the current view. But when I update the rawdata, what I will then do is:

drop table runinfo_table;
create table runinfo_table as select distinct * from runinfo;

I am fairly confident that there cannot be any duplicates in runinfo. But, being paranoid as well, I will do the DISTINCT just to be sure. I may change the VIEW to do that in the future, and remove it from the preceeding. Since the process which updates the rawdata table is automated and runs on a Sunday, the time needed to recreate runinfo_table is not relevant to me. So I get what I want, unless I update rawdata off schedule. I cannot imagine why I would do that since the logs from which I create it are generally only available after 17:00 local time on Sunday. Getting the iogs-to-date information for the time since the last dump is basically a PITA and my current use is not critical. Actually, it is more a "skunkworks" project of my own to produce a set of nice graphs, using R, which _might_ turn out to be interesting to management, but the production of which _will_ help me learn PostgreSQL and R better (hopefully).

Many thanks.

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown

pgsql-general by date:

Previous
From: Sergey Konoplev
Date:
Subject: statement_timeout doesn't work
Next
From: basti
Date:
Subject: Re: php password authentication failed for user ...