Thread: query rewrite using materialized views

From: Yann Michel
Hi,

Are there any plans for rewriting queries to use preexisting materialized
views?  I mean, rewrite a query (within the optimizer) to use a
materialized view instead of the originating table?

Regards,
Yann

Re: query rewrite using materialized views

From: Josh Berkus
Yann,

> Are there any plans for rewriting queries to use preexisting materialized
> views?  I mean, rewrite a query (within the optimizer) to use a
> materialized view instead of the originating table?

Automatically, and by default, no.  Using the RULES system?  Yes, you can
already do this, and the folks on the MattView project on pgFoundry are
working to make it easier.
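
For example, a minimal sketch of the rules-based idea, with hypothetical
table and view names (a view is implemented as an ON SELECT rule, so
repointing a view at a snapshot table redirects queries through the
rules system):

    -- precompute the expensive aggregate into a snapshot table
    CREATE TABLE sales_summary_mv AS
        SELECT product_id, sum(amount) AS total_amount
        FROM sales GROUP BY product_id;

    -- clients query the view; redefining it to read the snapshot
    -- makes the rewrite happen inside the rules system
    CREATE OR REPLACE VIEW sales_summary AS
        SELECT product_id, total_amount FROM sales_summary_mv;

    -- refresh periodically, e.g. from cron
    TRUNCATE sales_summary_mv;
    INSERT INTO sales_summary_mv
        SELECT product_id, sum(amount) FROM sales GROUP BY product_id;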

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: query rewrite using materialized views

From
"Wager, Ryan D [NTK]"
Date:
All,
   I am currently working on a project for my company that entails
databasing upwards of 300 million specific parameters.  In the current
DB design, these parameters are mapped against two lookup tables (2
million and 1.5 million rows, respectively), and I am having serious
trouble getting PG to scale to a working level.  Here are my issues:
  1) The 250 million records are currently whipped and reinserted as a
"daily snapshot", and the fastest way I have found to do this from a
file, COPY, is nowhere near fast enough.  Oracle's SQL*Loader does some
things that I need, i.e. direct-path access to the db files (skipping
the RDBMS) and inherently ignoring indexing rules, which saves a ton of
time (dropping the index, COPYing 250 million records, then recreating
the index just takes way too long).
  2) Finding a way to keep this many records in a form that can be
easily queried.  I even tried breaking it up into almost 2,800 separate
tables (basically pre-broken-down views of the data); if that is a
workable method it can be done this way, but when I tried it, VACUUM
and the COPYs all seemed to slow down dramatically.
   If there is anyone who can give me some tuning parameters or design
help on this, it would be ridiculously appreciated.  I have already
created this in Oracle and it works, but we don't want to have to pay
the monster if something as wonderful as Postgres can handle it.


Ryan Wager

Re: query rewrite using materialized views

From
Rod Taylor
Date:
>   1) The 250 million records are currently whipped and reinserted as a
> "daily snapshot", and the fastest way I have found to do this from a
> file, COPY, is nowhere near fast enough.  Oracle's SQL*Loader does some
> things that I need, i.e. direct-path access to the db files (skipping
> the RDBMS) and inherently ignoring indexing rules, which saves a ton of
> time (dropping the index, COPYing 250 million records, then recreating
> the index just takes way too long).

If you have the hardware for it, instead of doing 1 copy, do 1 copy
command per CPU (until your IO is maxed out anyway) and divide the work
amongst them. I can push through 100MB/sec using methods like this --
which makes loading 100GB of data much faster.

Ditto for indexes. Don't create a single index on one CPU and wait --
send off one index creation command per CPU.
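
A rough sketch of the idea, assuming a quad-CPU box and hypothetical
file and table names (\copy reads the file client-side, so the chunks
need not live on the server):

    # split the daily snapshot into one chunk per CPU
    split -l 65000000 snapshot.dat chunk_

    # load the chunks concurrently
    for f in chunk_*; do
        psql -c "\copy parameters from '$f'" mydb &
    done
    wait

    # likewise, independent indexes can be built concurrently
    psql -c "CREATE INDEX parameters_p_idx ON parameters (param_id)" mydb &
    psql -c "CREATE INDEX parameters_l_idx ON parameters (lookup_id)" mydb &
    wait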

>   2) Finding a way to keep this many records in a form that can be
> easily queried.  I even tried breaking it up into almost 2,800 separate
> tables (basically pre-broken-down views of the data); if that is a
> workable method it can be done this way, but when I tried it, VACUUM
> and the COPYs all seemed to slow down dramatically.

Can you send us EXPLAIN ANALYSE output for the slow selects and a little
insight into what you're doing? A basic table structure and the indexes
involved would be handy. You may change column and table names if you
like.
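
For instance, something of this form (hypothetical query and names,
just to show what would help):

    EXPLAIN ANALYZE
    SELECT p.*
      FROM parameters p
      JOIN lookup_a l ON p.lookup_a_id = l.id
     WHERE l.code = 'ABC';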

Re: query rewrite using materialized views

From
Josh Berkus
Date:
Ryan,

>    If there is anyone who can give me some tuning parameters or design
> help on this, it would be ridiculously appreciated.  I have already
> created this in Oracle and it works, but we don't want to have to pay
> the monster if something as wonderful as Postgres can handle it.

In addition to Rod's advice, please increase your checkpoint_segments and
checkpoint_timeout parameters and make sure that the pg_xlog is on a separate
disk resource from the database.
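
Something along these lines in postgresql.conf (illustrative values
only; tune them to your hardware and load window):

    # default is 3; each segment is 16MB of WAL, so 64 allows ~1GB
    # of WAL between checkpoints during the bulk load
    checkpoint_segments = 64

    # default is 300 (seconds); stretch it so checkpoints are driven
    # by segment consumption rather than the clock during the load
    checkpoint_timeout = 900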

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: query rewrite using materialized views

From
"Wager, Ryan D [NTK]"
Date:
Rod,
  I do this; PG gets forked many times, and it is tough to find the
maximum number of concurrent COPYs, but I have a Proc::Queue manager
Perl driver that handles all of the COPY calls.  I have a quad-CPU
machine.  Each COPY only hits one CPU at about 2.1%, but anything over
about five kicks the load average up.
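
Roughly this shape, as a sketch (Proc::Queue caps how many children
fork will leave running at once; file and table names are made up):

    use strict;
    use Proc::Queue size => 4;  # cap concurrent children (one per CPU)

    my @chunks = glob "chunk_*";
    for my $file (@chunks) {
        my $pid = fork;         # blocks while 4 children are running
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {        # child: load one chunk, then exit
            exec "psql", "-c", "\\copy parameters from '$file'", "mydb";
            exit 1;             # reached only if exec fails
        }
    }
    1 while wait != -1;         # reap the remaining children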

  I'll get some EXPLAIN ANALYZE output and table structures out there
pronto.

Re: query rewrite using materialized views

From
Rod Taylor
Date:
On Tue, 2005-01-04 at 13:26 -0600, Wager, Ryan D [NTK] wrote:
> Rod,
>   I do this; PG gets forked many times, and it is tough to find the
> maximum number of concurrent COPYs, but I have a Proc::Queue manager
> Perl driver that handles all of the COPY calls.  I have a quad-CPU
> machine.  Each COPY only hits one CPU at about 2.1%, but anything over
> about five kicks the load average up.

Sounds like disk IO is slowing down the copy then.

Re: query rewrite using materialized views

From
Josh Berkus
Date:
Ryan,

> >   I do this; PG gets forked many times, and it is tough to find the
> > maximum number of concurrent COPYs, but I have a Proc::Queue manager
> > Perl driver that handles all of the COPY calls.  I have a quad-CPU
> > machine.  Each COPY only hits one CPU at about 2.1%, but anything
> > over about five kicks the load average up.

That's consistent with Xeon problems we've seen elsewhere.  Keep the # of
processes at or below the # of processors.

Moving pg_xlog is accomplished through:
1) in 8.0, changes to postgresql.conf
    (in 8.0 you'd also want to explore using multiple arrays with tablespaces to
make things even faster)
2) in other versions:
    a) mount a separate disk on PGDATA/pg_xlog, or
    b) symlink PGDATA/pg_xlog to another location
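
For (b), with hypothetical paths, and with the postmaster shut down
first:

    pg_ctl -D /var/lib/pgsql/data stop
    mv /var/lib/pgsql/data/pg_xlog /mnt/wal_disk/pg_xlog
    ln -s /mnt/wal_disk/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl -D /var/lib/pgsql/data start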

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: query rewrite using materialized views

From
Simon Riggs
Date:
On Tue, 2005-01-04 at 14:02 -0500, Rod Taylor wrote:
> >   1) The 250 million records are currently whipped and reinserted as a
> > "daily snapshot", and the fastest way I have found to do this from a
> > file, COPY, is nowhere near fast enough.  Oracle's SQL*Loader does some
> > things that I need, i.e. direct-path access to the db files (skipping
> > the RDBMS) and inherently ignoring indexing rules, which saves a ton of
> > time (dropping the index, COPYing 250 million records, then recreating
> > the index just takes way too long).
>
> If you have the hardware for it, instead of doing 1 copy, do 1 copy
> command per CPU (until your IO is maxed out anyway) and divide the work
> amongst them. I can push through 100MB/sec using methods like this --
> which makes loading 100GB of data much faster.
>
> Ditto for indexes. Don't create a single index on one CPU and wait --
> send off one index creation command per CPU.

Not sure what you mean by "whipped". If you mean select and re-insert,
then perhaps using a pipe would produce better performance, since no
disk access for the data file would be involved.
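
For example, something like this (hypothetical database and table
names):

    psql -c "COPY parameters TO STDOUT" source_db \
        | psql -c "COPY parameters FROM STDIN" target_db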

In 8.0, COPY and CREATE INDEX are optimised to not use WAL at all if
archive_command is not set. 8 is great...

> >   2) Finding a way to keep this many records in a form that can be
> > easily queried.  I even tried breaking it up into almost 2,800 separate
> > tables (basically pre-broken-down views of the data); if that is a
> > workable method it can be done this way, but when I tried it, VACUUM
> > and the COPYs all seemed to slow down dramatically.
>
> Can you send us EXPLAIN ANALYSE output for the slow selects and a little
> insight into what you're doing? A basic table structure and the indexes
> involved would be handy. You may change column and table names if you
> like.

There's a known issue using UNION ALL views in 8.0 that makes them
slightly more inefficient than using a single table. Perhaps that would
explain your results.
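
For reference, the kind of arrangement in question (a hypothetical
sketch):

    -- the data split across per-partition tables, presented as one
    -- relation through a UNION ALL view
    CREATE VIEW parameters AS
        SELECT * FROM parameters_p001
        UNION ALL
        SELECT * FROM parameters_p002;
        -- ...and so on for each underlying table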

There shouldn't be any need for the 2,800-table approach in this
instance.

--
Best Regards, Simon Riggs


Re: query rewrite using materialized views

From
Yann Michel
Date:
Hi,

On Tue, Jan 04, 2005 at 10:06:18AM -0800, Josh Berkus wrote:
> > Are there any plans for rewriting queries to use preexisting materialized
> > views?  I mean, rewrite a query (within the optimizer) to use a
> > materialized view instead of the originating table?
>
> Automatically, and by default, no.  Using the RULES system?  Yes, you can
> already do this, and the folks on the MattView project on pgFoundry are
> working to make it easier.

I was just wondering whether this might be scheduled for 8.x, since I
read the thread about materialized views a few days ago.  If
materialized views are someday implemented, this requested feature
should be kept in mind, since I know from Oracle how much it can
improve query execution plans...

Regards,
Yann