Thread: Perfomance of views

Perfomance of views

From

Svenne Krap

Date:

26 October 2005, 12:25:03

Hi there.

I am currently building a system, where it would be nice to use multiple
levels of views upon each other (it is a staticstics system, where
traceability is important).

Is there any significant performance reduction in say 10 levels of views
instead of one giant, nested sql-statement ? I especially think exection
planner-wise.

The data mainly comes from one small to medium sized tabel (< 5 million
rows) and a handfull small (< 5000 rows) support tables.
The hardware will be okay for the job, but nothing really fancy (specs
are Xeon, 2G of memory, 6 SCSI-disks in a RAID1+0) . The base will be
version 8.1 provided that it gets out of beta around end-of-year.

Svenne

Re: Perfomance of views

From

Richard Huxton

Date:

27 October 2005, 06:36:47

Svenne Krap wrote:
> Hi there.
>
> I am currently building a system, where it would be nice to use multiple
> levels of views upon each other (it is a staticstics system, where
> traceability is important).
>
> Is there any significant performance reduction in say 10 levels of views
> instead of one giant, nested sql-statement ? I especially think exection
> planner-wise.

The planner tries to push conditions "inside" views where it can. It's
not perfect though, and if you're writing a big query by hand you might
be able to do better than it.

In short, I'd test if you can.
--
   Richard Huxton
   Archonet Ltd

Re: Perfomance of views

From

Svenne Krap

Date:

27 October 2005, 08:02:18

What do you mean exactly but "pushing conditions inside" ?

I don't think I will have the option of testing on the full queries, as
these take many days to write (the current ones, they are replacing on a
mssql takes up more that 5kb of query). The current ones are nightmares
from a maintaince standpoint.

Basicly what the application is doing is selecting some base data from
the "large" table for a point in time (usually a quarter) and selects
all matching auxilliare data from the other tables. They are made in a
time-travel like manner with a first and last useable date.

The ways I have considered was :
1) write a big query in hand (not preferred as it gets hard to manage)
2) write layers of views (still not prefered as I still have to remember
to put on the right conditions everywhere)
3) write layers of sql-functions (returning the right sets of rows from
the underlying tables) - which I prefer from a development angel .. it
gets very clean and I cant forget a parameter anywhere.

But I seem to remember (and I have used PGSQL in production since 7.0)
that the planner has some problems with solution 3 (i.e. estimating the
cost and rearranging the query), but frankly that would be the way I
would like to go.

Based on the current (non-optimal) design and hardware constraints, I
still have to make sure, the query runs fairly optimal - that means the
planner must use indexes intelligently and other stuff as if it was
(well-)written using solution 1.

What do you think of the three solutions ? And is there some ressource
about the planners capabilites for someone like me (that is very used to
write reasonably fast and complex sql, can read c-code, but does not
really want to dig into the source code)

Regards

Svenne

Richard Huxton wrote:

> Svenne Krap wrote:
>
>> Hi there.
>>
>> I am currently building a system, where it would be nice to use
>> multiple levels of views upon each other (it is a staticstics system,
>> where traceability is important).
>>
>> Is there any significant performance reduction in say 10 levels of
>> views instead of one giant, nested sql-statement ? I especially think
>> exection planner-wise.
>
>
> The planner tries to push conditions "inside" views where it can. It's
> not perfect though, and if you're writing a big query by hand you
> might be able to do better than it.
>
> In short, I'd test if you can.

Re: Perfomance of views

From

Richard Huxton

Date:

27 October 2005, 08:30:33

Don't forget to CC the list

Svenne Krap wrote:
> What do you mean exactly but "pushing conditions inside" ?

If I have something like "SELECT * FROM complicated_view WHERE foo = 7"
then the planner can look "inside" complicated_view and see where it can
attach the condition "foo=7", rather than running the query and applying
the condition at the end.

There are cases where it is safe for the planner to do this, but it
isn't smart enough to do so.

> I don't think I will have the option of testing on the full queries, as
> these take many days to write (the current ones, they are replacing on a
> mssql takes up more that 5kb of query). The current ones are nightmares
> from a maintaince standpoint.

Hmm - it sounds like they would be.

> Basicly what the application is doing is selecting some base data from
> the "large" table for a point in time (usually a quarter) and selects
> all matching auxilliare data from the other tables. They are made in a
> time-travel like manner with a first and last useable date.
>
> The ways I have considered was :
> 1) write a big query in hand (not preferred as it gets hard to manage)

Agreed.

> 2) write layers of views (still not prefered as I still have to remember
> to put on the right conditions everywhere)

This is what I'd probably do, but of course I don't have full
information about your situation.

> 3) write layers of sql-functions (returning the right sets of rows from
> the underlying tables) - which I prefer from a development angel .. it
> gets very clean and I cant forget a parameter anywhere.
>
> But I seem to remember (and I have used PGSQL in production since 7.0)
> that the planner has some problems with solution 3 (i.e. estimating the
> cost and rearranging the query), but frankly that would be the way I
> would like to go.

Well, 8.x can "inline" a simple sql function into a larger query, but it
doesn't sound like that will be enough in your case. Once a function
becomes a "black box" then there's not much the planner can do to figure
out what to do.

> Based on the current (non-optimal) design and hardware constraints, I
> still have to make sure, the query runs fairly optimal - that means the
> planner must use indexes intelligently and other stuff as if it was
> (well-)written using solution 1.

Well, #1,#2 are likely to be the most efficient, but you won't know for
sure about #2 until you test it.

There are a couple of other options though:

#4 - Write a set-returning function that breaks the query into steps and
executes each in turn. So - fetch IDs from the main table in step 1 and
store them in a temporary table, join other tables in later steps.

#5 - Write a function that writes your big query for you and either
returns the SQL to your application, or runs it and returns the results.

> What do you think of the three solutions ? And is there some ressource
> about the planners capabilites for someone like me (that is very used to
> write reasonably fast and complex sql, can read c-code, but does not
> really want to dig into the source code)

There is some stuff in the "Internals" section of the manuals and it
might be worth rummaging around on http://techdocs.postgresql.org

--
   Richard Huxton
   Archonet Ltd

Re: Perfomance of views

From

Jan Wieck

Date:

27 October 2005, 10:41:59

On 10/27/2005 7:29 AM, Richard Huxton wrote:

> Don't forget to CC the list
>
>
> Svenne Krap wrote:
>> What do you mean exactly but "pushing conditions inside" ?
>
> If I have something like "SELECT * FROM complicated_view WHERE foo = 7"
> then the planner can look "inside" complicated_view and see where it can
> attach the condition "foo=7", rather than running the query and applying
> the condition at the end.

Sorry, but the planner doesn't attach the condition anywhere. It is the
rewriter that takes the actual query, replaces the views rangetable and
expression entries with the actual underlying objects and adds the views
condition with an AND to the queries condition. Simply example:

Given a view

     create view v1 as select a1, b1, c2 from t1, t2 where a1 = a2;

The statement

     select * from v1 where b1 = 'foo';

will result in a parsetree equivalent to what you would get if the
original query was

     select a1, b1, c2 from t1, t2 where (b1 = 'foo') and (a1 = a2);

It is the planners and optimizers job to recognize where in the
execution plan it can push qualifications down into filters or even
scankeys. The planner should be able to realize that

     select * from v1 where a1 = 42;

is in fact equivalent to

     select a1, b1, c2 from t1, t2 where a1 = 42 and a1 = a2;

as well as

     select a1, b1, c2 from t1, t2 where a1 = 42 and a1 = a2 and a2 = 42;

This very last addition of "a2 = 42" because of "a2 = a1 = 42" allows it
to put a constant scankey onto the scan of t2. The 8.0 planner does
that, so the resulting query plan for the last three selects above is
absolutely identical.

>
> There are cases where it is safe for the planner to do this, but it
> isn't smart enough to do so.

Example?


Jan

>
>> I don't think I will have the option of testing on the full queries, as
>> these take many days to write (the current ones, they are replacing on a
>> mssql takes up more that 5kb of query). The current ones are nightmares
>> from a maintaince standpoint.
>
> Hmm - it sounds like they would be.
>
>> Basicly what the application is doing is selecting some base data from
>> the "large" table for a point in time (usually a quarter) and selects
>> all matching auxilliare data from the other tables. They are made in a
>> time-travel like manner with a first and last useable date.
>>
>> The ways I have considered was :
>> 1) write a big query in hand (not preferred as it gets hard to manage)
>
> Agreed.
>
>> 2) write layers of views (still not prefered as I still have to remember
>> to put on the right conditions everywhere)
>
> This is what I'd probably do, but of course I don't have full
> information about your situation.
>
>> 3) write layers of sql-functions (returning the right sets of rows from
>> the underlying tables) - which I prefer from a development angel .. it
>> gets very clean and I cant forget a parameter anywhere.
>>
>> But I seem to remember (and I have used PGSQL in production since 7.0)
>> that the planner has some problems with solution 3 (i.e. estimating the
>> cost and rearranging the query), but frankly that would be the way I
>> would like to go.
>
> Well, 8.x can "inline" a simple sql function into a larger query, but it
> doesn't sound like that will be enough in your case. Once a function
> becomes a "black box" then there's not much the planner can do to figure
> out what to do.
>
>> Based on the current (non-optimal) design and hardware constraints, I
>> still have to make sure, the query runs fairly optimal - that means the
>> planner must use indexes intelligently and other stuff as if it was
>> (well-)written using solution 1.
>
> Well, #1,#2 are likely to be the most efficient, but you won't know for
> sure about #2 until you test it.
>
> There are a couple of other options though:
>
> #4 - Write a set-returning function that breaks the query into steps and
> executes each in turn. So - fetch IDs from the main table in step 1 and
> store them in a temporary table, join other tables in later steps.
>
> #5 - Write a function that writes your big query for you and either
> returns the SQL to your application, or runs it and returns the results.
>
>> What do you think of the three solutions ? And is there some ressource
>> about the planners capabilites for someone like me (that is very used to
>> write reasonably fast and complex sql, can read c-code, but does not
>> really want to dig into the source code)
>
> There is some stuff in the "Internals" section of the manuals and it
> might be worth rummaging around on http://techdocs.postgresql.org
>
> --
>    Richard Huxton
>    Archonet Ltd
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: Perfomance of views

From

Tom Lane

Date:

27 October 2005, 11:38:08

Svenne Krap <svenne@krap.dk> writes:
> The ways I have considered was :
> 1) write a big query in hand (not preferred as it gets hard to manage)
> 2) write layers of views (still not prefered as I still have to remember
> to put on the right conditions everywhere)
> 3) write layers of sql-functions (returning the right sets of rows from
> the underlying tables) - which I prefer from a development angel .. it
> gets very clean and I cant forget a parameter anywhere.

#1 and #2 should behave pretty similarly, assuming that the "one big
query" would have been structured the same way as the nest of views is.
#3 unfortunately will pretty much suck, because there's no chance for
cross-level optimization.

There's been some discussion of inline-expanding SQL functions that
return sets when they are called in FROM, which would make a SQL
function that contains just a SELECT effectively equivalent to a view
as far as the planner's powers of optimization go.  No one's tried to
make it happen yet though.

            regards, tom lane

Re: Perfomance of views

From

Svenne Krap

Date:

27 October 2005, 12:09:45

Tom Lane wrote:

>There's been some discussion of inline-expanding SQL functions that
>return sets when they are called in FROM, which would make a SQL
>function that contains just a SELECT effectively equivalent to a view
>as far as the planner's powers of optimization go.  No one's tried to
>make it happen yet though.
>
>

This is exactly what would be brilliant in my case. Use the functions as
a kind of strict, parameterized views, that in the planner (or wherever)
gets replaced down to a simple (?!?!) sql-statement.
This would imho be highly valuable for almost any kind of complex
time-travel application (and surely dozens of other applications).

And before anyone suggests it, I don't code C well enough (*cough*
rusty) to try to do it myself. I would apriciate if it went on the todo
for 8.2 though. (I might even be willing to sponsor some money (a single
or perhpas two thousands of US dollars) for getting it done and release
it immediately under postgresql standard license (BSD)).

I by the way also support the idea of a way to force a table into a
PgSQL managed cache like discussed a while ago. Sometimes overall speed
for the system is less important than speed of a single query.

I must also say, that I am very impressed with the performance
enhancements of 8.1 beta, the bitmap index scans are amazing ! Good job,
guys - PgSQL has come a far way from 7.0 (where I started) and the
future looks bright ;)

Svenne

Re: Perfomance of views

From

Richard Huxton

Date:

27 October 2005, 12:45:47

Jan Wieck wrote:
> On 10/27/2005 7:29 AM, Richard Huxton wrote:
>> Svenne Krap wrote:
>>
>>> What do you mean exactly but "pushing conditions inside" ?
>>
>> If I have something like "SELECT * FROM complicated_view WHERE foo =
>> 7" then the planner can look "inside" complicated_view and see where
>> it can attach the condition "foo=7", rather than running the query and
>> applying the condition at the end.
>
> Sorry, but the planner doesn't attach the condition anywhere. It is the
> rewriter that takes the actual query, replaces the views rangetable and
> expression entries with the actual underlying objects and adds the views
> condition with an AND to the queries condition. Simply example:

Thanks for the correction Jan.

--
   Richard Huxton
   Archonet Ltd