Thread: Postgresql 8.1.4 - performance issues for select on view using max

From:
Ioana Danes
Date:

Hi everyone,

I am doing a test for a scenario where I have 2
schemas  one (public) for the operational data and
another one (archive) for old, archived data. So
basically I want to split the data from some huge
tables in two. All data before 2006 in archive and all
data after and including  2006 in public.

Let's say I have a table named public.AllTransactions
with data before and including 2006.
I want to move all the data < 2006 into a new table
named archive.transaction (in archive schema)
I also want to move all data >= 2006 into a new table
named public.transaction (in public schema).

In order to make this transparent for the developers I
want to drop the original table public.AllTransactions
 and to create a view with the same name that is a
union between the two new tables:

create view public.AllTransactions as
select * from public.transaction
union all
select * from archive.transaction

On this view I will create rules for insert, update,
delete...

Testing some selects I know we have in the application
I got into a scenario where my plan does not work
without doing code change. This scenario is:

select max(transid) from alltransaction;

because the planner does not use the existent indexes
on the 2 new tables: public.transaction and
archive.transaction

Here are the results of the explain analyze:

1. Select only from one table is OK:
-------------------------------------

# explain select max(transid) from public.transaction;

                                              QUERY
PLAN

--------------------------------------------------------------------------------

----------------------
 Result  (cost=0.04..0.05 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=8)
           ->  Index Scan Backward using
pk_transaction on transaction (cost=0.00..357870.46

rows=9698002 width=8)
                 Filter: (transid IS NOT NULL)
(5 rows)


2. Select from the view is doing a sequential scan:
---------------------------------------------------
# explain analyze select max(transid) from
alltransaction;

QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------
          -----------------
 Aggregate  (cost=200579993.70..200579993.71 rows=1
width=8) (actual time=115778.101..115778.103 rows=1
loops=1)
   ->  Append  (cost=100000000.00..200447315.74
rows=10614237 width=143) (actual time=0.082..95146.144
rows=10622206 loops=           1)
         ->  Seq Scan transaction
(cost=100000000.00..100312397.02 rows=9698002
width=143) (actual time=0.078..56002.778 rows=
  9706475 loops=1)
         ->  Seq Scan on transaction
(cost=100000000.00..100028776.35 rows=916235
width=143) (actual time=8.822..2799.496 rows=
 915731 loops=1)
 Total runtime: 115778.200 ms
(5 rows)

Is this a bug or this is how the planner is suppose to
work?

The same problem I have on the following select:
select transid from alltransaction order by transid
desc limit 1;

Thank you for your time,
Ioana


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From:
Dimitri Fontaine
Date:

Hi,

Le mercredi 18 octobre 2006 21:51, Ioana Danes a écrit :
> I am doing a test for a scenario where I have 2
> schemas  one (public) for the operational data and
> another one (archive) for old, archived data. So
> basically I want to split the data from some huge
> tables in two. All data before 2006 in archive and all
> data after and including  2006 in public.
[...]
> I got into a scenario where my plan does not work
> without doing code change.

This sounds a lot as a ddl partitionning, you may want to add some checks to
your schema and set constraint_exclusion to on in your postgresql.conf.

Please read following documentation material :
  http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Regards,
--
Dimitri Fontaine
http://www.dalibo.com/

From:
"Joshua Marsh"
Date:



On 10/18/06, Ioana Danes <> wrote:

# explain select max(transid) from public.transaction;

                                              QUERY
PLAN

--------------------------------------------------------------------------------

----------------------
Result  (cost=0.04..0.05 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=8)
           ->  Index Scan Backward using
pk_transaction on transaction (cost= 0.00..357870.46

rows=9698002 width=8)
                 Filter: (transid IS NOT NULL)
(5 rows)

This works fine because i recognizes the index for that table and can simply use it to find the max. 
 

2. Select from the view is doing a sequential scan:
---------------------------------------------------
# explain analyze select max(transid) from
alltransaction;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
          -----------------
Aggregate  (cost=200579993.70..200579993.71 rows=1
width=8) (actual time=115778.101..115778.103 rows=1
loops=1)
   ->  Append  (cost=100000000.00..200447315.74
rows=10614237 width=143) (actual time=0.082..95146.144
rows=10622206 loops=           1)
         ->  Seq Scan transaction
(cost=100000000.00..100312397.02 rows=9698002
width=143) (actual time=0.078..56002.778 rows=
  9706475 loops=1)
         ->  Seq Scan on transaction
(cost=100000000.00..100028776.35 rows=916235
width=143) (actual time=8.822..2799.496 rows=
915731 loops=1)
Total runtime: 115778.200 ms
(5 rows)


Because this is a view, it cannot use the indexes from the other tables.  Everytime you run a query against a view, it recreates itself based on the underlying data.  From there it must sort the table based on the i and then return your max.

It's probably not a great idea to make a view this way if you are planning on using queries like this regularly because you can't create an index for a view.  You could try a query that pulls the max from each table and then grabs the max of these:

select max (foo.transid) from (select max(transid) as id from public.transaction union select max(transid) from archive.transaction) as foo;

--
This E-mail is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521 and is legally privileged.

This information is confidential information and is intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.
From:
Ioana Danes
Date:

Thanks a lot I will give it a try.

--- Dimitri Fontaine <> wrote:

> Hi,
>
> Le mercredi 18 octobre 2006 21:51, Ioana Danes a
> écrit :
> > I am doing a test for a scenario where I have 2
> > schemas  one (public) for the operational data and
> > another one (archive) for old, archived data. So
> > basically I want to split the data from some huge
> > tables in two. All data before 2006 in archive and
> all
> > data after and including  2006 in public.
> [...]
> > I got into a scenario where my plan does not work
> > without doing code change.
>
> This sounds a lot as a ddl partitionning, you may
> want to add some checks to
> your schema and set constraint_exclusion to on in
> your postgresql.conf.
>
> Please read following documentation material :
>
>
http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
>
> Regards,
> --
> Dimitri Fontaine
> http://www.dalibo.com/
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From:
Ioana Danes
Date:

Hello,

I tried the partitioning scenario but I've got into
the same problem. The max function is not using the
indexes on the two partitioned tables...

Any other thoughts?

--- Ioana Danes <> wrote:

> Thanks a lot I will give it a try.
>
> --- Dimitri Fontaine <> wrote:
>
> > Hi,
> >
> > Le mercredi 18 octobre 2006 21:51, Ioana Danes a
> > écrit :
> > > I am doing a test for a scenario where I have 2
> > > schemas  one (public) for the operational data
> and
> > > another one (archive) for old, archived data. So
> > > basically I want to split the data from some
> huge
> > > tables in two. All data before 2006 in archive
> and
> > all
> > > data after and including  2006 in public.
> > [...]
> > > I got into a scenario where my plan does not
> work
> > > without doing code change.
> >
> > This sounds a lot as a ddl partitionning, you may
> > want to add some checks to
> > your schema and set constraint_exclusion to on in
> > your postgresql.conf.
> >
> > Please read following documentation material :
> >
> >
>
http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
> >
> > Regards,
> > --
> > Dimitri Fontaine
> > http://www.dalibo.com/
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam
> protection around
> http://mail.yahoo.com
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From:
Jeff Davis
Date:

On Wed, 2006-10-18 at 15:51 -0400, Ioana Danes wrote:
> Hi everyone,
> Testing some selects I know we have in the application
> I got into a scenario where my plan does not work
> without doing code change. This scenario is:
>
> select max(transid) from alltransaction;
>
> because the planner does not use the existent indexes
> on the 2 new tables: public.transaction and
> archive.transaction
>

First, the query is expanded into something like (I'm being inexact
here):

SELECT max(transid) FROM (SELECT * FROM public.transaction UNION SELECT
* FROM archive.transaction);

PostgreSQL added a hack to the max() aggregate so that, in the simple
case, it can recognize that what it really wants to do is use the index.
Using the index for an aggregate only works in special cases, like min()
and max(). What PostgreSQL actually does is to transform a query from:

SELECT max(value) FROM sometable;

Into:

SELECT value FROM sometable ORDER BY value DESC LIMIT 1;

In your case, it would need to transform the query into something more
like:

SELECT max(transid) FROM (
  SELECT transid FROM (
    SELECT transid FROM public.transaction ORDER BY transid DESC
      LIMIT 1
  ) t1
  UNION
  SELECT transid FROM (
    SELECT transid FROM archive.transaction ORDER BY transid DESC
      LIMIT 1
  ) t2
) t;

The reason for that is because PostgreSQL (apparently) isn't smart
enough to do a mergesort on the two indexes to sort the result of the
UNION. At least, I can't get PostgreSQL to sort over two UNIONed tables
using an index; perhaps I'm missing it.

Regards,
    Jeff Davis



From:
Dimitri Fontaine
Date:

Le mercredi 18 octobre 2006 23:02, Ioana Danes a écrit :
> I tried the partitioning scenario but I've got into
> the same problem. The max function is not using the
> indexes on the two partitioned tables...
>
> Any other thoughts?

Did you make sure your test included table inheritance?
I'm not sure the planner benefits from constraint_exclusion without selecting
the empty parent table (instead of your own union based view).

--
Dimitri Fontaine
http://www.dalibo.com/

From:
Jeff Davis
Date:

On Wed, 2006-10-18 at 23:19 +0200, Dimitri Fontaine wrote:
> Le mercredi 18 octobre 2006 23:02, Ioana Danes a écrit :
> > I tried the partitioning scenario but I've got into
> > the same problem. The max function is not using the
> > indexes on the two partitioned tables...
> >
> > Any other thoughts?
>
> Did you make sure your test included table inheritance?
> I'm not sure the planner benefits from constraint_exclusion without selecting
> the empty parent table (instead of your own union based view).
>

constraint exclusion and inheritance won't help him.

The problem is that he has two indexes, and he needs to find the max
between both of them. PostgreSQL isn't smart enough to recognize that it
can use two indexes, find the max in each one, and find the max of those
two values.

Regards,
    Jeff Davis


From:
"Jim C. Nasby"
Date:

On Wed, Oct 18, 2006 at 02:33:49PM -0700, Jeff Davis wrote:
> On Wed, 2006-10-18 at 23:19 +0200, Dimitri Fontaine wrote:
> > Le mercredi 18 octobre 2006 23:02, Ioana Danes a ??crit :
> > > I tried the partitioning scenario but I've got into
> > > the same problem. The max function is not using the
> > > indexes on the two partitioned tables...
> > >
> > > Any other thoughts?
> >
> > Did you make sure your test included table inheritance?
> > I'm not sure the planner benefits from constraint_exclusion without selecting
> > the empty parent table (instead of your own union based view).
> >
>
> constraint exclusion and inheritance won't help him.
>
> The problem is that he has two indexes, and he needs to find the max
> between both of them. PostgreSQL isn't smart enough to recognize that it
> can use two indexes, find the max in each one, and find the max of those
> two values.

Sorry, don't have the earlier part of this thread, but what about...

SELECT greatest(max(a), max(b)) ...

?
--
Jim Nasby                                            
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

From:
Jeff Davis
Date:

On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote:
> Sorry, don't have the earlier part of this thread, but what about...
>
> SELECT greatest(max(a), max(b)) ...
>
> ?

To fill you in, we're trying to get the max of a union (a view across
two physical tables).

It can be done if you're creative with the query; I suggested a query
that selected the max of the max()es of the individual tables. Your
query could work too. However, the trick would be getting postgresql to
recognize that it can transform "SELECT max(x) FROM foo" into that,
where foo is a view of a union.

If PostgreSQL could sort the result of a union by merging the results of
two index scans, I think the problem would be solved. Is there something
preventing this, or is it just something that needs to be added to the
planner?

Regards,
    Jeff Davis


From:
"Jim C. Nasby"
Date:

On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis wrote:
> On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote:
> > Sorry, don't have the earlier part of this thread, but what about...
> >
> > SELECT greatest(max(a), max(b)) ...
> >
> > ?
>
> To fill you in, we're trying to get the max of a union (a view across
> two physical tables).

UNION or UNION ALL? You definitely don't want to do a plain UNION if you
can possibly avoid it.

> It can be done if you're creative with the query; I suggested a query
> that selected the max of the max()es of the individual tables. Your
> query could work too. However, the trick would be getting postgresql to
> recognize that it can transform "SELECT max(x) FROM foo" into that,
> where foo is a view of a union.
>
> If PostgreSQL could sort the result of a union by merging the results of
> two index scans, I think the problem would be solved. Is there something
> preventing this, or is it just something that needs to be added to the
> planner?

Hrm... it'd be worth trying the old ORDER BY ... LIMIT 1 trick just to
see if that worked in this case, but I don't have much hope for that.
--
Jim Nasby                                            
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

From:
Tom Lane
Date:

Jeff Davis <> writes:
> If PostgreSQL could sort the result of a union by merging the results of
> two index scans, I think the problem would be solved. Is there something
> preventing this, or is it just something that needs to be added to the
> planner?

It's something on the wish-list.  Personally I'd be inclined to try to
rewrite the query as a plain MAX() across rewritten per-table indexed
queries, rather than worry about mergesort or anything like that.
There wasn't any very good way to incorporate that idea when planagg.c
was first written, but now that the planner has an explicit notion of
"append relations" it might be relatively straightforward.

            regards, tom lane

From:
Jeff Davis
Date:

On Wed, 2006-10-18 at 17:35 -0500, Jim C. Nasby wrote:
> On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis wrote:
> > On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote:
> > > Sorry, don't have the earlier part of this thread, but what about...
> > >
> > > SELECT greatest(max(a), max(b)) ...
> > >
> > > ?
> >
> > To fill you in, we're trying to get the max of a union (a view across
> > two physical tables).
>
> UNION or UNION ALL? You definitely don't want to do a plain UNION if you
> can possibly avoid it.

Oops, of course he must be doing UNION ALL, but for some reason I ran my
test queries with plain UNION (thanks for reminding me). However, it
didn't make a difference, see below.

> > It can be done if you're creative with the query; I suggested a query
> > that selected the max of the max()es of the individual tables. Your
> > query could work too. However, the trick would be getting postgresql to
> > recognize that it can transform "SELECT max(x) FROM foo" into that,
> > where foo is a view of a union.
> >
> > If PostgreSQL could sort the result of a union by merging the results of
> > two index scans, I think the problem would be solved. Is there something
> > preventing this, or is it just something that needs to be added to the
> > planner?
>
> Hrm... it'd be worth trying the old ORDER BY ... LIMIT 1 trick just to
> see if that worked in this case, but I don't have much hope for that.

Yeah, that's the solution. Here's the problem:

=> set enable_seqscan = false;
SET
=> EXPLAIN SELECT i FROM (SELECT i FROM t10 UNION ALL SELECT i FROM t11)
t ORDER BY i DESC;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Sort  (cost=200026772.96..200027272.96 rows=200000 width=4)
   Sort Key: t.i
   ->  Append  (cost=100000000.00..200004882.00 rows=200000 width=4)
         ->  Seq Scan on t10  (cost=100000000.00..100001441.00
rows=100000 width=4)
         ->  Seq Scan on t11  (cost=100000000.00..100001441.00
rows=100000 width=4)
(5 rows)

=> EXPLAIN SELECT i FROM (SELECT i FROM t10) t ORDER BY i DESC;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan Backward using t10_idx on t10  (cost=0.00..1762.00
rows=100000 width=4)
(1 row)

=> EXPLAIN SELECT i FROM (SELECT i FROM t11) t ORDER BY i DESC;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan Backward using t11_idx on t11  (cost=0.00..1762.00
rows=100000 width=4)
(1 row)

=>

But if PostgreSQL could just merge the index scan results, it could
"ORDER BY i" the result of a UNION ALL without a problem. But it can't
do that, so the syntactical trick introduced for min/max won't work in
his case :(

He'll probably have to change his application to make that query perform
decently if the tables are split.

Ideas?

Regards,
    Jeff Davis


From:
Ioana Danes
Date:

Hello,

Actually what I expected from the planner for this
query (select max(transid) from view) was something
like this :

select max(transid) from (select max(transid) from
archive.transaction union all select max(transid) from
public.transaction)

and to apply the max function to each query of the
union. This is what is happening when you use a where
condition, it is using the indexes on each subquery of
the view...
ex: select transid from view where transid = 12;

This way it would be fast enough.

Also for order by and limit I was expecting the same
thing.


Thank you for your time,
Ioana Danes

> constraint exclusion and inheritance won't help him.
>
> The problem is that he has two indexes, and he needs
> to find the max
> between both of them. PostgreSQL isn't smart enough
> to recognize that it
> can use two indexes, find the max in each one, and
> find the max of those
> two values.
>
> Regards,
>     Jeff Davis
>
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From:
Ioana Danes
Date:

Hi,
I tried and this does does not work either.

Thank you,
Ioana
--- "Jim C. Nasby" <> wrote:

> On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis
> wrote:
> > On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby
> wrote:
> > > Sorry, don't have the earlier part of this
> thread, but what about...
> > >
> > > SELECT greatest(max(a), max(b)) ...
> > >
> > > ?
> >
> > To fill you in, we're trying to get the max of a
> union (a view across
> > two physical tables).
>
> UNION or UNION ALL? You definitely don't want to do
> a plain UNION if you
> can possibly avoid it.
>
> > It can be done if you're creative with the query;
> I suggested a query
> > that selected the max of the max()es of the
> individual tables. Your
> > query could work too. However, the trick would be
> getting postgresql to
> > recognize that it can transform "SELECT max(x)
> FROM foo" into that,
> > where foo is a view of a union.
> >
> > If PostgreSQL could sort the result of a union by
> merging the results of
> > two index scans, I think the problem would be
> solved. Is there something
> > preventing this, or is it just something that
> needs to be added to the
> > planner?
>
> Hrm... it'd be worth trying the old ORDER BY ...
> LIMIT 1 trick just to
> see if that worked in this case, but I don't have
> much hope for that.
> --
> Jim Nasby
> 
> EnterpriseDB      http://enterprisedb.com
> 512.569.9461 (cell)
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From:
Ioana Danes
Date:

Hello,

It looks like some of you missed my first email but my
problem is not to find a replacement for this select:
select max(transid) from someunionview
Thare are plenty of solutions for doing this...

My point is to split a tale in two and to make this
transparent for the developers as a first step. On the
second step they will improve some of the queries but
that is another story.

So I would like to know if there is any plan to
improve this type of query for views in the near
future, or maybe it is alredy improved in 8.2 version?
I have the same problem and question for:
select transid from someunionview order by transid
desc limit 1;

Thank you for your time,
Ioana Danes

--- Tom Lane <> wrote:

> Jeff Davis <> writes:
> > If PostgreSQL could sort the result of a union by
> merging the results of
> > two index scans, I think the problem would be
> solved. Is there something
> > preventing this, or is it just something that
> needs to be added to the
> > planner?
>
> It's something on the wish-list.  Personally I'd be
> inclined to try to
> rewrite the query as a plain MAX() across rewritten
> per-table indexed
> queries, rather than worry about mergesort or
> anything like that.
> There wasn't any very good way to incorporate that
> idea when planagg.c
> was first written, but now that the planner has an
> explicit notion of
> "append relations" it might be relatively
> straightforward.
>
>             regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com