Thread: Postgresql 8.1.4 - performance issues for select on view using max
Hi everyone, I am doing a test for a scenario where I have 2 schemas one (public) for the operational data and another one (archive) for old, archived data. So basically I want to split the data from some huge tables in two. All data before 2006 in archive and all data after and including 2006 in public. Let's say I have a table named public.AllTransactions with data before and including 2006. I want to move all the data < 2006 into a new table named archive.transaction (in archive schema) I also want to move all data >= 2006 into a new table named public.transaction (in public schema). In order to make this transparent for the developers I want to drop the original table public.AllTransactions and to create a view with the same name that is a union between the two new tables: create view public.AllTransactions as select * from public.transaction union all select * from archive.transaction On this view I will create rules for insert, update, delete... Testing some selects I know we have in the application I got into a scenario where my plan does not work without doing code change. This scenario is: select max(transid) from alltransaction; because the planner does not use the existent indexes on the 2 new tables: public.transaction and archive.transaction Here are the results of the explain analyze: 1. Select only from one table is OK: ------------------------------------- # explain select max(transid) from public.transaction; QUERY PLAN -------------------------------------------------------------------------------- ---------------------- Result (cost=0.04..0.05 rows=1 width=0) InitPlan -> Limit (cost=0.00..0.04 rows=1 width=8) -> Index Scan Backward using pk_transaction on transaction (cost=0.00..357870.46 rows=9698002 width=8) Filter: (transid IS NOT NULL) (5 rows) 2. Select from the view is doing a sequential scan: --------------------------------------------------- # explain analyze select max(transid) from alltransaction; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- ----------------- Aggregate (cost=200579993.70..200579993.71 rows=1 width=8) (actual time=115778.101..115778.103 rows=1 loops=1) -> Append (cost=100000000.00..200447315.74 rows=10614237 width=143) (actual time=0.082..95146.144 rows=10622206 loops= 1) -> Seq Scan transaction (cost=100000000.00..100312397.02 rows=9698002 width=143) (actual time=0.078..56002.778 rows= 9706475 loops=1) -> Seq Scan on transaction (cost=100000000.00..100028776.35 rows=916235 width=143) (actual time=8.822..2799.496 rows= 915731 loops=1) Total runtime: 115778.200 ms (5 rows) Is this a bug or this is how the planner is suppose to work? The same problem I have on the following select: select transid from alltransaction order by transid desc limit 1; Thank you for your time, Ioana __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Hi, Le mercredi 18 octobre 2006 21:51, Ioana Danes a écrit : > I am doing a test for a scenario where I have 2 > schemas one (public) for the operational data and > another one (archive) for old, archived data. So > basically I want to split the data from some huge > tables in two. All data before 2006 in archive and all > data after and including 2006 in public. [...] > I got into a scenario where my plan does not work > without doing code change. This sounds a lot as a ddl partitionning, you may want to add some checks to your schema and set constraint_exclusion to on in your postgresql.conf. Please read following documentation material : http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html Regards, -- Dimitri Fontaine http://www.dalibo.com/
Attachment
On 10/18/06, Ioana Danes <ioanasoftware@yahoo.ca> wrote:
This works fine because i recognizes the index for that table and can simply use it to find the max.
# explain select max(transid) from public.transaction;
QUERY
PLAN
--------------------------------------------------------------------------------
----------------------
Result (cost=0.04..0.05 rows=1 width=0)
InitPlan
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using
pk_transaction on transaction (cost= 0.00..357870.46
rows=9698002 width=8)
Filter: (transid IS NOT NULL)
(5 rows)
This works fine because i recognizes the index for that table and can simply use it to find the max.
2. Select from the view is doing a sequential scan:
---------------------------------------------------
# explain analyze select max(transid) from
alltransaction;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
-----------------
Aggregate (cost=200579993.70..200579993.71 rows=1
width=8) (actual time=115778.101..115778.103 rows=1
loops=1)
-> Append (cost=100000000.00..200447315.74
rows=10614237 width=143) (actual time=0.082..95146.144
rows=10622206 loops= 1)
-> Seq Scan transaction
(cost=100000000.00..100312397.02 rows=9698002
width=143) (actual time=0.078..56002.778 rows=
9706475 loops=1)
-> Seq Scan on transaction
(cost=100000000.00..100028776.35 rows=916235
width=143) (actual time=8.822..2799.496 rows=
915731 loops=1)
Total runtime: 115778.200 ms
(5 rows)
Because this is a view, it cannot use the indexes from the other tables. Everytime you run a query against a view, it recreates itself based on the underlying data. From there it must sort the table based on the i and then return your max.
It's probably not a great idea to make a view this way if you are planning on using queries like this regularly because you can't create an index for a view. You could try a query that pulls the max from each table and then grabs the max of these:
select max (foo.transid) from (select max(transid) as id from public.transaction union select max(transid) from archive.transaction) as foo;
--
This E-mail is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521 and is legally privileged.
This information is confidential information and is intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.
Thanks a lot I will give it a try. --- Dimitri Fontaine <dim@dalibo.com> wrote: > Hi, > > Le mercredi 18 octobre 2006 21:51, Ioana Danes a > écrit : > > I am doing a test for a scenario where I have 2 > > schemas one (public) for the operational data and > > another one (archive) for old, archived data. So > > basically I want to split the data from some huge > > tables in two. All data before 2006 in archive and > all > > data after and including 2006 in public. > [...] > > I got into a scenario where my plan does not work > > without doing code change. > > This sounds a lot as a ddl partitionning, you may > want to add some checks to > your schema and set constraint_exclusion to on in > your postgresql.conf. > > Please read following documentation material : > > http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html > > Regards, > -- > Dimitri Fontaine > http://www.dalibo.com/ > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Hello, I tried the partitioning scenario but I've got into the same problem. The max function is not using the indexes on the two partitioned tables... Any other thoughts? --- Ioana Danes <ioanasoftware@yahoo.ca> wrote: > Thanks a lot I will give it a try. > > --- Dimitri Fontaine <dim@dalibo.com> wrote: > > > Hi, > > > > Le mercredi 18 octobre 2006 21:51, Ioana Danes a > > écrit : > > > I am doing a test for a scenario where I have 2 > > > schemas one (public) for the operational data > and > > > another one (archive) for old, archived data. So > > > basically I want to split the data from some > huge > > > tables in two. All data before 2006 in archive > and > > all > > > data after and including 2006 in public. > > [...] > > > I got into a scenario where my plan does not > work > > > without doing code change. > > > > This sounds a lot as a ddl partitionning, you may > > want to add some checks to > > your schema and set constraint_exclusion to on in > > your postgresql.conf. > > > > Please read following documentation material : > > > > > http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html > > > > Regards, > > -- > > Dimitri Fontaine > > http://www.dalibo.com/ > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam > protection around > http://mail.yahoo.com > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On Wed, 2006-10-18 at 15:51 -0400, Ioana Danes wrote: > Hi everyone, > Testing some selects I know we have in the application > I got into a scenario where my plan does not work > without doing code change. This scenario is: > > select max(transid) from alltransaction; > > because the planner does not use the existent indexes > on the 2 new tables: public.transaction and > archive.transaction > First, the query is expanded into something like (I'm being inexact here): SELECT max(transid) FROM (SELECT * FROM public.transaction UNION SELECT * FROM archive.transaction); PostgreSQL added a hack to the max() aggregate so that, in the simple case, it can recognize that what it really wants to do is use the index. Using the index for an aggregate only works in special cases, like min() and max(). What PostgreSQL actually does is to transform a query from: SELECT max(value) FROM sometable; Into: SELECT value FROM sometable ORDER BY value DESC LIMIT 1; In your case, it would need to transform the query into something more like: SELECT max(transid) FROM ( SELECT transid FROM ( SELECT transid FROM public.transaction ORDER BY transid DESC LIMIT 1 ) t1 UNION SELECT transid FROM ( SELECT transid FROM archive.transaction ORDER BY transid DESC LIMIT 1 ) t2 ) t; The reason for that is because PostgreSQL (apparently) isn't smart enough to do a mergesort on the two indexes to sort the result of the UNION. At least, I can't get PostgreSQL to sort over two UNIONed tables using an index; perhaps I'm missing it. Regards, Jeff Davis
Le mercredi 18 octobre 2006 23:02, Ioana Danes a écrit : > I tried the partitioning scenario but I've got into > the same problem. The max function is not using the > indexes on the two partitioned tables... > > Any other thoughts? Did you make sure your test included table inheritance? I'm not sure the planner benefits from constraint_exclusion without selecting the empty parent table (instead of your own union based view). -- Dimitri Fontaine http://www.dalibo.com/
Attachment
On Wed, 2006-10-18 at 23:19 +0200, Dimitri Fontaine wrote: > Le mercredi 18 octobre 2006 23:02, Ioana Danes a écrit : > > I tried the partitioning scenario but I've got into > > the same problem. The max function is not using the > > indexes on the two partitioned tables... > > > > Any other thoughts? > > Did you make sure your test included table inheritance? > I'm not sure the planner benefits from constraint_exclusion without selecting > the empty parent table (instead of your own union based view). > constraint exclusion and inheritance won't help him. The problem is that he has two indexes, and he needs to find the max between both of them. PostgreSQL isn't smart enough to recognize that it can use two indexes, find the max in each one, and find the max of those two values. Regards, Jeff Davis
On Wed, Oct 18, 2006 at 02:33:49PM -0700, Jeff Davis wrote: > On Wed, 2006-10-18 at 23:19 +0200, Dimitri Fontaine wrote: > > Le mercredi 18 octobre 2006 23:02, Ioana Danes a ??crit : > > > I tried the partitioning scenario but I've got into > > > the same problem. The max function is not using the > > > indexes on the two partitioned tables... > > > > > > Any other thoughts? > > > > Did you make sure your test included table inheritance? > > I'm not sure the planner benefits from constraint_exclusion without selecting > > the empty parent table (instead of your own union based view). > > > > constraint exclusion and inheritance won't help him. > > The problem is that he has two indexes, and he needs to find the max > between both of them. PostgreSQL isn't smart enough to recognize that it > can use two indexes, find the max in each one, and find the max of those > two values. Sorry, don't have the earlier part of this thread, but what about... SELECT greatest(max(a), max(b)) ... ? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote: > Sorry, don't have the earlier part of this thread, but what about... > > SELECT greatest(max(a), max(b)) ... > > ? To fill you in, we're trying to get the max of a union (a view across two physical tables). It can be done if you're creative with the query; I suggested a query that selected the max of the max()es of the individual tables. Your query could work too. However, the trick would be getting postgresql to recognize that it can transform "SELECT max(x) FROM foo" into that, where foo is a view of a union. If PostgreSQL could sort the result of a union by merging the results of two index scans, I think the problem would be solved. Is there something preventing this, or is it just something that needs to be added to the planner? Regards, Jeff Davis
On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis wrote: > On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote: > > Sorry, don't have the earlier part of this thread, but what about... > > > > SELECT greatest(max(a), max(b)) ... > > > > ? > > To fill you in, we're trying to get the max of a union (a view across > two physical tables). UNION or UNION ALL? You definitely don't want to do a plain UNION if you can possibly avoid it. > It can be done if you're creative with the query; I suggested a query > that selected the max of the max()es of the individual tables. Your > query could work too. However, the trick would be getting postgresql to > recognize that it can transform "SELECT max(x) FROM foo" into that, > where foo is a view of a union. > > If PostgreSQL could sort the result of a union by merging the results of > two index scans, I think the problem would be solved. Is there something > preventing this, or is it just something that needs to be added to the > planner? Hrm... it'd be worth trying the old ORDER BY ... LIMIT 1 trick just to see if that worked in this case, but I don't have much hope for that. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jeff Davis <pgsql@j-davis.com> writes: > If PostgreSQL could sort the result of a union by merging the results of > two index scans, I think the problem would be solved. Is there something > preventing this, or is it just something that needs to be added to the > planner? It's something on the wish-list. Personally I'd be inclined to try to rewrite the query as a plain MAX() across rewritten per-table indexed queries, rather than worry about mergesort or anything like that. There wasn't any very good way to incorporate that idea when planagg.c was first written, but now that the planner has an explicit notion of "append relations" it might be relatively straightforward. regards, tom lane
On Wed, 2006-10-18 at 17:35 -0500, Jim C. Nasby wrote: > On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis wrote: > > On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby wrote: > > > Sorry, don't have the earlier part of this thread, but what about... > > > > > > SELECT greatest(max(a), max(b)) ... > > > > > > ? > > > > To fill you in, we're trying to get the max of a union (a view across > > two physical tables). > > UNION or UNION ALL? You definitely don't want to do a plain UNION if you > can possibly avoid it. Oops, of course he must be doing UNION ALL, but for some reason I ran my test queries with plain UNION (thanks for reminding me). However, it didn't make a difference, see below. > > It can be done if you're creative with the query; I suggested a query > > that selected the max of the max()es of the individual tables. Your > > query could work too. However, the trick would be getting postgresql to > > recognize that it can transform "SELECT max(x) FROM foo" into that, > > where foo is a view of a union. > > > > If PostgreSQL could sort the result of a union by merging the results of > > two index scans, I think the problem would be solved. Is there something > > preventing this, or is it just something that needs to be added to the > > planner? > > Hrm... it'd be worth trying the old ORDER BY ... LIMIT 1 trick just to > see if that worked in this case, but I don't have much hope for that. Yeah, that's the solution. Here's the problem: => set enable_seqscan = false; SET => EXPLAIN SELECT i FROM (SELECT i FROM t10 UNION ALL SELECT i FROM t11) t ORDER BY i DESC; QUERY PLAN ------------------------------------------------------------------------------------ Sort (cost=200026772.96..200027272.96 rows=200000 width=4) Sort Key: t.i -> Append (cost=100000000.00..200004882.00 rows=200000 width=4) -> Seq Scan on t10 (cost=100000000.00..100001441.00 rows=100000 width=4) -> Seq Scan on t11 (cost=100000000.00..100001441.00 rows=100000 width=4) (5 rows) => EXPLAIN SELECT i FROM (SELECT i FROM t10) t ORDER BY i DESC; QUERY PLAN ------------------------------------------------------------------------------------ Index Scan Backward using t10_idx on t10 (cost=0.00..1762.00 rows=100000 width=4) (1 row) => EXPLAIN SELECT i FROM (SELECT i FROM t11) t ORDER BY i DESC; QUERY PLAN ------------------------------------------------------------------------------------ Index Scan Backward using t11_idx on t11 (cost=0.00..1762.00 rows=100000 width=4) (1 row) => But if PostgreSQL could just merge the index scan results, it could "ORDER BY i" the result of a UNION ALL without a problem. But it can't do that, so the syntactical trick introduced for min/max won't work in his case :( He'll probably have to change his application to make that query perform decently if the tables are split. Ideas? Regards, Jeff Davis
Hello, Actually what I expected from the planner for this query (select max(transid) from view) was something like this : select max(transid) from (select max(transid) from archive.transaction union all select max(transid) from public.transaction) and to apply the max function to each query of the union. This is what is happening when you use a where condition, it is using the indexes on each subquery of the view... ex: select transid from view where transid = 12; This way it would be fast enough. Also for order by and limit I was expecting the same thing. Thank you for your time, Ioana Danes > constraint exclusion and inheritance won't help him. > > The problem is that he has two indexes, and he needs > to find the max > between both of them. PostgreSQL isn't smart enough > to recognize that it > can use two indexes, find the max in each one, and > find the max of those > two values. > > Regards, > Jeff Davis > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Hi, I tried and this does does not work either. Thank you, Ioana --- "Jim C. Nasby" <jim@nasby.net> wrote: > On Wed, Oct 18, 2006 at 03:32:15PM -0700, Jeff Davis > wrote: > > On Wed, 2006-10-18 at 17:10 -0500, Jim C. Nasby > wrote: > > > Sorry, don't have the earlier part of this > thread, but what about... > > > > > > SELECT greatest(max(a), max(b)) ... > > > > > > ? > > > > To fill you in, we're trying to get the max of a > union (a view across > > two physical tables). > > UNION or UNION ALL? You definitely don't want to do > a plain UNION if you > can possibly avoid it. > > > It can be done if you're creative with the query; > I suggested a query > > that selected the max of the max()es of the > individual tables. Your > > query could work too. However, the trick would be > getting postgresql to > > recognize that it can transform "SELECT max(x) > FROM foo" into that, > > where foo is a view of a union. > > > > If PostgreSQL could sort the result of a union by > merging the results of > > two index scans, I think the problem would be > solved. Is there something > > preventing this, or is it just something that > needs to be added to the > > planner? > > Hrm... it'd be worth trying the old ORDER BY ... > LIMIT 1 trick just to > see if that worked in this case, but I don't have > much hope for that. > -- > Jim Nasby > jim@nasby.net > EnterpriseDB http://enterprisedb.com > 512.569.9461 (cell) > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Hello, It looks like some of you missed my first email but my problem is not to find a replacement for this select: select max(transid) from someunionview Thare are plenty of solutions for doing this... My point is to split a tale in two and to make this transparent for the developers as a first step. On the second step they will improve some of the queries but that is another story. So I would like to know if there is any plan to improve this type of query for views in the near future, or maybe it is alredy improved in 8.2 version? I have the same problem and question for: select transid from someunionview order by transid desc limit 1; Thank you for your time, Ioana Danes --- Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > If PostgreSQL could sort the result of a union by > merging the results of > > two index scans, I think the problem would be > solved. Is there something > > preventing this, or is it just something that > needs to be added to the > > planner? > > It's something on the wish-list. Personally I'd be > inclined to try to > rewrite the query as a plain MAX() across rewritten > per-table indexed > queries, rather than worry about mergesort or > anything like that. > There wasn't any very good way to incorporate that > idea when planagg.c > was first written, but now that the planner has an > explicit notion of > "append relations" it might be relatively > straightforward. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: explain analyze is your friend > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com