Thread: Sequence vs. Index Scan
I have two schemas, both with the exact same DDL. One performs great and
the other has problems with a specific query:

SELECT *
FROM branch_active_vw
WHERE branch_id = get_branch_for_zip('22151')
ORDER BY branch_name;

I am not defining the view here because the explain plans show the real
problem. I can post the view as well if it will help.

The explain plans are as follows:

Fast Schema:
1.  Sort  (cost=17.50..17.51 rows=1 width=680) (actual time=2838.583..2838.586 rows=1 loops=1)
2.    Sort Key: branch.branch_name
3.    ->  Nested Loop Left Join  (cost=0.00..17.49 rows=1 width=680) (actual time=2838.060..2838.093 rows=1 loops=1)
4.          Join Filter: ("inner".branch_group_id = "outer".branch_group_id)
5.          ->  Nested Loop  (cost=0.00..11.45 rows=1 width=647) (actual time=2837.776..2837.804 rows=1 loops=1)
6.                ->  Nested Loop  (cost=0.00..7.88 rows=1 width=618) (actual time=2837.697..2837.716 rows=1 loops=1)
7.                      Join Filter: ("inner".locale_id = "outer".locale_id)
8.                      ->  Nested Loop  (cost=0.00..6.86 rows=1 width=598) (actual time=2837.666..2837.676 rows=1 loops=1)
9.                            Join Filter: ("inner".corp_id = "outer".corp_id)
10.                           ->  Index Scan using branch_pkey on branch  (cost=0.00..5.84 rows=1 width=560) (actual time=2837.621..2837.624 rows=1 loops=1)
11.                                 Index Cond: (branch_id = get_branch_for_zip('22151'::character varying))
12.                                 Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())))
13.                           ->  Seq Scan on corp  (cost=0.00..1.01 rows=1 width=46) (actual time=0.015..0.017 rows=1 loops=1)
14.                     ->  Seq Scan on locale  (cost=0.00..1.01 rows=1 width=28) (actual time=0.014..0.016 rows=1 loops=1)
15.                ->  Index Scan using zip_cd_pkey on zip_cd branch_address_zip_cd  (cost=0.00..3.55 rows=1 width=37) (actual time=0.066..0.069 rows=1 loops=1)
16.                      Index Cond: (branch_address_zip_cd.zip_cd_id = "outer".branch_address_zip_id)
17.         ->  Seq Scan on branch_group  (cost=0.00..1.07 rows=7 width=41) (actual time=0.013..0.029 rows=7 loops=1)
18.   SubPlan
19.     ->  Seq Scan on branch_area  (cost=0.00..4.89 rows=1 width=6) (actual time=0.132..0.137 rows=2 loops=1)
20.           Filter: (branch_id = $0)
21. Total runtime: 2839.044 ms

Slow Schema:
    Sort  (cost=12.77..12.78 rows=1 width=1380) (actual time=157492.513..157492.515 rows=1 loops=1)
1.    Sort Key: branch.branch_name
2.    ->  Nested Loop Left Join  (cost=0.00..12.76 rows=1 width=1380) (actual time=130130.384..157492.484 rows=1 loops=1)
3.          Join Filter: ("inner".branch_group_id = "outer".branch_group_id)
4.          ->  Nested Loop  (cost=0.00..10.34 rows=1 width=1360) (actual time=130130.157..157492.253 rows=1 loops=1)
5.                Join Filter: ("inner".locale_id = "outer".locale_id)
6.                ->  Nested Loop  (cost=0.00..9.31 rows=1 width=1340) (actual time=130130.127..157492.213 rows=1 loops=1)
7.                      ->  Nested Loop  (cost=0.00..5.75 rows=1 width=1311) (actual time=130130.042..157492.119 rows=1 loops=1)
8.                            Join Filter: ("inner".corp_id = "outer".corp_id)
9.                            ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
10.                                 Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
11.                           ->  Seq Scan on corp  (cost=0.00..1.01 rows=1 width=38) (actual time=0.022..0.024 rows=1 loops=1)
12.                     ->  Index Scan using zip_cd_pkey on zip_cd branch_address_zip_cd  (cost=0.00..3.55 rows=1 width=37) (actual time=0.070..0.073 rows=1 loops=1)
13.                           Index Cond: (branch_address_zip_cd.zip_cd_id = "outer".branch_address_zip_id)
14.               ->  Seq Scan on locale  (cost=0.00..1.01 rows=1 width=28) (actual time=0.013..0.015 rows=1 loops=1)
15.         ->  Seq Scan on branch_group  (cost=0.00..1.01 rows=1 width=28) (actual time=0.013..0.015 rows=1 loops=1)
16.   SubPlan
17.     ->  Seq Scan on branch_area  (cost=0.00..1.40 rows=1 width=5) (actual time=0.077..0.084 rows=1 loops=1)
18.           Filter: (branch_id = $0)
19. Total runtime: 157492.890 ms

The problem is that lines 10-12 of the fast schema show an index scan while
lines 9-10 of the slow schema show a sequence scan. The sequence scan on
branch_id, combined with the rest of the filter, takes forever. I have
checked and there IS an index, specifically a primary key index, on
branch_id in both schemas, so I cannot figure out why the optimizer is
treating them differently. In fact, the table branch_id comes from has the
exact same indices and foreign keys in both schemas.

Any direction would be deeply appreciated.

Thanks!
Aaron

--
==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================
On Sat, May 05, 2007 at 04:05:52PM -0500, Aaron Bono wrote:

> I have two schemas, both with the exact same DDL. One performs great and

Do they have the same data?

> I have checked and there IS an index, specifically a primary key index,
> on the branch_id in both schemas so I cannot figure out why the optimizer
> is looking at these differently. In fact, the table the branch_id comes
> from has the exact same indices and foreign keys on both schemas.

Different data?  Different statistics?  Different numbers of
(possibly dead) rows?

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
Everything that happens in the world happens at some place.
                --Jane Jacobs
On 5/5/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
They have different data. The fast one has about 150 rows and the slow one has about 40 rows. The field in question here, the branch_id, is a BIGSERIAL in both.
We don't allow deletes and updates are fairly infrequent. I also did a vacuum analyze to no effect.
What can I do to get more statistics that would help?
Thanks for the help,
Aaron
"Aaron Bono" <postgresql@aranya.com> writes:
> 9.  ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281)
>           (actual time=130129.988..157492.057 rows=1 loops=1)
> 10.       Filter: ((start_day <= now()) AND ((end_day IS NULL) OR
>           (end_day >= now())) AND (branch_id =
>           get_branch_for_zip('22151'::character varying)))

There is something *awfully* wacko about that entry --- the fact that
the cost estimate is less than 5 units means that the planner thinks
there's 4 or fewer pages; either that's way wrong or the
get_branch_for_zip function is taking enormous amounts of time per row.
Have you tried timing that function on its own?

One possible reason for the performance difference is if you have
get_branch_for_zip marked as stable in one database and volatile in the
other --- volatile would prevent it from being used in an indexqual as
you'd like.

                        regards, tom lane
On Sat, May 05, 2007 at 05:00:53PM -0500, Aaron Bono wrote:

> They have different data. The fast one has about 150 rows and the slow
> one has about 40 rows. The field in question here, the branch_id, is a
> BIGSERIAL in both.

I'd be astonished if a table of 40 rows ever got index scanned. It's
probably more efficient to read the whole table. But it seems your
case may be strange.

> We don't allow deletes and updates are fairly infrequent. I also did a
> vacuum analyze to no effect.

How do you "not allow" deletes? Does this happen because of a failed
foreign key? If so, you can end up with dead tuples. I'd look at
the output of VACUUM VERBOSE to make sure you don't have a lot of
dead tuples.

That said, I wonder if fiddling with the statistics on your tables
might help.

A
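For reference, "fiddling with the statistics" would look something like the
sketch below. The table and column names are taken from the plans earlier in
the thread; the target value 100 is only an illustrative choice (the default
per-column statistics target in this era of PostgreSQL was 10):

```sql
-- Raise the per-column statistics target on the filtered column of the
-- slow table, then re-analyze so the planner sees the new sample:
ALTER TABLE branch ALTER COLUMN branch_id SET STATISTICS 100;
ANALYZE branch;

-- And to check for dead tuples, as suggested above:
VACUUM VERBOSE branch;
```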
On 5/5/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yes, the first thing I did was run it on its own. It takes about 2-6 seconds to run, which is not great but acceptable, and the functions are marked as stable. It is the same +/- a second or two on each of the two schemas. The function is a plpgsql function that executes a dynamic SQL string. Would it be better to change it to a straight SQL function? Would that help the planner? I may be able to change it if this will help.
I just switched the functions in both schemas to be stable (before I posted the original question), they were volatile. After doing that, the fast schema worked great but the slow one is still under performing. I even went back to the slow schema and dropped and recreated the function with no discernible difference.
This is for PostgreSQL 8.1.3 and the results are the same on Windows and Linux versions. The windows version is a restore from the Linux version's backup.
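To illustrate the plpgsql-vs-SQL question above: the real get_branch_for_zip
is never shown in this thread, so the bodies below are hypothetical (the
branch_area/zip_cd lookup is an assumption), but they show the shape of each
variant:

```sql
-- Hypothetical plpgsql version using a dynamic SQL string, as described:
-- EXECUTE builds and plans the query text on every call.
CREATE OR REPLACE FUNCTION get_branch_for_zip(p_zip character varying)
RETURNS bigint AS $$
DECLARE
    v_id bigint;
BEGIN
    EXECUTE 'SELECT branch_id FROM branch_area WHERE zip_cd = '
            || quote_literal(p_zip)   -- assumed lookup, not from the thread
    INTO v_id;
    RETURN v_id;
END;
$$ LANGUAGE plpgsql STABLE;

-- The same lookup as a straight SQL function (8.1 SQL bodies reference
-- arguments as $1): no dynamic re-planning per call, though this does not
-- by itself settle the volatility question being discussed.
CREATE OR REPLACE FUNCTION get_branch_for_zip(character varying)
RETURNS bigint AS $$
    SELECT branch_id FROM branch_area WHERE zip_cd = $1;  -- assumed lookup
$$ LANGUAGE sql STABLE;
```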
On 5/5/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
Permissions are set on the tables to only allow the application to do select/insert/update. We use start/end dates to indicate that records are no longer active and then have the application run against views that filter out inactive records. It also allows "removal" of records sometime in the future, gives us the ability to "undelete" records, and keeps us from having to code the application to have to clean up because of foreign key constraints. There are other advantages.
Fast Schema:
psql:vacuumverbose.sql:1: INFO: vacuuming "fast_schema.branch"
psql:vacuumverbose.sql:1: INFO: index "branch_pkey" now contains 150 row versions in 2 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
Slow Schema:
psql:vacuumverbose.sql:1: INFO: vacuuming "slow_schema.branch"
psql:vacuumverbose.sql:1: INFO: index "branch_pkey" now contains 29 row versions in 2 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
Like Tom said earlier, I really think it has something to do with the volatile/stable indicator on the function but I changed that. The function in the slow schema acts like it is still volatile while the fast schema is obviously stable.
Is there another way to analyze how the database is using the function in the selects? That or is there a way to just look at how the function is being used by the optimizer?
I do appreciate the feedback.
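One direct way to see how the optimizer is treating the function is to look
at where its call lands in the plan (the query is repeated from the original
post):

```sql
EXPLAIN ANALYZE
SELECT *
FROM branch_active_vw
WHERE branch_id = get_branch_for_zip('22151')
ORDER BY branch_name;
-- If the call appears in an "Index Cond: (branch_id = get_branch_for_zip(...))"
-- line, the planner evaluated it once and used it as an index key (the fast
-- schema's plan). If it only appears in a "Filter:" line under a Seq Scan,
-- it is being re-evaluated for every row (the slow schema's plan).
```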
On 5/5/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I verified it by putting a RAISE NOTICE in the function. The fast schema runs the function twice (odd, I would think it would run only once). The slow schema runs it 30 times (the number of records returned + 1). I know I put the functions into both schemas as stable and even dropped and recreated the function. Then I verified with EMS Manager and it tells me the DDL for the function in the database is set to stable. Is there something I can do to tell PostgreSQL that I really did mean stable?
Thanks for all the help,
Aaron
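The RAISE NOTICE instrumentation described above would look roughly like
this; the function body is hypothetical (the real lookup is not shown in the
thread), but one notice per invocation makes the call count easy to observe
in psql:

```sql
CREATE OR REPLACE FUNCTION get_branch_for_zip(p_zip character varying)
RETURNS bigint AS $$
DECLARE
    v_id bigint;
BEGIN
    -- Emits a line each time the executor actually calls the function,
    -- so a stable, index-qual'd call should produce very few notices.
    RAISE NOTICE 'get_branch_for_zip called with %', p_zip;
    SELECT branch_id INTO v_id
    FROM branch_area
    WHERE zip_cd = p_zip;   -- assumed lookup, not from the thread
    RETURN v_id;
END;
$$ LANGUAGE plpgsql STABLE;
```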
"Aaron Bono" <postgresql@aranya.com> writes:
> 9. -> Seq Scan on branch (cost=0.00..4.72 rows=1
> width=1281) (actual time= 130129.988..157492.057 rows=1 loops=1)
> 10. Filter: ((start_day <= now()) AND
> ((end_day IS NULL) OR (end_day >= now())) AND (branch_id =
> get_branch_for_zip('22151'::character varying)))
There is something *awfully* wacko about that entry --- the fact that
the cost estimate is less than 5 units means that the planner thinks
there's 4 or fewer pages; either that's way wrong or the
get_branch_for_zip function is taking enormous amounts of time per row.
Have you tried timing that function on its own?
One possible reason for the performance difference is if you have
get_branch_for_zip marked as stable in one database and volatile in the
other --- volatile would prevent it from being used in an indexqual as
you'd like.
I verified it by putting a RAISE NOTICE in the function. The fast schema runs the function twice (odd, I would think it would run only once). The slow schema runs it 30 times (the number of records returned + 1). I know I put the functions into both schemas as stable and even dropped and recreated the function. Then I verified with EMS Manager and it tells me the DDL for the function in the database is set to stable. Is there something I can do to tell PostgreSQL that I really did mean stable?
Thanks for all the help,
Aaron
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
On 5/5/07, Aaron Bono <postgresql@aranya.com> wrote:
> On 5/5/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > One possible reason for the performance difference is if you have
> > get_branch_for_zip marked as stable in one database and volatile in the
> > other --- volatile would prevent it from being used in an indexqual as
> > you'd like.
>
> I verified it by putting a RAISE NOTICE in the function. The fast schema
> runs the function twice (odd, I would think it would run only once). The
> slow schema runs it 30 times (the number of records returned + 1). I know
> I put the functions into both schemas as stable and even dropped and
> recreated the function. Then I verified with EMS Manager and it tells me
> the DDL for the function in the database is set to stable. Is there
> something I can do to tell PostgreSQL that I really did mean stable?

maybe this is silly but you can verify what the database thinks of the
function selecting from pg_proc

select pronamespace, provolatile
from pg_proc where proname = 'get_branch_for_zip'

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots. So far, the universe is winning."
                                                       Richard Cook
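A slightly extended version of that check (the pg_namespace join is my
addition, not from the thread) maps the numeric pronamespace to a schema
name, so the two rows are easier to tell apart:

```sql
SELECT n.nspname, p.proname, p.provolatile
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.proname = 'get_branch_for_zip';
-- provolatile: 'i' = immutable, 's' = stable, 'v' = volatile
```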
On 5/5/07, Jaime Casanova <systemguards@gmail.com> wrote:
select pronamespace, provolatile, proname
from pg_proc where proname = 'get_branch_for_zip';
pronamespace | provolatile | proname
--------------+-------------+--------------------
26644852 | s | get_branch_for_zip
26644856 | s | get_branch_for_zip
The select is using the function on the slow schema as if it were volatile but as stable on the fast schema.
I did a restart of the service and that didn't help.
Then I inserted 150 more records in the slow schema and pow - it started working like the fast schema.
So my conclusion is that the function is being treated as volatile even though it is stable because the number of records is small. Is there any way to tell PostgreSQL that when I say stable I really mean stable?
I am getting close to a solution. Thanks again for the help!
-Aaron
On Sun, May 06, 2007 at 01:45:54PM -0500, Aaron Bono wrote:

> Then I inserted 150 more records in the slow schema and pow - it started
> working like the fast schema.
>
> So my conclusion is that the function is being treated as volatile even
> though it is stable because the number of records is small.

I don't think that's the issue. If this is dependent on the
number of records, then for some reason the way the data is
structured means that the planner thinks a seqscan's a better bet.
This is probably due to distribution of the values. You could try
increasing the stats sample, and see if that helps.

A
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> On Sun, May 06, 2007 at 01:45:54PM -0500, Aaron Bono wrote:
>> So my conclusion is that the function is being treated as volatile even
>> though it is stable because the number of records is small.

> I don't think that's the issue. If this is dependent on the
> number of records, then for some reason the way the data is
> structured means that the planner thinks a seqscan's a better bet.
> This is probably due to distribution of the values. You could try
> increasing the stats sample, and see if that helps.

It's got nothing to do with distribution, just with numbers of pages to
fetch. You'll nearly always get a seqscan plan if there are only a
couple of pages in the table, simply because it would take more I/O to
read the index too.

The reason this is a problem in this example is that the function is so
expensive to execute. The planner should be avoiding the seqscan on the
basis of CPU cost not I/O cost, but it doesn't know that the function is
expensive enough to drive the decision that way.

In CVS HEAD (8.3-to-be) we've added a "cost" property to functions,
which provides a clean way to fix this issue, but there's no good way to
deal with it in existing releases :-(

                        regards, tom lane
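Once on 8.3 or later, the cost property Tom mentions can be set directly;
the value 10000 below is only an illustrative guess at the function's true
expense (the default cost for a plpgsql function is 100 units):

```sql
-- Tell the planner this function is expensive, pushing it toward plans
-- that evaluate the function as few times as possible (e.g. once, as an
-- index qual, rather than once per row in a seqscan filter).
ALTER FUNCTION get_branch_for_zip(character varying) COST 10000;
```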
On 5/7/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Since we don't delete records (we just deactivate them), I added 100 dummy records that are not active. This fixed the problem. As the number of records in that table grows, I will delete the dummy records.
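An alternative workaround worth noting (my suggestion, not from the thread):
wrapping the call in a scalar subquery turns it into an InitPlan that is
evaluated once up front, regardless of whether the planner picks an index
scan or a seqscan:

```sql
SELECT *
FROM branch_active_vw
WHERE branch_id = (SELECT get_branch_for_zip('22151'))
ORDER BY branch_name;
```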
Thanks for all the help!
Aaron