Thread: Ad Hoc Indexes
Question: We have recently moved PostgreSql as we just migrated to OpenMfg. i really like the database and PL/pgSQL its very similar to how FoxPro language layout so learning has been pretty easy. Now for my question It does not appear PostgreSQL does not have an Ad Hoc Indexes ability where the Query planner will create an in memory index based on the Select, Update, Insert or Delete commands. I got pretty lazy with MsSQL and Foxpro doing ad hoc indexes to speed up complex queries where i should have created an index but never did. Is there any plans in the future to add the ability for PostgreSQL to create Ad Hoc indexes if it makes sense. Example i was creating a new report thats presently only processing around 120,000 records in the 7 tables for the query. But the query took between 18 to 19 seconds to run. I analyzed the query and added 2 indexes and drop the processing time to 93 milliseconds. This query will be run 3 to 5 times a day so it makes sense to create indexes But with reports that run maybe once a week or once a month it does not make sense to create indexes. this is where I think Add Hoc indexes are nice to have because to my understanding maintaining indexes can get pretty expensive. Whats the likely hood of Ad Hoc Indexes being added to PostgreSql thanks
Justin wrote: > Now for my question It does not appear PostgreSQL does not have an Ad > Hoc Indexes ability where the Query planner will create an in memory > index based on the Select, Update, Insert or Delete commands. How is that supposed to work? In order to create an index you would need to visit all the rows in the table. If you do that, you could just as well answer the query off a sequential scan. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Feb 18, 2008, at 9:09 PM, Peter Eisentraut wrote:
Justin wrote:Now for my question It does not appear PostgreSQL does not have an AdHoc Indexes ability where the Query planner will create an in memoryindex based on the Select, Update, Insert or Delete commands.How is that supposed to work? In order to create an index you would need tovisit all the rows in the table. If you do that, you could just as wellanswer the query off a sequential scan.
this is not quite true.
this kind of indexing makes sense if you visit the same data over and over again. WITH-queries would be an example for that and self joins could benefit from the this feature too.
the question however is: why not create "normal indexes" straight away?
i am not sure if the benefit of ad-hoc indexes justify additional complexity in the code ...
best regards,
hans
--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
Justin <justin@emproshunts.com> writes: > Is there any plans in the future to add the ability for PostgreSQL to > create Ad Hoc indexes if it makes sense. No, I'm not aware of anyone contemplating such a thing. I can hardly imagine a situation where building an index for a single query is actually a win. Maybe those DBMSes you mention were using this as a substitute for having decent join mechanisms, or something? regards, tom lane
The idea of ad hoc indexes is speed up loop scans To prove my idea i created a sql file in PGAdmin that creates the indexes on the fly then runs the query then drops the indexs.
without the indexes it takes 18 to 19 seconds to run the query.
To create the index and do the query takes 400 milliseconds. I wish i could do an Explain on it with create index in the query but i can't it errors out. So i reran the query with indexes already in place it drops the query time 191 milliseconds.
Create and deleting the indexes on the fly improves performance almost 50 times. I think creating Ad Hoc indexes on the fly in memory makes sense . I imagine it would be even faster if the index stayed in memory
Tom Lane wrote:
without the indexes it takes 18 to 19 seconds to run the query.
To create the index and do the query takes 400 milliseconds. I wish i could do an Explain on it with create index in the query but i can't it errors out. So i reran the query with indexes already in place it drops the query time 191 milliseconds.
Create and deleting the indexes on the fly improves performance almost 50 times. I think creating Ad Hoc indexes on the fly in memory makes sense . I imagine it would be even faster if the index stayed in memory
Tom Lane wrote:
Justin <justin@emproshunts.com> writes:Is there any plans in the future to add the ability for PostgreSQL to create Ad Hoc indexes if it makes sense.No, I'm not aware of anyone contemplating such a thing. I can hardly imagine a situation where building an index for a single query is actually a win. Maybe those DBMSes you mention were using this as a substitute for having decent join mechanisms, or something? regards, tom lane
Attachment
Justin <justin@emproshunts.com> writes: > The idea of ad hoc indexes is speed up loop scans To prove my idea i > created a sql file in PGAdmin that creates the indexes on the fly then > runs the query then drops the indexs. > without the indexes it takes 18 to 19 seconds to run the query. > To create the index and do the query takes 400 milliseconds. The example you show doesn't convince me of much of anything, because the estimated rowcounts are so far off. I think you're basically dealing with an estimation failure and it's pure luck that the extra index fixes it. regards, tom lane
Hans-Juergen Schoenig wrote: > this kind of indexing makes sense if you visit the same data over and > over again. WITH-queries would be an example for that and self joins > could benefit from the this feature too. Yes, for joins it is useful. We have hash joins, which are kind of ad hoc hash indexes. If anyone wants to implement a btree join, that might work. :) But I think we essentially support what the OP is asking for, it is just under different labels. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Then why are the estimates so far off??? If estimates where correct would it improve the performance that much. <br /><br/> Vaccum is set to run automatically so the stats stay update. <br /><br /> Total record count for the tables for allthe tables put together is around 120,000 the query returns only 458 records which is correct. <br /><br /> If i am correctin my understanding the reason the index improved the query so much is the wooper table gets hit hard because it appearsin 3 separate nested queries . So taking only 458 records returned from the parent query times 3 for 1,375 tablescans going through 21,873 records for a total number records being processed to 30,075,375 on a table with no index. So if you look at it that way PostgreSql did remarkably well processing the query in 18 to 20 seconds. <br /><br/> The idea behind adhoc indexes is when one shot queries or really used queries are created that would require numerousindexes to run in a decent time can be run in a faction of the time. This also saves processing times across theentire system where creating indexes for the all the possible queries is impractical <br /><br /> This does not take awaythe need for index but speed up ad-hoc queries created from a website or other business analysis tool that someone mightcreate <br /><br /> Tom Lane wrote: <blockquote cite="mid:1552.1203370105@sss.pgh.pa.us" type="cite"><pre wrap="">Justin<a class="moz-txt-link-rfc2396E" href="mailto:justin@emproshunts.com"><justin@emproshunts.com></a> writes:</pre><blockquote type="cite"><pre wrap="">The idea of ad hoc indexes is speed up loop scans To prove my idea i created a sql file in PGAdmin that creates the indexes on the fly then runs the query then drops the indexs. </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><pre wrap=""> withoutthe indexes it takes 18 to 19 seconds to run the query. </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><prewrap="">To create the index and do the query takes 400 milliseconds. </pre></blockquote><pre wrap=""> The example you show doesn't convince me of much of anything, because the estimated rowcounts are so far off. I think you're basically dealing with an estimation failure and it's pure luck that the extra index fixes it. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? <a class="moz-txt-link-freetext" href="http://www.postgresql.org/docs/faq">http://www.postgresql.org/docs/faq</a></pre></blockquote>
oops dam spell checker really should be rarely sorry <br /><br /> Justin wrote: <blockquote cite="mid:47BA01D7.9010001@emproshunts.com"type="cite"></blockquote> Then why are the estimates so far off??? If estimateswhere correct would it improve the performance that much. <br /><br /> Vaccum is set to run automatically so thestats stay update. <br /><br /> Total record count for the tables for all the tables put together is around 120,000 thequery returns only 458 records which is correct. <br /><br /> If i am correct in my understanding the reason the indeximproved the query so much is the wooper table gets hit hard because it appears in 3 separate nested queries . So takingonly 458 records returned from the parent query times 3 for 1,375 table scans going through 21,873 records for a totalnumber records being processed to 30,075,375 on a table with no index. So if you look at it that way PostgreSql didremarkably well processing the query in 18 to 20 seconds. <br /><br /> The idea behind adhoc indexes is when one shotqueries or rarely used queries are created that would require numerous indexes to run in a decent time can be run ina faction of the time. This also saves processing times across the entire system where creating indexes for the all thepossible queries is impractical <br /><br /> This does not take away the need for index but speed up ad-hoc queries createdfrom a website or other business analysis tool that someone might create <br /><br /> Tom Lane wrote: <blockquotecite="mid:1552.1203370105@sss.pgh.pa.us" type="cite"><pre wrap="">Justin <a class="moz-txt-link-rfc2396E" href="mailto:justin@emproshunts.com"moz-do-not-send="true"><justin@emproshunts.com></a> writes: </pre><blockquote type="cite"><prewrap="">The idea of ad hoc indexes is speed up loop scans To prove my idea i created a sql file in PGAdmin that creates the indexes on the fly then runs the query then drops the indexs. </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><pre wrap=""> withoutthe indexes it takes 18 to 19 seconds to run the query. </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><prewrap="">To create the index and do the query takes 400 milliseconds. </pre></blockquote><pre wrap=""> The example you show doesn't convince me of much of anything, because the estimated rowcounts are so far off. I think you're basically dealing with an estimation failure and it's pure luck that the extra index fixes it. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? <a class="moz-txt-link-freetext" href="http://www.postgresql.org/docs/faq" moz-do-not-send="true">http://www.postgresql.org/docs/faq</a></pre></blockquote>
Justin wrote: > Then why are the estimates so far off? Perhaps a really high correlation between those records where coitem_status = 'O' and those records in your date range. Theplanner estimates 1 row when restricting by both restrictions, whereas there where 458. The 1 row was probably a resultof a small % for status='O' multiplied by a small % for the date range. > If estimates where correct would it improve the performance that much. Possibly, but a better performance gain might be obtained by rewriting the query, changing the case expression to somethingalong the lines of: coalesce((select wrkcnt_code || ' Operation Completed :' || cast(wooper_seqnumber as text) from wooper inner join wrkcnton wrkcnt_id = wooper_wrkcnt_idwhere wooper_rncomplete = true and wooper_wo_id = coitem_order_idorder by wooper_seqnumberdesc limit 1 ),'No Operation Completed') as LastFinshedOp regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachmentsis confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately,destroy it and do not copy, disclose or use it in any way. __________________________________________________________________ This email has been scanned by the DMZGlobal BusinessQuality Electronic Messaging Suite. Please see http://www.dmzglobal.com/services/bqem.htm for details. __________________________________________________________________
yes that improved the select statement allot from 18 to 20 to 6 seconds 3 fold improvement with no indexes. thanks
Now i tested your new and improved select statement with the indexes its query time was taking from 90 to 140 milliseconds compared to the original select statement of 94 to 120 milliseconds. I tested both select statements 5 times to get an average and they both hover around 110 to 120
I attached the results from the explain that used the indexes and one with no index
although in both case with and with out indexes the estimate still failed to return the correct number by allot.
Stephen Denne wrote:
Now i tested your new and improved select statement with the indexes its query time was taking from 90 to 140 milliseconds compared to the original select statement of 94 to 120 milliseconds. I tested both select statements 5 times to get an average and they both hover around 110 to 120
I attached the results from the explain that used the indexes and one with no index
although in both case with and with out indexes the estimate still failed to return the correct number by allot.
Stephen Denne wrote:
Justin wrote:Then why are the estimates so far off?Perhaps a really high correlation between those records where coitem_status = 'O' and those records in your date range. The planner estimates 1 row when restricting by both restrictions, whereas there where 458. The 1 row was probably a result of a small % for status='O' multiplied by a small % for the date range.If estimates where correct would it improve the performance that much.Possibly, but a better performance gain might be obtained by rewriting the query, changing the case expression to something along the lines of: coalesce((select wrkcnt_code || ' Operation Completed :' || cast(wooper_seqnumber as text) from wooper inner join wrkcnt on wrkcnt_id = wooper_wrkcnt_idwhere wooper_rncomplete = true and wooper_wo_id = coitem_order_idorder by wooper_seqnumber desc limit 1 ),'No Operation Completed') as LastFinshedOp regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way. __________________________________________________________________ This email has been scanned by the DMZGlobal Business Quality Electronic Messaging Suite. Please see http://www.dmzglobal.com/services/bqem.htm for details. __________________________________________________________________ ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Justin wrote: > although in both case with and with out indexes the estimate still failed to return the correct number by allot. The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being performedhundreds of times instead of the one time the planner estimated. The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there hadbeen any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate, itwouldn't have bothered creating any indexes on wooper. Regards, Stephen Denne. Disclaimer: At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachmentsis confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately,destroy it and do not copy, disclose or use it in any way. __________________________________________________________________ This email has been scanned by the DMZGlobal BusinessQuality Electronic Messaging Suite. Please see http://www.dmzglobal.com/services/bqem.htm for details. __________________________________________________________________
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being performedhundreds of times instead of the one time the planner estimated. > The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there hadbeen any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate, itwouldn't have bothered creating any indexes on wooper. Right. And even more to the point, if it had gotten the estimate right and known that the subquery would have been repeated, that would have (ideally) prompted it to shift to a different plan structure. As Peter pointed out upthread, the existing hash join logic seems to be a pretty decent facsimile of an "ad hoc index" --- in fact, the hash table *is* a hash index for all intents and purposes. If you tilt your head at the right angle, a merge join could be seen as comparable to constructing a couple of ad-hoc btree indexes. Plus the join code is not burdened by any overhead that actual index code would be likely to have, such as niggling worries about crash-safety or concurrent access to the index. So in my mind the issue here is not why don't we have ad hoc indexes, it's why the planner didn't choose a more appropriate join method. It's clear that faulty rowcount estimation was one part of the answer. Now it may also be that there are some outright structural limitations involved due to the "join" arising from a sub-select --- I'm not sure that the planner *can* generate a bulk join plan from a query expressed like this one. But the TODO items that this suggests to me do not include implementing anything I'd call an ad-hoc index. regards, tom lane
there are certainly problems with this idea. Figuring out the cost to either create an index or just do table scan wouldbe pain but not impossible. The hit to index a table in memory i don't think would be that bad compare to do 100'sto thousands of loops with table scans. <br /><br /> I see more options for the query planner in config file to handlead-hoc indexes<br /><br /> Now to Toms points. The point of Ad Hoc index is they're only alive for the period timethe query is running and only live in memory. Once the query completes they die horrible deaths<br /><br /> Thesetemporay indexes will not do a table lock or row lock ever it only needs to view the record to create this index. Sothe same problems that affects 100 table scans would affect create temp index. I think this would help reduce concurrencesense the table is not being scanned thousands of times<br /><br /> The idea of creating one time use indexeshas been around for long time and has showed to be a benefit when dealing with large data sets where queries is onetime or rarely used and its to much labor and cost to figure out how to make it faster. This would also reduce IO diskactivity allot if the table can't fit in memory but the index would fit because it relativity small in comparison. <br /><br /> Tom Lane wrote: <blockquote cite="mid:19155.1203389537@sss.pgh.pa.us" type="cite"><pre wrap="">"StephenDenne" <a class="moz-txt-link-rfc2396E" href="mailto:Stephen.Denne@datamail.co.nz"><Stephen.Denne@datamail.co.nz></a>writes: </pre><blockquote type="cite"><prewrap="">The improvement wasn't to the part of the query that had the bad cost estimate, it was to the partthat was being performed hundreds of times instead of the one time the planner estimated. </pre></blockquote><pre wrap=""></pre><blockquote type="cite"><pre wrap="">The planner still thought it was only going to perform a sequential scanof your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider creatingindexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper. </pre></blockquote><prewrap=""> Right. And even more to the point, if it had gotten the estimate right and known that the subquery would have been repeated, that would have (ideally) prompted it to shift to a different plan structure. As Peter pointed out upthread, the existing hash join logic seems to be a pretty decent facsimile of an "ad hoc index" --- in fact, the hash table *is* a hash index for all intents and purposes. If you tilt your head at the right angle, a merge join could be seen as comparable to constructing a couple of ad-hoc btree indexes. Plus the join code is not burdened by any overhead that actual index code would be likely to have, such as niggling worries about crash-safety or concurrent access to the index. So in my mind the issue here is not why don't we have ad hoc indexes, it's why the planner didn't choose a more appropriate join method. It's clear that faulty rowcount estimation was one part of the answer. Now it may also be that there are some outright structural limitations involved due to the "join" arising from a sub-select --- I'm not sure that the planner *can* generate a bulk join plan from a query expressed like this one. But the TODO items that this suggests to me do not include implementing anything I'd call an ad-hoc index. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings </pre></blockquote>