Re: Ad Hoc Indexes - Mailing list pgsql-hackers
| From | Justin |
|---|---|
| Subject | Re: Ad Hoc Indexes |
| Date | |
| Msg-id | 47BA535D.7010601@emproshunts.com |
| In response to | Re: Ad Hoc Indexes (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
There are certainly problems with this idea. Figuring out the cost of either creating an index or just doing a table scan would be a pain, but not impossible. The hit of indexing a table in memory, I don't think, would be that bad compared to doing hundreds to thousands of loops with table scans.

I see more options for the query planner in the config file to handle ad-hoc indexes.

Now to Tom's points. The point of an ad-hoc index is that it is only alive for the period the query is running, and it lives only in memory. Once the query completes, it dies a horrible death.

These temporary indexes would never take a table lock or a row lock; they only need to read the records to build the index. So the same problems that affect 100 table scans would affect creating a temp index. I think this would help reduce contention, since the table is not being scanned thousands of times.

The idea of creating one-time-use indexes has been around for a long time and has shown a benefit when dealing with large data sets where queries are one-time or rarely run, and it is too much labor and cost to figure out how to make them faster. This would also reduce disk I/O a lot if the table can't fit in memory but the index can, because it is relatively small in comparison.

Tom Lane wrote:
> "Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
>> The improvement wasn't to the part of the query that had the bad cost
>> estimate, it was to the part that was being performed hundreds of times
>> instead of the one time the planner estimated.
>
>> The planner still thought it was only going to perform a sequential
>> scan of your wooper table once. So even if there had been any Ad Hoc
>> Index creation code that had been used to consider creating indexes as
>> part of a plan cost estimate, it wouldn't have bothered creating any
>> indexes on wooper.
>
> Right. And even more to the point, if it had gotten the estimate right
> and known that the subquery would have been repeated, that would have
> (ideally) prompted it to shift to a different plan structure. As Peter
> pointed out upthread, the existing hash join logic seems to be a pretty
> decent facsimile of an "ad hoc index" --- in fact, the hash table *is*
> a hash index for all intents and purposes. If you tilt your head at the
> right angle, a merge join could be seen as comparable to constructing a
> couple of ad-hoc btree indexes. Plus the join code is not burdened by
> any overhead that actual index code would be likely to have, such as
> niggling worries about crash-safety or concurrent access to the index.
>
> So in my mind the issue here is not why don't we have ad hoc indexes,
> it's why the planner didn't choose a more appropriate join method. It's
> clear that faulty rowcount estimation was one part of the answer. Now
> it may also be that there are some outright structural limitations
> involved due to the "join" arising from a sub-select --- I'm not sure
> that the planner *can* generate a bulk join plan from a query expressed
> like this one. But the TODO items that this suggests to me do not
> include implementing anything I'd call an ad-hoc index.
>
> regards, tom lane
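Tom's observation that the hash join's hash table effectively *is* an ad-hoc index can be sketched in a few lines. This is a toy illustration only, not PostgreSQL executor code; the function and key names are made up for the example:

```python
# Toy sketch: a nested-loop join rescans the inner relation once per outer
# row, while a hash join builds an in-memory hash table over the inner
# relation once and then probes it -- an "ad hoc index" that exists only
# for the duration of the query.

def nested_loop_join(outer, inner, key):
    # One full pass over `inner` per outer row: O(len(outer) * len(inner)).
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]

def hash_join(outer, inner, key):
    # Build phase: one pass over `inner`, O(len(inner)).
    table = {}
    for row in inner:
        table.setdefault(row[key], []).append(row)
    # Probe phase: one expected-O(1) lookup per outer row.
    return [(o, i) for o in outer for i in table.get(o[key], [])]

outer = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
inner = [{"id": 1, "w": "x"}, {"id": 1, "w": "y"}, {"id": 3, "w": "z"}]

# Both strategies produce the same join result; only the cost differs.
assert hash_join(outer, inner, "id") == nested_loop_join(outer, inner, "id")
```

The build-once-probe-many shape is exactly why a correct rowcount estimate matters: the planner only prefers paying the build cost when it knows the inner side will be probed many times.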