Re: Ad Hoc Indexes - Mailing list pgsql-hackers

From Justin
Subject Re: Ad Hoc Indexes
Date
Msg-id 47BA535D.7010601@emproshunts.com
Whole thread Raw
In response to Re: Ad Hoc Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
there are certainly problems with this idea.  Figuring out the cost to either create an index or just do table scan
wouldbe pain but not impossible.   The hit to index a table in memory i don't think would be that bad compare to do
100'sto thousands of loops with table scans. <br /><br /> I see more options for the query planner in config file to
handlead-hoc indexes<br /><br /> Now to Toms points.  The point of Ad Hoc index is they're only alive for the period
timethe  query is running and only live in memory.   Once the query completes they die horrible deaths<br /><br />
Thesetemporay indexes will not do a table lock or row lock ever it only needs to view the record to create this index. 
Sothe same problems that affects 100 table scans would affect create temp index. I think this would help reduce
concurrencesense the table is not being scanned thousands of times<br /><br /> The idea of creating one time use
indexeshas been around for long time and has showed to be a benefit when dealing with  large data sets where queries is
onetime or rarely used and its to much labor and cost to figure out how to make it faster.  This would also reduce IO
diskactivity allot if the table can't fit in memory but the index would fit because it relativity small in
comparison.  <br /><br /> Tom Lane wrote: <blockquote cite="mid:19155.1203389537@sss.pgh.pa.us" type="cite"><pre
wrap="">"StephenDenne" <a class="moz-txt-link-rfc2396E"
href="mailto:Stephen.Denne@datamail.co.nz"><Stephen.Denne@datamail.co.nz></a>writes: </pre><blockquote
type="cite"><prewrap="">The improvement wasn't to the part of the query that had the bad cost estimate, it was to the
partthat was being performed hundreds of times instead of the one time the planner estimated.   </pre></blockquote><pre
wrap=""></pre><blockquote type="cite"><pre wrap="">The planner still thought it was only going to perform a sequential
scanof your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider
creatingindexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper.
</pre></blockquote><prewrap="">
 
Right.  And even more to the point, if it had gotten the estimate right
and known that the subquery would have been repeated, that would have
(ideally) prompted it to shift to a different plan structure.

As Peter pointed out upthread, the existing hash join logic seems to be
a pretty decent facsimile of an "ad hoc index" --- in fact, the hash
table *is* a hash index for all intents and purposes.  If you tilt your
head at the right angle, a merge join could be seen as comparable to
constructing a couple of ad-hoc btree indexes.  Plus the join code is
not burdened by any overhead that actual index code would be likely to
have, such as niggling worries about crash-safety or concurrent access
to the index.

So in my mind the issue here is not why don't we have ad hoc indexes,
it's why the planner didn't choose a more appropriate join method.
It's clear that faulty rowcount estimation was one part of the answer.
Now it may also be that there are some outright structural limitations
involved due to the "join" arising from a sub-select --- I'm not sure
that the planner *can* generate a bulk join plan from a query expressed
like this one.  But the TODO items that this suggests to me do not
include implementing anything I'd call an ad-hoc index.
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings </pre></blockquote>

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Ad Hoc Indexes
Next
From: James Mansion
Date:
Subject: Re: wishlist for 8.4