Thread: Ad Hoc Indexes

Ad Hoc Indexes

From

Justin

Date:

18 February 2008, 15:47:22

Question:  We have recently moved PostgreSql as we just migrated to 
OpenMfg.  i really like the database and PL/pgSQL its very similar to 
how FoxPro language layout so learning has been pretty easy.

Now for my question It does not appear PostgreSQL does not have an Ad 
Hoc Indexes ability where the Query planner will create an in memory 
index based on the Select, Update, Insert or Delete commands.  I got 
pretty lazy with MsSQL and Foxpro doing ad hoc indexes to speed up 
complex queries where i should have created an index but never did.

Is there any plans in the future to add the ability for PostgreSQL to 
create Ad Hoc indexes if it makes sense.    Example i was creating a new 
report thats presently only processing around 120,000 records in the 7 
tables for the query. But the query took between 18 to 19 seconds to 
run.  I analyzed the query and added 2 indexes and drop the processing 
time to 93 milliseconds.
This query will be run 3 to 5 times a day so it makes sense to create 
indexes

But with reports that run maybe once a week or once a month it does not 
make sense to create indexes.  this is where I think Add Hoc indexes are 
nice to have because to my understanding maintaining indexes can get 
pretty expensive. 

Whats the likely hood of Ad Hoc Indexes being added to PostgreSql

thanks

Re: Ad Hoc Indexes

From

Peter Eisentraut

Date:

18 February 2008, 16:09:41

Justin wrote:
> Now for my question It does not appear PostgreSQL does not have an Ad
> Hoc Indexes ability where the Query planner will create an in memory
> index based on the Select, Update, Insert or Delete commands.

How is that supposed to work?  In order to create an index you would need to 
visit all the rows in the table.  If you do that, you could just as well 
answer the query off a sequential scan.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Ad Hoc Indexes

From

Hans-Juergen Schoenig

Date:

18 February 2008, 16:16:48

On Feb 18, 2008, at 9:09 PM, Peter Eisentraut wrote:

Justin wrote:
Now for my question It does not appear PostgreSQL does not have an Ad
Hoc Indexes ability where the Query planner will create an in memory
index based on the Select, Update, Insert or Delete commands.

How is that supposed to work? In order to create an index you would need to
visit all the rows in the table. If you do that, you could just as well
answer the query off a sequential scan.

this is not quite true.

this kind of indexing makes sense if you visit the same data over and over again. WITH-queries would be an example for that and self joins could benefit from the this feature too.

the question however is: why not create "normal indexes" straight away?

i am not sure if the benefit of ad-hoc indexes justify additional complexity in the code ...

best regards,

hans

Cybertec Schönig & Schönig GmbH

PostgreSQL Solutions and Support

Gröhrmühlgasse 26, 2700 Wiener Neustadt

Tel: +43/1/205 10 35 / 340

www.postgresql.at, www.cybertec.at

Re: Ad Hoc Indexes

From

Tom Lane

Date:

18 February 2008, 16:18:34

Justin <justin@emproshunts.com> writes:
> Is there any plans in the future to add the ability for PostgreSQL to 
> create Ad Hoc indexes if it makes sense.

No, I'm not aware of anyone contemplating such a thing.  I can hardly
imagine a situation where building an index for a single query is
actually a win.  Maybe those DBMSes you mention were using this as a
substitute for having decent join mechanisms, or something?
        regards, tom lane

Re: Ad Hoc Indexes

From

Justin

Date:

18 February 2008, 17:12:05

The idea of ad hoc indexes is speed up loop scans To prove my idea i created a sql file in PGAdmin that creates the indexes on the fly then runs the query then drops the indexs.

without the indexes it takes 18 to 19 seconds to run the query.

To create the index and do the query takes 400 milliseconds. I wish i could do an Explain on it with create index in the query but i can't it errors out. So i reran the query with indexes already in place it drops the query time 191 milliseconds.

Create and deleting the indexes on the fly improves performance almost 50 times. I think creating Ad Hoc indexes on the fly in memory makes sense . I imagine it would be even faster if the index stayed in memory

Tom Lane wrote:

Justin <justin@emproshunts.com> writes:

Is there any plans in the future to add the ability for PostgreSQL to 
create Ad Hoc indexes if it makes sense.

No, I'm not aware of anyone contemplating such a thing.  I can hardly
imagine a situation where building an index for a single query is
actually a win.  Maybe those DBMSes you mention were using this as a
substitute for having decent join mechanisms, or something?
		regards, tom lane

Attachment

proof.sql

Re: Ad Hoc Indexes

From

Tom Lane

Date:

18 February 2008, 17:28:34

Justin <justin@emproshunts.com> writes:
> The idea of ad hoc indexes is speed up loop scans  To prove my idea i 
> created a sql file in PGAdmin that creates the indexes on the fly then 
> runs the query then drops the indexs.

>  without the indexes it takes 18 to 19 seconds to run the query.

> To create the index and do the query takes 400 milliseconds.

The example you show doesn't convince me of much of anything, because
the estimated rowcounts are so far off.  I think you're basically
dealing with an estimation failure and it's pure luck that the extra
index fixes it.
        regards, tom lane

Re: Ad Hoc Indexes

From

Peter Eisentraut

Date:

18 February 2008, 17:45:46

Hans-Juergen Schoenig wrote:
> this kind of indexing makes sense if you visit the same data over and  
> over again. WITH-queries would be an example for that and self joins  
> could benefit from the this feature too.

Yes, for joins it is useful.  We have hash joins, which are kind of ad hoc 
hash indexes.  If anyone wants to implement a btree join, that might work. :)  
But I think we essentially support what the OP is asking for, it is just 
under different labels.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Ad Hoc Indexes

From

Justin

Date:

18 February 2008, 18:12:04

Then why are the estimates so far off???  If estimates where correct would it improve the performance that much.  <br
/><br/> Vaccum is set to run automatically so the stats stay update. <br /><br /> Total record count for the tables for
allthe tables put together is around 120,000 the query returns only 458 records which is correct.  <br /><br /> If i am
correctin my understanding the reason the index improved the query so much is the wooper table gets hit hard because it
appearsin 3 separate nested queries . So taking only  458 records returned from the parent query times 3 for 1,375
tablescans going through 21,873 records for a total number records being processed to 30,075,375  on a table with no
index. So if you look at it that way PostgreSql did remarkably well processing the query in 18 to 20 seconds.    <br
/><br/> The idea behind adhoc indexes is when one shot queries or really used queries are created that would require
numerousindexes to run in a decent time can be run in a faction of the time.  This also saves processing times across
theentire system where creating indexes for the all the possible queries is impractical <br /><br /> This does not take
awaythe need for index but speed up  ad-hoc queries created from a website or other business analysis tool that someone
mightcreate <br /><br /> Tom Lane wrote: <blockquote cite="mid:1552.1203370105@sss.pgh.pa.us" type="cite"><pre
wrap="">Justin<a class="moz-txt-link-rfc2396E" href="mailto:justin@emproshunts.com"><justin@emproshunts.com></a>
writes:</pre><blockquote type="cite"><pre wrap="">The idea of ad hoc indexes is speed up loop scans  To prove my idea i

created a sql file in PGAdmin that creates the indexes on the fly then 
runs the query then drops the indexs.   </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><pre wrap="">
withoutthe indexes it takes 18 to 19 seconds to run the query.   </pre></blockquote><pre wrap=""> </pre><blockquote
type="cite"><prewrap="">To create the index and do the query takes 400 milliseconds.   </pre></blockquote><pre
wrap="">
The example you show doesn't convince me of much of anything, because
the estimated rowcounts are so far off.  I think you're basically
dealing with an estimation failure and it's pure luck that the extra
index fixes it.
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
              <a class="moz-txt-link-freetext"
href="http://www.postgresql.org/docs/faq">http://www.postgresql.org/docs/faq</a></pre></blockquote>

Re: Ad Hoc Indexes

From

Justin

Date:

18 February 2008, 18:17:28

oops dam spell checker really should be rarely sorry <br /><br /> Justin wrote: <blockquote
cite="mid:47BA01D7.9010001@emproshunts.com"type="cite"></blockquote> Then why are the estimates so far off???  If
estimateswhere correct would it improve the performance that much.  <br /><br /> Vaccum is set to run automatically so
thestats stay update. <br /><br /> Total record count for the tables for all the tables put together is around 120,000
thequery returns only 458 records which is correct.  <br /><br /> If i am correct in my understanding the reason the
indeximproved the query so much is the wooper table gets hit hard because it appears in 3 separate nested queries . So
takingonly  458 records returned from the parent query times 3 for 1,375 table scans going through 21,873 records for a
totalnumber records being processed to 30,075,375  on a table with no index.  So if you look at it that way PostgreSql
didremarkably well processing the query in 18 to 20 seconds.    <br /><br /> The idea behind adhoc indexes is when one
shotqueries or rarely used queries are created that would require numerous indexes to run in a decent time can be run
ina faction of the time.  This also saves processing times across the entire system where creating indexes for the all
thepossible queries is impractical <br /><br /> This does not take away the need for index but speed up  ad-hoc queries
createdfrom a website or other business analysis tool that someone might create <br /><br /> Tom Lane wrote:
<blockquotecite="mid:1552.1203370105@sss.pgh.pa.us" type="cite"><pre wrap="">Justin <a class="moz-txt-link-rfc2396E"
href="mailto:justin@emproshunts.com"moz-do-not-send="true"><justin@emproshunts.com></a> writes: </pre><blockquote
type="cite"><prewrap="">The idea of ad hoc indexes is speed up loop scans  To prove my idea i 
 
created a sql file in PGAdmin that creates the indexes on the fly then 
runs the query then drops the indexs.   </pre></blockquote><pre wrap=""> </pre><blockquote type="cite"><pre wrap="">
withoutthe indexes it takes 18 to 19 seconds to run the query.   </pre></blockquote><pre wrap=""> </pre><blockquote
type="cite"><prewrap="">To create the index and do the query takes 400 milliseconds.   </pre></blockquote><pre
wrap="">
The example you show doesn't convince me of much of anything, because
the estimated rowcounts are so far off.  I think you're basically
dealing with an estimation failure and it's pure luck that the extra
index fixes it.
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
              <a class="moz-txt-link-freetext" href="http://www.postgresql.org/docs/faq"
moz-do-not-send="true">http://www.postgresql.org/docs/faq</a></pre></blockquote>

Re: Ad Hoc Indexes

From

"Stephen Denne"

Date:

18 February 2008, 21:01:22

Justin wrote:
> Then why are the estimates so far off?

Perhaps a really high correlation between those records where coitem_status = 'O' and those records in your date range.
Theplanner estimates 1 row when restricting by both restrictions, whereas there where 458. The 1 row was probably a
resultof a small % for status='O' multiplied by a small % for the date range.

> If estimates where correct would it improve the performance that much.

Possibly, but a better performance gain might be obtained by rewriting the query, changing the case expression to
somethingalong the lines of:

coalesce((select wrkcnt_code || ' Operation Completed :' || cast(wooper_seqnumber as text) from wooper inner join
wrkcnton wrkcnt_id = wooper_wrkcnt_idwhere wooper_rncomplete = true and wooper_wo_id = coitem_order_idorder by
wooper_seqnumberdesc limit 1
),'No Operation Completed') as LastFinshedOp

regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege. If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________ This email has been scanned by the DMZGlobal
BusinessQuality Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

Re: Ad Hoc Indexes

From

Justin

Date:

18 February 2008, 21:34:21

yes that improved the select statement allot from 18 to 20 to 6 seconds 3 fold improvement with no indexes. thanks

Now i tested your new and improved select statement with the indexes its query time was taking from 90 to 140 milliseconds compared to the original select statement of 94 to 120 milliseconds. I tested both select statements 5 times to get an average and they both hover around 110 to 120

I attached the results from the explain that used the indexes and one with no index

although in both case with and with out indexes the estimate still failed to return the correct number by allot.

Stephen Denne wrote:

Justin wrote:

Then why are the estimates so far off?

Perhaps a really high correlation between those records where coitem_status = 'O' and those records in your date range. The planner estimates 1 row when restricting by both restrictions, whereas there where 458. The 1 row was probably a result of a small % for status='O' multiplied by a small % for the date range.

If estimates where correct would it improve the performance that much.

Possibly, but a better performance gain might be obtained by rewriting the query, changing the case expression to something along the lines of:

coalesce((select wrkcnt_code || ' Operation Completed :' || cast(wooper_seqnumber as text) from wooper inner join wrkcnt on wrkcnt_id = wooper_wrkcnt_idwhere wooper_rncomplete = true and wooper_wo_id = coitem_order_idorder by wooper_seqnumber desc limit 1
),'No Operation Completed') as LastFinshedOp

regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________ This email has been scanned by the DMZGlobal Business Quality              Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
              http://www.postgresql.org/docs/faq

Re: Ad Hoc Indexes

From

"Stephen Denne"

Date:

18 February 2008, 22:27:40

Justin wrote:
> although in both case with and with out indexes the estimate still failed to return the correct number by allot.

The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being
performedhundreds of times instead of the one time the planner estimated. 

The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there
hadbeen any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate,
itwouldn't have bothered creating any indexes on wooper. 

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 

__________________________________________________________________ This email has been scanned by the DMZGlobal
BusinessQuality              Electronic Messaging Suite. 
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

Re: Ad Hoc Indexes

From

Tom Lane

Date:

18 February 2008, 22:55:03

"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes:
> The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being
performedhundreds of times instead of the one time the planner estimated.

> The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there
hadbeen any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate,
itwouldn't have bothered creating any indexes on wooper.

Right. And even more to the point, if it had gotten the estimate right
and known that the subquery would have been repeated, that would have
(ideally) prompted it to shift to a different plan structure.

As Peter pointed out upthread, the existing hash join logic seems to be
a pretty decent facsimile of an "ad hoc index" --- in fact, the hash
table *is* a hash index for all intents and purposes. If you tilt your
head at the right angle, a merge join could be seen as comparable to
constructing a couple of ad-hoc btree indexes. Plus the join code is
not burdened by any overhead that actual index code would be likely to
have, such as niggling worries about crash-safety or concurrent access
to the index.

So in my mind the issue here is not why don't we have ad hoc indexes,
it's why the planner didn't choose a more appropriate join method.
It's clear that faulty rowcount estimation was one part of the answer.
Now it may also be that there are some outright structural limitations
involved due to the "join" arising from a sub-select --- I'm not sure
that the planner *can* generate a bulk join plan from a query expressed
like this one. But the TODO items that this suggests to me do not
include implementing anything I'd call an ad-hoc index.
regards, tom lane

Re: Ad Hoc Indexes

From

Justin

Date:

18 February 2008, 23:59:52

there are certainly problems with this idea.  Figuring out the cost to either create an index or just do table scan
wouldbe pain but not impossible.   The hit to index a table in memory i don't think would be that bad compare to do
100'sto thousands of loops with table scans. <br /><br /> I see more options for the query planner in config file to
handlead-hoc indexes<br /><br /> Now to Toms points.  The point of Ad Hoc index is they're only alive for the period
timethe  query is running and only live in memory.   Once the query completes they die horrible deaths<br /><br />
Thesetemporay indexes will not do a table lock or row lock ever it only needs to view the record to create this index. 
Sothe same problems that affects 100 table scans would affect create temp index. I think this would help reduce
concurrencesense the table is not being scanned thousands of times<br /><br /> The idea of creating one time use
indexeshas been around for long time and has showed to be a benefit when dealing with  large data sets where queries is
onetime or rarely used and its to much labor and cost to figure out how to make it faster.  This would also reduce IO
diskactivity allot if the table can't fit in memory but the index would fit because it relativity small in
comparison.  <br /><br /> Tom Lane wrote: <blockquote cite="mid:19155.1203389537@sss.pgh.pa.us" type="cite"><pre
wrap="">"StephenDenne" <a class="moz-txt-link-rfc2396E"
href="mailto:Stephen.Denne@datamail.co.nz"><Stephen.Denne@datamail.co.nz></a>writes: </pre><blockquote
type="cite"><prewrap="">The improvement wasn't to the part of the query that had the bad cost estimate, it was to the
partthat was being performed hundreds of times instead of the one time the planner estimated.   </pre></blockquote><pre
wrap=""></pre><blockquote type="cite"><pre wrap="">The planner still thought it was only going to perform a sequential
scanof your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider
creatingindexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper.
</pre></blockquote><prewrap="">
 
Right.  And even more to the point, if it had gotten the estimate right
and known that the subquery would have been repeated, that would have
(ideally) prompted it to shift to a different plan structure.

As Peter pointed out upthread, the existing hash join logic seems to be
a pretty decent facsimile of an "ad hoc index" --- in fact, the hash
table *is* a hash index for all intents and purposes.  If you tilt your
head at the right angle, a merge join could be seen as comparable to
constructing a couple of ad-hoc btree indexes.  Plus the join code is
not burdened by any overhead that actual index code would be likely to
have, such as niggling worries about crash-safety or concurrent access
to the index.

So in my mind the issue here is not why don't we have ad hoc indexes,
it's why the planner didn't choose a more appropriate join method.
It's clear that faulty rowcount estimation was one part of the answer.
Now it may also be that there are some outright structural limitations
involved due to the "join" arising from a sub-select --- I'm not sure
that the planner *can* generate a bulk join plan from a query expressed
like this one.  But the TODO items that this suggests to me do not
include implementing anything I'd call an ad-hoc index.
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings </pre></blockquote>