> > the query above selects 294072 rows, which i obviously don't want to do,
but
> > I do want to use the clause above in a query that involves a join. eg
>
> Ah, I begin to understand. With an index scan you're going to get
> 294072 probes into the table (maybe even more, if there are deleted rows
> that match the category_key). If the rows are scattered all over the
> disk then that may actually take about 300k disk reads. After you
> cluster the table, the rows with the same category_key are all
> contiguous in the table, so many fewer blocks have to be read to visit
> them all. That's why clustering helps here.
Indeed, and this lead me to see if there was a way of getting the ids
without hitting the books table, but it doesn't look like there is :(
>
> Since you're selecting about 1/4th of the table, this particular query
> would probably be better off *not* using the index, but just doing a
> sequential scan of the whole table :-(. I assume most of your
> categories are more selective than this one, though, so dropping the
> category index entirely is probably not the answer.
>
> > select id from books, book_words where book_words.word='happy' and
> > book_words.id = books.id and books.category_key=1471;
>
> If this is what you're really doing, I think what you actually want is
> indexes on book_words.word and books.id.
I have indexes on both of these.
>That would allow book_words
> to be searched on the word (hopefully giving a more selective result
> than the category does), and then books would be probed using the id
index.
> id has unique values, right?
yep, but how can i use this subset to then select for category? perhaps at
this stage, I should start to do things in the application code?
>
> > perhaps (in the above query) there is a way of directing postgres to
only
> > access the books that are selected by the 'words' part of the query?
>
> You might want to look at contrib/fulltextindex in the distribution for
> ideas about indexing words. fulltextindex might be overkill for your
> needs, or maybe not, but you could probably adapt it for your purposes.
we have adapted this already (taking out the reg expression stuff so that it
is a bit quicker)
thanks for your help
timj
>
> regards, tom lane