Thread: Speed of joins using sparse indexes

Speed of joins using sparse indexes

From
Roberto Moreda
Date:
This mail tries to explain the solution that I've found to address the
problem of the joins that uses tables with very sparse indexes.

The exact problem was :

How can I manage the problem of select a few rows with a boolean atribute
when they are 5 rows with flag='Y' in a table of 100000 rows?
I't must to be an index, but
the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
but the interesting query is the other : to extract the 5 rows with
flag='Y' from whitin the 100000 rows with the flag='N'.

A possible solution to optimize this kind of query is to create an auxiliar
table with the id's of the 5 rows with flag='Y', maintained by rules watching
the attribute flag in the target table. In this manner, I never do a
update/insert in the flag table and I replace the "flag='Y'" in the query in
favour of "TABLE.id=FLAG_TABLE.id" (another join).

It's a kind of tell to Postgres  "Hey, I'm very interested in the rows with
flag='Y'" ... :) and the results in speed-up are amazing.

I think is better than "fake" a dense index to change the behaviour of the
optimizer.

Any suggestions?

    Roberto.

... sorry for my English ;)

--
Roberto Moreda
Resp. Dpto. Informática Handem/San Luis
Tlf +34 981 779000
Fax +34 981 779022
Pol. Piadela Sur, Autovía A6 Sal.567
15300 Betanzos (A Coruña) - España

Re: [GENERAL] Speed of joins using sparse indexes

From
Bruce Momjian
Date:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> This mail tries to explain the solution that I've found to address the
> problem of the joins that uses tables with very sparse indexes.
>
> The exact problem was :
>
> How can I manage the problem of select a few rows with a boolean atribute
> when they are 5 rows with flag='Y' in a table of 100000 rows?
> I't must to be an index, but
> the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
> if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
> but the interesting query is the other : to extract the 5 rows with
> flag='Y' from whitin the 100000 rows with the flag='N'.
>
> A possible solution to optimize this kind of query is to create an auxiliar
> table with the id's of the 5 rows with flag='Y', maintained by rules watching
> the attribute flag in the target table. In this manner, I never do a
> update/insert in the flag table and I replace the "flag='Y'" in the query in
> favour of "TABLE.id=FLAG_TABLE.id" (another join).
>
> It's a kind of tell to Postgres  "Hey, I'm very interested in the rows with
> flag='Y'" ... :) and the results in speed-up are amazing.

This sounds like a very good solution.  Rules can help.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Speed of joins using sparse indexes

From
Rex McMaster
Date:
have you tried a composite index with another, more distributed
attribute (like a timestamp)? - with boolean as first component.

  create index ix on tablename(boolean-attribute,timestamp-attribute);

------------------------------------------------------------
Rex McMaster                            rmcm@compsoft.com.au
                                   rex@mcmaster.wattle.id.au
     PGP Public key: http://www.compsoft.com.au/~rmcm/pgp-pk

Roberto Moreda writes:
 > This mail tries to explain the solution that I've found to address the
 > problem of the joins that uses tables with very sparse indexes.
 >
 > The exact problem was :
 >
 > How can I manage the problem of select a few rows with a boolean atribute
 > when they are 5 rows with flag='Y' in a table of 100000 rows?
 > I't must to be an index, but
 > the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know :
 > if I ask for the 100000 rows with flag='N' then Seq Scan is the solution,
 > but the interesting query is the other : to extract the 5 rows with
 > flag='Y' from whitin the 100000 rows with the flag='N'.
 >
 > A possible solution to optimize this kind of query is to create an auxiliar
 > table with the id's of the 5 rows with flag='Y', maintained by rules watching
 > the attribute flag in the target table. In this manner, I never do a
 > update/insert in the flag table and I replace the "flag='Y'" in the query in
 > favour of "TABLE.id=FLAG_TABLE.id" (another join).
 >
 > It's a kind of tell to Postgres  "Hey, I'm very interested in the rows with
 > flag='Y'" ... :) and the results in speed-up are amazing.
 >
 > I think is better than "fake" a dense index to change the behaviour of the
 > optimizer.
 >
 > Any suggestions?
 >
 >     Roberto.
 >
 > ... sorry for my English ;)
 >
 > --
 > Roberto Moreda
 > Resp. Dpto. Informática Handem/San Luis
 > Tlf +34 981 779000
 > Fax +34 981 779022
 > Pol. Piadela Sur, Autovía A6 Sal.567
 > 15300 Betanzos (A Coruña) - España

--