Thread: sql indexing suggestions needed

sql indexing suggestions needed

From

Jonathan Vanasco

Date:

20 March 2007, 15:54:51

i'm going crazy trying to optimize this select.

The table has ~25 columns, the select is based on 10.   There are
approx 5 million records in the table and growing.

No matter how I index + analyze this table, including making an index
of every related column on the search, pg keeps doing a sequential
scan and never includes an index -- which takes ~2minutes to do.  I
really need to cut this down.

    SELECT
        *
    FROM
        table_a
    WHERE
        ( bool_a = False )
        AND
        ( bool_b= False )
        AND
        ( int_c IS NOT NULL )
        AND
        ( int_c <= 10 )
        AND
        ( bool_d = False )
        AND
        ( bool_e= True )
        AND
        ( timestamp_f IS NULL )
        AND
        ( bool_g IS False )
        AND
        ( int_h= 1 )
        AND
        ( bool_i = False )
    ORDER BY
        id ASC
    LIMIT 100

can anyone suggest an indexing approach that might get pg to use the
indexes ?  this is driving me crazy.

thanks.

Re: sql indexing suggestions needed

From

Benjamin Arai

Date:

20 March 2007, 16:10:25

I wonder if converting all of your bools to a bit string and storing
that string as a number would make things faster?

Benjamin

On Mar 20, 2007, at 11:54 AM, Jonathan Vanasco wrote:

> i'm going crazy trying to optimize this select.
>
> The table has ~25 columns, the select is based on 10.   There are
> approx 5 million records in the table and growing.
>
> No matter how I index + analyze this table, including making an
> index of every related column on the search, pg keeps doing a
> sequential scan and never includes an index -- which takes
> ~2minutes to do.  I really need to cut this down.
>
>     SELECT
>         *
>     FROM
>         table_a
>     WHERE
>         ( bool_a = False )
>         AND
>         ( bool_b= False )
>         AND
>         ( int_c IS NOT NULL )
>         AND
>         ( int_c <= 10 )
>         AND
>         ( bool_d = False )
>         AND
>         ( bool_e= True )
>         AND
>         ( timestamp_f IS NULL )
>         AND
>         ( bool_g IS False )
>         AND
>         ( int_h= 1 )
>         AND
>         ( bool_i = False )
>     ORDER BY
>         id ASC
>     LIMIT 100
>
> can anyone suggest an indexing approach that might get pg to use
> the indexes ?  this is driving me crazy.
>
> thanks.
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org/
>

Re: sql indexing suggestions needed

From

Jonathan Vanasco

Date:

20 March 2007, 16:12:26

On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote:

> No matter how I index + analyze this table, including making an
> index of every related column on the search, pg keeps doing a
> sequential scan and never includes an index -- which takes
> ~2minutes to do.  I really need to cut this down.

a friend just chimed in off-list -- i had missed putting the 'id'
column in the large index of all the used columns , a common
mistake.  works like a charm now.

Re: sql indexing suggestions needed

From

Ron Johnson

Date:

20 March 2007, 16:24:50

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/20/07 13:54, Jonathan Vanasco wrote:
> i'm going crazy trying to optimize this select.
>
> The table has ~25 columns, the select is based on 10.   There are approx
> 5 million records in the table and growing.
>
> No matter how I index + analyze this table, including making an index of
> every related column on the search, pg keeps doing a sequential scan and
> never includes an index -- which takes ~2minutes to do.  I really need
> to cut this down.
>
>     SELECT
>         *
>     FROM
>         table_a
>     WHERE
>         ( bool_a = False )
>         AND
>         ( bool_b= False )
>         AND
>         ( int_c IS NOT NULL )

If it's less than 10, it can't be NULL.  No need for this predicate.

>         AND
>         ( int_c <= 10 )
>         AND
>         ( bool_d = False )
>         AND
>         ( bool_e= True )
>         AND
>         ( timestamp_f IS NULL )
>         AND
>         ( bool_g IS False )
>         AND
>         ( int_h= 1 )
>         AND
>         ( bool_i = False )
>     ORDER BY
>         id ASC
>     LIMIT 100
>
> can anyone suggest an indexing approach that might get pg to use the
> indexes ?  this is driving me crazy.

The problem is that the bool columns only have 2 values, and so it's
 more efficient to scan the whole table than to use indexes.

How many *distinct* values are there in int_c?  What percentage of
them match "int_c <= 10"?

Same questions, but for int_h.

If int_h is relatively unique, then this index might help:
    (INT_H, INT_C)

If that helps, try
    (INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I)

HTH.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGADT0S9HxQb37XmcRAubuAJ0do/zu0vkaw5XzVQyPeJnFB2cJtwCeMCna
cH3p6UGwqes8ZbAc5QfE1ok=
=pPl0
-----END PGP SIGNATURE-----

Re: sql indexing suggestions needed

From

Ron Johnson

Date:

20 March 2007, 16:26:58

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/20/07 14:11, Jonathan Vanasco wrote:
>
> On Mar 20, 2007, at 2:54 PM, Jonathan Vanasco wrote:
>
>> No matter how I index + analyze this table, including making an index
>> of every related column on the search, pg keeps doing a sequential
>> scan and never includes an index -- which takes ~2minutes to do.  I
>> really need to cut this down.
>
> a friend just chimed in off-list -- i had missed putting the 'id' column
> in the large index of all the used columns , a common mistake.  works
> like a charm now.

If the WHERE clause filters out most of the records, having ID in
the index shouldn't be needed.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGADVyS9HxQb37XmcRAjJZAKCX1VmuSVdS9VfK6mXrYNl7mVATjQCgpWef
MVFm2rH8+paCqnzyw9NyKZk=
=kbvB
-----END PGP SIGNATURE-----

Re: sql indexing suggestions needed

From

Jonathan Vanasco

Date:

20 March 2007, 23:01:39

On Mar 20, 2007, at 3:24 PM, Ron Johnson wrote:
> How many *distinct* values are there in int_c?  What percentage of
> them match "int_c <= 10"?

right now there are 14, and 80% match -- but next week that number
will be 20 , 60% ... and so on

> Same questions, but for int_h.
same answer

> If int_h is relatively unique, then this index might help:
>     (INT_H, INT_C)

If that helps, try
>     (INT_H, INT_C, BOOL_A, BOOL_B, BOOL_D, BOOL_E, BOOL_G, BOOL_I)

i'll try both, thanks.

the current 'super index' works - i'm more worried for speed right
now than disk space , so its worth physical overhead.

Re: sql indexing suggestions needed

From

"Angva"

Date:

21 March 2007, 13:30:52

On Mar 20, 2:54 pm, postg...@2xlp.com (Jonathan Vanasco) wrote:
> can anyone suggest an indexing approach that might get pg to use the
> indexes ?  this is driving me crazy.

Have you tried an expression-based index?
http://www.postgresql.org/docs/8.1/interactive/indexes-expressional.html

I'm assuming the right side of each equals sign is constant for you.
Basically, you could create an expression-based index on all these
conditions, which would evaluate to true or false for every row in
your table.

Caveat: If you have heavy updates/inserts this might be a bad idea.
Also, the planner may still decide to seq scan, based on your
configuration and resultset size.

Mark