Improving performance on multicolumn query

From: Jan Kesten
Hi, all!

I've been using postgresql for a long time now, but today I hit a
problem I couldn't solve properly - I hope some more experienced
users here have some hints for me.

First, I'm using postgresql 7.4.7 on a 2GHz machine with 1.5GByte of RAM,
and I have a table with about 220 columns and 20000 rows - the first
five columns form the primary key (and a unique index).

Now my problem: I need to run a great many queries that look up rows by
their primary key and fetch about five different columns, but these are
quite slow - about 10 queries per second, while some of my other
databases manage about 300 queries per second, so I think this is slow:

transfer=> explain analyse SELECT * FROM test WHERE test_a=9091150001
AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;

 Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
(actual time=0.161..0.167 rows=1 loops=1)
   Index Cond: (test_a = 9091150001::bigint)
   Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

So, what can I do to speed things up? If I understand this output
correctly, the planner uses my index (test_idx is the same as test_pkey,
created along with the table), but only for the first column.

Unfortunately, I can't refactor these tables, as they were given to me.

Thanks for any hint!
Jan



Re: Improving performance on multicolumn query

From: Richard Huxton
Jan Kesten wrote:
>
> First, I'm using postgresql 7.4.7 on a 2GHz machine with 1.5GByte of RAM,
> and I have a table with about 220 columns and 20000 rows - the first
> five columns form the primary key (and a unique index).

> transfer=> explain analyse SELECT * FROM test WHERE test_a=9091150001
> AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
>  Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
> (actual time=0.161..0.167 rows=1 loops=1)
>    Index Cond: (test_a = 9091150001::bigint)
>    Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

This says it's taking less than a millisecond - which is almost
certainly too fast to measure accurately anyway. Are you sure this query
is the problem?

> So, what can I do to speed things up? If I understand this output
> correctly, the planner uses my index (test_idx is the same as test_pkey,
> created along with the table), but only for the first column.

1. Are all of test_a/b/c/d/e bigint rather than int?
2. Have you tried explicitly casting your query parameters?
...WHERE test_a=123::bigint AND test_b=456::bigint...
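Applied to the query from the first mail, and assuming all five key
columns really are bigint, the full statement would look something like
this (a sketch - adjust to the actual column types):

```sql
-- Every comparison value cast explicitly to bigint, so the 7.4
-- planner can match all five columns of the composite index:
SELECT * FROM test
WHERE test_a = 9091150001::bigint
  AND test_b = 1::bigint
  AND test_c = 2::bigint
  AND test_d = 0::bigint
  AND test_e = 0::bigint;
```

If the casts help, EXPLAIN ANALYSE should then list all five columns in
the Index Cond line, with no separate Filter step.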

--
   Richard Huxton
   Archonet Ltd

Re: Improving performance on multicolumn query

From: "Steinar H. Gunderson"
On Wed, Nov 09, 2005 at 01:08:07PM +0100, Jan Kesten wrote:
> First, I'm using postgresql 7.4.7 on a 2GHz machine with 1.5GByte of RAM,
> and I have a table with about 220 columns and 20000 rows - the first
> five columns form the primary key (and a unique index).

I forgot this earlier, but it should be mentioned: a primary key already
acts as a unique index, so you don't need both.
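If test_idx really does duplicate test_pkey on the same five columns, as
Jan's mail suggests, the redundant index could simply be dropped (a
sketch - verify the definitions first, e.g. with \d test in psql):

```sql
-- test_pkey already enforces uniqueness and serves lookups,
-- so the duplicate index only costs space and write time:
DROP INDEX test_idx;
```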

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Improving performance on multicolumn query

From: Christopher Kings-Lynne
> transfer=> explain analyse SELECT * FROM test WHERE test_a=9091150001
> AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
>  Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
> (actual time=0.161..0.167 rows=1 loops=1)
>    Index Cond: (test_a = 9091150001::bigint)
>    Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))
>
> So, what can I do to speed things up? If I understand this output
> correctly, the planner uses my index (test_idx is the same as test_pkey,
> created along with the table), but only for the first column.

Hi Jan,

If you're using 7.4.x then the planner can't use the index for unquoted
bigints.  Try this:

SELECT * FROM test WHERE test_a='9091150001' AND test_b='1' AND
test_c='2' AND test_d='0' AND test_e='0';

Chris

Re: Improving performance on multicolumn query

From: "Steinar H. Gunderson"
On Wed, Nov 09, 2005 at 01:08:07PM +0100, Jan Kesten wrote:
> Now my problem: I need to run a great many queries that look up rows by
> their primary key and fetch about five different columns, but these are
> quite slow - about 10 queries per second, while some of my other
> databases manage about 300 queries per second, so I think this is slow:
>
> transfer=> explain analyse SELECT * FROM test WHERE test_a=9091150001
> AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
>  Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
> (actual time=0.161..0.167 rows=1 loops=1)
>    Index Cond: (test_a = 9091150001::bigint)
>    Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

You don't post your table definitions (please do), but it looks like test_b,
test_c, test_d and test_e might be bigints? If so, you may want to do
explicit "AND test_b=1::bigint AND test_c=2::bigint" etc. -- 7.4 doesn't
figure this out for you. (8.0 and higher does.)

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Improving performance on multicolumn query

From: Jan Kesten
Hi all!

First, thanks for all the answers so far :-)

> You don't post your table definitions (please do), but it looks like
> test_b, test_c, test_d and test_e might be bigints? If so, you may
> want to do explicit "AND test_b=1::bigint AND test_c=2::bigint" etc.
> -- 7.4 doesn't figure this out for you. (8.0 and higher does.)

I didn't post the table definitions, but you are all right: test_a to
test_e are all bigint. I use JDBC to connect to this database and a
prepared statement for the queries, setting all parameters with the
pst.setLong() method. Perhaps this could be the problem? I'll try
'normal' statements with explicit typecasting, because as far as I can
see, the query is the problem (postgresql takes more than 98% CPU while
running these statements) or the overhead it produces - but not the
network, which has only 1-2% load. Queries against other tables (not as
big - far fewer rows and columns) run quite fast with the same code.
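If the prepared statement has to stay, one possible workaround (a
sketch, not tested against 7.4) is to put the casts into the statement
text itself, so the values bound with pst.setLong() are still compared
as bigint when the driver interpolates them:

```sql
-- Prepared-statement text with explicit casts on the placeholders;
-- pst.setLong(1, ...) through pst.setLong(5, ...) fill them as before:
SELECT * FROM test
WHERE test_a = ?::bigint
  AND test_b = ?::bigint
  AND test_c = ?::bigint
  AND test_d = ?::bigint
  AND test_e = ?::bigint;
```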

So, thanks again - I'll try it and report back :-) It can't stay this
slow; I have some self-built databases with millions of rows and they
run very fast - but they don't use bigint ;-)

Cheers,
Jan