Thread: optimizing Postgres queries

optimizing Postgres queries

From
David Teran
Date:
Hi,

we are new to Postgres and we are evaluating Postgres 7.4 on MacOS X as
an alternative to FrontBase 3.6.27.

 From the available features Postgres is the choice #1.

We have some tests to check the performance and FrontBase is about 10
times faster than Postgres. We already played around with explain
analyse select. It seems that for large tables Postgres does not use an
index. We often see the scan message in the query plan. Were can we
find more hints about tuning the performance? The database is about 350
MB large, without BLOB's. We tried to define every important index for
the selects but it seems that something still goes wrong: FrontBase
needs about 23 seconds for about 4300 selects and Postgres needs 4
minutes, 34 seconds.

Any clues?

regards David


Re: optimizing Postgres queries

From
Shridhar Daithankar
Date:
On Monday 05 January 2004 16:58, David Teran wrote:
> We have some tests to check the performance and FrontBase is about 10
> times faster than Postgres. We already played around with explain
> analyse select. It seems that for large tables Postgres does not use an
> index. We often see the scan message in the query plan. Were can we
> find more hints about tuning the performance? The database is about 350
> MB large, without BLOB's. We tried to define every important index for
> the selects but it seems that something still goes wrong: FrontBase
> needs about 23 seconds for about 4300 selects and Postgres needs 4
> minutes, 34 seconds.

Check
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Are you sure you are using correct data types on indexes?

e.g. if field1 is an int2 field, then following query would not use an index.

select * from table where field1=2;

However following will

select * from table where field1=2::int2;

It is called as typecasting and postgresql is rather strict about it when it
comes to making a decision of index usage.

I am sure above two tips could take care of some of the problems.

Such kind of query needs more specific information. Can you post explain
analyze output for queries and database schema.

 HTH

 Shridhar


Re: optimizing Postgres queries

From
David Teran
Date:
Hi Shridhar,

> Are you sure you are using correct data types on indexes?
>
Did not know about this...

> e.g. if field1 is an int2 field, then following query would not use an
> index.
>
our fk have the type bigint, when i try one simple select like this:

explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
t0.ID_FOREIGN_TABLE = 21110;

i see that no index is being used whereas when i use

explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
t0.ID_FOREIGN_TABLE = 21110::bigint;

an index is used. Very fine, the performance is about 10 to 100 times
faster for the single select.

I am using WebObjects with JDBC. I will now create a DB with integer
instead of bigint and see how this performs.

regards David


Re: optimizing Postgres queries

From
Shridhar Daithankar
Date:
On Monday 05 January 2004 17:35, David Teran wrote:
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
> t0.ID_FOREIGN_TABLE = 21110;
>
> i see that no index is being used whereas when i use
>
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
> t0.ID_FOREIGN_TABLE = 21110::bigint;
>
> an index is used. Very fine, the performance is about 10 to 100 times
> faster for the single select.
>
> I am using WebObjects with JDBC. I will now create a DB with integer
> instead of bigint and see how this performs.

The performance will likely to be the same. Its just that integer happens to
be default integer type and hence it does not need an explicit typecast. ( I
don't remember exactly which integer is default but it is either of int2,int4
and int8...:-))

The performance diffference is likely due to use of index, which is in turn
due to typecasting. If you need bigint, you should use them. Just remember to
typecast whenever required.

 Shridhar


Re: optimizing Postgres queries

From
David Teran
Date:
Hi,

> The performance will likely to be the same. Its just that integer
> happens to
> be default integer type and hence it does not need an explicit
> typecast. ( I
> don't remember exactly which integer is default but it is either of
> int2,int4
> and int8...:-))
>
The docs say int4 is much faster than int8, but i will check this.

> The performance diffference is likely due to use of index, which is in
> turn
> due to typecasting. If you need bigint, you should use them. Just
> remember to
> typecast whenever required.

This is my bigger problem: i am using EOF (OR mapping tool) which frees
me more or less form writing a lot of SQL. If i need to typecast to use
an index then i have to see how to do this with this framework.

Regards David


Re: optimizing Postgres queries

From
Shridhar Daithankar
Date:
On Monday 05 January 2004 17:48, David Teran wrote:
> Hi,
>
> > The performance will likely to be the same. Its just that integer
> > happens to
> > be default integer type and hence it does not need an explicit
> > typecast. ( I
> > don't remember exactly which integer is default but it is either of
> > int2,int4
> > and int8...:-))
>
> The docs say int4 is much faster than int8, but i will check this.

Well yes. That is correct as well.

What I (really) meant to say that an index scan to pick few in4 tuples
wouldn't be hell much faster than an index scan to pick same number of tuples
with int8 definition.

The initial boost you got from converting to index scan, would be probably
best you can beat out of it..

Of course if you are scanning a few million of them sequentially, then it is
different story.

> This is my bigger problem: i am using EOF (OR mapping tool) which frees
> me more or less form writing a lot of SQL. If i need to typecast to use
> an index then i have to see how to do this with this framework.

Well, you can direct your queries to a function rather than table, that would
cast the argument appropriately and select. Postgresql support function
overloading as well, in case you need different types of arguments with same
name.

Or you can write an instead rule on server side which will perform casting
before touching the table.

I am not sure of exact details it would take to make it work, but it should
work, at least in theory. That way you can preserve the efforts invested in
the mapping tool.

Of course, converting everything to integer might be a simpler option after
all..:-)


 Shridhar


Re: optimizing Postgres queries

From
Christopher Kings-Lynne
Date:
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
> t0.ID_FOREIGN_TABLE = 21110::bigint;
>
> an index is used. Very fine, the performance is about 10 to 100 times
> faster for the single select.

An alternative technique is to do this:

... t0.ID_FOREIGN_TABLE = '21110';

Chris

Re: optimizing Postgres queries

From
Tom Lane
Date:
David Teran <david.teran@cluster9.com> writes:
> This is my bigger problem: i am using EOF (OR mapping tool) which frees
> me more or less form writing a lot of SQL. If i need to typecast to use
> an index then i have to see how to do this with this framework.

It's worth pointing out that this problem is fixed (at long last) in
CVS tip.  Ypu probably shouldn't expend large amounts of effort on
working around a problem that will go away in 7.5.

If you don't anticipate going to production for six months or so, you
could adopt CVS tip as your development platform, with the expectation
that 7.5 will be released by the time you need a production system.
I wouldn't recommend running CVS tip as a production database but it
should be plenty stable enough for devel purposes.

Another plan would be to use int4 columns for the time being with the
intention of widening them to int8 when you move to 7.5.  This would
depend on how soon you anticipate needing values > 32 bits, of course.

            regards, tom lane

Re: optimizing Postgres queries

From
David Teran
Date:
Hi Tom,

> It's worth pointing out that this problem is fixed (at long last) in
> CVS tip.  Ypu probably shouldn't expend large amounts of effort on
> working around a problem that will go away in 7.5.
>
We have now changed the definition to integer, this will work for some
time. We are currently evaluating and have several production database
we might switch in some time.

What we found out now is that a query with a single 'where' works fine,
the query planer uses the index but when we have 'two' where clauses it
does not use the index anymore:

EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
(t0."ID_VALUE" = 14542); performs fine, less than one millisecond.

EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
(t0."ID_VALUE" = 14542 OR t0."ID_VALUE" = 14550); performs bad: about
235 milliseconds.

I tried to change the second one to use IN but this did not help at
all. Am i doing something wrong? I have an index defined like this:

CREATE INDEX key_value_meta_data__id_value__fk_index ON
"KEY_VALUE_META_DATA" USING btree ("ID_VALUE");

Regards David


Re: optimizing Postgres queries

From
Tom Lane
Date:
David Teran <david.teran@cluster9.com> writes:
> What we found out now is that a query with a single 'where' works fine,
> the query planer uses the index but when we have 'two' where clauses it
> does not use the index anymore:

> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
> (t0."ID_VALUE" = 14542); performs fine, less than one millisecond.

> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
> (t0."ID_VALUE" = 14542 OR t0."ID_VALUE" = 14550); performs bad: about
> 235 milliseconds.

Please, when you ask this sort of question, show the EXPLAIN ANALYZE
output.  It is not a virtue to provide minimal information and see if
anyone can guess what's happening.

            regards, tom lane

Re: optimizing Postgres queries

From
David Teran
Date:
Hi Tom,


> David Teran <david.teran@cluster9.com> writes:
>> What we found out now is that a query with a single 'where' works
>> fine,
>> the query planer uses the index but when we have 'two' where clauses
>> it
>> does not use the index anymore:
>
>> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
>> (t0."ID_VALUE" = 14542); performs fine, less than one millisecond.
>
>> EXPLAIN ANALYZE SELECT columns...  FROM "KEY_VALUE_META_DATA" t0 WHERE
>> (t0."ID_VALUE" = 14542 OR t0."ID_VALUE" = 14550); performs bad: about
>> 235 milliseconds.
>
> Please, when you ask this sort of question, show the EXPLAIN ANALYZE
> output.  It is not a virtue to provide minimal information and see if
> anyone can guess what's happening.
>
Sorry for that, i thought this is such a trivial question that the
answer is easy.

explain result from first query:

Index Scan using key_value_meta_data__id_value__fk_index on "KEY_VALUE_M
ETA_DATA" t0  (cost=0.00..1585.52 rows=467 width=1068) (actual time=0.42
4..0.493 rows=13 loops=1)

  Index Cond: ("ID_VALUE" = 21094)

Total runtime: 0.608 ms



explain result from second query:

Seq Scan on "KEY_VALUE_META_DATA" t0  (cost=0.00..2671.16 rows=931 width
=1068) (actual time=122.669..172.179 rows=25 loops=1)

  Filter: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))

Total runtime: 172.354 ms



I found out that its possible to disable seq scans with set
enable_seqscan to off; then the second query result looks like this:

Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta
_data__id_value__fk_index on "KEY_VALUE_META_DATA" t0  (cost=0.00..3173.
35 rows=931 width=1068) (actual time=0.116..0.578 rows=25 loops=1)

  Index Cond: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))

Total runtime: 0.716 ms


But i read in the docs that its not OK to turn this off by default. I
really wonder if this is my fault or not, from my point of view this is
such a simple select that the query plan should not result in a table
scan.

Regards David


Re: optimizing Postgres queries

From
Tom Lane
Date:
David Teran <david.teran@cluster9.com> writes:
> explain result from second query:

> Seq Scan on "KEY_VALUE_META_DATA" t0  (cost=0.00..2671.16 rows=931 width
> =1068) (actual time=122.669..172.179 rows=25 loops=1)
>   Filter: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))

The problem is evidently that the row estimate is so far off (931
estimate vs 25 actual).  Have you done ANALYZE or VACUUM ANALYZE
on this table recently?  If you have, I'd be interested to see the
pg_stats row for ID_VALUE.  It might be that you need to increase
the statistics target for this table.

            regards, tom lane

Re: optimizing Postgres queries

From
David Teran
Date:
Hi Tom,

first of all thanks for your help! I really appreciate your fast
response and if you ever have a question about WebObjects, just drop me
line ;-)

>> Seq Scan on "KEY_VALUE_META_DATA" t0  (cost=0.00..2671.16 rows=931
>> width
>> =1068) (actual time=122.669..172.179 rows=25 loops=1)
>>   Filter: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))
>
> The problem is evidently that the row estimate is so far off (931
> estimate vs 25 actual).  Have you done ANALYZE or VACUUM ANALYZE
> on this table recently?  If you have, I'd be interested to see the
> pg_stats row for ID_VALUE.  It might be that you need to increase
> the statistics target for this table.
>
I am absolutely new to PostgreSQL. OK, after VACUUM ANALYZE i get:

Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta
_data__id_value__fk_index on "KEY_VALUE_META_DATA" t0  (cost=0.00..19.94
 rows=14 width=75) (actual time=0.615..1.017 rows=25 loops=1)
  Index Cond: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))
Total runtime: 2.565 ms

and the second time i invoke this i get


Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta
_data__id_value__fk_index on "KEY_VALUE_META_DATA" t0  (cost=0.00..19.94
 rows=14 width=75) (actual time=0.112..0.296 rows=25 loops=1)
  Index Cond: (("ID_VALUE" = 21094) OR ("ID_VALUE" = 21103))
Total runtime: 0.429 ms

Much better. So i think i will first read more about this optimization
stuff and regular maintenance things. This is something i like very
much from FrontBase: no need for such things, simply start and run. But
other things were not so fine ;-).

Is there any hint where to start to understand more about this
optimization problem?

regards David



Re: optimizing Postgres queries

From
Bruce Momjian
Date:
David Teran wrote:
> Index?Scan?using?key_value_meta_data__id_value__fk_index,?key_value_meta
> _data__id_value__fk_index?on?"KEY_VALUE_META_DATA"?t0??(cost=0.00..19.94
> ?rows=14?width=75)?(actual?time=0.112..0.296?rows=25?loops=1)
> ??Index?Cond:?(("ID_VALUE"?=?21094)?OR?("ID_VALUE"?=?21103))
> Total runtime: 0.429 ms
>
> Much better. So i think i will first read more about this optimization
> stuff and regular maintenance things. This is something i like very
> much from FrontBase: no need for such things, simply start and run. But
> other things were not so fine ;-).
>
> Is there any hint where to start to understand more about this
> optimization problem?

Read the FAQ.  There is an item about slow queries and indexes.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: optimizing Postgres queries

From
David Teran
Date:
... wow:

executing a batch file with about 4250 selects, including lots of joins
other things PostgreSQL 7.4 is about 2 times faster than FrontBase
3.6.27. OK, we will start to make larger tests but this is quite
interesting already: we did not optimize a lot, just invoked VACUUM
ANALYZE and then the selects ;-)

Thanks to all who answered to this thread.

cheers David


Re: optimizing Postgres queries

From
Tom Lane
Date:
David Teran <david.teran@cluster9.com> writes:
> Much better. So i think i will first read more about this optimization
> stuff and regular maintenance things.

See http://www.postgresql.org/docs/7.4/static/maintenance.html

> Is there any hint where to start to understand more about this
> optimization problem?

http://www.postgresql.org/docs/7.4/static/performance-tips.html

            regards, tom lane

Re: optimizing Postgres queries

From
Rod Taylor
Date:
On Mon, 2004-01-05 at 14:57, David Teran wrote:
> ... wow:
>
> executing a batch file with about 4250 selects, including lots of joins
> other things PostgreSQL 7.4 is about 2 times faster than FrontBase
> 3.6.27. OK, we will start to make larger tests but this is quite
> interesting already: we did not optimize a lot, just invoked VACUUM
> ANALYZE and then the selects ;-)
>
> Thanks to all who answered to this thread.

I presume that batch file was executed linearly -- no parallelism?
You're actually testing one of PostgreSQL's shortcomings.

PostgreSQL (in my experience) does much better in such comparisons with
a parallel load -- multiple connections executing varied work (short
selects, complex selects, inserts, updates, deletes).

Anyway, just a tip that you will want to test your actual load. If you
do batch work with a single thread, what you have is fine. But if you
have a website with tens or hundreds of simultaneous connections then
your non-parallel testing will not reflect that work load.