Thread: Postgresql simple query performance question

Postgresql simple query performance question

From
SHARMILA JOTHIRAJAH
Date:
Hi
We are in the process of testing for migration of our database from Oracle to Postgresql.
I hava a simple query

Select count(*) from foo
This table has 29384048 rows and is indexed on foo_id

The tables are vacuumed and the explain plan for postgresql is

QUERY PLAN                                                                                   

------------------------------------------
 Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual time=68797.280..68797.280 rows=1 loops=1)                               
   ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0) (actual
time=0.232..60657.948 rows=29384048 loops=1)
 Total runtime: 68797.358 ms                                                                 
                                       

The explain plan for oracle is
                    
OPERATION            OBJECT                    ACCESS_PREDICATES     FILTER_PREDICATES    
 -------------------  ------------------------  --------------------  --------------------
 SELECT STATEMENT ()  (null)                    (null)                (null)               
  SORT (AGGREGATE)    (null)                    (null)                (null)               
   INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)   

Oracle uses index for count(*) query in this case
This query in   Oracle takes only 5 sec and in postgresql it takes 1 min 10sec          

The same query in oracle without the index and full table scan(like in postgresql) has the

explain plan like this and it takes 34 sec.

select /*+ full(foo1) */ count(*) from foo1

OPERATION                OBJECT              ACCESS_PREDICATES     FILTER_PREDICATES    
 -----------------------  ------------------  --------------------  --------------------
 SELECT STATEMENT ()      (null)              (null)                (null)               
  SORT (AGGREGATE)        (null)              (null)                (null)   
    TABLE ACCESS (FULL)   foo                 (null)                (null)


In short the query "Select count(*) from foo" takes the following time:
Postgresql - 1m 10 sec
Oracle(index scan) - 5 sec
Oracle (full table scan) - 34 sec

How can I speed up this query in postgresql ? The other postgres settings are

postgresql    
        
   max_connections = 100
   shared_buffers = 50000                 
   temp_buffers = 5000
   work_mem = 16384                         
   maintenance_work_mem = 262144  
   fsync = on                             
   wal_sync_method = fsync
   effective_cache_size = 300000   
   random_page_cost =      4               
   cpu_tuple_cost = 0.01               
   cpu_index_tuple_cost = 0.001         
   cpu_operator_cost = 0.0025  

Are there any tuning that need to be done in the OS  or database side? I had attached the iostat and vmstat results of postgresql

Thanks

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
Attachment

Re: Postgresql simple query performance question

From
"Pavel Stehule"
Date:
Hello

PostgreSQL doesn't use index for COUN(*)

http://www.varlena.com/GeneralBits/18.php
http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7

Regards
Pavel Stehule

On 06/11/2007, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
>
> Hi
> We are in the process of testing for migration of our database from Oracle
> to Postgresql.
> I hava a simple query
>
> Select count(*) from foo
> This table has 29384048 rows and is indexed on foo_id
>
> The tables are vacuumed and the explain plan for postgresql is
>
> QUERY PLAN
>
>
> ------------------------------------------
>  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> time=68797.280..68797.280 rows=1 loops=1)
>
>    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> (actual
> time=0.232..60657.948 rows=29384048 loops=1)
>  Total runtime: 68797.358 ms
>
>
>
> The explain plan for oracle is
>
> OPERATION            OBJECT                    ACCESS_PREDICATES
> FILTER_PREDICATES
>  -------------------  ------------------------  --------------------
> --------------------
>  SELECT STATEMENT ()  (null)                    (null)                (null)
>
>   SORT (AGGREGATE)    (null)                    (null)                (null)
>
>    INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)
>
> Oracle uses index for count(*) query in this case
> This query in   Oracle takes only 5 sec and in postgresql it takes 1 min
> 10sec
>
> The same query in oracle without the index and full table scan(like in
> postgresql) has the
>
> explain plan like this and it takes 34 sec.
>
> select /*+ full(foo1) */ count(*) from foo1
>
> OPERATION                OBJECT              ACCESS_PREDICATES
> FILTER_PREDICATES
>  -----------------------  ------------------  --------------------
> --------------------
>  SELECT STATEMENT ()      (null)              (null)                (null)
>
>   SORT (AGGREGATE)        (null)              (null)                (null)
>     TABLE ACCESS (FULL)   foo                 (null)                (null)
>
>
> In short the query "Select count(*) from foo" takes the following time:
> Postgresql - 1m 10 sec
> Oracle(index scan) - 5 sec
> Oracle (full table scan) - 34 sec
>
> How can I speed up this query in postgresql ? The other postgres settings
> are
>
> postgresql
>
>    max_connections = 100
>    shared_buffers = 50000
>    temp_buffers = 5000
>    work_mem = 16384
>    maintenance_work_mem = 262144
>    fsync = on
>    wal_sync_method = fsync
>    effective_cache_size = 300000
>    random_page_cost =      4
>    cpu_tuple_cost = 0.01
>    cpu_index_tuple_cost = 0.001
>    cpu_operator_cost = 0.0025
>
> Are there any tuning that need to be done in the OS  or database side? I had
> attached the iostat and vmstat results of postgresql
>
> Thanks
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>
>
>

Re: Postgresql simple query performance question

From
Reg Me Please
Date:
I have no doubt you're right, Pavel.
But why not?
It could be a simple enhacement.

Il Tuesday 06 November 2007 15:11:02 Pavel Stehule ha scritto:
> Hello
>
> PostgreSQL doesn't use index for COUN(*)
>
> http://www.varlena.com/GeneralBits/18.php
> http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7
>
> Regards
> Pavel Stehule
>
> On 06/11/2007, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> > Hi
> > We are in the process of testing for migration of our database from
> > Oracle to Postgresql.
> > I hava a simple query
> >
> > Select count(*) from foo
> > This table has 29384048 rows and is indexed on foo_id
> >
> > The tables are vacuumed and the explain plan for postgresql is
> >
> > QUERY PLAN
> >
> >
> > ------------------------------------------
> >  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> > time=68797.280..68797.280 rows=1 loops=1)
> >
> >    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> > (actual
> > time=0.232..60657.948 rows=29384048 loops=1)
> >  Total runtime: 68797.358 ms
> >
> >
> >
> > The explain plan for oracle is
> >
> > OPERATION            OBJECT                    ACCESS_PREDICATES
> > FILTER_PREDICATES
> >  -------------------  ------------------------  --------------------
> > --------------------
> >  SELECT STATEMENT ()  (null)                    (null)
> > (null)
> >
> >   SORT (AGGREGATE)    (null)                    (null)
> > (null)
> >
> >    INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)
> >
> > Oracle uses index for count(*) query in this case
> > This query in   Oracle takes only 5 sec and in postgresql it takes 1 min
> > 10sec
> >
> > The same query in oracle without the index and full table scan(like in
> > postgresql) has the
> >
> > explain plan like this and it takes 34 sec.
> >
> > select /*+ full(foo1) */ count(*) from foo1
> >
> > OPERATION                OBJECT              ACCESS_PREDICATES
> > FILTER_PREDICATES
> >  -----------------------  ------------------  --------------------
> > --------------------
> >  SELECT STATEMENT ()      (null)              (null)
> > (null)
> >
> >   SORT (AGGREGATE)        (null)              (null)
> > (null) TABLE ACCESS (FULL)   foo                 (null)
> > (null)
> >
> >
> > In short the query "Select count(*) from foo" takes the following time:
> > Postgresql - 1m 10 sec
> > Oracle(index scan) - 5 sec
> > Oracle (full table scan) - 34 sec
> >
> > How can I speed up this query in postgresql ? The other postgres settings
> > are
> >
> > postgresql
> >
> >    max_connections = 100
> >    shared_buffers = 50000
> >    temp_buffers = 5000
> >    work_mem = 16384
> >    maintenance_work_mem = 262144
> >    fsync = on
> >    wal_sync_method = fsync
> >    effective_cache_size = 300000
> >    random_page_cost =      4
> >    cpu_tuple_cost = 0.01
> >    cpu_index_tuple_cost = 0.001
> >    cpu_operator_cost = 0.0025
> >
> > Are there any tuning that need to be done in the OS  or database side? I
> > had attached the iostat and vmstat results of postgresql
> >
> > Thanks
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam? Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >        subscribe-nomail command to majordomo@postgresql.org so that your
> >        message can get through to the mailing list cleanly
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match



--
Reg me Please

Re: Postgresql simple query performance question

From
Bill Moran
Date:
In response to SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>:

> Hi
> We are in the process of testing for migration of our database from Oracle to Postgresql.
> I hava a simple query
>
> Select count(*) from foo

This is asked a lot.  The quick answer is that PostgreSQL method of MVCC
makes it impossible to make this query fast.  Perhaps, someday, some
brilliant developer will come up with an optimization, but that hasn't
happened yet.

There may be some tweaks you can make to your tuning, see inline below.
However, if you really need a fast, accurate count of rows in that
table, I recommend you create a trigger to track it.

> This table has 29384048 rows and is indexed on foo_id
>
> The tables are vacuumed and the explain plan for postgresql is
>
> QUERY PLAN
>
> ------------------------------------------
>  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual time=68797.280..68797.280 rows=1 loops=1)
                   
>    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0) (actual
> time=0.232..60657.948 rows=29384048 loops=1)
>  Total runtime: 68797.358 ms
>
>
> The explain plan for oracle is
>
> OPERATION            OBJECT                    ACCESS_PREDICATES     FILTER_PREDICATES
>  -------------------  ------------------------  --------------------  --------------------
>  SELECT STATEMENT ()  (null)                    (null)                (null)
>   SORT (AGGREGATE)    (null)                    (null)                (null)
>    INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)
>
> Oracle uses index for count(*) query in this case
> This query in   Oracle takes only 5 sec and in postgresql it takes 1 min 10sec
>
> The same query in oracle without the index and full table scan(like in postgresql) has the
>
> explain plan like this and it takes 34 sec.
>
> select /*+ full(foo1) */ count(*) from foo1
>
> OPERATION                OBJECT              ACCESS_PREDICATES     FILTER_PREDICATES
>  -----------------------  ------------------  --------------------  --------------------
>  SELECT STATEMENT ()      (null)              (null)                (null)
>   SORT (AGGREGATE)        (null)              (null)                (null)
>     TABLE ACCESS (FULL)   foo                 (null)                (null)
>
>
> In short the query "Select count(*) from foo" takes the following time:
> Postgresql - 1m 10 sec
> Oracle(index scan) - 5 sec
> Oracle (full table scan) - 34 sec
>
> How can I speed up this query in postgresql ? The other postgres settings are
>
> postgresql
>
>    max_connections = 100
>    shared_buffers = 50000

How much memory does this system have?  What version of PostgreSQL are you
using?  If you're using an 8.X version and have more 2G of RAM, this
value is likely too low.  Start with 1/4 the available RAM and tune from
there.

>    temp_buffers = 5000
>    work_mem = 16384
>    maintenance_work_mem = 262144
>    fsync = on
>    wal_sync_method = fsync
>    effective_cache_size = 300000
>    random_page_cost =      4
>    cpu_tuple_cost = 0.01
>    cpu_index_tuple_cost = 0.001
>    cpu_operator_cost = 0.0025
>
> Are there any tuning that need to be done in the OS  or database side? I had attached the iostat and vmstat results
ofpostgresql 



--
Bill Moran
http://www.potentialtech.com

Re: Postgresql simple query performance question

From
Bill Moran
Date:
In response to Reg Me Please <regmeplease@gmail.com>:

> I have no doubt you're right, Pavel.
> But why not?
> It could be a simple enhacement.

It's not simple.  Do some searches on the mailing lists and you will
find discussion of why it's difficult to do.

>
> Il Tuesday 06 November 2007 15:11:02 Pavel Stehule ha scritto:
> > Hello
> >
> > PostgreSQL doesn't use index for COUN(*)
> >
> > http://www.varlena.com/GeneralBits/18.php
> > http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7
> >
> > Regards
> > Pavel Stehule
> >
> > On 06/11/2007, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> > > Hi
> > > We are in the process of testing for migration of our database from
> > > Oracle to Postgresql.
> > > I hava a simple query
> > >
> > > Select count(*) from foo
> > > This table has 29384048 rows and is indexed on foo_id
> > >
> > > The tables are vacuumed and the explain plan for postgresql is
> > >
> > > QUERY PLAN
> > >
> > >
> > > ------------------------------------------
> > >  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> > > time=68797.280..68797.280 rows=1 loops=1)
> > >
> > >    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> > > (actual
> > > time=0.232..60657.948 rows=29384048 loops=1)
> > >  Total runtime: 68797.358 ms
> > >
> > >
> > >
> > > The explain plan for oracle is
> > >
> > > OPERATION            OBJECT                    ACCESS_PREDICATES
> > > FILTER_PREDICATES
> > >  -------------------  ------------------------  --------------------
> > > --------------------
> > >  SELECT STATEMENT ()  (null)                    (null)
> > > (null)
> > >
> > >   SORT (AGGREGATE)    (null)                    (null)
> > > (null)
> > >
> > >    INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)
> > >
> > > Oracle uses index for count(*) query in this case
> > > This query in   Oracle takes only 5 sec and in postgresql it takes 1 min
> > > 10sec
> > >
> > > The same query in oracle without the index and full table scan(like in
> > > postgresql) has the
> > >
> > > explain plan like this and it takes 34 sec.
> > >
> > > select /*+ full(foo1) */ count(*) from foo1
> > >
> > > OPERATION                OBJECT              ACCESS_PREDICATES
> > > FILTER_PREDICATES
> > >  -----------------------  ------------------  --------------------
> > > --------------------
> > >  SELECT STATEMENT ()      (null)              (null)
> > > (null)
> > >
> > >   SORT (AGGREGATE)        (null)              (null)
> > > (null) TABLE ACCESS (FULL)   foo                 (null)
> > > (null)
> > >
> > >
> > > In short the query "Select count(*) from foo" takes the following time:
> > > Postgresql - 1m 10 sec
> > > Oracle(index scan) - 5 sec
> > > Oracle (full table scan) - 34 sec
> > >
> > > How can I speed up this query in postgresql ? The other postgres settings
> > > are
> > >
> > > postgresql
> > >
> > >    max_connections = 100
> > >    shared_buffers = 50000
> > >    temp_buffers = 5000
> > >    work_mem = 16384
> > >    maintenance_work_mem = 262144
> > >    fsync = on
> > >    wal_sync_method = fsync
> > >    effective_cache_size = 300000
> > >    random_page_cost =      4
> > >    cpu_tuple_cost = 0.01
> > >    cpu_index_tuple_cost = 0.001
> > >    cpu_operator_cost = 0.0025
> > >
> > > Are there any tuning that need to be done in the OS  or database side? I
> > > had attached the iostat and vmstat results of postgresql
> > >
> > > Thanks
> > >
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam? Yahoo! Mail has the best spam protection around
> > > http://mail.yahoo.com
> > >
> > > ---------------------------(end of
> > > broadcast)---------------------------
> > > TIP 1: if posting/reading through Usenet, please send an appropriate
> > >        subscribe-nomail command to majordomo@postgresql.org so that your
> > >        message can get through to the mailing list cleanly
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> >        choose an index scan if your joining column's datatypes do not
> >        match
>
>
>
> --
> Reg me Please
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match


--
Bill Moran
http://www.potentialtech.com

Re: Postgresql simple query performance question

From
Reg Me Please
Date:
While I would not spend resources in fine tuning the count(*), I would
spend some to underastand why and how the other ones do it better.

Just to be better.

Il Tuesday 06 November 2007 15:29:34 Bill Moran ha scritto:
> In response to Reg Me Please <regmeplease@gmail.com>:
> > I have no doubt you're right, Pavel.
> > But why not?
> > It could be a simple enhacement.
>
> It's not simple.  Do some searches on the mailing lists and you will
> find discussion of why it's difficult to do.
>
> > Il Tuesday 06 November 2007 15:11:02 Pavel Stehule ha scritto:
> > > Hello
> > >
> > > PostgreSQL doesn't use index for COUN(*)
> > >
> > > http://www.varlena.com/GeneralBits/18.php
> > > http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7
> > >
> > > Regards
> > > Pavel Stehule
> > >
> > > On 06/11/2007, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> > > > Hi
> > > > We are in the process of testing for migration of our database from
> > > > Oracle to Postgresql.
> > > > I hava a simple query
> > > >
> > > > Select count(*) from foo
> > > > This table has 29384048 rows and is indexed on foo_id
> > > >
> > > > The tables are vacuumed and the explain plan for postgresql is
> > > >
> > > > QUERY PLAN
> > > >
> > > >
> > > > ------------------------------------------
> > > >  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> > > > time=68797.280..68797.280 rows=1 loops=1)
> > > >
> > > >    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> > > > (actual
> > > > time=0.232..60657.948 rows=29384048 loops=1)
> > > >  Total runtime: 68797.358 ms
> > > >
> > > >
> > > >
> > > > The explain plan for oracle is
> > > >
> > > > OPERATION            OBJECT                    ACCESS_PREDICATES
> > > > FILTER_PREDICATES
> > > >  -------------------  ------------------------  --------------------
> > > > --------------------
> > > >  SELECT STATEMENT ()  (null)                    (null)
> > > > (null)
> > > >
> > > >   SORT (AGGREGATE)    (null)                    (null)
> > > > (null)
> > > >
> > > >    INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)
> > > >
> > > > Oracle uses index for count(*) query in this case
> > > > This query in   Oracle takes only 5 sec and in postgresql it takes 1
> > > > min 10sec
> > > >
> > > > The same query in oracle without the index and full table scan(like
> > > > in postgresql) has the
> > > >
> > > > explain plan like this and it takes 34 sec.
> > > >
> > > > select /*+ full(foo1) */ count(*) from foo1
> > > >
> > > > OPERATION                OBJECT              ACCESS_PREDICATES
> > > > FILTER_PREDICATES
> > > >  -----------------------  ------------------  --------------------
> > > > --------------------
> > > >  SELECT STATEMENT ()      (null)              (null)
> > > > (null)
> > > >
> > > >   SORT (AGGREGATE)        (null)              (null)
> > > > (null) TABLE ACCESS (FULL)   foo                 (null)
> > > > (null)
> > > >
> > > >
> > > > In short the query "Select count(*) from foo" takes the following
> > > > time: Postgresql - 1m 10 sec
> > > > Oracle(index scan) - 5 sec
> > > > Oracle (full table scan) - 34 sec
> > > >
> > > > How can I speed up this query in postgresql ? The other postgres
> > > > settings are
> > > >
> > > > postgresql
> > > >
> > > >    max_connections = 100
> > > >    shared_buffers = 50000
> > > >    temp_buffers = 5000
> > > >    work_mem = 16384
> > > >    maintenance_work_mem = 262144
> > > >    fsync = on
> > > >    wal_sync_method = fsync
> > > >    effective_cache_size = 300000
> > > >    random_page_cost =      4
> > > >    cpu_tuple_cost = 0.01
> > > >    cpu_index_tuple_cost = 0.001
> > > >    cpu_operator_cost = 0.0025
> > > >
> > > > Are there any tuning that need to be done in the OS  or database
> > > > side? I had attached the iostat and vmstat results of postgresql
> > > >
> > > > Thanks
> > > >
> > > > __________________________________________________
> > > > Do You Yahoo!?
> > > > Tired of spam? Yahoo! Mail has the best spam protection around
> > > > http://mail.yahoo.com
> > > >
> > > > ---------------------------(end of
> > > > broadcast)---------------------------
> > > > TIP 1: if posting/reading through Usenet, please send an appropriate
> > > >        subscribe-nomail command to majordomo@postgresql.org so that
> > > > your message can get through to the mailing list cleanly
> > >
> > > ---------------------------(end of
> > > broadcast)--------------------------- TIP 9: In versions below 8.0, the
> > > planner will ignore your desire to choose an index scan if your joining
> > > column's datatypes do not match
> >
> > --
> > Reg me Please
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> >        choose an index scan if your joining column's datatypes do not
> >        match



--
Reg me Please

Re: Postgresql simple query performance question

From
SHARMILA JOTHIRAJAH
Date:
I understand that. But why is that when oracle is given a hint to do full table scan instead of using index to get the count, it is still faster than postgres when both has the same explain plan? Oracle takes 34 sec and postgres takes 1 m10 sec . Is there anything that can be done in postgresql for speeding this up?

Oracle --select /*+ full(foo1) */ count(*) from foo1
>
> OPERATION                OBJECT              ACCESS_PREDICATES
> FILTER_PREDICATES
>  -----------------------  ------------------  --------------------
>
>  SELECT STATEMENT ()      (null)              (null)                (null)
>
>  SORT (AGGREGATE)        (null)              (null)                (null)
>    TABLE ACCESS (FULL)  foo                (null)                (null)

postgresql --Select count(*) from foo
> This table has 29384048 rows and is indexed on foo_id
>
> The tables are vacuumed and the explain plan for postgresql is
>
> QUERY PLAN
>
>
> ------------------------------------------
>  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> time=68797.280..68797.280 rows=1 loops=1)
>
>    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> (actual time=0.232..60657.948 rows=29384048 loops=1)
>  Total runtime: 68797.358 ms

Thanks again
sharmila

----- Original Message ----
From: Pavel Stehule <pavel.stehule@gmail.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Tuesday, November 6, 2007 9:11:02 AM
Subject: Re: [GENERAL] Postgresql simple query performance question

Hello

PostgreSQL doesn't use index for COUN(*)

http://www.varlena.com/GeneralBits/18.php
http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7

Regards
Pavel Stehule

On 06/11/2007, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
>
> Hi
> We are in the process of testing for migration of our database from Oracle
> to Postgresql.
> I hava a simple query
>
> Select count(*) from foo
> This table has 29384048 rows and is indexed on foo_id
>
> The tables are vacuumed and the explain plan for postgresql is
>
> QUERY PLAN
>
>
> ------------------------------------------
>  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> time=68797.280..68797.280 rows=1 loops=1)
>
>    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> (actual
> time=0.232..60657.948 rows=29384048 loops=1)
>  Total runtime: 68797.358 ms
>
>
>
> The explain plan for oracle is
>
> OPERATION            OBJECT                    ACCESS_PREDICATES
> FILTER_PREDICATES
>  -------------------  ------------------------  --------------------
> --------------------
>  SELECT STATEMENT ()  (null)                    (null)                (null)
>
>  SORT (AGGREGATE)    (null)                    (null)                (null)
>
>    INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)
>
> Oracle uses index for count(*) query in this case
> This query in  Oracle takes only 5 sec and in postgresql it takes 1 min
> 10sec
>
> The same query in oracle without the index and full table scan(like in
> postgresql) has the
>
> explain plan like this and it takes 34 sec.
>
> select /*+ full(foo1) */ count(*) from foo1
>
> OPERATION                OBJECT              ACCESS_PREDICATES
> FILTER_PREDICATES
>  -----------------------  ------------------  --------------------
> --------------------
>  SELECT STATEMENT ()      (null)              (null)                (null)
>
>  SORT (AGGREGATE)        (null)              (null)                (null)
>    TABLE ACCESS (FULL)  foo                (null)                (null)
>
>
> In short the query "Select count(*) from foo" takes the following time:
> Postgresql - 1m 10 sec
> Oracle(index scan) - 5 sec
> Oracle (full table scan) - 34 sec
>
> How can I speed up this query in postgresql ? The other postgres settings
> are
>
> postgresql
>
>    max_connections = 100
>    shared_buffers = 50000
>    temp_buffers = 5000
>    work_mem = 16384
>    maintenance_work_mem = 262144
>    fsync = on
>    wal_sync_method = fsync
>    effective_cache_size = 300000
>    random_page_cost =      4
>    cpu_tuple_cost = 0.01
>    cpu_index_tuple_cost = 0.001
>    cpu_operator_cost = 0.0025
>
> Are there any tuning that need to be done in the OS  or database side? I had
> attached the iostat and vmstat results of postgresql
>
> Thanks
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>
>
>


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: Postgresql simple query performance question

From
Richard Huxton
Date:
Reg Me Please wrote:
> While I would not spend resources in fine tuning the count(*), I would
> spend some to underastand why and how the other ones do it better.
>
> Just to be better.

The problem is well understood, and there is extensive discussion in the
mailing lists archives. The basic problem is that with PG's
implementation of MVCC the indexes don't have row visibility
information. The simple solution of adding it to every index entry would
increase index size substantially imposing costs on every index access
and update.

There's a thread in -hackers called "Visibility map thoughts" that is
looking at the situation again and if/how to implement visibility
information in a compact form.

--
   Richard Huxton
   Archonet Ltd

Re: Postgresql simple query performance question

From
"Gokulakannan Somasundaram"
Date:
Hi,
    Oracle, eventhough is a timestamp based database, stores only one
version for each row in the table and the rest of the versions of the
same-row(which might have got created due to updates) in a separate
place called undo log. In postgres, all the versions are stored in the
table. So the table would be bigger than it is in Oracle. Try doing a
Vacuum Full and fire the query. You might save some seconds.
    I think we will definitely address this problem in 8.4.

Thanks,
Gokul.


On Nov 6, 2007 8:07 PM, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
>
> I understand that. But why is that when oracle is given a hint to do full
> table scan instead of using index to get the count, it is still faster than
> postgres when both has the same explain plan? Oracle takes 34 sec and
> postgres takes 1 m10 sec . Is there anything that can be done in postgresql
> for speeding this up?
>
> Oracle --select /*+ full(foo1) */ count(*) from foo1
>
> >
> > OPERATION                OBJECT              ACCESS_PREDICATES
> > FILTER_PREDICATES
> >  -----------------------  ------------------  --------------------
> >
> >  SELECT STATEMENT ()      (null)              (null)                (null)
> >
> >  SORT (AGGREGATE)        (null)              (null)                (null)
> >    TABLE ACCESS (FULL)  foo                (null)                (null)
>
> postgresql --Select count(*) from foo
>
> > This table has 29384048 rows and is indexed on foo_id
> >
> > The tables are vacuumed and the explain plan for postgresql is
> >
> > QUERY PLAN
> >
> >
> > ------------------------------------------
> >  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> > time=68797.280..68797.280 rows=1 loops=1)
> >
> >    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> > (actual time=0.232..60657.948 rows=29384048 loops=1)
> >  Total runtime: 68797.358 ms
>
> Thanks again
> sharmila
>
>
>
> ----- Original Message ----
> From: Pavel Stehule <pavel.stehule@gmail.com>
> To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Sent: Tuesday, November 6, 2007 9:11:02 AM
> Subject: Re: [GENERAL] Postgresql simple query performance question
>
> Hello
>
> PostgreSQL doesn't use index for COUN(*)
>
> http://www.varlena.com/GeneralBits/18.php
> http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7
>
> Regards
> Pavel Stehule
>
> On 06/11/2007, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> >
> > Hi
> > We are in the process of testing for migration of our database from Oracle
> > to Postgresql.
> > I hava a simple query
> >
> > Select count(*) from foo
> > This table has 29384048 rows and is indexed on foo_id
> >
> > The tables are vacuumed and the explain plan for postgresql is
> >
> > QUERY PLAN
> >
> >
> > ------------------------------------------
> >  Aggregate  (cost=1194020.60..1194020.61 rows=1 width=0) (actual
> > time=68797.280..68797.280 rows=1 loops=1)
> >
> >    ->  Seq Scan on foo  (cost=0.00..1120560.48 rows=29384048 width=0)
> > (actual
> > time=0.232..60657.948 rows=29384048 loops=1)
> >  Total runtime: 68797.358 ms
> >
> >
> >
> > The explain plan for oracle is
> >
> > OPERATION            OBJECT                    ACCESS_PREDICATES
> > FILTER_PREDICATES
> >  -------------------  ------------------------  --------------------
> > --------------------
> >  SELECT STATEMENT ()  (null)                    (null)
> (null)
> >
> >  SORT (AGGREGATE)    (null)                    (null)
> (null)
> >
> >    INDEX (FULL SCAN)  foo_IDX_ID  (null)        (null)
> >
> > Oracle uses index for count(*) query in this case
> > This query in  Oracle takes only 5 sec and in postgresql it takes 1 min
> > 10sec
> >
> > The same query in oracle without the index and full table scan(like in
> > postgresql) has the
> >
> > explain plan like this and it takes 34 sec.
> >
> > select /*+ full(foo1) */ count(*) from foo1
> >
> > OPERATION                OBJECT              ACCESS_PREDICATES
> > FILTER_PREDICATES
> >  -----------------------  ------------------  --------------------
> > --------------------
> >  SELECT STATEMENT ()      (null)              (null)                (null)
> >
> >  SORT (AGGREGATE)        (null)              (null)                (null)
> >    TABLE ACCESS (FULL)  foo                (null)                (null)
> >
> >
> > In short the query "Select count(*) from foo" takes the following time:
> > Postgresql - 1m 10 sec
> > Oracle(index scan) - 5 sec
> > Oracle (full table scan) - 34 sec
> >
> > How can I speed up this query in postgresql ? The other postgres settings
> > are
> >
> > postgresql
> >
> >    max_connections = 100
> >    shared_buffers = 50000
> >    temp_buffers = 5000
> >    work_mem = 16384
> >    maintenance_work_mem = 262144
> >    fsync = on
> >    wal_sync_method = fsync
> >    effective_cache_size = 300000
> >    random_page_cost =      4
> >    cpu_tuple_cost = 0.01
> >    cpu_index_tuple_cost = 0.001
> >    cpu_operator_cost = 0.0025
> >
> > Are there any tuning that need to be done in the OS  or database side? I
> had
> > attached the iostat and vmstat results of postgresql
> >
> > Thanks
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam? Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >        subscribe-nomail command to majordomo@postgresql.org so that your
> >        message can get through to the mailing list cleanly
> >
> >
> >
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

Re: Postgresql simple query performance question

From
André Volpato
Date:
Richard Huxton escreveu:
> Reg Me Please wrote:
>> While I would not spend resources in fine tuning the count(*), I would
>> spend some to underastand why and how the other ones do it better.
>>
>> Just to be better.
>
> The problem is well understood, and there is extensive discussion in
> the mailing lists archives. The basic problem is that with PG's
> implementation of MVCC the indexes don't have row visibility
> information. The simple solution of adding it to every index entry
> would increase index size substantially imposing costs on every index
> access and update.
>
> There's a thread in -hackers called "Visibility map thoughts" that is
> looking at the situation again and if/how to implement visibility
> information in a compact form.
>


Remember that you can always use serial fields to count a table, like:

alter table foo add id serial;
select id from foo order by id desc limit 1;

This should return the same value than count(*), in a few msecs.

--
ACV

Re: Postgresql simple query performance question

From
Bill Moran
Date:
In response to André Volpato <andre.volpato@ecomtecnologia.com.br>:

> Richard Huxton escreveu:
> > Reg Me Please wrote:
> >> While I would not spend resources in fine tuning the count(*), I would
> >> spend some to underastand why and how the other ones do it better.
> >>
> >> Just to be better.
> >
> > The problem is well understood, and there is extensive discussion in
> > the mailing lists archives. The basic problem is that with PG's
> > implementation of MVCC the indexes don't have row visibility
> > information. The simple solution of adding it to every index entry
> > would increase index size substantially imposing costs on every index
> > access and update.
> >
> > There's a thread in -hackers called "Visibility map thoughts" that is
> > looking at the situation again and if/how to implement visibility
> > information in a compact form.
>
>
> Remember that you can always use serial fields to count a table, like:
>
> alter table foo add id serial;
> select id from foo order by id desc limit 1;
>
> This should return the same value than count(*), in a few msecs.

I don't think so.  What kind of accuracy do you have when rows are
deleted?  Also, sequences are not transactional, so rolled-back
transactions will increment the sequence without actually adding
rows.

--
Bill Moran
http://www.potentialtech.com

Re: Postgresql simple query performance question

From
Gregory Stark
Date:
"SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com> writes:

> I understand that. But why is that when oracle is given a hint to do full
> table scan instead of using index to get the count, it is still faster than
> postgres when both has the same explain plan? Oracle takes 34 sec and
> postgres takes 1 m10 sec . Is there anything that can be done in postgresql
> for speeding this up?

How large are the actual respective data files?

What are the columns in these tables? Do you have many char() and NUMERIC
columns?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: Postgresql simple query performance question

From
André Volpato
Date:
Reid Thompson escreveu: <blockquote cite="mid:1194369123.15405.4.camel@raker.ateb.com" type="cite"><pre wrap="">On
Tue,2007-11-06 at 14:39 -0300, André Volpato wrote: 
 </pre><blockquote type="cite"><pre wrap="">Remember that you can always use serial fields to count a table, like:

alter table foo add id serial;
select id from foo order by id desc limit 1;

This should return the same value than count(*), in a few msecs.

--
ACV

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate      subscribe-nomail command to <a
class="moz-txt-link-abbreviated"href="mailto:majordomo@postgresql.org">majordomo@postgresql.org</a> so that your
messagecan get through to the mailing list cleanly   </pre></blockquote><pre wrap=""> 

not so...

test=# select version();

version

----------------------------------------------------------------------------------------------------------------PostgreSQL
8.2.4on i686-pc-linux-gnu, compiled by GCC 
i686-pc-linux-gnu-gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.1)
(1 row)

test=# create table serialcount(aval integer);
CREATE TABLE

test=# \timing
Timing is on.

test=# insert into serialcount values ( generate_series(1,10000000));
INSERT 0 10000000
Time: 42297.468 ms
test=# select count(*) from serialcount; count
----------10000000
(1 row)

Time: 6158.188 ms
test=# select count(*) from serialcount; count
----------10000000
(1 row)

Time: 2366.596 ms
test=# select count(*) from serialcount; count
----------10000000
(1 row)

Time: 2090.416 ms
test=# select count(*) from serialcount; count
----------10000000
(1 row)

Time: 2125.377 ms
test=# select count(*) from serialcount; count
----------10000000
(1 row)

Time: 2122.584 ms
test=# alter table serialcount add id serial;
NOTICE:  ALTER TABLE will create implicit sequence "serialcount_id_seq"
for serial column "serialcount.id"
ALTER TABLE
Time: 51733.139 ms
test=# select id from serialcount order by id desc limit 1;   id
----------10000000
(1 row)

Time: 41088.062 ms
test=# select id from serialcount order by id desc limit 1;   id
----------10000000
(1 row)

Time: 35638.317 ms
test=# vacuum analyze serialcount;
VACUUM
Time: 927.760 ms
test=# select id from serialcount order by id desc limit 1;   id
----------10000000
(1 row)

Time: 34281.178 ms </pre></blockquote><br /> I meant to select using an index. I´ve done the same tests here, and
realizedthat my server is two times slower than yours:<br /><br /> testeprog=# select version();<br />
                                                version<br />
---------------------------------------------------------------------------------------------------------<br/>
 PostgreSQL8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)<br /> (1
row)<br/><br /> testeprog=# select count(*) from test;<br />   count<br /> ----------<br />  10000000<br /> (1 row)<br
/><br/> Time: 4116.613 ms<br /><br /> testeprog=# alter table test add id serial;<br /> NOTICE:  ALTER TABLE will
createimplicit sequence "test_id_seq" for serial column "test.id"<br /> ALTER TABLE<br /> Time: 90617.195 ms<br /><br
/>testeprog=# select id from test order by id desc limit 1;<br />     id<br /> ----------<br />  10000000<br /> (1
row)<br/><br /> Time: 64856.553 ms<br /><br /> testeprog=# create unique index itest1 on test using btree (id);<br />
CREATEINDEX<br /> Time: 29026.891 ms<br /><br /><br /> testeprog=# explain analyze select id from test order by id desc
limit1;<br />                                                                QUERY PLAN<br />
----------------------------------------------------------------------------------------------------------------------------------------<br
/> Limit  (cost=0.00..0.02 rows=1 width=4) (actual time=0.017..0.019 rows=1 loops=1)<br />    ->  Index Scan
Backwardusing itest1 on test  (cost=0.00..185954.00 rows=10000000 width=4) (actual time=0.014..0.014 rows=1 loops=1)<br
/> Total runtime: 0.059 ms<br /> (3 rows)<br /><br /><br /><br /> @Bill:<br /> Bill Moran wrote <blockquote
cite="mid:20071106115927.2aea950e.wmoran@potentialtech.com"type="cite"><pre wrap=""> 
I don't think so.  What kind of accuracy do you have when rows are
deleted?  Also, sequences are not transactional, so rolled-back
transactions will increment the sequence without actually adding
rows. </pre></blockquote><br /> You are right, the serial hack should not work in most oltp cases.<br /><br /> --<br />
ACV<br/><br /> 

Re: Postgresql simple query performance question

From
Reg Me Please
Date:
Il Wednesday 07 November 2007 13:08:46 André Volpato ha scritto:
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> <html>
> <head>
>   <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
>   <title></title>
> </head>
> <body bgcolor="#ffffff" text="#000000">
> Reid Thompson escreveu:

Would it be possible to avoid the so-called "HTML email body"?

--
Reg me Please

Re: Postgresql simple query performance question

From
Simon Riggs
Date:
On Tue, 2007-11-06 at 09:29 -0500, Bill Moran wrote:
> In response to SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>:
>
> > Hi
> > We are in the process of testing for migration of our database from Oracle to Postgresql.
> > I hava a simple query
> >
> > Select count(*) from foo
>
> This is asked a lot.  The quick answer is that PostgreSQL method of MVCC
> makes it impossible to make this query fast.  Perhaps, someday, some
> brilliant developer will come up with an optimization, but that hasn't
> happened yet.

What release level is being tested? It may already have happened.

8.3 is substantially faster at seq scans, so the tests should be re-run
on 8.3 beta.

Also, re-run the Postgres test. It should be faster the second time,
even if the database server is restarted between tests.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


Re: Postgresql simple query performance question

From
SHARMILA JOTHIRAJAH
Date:
Hi,
The table has 43 columns. I have attached the columns-list.They have many char() and numeric columns.
For the table size, these are the corresponding entries from the pg_class
foo is the table and the others are some of its indexes.

relname                 reltuples     relpages
 foo                      2.9384E7      825699     
 foo_idx_pat         2.9384E7      684995     
 foo_idx_service   2.9384E7      433549     
 foo_idx_serv       2.9384E7      433435     
 foo_pk                 2.9384E7      109057     

Thanks
Sharmila

----- Original Message ----
From: Gregory Stark <stark@enterprisedb.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: Pavel Stehule <pavel.stehule@gmail.com>; pgsql-general@postgresql.org
Sent: Tuesday, November 6, 2007 8:03:48 PM
Subject: Re: [GENERAL] Postgresql simple query performance question

"SHARMILA JOTHIRAJAH" <sharmi_jo@yahoo.com> writes:

> I understand that. But why is that when oracle is given a hint to do full
> table scan instead of using index to get the count, it is still faster than
> postgres when both has the same explain plan? Oracle takes 34 sec and
> postgres takes 1 m10 sec . Is there anything that can be done in postgresql
> for speeding this up?

How large are the actual respective data files?

What are the columns in these tables? Do you have many char() and NUMERIC
columns?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
Attachment

Re: Postgresql simple query performance question

From
SHARMILA JOTHIRAJAH
Date:
Hi
we are testing with version  PostgreSQL 8.2.3. We already have a production system in Oracle and we wanted to migrate it to postgresql. If some tests are already done, are the results available for us to see?
Ill also check postgres 8.3 beta.
Thanks again
Sharmila

----- Original Message ----
From: Simon Riggs <simon@2ndquadrant.com>
To: Bill Moran <wmoran@potentialtech.com>
Cc: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>; pgsql-general@postgresql.org
Sent: Wednesday, November 7, 2007 6:34:26 AM
Subject: Re: [GENERAL] Postgresql simple query performance question

On Tue, 2007-11-06 at 09:29 -0500, Bill Moran wrote:
> In response to SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>:
>
> > Hi
> > We are in the process of testing for migration of our database from Oracle to Postgresql.
> > I hava a simple query
> >
> > Select count(*) from foo
>
> This is asked a lot.  The quick answer is that PostgreSQL method of MVCC
> makes it impossible to make this query fast.  Perhaps, someday, some
> brilliant developer will come up with an optimization, but that hasn't
> happened yet.

What release level is being tested? It may already have happened.

8.3 is substantially faster at seq scans, so the tests should be re-run
on 8.3 beta.

Also, re-run the Postgres test. It should be faster the second time,
even if the database server is restarted between tests.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: Postgresql simple query performance question

From
Reg Me Please
Date:
Il Wednesday 07 November 2007 13:47:26 SHARMILA JOTHIRAJAH ha scritto:
> Hi
> we are testing with version  PostgreSQL 8.2.3.

Why not using at least the current 8.2.5?

Read here
http://www.postgresql.org/docs/current/static/release.html

for details.

--
Reg me Please