Thread: Postgres 9.0 has a bias against indexes

Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
I have a table EMP, with 14 rows and a description like this:
scott=> \d+ emp
                              Table "public.emp"
   Column  |            Type             | Modifiers | Storage  |
Description
----------+-----------------------------+-----------+----------+-------------
  empno    | smallint                    | not null  | plain    |
  ename    | character varying(10)       |           | extended |
  job      | character varying(9)        |           | extended |
  mgr      | smallint                    |           | plain    |
  hiredate | timestamp without time zone |           | plain    |
  sal      | double precision            |           | plain    |
  comm     | double precision            |           | plain    |
  deptno   | smallint                    |           | plain    |
Indexes:
     "emp_pkey" PRIMARY KEY, btree (empno)
     "emp_mgr_i" btree (mgr)
Foreign-key constraints:
     "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)
Has OIDs: no

scott=>

A recursive query doesn't use existing index on mgr:
scott=> explain analyze
with recursive e(empno,ename,mgr,bossname,level) as (
select empno,ename,mgr,NULL::varchar,0 from emp where empno=7839
union
select emp.empno,emp.ename,emp.mgr,e.ename,e.level+1
from emp,e
where emp.mgr=e.empno)
select * from e;
                                                          QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
  CTE Scan on e  (cost=20.59..23.21 rows=131 width=78) (actual
time=0.020..0.143 rows=14 loops=1)
    CTE e
      ->  Recursive Union  (cost=0.00..20.59 rows=131 width=52) (actual
time=0.018..0.128 rows=14 loops=1)
            ->  Seq Scan on emp  (cost=0.00..1.18 rows=1 width=10)
(actual time=0.013..0.015 rows=1 loops=1)
                  Filter: (empno = 7839)
            ->  Hash Join  (cost=0.33..1.68 rows=13 width=52) (actual
time=0.016..0.021 rows=3 loops=4)
                  Hash Cond: (public.emp.mgr = e.empno)
                  ->  Seq Scan on emp  (cost=0.00..1.14 rows=14
width=10) (actual time=0.001..0.004 rows=14 loops=4)
                  ->  Hash  (cost=0.20..0.20 rows=10 width=44) (actual
time=0.004..0.004 rows=4 loops=4)
                        Buckets: 1024  Batches: 1  Memory Usage: 1kB
                        ->  WorkTable Scan on e  (cost=0.00..0.20
rows=10 width=44) (actual time=0.001..0.002 rows=4 loops=4)
  Total runtime: 0.218 ms
(12 rows)

scott=>

The optimizer will not use index, not even when I turn off both hash and
merge joins. This is not particularly important for a table with 14
rows, but for a larger table, this is a problem. The
only way to actually force the use of index is by disabling seqscan, but
that chooses a wrong path
again, because it reads the "outer" table by primary key, which will be
very slow. Full table scan,
done by the primary key is probably the slowest thing around. I know
about the PostgreSQL philosophy
which says "hints are bad", and I deeply disagree with it, but would it
be possible to have at
least one parameter that would change calculations in such a way that
indexes are favored, where they exist?

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
Kenneth Marshall
Date:
On Thu, Jan 27, 2011 at 10:41:08AM -0500, Mladen Gogala wrote:
> I have a table EMP, with 14 rows and a description like this:
> scott=> \d+ emp
>                              Table "public.emp"
>   Column  |            Type             | Modifiers | Storage  |
> Description
> ----------+-----------------------------+-----------+----------+-------------
>  empno    | smallint                    | not null  | plain    |
>  ename    | character varying(10)       |           | extended |
>  job      | character varying(9)        |           | extended |
>  mgr      | smallint                    |           | plain    |
>  hiredate | timestamp without time zone |           | plain    |
>  sal      | double precision            |           | plain    |
>  comm     | double precision            |           | plain    |
>  deptno   | smallint                    |           | plain    |
> Indexes:
>     "emp_pkey" PRIMARY KEY, btree (empno)
>     "emp_mgr_i" btree (mgr)
> Foreign-key constraints:
>     "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)
> Has OIDs: no
>
> scott=>
>
> A recursive query doesn't use existing index on mgr:
> scott=> explain analyze
> with recursive e(empno,ename,mgr,bossname,level) as (
> select empno,ename,mgr,NULL::varchar,0 from emp where empno=7839
> union
> select emp.empno,emp.ename,emp.mgr,e.ename,e.level+1
> from emp,e
> where emp.mgr=e.empno)
> select * from e;
>                                                          QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------
>  CTE Scan on e  (cost=20.59..23.21 rows=131 width=78) (actual
> time=0.020..0.143 rows=14 loops=1)
>    CTE e
>      ->  Recursive Union  (cost=0.00..20.59 rows=131 width=52) (actual
> time=0.018..0.128 rows=14 loops=1)
>            ->  Seq Scan on emp  (cost=0.00..1.18 rows=1 width=10) (actual
> time=0.013..0.015 rows=1 loops=1)
>                  Filter: (empno = 7839)
>            ->  Hash Join  (cost=0.33..1.68 rows=13 width=52) (actual
> time=0.016..0.021 rows=3 loops=4)
>                  Hash Cond: (public.emp.mgr = e.empno)
>                  ->  Seq Scan on emp  (cost=0.00..1.14 rows=14 width=10)
> (actual time=0.001..0.004 rows=14 loops=4)
>                  ->  Hash  (cost=0.20..0.20 rows=10 width=44) (actual
> time=0.004..0.004 rows=4 loops=4)
>                        Buckets: 1024  Batches: 1  Memory Usage: 1kB
>                        ->  WorkTable Scan on e  (cost=0.00..0.20 rows=10
> width=44) (actual time=0.001..0.002 rows=4 loops=4)
>  Total runtime: 0.218 ms
> (12 rows)
>
> scott=>
>
> The optimizer will not use index, not even when I turn off both hash and
> merge joins. This is not particularly important for a table with 14 rows,
> but for a larger table, this is a problem. The
> only way to actually force the use of index is by disabling seqscan, but
> that chooses a wrong path
> again, because it reads the "outer" table by primary key, which will be
> very slow. Full table scan,
> done by the primary key is probably the slowest thing around. I know about
> the PostgreSQL philosophy
> which says "hints are bad", and I deeply disagree with it, but would it be
> possible to have at
> least one parameter that would change calculations in such a way that
> indexes are favored, where they exist?
>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com

Hi Mladen,

PostgreSQL will only use an index if the planner thinks that it
will be faster than the alternative, a sequential scan in this case.
For 14 rows, a sequential scan is 1 read and should actually be
faster than the index. Did you try the query using EXPLAIN ANALYZE
once with index and once without? What were the timings? If they
do not match reality, adjusting cost parameters would be in order.

Regards,
Ken

Re: Postgres 9.0 has a bias against indexes

From
"Kevin Grittner"
Date:
Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:

> The optimizer will not use index, not even when I turn off both
> hash and merge joins. This is not particularly important for a
> table with 14 rows, but for a larger table, this is a problem.

If it still does that with a larger table.  Do you have an example
of that?  Showing that it goes straight to the data page when the
table only has one, without first wasting time going through the
index page, doesn't prove that it won't use the index when it might
actually help -- much less point to the cause of the issue in the
larger table, which might lead to a solution.

-Kevin

Re: Postgres 9.0 has a bias against indexes

From
J Sisson
Date:
Odds are that a table of 14 rows will more likely be cached in RAM
than a table of 14 million rows.  PostgreSQL would certainly be more
"openminded" to using an index if chances are low that the table is
cached.  If the table *is* cached, though, what point would there be
in reading an index?

Also, if random_page_cost is set to default (4.0), the planner will
tend towards sequential scans.  You can drop this number a bit to
"help" the planner be more selective of indexes...and there's also
cpu_tuple_* settings that can be modified to pursuade the planner to
use indexes.

Doubtful that any prodding will force an index scan with a cached
table of 14 rows, though...

On 1/27/11, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
> I have a table EMP, with 14 rows and a description like this:
> scott=> \d+ emp
>                               Table "public.emp"
>    Column  |            Type             | Modifiers | Storage  |
> Description
> ----------+-----------------------------+-----------+----------+-------------
>   empno    | smallint                    | not null  | plain    |
>   ename    | character varying(10)       |           | extended |
>   job      | character varying(9)        |           | extended |
>   mgr      | smallint                    |           | plain    |
>   hiredate | timestamp without time zone |           | plain    |
>   sal      | double precision            |           | plain    |
>   comm     | double precision            |           | plain    |
>   deptno   | smallint                    |           | plain    |
> Indexes:
>      "emp_pkey" PRIMARY KEY, btree (empno)
>      "emp_mgr_i" btree (mgr)
> Foreign-key constraints:
>      "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)
> Has OIDs: no
>
> scott=>
>
> A recursive query doesn't use existing index on mgr:
> scott=> explain analyze
> with recursive e(empno,ename,mgr,bossname,level) as (
> select empno,ename,mgr,NULL::varchar,0 from emp where empno=7839
> union
> select emp.empno,emp.ename,emp.mgr,e.ename,e.level+1
> from emp,e
> where emp.mgr=e.empno)
> select * from e;
>                                                           QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------
>   CTE Scan on e  (cost=20.59..23.21 rows=131 width=78) (actual
> time=0.020..0.143 rows=14 loops=1)
>     CTE e
>       ->  Recursive Union  (cost=0.00..20.59 rows=131 width=52) (actual
> time=0.018..0.128 rows=14 loops=1)
>             ->  Seq Scan on emp  (cost=0.00..1.18 rows=1 width=10)
> (actual time=0.013..0.015 rows=1 loops=1)
>                   Filter: (empno = 7839)
>             ->  Hash Join  (cost=0.33..1.68 rows=13 width=52) (actual
> time=0.016..0.021 rows=3 loops=4)
>                   Hash Cond: (public.emp.mgr = e.empno)
>                   ->  Seq Scan on emp  (cost=0.00..1.14 rows=14
> width=10) (actual time=0.001..0.004 rows=14 loops=4)
>                   ->  Hash  (cost=0.20..0.20 rows=10 width=44) (actual
> time=0.004..0.004 rows=4 loops=4)
>                         Buckets: 1024  Batches: 1  Memory Usage: 1kB
>                         ->  WorkTable Scan on e  (cost=0.00..0.20
> rows=10 width=44) (actual time=0.001..0.002 rows=4 loops=4)
>   Total runtime: 0.218 ms
> (12 rows)
>
> scott=>
>
> The optimizer will not use index, not even when I turn off both hash and
> merge joins. This is not particularly important for a table with 14
> rows, but for a larger table, this is a problem. The
> only way to actually force the use of index is by disabling seqscan, but
> that chooses a wrong path
> again, because it reads the "outer" table by primary key, which will be
> very slow. Full table scan,
> done by the primary key is probably the slowest thing around. I know
> about the PostgreSQL philosophy
> which says "hints are bad", and I deeply disagree with it, but would it
> be possible to have at
> least one parameter that would change calculations in such a way that
> indexes are favored, where they exist?
>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


--
Computers are like air conditioners...
They quit working when you open Windows.

Re: Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
On 1/27/2011 10:45 AM, Kenneth Marshall wrote:
> PostgreSQL will only use an index if the planner thinks that it
> will be faster than the alternative, a sequential scan in this case.
> For 14 rows, a sequential scan is 1 read and should actually be
> faster than the index. Did you try the query using EXPLAIN ANALYZE
> once with index and once without? What were the timings? If they
> do not match reality, adjusting cost parameters would be in order.
>
I did. I even tried with an almost equivalent outer join:

  explain analyze select e1.empno,e1.ename,e2.empno,e2.ename
from emp e1 left outer join emp e2 on (e1.mgr=e2.empno);
                                                   QUERY PLAN

--------------------------------------------------------------------------------
------------------------------
  Nested Loop Left Join  (cost=0.00..7.25 rows=14 width=16) (actual
time=0.028..0
.105 rows=14 loops=1)
    Join Filter: (e1.mgr = e2.empno)
    ->  Seq Scan on emp e1  (cost=0.00..2.14 rows=14 width=10) (actual
time=0.006
..0.010 rows=14 loops=1)
    ->  Materialize  (cost=0.00..2.21 rows=14 width=8) (actual
time=0.001..0.003
rows=14 loops=14)
          ->  Seq Scan on emp e2  (cost=0.00..2.14 rows=14 width=8)
(actual time=
0.001..0.005 rows=14 loops=1)
  Total runtime: 0.142 ms
(6 rows)

This gives me the same result as the recursive version, minus the level
column. I am porting an application from Oracle, there is a fairly large
table that is accessed by "connect by". Rewriting it as a recursive join
is not a problem, but the optimizer doesn't really use the indexes.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
On 1/27/2011 10:51 AM, J Sisson wrote:
> Also, if random_page_cost is set to default (4.0), the planner will
> tend towards sequential scans.
scott=> show random_page_cost;
  random_page_cost
------------------
  1
(1 row)

scott=> show seq_page_cost;
  seq_page_cost
---------------
  2
(1 row)


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
David Wilson
Date:


On Thu, Jan 27, 2011 at 10:56 AM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
I even tried with an almost equivalent outer join:

 explain analyze select e1.empno,e1.ename,e2.empno,e2.ename
from emp e1 left outer join emp e2 on (e1.mgr=e2.empno);
                                                 QUERY PLAN

--------------------------------------------------------------------------------
------------------------------
 Nested Loop Left Join  (cost=0.00..7.25 rows=14 width=16) (actual time=0.028..0
.105 rows=14 loops=1)
  Join Filter: (e1.mgr = e2.empno)
  ->  Seq Scan on emp e1  (cost=0.00..2.14 rows=14 width=10) (actual time=0.006
..0.010 rows=14 loops=1)
  ->  Materialize  (cost=0.00..2.21 rows=14 width=8) (actual time=0.001..0.003
rows=14 loops=14)
        ->  Seq Scan on emp e2  (cost=0.00..2.14 rows=14 width=8) (actual time=
0.001..0.005 rows=14 loops=1)
 Total runtime: 0.142 ms
(6 rows)

This gives me the same result as the recursive version, minus the level column. I am porting an application from Oracle, there is a fairly large table that is accessed by "connect by". Rewriting it as a recursive join is not a problem, but the optimizer doesn't really use the indexes.


You're still using a 14 row table, though. Postgres isn't going to be stupid enough to use an index in this case when the seq scan is clearly faster unless you go out of your way to absolutely force it to do so. If the table is going to be "fairly large", that's the size you need to be testing and tuning with.

--
- David T. Wilson
david.t.wilson@gmail.com

Re: Postgres 9.0 has a bias against indexes

From
Tom Lane
Date:
David Wilson <david.t.wilson@gmail.com> writes:
> You're still using a 14 row table, though.

Exactly.  Please note what it says in the fine manual:

    It is worth noting that EXPLAIN results should not be extrapolated
    to situations other than the one you are actually testing; for
    example, results on a toy-sized table cannot be assumed to apply to
    large tables. The planner's cost estimates are not linear and so it
    might choose a different plan for a larger or smaller table. An
    extreme example is that on a table that only occupies one disk page,
    you'll nearly always get a sequential scan plan whether indexes are
    available or not. The planner realizes that it's going to take one
    disk page read to process the table in any case, so there's no value
    in expending additional page reads to look at an index.


            regards, tom lane

Re: Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
On 1/27/2011 11:40 AM, Tom Lane wrote:
> It is worth noting that EXPLAIN results should not be extrapolated
>      to situations other than the one you are actually testing; for
>      example, results on a toy-sized table cannot be assumed to apply to
>      large tables.
Well, that's precisely what I tried. Bummer, I will have to copy a large
table over.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
"Igor Neyman"
Date:

> -----Original Message-----
> From: Mladen Gogala [mailto:mladen.gogala@vmsinfo.com]
> Sent: Thursday, January 27, 2011 12:00 PM
> To: Tom Lane
> Cc: David Wilson; Kenneth Marshall; pgsql-performance@postgresql.org
> Subject: Re: Postgres 9.0 has a bias against indexes
>
> On 1/27/2011 11:40 AM, Tom Lane wrote:
> > It is worth noting that EXPLAIN results should not be extrapolated
> >      to situations other than the one you are actually testing; for
> >      example, results on a toy-sized table cannot be
> assumed to apply to
> >      large tables.
> Well, that's precisely what I tried. Bummer, I will have to
> copy a large table over.
>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>
>

Mladen,

I don't think, this is exclusive Postgres feature.
I'm pretty sure, Oracle optimizer will do "TABLE ACCESS (FULL)" instead
of using index on 14-row table either.

Regards,
Igor Neyman

Re: Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
On 1/27/2011 3:10 PM, Igor Neyman wrote:
>
> Mladen,
>
> I don't think, this is exclusive Postgres feature.
> I'm pretty sure, Oracle optimizer will do "TABLE ACCESS (FULL)" instead
> of using index on 14-row table either.
>
> Regards,
> Igor Neyman

Well, lets' see:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0    Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

Elapsed: 00:00:00.00
SQL> set autotrace on explain
SQL> with e(empno,ename,mgr,bossname,lev) as (
   2  select empno,ename,mgr,NULL,0 from emp where empno=7839
   3  union all
   4  select emp.empno,emp.ename,emp.mgr,e.ename,e.lev+1
   5  from emp,e
   6  where emp.mgr=e.empno)
   7  select * from e
   8  /

      EMPNO ENAME             MGR BOSSNAME          LEV
---------- ---------- ---------- ---------- ----------
       7839 KING                                      0
       7566 JONES            7839 KING                1
       7698 BLAKE            7839 KING                1
       7782 CLARK            7839 KING                1
       7499 ALLEN            7698 BLAKE               2
       7521 WARD             7698 BLAKE               2
       7654 MARTIN           7698 BLAKE               2
       7788 SCOTT            7566 JONES               2
       7844 TURNER           7698 BLAKE               2
       7900 JAMES            7698 BLAKE               2
       7902 FORD             7566 JONES               2

      EMPNO ENAME             MGR BOSSNAME          LEV
---------- ---------- ---------- ---------- ----------
       7934 MILLER           7782 CLARK               2
       7369 SMITH            7902 FORD                3
       7876 ADAMS            7788 SCOTT               3

14 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2925328376

--------------------------------------------------------------------------------
--------------------

| Id  | Operation                                 | Name   | Rows  |
Bytes | Cos
t (%CPU)| Time     |

--------------------------------------------------------------------------------
--------------------

|   0 | SELECT STATEMENT                          |        |    15 |   795 |
  6  (17)| 00:00:56 |

|   1 |  VIEW                                     |        |    15 |   795 |
  6  (17)| 00:00:56 |

|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|        |       |       |
         |          |

|   3 |    TABLE ACCESS BY INDEX ROWID            | EMP    |     1 |    24 |
  1   (0)| 00:00:11 |

|*  4 |     INDEX UNIQUE SCAN                     | PK_EMP |     1 |       |
  0   (0)| 00:00:01 |

|*  5 |    HASH JOIN                              |        |    14 |   798 |
  5  (20)| 00:00:46 |

|   6 |     RECURSIVE WITH PUMP                   |        |       |       |
         |          |

|   7 |     TABLE ACCESS FULL                     | EMP    |    14 |   336 |
  3   (0)| 00:00:31 |

--------------------------------------------------------------------------------
--------------------


Predicate Information (identified by operation id):
---------------------------------------------------

    4 - access("EMPNO"=7839)
    5 - access("EMP"."MGR"="E"."EMPNO")

Note
-----
    - SQL plan baseline "SQL_PLAN_1tmxjj25531vff51d791e" used for this
statement

SQL> spool off


There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.

That's because Oracle has covering indexes.

Re: Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
> <mladen.gogala@vmsinfo.com>  wrote:
>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
> That's because Oracle has covering indexes.
>
I am not sure what you mean by "covering indexes" but I hope that for
the larger table I have in mind,  indexes will be used.  For a small
table like this, not using an index may actually be a better plan. I
cannot compare because my development PostgreSQL cluster is on a much
weaker machine than the development Oracle database.
I even looked into Wikipedia for the notion of "covering index" and it
is defined as an index which contains all the data requested in a query.
This is not the case, EMP is not an index-organized table. The only
index used was the primary key, also available in the PostgreSQL version
of the table.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> On 1/27/2011 3:37 PM, Scott Marlowe wrote:
>>
>> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
>> <mladen.gogala@vmsinfo.com>  wrote:
>>>
>>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
>>
>> That's because Oracle has covering indexes.
>>
> I am not sure what you mean by "covering indexes" but I hope that for the
> larger table I have in mind,  indexes will be used.  For a small table like

In Oracle you can hit JUST the index to get the data you need (and
maybe rollback logs, which are generally pretty small)

In Pgsql, once you hit the index you must then hit the actual data
store to get the right version of your tuple.  So, index access in pg
is more expensive than in Oracle.  However, updates are cheaper.
Always a trade off

Re: Postgres 9.0 has a bias against indexes

From
"Igor Neyman"
Date:

> -----Original Message-----
> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> Sent: Thursday, January 27, 2011 3:59 PM
> To: Mladen Gogala
> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall;
> pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
>
> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala
> <mladen.gogala@vmsinfo.com> wrote:
> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> >>
> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
> >> <mladen.gogala@vmsinfo.com>  wrote:
> >>>
> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
> >>
> >> That's because Oracle has covering indexes.
> >>
> > I am not sure what you mean by "covering indexes" but I
> hope that for
> > the larger table I have in mind,  indexes will be used.  
> For a small
> > table like
>
> In Oracle you can hit JUST the index to get the data you need
> (and maybe rollback logs, which are generally pretty small)
>
> In Pgsql, once you hit the index you must then hit the actual
> data store to get the right version of your tuple.  So, index
> access in pg is more expensive than in Oracle.  However,
> updates are cheaper.
> Always a trade off
>
>

Scott,
What you describe here isn't about "covering indexes" - it's about different ways implementing MVCC in Oracle and PG.

Mladen,
you were right.
For recursive query like yours Oracle uses index even on small table.
I made an assumption without testing it.
However some other (non-recursive) queries against the same small table that also require reading all 14 rows do "table
scan".

Regards,
Igor Neyman

Re: Postgres 9.0 has a bias against indexes

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman <ineyman@perceptron.com> wrote:
>
>
>> -----Original Message-----
>> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
>> Sent: Thursday, January 27, 2011 3:59 PM
>> To: Mladen Gogala
>> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall;
>> pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
>>
>> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala
>> <mladen.gogala@vmsinfo.com> wrote:
>> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
>> >>
>> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
>> >> <mladen.gogala@vmsinfo.com>  wrote:
>> >>>
>> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
>> >>
>> >> That's because Oracle has covering indexes.
>> >>
>> > I am not sure what you mean by "covering indexes" but I
>> hope that for
>> > the larger table I have in mind,  indexes will be used.
>> For a small
>> > table like
>>
>> In Oracle you can hit JUST the index to get the data you need
>> (and maybe rollback logs, which are generally pretty small)
>>
>> In Pgsql, once you hit the index you must then hit the actual
>> data store to get the right version of your tuple.  So, index
>> access in pg is more expensive than in Oracle.  However,
>> updates are cheaper.
>> Always a trade off
>>
>>
>
> Scott,
> What you describe here isn't about "covering indexes" - it's about different ways implementing MVCC in Oracle and PG.

It is about covering indexes AND it's about the difference in how MVCC
is implemented in both databases.

Re: Postgres 9.0 has a bias against indexes

From
"Igor Neyman"
Date:

> -----Original Message-----
> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> Sent: Thursday, January 27, 2011 4:16 PM
> To: Igor Neyman
> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall;
> pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
>
> On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman
> <ineyman@perceptron.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> >> Sent: Thursday, January 27, 2011 3:59 PM
> >> To: Mladen Gogala
> >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall;
> >> pgsql-performance@postgresql.org
> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> >>
> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala
> >> <mladen.gogala@vmsinfo.com> wrote:
> >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> >> >>
> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
> >> >> <mladen.gogala@vmsinfo.com>  wrote:
> >> >>>
> >> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
> >> >>
> >> >> That's because Oracle has covering indexes.
> >> >>
> >> > I am not sure what you mean by "covering indexes" but I
> >> hope that for
> >> > the larger table I have in mind,  indexes will be used.
> >> For a small
> >> > table like
> >>
> >> In Oracle you can hit JUST the index to get the data you need (and
> >> maybe rollback logs, which are generally pretty small)
> >>
> >> In Pgsql, once you hit the index you must then hit the actual data
> >> store to get the right version of your tuple.  So, index
> access in pg
> >> is more expensive than in Oracle.  However, updates are cheaper.
> >> Always a trade off
> >>
> >>
> >
> > Scott,
> > What you describe here isn't about "covering indexes" -
> it's about different ways implementing MVCC in Oracle and PG.
>
> It is about covering indexes AND it's about the difference in
> how MVCC is implemented in both databases.
>
>

Well, Mladen's query doesn't involve covering indexes.

Re: Postgres 9.0 has a bias against indexes

From
Kenneth Marshall
Date:
On Thu, Jan 27, 2011 at 04:12:53PM -0500, Igor Neyman wrote:
>
>
> > -----Original Message-----
> > From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> > Sent: Thursday, January 27, 2011 3:59 PM
> > To: Mladen Gogala
> > Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall;
> > pgsql-performance@postgresql.org
> > Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> >
> > On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala
> > <mladen.gogala@vmsinfo.com> wrote:
> > > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> > >>
> > >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
> > >> <mladen.gogala@vmsinfo.com> ?wrote:
> > >>>
> > >>> There is INDEX UNIQUE SCAN PK_EMP. ?Oracle will use an index.
> > >>
> > >> That's because Oracle has covering indexes.
> > >>
> > > I am not sure what you mean by "covering indexes" but I
> > hope that for
> > > the larger table I have in mind, ?indexes will be used. ?
> > For a small
> > > table like
> >
> > In Oracle you can hit JUST the index to get the data you need
> > (and maybe rollback logs, which are generally pretty small)
> >
> > In Pgsql, once you hit the index you must then hit the actual
> > data store to get the right version of your tuple.  So, index
> > access in pg is more expensive than in Oracle.  However,
> > updates are cheaper.
> > Always a trade off
> >
> >
>
> Scott,
> What you describe here isn't about "covering indexes" - it's about different ways implementing MVCC in Oracle and PG.
>
> Mladen,
> you were right.
> For recursive query like yours Oracle uses index even on small table.
> I made an assumption without testing it.
> However some other (non-recursive) queries against the same small table that also require reading all 14 rows do
"tablescan". 
>
> Regards,
> Igor Neyman
>
Interesting. Can you force it to use a Seqential Scan and if so, how
does that affect the timing? i.e. Is the index scan actually faster?

Cheers,
Ken

Re: Postgres 9.0 has a bias against indexes

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 2:18 PM, Igor Neyman <ineyman@perceptron.com> wrote:
>
>> -----Original Message-----
>> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
>> Sent: Thursday, January 27, 2011 4:16 PM
>> To: Igor Neyman
>> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall;
>> pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
>>
>> On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman
>> <ineyman@perceptron.com> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
>> >> Sent: Thursday, January 27, 2011 3:59 PM
>> >> To: Mladen Gogala
>> >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall;
>> >> pgsql-performance@postgresql.org
>> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
>> >>
>> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala
>> >> <mladen.gogala@vmsinfo.com> wrote:
>> >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
>> >> >>
>> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
>> >> >> <mladen.gogala@vmsinfo.com>  wrote:
>> >> >>>
>> >> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.
>> >> >>
>> >> >> That's because Oracle has covering indexes.
>> >> >>
>> >> > I am not sure what you mean by "covering indexes" but I
>> >> hope that for
>> >> > the larger table I have in mind,  indexes will be used.
>> >> For a small
>> >> > table like
>> >>
>> >> In Oracle you can hit JUST the index to get the data you need (and
>> >> maybe rollback logs, which are generally pretty small)
>> >>
>> >> In Pgsql, once you hit the index you must then hit the actual data
>> >> store to get the right version of your tuple.  So, index
>> access in pg
>> >> is more expensive than in Oracle.  However, updates are cheaper.
>> >> Always a trade off
>> >>
>> >>
>> >
>> > Scott,
>> > What you describe here isn't about "covering indexes" -
>> it's about different ways implementing MVCC in Oracle and PG.
>>
>> It is about covering indexes AND it's about the difference in
>> how MVCC is implemented in both databases.
>>
>>
>
> Well, Mladen's query doesn't involve covering indexes.

On Oracle?  Then how can it get the values it needs without having to
hit the data store?

Re: Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
On 1/27/2011 4:20 PM, Kenneth Marshall wrote:
> Interesting. Can you force it to use a Seqential Scan and if so, how
> does that affect the timing? i.e. Is the index scan actually faster?
>
> Cheers,
> Ken
Yes,  Oracle can be forced into doing a sequential scan and it is
actually faster than an index scan:

SQL> set autotrace on explain
SQL> with e(empno,ename,mgr,bossname,lev) as (
   2  select empno,ename,mgr,NULL,0 from emp where empno=7839
   3  union all
   4  select emp.empno,emp.ename,emp.mgr,e.ename,e.lev+1
   5  from emp,e
   6  where emp.mgr=e.empno)
   7  select * from e
   8  /

      EMPNO ENAME             MGR BOSSNAME          LEV
---------- ---------- ---------- ---------- ----------
       7839 KING                                      0
       7566 JONES            7839 KING                1
       7698 BLAKE            7839 KING                1
       7782 CLARK            7839 KING                1
       7499 ALLEN            7698 BLAKE               2
       7521 WARD             7698 BLAKE               2
       7654 MARTIN           7698 BLAKE               2
       7788 SCOTT            7566 JONES               2
       7844 TURNER           7698 BLAKE               2
       7900 JAMES            7698 BLAKE               2
       7902 FORD             7566 JONES               2

      EMPNO ENAME             MGR BOSSNAME          LEV
---------- ---------- ---------- ---------- ----------
       7934 MILLER           7782 CLARK               2
       7369 SMITH            7902 FORD                3
       7876 ADAMS            7788 SCOTT               3

14 rows selected.

Elapsed: 00:00:00.18

Execution Plan
----------------------------------------------------------
Plan hash value: 2925328376

--------------------------------------------------------------------------------
--------------------

| Id  | Operation                                 | Name   | Rows  |
Bytes | Cos
t (%CPU)| Time     |

--------------------------------------------------------------------------------
--------------------

|   0 | SELECT STATEMENT                          |        |    15 |   795 |
  6  (17)| 00:00:56 |

|   1 |  VIEW                                     |        |    15 |   795 |
  6  (17)| 00:00:56 |

|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|        |       |       |
         |          |

|   3 |    TABLE ACCESS BY INDEX ROWID            | EMP    |     1 |    24 |
  1   (0)| 00:00:11 |

|*  4 |     INDEX UNIQUE SCAN                     | PK_EMP |     1 |       |
  0   (0)| 00:00:01 |

|*  5 |    HASH JOIN                              |        |    14 |   798 |
  5  (20)| 00:00:46 |

|   6 |     RECURSIVE WITH PUMP                   |        |       |       |
         |          |

|   7 |     TABLE ACCESS FULL                     | EMP    |    14 |   336 |
  3   (0)| 00:00:31 |

--------------------------------------------------------------------------------
--------------------


Predicate Information (identified by operation id):
---------------------------------------------------

    4 - access("EMPNO"=7839)
    5 - access("EMP"."MGR"="E"."EMPNO")

Note
-----
    - SQL plan baseline "SQL_PLAN_1tmxjj25531vff51d791e" used for this
statement

SQL>
SQL> with e1(empno,ename,mgr,bossname,lev) as (
   2  select /*+ full(emp) */ empno,ename,mgr,NULL,0 from emp where
empno=7839
   3  union all
   4  select /*+ full(e2) */
   5        e2.empno,e2.ename,e2.mgr,e1.ename,e1.lev+1
   6  from emp e2,e1
   7  where e2.mgr=e1.empno)
   8  select * from e1
   9  /

      EMPNO ENAME             MGR BOSSNAME          LEV
---------- ---------- ---------- ---------- ----------
       7839 KING                                      0
       7566 JONES            7839 KING                1
       7698 BLAKE            7839 KING                1
       7782 CLARK            7839 KING                1
       7499 ALLEN            7698 BLAKE               2
       7521 WARD             7698 BLAKE               2
       7654 MARTIN           7698 BLAKE               2
       7788 SCOTT            7566 JONES               2
       7844 TURNER           7698 BLAKE               2
       7900 JAMES            7698 BLAKE               2
       7902 FORD             7566 JONES               2

      EMPNO ENAME             MGR BOSSNAME          LEV
---------- ---------- ---------- ---------- ----------
       7934 MILLER           7782 CLARK               2
       7369 SMITH            7902 FORD                3
       7876 ADAMS            7788 SCOTT               3

14 rows selected.

Elapsed: 00:00:00.14

Execution Plan
----------------------------------------------------------
Plan hash value: 2042363665

--------------------------------------------------------------------------------
------------------

| Id  | Operation                                 | Name | Rows  | Bytes
| Cost
(%CPU)| Time     |

--------------------------------------------------------------------------------
------------------

|   0 | SELECT STATEMENT                          |      |    15 |   795
|    10
   (10)| 00:01:36 |

|   1 |  VIEW                                     |      |    15 |   795
|    10
   (10)| 00:01:36 |

|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |       |       |
       |          |

|*  3 |    TABLE ACCESS FULL                      | EMP  |     1 |    24
|     3
    (0)| 00:00:31 |

|*  4 |    HASH JOIN                              |      |    14 |   798
|     7
   (15)| 00:01:06 |

|   5 |     RECURSIVE WITH PUMP                   |      |       |       |
       |          |

|   6 |     TABLE ACCESS FULL                     | EMP  |    14 |   336
|     3
    (0)| 00:00:31 |

--------------------------------------------------------------------------------
------------------


Predicate Information (identified by operation id):
---------------------------------------------------

    3 - filter("EMPNO"=7839)
    4 - access("E2"."MGR"="E1"."EMPNO")

SQL> spool off

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
"Igor Neyman"
Date:

> -----Original Message-----
> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> Sent: Thursday, January 27, 2011 4:25 PM
> To: Igor Neyman
> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall;
> pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
>
> On Thu, Jan 27, 2011 at 2:18 PM, Igor Neyman
> <ineyman@perceptron.com> wrote:
> >
> >> -----Original Message-----
> >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> >> Sent: Thursday, January 27, 2011 4:16 PM
> >> To: Igor Neyman
> >> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall;
> >> pgsql-performance@postgresql.org
> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> >>
> >> On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman
> <ineyman@perceptron.com>
> >> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> >> >> Sent: Thursday, January 27, 2011 3:59 PM
> >> >> To: Mladen Gogala
> >> >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall;
> >> >> pgsql-performance@postgresql.org
> >> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes
> >> >>
> >> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala
> >> >> <mladen.gogala@vmsinfo.com> wrote:
> >> >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote:
> >> >> >>
> >> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
> >> >> >> <mladen.gogala@vmsinfo.com>  wrote:
> >> >> >>>
> >> >> >>> There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use
> an index.
> >> >> >>
> >> >> >> That's because Oracle has covering indexes.
> >> >> >>
> >> >> > I am not sure what you mean by "covering indexes" but I
> >> >> hope that for
> >> >> > the larger table I have in mind,  indexes will be used.
> >> >> For a small
> >> >> > table like
> >> >>
> >> >> In Oracle you can hit JUST the index to get the data
> you need (and
> >> >> maybe rollback logs, which are generally pretty small)
> >> >>
> >> >> In Pgsql, once you hit the index you must then hit the
> actual data
> >> >> store to get the right version of your tuple.  So, index
> >> access in pg
> >> >> is more expensive than in Oracle.  However, updates are cheaper.
> >> >> Always a trade off
> >> >>
> >> >>
> >> >
> >> > Scott,
> >> > What you describe here isn't about "covering indexes" -
> >> it's about different ways implementing MVCC in Oracle and PG.
> >>
> >> It is about covering indexes AND it's about the difference in how
> >> MVCC is implemented in both databases.
> >>
> >>
> >
> > Well, Mladen's query doesn't involve covering indexes.
>
> On Oracle?  Then how can it get the values it needs without
> having to hit the data store?
>
>

It doesn't.
It does "INDEX UNIQUE SCAN" and then "TABLE ACCESS BY INDEX ROWID".

Re: Postgres 9.0 has a bias against indexes

From
Scott Marlowe
Date:
On Thu, Jan 27, 2011 at 2:32 PM, Igor Neyman <ineyman@perceptron.com> wrote:
>> On Oracle?  Then how can it get the values it needs without
>> having to hit the data store?
>
> It doesn't.
> It does "INDEX UNIQUE SCAN" and then "TABLE ACCESS BY INDEX ROWID".

Ahhh, ok.  I thought Oracle used covering indexes by default.

Re: Postgres 9.0 has a bias against indexes

From
Mladen Gogala
Date:
On 1/27/2011 4:25 PM, Scott Marlowe wrote:
> On Oracle?  Then how can it get the values it needs without having to
> hit the data store?

It can't. It does hit the data store.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Postgres 9.0 has a bias against indexes

From
"Kevin Grittner"
Date:
Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:

> Yes,  Oracle can be forced into doing a sequential scan and it is
> actually faster than an index scan:

And PostgreSQL can be coerced to use an indexed scan.  Its plans are
cost-based, with user configurable cost factors; so if you tell it
that seq_page_cost and random_page_cost are both equal to some
really low value (like 0.001), you'll get an index scan.  Part of
the process of tuning PostgreSQL is to discover the relative
*actual* costs on *your environment* (which is largely dependent on
the degree of caching of the active portion of your database).  When
you get your costing factors to approximate reality, the optimizer
will do a pretty good job of picking the fastest plan.

-Kevin