Thread: Postgres 9.0 has a bias against indexes
I have a table EMP, with 14 rows and a description like this: scott=> \d+ emp Table "public.emp" Column | Type | Modifiers | Storage | Description ----------+-----------------------------+-----------+----------+------------- empno | smallint | not null | plain | ename | character varying(10) | | extended | job | character varying(9) | | extended | mgr | smallint | | plain | hiredate | timestamp without time zone | | plain | sal | double precision | | plain | comm | double precision | | plain | deptno | smallint | | plain | Indexes: "emp_pkey" PRIMARY KEY, btree (empno) "emp_mgr_i" btree (mgr) Foreign-key constraints: "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno) Has OIDs: no scott=> A recursive query doesn't use existing index on mgr: scott=> explain analyze with recursive e(empno,ename,mgr,bossname,level) as ( select empno,ename,mgr,NULL::varchar,0 from emp where empno=7839 union select emp.empno,emp.ename,emp.mgr,e.ename,e.level+1 from emp,e where emp.mgr=e.empno) select * from e; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- CTE Scan on e (cost=20.59..23.21 rows=131 width=78) (actual time=0.020..0.143 rows=14 loops=1) CTE e -> Recursive Union (cost=0.00..20.59 rows=131 width=52) (actual time=0.018..0.128 rows=14 loops=1) -> Seq Scan on emp (cost=0.00..1.18 rows=1 width=10) (actual time=0.013..0.015 rows=1 loops=1) Filter: (empno = 7839) -> Hash Join (cost=0.33..1.68 rows=13 width=52) (actual time=0.016..0.021 rows=3 loops=4) Hash Cond: (public.emp.mgr = e.empno) -> Seq Scan on emp (cost=0.00..1.14 rows=14 width=10) (actual time=0.001..0.004 rows=14 loops=4) -> Hash (cost=0.20..0.20 rows=10 width=44) (actual time=0.004..0.004 rows=4 loops=4) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> WorkTable Scan on e (cost=0.00..0.20 rows=10 width=44) (actual time=0.001..0.002 rows=4 loops=4) Total runtime: 0.218 ms (12 rows) scott=> The optimizer will not use index, not even when I turn off both hash and merge joins. This is not particularly important for a table with 14 rows, but for a larger table, this is a problem. The only way to actually force the use of index is by disabling seqscan, but that chooses a wrong path again, because it reads the "outer" table by primary key, which will be very slow. Full table scan, done by the primary key is probably the slowest thing around. I know about the PostgreSQL philosophy which says "hints are bad", and I deeply disagree with it, but would it be possible to have at least one parameter that would change calculations in such a way that indexes are favored, where they exist? -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On Thu, Jan 27, 2011 at 10:41:08AM -0500, Mladen Gogala wrote: > I have a table EMP, with 14 rows and a description like this: > scott=> \d+ emp > Table "public.emp" > Column | Type | Modifiers | Storage | > Description > ----------+-----------------------------+-----------+----------+------------- > empno | smallint | not null | plain | > ename | character varying(10) | | extended | > job | character varying(9) | | extended | > mgr | smallint | | plain | > hiredate | timestamp without time zone | | plain | > sal | double precision | | plain | > comm | double precision | | plain | > deptno | smallint | | plain | > Indexes: > "emp_pkey" PRIMARY KEY, btree (empno) > "emp_mgr_i" btree (mgr) > Foreign-key constraints: > "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno) > Has OIDs: no > > scott=> > > A recursive query doesn't use existing index on mgr: > scott=> explain analyze > with recursive e(empno,ename,mgr,bossname,level) as ( > select empno,ename,mgr,NULL::varchar,0 from emp where empno=7839 > union > select emp.empno,emp.ename,emp.mgr,e.ename,e.level+1 > from emp,e > where emp.mgr=e.empno) > select * from e; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------- > CTE Scan on e (cost=20.59..23.21 rows=131 width=78) (actual > time=0.020..0.143 rows=14 loops=1) > CTE e > -> Recursive Union (cost=0.00..20.59 rows=131 width=52) (actual > time=0.018..0.128 rows=14 loops=1) > -> Seq Scan on emp (cost=0.00..1.18 rows=1 width=10) (actual > time=0.013..0.015 rows=1 loops=1) > Filter: (empno = 7839) > -> Hash Join (cost=0.33..1.68 rows=13 width=52) (actual > time=0.016..0.021 rows=3 loops=4) > Hash Cond: (public.emp.mgr = e.empno) > -> Seq Scan on emp (cost=0.00..1.14 rows=14 width=10) > (actual time=0.001..0.004 rows=14 loops=4) > -> Hash (cost=0.20..0.20 rows=10 width=44) (actual > time=0.004..0.004 rows=4 loops=4) > Buckets: 1024 Batches: 1 Memory Usage: 1kB > -> WorkTable Scan on e (cost=0.00..0.20 rows=10 > width=44) (actual time=0.001..0.002 rows=4 loops=4) > Total runtime: 0.218 ms > (12 rows) > > scott=> > > The optimizer will not use index, not even when I turn off both hash and > merge joins. This is not particularly important for a table with 14 rows, > but for a larger table, this is a problem. The > only way to actually force the use of index is by disabling seqscan, but > that chooses a wrong path > again, because it reads the "outer" table by primary key, which will be > very slow. Full table scan, > done by the primary key is probably the slowest thing around. I know about > the PostgreSQL philosophy > which says "hints are bad", and I deeply disagree with it, but would it be > possible to have at > least one parameter that would change calculations in such a way that > indexes are favored, where they exist? > > -- > Mladen Gogala > Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > www.vmsinfo.com Hi Mladen, PostgreSQL will only use an index if the planner thinks that it will be faster than the alternative, a sequential scan in this case. For 14 rows, a sequential scan is 1 read and should actually be faster than the index. Did you try the query using EXPLAIN ANALYZE once with index and once without? What were the timings? If they do not match reality, adjusting cost parameters would be in order. Regards, Ken
Mladen Gogala <mladen.gogala@vmsinfo.com> wrote: > The optimizer will not use index, not even when I turn off both > hash and merge joins. This is not particularly important for a > table with 14 rows, but for a larger table, this is a problem. If it still does that with a larger table. Do you have an example of that? Showing that it goes straight to the data page when the table only has one, without first wasting time going through the index page, doesn't prove that it won't use the index when it might actually help -- much less point to the cause of the issue in the larger table, which might lead to a solution. -Kevin
Odds are that a table of 14 rows will more likely be cached in RAM than a table of 14 million rows. PostgreSQL would certainly be more "openminded" to using an index if chances are low that the table is cached. If the table *is* cached, though, what point would there be in reading an index? Also, if random_page_cost is set to default (4.0), the planner will tend towards sequential scans. You can drop this number a bit to "help" the planner be more selective of indexes...and there's also cpu_tuple_* settings that can be modified to pursuade the planner to use indexes. Doubtful that any prodding will force an index scan with a cached table of 14 rows, though... On 1/27/11, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote: > I have a table EMP, with 14 rows and a description like this: > scott=> \d+ emp > Table "public.emp" > Column | Type | Modifiers | Storage | > Description > ----------+-----------------------------+-----------+----------+------------- > empno | smallint | not null | plain | > ename | character varying(10) | | extended | > job | character varying(9) | | extended | > mgr | smallint | | plain | > hiredate | timestamp without time zone | | plain | > sal | double precision | | plain | > comm | double precision | | plain | > deptno | smallint | | plain | > Indexes: > "emp_pkey" PRIMARY KEY, btree (empno) > "emp_mgr_i" btree (mgr) > Foreign-key constraints: > "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno) > Has OIDs: no > > scott=> > > A recursive query doesn't use existing index on mgr: > scott=> explain analyze > with recursive e(empno,ename,mgr,bossname,level) as ( > select empno,ename,mgr,NULL::varchar,0 from emp where empno=7839 > union > select emp.empno,emp.ename,emp.mgr,e.ename,e.level+1 > from emp,e > where emp.mgr=e.empno) > select * from e; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------- > CTE Scan on e (cost=20.59..23.21 rows=131 width=78) (actual > time=0.020..0.143 rows=14 loops=1) > CTE e > -> Recursive Union (cost=0.00..20.59 rows=131 width=52) (actual > time=0.018..0.128 rows=14 loops=1) > -> Seq Scan on emp (cost=0.00..1.18 rows=1 width=10) > (actual time=0.013..0.015 rows=1 loops=1) > Filter: (empno = 7839) > -> Hash Join (cost=0.33..1.68 rows=13 width=52) (actual > time=0.016..0.021 rows=3 loops=4) > Hash Cond: (public.emp.mgr = e.empno) > -> Seq Scan on emp (cost=0.00..1.14 rows=14 > width=10) (actual time=0.001..0.004 rows=14 loops=4) > -> Hash (cost=0.20..0.20 rows=10 width=44) (actual > time=0.004..0.004 rows=4 loops=4) > Buckets: 1024 Batches: 1 Memory Usage: 1kB > -> WorkTable Scan on e (cost=0.00..0.20 > rows=10 width=44) (actual time=0.001..0.002 rows=4 loops=4) > Total runtime: 0.218 ms > (12 rows) > > scott=> > > The optimizer will not use index, not even when I turn off both hash and > merge joins. This is not particularly important for a table with 14 > rows, but for a larger table, this is a problem. The > only way to actually force the use of index is by disabling seqscan, but > that chooses a wrong path > again, because it reads the "outer" table by primary key, which will be > very slow. Full table scan, > done by the primary key is probably the slowest thing around. I know > about the PostgreSQL philosophy > which says "hints are bad", and I deeply disagree with it, but would it > be possible to have at > least one parameter that would change calculations in such a way that > indexes are favored, where they exist? > > -- > Mladen Gogala > Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > www.vmsinfo.com > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Computers are like air conditioners... They quit working when you open Windows.
On 1/27/2011 10:45 AM, Kenneth Marshall wrote: > PostgreSQL will only use an index if the planner thinks that it > will be faster than the alternative, a sequential scan in this case. > For 14 rows, a sequential scan is 1 read and should actually be > faster than the index. Did you try the query using EXPLAIN ANALYZE > once with index and once without? What were the timings? If they > do not match reality, adjusting cost parameters would be in order. > I did. I even tried with an almost equivalent outer join: explain analyze select e1.empno,e1.ename,e2.empno,e2.ename from emp e1 left outer join emp e2 on (e1.mgr=e2.empno); QUERY PLAN -------------------------------------------------------------------------------- ------------------------------ Nested Loop Left Join (cost=0.00..7.25 rows=14 width=16) (actual time=0.028..0 .105 rows=14 loops=1) Join Filter: (e1.mgr = e2.empno) -> Seq Scan on emp e1 (cost=0.00..2.14 rows=14 width=10) (actual time=0.006 ..0.010 rows=14 loops=1) -> Materialize (cost=0.00..2.21 rows=14 width=8) (actual time=0.001..0.003 rows=14 loops=14) -> Seq Scan on emp e2 (cost=0.00..2.14 rows=14 width=8) (actual time= 0.001..0.005 rows=14 loops=1) Total runtime: 0.142 ms (6 rows) This gives me the same result as the recursive version, minus the level column. I am porting an application from Oracle, there is a fairly large table that is accessed by "connect by". Rewriting it as a recursive join is not a problem, but the optimizer doesn't really use the indexes. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On 1/27/2011 10:51 AM, J Sisson wrote: > Also, if random_page_cost is set to default (4.0), the planner will > tend towards sequential scans. scott=> show random_page_cost; random_page_cost ------------------ 1 (1 row) scott=> show seq_page_cost; seq_page_cost --------------- 2 (1 row) -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On Thu, Jan 27, 2011 at 10:56 AM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
You're still using a 14 row table, though. Postgres isn't going to be stupid enough to use an index in this case when the seq scan is clearly faster unless you go out of your way to absolutely force it to do so. If the table is going to be "fairly large", that's the size you need to be testing and tuning with.
I even tried with an almost equivalent outer join:
explain analyze select e1.empno,e1.ename,e2.empno,e2.ename
from emp e1 left outer join emp e2 on (e1.mgr=e2.empno);
QUERY PLAN
--------------------------------------------------------------------------------
------------------------------
Nested Loop Left Join (cost=0.00..7.25 rows=14 width=16) (actual time=0.028..0
.105 rows=14 loops=1)
Join Filter: (e1.mgr = e2.empno)
-> Seq Scan on emp e1 (cost=0.00..2.14 rows=14 width=10) (actual time=0.006
..0.010 rows=14 loops=1)
-> Materialize (cost=0.00..2.21 rows=14 width=8) (actual time=0.001..0.003
rows=14 loops=14)
-> Seq Scan on emp e2 (cost=0.00..2.14 rows=14 width=8) (actual time=
0.001..0.005 rows=14 loops=1)
Total runtime: 0.142 ms
(6 rows)
This gives me the same result as the recursive version, minus the level column. I am porting an application from Oracle, there is a fairly large table that is accessed by "connect by". Rewriting it as a recursive join is not a problem, but the optimizer doesn't really use the indexes.
You're still using a 14 row table, though. Postgres isn't going to be stupid enough to use an index in this case when the seq scan is clearly faster unless you go out of your way to absolutely force it to do so. If the table is going to be "fairly large", that's the size you need to be testing and tuning with.
--
- David T. Wilson
david.t.wilson@gmail.com
David Wilson <david.t.wilson@gmail.com> writes: > You're still using a 14 row table, though. Exactly. Please note what it says in the fine manual: It is worth noting that EXPLAIN results should not be extrapolated to situations other than the one you are actually testing; for example, results on a toy-sized table cannot be assumed to apply to large tables. The planner's cost estimates are not linear and so it might choose a different plan for a larger or smaller table. An extreme example is that on a table that only occupies one disk page, you'll nearly always get a sequential scan plan whether indexes are available or not. The planner realizes that it's going to take one disk page read to process the table in any case, so there's no value in expending additional page reads to look at an index. regards, tom lane
On 1/27/2011 11:40 AM, Tom Lane wrote: > It is worth noting that EXPLAIN results should not be extrapolated > to situations other than the one you are actually testing; for > example, results on a toy-sized table cannot be assumed to apply to > large tables. Well, that's precisely what I tried. Bummer, I will have to copy a large table over. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
> -----Original Message----- > From: Mladen Gogala [mailto:mladen.gogala@vmsinfo.com] > Sent: Thursday, January 27, 2011 12:00 PM > To: Tom Lane > Cc: David Wilson; Kenneth Marshall; pgsql-performance@postgresql.org > Subject: Re: Postgres 9.0 has a bias against indexes > > On 1/27/2011 11:40 AM, Tom Lane wrote: > > It is worth noting that EXPLAIN results should not be extrapolated > > to situations other than the one you are actually testing; for > > example, results on a toy-sized table cannot be > assumed to apply to > > large tables. > Well, that's precisely what I tried. Bummer, I will have to > copy a large table over. > > -- > Mladen Gogala > Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > www.vmsinfo.com > > Mladen, I don't think, this is exclusive Postgres feature. I'm pretty sure, Oracle optimizer will do "TABLE ACCESS (FULL)" instead of using index on 14-row table either. Regards, Igor Neyman
On 1/27/2011 3:10 PM, Igor Neyman wrote: > > Mladen, > > I don't think, this is exclusive Postgres feature. > I'm pretty sure, Oracle optimizer will do "TABLE ACCESS (FULL)" instead > of using index on 14-row table either. > > Regards, > Igor Neyman Well, lets' see: SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production PL/SQL Release 11.2.0.2.0 - Production CORE 11.2.0.2.0 Production TNS for Linux: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production Elapsed: 00:00:00.00 SQL> set autotrace on explain SQL> with e(empno,ename,mgr,bossname,lev) as ( 2 select empno,ename,mgr,NULL,0 from emp where empno=7839 3 union all 4 select emp.empno,emp.ename,emp.mgr,e.ename,e.lev+1 5 from emp,e 6 where emp.mgr=e.empno) 7 select * from e 8 / EMPNO ENAME MGR BOSSNAME LEV ---------- ---------- ---------- ---------- ---------- 7839 KING 0 7566 JONES 7839 KING 1 7698 BLAKE 7839 KING 1 7782 CLARK 7839 KING 1 7499 ALLEN 7698 BLAKE 2 7521 WARD 7698 BLAKE 2 7654 MARTIN 7698 BLAKE 2 7788 SCOTT 7566 JONES 2 7844 TURNER 7698 BLAKE 2 7900 JAMES 7698 BLAKE 2 7902 FORD 7566 JONES 2 EMPNO ENAME MGR BOSSNAME LEV ---------- ---------- ---------- ---------- ---------- 7934 MILLER 7782 CLARK 2 7369 SMITH 7902 FORD 3 7876 ADAMS 7788 SCOTT 3 14 rows selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 2925328376 -------------------------------------------------------------------------------- -------------------- | Id | Operation | Name | Rows | Bytes | Cos t (%CPU)| Time | -------------------------------------------------------------------------------- -------------------- | 0 | SELECT STATEMENT | | 15 | 795 | 6 (17)| 00:00:56 | | 1 | VIEW | | 15 | 795 | 6 (17)| 00:00:56 | | 2 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | | | 3 | TABLE ACCESS BY INDEX ROWID | EMP | 1 | 24 | 1 (0)| 00:00:11 | |* 4 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 | |* 5 | HASH JOIN | | 14 | 798 | 5 (20)| 00:00:46 | | 6 | RECURSIVE WITH PUMP | | | | | | | 7 | TABLE ACCESS FULL | EMP | 14 | 336 | 3 (0)| 00:00:31 | -------------------------------------------------------------------------------- -------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("EMPNO"=7839) 5 - access("EMP"."MGR"="E"."EMPNO") Note ----- - SQL plan baseline "SQL_PLAN_1tmxjj25531vff51d791e" used for this statement SQL> spool off There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote: > There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. That's because Oracle has covering indexes.
On 1/27/2011 3:37 PM, Scott Marlowe wrote: > On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala > <mladen.gogala@vmsinfo.com> wrote: >> There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. > That's because Oracle has covering indexes. > I am not sure what you mean by "covering indexes" but I hope that for the larger table I have in mind, indexes will be used. For a small table like this, not using an index may actually be a better plan. I cannot compare because my development PostgreSQL cluster is on a much weaker machine than the development Oracle database. I even looked into Wikipedia for the notion of "covering index" and it is defined as an index which contains all the data requested in a query. This is not the case, EMP is not an index-organized table. The only index used was the primary key, also available in the PostgreSQL version of the table. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote: > On 1/27/2011 3:37 PM, Scott Marlowe wrote: >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala >> <mladen.gogala@vmsinfo.com> wrote: >>> >>> There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. >> >> That's because Oracle has covering indexes. >> > I am not sure what you mean by "covering indexes" but I hope that for the > larger table I have in mind, indexes will be used. For a small table like In Oracle you can hit JUST the index to get the data you need (and maybe rollback logs, which are generally pretty small) In Pgsql, once you hit the index you must then hit the actual data store to get the right version of your tuple. So, index access in pg is more expensive than in Oracle. However, updates are cheaper. Always a trade off
> -----Original Message----- > From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > Sent: Thursday, January 27, 2011 3:59 PM > To: Mladen Gogala > Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; > pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes > > On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala > <mladen.gogala@vmsinfo.com> wrote: > > On 1/27/2011 3:37 PM, Scott Marlowe wrote: > >> > >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala > >> <mladen.gogala@vmsinfo.com> wrote: > >>> > >>> There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. > >> > >> That's because Oracle has covering indexes. > >> > > I am not sure what you mean by "covering indexes" but I > hope that for > > the larger table I have in mind, indexes will be used. > For a small > > table like > > In Oracle you can hit JUST the index to get the data you need > (and maybe rollback logs, which are generally pretty small) > > In Pgsql, once you hit the index you must then hit the actual > data store to get the right version of your tuple. So, index > access in pg is more expensive than in Oracle. However, > updates are cheaper. > Always a trade off > > Scott, What you describe here isn't about "covering indexes" - it's about different ways implementing MVCC in Oracle and PG. Mladen, you were right. For recursive query like yours Oracle uses index even on small table. I made an assumption without testing it. However some other (non-recursive) queries against the same small table that also require reading all 14 rows do "table scan". Regards, Igor Neyman
On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman <ineyman@perceptron.com> wrote: > > >> -----Original Message----- >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com] >> Sent: Thursday, January 27, 2011 3:59 PM >> To: Mladen Gogala >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; >> pgsql-performance@postgresql.org >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes >> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala >> <mladen.gogala@vmsinfo.com> wrote: >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote: >> >> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala >> >> <mladen.gogala@vmsinfo.com> wrote: >> >>> >> >>> There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. >> >> >> >> That's because Oracle has covering indexes. >> >> >> > I am not sure what you mean by "covering indexes" but I >> hope that for >> > the larger table I have in mind, indexes will be used. >> For a small >> > table like >> >> In Oracle you can hit JUST the index to get the data you need >> (and maybe rollback logs, which are generally pretty small) >> >> In Pgsql, once you hit the index you must then hit the actual >> data store to get the right version of your tuple. So, index >> access in pg is more expensive than in Oracle. However, >> updates are cheaper. >> Always a trade off >> >> > > Scott, > What you describe here isn't about "covering indexes" - it's about different ways implementing MVCC in Oracle and PG. It is about covering indexes AND it's about the difference in how MVCC is implemented in both databases.
> -----Original Message----- > From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > Sent: Thursday, January 27, 2011 4:16 PM > To: Igor Neyman > Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; > pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes > > On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman > <ineyman@perceptron.com> wrote: > > > > > >> -----Original Message----- > >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > >> Sent: Thursday, January 27, 2011 3:59 PM > >> To: Mladen Gogala > >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; > >> pgsql-performance@postgresql.org > >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes > >> > >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala > >> <mladen.gogala@vmsinfo.com> wrote: > >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote: > >> >> > >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala > >> >> <mladen.gogala@vmsinfo.com> wrote: > >> >>> > >> >>> There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. > >> >> > >> >> That's because Oracle has covering indexes. > >> >> > >> > I am not sure what you mean by "covering indexes" but I > >> hope that for > >> > the larger table I have in mind, indexes will be used. > >> For a small > >> > table like > >> > >> In Oracle you can hit JUST the index to get the data you need (and > >> maybe rollback logs, which are generally pretty small) > >> > >> In Pgsql, once you hit the index you must then hit the actual data > >> store to get the right version of your tuple. So, index > access in pg > >> is more expensive than in Oracle. However, updates are cheaper. > >> Always a trade off > >> > >> > > > > Scott, > > What you describe here isn't about "covering indexes" - > it's about different ways implementing MVCC in Oracle and PG. > > It is about covering indexes AND it's about the difference in > how MVCC is implemented in both databases. > > Well, Mladen's query doesn't involve covering indexes.
On Thu, Jan 27, 2011 at 04:12:53PM -0500, Igor Neyman wrote: > > > > -----Original Message----- > > From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > > Sent: Thursday, January 27, 2011 3:59 PM > > To: Mladen Gogala > > Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; > > pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes > > > > On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala > > <mladen.gogala@vmsinfo.com> wrote: > > > On 1/27/2011 3:37 PM, Scott Marlowe wrote: > > >> > > >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala > > >> <mladen.gogala@vmsinfo.com> ?wrote: > > >>> > > >>> There is INDEX UNIQUE SCAN PK_EMP. ?Oracle will use an index. > > >> > > >> That's because Oracle has covering indexes. > > >> > > > I am not sure what you mean by "covering indexes" but I > > hope that for > > > the larger table I have in mind, ?indexes will be used. ? > > For a small > > > table like > > > > In Oracle you can hit JUST the index to get the data you need > > (and maybe rollback logs, which are generally pretty small) > > > > In Pgsql, once you hit the index you must then hit the actual > > data store to get the right version of your tuple. So, index > > access in pg is more expensive than in Oracle. However, > > updates are cheaper. > > Always a trade off > > > > > > Scott, > What you describe here isn't about "covering indexes" - it's about different ways implementing MVCC in Oracle and PG. > > Mladen, > you were right. > For recursive query like yours Oracle uses index even on small table. > I made an assumption without testing it. > However some other (non-recursive) queries against the same small table that also require reading all 14 rows do "tablescan". > > Regards, > Igor Neyman > Interesting. Can you force it to use a Seqential Scan and if so, how does that affect the timing? i.e. Is the index scan actually faster? Cheers, Ken
On Thu, Jan 27, 2011 at 2:18 PM, Igor Neyman <ineyman@perceptron.com> wrote: > >> -----Original Message----- >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com] >> Sent: Thursday, January 27, 2011 4:16 PM >> To: Igor Neyman >> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; >> pgsql-performance@postgresql.org >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes >> >> On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman >> <ineyman@perceptron.com> wrote: >> > >> > >> >> -----Original Message----- >> >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com] >> >> Sent: Thursday, January 27, 2011 3:59 PM >> >> To: Mladen Gogala >> >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; >> >> pgsql-performance@postgresql.org >> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes >> >> >> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala >> >> <mladen.gogala@vmsinfo.com> wrote: >> >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote: >> >> >> >> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala >> >> >> <mladen.gogala@vmsinfo.com> wrote: >> >> >>> >> >> >>> There is INDEX UNIQUE SCAN PK_EMP. Oracle will use an index. >> >> >> >> >> >> That's because Oracle has covering indexes. >> >> >> >> >> > I am not sure what you mean by "covering indexes" but I >> >> hope that for >> >> > the larger table I have in mind, indexes will be used. >> >> For a small >> >> > table like >> >> >> >> In Oracle you can hit JUST the index to get the data you need (and >> >> maybe rollback logs, which are generally pretty small) >> >> >> >> In Pgsql, once you hit the index you must then hit the actual data >> >> store to get the right version of your tuple. So, index >> access in pg >> >> is more expensive than in Oracle. However, updates are cheaper. >> >> Always a trade off >> >> >> >> >> > >> > Scott, >> > What you describe here isn't about "covering indexes" - >> it's about different ways implementing MVCC in Oracle and PG. >> >> It is about covering indexes AND it's about the difference in >> how MVCC is implemented in both databases. >> >> > > Well, Mladen's query doesn't involve covering indexes. On Oracle? Then how can it get the values it needs without having to hit the data store?
On 1/27/2011 4:20 PM, Kenneth Marshall wrote: > Interesting. Can you force it to use a Seqential Scan and if so, how > does that affect the timing? i.e. Is the index scan actually faster? > > Cheers, > Ken Yes, Oracle can be forced into doing a sequential scan and it is actually faster than an index scan: SQL> set autotrace on explain SQL> with e(empno,ename,mgr,bossname,lev) as ( 2 select empno,ename,mgr,NULL,0 from emp where empno=7839 3 union all 4 select emp.empno,emp.ename,emp.mgr,e.ename,e.lev+1 5 from emp,e 6 where emp.mgr=e.empno) 7 select * from e 8 / EMPNO ENAME MGR BOSSNAME LEV ---------- ---------- ---------- ---------- ---------- 7839 KING 0 7566 JONES 7839 KING 1 7698 BLAKE 7839 KING 1 7782 CLARK 7839 KING 1 7499 ALLEN 7698 BLAKE 2 7521 WARD 7698 BLAKE 2 7654 MARTIN 7698 BLAKE 2 7788 SCOTT 7566 JONES 2 7844 TURNER 7698 BLAKE 2 7900 JAMES 7698 BLAKE 2 7902 FORD 7566 JONES 2 EMPNO ENAME MGR BOSSNAME LEV ---------- ---------- ---------- ---------- ---------- 7934 MILLER 7782 CLARK 2 7369 SMITH 7902 FORD 3 7876 ADAMS 7788 SCOTT 3 14 rows selected. Elapsed: 00:00:00.18 Execution Plan ---------------------------------------------------------- Plan hash value: 2925328376 -------------------------------------------------------------------------------- -------------------- | Id | Operation | Name | Rows | Bytes | Cos t (%CPU)| Time | -------------------------------------------------------------------------------- -------------------- | 0 | SELECT STATEMENT | | 15 | 795 | 6 (17)| 00:00:56 | | 1 | VIEW | | 15 | 795 | 6 (17)| 00:00:56 | | 2 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | | | 3 | TABLE ACCESS BY INDEX ROWID | EMP | 1 | 24 | 1 (0)| 00:00:11 | |* 4 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 0 (0)| 00:00:01 | |* 5 | HASH JOIN | | 14 | 798 | 5 (20)| 00:00:46 | | 6 | RECURSIVE WITH PUMP | | | | | | | 7 | TABLE ACCESS FULL | EMP | 14 | 336 | 3 (0)| 00:00:31 | -------------------------------------------------------------------------------- -------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("EMPNO"=7839) 5 - access("EMP"."MGR"="E"."EMPNO") Note ----- - SQL plan baseline "SQL_PLAN_1tmxjj25531vff51d791e" used for this statement SQL> SQL> with e1(empno,ename,mgr,bossname,lev) as ( 2 select /*+ full(emp) */ empno,ename,mgr,NULL,0 from emp where empno=7839 3 union all 4 select /*+ full(e2) */ 5 e2.empno,e2.ename,e2.mgr,e1.ename,e1.lev+1 6 from emp e2,e1 7 where e2.mgr=e1.empno) 8 select * from e1 9 / EMPNO ENAME MGR BOSSNAME LEV ---------- ---------- ---------- ---------- ---------- 7839 KING 0 7566 JONES 7839 KING 1 7698 BLAKE 7839 KING 1 7782 CLARK 7839 KING 1 7499 ALLEN 7698 BLAKE 2 7521 WARD 7698 BLAKE 2 7654 MARTIN 7698 BLAKE 2 7788 SCOTT 7566 JONES 2 7844 TURNER 7698 BLAKE 2 7900 JAMES 7698 BLAKE 2 7902 FORD 7566 JONES 2 EMPNO ENAME MGR BOSSNAME LEV ---------- ---------- ---------- ---------- ---------- 7934 MILLER 7782 CLARK 2 7369 SMITH 7902 FORD 3 7876 ADAMS 7788 SCOTT 3 14 rows selected. Elapsed: 00:00:00.14 Execution Plan ---------------------------------------------------------- Plan hash value: 2042363665 -------------------------------------------------------------------------------- ------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- ------------------ | 0 | SELECT STATEMENT | | 15 | 795 | 10 (10)| 00:01:36 | | 1 | VIEW | | 15 | 795 | 10 (10)| 00:01:36 | | 2 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | | | | | |* 3 | TABLE ACCESS FULL | EMP | 1 | 24 | 3 (0)| 00:00:31 | |* 4 | HASH JOIN | | 14 | 798 | 7 (15)| 00:01:06 | | 5 | RECURSIVE WITH PUMP | | | | | | | 6 | TABLE ACCESS FULL | EMP | 14 | 336 | 3 (0)| 00:00:31 | -------------------------------------------------------------------------------- ------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("EMPNO"=7839) 4 - access("E2"."MGR"="E1"."EMPNO") SQL> spool off -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
> -----Original Message----- > From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > Sent: Thursday, January 27, 2011 4:25 PM > To: Igor Neyman > Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; > pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes > > On Thu, Jan 27, 2011 at 2:18 PM, Igor Neyman > <ineyman@perceptron.com> wrote: > > > >> -----Original Message----- > >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > >> Sent: Thursday, January 27, 2011 4:16 PM > >> To: Igor Neyman > >> Cc: Mladen Gogala; Tom Lane; David Wilson; Kenneth Marshall; > >> pgsql-performance@postgresql.org > >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes > >> > >> On Thu, Jan 27, 2011 at 2:12 PM, Igor Neyman > <ineyman@perceptron.com> > >> wrote: > >> > > >> > > >> >> -----Original Message----- > >> >> From: Scott Marlowe [mailto:scott.marlowe@gmail.com] > >> >> Sent: Thursday, January 27, 2011 3:59 PM > >> >> To: Mladen Gogala > >> >> Cc: Igor Neyman; Tom Lane; David Wilson; Kenneth Marshall; > >> >> pgsql-performance@postgresql.org > >> >> Subject: Re: [PERFORM] Postgres 9.0 has a bias against indexes > >> >> > >> >> On Thu, Jan 27, 2011 at 1:44 PM, Mladen Gogala > >> >> <mladen.gogala@vmsinfo.com> wrote: > >> >> > On 1/27/2011 3:37 PM, Scott Marlowe wrote: > >> >> >> > >> >> >> On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala > >> >> >> <mladen.gogala@vmsinfo.com> wrote: > >> >> >>> > >> >> >>> There is INDEX UNIQUE SCAN PK_EMP. Oracle will use > an index. > >> >> >> > >> >> >> That's because Oracle has covering indexes. > >> >> >> > >> >> > I am not sure what you mean by "covering indexes" but I > >> >> hope that for > >> >> > the larger table I have in mind, indexes will be used. > >> >> For a small > >> >> > table like > >> >> > >> >> In Oracle you can hit JUST the index to get the data > you need (and > >> >> maybe rollback logs, which are generally pretty small) > >> >> > >> >> In Pgsql, once you hit the index you must then hit the > actual data > >> >> store to get the right version of your tuple. So, index > >> access in pg > >> >> is more expensive than in Oracle. However, updates are cheaper. > >> >> Always a trade off > >> >> > >> >> > >> > > >> > Scott, > >> > What you describe here isn't about "covering indexes" - > >> it's about different ways implementing MVCC in Oracle and PG. > >> > >> It is about covering indexes AND it's about the difference in how > >> MVCC is implemented in both databases. > >> > >> > > > > Well, Mladen's query doesn't involve covering indexes. > > On Oracle? Then how can it get the values it needs without > having to hit the data store? > > It doesn't. It does "INDEX UNIQUE SCAN" and then "TABLE ACCESS BY INDEX ROWID".
On Thu, Jan 27, 2011 at 2:32 PM, Igor Neyman <ineyman@perceptron.com> wrote: >> On Oracle? Then how can it get the values it needs without >> having to hit the data store? > > It doesn't. > It does "INDEX UNIQUE SCAN" and then "TABLE ACCESS BY INDEX ROWID". Ahhh, ok. I thought Oracle used covering indexes by default.
On 1/27/2011 4:25 PM, Scott Marlowe wrote: > On Oracle? Then how can it get the values it needs without having to > hit the data store? It can't. It does hit the data store. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com
Mladen Gogala <mladen.gogala@vmsinfo.com> wrote: > Yes, Oracle can be forced into doing a sequential scan and it is > actually faster than an index scan: And PostgreSQL can be coerced to use an indexed scan. Its plans are cost-based, with user configurable cost factors; so if you tell it that seq_page_cost and random_page_cost are both equal to some really low value (like 0.001), you'll get an index scan. Part of the process of tuning PostgreSQL is to discover the relative *actual* costs on *your environment* (which is largely dependent on the degree of caching of the active portion of your database). When you get your costing factors to approximate reality, the optimizer will do a pretty good job of picking the fastest plan. -Kevin