Thread: indexes ignored when querying the master table

indexes ignored when querying the master table

From
Thomas Hägi
Date:
i have around 25mio records of data distributed yearly over 9 child
tables (data.logs_20xx) that inherit from the master table data.logs.
the tables are partitioned using the field "re_timestamp", which has
btree indexes defined on all tables.

the query "SELECT * FROM data.logs ORDER BY re_timestamp DESC LIMIT 100"
does use seq scans on all tables instead of using the existing indexes
which takes ages. when issuing the the same query to one of the child
tables directly ("SELECT * FROM data.logs_2011 ORDER BY re_timestamp
DESC LIMIT 100") the index is used as expected and the data returned
quickly.

how can i get postgres to use the indexes when querying the master table?

please find below the EXPLAIN ANALYZE output for both queries on my
development machine (pgsql 9.0 x64 on windows 7).

thanks in advance,
thomas


EXPLAIN ANALYZE SELECT * FROM data.logs
ORDER BY re_timestamp DESC LIMIT 100;

Limit  (cost=6331255.90..6331256.15 rows=100 width=1388) (actual
time=1592287.794..1592287.808 rows=100 loops=1)
   ->  Sort  (cost=6331255.90..6395928.37 rows=25868986 width=1388)
(actual time=1592287.789..1592287.796 rows=100 loops=1)
         Sort Key: data.logs.re_timestamp
         Sort Method:  top-N heapsort  Memory: 217kB
         ->  Result  (cost=0.00..5342561.86 rows=25868986 width=1388)
(actual time=0.026..1466420.868 rows=25870101 loops=1)
               ->  Append  (cost=0.00..5342561.86 rows=25868986
width=1388) (actual time=0.020..1417490.892 rows=25870101 loops=1)
                     ->  Seq Scan on logs  (cost=0.00..10.40 rows=40
width=1776) (actual time=0.002..0.002 rows=0 loops=1)
                     ->  Seq Scan on logs_2011 logs
(cost=0.00..195428.00 rows=904800 width=1449) (actual
time=0.017..92381.769 rows=904401 loops=1)
                     ->  Seq Scan on logs_2010 logs
(cost=0.00..759610.67 rows=3578567 width=1426) (actual
time=23.996..257612.143 rows=3579586 loops=1)
                     ->  Seq Scan on logs_2009 logs
(cost=0.00..841998.35 rows=3987235 width=1423) (actual
time=12.921..200385.903 rows=3986473 loops=1)
                     ->  Seq Scan on logs_2008 logs
(cost=0.00..942810.60 rows=4409860 width=1444) (actual
time=18.861..226867.499 rows=4406653 loops=1)
                     ->  Seq Scan on logs_2007 logs
(cost=0.00..730863.69 rows=3600569 width=1359) (actual
time=14.406..174082.413 rows=3603739 loops=1)
                     ->  Seq Scan on logs_2006 logs
(cost=0.00..620978.29 rows=3089929 width=1348) (actual
time=21.647..147244.677 rows=3091214 loops=1)
                     ->  Seq Scan on logs_2005 logs
(cost=0.00..486928.59 rows=2440959 width=1342) (actual
time=0.005..126479.314 rows=2438968 loops=1)
                     ->  Seq Scan on logs_2004 logs
(cost=0.00..402991.92 rows=2031092 width=1327) (actual
time=23.007..98531.883 rows=2034041 loops=1)
                     ->  Seq Scan on logs_2003 logs
(cost=0.00..360941.35 rows=1825935 width=1325) (actual
time=20.220..91773.705 rows=1825026 loops=1)
Total runtime: 1592293.267 ms


EXPLAIN ANALYZE SELECT * FROM data.logs_2011
ORDER BY re_timestamp DESC LIMIT 100;

Limit  (cost=0.00..22.65 rows=100 width=1449) (actual
time=59.161..60.226 rows=100 loops=1)
   ->  Index Scan Backward using logs_fts_2011_timestamp_idx on
logs_2011  (cost=0.00..204919.30 rows=904800 width=1449) (actual
time=59.158..60.215 rows=100 loops=1)
Total runtime: 60.316 ms

Re: indexes ignored when querying the master table

From
Florian Weimer
Date:
* Thomas Hägi:

> how can i get postgres to use the indexes when querying the master
> table?

I believe that this is a new feature in PostgreSQL 9.1 ("Allow
inheritance table queries to return meaningfully-sorted results").

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: indexes ignored when querying the master table

From
Greg Smith
Date:
On 05/06/2011 05:13 PM, Thomas Hägi wrote:
> the query "SELECT * FROM data.logs ORDER BY re_timestamp DESC LIMIT
> 100" does use seq scans on all tables instead of using the existing
> indexes which takes ages. when issuing the the same query to one of
> the child tables directly ("SELECT * FROM data.logs_2011 ORDER BY
> re_timestamp DESC LIMIT 100") the index is used as expected and the
> data returned quickly.
>

Let's see, cut and paste
http://archives.postgresql.org/message-id/4DB8CE7D.8030503@2ndquadrant.com
and:

This is probably the limitation that's fixed in PostgreSQL 9.1 by this
commit (following a few others leading up to it):
http://archives.postgresql.org/pgsql-committers/2010-11/msg00028.php

There was a good example showing what didn't work as expected before
(along with an earlier patch that didn't everything the larger 9.1
improvement does) at
http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php ;
"ORDER BY x DESC LIMIT 1" returns the same things as MAX(x).

It's a pretty serious issue with the partitioning in earlier versions. I
know of multiple people, myself included, who have been compelled to
apply this change to an earlier version of PostgreSQL to make larger
partitioned databases work correctly. The other option is to manually
decompose the queries into ones that target each of the child tables
individually, then combine the results, which is no fun either.

(Am thinking about a documentation backpatch pointing out this limitation)


--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: indexes ignored when querying the master table

From
Thomas Hägi
Date:
hi florian

sorry for the late reply - it took almost a day to dump & reload the
data into 9.1b1.

>> how can i get postgres to use the indexes when querying the master
>> table?
>
> I believe that this is a new feature in PostgreSQL 9.1 ("Allow
> inheritance table queries to return meaningfully-sorted results").

you are right, pgsql 9.1 indeed makes use of the indexes now:

EXPLAIN ANALYZE SELECT * FROM data.logs
ORDER BY re_timestamp DESC LIMIT 100;
--------
Limit  (cost=11.63..36.45 rows=100 width=1390) (actual time=0.169..0.639
rows=100 loops=1)
   ->  Result  (cost=11.63..6421619.07 rows=25870141 width=1390) (actual
time=0.154..0.610 rows=100 loops=1)
         ->  Merge Append  (cost=11.63..6421619.07 rows=25870141
width=1390) (actual time=0.150..0.429 rows=100 loops=1)
               Sort Key: data.logs.re_timestamp
               ->  Sort  (cost=11.46..11.56 rows=40 width=1776) (actual
time=0.014..0.014 rows=0 loops=1)
                     Sort Key: data.logs.re_timestamp
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on logs  (cost=0.00..10.40 rows=40
width=1776) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Index Scan Backward using logs_2003_timestamp_idx on
logs_2003 logs  (cost=0.00..373508.47 rows=1825026 width=1327) (actual
time=0.026..0.026 rows=1 loops=1)
               ->  Index Scan Backward using logs_2004_timestamp_idx on
logs_2004 logs  (cost=0.00..417220.55 rows=2034041 width=1327) (actual
time=0.012..0.012 rows=1 loops=1)
               ->  Index Scan Backward using logs_2005_timestamp_idx on
logs_2005 logs  (cost=0.00..502664.57 rows=2438968 width=1345) (actual
time=0.015..0.015 rows=1 loops=1)
               ->  Index Scan Backward using logs_2006_timestamp_idx on
logs_2006 logs  (cost=0.00..640419.01 rows=3091214 width=1354) (actual
time=0.015..0.015 rows=1 loops=1)
               ->  Index Scan Backward using logs_2007_timestamp_idx on
logs_2007 logs  (cost=0.00..752875.00 rows=3603739 width=1369) (actual
time=0.009..0.009 rows=1 loops=1)
               ->  Index Scan Backward using logs_2008_timestamp_idx on
logs_2008 logs  (cost=0.00..969357.51 rows=4406653 width=1440) (actual
time=0.007..0.007 rows=1 loops=1)
               ->  Index Scan Backward using logs_2009_timestamp_idx on
logs_2009 logs  (cost=0.00..862716.39 rows=3986473 width=1422) (actual
time=0.016..0.016 rows=1 loops=1)
               ->  Index Scan Backward using logs_2010_timestamp_idx on
logs_2010 logs  (cost=0.00..778529.29 rows=3579586 width=1426) (actual
time=0.009..0.009 rows=1 loops=1)
               ->  Index Scan Backward using logs_2011_timestamp_idx on
logs_2011 logs  (cost=0.00..200253.71 rows=904401 width=1453) (actual
time=0.006..0.089 rows=100 loops=1)
Total runtime: 1.765 ms


thanks for your help,
thomas