Re: FW: Constraint exclusion in partitions - Mailing list pgsql-general

From Daniel Begin
Subject Re: FW: Constraint exclusion in partitions
Date
Msg-id COL129-DS21BD01ACEF9E20F097F04594CD0@phx.gbl
Whole thread Raw
In response to FW: Constraint exclusion in partitions  (Daniel Begin <jfd553@hotmail.com>)
Responses Re: FW: Constraint exclusion in partitions  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general
Thank for your patience :-)

- About using PgAdmin, anecdotal problems or not, I did the whole tests again in plain postgresql.

- About running queries once or not, Bill and Francisco both pointed out somehow that I should run each query multiple
timesto get appropriate statistics. I did it for all queries - First trial always longer, all other stabilise around
thesame values. The EXPLAIN ANALYSE for first and second trial on each query I ran on original table and on the
partitionedone can be found below. 

However, in my case, I will have to run most of my queries only once since I simply need to extract sample data for a
researchtopic - there is no insert/update in the DB (that is why I thought looking at first trial was more
appropriate). 

- About adding the exclusion check constraint, thank for remembering me such a simple thing that could have caused all
this!-)but sadly, it was set to "partition", as expected. However, I have decided to run all the queries after having
setthe parameter to ON and restarted the database, just in case. 

Even after doing all this, I did not find any improvement in execution times between my original fat table and the
partitionedversion (sometime even worst). If partitioning the table has improved significantly queries running times, I
couldhave partitioned the tables differently to accommodate other query types I will have to run later in my research
(Ihave the same problem for half a dozen tables).  

Since it does not seem that partitioning will do the job, I will get back to the original table to run my queries...

However, just in case someone knows a magical trick that can improve significantly the speed of my queries (but haven't
toldme yet!-) here are the details about the concerned table/indexes  

(Using https://wiki.postgresql.org/wiki/Index_Maintenance query ...)
Number of records: 3870130000
Table size: 369GB
Indexes size: 425GB
 - nodes_idversion_pk: 125GB
 - nodes_changesetid_idx: 86GB
 - nodes_geom_idx: 241GB

Each record has 3 bigint, 2 boolean, 1 timestamp and 1 geography type.

I am running all this on my personal PC:  Windows 64b, i7 chip, 16GB ram.
I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB external drives with write caching.

Best regards,
Daniel


Results/explain/analyse follow...

--Constant
ids-------------------------------------------------------------------------------------------------------------------------------------------------
--Explain analyse on original table for a query that will look into different partitions on the new table
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual
time=52.226..288.700rows=6 loops=1) 
   Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 288.732 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.035
rows=6loops=1) 
   Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 0.056 ms

-- Explain analyse on partitioned table for a query that will look into different partitions
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
 Append  (cost=0.00..933.40 rows=223 width=66) (actual time=108.903..287.068 rows=6 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.57..622.78 rows=156 width=66) (actual
time=108.900..108.916rows=1 loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes38_idversion_pk on nodes_38  (cost=0.57..138.25 rows=31 width=66) (actual
time=89.523..89.543rows=1 loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes63_idversion_pk on nodes_63  (cost=0.57..119.01 rows=26 width=66) (actual
time=49.978..49.998rows=3 loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.57..53.37 rows=9 width=66) (actual
time=38.600..38.603rows=1 loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 287.144 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
 Append  (cost=0.00..933.40 rows=223 width=66) (actual time=0.012..0.065 rows=6 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.57..622.78 rows=156 width=66) (actual
time=0.010..0.017rows=1 loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes38_idversion_pk on nodes_38  (cost=0.57..138.25 rows=31 width=66) (actual
time=0.010..0.015rows=1 loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes63_idversion_pk on nodes_63  (cost=0.57..119.01 rows=26 width=66) (actual
time=0.012..0.016rows=3 loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.57..53.37 rows=9 width=66) (actual time=0.013..0.013
rows=1loops=1) 
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 0.125 ms

--Explain analyse on original table for a query that will look into one partition on the new table
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(723005000,733005000,743005000,753005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual
time=37.366..158.445rows=4 loops=1) 
   Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 158.479 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(723005000,733005000,743005000,753005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.032
rows=4loops=1) 
   Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 0.054 ms

--Explain analyse on partitioned table for a query that will look into one partition
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(723005000,733005000,743005000,753005000);
 Append  (cost=0.00..196.84 rows=47 width=66) (actual time=163.898..441.497 rows=4 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
   ->  Index Scan using nodes31_idversion_pk on nodes_31  (cost=0.57..196.84 rows=46 width=66) (actual
time=163.894..441.491rows=4 loops=1) 
         Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 441.549 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(723005000,733005000,743005000,753005000);
 Append  (cost=0.00..196.84 rows=47 width=66) (actual time=0.011..0.027 rows=4 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
   ->  Index Scan using nodes31_idversion_pk on nodes_31  (cost=0.57..196.84 rows=46 width=66) (actual
time=0.009..0.025rows=4 loops=1) 
         Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 0.062 ms


--Range
ids-------------------------------------------------------------------------------------------------------------------------------------------------
--Explain analyse on original table for a query that will look into different partitions on the new table
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id BETWEEN 1522999949 AND 1523000049;
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..383.51 rows=144 width=66) (actual time=73.115..180.769
rows=53loops=1) 
   Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 180.820 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id BETWEEN 1522999949 AND 1523000049;
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..383.51 rows=144 width=66) (actual time=0.020..0.039
rows=53loops=1) 
   Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 0.060 ms

--Explain analyse on partitioned table for a query that will look into different partitions
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id BETWEEN 1522999949 AND 1523000049;
 Append  (cost=0.00..408.16 rows=104 width=66) (actual time=0.014..46.196 rows=53 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes51_idversion_pk on nodes_51  (cost=0.56..183.52 rows=46 width=66) (actual
time=0.012..20.216rows=18 loops=1) 
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes52_idversion_pk on nodes_52  (cost=0.56..224.64 rows=57 width=66) (actual
time=0.022..25.973rows=35 loops=1) 
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 46.254 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id BETWEEN 1522999949 AND 1523000049;
 Append  (cost=0.00..408.16 rows=104 width=66) (actual time=0.010..0.038 rows=53 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes51_idversion_pk on nodes_51  (cost=0.56..183.52 rows=46 width=66) (actual
time=0.008..0.015rows=18 loops=1) 
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes52_idversion_pk on nodes_52  (cost=0.56..224.64 rows=57 width=66) (actual
time=0.006..0.017rows=35 loops=1) 
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 0.081 ms


--Select
ids-------------------------------------------------------------------------------------------------------------------------------------------------
--Explain analyse on original table for a query that will look into one partition on the new table but list of ids
providedthrough a select statement 
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(SELECT * FROM subset);
 Nested Loop  (cost=99.93..884823.94 rows=2028512050 width=66) (actual time=97.489..2289.772 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=2.155..3.649 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.018..0.581 rows=5978 loops=1)
   ->  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4406.68 rows=1693 width=66) (actual
time=0.384..0.384rows=1 loops=5941) 
         Index Cond: (id = subset.id)
 Total runtime: 2290.122 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(SELECT * FROM subset);
 Nested Loop  (cost=99.93..884823.94 rows=2028512050 width=66) (actual time=1.785..25.730 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=1.767..2.661 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.009..0.373 rows=5978 loops=1)
   ->  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4406.68 rows=1693 width=66) (actual
time=0.003..0.003rows=1 loops=5941) 
         Index Cond: (id = subset.id)
 Total runtime: 26.005 ms

--Explain analyse on partitioned table for a query that will look into one partition but list of ids provided through a
selectstatement 
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(SELECT * FROM subset);
 Nested Loop  (cost=99.22..1403193.39 rows=1935067087 width=66) (actual time=12146.666..19140.901 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=1.998..4.496 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.010..0.390 rows=5978 loops=1)
   ->  Append  (cost=0.00..6997.97 rows=1749 width=66) (actual time=2.925..3.214 rows=1 loops=5941)
         ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.000..0.000 rows=0 loops=5941)
               Filter: (subset.id = id)
         ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.56..151.70 rows=39 width=66) (actual
time=0.013..0.013rows=0 loops=5941) 
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes02_idversion_pk on nodes_02  (cost=0.56..219.02 rows=57 width=66) (actual
time=0.012..0.012rows=0 loops=5941) 
               Index Cond: (id = subset.id)
-- skipped for nodes_03 to nodes_84
         ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.56..12.33 rows=2 width=66) (actual
time=0.036..0.036rows=0 loops=5941) 
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes86_idversion_pk on nodes_86  (cost=0.56..12.33 rows=2 width=66) (actual
time=0.688..0.704rows=1 loops=5941) 
               Index Cond: (id = subset.id)
 Total runtime: 19142.983 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(SELECT * FROM subset);
 Nested Loop  (cost=99.22..1403193.39 rows=1935067087 width=66) (actual time=2.282..1382.156 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=1.834..4.327 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.009..0.376 rows=5978 loops=1)
   ->  Append  (cost=0.00..6997.97 rows=1749 width=66) (actual time=0.225..0.226 rows=1 loops=5941)
         ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.000..0.000 rows=0 loops=5941)
               Filter: (subset.id = id)
         ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.56..151.70 rows=39 width=66) (actual
time=0.003..0.003rows=0 loops=5941 
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes02_idversion_pk on nodes_02  (cost=0.56..219.02 rows=57 width=66) (actual
time=0.002..0.002rows=0 loops=5941 
               Index Cond: (id = subset.id)
-- skipped for nodes_03 to nodes_84
         ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.56..12.33 rows=2 width=66) (actual
time=0.002..0.002rows=0 loops=5941) 
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes86_idversion_pk on nodes_86  (cost=0.56..12.33 rows=2 width=66) (actual
time=0.004..0.004rows=1 loops=5941) 
               Index Cond: (id = subset.id)
 Total runtime: 1383.929 ms



pgsql-general by date:

Previous
From: "Peter J. Holzer"
Date:
Subject: Re: Queries for unused/useless indexes
Next
From: Francisco Reyes
Date:
Subject: MD5 password storage - should be the same everywhere?