Re: FW: Constraint exclusion in partitions - Mailing list pgsql-general

From Francisco Olarte
Subject Re: FW: Constraint exclusion in partitions
Date
Msg-id CA+bJJbzGvwsq_i62SzryB8G+b75WydE+wvLwfz=fVgQYo9W4pA@mail.gmail.com
Whole thread Raw
In response to FW: Constraint exclusion in partitions  (Daniel Begin <jfd553@hotmail.com>)
Responses Re: FW: Constraint exclusion in partitions
List pgsql-general
Hi Daniel

On Fri, May 22, 2015 at 7:21 PM, Daniel Begin <jfd553@hotmail.com> wrote:
> I have split a large table (billions of records) into multiple partitions,
> hoping the access would be faster. I used an ID to make partitions check
> (check (id >= 100 AND id < 200)…) and created over 80 tables (children) that
> are now filled with data.
...
> I understand that the following query will use constraint exclusion and will
> run faster…
> a-      Select * from parent_table where id >=9999; -- using a constant
> But how constraint exclusion would react with the following queries …
> b-      Select * from parent_table where id between 2345 and 6789; -- using
> a range of ids
> c-       Select * from parent_table where id in(select ids from
> anothertable); -- using a list of ids from a select

Given you have already partitioned it, why don't you just use explain
[ analyze ] on the queries? I.e., in one of my tables, partitioned
monthly by a timestamp ( with time zone ) field I get ( even if I
never use between, a closed interval, for a continuous like type like
timestamp, the optimizer clearly shows it's transfroaming it to the
equivalent AND condition ):


explain select * from carrier_cdrs where setup between
'20150107T123456' and '20150322T222222';


QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..82202.41 rows=2346599 width=74)
   ->  Seq Scan on carrier_cdrs  (cost=0.00..0.00 rows=1 width=184)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
   ->  Seq Scan on carrier_cdrs_201501  (cost=0.00..30191.10
rows=816551 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
   ->  Seq Scan on carrier_cdrs_201502  (cost=0.00..25277.45
rows=872830 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
   ->  Seq Scan on carrier_cdrs_201503  (cost=0.00..26733.85
rows=657217 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
(9 rows)

> Since I mostly use queries of type b and c, I am wondering if partitioning
> the large table was appropriate and if the queries are going to be longer to
> run…

The problem is gonna be all the extra conditions, so I'll check real
queries. My bet is in a query EXACTLY like b it will use constraint
exclusion, and can be potentially faster if your interval are for just
100 ids, but why speculate when YOU can measure?

Also think if you touch 67-23+1=45 partitions and the DB has to check
other indexed fields it is a lot of work. As I said, it depend on your
actual data, actual indexes, and actual queries, just measure it. In
my excample table I partition the data by the TS, which is the single
indexed field and my tests showed it was faster that way ( but my
queries are normally big scans of date ranges or more selective ones
with narrow TS conditions, and I measured them ).

Francisco Olarte.


pgsql-general by date:

Previous
From: Arjen Nienhuis
Date:
Subject: Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
Next
From: twoflower
Date:
Subject: Server tries to read a different config file than it is supposed to