Thread: Partitioning

Partitioning

From

James Sewell

Date:

19 January 2015, 01:59:09

Hello,

I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.

Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?

I want to select all rows which have an id which is in another query, so something like:

WITH idlist as (SELECT id from othertable)

SELECT id from mastertable WHERE id = idlist.id);

I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?

Any tricks I am overlooking?

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000 W www.lisasoft.com F (+61) 3 8370 8099

The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Partitioning

From

John R Pierce

Date:

19 January 2015, 02:55:04

On 1/18/2015 5:58 PM, James Sewell wrote:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

select mt.id, ... from mastertable mt join othertable ot on mt.id=ot.id;

might optimize better.

-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Partitioning

From

James Sewell

Date:

19 January 2015, 03:13:42

Sadly not ... I still hit all the tables.

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000 W www.lisasoft.com F (+61) 3 8370 8099

On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:

On 1/18/2015 5:58 PM, James Sewell wrote:
WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

select mt.id, ... from mastertable mt join othertable ot on mt.id=ot.id;

might optimize better.

-- john r pierce 37N 122W somewhere on the middle of the left coast

Re: Partitioning

From

Kyotaro HORIGUCHI

Date:

19 January 2015, 06:11:05

Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in
<CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>
> Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.

> On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>
> >  On 1/18/2015 5:58 PM, James Sewell wrote:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> >
> >
> >     select mt.id, ...   from mastertable mt join othertable ot on mt.id=
> > ot.id;
> >
> > might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Partitioning

From

James Sewell

Date:

19 January 2015, 07:13:12

Hello,

Thanks for the reply. I can write queries which get index scans, but they are still slow.

Each index is about 2.5GB, I suspect I am trying to read a these into memory in entirety.

Perhaps there is no way to tune this?

Cheers, james

On Monday, 19 January 2015, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in <CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>
> Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.

> On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>
> > On 1/18/2015 5:58 PM, James Sewell wrote:
> >
> > WITH idlist as (SELECT id from othertable)
> > SELECT id from mastertable WHERE id = idlist.id);
> >
> >
> >
> > select mt.id, ... from mastertable mt join othertable ot on mt.id=
> > ot.id;
> >
> > might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000 W www.lisasoft.com F (+61) 3 8370 8099

Re: Partitioning

From

John R Pierce

Date:

19 January 2015, 07:23:28

On 1/18/2015 11:13 PM, James Sewell wrote:
>
> Each index is about 2.5GB, I suspect I am trying to read a these into
> memory in entirety.

an 11GB table with a (presumably integer) primary key requires an 2.5GB
index ?      100 of these would need 250GB of shared_buffers to stay
resident, not likely.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Partitioning

From

James Sewell

Date:

19 January 2015, 07:59:20

Yeah definitely not staying resident once read, although the machine does gave 256GB of memory so some will persist in the OS cache.

Actually this brings up another question, if I have an unique integer index of 2.gb what percentage would I expect to read for a value that was higher or lower than all index values?

Cheers,

James

On Monday, 19 January 2015, John R Pierce <pierce@hogranch.com> wrote:

On 1/18/2015 11:13 PM, James Sewell wrote:

Each index is about 2.5GB, I suspect I am trying to read a these into memory in entirety.

an 11GB table with a (presumably integer) primary key requires an 2.5GB index ? 100 of these would need 250GB of shared_buffers to stay resident, not likely.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000 W www.lisasoft.com F (+61) 3 8370 8099

Re: Partitioning

From

John R Pierce

Date:

19 January 2015, 08:15:15

On 1/18/2015 11:59 PM, James Sewell wrote:

Actually this brings up another question, if I have an unique integer index of 2.gb what percentage would I expect to read for a value that was higher or lower than all index values?

a couple 8k blocks. its a b-tree.

-- 
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: Partitioning

From

Kyotaro HORIGUCHI

Date:

19 January 2015, 08:43:15

Hi,

At Mon, 19 Jan 2015 00:14:55 -0800, John R Pierce <pierce@hogranch.com> wrote in <54BCBCFF.5040100@hogranch.com>
> On 1/18/2015 11:59 PM, James Sewell wrote:
> >
> > Actually this brings up another question, if I have an unique integer
> > index of 2.gb <http://2.gb> what percentage would I expect to read for
> > a value that was higher or lower than all index values?
>
> a couple 8k blocks.  its a b-tree.

Yeah, scanning children reading highest/lowest values for each of
them would make it faster than the first single query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Partitioning

From

Kyotaro HORIGUCHI

Date:

19 January 2015, 11:08:53

Sorry,

> Yeah, scanning children reading highest/lowest values for each of them would make it faster than the first single query.

Mmm..no. It has nothing different from querieng on the parent table. Please Ignore the silly thing.

--
Kyotaro Horiguchi

Re: Partitioning

From

François Beausoleil

Date:

19 January 2015, 11:58:36

Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :

Hello,

I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.

Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?

I want to select all rows which have an id which is in another query, so something like:

WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);

I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?

How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded in the query.

If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in multiple queries.

Hope that helps!

François Beausoleil

Re: Partitioning

From

Kyotaro HORIGUCHI

Date:

20 January 2015, 01:13:38

Hi,

19 Jan 2015 06:58:21 -0500, François Beausoleil <francois@teksol.info> wrote in
<CC2FD572-320A-4225-B98C-48F20992527C@teksol.info>
>
> > Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :
> >
> > Hello,
> >
> > I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id'
column.I have an index on the id column on each sub-table. 
> >
> > Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do
index/sequentialscans of every sub-table? 
> >
> > I want to select all rows which have an id which is in another query, so something like:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> > I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my
subqueryat plan time? 
> >
> How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded
inthe query. 
>
> If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in
multiplequeries. 

The strategy's effectiveness is depends mainly on how many ids
come from the othertable. It wins if fewer than certain number or
converged in a few partitions, however, straight joins will win
elsewise.

The result of EXPLAIN ANALYZE might draw more precise advices.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Partitioning

From

Kyotaro HORIGUCHI

Date:

18 February 2015, 20:12:25

Hello,

an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in
<CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>
> Sadly not ... I still hit all the tables.

| 5.9.4. Partitioning and Constraint Exclusion

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html

Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).

Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.

> On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>
> >  On 1/18/2015 5:58 PM, James Sewell wrote:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> >
> >
> >     select mt.id, ...   from mastertable mt join othertable ot on mt.id=
> > ot.id;
> >
> > might optimize better.

As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.

For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Partitioning

From

Kyotaro HORIGUCHI

Date:

18 February 2015, 20:22:41

Hi,

At Mon, 19 Jan 2015 00:14:55 -0800, John R Pierce <pierce@hogranch.com> wrote in <54BCBCFF.5040100@hogranch.com>
> On 1/18/2015 11:59 PM, James Sewell wrote:
> >
> > Actually this brings up another question, if I have an unique integer
> > index of 2.gb <http://2.gb> what percentage would I expect to read for
> > a value that was higher or lower than all index values?
>
> a couple 8k blocks.  its a b-tree.

Yeah, scanning children reading highest/lowest values for each of
them would make it faster than the first single query.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Partitioning

From

Kyotaro HORIGUCHI

Date:

18 February 2015, 20:22:58

Hi,

19 Jan 2015 06:58:21 -0500, François Beausoleil <francois@teksol.info> wrote in
<CC2FD572-320A-4225-B98C-48F20992527C@teksol.info>
>
> > Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit :
> >
> > Hello,
> >
> > I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id'
column.I have an index on the id column on each sub-table. 
> >
> > Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do
index/sequentialscans of every sub-table? 
> >
> > I want to select all rows which have an id which is in another query, so something like:
> >
> >   WITH idlist as (SELECT id from othertable)
> >     SELECT id from mastertable WHERE id = idlist.id);
> >
> > I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my
subqueryat plan time? 
> >
> How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded
inthe query. 
>
> If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in
multiplequeries. 

The strategy's effectiveness is depends mainly on how many ids
come from the othertable. It wins if fewer than certain number or
converged in a few partitions, however, straight joins will win
elsewise.

The result of EXPLAIN ANALYZE might draw more precise advices.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general