Thread: Partitioning
Hello,
I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.
Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?
I want to select all rows which have an id which is in another query, so something like:
WITH idlist as (SELECT id from othertable)
SELECT id from mastertable WHERE id = idlist.id);
I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?
Any tricks I am overlooking?
James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
PostgreSQL Team Lead / Solutions Architect
______________________________________
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
On 1/18/2015 5:58 PM, James Sewell wrote:
WITH idlist as (SELECT id from othertable)SELECT id from mastertable WHERE id = idlist.id);
select mt.id, ... from mastertable mt join othertable ot on mt.id=ot.id;
might optimize better.
-- john r pierce 37N 122W somewhere on the middle of the left coast
Sadly not ... I still hit all the tables.
Cheers,
James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Hello, an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in <CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com> > Sadly not ... I still hit all the tables. | 5.9.4. Partitioning and Constraint Exclusion http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html Constraint exclusion is a mechanism to omit tables that are known to have no hit by the query *beforehand* execution. So the criteria cannot rely on out of the query itself (and CHECK constraints, of course). Your query uses the result of the WITH-clause-query in the WHERE clause which is unknown to the planner so constraint exclusion does not work. JOINs don't change the situation. > On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote: > > > On 1/18/2015 5:58 PM, James Sewell wrote: > > > > WITH idlist as (SELECT id from othertable) > > SELECT id from mastertable WHERE id = idlist.id); > > > > > > > > select mt.id, ... from mastertable mt join othertable ot on mt.id= > > ot.id; > > > > might optimize better. As the result, the query inevitably scans all the tables, but not necessariry in sequqntial scans or simple index scans. The suggestion above seeems showing the notation which the planner can find the better plans on that premise. For example, if you have an index on id of one of the two tables, (and some other conditions match, of course) index only scan will be selected for it and the suggested query will give you a seemingly better plan than your query. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Hello,
On Monday, 19 January 2015, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
--
James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Thanks for the reply. I can write queries which get index scans, but they are still slow.
Each index is about 2.5GB, I suspect I am trying to read a these into memory in entirety.
Perhaps there is no way to tune this?
Cheers, james
On Monday, 19 January 2015, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,
an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in <CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com>
> Sadly not ... I still hit all the tables.
| 5.9.4. Partitioning and Constraint Exclusion
http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html
Constraint exclusion is a mechanism to omit tables that are known
to have no hit by the query *beforehand* execution. So the
criteria cannot rely on out of the query itself (and CHECK
constraints, of course).
Your query uses the result of the WITH-clause-query in the WHERE
clause which is unknown to the planner so constraint exclusion
does not work. JOINs don't change the situation.
> On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote:
>
> > On 1/18/2015 5:58 PM, James Sewell wrote:
> >
> > WITH idlist as (SELECT id from othertable)
> > SELECT id from mastertable WHERE id = idlist.id);
> >
> >
> >
> > select mt.id, ... from mastertable mt join othertable ot on mt.id=
> > ot.id;
> >
> > might optimize better.
As the result, the query inevitably scans all the tables, but not
necessariry in sequqntial scans or simple index scans. The
suggestion above seeems showing the notation which the planner
can find the better plans on that premise.
For example, if you have an index on id of one of the two tables,
(and some other conditions match, of course) index only scan will
be selected for it and the suggested query will give you a
seemingly better plan than your query.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
On 1/18/2015 11:13 PM, James Sewell wrote: > > Each index is about 2.5GB, I suspect I am trying to read a these into > memory in entirety. an 11GB table with a (presumably integer) primary key requires an 2.5GB index ? 100 of these would need 250GB of shared_buffers to stay resident, not likely. -- john r pierce 37N 122W somewhere on the middle of the left coast
Yeah definitely not staying resident once read, although the machine does gave 256GB of memory so some will persist in the OS cache.
--
James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Actually this brings up another question, if I have an unique integer index of 2.gb what percentage would I expect to read for a value that was higher or lower than all index values?
Cheers,
James
On Monday, 19 January 2015, John R Pierce <pierce@hogranch.com> wrote:
On Monday, 19 January 2015, John R Pierce <pierce@hogranch.com> wrote:
On 1/18/2015 11:13 PM, James Sewell wrote:
Each index is about 2.5GB, I suspect I am trying to read a these into memory in entirety.
an 11GB table with a (presumably integer) primary key requires an 2.5GB index ? 100 of these would need 250GB of shared_buffers to stay resident, not likely.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
On 1/18/2015 11:59 PM, James Sewell wrote:
Actually this brings up another question, if I have an unique integer index of 2.gb what percentage would I expect to read for a value that was higher or lower than all index values?
a couple 8k blocks. its a b-tree.
-- john r pierce 37N 122W somewhere on the middle of the left coast
Hi, At Mon, 19 Jan 2015 00:14:55 -0800, John R Pierce <pierce@hogranch.com> wrote in <54BCBCFF.5040100@hogranch.com> > On 1/18/2015 11:59 PM, James Sewell wrote: > > > > Actually this brings up another question, if I have an unique integer > > index of 2.gb <http://2.gb> what percentage would I expect to read for > > a value that was higher or lower than all index values? > > a couple 8k blocks. its a b-tree. Yeah, scanning children reading highest/lowest values for each of them would make it faster than the first single query. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Sorry,
> Yeah, scanning children reading highest/lowest values for each of them would make it faster than the first single query.
Mmm..no. It has nothing different from querieng on the parent table. Please Ignore the silly thing.
--
Kyotaro Horiguchi
How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded in the query.Hello,I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column. I have an index on the id column on each sub-table.Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequential scans of every sub-table?I want to select all rows which have an id which is in another query, so something like:WITH idlist as (SELECT id from othertable)SELECT id from mastertable WHERE id = idlist.id);I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subquery at plan time?
If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in multiple queries.
Hope that helps!
François Beausoleil
Hi, 19 Jan 2015 06:58:21 -0500, François Beausoleil <francois@teksol.info> wrote in <CC2FD572-320A-4225-B98C-48F20992527C@teksol.info> > > > Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit : > > > > Hello, > > > > I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column.I have an index on the id column on each sub-table. > > > > Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequentialscans of every sub-table? > > > > I want to select all rows which have an id which is in another query, so something like: > > > > WITH idlist as (SELECT id from othertable) > > SELECT id from mastertable WHERE id = idlist.id); > > > > I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subqueryat plan time? > > > How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded inthe query. > > If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in multiplequeries. The strategy's effectiveness is depends mainly on how many ids come from the othertable. It wins if fewer than certain number or converged in a few partitions, however, straight joins will win elsewise. The result of EXPLAIN ANALYZE might draw more precise advices. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Hello, an 2015 14:13:37 +1100, James Sewell <james.sewell@lisasoft.com> wrote in <CANkGpBs8GypQ3TQGKdjTD+n-w1rkq5uO97h3tuhg5eWaKR6RbA@mail.gmail.com> > Sadly not ... I still hit all the tables. | 5.9.4. Partitioning and Constraint Exclusion http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html Constraint exclusion is a mechanism to omit tables that are known to have no hit by the query *beforehand* execution. So the criteria cannot rely on out of the query itself (and CHECK constraints, of course). Your query uses the result of the WITH-clause-query in the WHERE clause which is unknown to the planner so constraint exclusion does not work. JOINs don't change the situation. > On Mon, Jan 19, 2015 at 1:54 PM, John R Pierce <pierce@hogranch.com> wrote: > > > On 1/18/2015 5:58 PM, James Sewell wrote: > > > > WITH idlist as (SELECT id from othertable) > > SELECT id from mastertable WHERE id = idlist.id); > > > > > > > > select mt.id, ... from mastertable mt join othertable ot on mt.id= > > ot.id; > > > > might optimize better. As the result, the query inevitably scans all the tables, but not necessariry in sequqntial scans or simple index scans. The suggestion above seeems showing the notation which the planner can find the better plans on that premise. For example, if you have an index on id of one of the two tables, (and some other conditions match, of course) index only scan will be selected for it and the suggested query will give you a seemingly better plan than your query. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Hi, At Mon, 19 Jan 2015 00:14:55 -0800, John R Pierce <pierce@hogranch.com> wrote in <54BCBCFF.5040100@hogranch.com> > On 1/18/2015 11:59 PM, James Sewell wrote: > > > > Actually this brings up another question, if I have an unique integer > > index of 2.gb <http://2.gb> what percentage would I expect to read for > > a value that was higher or lower than all index values? > > a couple 8k blocks. its a b-tree. Yeah, scanning children reading highest/lowest values for each of them would make it faster than the first single query. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Hi, 19 Jan 2015 06:58:21 -0500, François Beausoleil <francois@teksol.info> wrote in <CC2FD572-320A-4225-B98C-48F20992527C@teksol.info> > > > Le 2015-01-18 à 20:58, James Sewell <james.sewell@lisasoft.com> a écrit : > > > > Hello, > > > > I am using partitioning with around 100 sub-tables. Each sub-table is around 11GB and partitioned on the 'id' column.I have an index on the id column on each sub-table. > > > > Is it possible to get a query like the following working using constraint exclusion, or am I doomed to do index/sequentialscans of every sub-table? > > > > I want to select all rows which have an id which is in another query, so something like: > > > > WITH idlist as (SELECT id from othertable) > > SELECT id from mastertable WHERE id = idlist.id); > > > > I am guessing that I am not getting constraint exclusion to work as the planner doesn't know the outcome of my subqueryat plan time? > > > How many rows in idlist? Can you do two queries? Fetch the ids, then call the 2nd query with those values hard-coded inthe query. > > If necessary, and you know where each partition starts, tou could sort in the app and query the correct ranges, in multiplequeries. The strategy's effectiveness is depends mainly on how many ids come from the othertable. It wins if fewer than certain number or converged in a few partitions, however, straight joins will win elsewise. The result of EXPLAIN ANALYZE might draw more precise advices. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general