Thread: [PERFORM] why we do not create indexes on master

[PERFORM] why we do not create indexes on master

From
Valerii Valeev
Date:
Dear colleagues,

can anyone please explain, why we do not create indexes on master?
In my case master / child design blindly follows partitioning guide https://www.postgresql.org/docs/9.6/static/ddl-partitioning.html.
My collaborator was unhappy with performance of queries over master table with filtering by one of fields

SELECT * FROM “master" WHERE “field" BETWEEN x AND y

(there are indexes for “field” on child tables).
He has created index on master once and found that the query returns 100x faster.
I have naive idea that it won’t help if index is created before the data is there  — i.e. indexes on master aren’t updated when data loaded to child table.
I’m curious is it right or it’s something less primitive.

Thanks and have a happy holidays!
Val.

Re: [PERFORM] why we do not create indexes on master

From
Andreas Kretschmer
Date:
Valerii Valeev <valerii.valeev@mail.ru> wrote:

> Dear colleagues,
>
> can anyone please explain, why we do not create indexes on master?
> In my case master / child design blindly follows partitioning guide https://
> www.postgresql.org/docs/9.6/static/ddl-partitioning.html.
> My collaborator was unhappy with performance of queries over master table with
> filtering by one of fields
>
> SELECT * FROM “master" WHERE “field" BETWEEN x AND y
>
> (there are indexes for “field” on child tables).
> He has created index on master once and found that the query returns 100x
> faster.

please show us explain analyse with/without index on master.



Regards, Andreas Kretschmer
--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PERFORM] why we do not create indexes on master

From
"David G. Johnston"
Date:
On Tue, Dec 27, 2016 at 8:22 AM, Valerii Valeev <valerii.valeev@mail.ru> wrote:
I have naive idea that it won’t help if index is created before the data is there  — i.e. indexes on master aren’t updated when data loaded to child table.

​Indexes on the master table of a partition scheme never reflect the contents of child​ tables.

In most partitioning schemes the master table is empty so even if it doesn't have an index on a particular field execution would typically be quick.  This is why #4 on the page you linked to:

"""
For each partition, create an index on the key column(s), as well as any other indexes you might want. (The key index is not strictly necessary, but in most scenarios it is helpful. If you intend the key values to be unique then you should always create a unique or primary-key constraint for each partition.)
"""

doesn't say anything about creating other indexes on the master table.  See #1 in that list for an explicit statement of this assumption.

If the master is not empty, and of considerable size, and the field being searched is not indexed, then it is unsurprising that the query would take a long time to execute when obtaining rows from the master table.  If this is the case then you've gotten away from the expected usage of partitions and so need to do things that aren't in the manual to make them work.

David J.



David J.

Re: [PERFORM] why we do not create indexes on master

From
Valerii Valeev
Date:
Thank you David,

I used same rationale to convince my colleague — it didn’t work :)
Sort of “pragmatic” person who does what seems working no matter what happens tomorrow.
So I’m seeking for better understanding of what's happening to have other cause to convince him.

Let me break it down once again. The experience is as follows:

- partitioning follows the guide
- master empty, no indexes
- child tables have index on field “field”
- query like
SELECT * FROM “master” WHERE “field” BETWEEN ‘1' AND ‘2’
takes more than 100 sec
- after that my mate adds index on “master”(“field”) — again, all data is in child tables
- same query takes under 1sec

Questions I’d love to clarify:

- Q1: is it correct that described situation happens because index created on master does account data that is already there in child?
- Q2: is it correct that index on master created before inserting record to child tables will not take into account this record?
- Q3: are there any other bad sides of indexes on master table?

Regards,
Val.

On Dec 27 2016, at 19:19, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tue, Dec 27, 2016 at 8:22 AM, Valerii Valeev <valerii.valeev@mail.ru> wrote:
I have naive idea that it won’t help if index is created before the data is there  — i.e. indexes on master aren’t updated when data loaded to child table.

​Indexes on the master table of a partition scheme never reflect the contents of child​ tables.

In most partitioning schemes the master table is empty so even if it doesn't have an index on a particular field execution would typically be quick.  This is why #4 on the page you linked to:

"""
For each partition, create an index on the key column(s), as well as any other indexes you might want. (The key index is not strictly necessary, but in most scenarios it is helpful. If you intend the key values to be unique then you should always create a unique or primary-key constraint for each partition.)
"""

doesn't say anything about creating other indexes on the master table.  See #1 in that list for an explicit statement of this assumption.

If the master is not empty, and of considerable size, and the field being searched is not indexed, then it is unsurprising that the query would take a long time to execute when obtaining rows from the master table.  If this is the case then you've gotten away from the expected usage of partitions and so need to do things that aren't in the manual to make them work.

David J.



David J.


Re: [PERFORM] why we do not create indexes on master

From
ProPAAS DBA
Date:

Possibly some buffer caching is happening, what happens if you then remove the added index and run the query again?



On 12/27/2016 10:38 AM, Valerii Valeev wrote:
Thank you David,

I used same rationale to convince my colleague — it didn’t work :)
Sort of “pragmatic” person who does what seems working no matter what happens tomorrow.
So I’m seeking for better understanding of what's happening to have other cause to convince him.

Let me break it down once again. The experience is as follows:

- partitioning follows the guide
- master empty, no indexes
- child tables have index on field “field”
- query like
SELECT * FROM “master” WHERE “field” BETWEEN ‘1' AND ‘2’
takes more than 100 sec
- after that my mate adds index on “master”(“field”) — again, all data is in child tables
- same query takes under 1sec

Questions I’d love to clarify:

- Q1: is it correct that described situation happens because index created on master does account data that is already there in child?
- Q2: is it correct that index on master created before inserting record to child tables will not take into account this record?
- Q3: are there any other bad sides of indexes on master table?

Regards,
Val.

On Dec 27 2016, at 19:19, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tue, Dec 27, 2016 at 8:22 AM, Valerii Valeev <valerii.valeev@mail.ru> wrote:
I have naive idea that it won’t help if index is created before the data is there  — i.e. indexes on master aren’t updated when data loaded to child table.

​Indexes on the master table of a partition scheme never reflect the contents of child​ tables.

In most partitioning schemes the master table is empty so even if it doesn't have an index on a particular field execution would typically be quick.  This is why #4 on the page you linked to:

"""
For each partition, create an index on the key column(s), as well as any other indexes you might want. (The key index is not strictly necessary, but in most scenarios it is helpful. If you intend the key values to be unique then you should always create a unique or primary-key constraint for each partition.)
"""

doesn't say anything about creating other indexes on the master table.  See #1 in that list for an explicit statement of this assumption.

If the master is not empty, and of considerable size, and the field being searched is not indexed, then it is unsurprising that the query would take a long time to execute when obtaining rows from the master table.  If this is the case then you've gotten away from the expected usage of partitions and so need to do things that aren't in the manual to make them work.

David J.



David J.



Re: [PERFORM] why we do not create indexes on master

From
"David G. Johnston"
Date:
On Tue, Dec 27, 2016 at 10:38 AM, Valerii Valeev <valerii.valeev@mail.ru> wrote:
Thank you David,

I used same rationale to convince my colleague — it didn’t work :)
Sort of “pragmatic” person who does what seems working no matter what happens tomorrow.
So I’m seeking for better understanding of what's happening to have other cause to convince him.

Let me break it down once again. The experience is as follows:

- partitioning follows the guide

​Only somewhat helpful...
- master empty, no indexes
- child tables have index on field “field”
- query like
SELECT * FROM “master” WHERE “field” BETWEEN ‘1' AND ‘2’
takes more than 100 sec

​All retrieved data now exists in cache/buffers...
 
- after that my mate adds index on “master”(“field”) — again, all data is in child tables
- same query takes under 1sec

​As ​Andreas said if you really want to explore what is happening here you need to use EXPLAIN ANALYZE.

Given the flow described above I/O retrieval performance differences, or the attempt to query the table kicking off an ANALYZE, seems like possible contributing factors.


Questions I’d love to clarify:

- Q1: is it correct that described situation happens because index created on master does account data that is already there in child? 

​No
- Q2: is it correct that index on master created before inserting record to child tables will not take into account this record? 

Yes
- Q3: are there any other bad sides of indexes on master table?

No​

David J.

Re: [PERFORM] why we do not create indexes on master

From
Valerii Valeev
Date:
David,

thanks a lot for the comments and for clarity. As I already responded to Andreas, I’m going to get some test data and try to investigate myself.
Thought maybe I’m missing some common knowledge, that’s why asked here before taking deeper look.

Regards,
Val.
 
On Dec 27 2016, at 20:48, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tue, Dec 27, 2016 at 10:38 AM, Valerii Valeev <valerii.valeev@mail.ru> wrote:
Thank you David,

I used same rationale to convince my colleague — it didn’t work :)
Sort of “pragmatic” person who does what seems working no matter what happens tomorrow.
So I’m seeking for better understanding of what's happening to have other cause to convince him.

Let me break it down once again. The experience is as follows:

- partitioning follows the guide

​Only somewhat helpful...
- master empty, no indexes
- child tables have index on field “field”
- query like
SELECT * FROM “master” WHERE “field” BETWEEN ‘1' AND ‘2’
takes more than 100 sec

​All retrieved data now exists in cache/buffers...
 
- after that my mate adds index on “master”(“field”) — again, all data is in child tables
- same query takes under 1sec

​As ​Andreas said if you really want to explore what is happening here you need to use EXPLAIN ANALYZE.

Given the flow described above I/O retrieval performance differences, or the attempt to query the table kicking off an ANALYZE, seems like possible contributing factors.


Questions I’d love to clarify:

- Q1: is it correct that described situation happens because index created on master does account data that is already there in child? 

​No
- Q2: is it correct that index on master created before inserting record to child tables will not take into account this record? 

Yes
- Q3: are there any other bad sides of indexes on master table?

No​

David J.