# Thread: experiments in query optimization

#### experiments in query optimization

From:
Faheem Mitha
Date:

Hi everyone,

I've been trying to reduce both memory usage and runtime for a query.
Comments/suggestions gratefully received. Details are at

http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf

See particularly Section 1 - Background and Discussion.

If you want a text version, see

http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex

For background see

http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version
http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and
http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf

Please CC any replies to me at the above email address. Thanks.

Regards, Faheem.



#### Re: experiments in query optimization

From:
Robert Haas
Date:

On Thu, Mar 25, 2010 at 3:57 PM, Faheem Mitha <> wrote:
>
> Hi everyone,
>
> I've been trying to reduce both memory usage and runtime for a query.
> Comments/suggestions gratefully received. Details are at
>
> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
>
> See particularly Section 1 - Background and Discussion.
>
> If you want a text version, see
>
> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex
>
> For background see
>
> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version
> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and
> http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf
>
> Please CC any replies to me at the above email address. Thanks.

Didn't you (or someone) post about these queries before?

It's not really too clear to me from reading this what specific
questions you're trying to answer.  One random thought: WHERE
row_number() = 1 is not too efficient.  Try using LIMIT or DISTINCT ON
instead.
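
For example, with a hypothetical table t holding duplicate rows per group key grp (names invented purely for illustration), the two formulations look something like this:

```sql
-- row_number() ranks every row in every group before the outer
-- filter throws most of them away:
SELECT *
FROM (SELECT t.*, row_number() OVER (PARTITION BY grp ORDER BY id) AS rn
      FROM t) AS sub
WHERE rn = 1;

-- DISTINCT ON keeps just the first row per group directly,
-- which can often be satisfied with a single sort:
SELECT DISTINCT ON (grp) *
FROM t
ORDER BY grp, id;
```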

If you're concerned about memory usage, try reducing work_mem; you've
probably got it set to something huge.

You might need to create some indices, too.

...Robert



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:


On Mon, 29 Mar 2010, Robert Haas wrote:

> On Thu, Mar 25, 2010 at 3:57 PM, Faheem Mitha <> wrote:
>>
>> Hi everyone,
>>
>> I've been trying to reduce both memory usage and runtime for a query.
>> Comments/suggestions gratefully received. Details are at
>>
>> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf
>>
>> See particularly Section 1 - Background and Discussion.
>>
>> If you want a text version, see
>>
>> http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex
>>
>> For background see
>>
>> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version
>> http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and
>> http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf
>>
>> Please CC any replies to me at the above email address. Thanks.
>
> Didn't you (or someone) post about these queries before?

I did write to the list about an earlier version of these queries, yes. In
fact you replied to that message.

> It's not really too clear to me from reading this what specific
> questions you're trying to answer.

Quote from opt.{tex/pdf}, Section 1:

"If I have to I can use Section~\ref{ped_hybrid} and
Section~\ref{tped_hybrid}, but I am left wondering why I get the
performance I do out of the earlier versions. Specifically, why is
Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and
why does the memory usage in Section~\ref{ped_phenoout} blow up relative
to Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"

> One random thought: WHERE row_number() = 1 is not too efficient.
> Try using LIMIT or DISTINCT ON instead.

Possibly. However, the CTE that uses

WHERE row_number() = 1

doesn't dominate the runtime or memory usage, so I'm not too concerned
about it.

> If you're concerned about memory usage, try reducing work_mem; you've
> probably got it set to something huge.

work_mem = 1 GB (see diag.{tex/pdf}).

The point isn't that I'm using so much memory. Again, my question is, why
are these changes affecting memory usage so drastically?

> You might need to create some indices, too.

Ok. To what purpose? This query picks up everything from the tables and
the planner does table scans, so conventional wisdom and indeed my
experience, says that indexes are not going to be so useful.

Regards, Faheem.



#### Re: experiments in query optimization

From:
Robert Haas
Date:

On Mon, Mar 29, 2010 at 2:31 PM, Faheem Mitha <> wrote:
>> It's not really too clear to me from reading this what specific
>> questions you're trying to answer.
>
> Quote from opt.{tex/pdf}, Section 1:
>
> "If I have to I can use Section~\ref{ped_hybrid} and
> Section~\ref{tped_hybrid}, but I am left wondering why I get the performance
> I do out of the earlier versions. Specifically, why is
> Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and
> why does the memory usage in Section~\ref{ped_phenoout} blow up relative to
> Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"

Here and in the document, you refer to section numbers for the
"hybrid" version but I don't see where you define what the "hybrid"
version actually is.  And the differences between your queries are not
real clear either - first you say you took out pheno and sex because
they weren't necessary, but then you decide to put them back.  I don't
know what that means.  If they're not necessary, leave them out.

>> One random thought: WHERE row_number() = 1 is not too efficient.
>> Try using LIMIT or DISTINCT ON instead.
>
> Possibly. However, the CTE that uses
>
> WHERE row_number() = 1
>
> doesn't dominate the runtime or memory usage, so I'm not too concerned
> about it.

Hmm, you might be right.

>> If you're concerned about memory usage, try reducing work_mem; you've
>> probably got it set to something huge.
>
> work_mem = 1 GB (see diag.{tex/pdf}).
>
> The point isn't that I'm using so much memory. Again, my question is, why
> are these changes affecting memory usage so drastically?

Well each sort or hash can use an amount of memory that is limited
from above by work_mem.  So if you write the query in a way that
involves more sorts or hashes, each one can add up to 1GB to your
memory usage, plus overhead.  However, it doesn't look like any of
your queries include 30 sorts or hashes, so I'm thinking that the
RSS number probably also includes some of the shared memory that has
been mapped into each backend's address space.  RSS is not a terribly
reliable number when dealing with shared memory; it's hard to say what
that really means.

>> You might need to create some indices, too.
>
> Ok. To what purpose? This query picks up everything from the tables and the
> planner does table scans, so conventional wisdom and indeed my experience,
> says that indexes are not going to be so useful.

Well, a hash join is not usually the first thing that pops to mind
when dealing with a table that has 825 million rows (geno).  I don't
know if a nested loop with inner-indexscan would be faster, but it
would almost certainly use less memory.
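
To sketch the difference (plan shapes simplified and abbreviated, and assuming an index exists on geno's join column; the index name is made up):

```sql
EXPLAIN ANALYZE VERBOSE SELECT ...;  -- the query in question

-- Without a usable index, the planner builds an in-memory hash table:
--   Hash Join
--     ->  Seq Scan on geno
--     ->  Hash
--           ->  Seq Scan on idlink
--
-- With an index on the join column, it can instead probe the index
-- once per outer row, using very little extra memory:
--   Nested Loop
--     ->  Seq Scan on idlink
--     ->  Index Scan using geno_idlink_id_idx on geno
```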

...Robert



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:


On Mon, 29 Mar 2010, Robert Haas wrote:

> On Mon, Mar 29, 2010 at 2:31 PM, Faheem Mitha <> wrote:
>>> It's not really too clear to me from reading this what specific
>>> questions you're trying to answer.
>>
>> Quote from opt.{tex/pdf}, Section 1:
>>
>> "If I have to I can use Section~\ref{ped_hybrid} and
>> Section~\ref{tped_hybrid}, but I am left wondering why I get the performance
>> I do out of the earlier versions. Specifically, why is
>> Section~\ref{ped_bigjoin} so much slower than Section~\ref{ped_trunc}, and
>> why does the memory usage in Section~\ref{ped_phenoout} blow up relative to
>> Section~\ref{ped_bigjoin} and Section~\ref{ped_trunc}?"
>
> Here and in the document, you refer to section numbers for the
> "hybrid" version but I don't see where you define what the "hybrid"
> version actually is.

It is defined later in the file. I don't know if you are looking at the
pdf, but if so, it is Section 2.4 (for the hybrid PED query). In the text
file, I guess the easiest way would be to grep for the label ped_hybrid.

> And the differences between your queries are not real clear either -
> first you say you took out pheno and sex because they weren't necessary,
> but then you decide to put them back.  I don't know what that means.
> If they're not necessary, leave them out.

I don't see where I say that pheno and sex weren't necessary. In fact, the
word 'necessary' does not appear in the opt document. I took them out to
see how it would affect performance. Which it does, dramatically. I say

"So, I decided to remove the joins to tables corresponding to the patient
data, namely pheno and sex, and the runtime dropped to 150 min, while the
memory stayed around 5G."

Maybe I wasn't being sufficiently explicit here. Perhaps

"So, I decided to remove the joins to tables corresponding to the patient
data, namely pheno and sex, to see how it would affect performance..."

would have been better.

>>> One random thought: WHERE row_number() = 1 is not too efficient.
>>> Try using LIMIT or DISTINCT ON instead.
>>
>> Possibly. However, the CTE that uses
>>
>> WHERE row_number() = 1
>>
>> doesn't dominate the runtime or memory usage, so I'm not too concerned
>> about it.
>
> Hmm, you might be right.
>
>>> If you're concerned about memory usage, try reducing work_mem; you've
>>> probably got it set to something huge.
>>
>> work_mem = 1 GB (see diag.{tex/pdf}).
>>
>> The point isn't that I'm using so much memory. Again, my question is, why
>> are these changes affecting memory usage so drastically?
>
> Well each sort or hash can use an amount of memory that is limited
> from above by work_mem.  So if you write the query in a way that
> involves more sorts or hashes, each one can add up to 1GB to your
> memory usage, plus overhead.  However, it doesn't look like any of
> your queries include 30 sorts or hashes, so I'm thinking that the
> RSS number probably also includes some of the shared memory that has
> been mapped into each backend's address space.  RSS is not a terribly
> reliable number when dealing with shared memory; it's hard to say what
> that really means.

>>> You might need to create some indices, too.

>> Ok. To what purpose? This query picks up everything from the tables and the
>> planner does table scans, so conventional wisdom and indeed my experience,
>> says that indexes are not going to be so useful.

> Well, a hash join is not usually the first thing that pops to mind when
> dealing with a table that has 825 million rows (geno).  I don't know if
> a nested loop with inner-indexscan would be faster, but it would almost
> certainly use less memory.

Can you provide an illustration of what you mean? I don't know what a
"nested loop with inner-indexscan" is in this context.

Regards, Faheem.



#### Re: experiments in query optimization

From:
"Kevin Grittner"
Date:

Faheem Mitha <> wrote:

>> If you're concerned about memory usage, try reducing work_mem;
>> you've probably got it set to something huge.
>
> work_mem = 1 GB (see diag.{tex/pdf}).
>
> The point isn't that I'm using so much memory. Again, my question
> is, why are these changes affecting memory usage so drastically?

Because the planner looks at a very wide variety of plans, some of
which may use many allocations of work_mem size, and some of which
don't.  The costs are compared and the lowest cost one is chosen. If
you are close to the "tipping point" then even a very small change
might affect which is chosen.  It pays to keep the work_mem setting
sane so that unexpected plan changes don't cause problems.

Look at the plans and their costs to get a feel for what's being
chosen and why.  Although it's a very bad idea to use these in
production, you can often shift the plan to something you *think*
would be better using the enable_* settings, to see what the planner
thinks such a plan will cost and where it thinks the cost would be;
that can help in tuning the settings.
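
For example (session-local, and again not something to leave on in production):

```sql
SET enable_hashjoin = off;  -- forbid hash joins for this session only
EXPLAIN SELECT ...;         -- see what the planner picks instead, and at what cost
RESET enable_hashjoin;
```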

>> You might need to create some indices, too.
>
> Ok. To what purpose? This query picks up everything from the
> tables and the planner does table scans, so conventional wisdom
> and indeed my experience, says that indexes are not going to be so
> useful.

There are situations where scanning the entire table to build up a
hash table is more expensive than using an index.  Why not test it?

-Kevin



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:


On Tue, 30 Mar 2010, Kevin Grittner wrote:

> Faheem Mitha <> wrote:
>
>>> If you're concerned about memory usage, try reducing work_mem;
>>> you've probably got it set to something huge.
>>
>> work_mem = 1 GB (see diag.{tex/pdf}).
>>
>> The point isn't that I'm using so much memory. Again, my question
>> is, why are these changes affecting memory usage so drastically?
>
> Because the planner looks at a very wide variety of plans, some of
> which may use many allocations of work_mem size, and some of which
> don't.  The costs are compared and the lowest cost one is chosen. If
> you are close to the "tipping point" then even a very small change
> might affect which is chosen.  It pays to keep the work_mem setting
> sane so that unexpected plan changes don't cause problems.

Sure, but define sane setting, please. I guess part of the point is that
I'm trying to keep memory low, and it seems this is not part of the
planner's priorities. That is, it does not take memory usage into
consideration when choosing a plan. If that is wrong, let me know, but
that is my understanding.

> Look at the plans and their costs to get a feel for what's being
> chosen and why.  Although it's a very bad idea to use these in
> production, you can often shift the plan to something you *think*
> would be better using the enable_* settings, to see what the planner
> thinks such a plan will cost and where it thinks the cost would be;
> that can help in tuning the settings.

Right. You mean to close off certain options to the planner using 'Planner
Method Configuration'. I suppose one can also use 'Planner Cost Constants'
to alter plan behaviour. I haven't tried changing these.

>>> You might need to create some indices, too.
>>
>> Ok. To what purpose? This query picks up everything from the
>> tables and the planner does table scans, so conventional wisdom
>> and indeed my experience, says that indexes are not going to be so
>> useful.
>
> There are situations where scanning the entire table to build up a
> hash table is more expensive than using an index.  Why not test it?

Certainly, but I don't know what you and Robert have in mind, and I'm not
experienced enough to make an educated guess. I'm open to specific
suggestions.

Regards, Faheem.



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:

One thing which I haven't really mentioned in this thread or in my writeup
is that the planner's estimate of the number of rows in geno is way off base
some of the time. It is around 800 million; the planner thinks it is 100
million. I don't know if this is significant or not, or what to do about it.

E.g. in the ped_bigjoin EXPLAIN ANALYZE VERBOSE:

```
->  Sort  (cost=56855882.72..57144683.54 rows=115520330 width=42)
          (actual time=23027732.092..37113627.380 rows=823086774 loops=1)
      Output: (CASE WHEN (hapmap.geno.snpval_id = (-1)) THEN '0 0'::text
      WHEN (hapmap.geno.snpval_id = 0) THEN
      (((dedup_patient_anno.allelea_id)::text || ' '::text) ||
      (dedup_patient_anno.allelea_id)::text) WHEN (hapmap.geno.snpval_id = 1)
      THEN (((dedup_patient_anno.allelea_id)::text || ' '::text) ||
      (dedup_patient_anno.alleleb_id)::text) WHEN (hapmap.geno.snpval_id = 2)
      THEN (((dedup_patient_anno.alleleb_id)::text || ' '::text) ||
      (dedup_patient_anno.alleleb_id)::text) ELSE NULL::text END),
      hapmap.geno.idlink_id, hapmap.geno.anno_id, pheno.patientid,
      pheno.phenotype, sex.code
```

Faheem.

On Tue, 30 Mar 2010, Kevin Grittner wrote:

> Faheem Mitha <> wrote:
>
>>> If you're concerned about memory usage, try reducing work_mem;
>>> you've probably got it set to something huge.
>>
>> work_mem = 1 GB (see diag.{tex/pdf}).
>>
>> The point isn't that I'm using so much memory. Again, my question
>> is, why are these changes affecting memory usage so drastically?
>
> Because the planner looks at a very wide variety of plans, some of
> which may use many allocations of work_mem size, and some of which
> don't.  The costs are compared and the lowest cost one is chosen. If
> you are close to the "tipping point" then even a very small change
> might affect which is chosen.  It pays to keep the work_mem setting
> sane so that unexpected plan changes don't cause problems.
>
> Look at the plans and their costs to get a feel for what's being
> chosen and why.  Although it's a very bad idea to use these in
> production, you can often shift the plan to something you *think*
> would be better using the enable_* settings, to see what the planner
> thinks such a plan will cost and where it thinks the cost would be;
> that can help in tuning the settings.
>
>>> You might need to create some indices, too.
>>
>> Ok. To what purpose? This query picks up everything from the
>> tables and the planner does table scans, so conventional wisdom
>> and indeed my experience, says that indexes are not going to be so
>> useful.
>
> There are situations where scanning the entire table to build up a
> hash table is more expensive than using an index.  Why not test it?
>
> -Kevin
>
>



#### Re: experiments in query optimization

From:
Robert Haas
Date:

On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <> wrote:
> Sure, but define sane setting, please. I guess part of the point is that I'm
> trying to keep memory low, and it seems this is not part of the planner's
> priorities. That is, it does not take memory usage into consideration when
> choosing a plan. If that is wrong, let me know, but that is my
> understanding.

I don't understand quite why you're confused here.  We've already
explained to you that the planner will not employ a plan that uses
more than the amount of memory defined by work_mem for each sort or
hash.

Typical settings for work_mem are between 1MB and 64MB.  1GB is enormous.
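
Note that work_mem can also be changed per session or per query, so a modest server-wide default doesn't stop you from granting more to one known-heavy query:

```sql
SET work_mem = '64MB';  -- applies to this session only
-- run the heavy query here
RESET work_mem;
```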

>>>> You might need to create some indices, too.
>>>
>>> Ok. To what purpose? This query picks up everything from the
>>> tables and the planner does table scans, so conventional wisdom
>>> and indeed my experience, says that indexes are not going to be so
>>> useful.
>>
>> There are situations where scanning the entire table to build up a
>> hash table is more expensive than using an index.  Why not test it?
>
> Certainly, but I don't know what you and Robert have in mind, and I'm not
> experienced enough to make an educated guess. I'm open to specific
> suggestions.

Try creating an index on geno on the columns that are being used for the join.
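
Something along these lines (index names invented):

```sql
CREATE INDEX geno_idlink_id_idx ON geno (idlink_id);
CREATE INDEX geno_anno_id_idx ON geno (anno_id);
ANALYZE geno;  -- refresh statistics so the planner notices them
```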

...Robert



#### Re: experiments in query optimization

From:
Matthew Wakeling
Date:

On Tue, 30 Mar 2010, Faheem Mitha wrote:
>>> work_mem = 1 GB (see diag.{tex/pdf}).
>
> Sure, but define sane setting, please. I guess part of the point is that I'm
> trying to keep memory low

You're trying to keep memory usage low, but you have work_mem set to 1GB?

Matthew

--
"Prove to thyself that all circuits that radiateth and upon which thou worketh
are grounded, lest they lift thee to high-frequency potential and cause thee
to radiate also."             -- The Ten Commandments of Electronics



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:


On Wed, 31 Mar 2010, Matthew Wakeling wrote:

> On Tue, 30 Mar 2010, Faheem Mitha wrote:
>>>> work_mem = 1 GB (see diag.{tex/pdf}).
>>
>> Sure, but define sane setting, please. I guess part of the point is that
>> I'm trying to keep memory low
>
> You're trying to keep memory usage low, but you have work_mem set to 1GB?

I'm trying to keep both runtime and memory usage low. I assume that with
lower levels of memory, the runtime would be longer, other things being
equal.

Regards, Faheem.



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:

[If Kevin Grittner reads this, please fix your email address. I am
getting bounces from your email address.]

On Tue, 30 Mar 2010, Robert Haas wrote:

> On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <> wrote:
>> Sure, but define sane setting, please. I guess part of the point is that I'm
>> trying to keep memory low, and it seems this is not part of the planner's
>> priorities. That is, it does not take memory usage into consideration when
>> choosing a plan. If that is wrong, let me know, but that is my
>> understanding.
>
> I don't understand quite why you're confused here.  We've already
> explained to you that the planner will not employ a plan that uses
> more than the amount of memory defined by work_mem for each sort or
> hash.

> Typical settings for work_mem are between 1MB and 64MB.  1GB is enormous.

I don't think I am confused. To be clear, when I said 'it does not take
memory usage into consideration' I was talking about overall memory usage.
Let me summarize:

The planner will choose the plan with the minimum total cost, with the
constraint that the amount of memory used for each of certain steps is
less than work_mem. In other words, with k such steps it can use at most

k(plan)*work_mem

memory where k(plan) denotes that k is a function of the plan. (I'm
assuming here that memory is not shared between the different steps).
However, k(plan)*work_mem is not itself bounded. I fail to see how
reducing work_mem significantly would help me. This would mean that the
current plans I am using would likely be ruled out, and I would be left
with plans which, by definition, would have larger cost and so longer run
times. The current runtimes are already quite long - for the PED query,
the best I can do with work_mem=1 GB is 2 1/2 hrs, and that is after
splitting the query into two pieces.

I might actually be better off *increasing* the memory, since then the
planner would have more flexibility to choose plans where the individual
steps might require more memory, but the overall memory sum might be
lower.

>>>>> You might need to create some indices, too.
>>>>
>>>> Ok. To what purpose? This query picks up everything from the
>>>> tables and the planner does table scans, so conventional wisdom
>>>> and indeed my experience, says that indexes are not going to be so
>>>> useful.
>>>
>>> There are situations where scanning the entire table to build up a
>>> hash table is more expensive than using an index.  Why not test it?
>>
>> Certainly, but I don't know what you and Robert have in mind, and I'm not
>> experienced enough to make an educated guess. I'm open to specific
>> suggestions.
>
> Try creating an index on geno on the columns that are being used for the join.

Ok, I'll try that. I guess the cols in question on geno are idlink_id and
anno_id. I thought that I already had indexes on them, but no. Maybe I had
indexes, but removed them.

If I understand the way this works, if you request, say, an INNER JOIN, the
planner can choose different ways/algorithms to do this, as in
http://en.wikipedia.org/wiki/Join_(SQL)#Nested_loops . It may choose a
hash join, or a nested loop join, or something else, based on cost. If the
indexes don't exist, that may make the nested loop join more expensive, and
so tip the balance in favor of using a hash join. However, I have no way to
control which option it chooses, short of disabling e.g. the hash join
option, which is not an option for production usage anyway. Correct?

Regards, Faheem.



#### Re: experiments in query optimization

From:
Robert Haas
Date:

On Wed, Mar 31, 2010 at 6:10 AM, Faheem Mitha <> wrote:
>
> [If Kevin Grittner reads this, please fix your email address. I am getting
> bounces from your email address.]
>
> On Tue, 30 Mar 2010, Robert Haas wrote:
>
>> On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <>
>> wrote:
>>>
>>> Sure, but define sane setting, please. I guess part of the point is that
>>> I'm trying to keep memory low, and it seems this is not part of the
>>> planner's priorities. That is, it does not take memory usage into
>>> consideration when choosing a plan. If that is wrong, let me know, but
>>> that is my understanding.
>>
>> I don't understand quite why you're confused here.  We've already
>> explained to you that the planner will not employ a plan that uses
>> more than the amount of memory defined by work_mem for each sort or
>> hash.
>
>> Typical settings for work_mem are between 1MB and 64MB.  1GB is enormous.
>
> I don't think I am confused. To be clear, when I said 'it does not take
> memory usage into consideration' I was talking about overall memory usage.
> Let me summarize:
>
> The planner will choose the plan with the minimum total cost, with the
> constraint that the amount of memory used for each of certain steps is less
> than work_mem. In other words, with k such steps it can use at most
>
> k(plan)*work_mem
>
> memory where k(plan) denotes that k is a function of the plan. (I'm assuming
> here that memory is not shared between the different steps). However,
> k(plan)*work_mem is not itself bounded. I fail to see how reducing work_mem
> significantly would help me. This would mean that the current plans I am
> using would likely be ruled out, and I would be left with plans which, by
> definition, would have larger cost and so longer run times. The current
> runtimes are already quite long - for the PED query, the best I can do with
> work_mem=1 GB is 2 1/2 hrs, and that is after splitting the query into two
> pieces.
>
> I might actually be better off *increasing* the memory, since then the
> planner would have more flexibility to choose plans where the individual
> steps might require more memory, but the overall memory sum might be lower.

OK, your understanding is correct.

>>>>>> You might need to create some indices, too.
>>>>>
>>>>> Ok. To what purpose? This query picks up everything from the
>>>>> tables and the planner does table scans, so conventional wisdom
>>>>> and indeed my experience, says that indexes are not going to be so
>>>>> useful.
>>>>
>>>> There are situations where scanning the entire table to build up a
>>>> hash table is more expensive than using an index.  Why not test it?
>>>
>>> Certainly, but I don't know what you and Robert have in mind, and I'm not
>>> experienced enough to make an educated guess. I'm open to specific
>>> suggestions.
>>
>> Try creating an index on geno on the columns that are being used for the
>> join.
>
> Ok, I'll try that. I guess the cols in question on geno are idlink_id and
> anno_id. I thought that I already had indexes on them, but no. Maybe I had
> indexes, but removed them.
>
> If I understand the way this works, if you request, say, an INNER JOIN, the
> planner can choose different ways/algorithms to do this, as in
> http://en.wikipedia.org/wiki/Join_(SQL)#Nested_loops . It may choose a hash
> join, or a nested loop join, or something else, based on cost. If the
> indexes don't exist, that may make the nested loop join more expensive, and
> so tip the balance in favor of using a hash join. However, I have no way to
> control which option it chooses, short of disabling e.g. the hash join
> option, which is not an option for production usage anyway. Correct?

Yep.

...Robert



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:


On Wed, 31 Mar 2010, Faheem Mitha wrote:

> On Tue, 30 Mar 2010, Robert Haas wrote:
>
>> On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha <>

>>>>>> You might need to create some indices, too.
>>>>>
>>>>> Ok. To what purpose? This query picks up everything from the
>>>>> tables and the planner does table scans, so conventional wisdom
>>>>> and indeed my experience, says that indexes are not going to be so
>>>>> useful.
>>>>
>>>> There are situations where scanning the entire table to build up a
>>>> hash table is more expensive than using an index.  Why not test it?
>>>
>>> Certainly, but I don't know what you and Robert have in mind, and I'm not
>>> experienced enough to make an educated guess. I'm open to specific
>>> suggestions.
>>
>> Try creating an index on geno on the columns that are being used for the
>> join.
>
> Ok, I'll try that. I guess the cols in question on geno are idlink_id and
> anno_id. I thought that I already had indexes on them, but no. Maybe I had
> indexes, but removed them.

Looking at this more closely, idlink_id and anno_id are primary keys, so
they already have indexes on them, so my understanding (from the docs) is
there is no purpose in creating them. That's why I removed the indexes that
were there (back last August, actually, according to my logs). Anyway, it
doesn't look like there is anything I can do here. Does anyone have
additions or corrections to this?

Regards, Faheem.



#### Re: experiments in query optimization

From:
Eliot Gable
Date:

On Thu, Apr 1, 2010 at 7:46 AM, Faheem Mitha <> wrote:

> Looking at this more closely, idlink_id and anno_id are primary keys, so
> they already have indexes on them, so my understanding (from the docs) is
> there is no purpose in creating them. That's why I removed the indexes that
> were there (back last August, actually, according to my logs). Anyway, it
> doesn't look like there is anything I can do here. Does anyone have
> additions or corrections to this?

When you do a join, you typically have a foreign key in one table referencing a primary key in another table. While designating a foreign key does put a constraint on the key to ensure referential integrity, it does not put an index on the column that is being designated as a foreign key. If I understand correctly, the scan done as the inner loop of the nested loop scan for the join is going to be your foreign key column, not your primary key column. Thus, if you have no index on the foreign key column, you will be forced to do a sequential table scan to do the join. In that case the hash-based join will almost certainly be faster (especially for such a large number of rows). If you put an index on the foreign key, then the inner scan can be an index scan and that might turn out to be faster than building the hash indexes on all the table rows.
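
To illustrate with hypothetical tables:

```sql
CREATE TABLE parent (id integer PRIMARY KEY);  -- PK is indexed automatically
CREATE TABLE child (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES parent (id)   -- FK is NOT indexed automatically
);
-- Without this, the inner side of a nested loop join on parent_id
-- would have to sequentially scan child:
CREATE INDEX child_parent_id_idx ON child (parent_id);
```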

Somebody can correct me if I'm wrong.

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero

#### Re: experiments in query optimization

From:
Faheem Mitha
Date:

Hi Eliot,

Thanks for the comment.

On Thu, 1 Apr 2010, Eliot Gable wrote:

> On Thu, Apr 1, 2010 at 7:46 AM, Faheem Mitha <> wrote:

> Looking at this more closely, idlink_id and anno_id are primary keys, so
> they already have indexes on them, so my understanding (from the docs) is
> there is no purpose in creating them. That's why I removed the indexes
> that were there (back last August, actually, according to my logs).
> Anyway, it doesn't look like there is anything I can do here. Does anyone
> have additions or corrections to this?

> When you do a join, you typically have a foreign key in one table
> referencing a primary key in another table. While designating a foreign
> key does put a constraint on the column to ensure referential integrity,
> it does not put an index on the column that is being designated as a
> foreign key. If I understand correctly, the inner side of a nested loop
> join is going to be scanned on your foreign key column, not your primary
> key column. Thus, if you have no index on the foreign key column, you
> will be forced to do a sequential table scan to do the join. In that
> case a hash join will almost certainly be faster (especially for such a
> large number of rows). If you put an index on the foreign key, then the
> inner scan can be an index scan, and that might turn out to be faster
> than building hash tables over all the table rows.

> Somebody can correct me if I'm wrong.

I had set the foreign keys in question (on the geno table) to be primary
keys. This is because this setup is basically a glorified spreadsheet, and
I don't want more than one cell corresponding to a particular tuple of
idlink.id and anno.id (the conceptual rows and cols). Since a primary key
defines an index, I thought putting indexes on idlink_id and anno_id was
redundant. However, it looks like (unsurprisingly) the index corresponding
to the primary key is across both columns, which may not be what is wanted
for the aforesaid join, i.e.

ALTER TABLE ONLY geno ADD CONSTRAINT geno_pkey PRIMARY KEY (idlink_id, anno_id)

(As a side comment, with respect to the indexes on the other side of the
joins, in one case, we have idlink.id = geno.idlink_id, and idlink.id is a
primary key too. In the other, namely geno.anno_id =
dedup_patient_anno.id, dedup_patient_anno is a CTE, so no index on
dedup_patient_anno.id. But maybe indexes aren't needed there.)
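[Editor's note: the relevant property of the composite primary-key index can be sketched as follows; the single-column index name is an arbitrary illustration, not something from the thread.]

```sql
-- PRIMARY KEY (idlink_id, anno_id) is backed by one btree index on
-- (idlink_id, anno_id). A btree is efficient for conditions on a
-- leading prefix of its columns:
--   WHERE idlink_id = 42                  -- can use the PK index
--   WHERE idlink_id = 42 AND anno_id = 7  -- can use the PK index
--   WHERE anno_id = 7                     -- generally cannot use it
-- So the join on idlink_id alone can use the PK index, while the join
-- on anno_id alone generally cannot; the usual candidate to test is:
CREATE INDEX geno_anno_id_idx ON geno (anno_id);
```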

Here is the join

SELECT decode_genotype(geno.snpval_id, %(allelea)s, %(alleleb)s) AS g,
geno.idlink_id, geno.anno_id
FROM    geno
INNER JOIN dedup_patient_anno
ON      geno.anno_id = dedup_patient_anno.id
INNER JOIN idlink
ON      geno.idlink_id = idlink.id
ORDER BY idlink_id, anno_id

Here is the table dump.

****************************************************************
-- Name: geno; Type: TABLE; Schema: hapmap; Owner: snp; Tablespace:
--
CREATE TABLE geno (
idlink_id integer NOT NULL,
anno_id integer NOT NULL,
snpval_id integer NOT NULL
)
WITH (autovacuum_enabled=true);

ALTER TABLE hapmap.geno OWNER TO snp;
--
-- Name: geno_pkey; Type: CONSTRAINT; Schema: hapmap; Owner: snp; Tablespace:
--
ALTER TABLE ONLY geno
ADD CONSTRAINT geno_pkey PRIMARY KEY (idlink_id, anno_id); (!!!!)
--
-- Name: geno_anno_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--
ALTER TABLE ONLY geno
ADD CONSTRAINT geno_anno_id_fkey FOREIGN KEY (anno_id) REFERENCES anno(id) ON UPDATE CASCADE ON DELETE CASCADE;
--
-- Name: geno_idlink_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--
ALTER TABLE ONLY geno
ADD CONSTRAINT geno_idlink_id_fkey FOREIGN KEY (idlink_id) REFERENCES idlink(id) ON UPDATE CASCADE ON DELETE CASCADE;
--
-- Name: geno_snpval_id_fkey; Type: FK CONSTRAINT; Schema: hapmap; Owner: snp
--
ALTER TABLE ONLY geno
ADD CONSTRAINT geno_snpval_id_fkey FOREIGN KEY (snpval_id) REFERENCES snpval(val) ON UPDATE CASCADE ON DELETE CASCADE;
*************************************************************************

So, should I add indexes on the individual foreign key cols idlink_id
and anno_id after all?
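[Editor's note: a minimal sketch of that experiment, using the `geno` and `anno` tables shown above; the index name is an arbitrary choice, and comparing `EXPLAIN ANALYZE` output before and after is what actually answers the question.]

```sql
-- Add the single-column index, refresh statistics, and compare plans.
CREATE INDEX geno_anno_id_idx ON geno (anno_id);
ANALYZE geno;

EXPLAIN ANALYZE
SELECT geno.idlink_id, geno.anno_id
FROM   geno
INNER JOIN anno ON geno.anno_id = anno.id;

-- DROP INDEX geno_anno_id_idx;  -- to undo the experiment
```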

Regards, Faheem.

> --
> Eliot Gable
>
> "We do not inherit the Earth from our ancestors: we borrow it from our
> children." ~David Brower

> "I decided the words were too conservative for me. We're not borrowing
> from our children, we're stealing from them--and it's not even
> considered to be a crime." ~David Brower

Nice quotes.

> "Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live;
> not live to eat.) ~Marcus Tullius Cicero



#### Re: experiments in query optimization

From:
Robert Haas
Date:

On Thu, Apr 1, 2010 at 2:15 PM, Faheem Mitha <> wrote:
> I had set the foreign keys in question (on the geno table) to be primary
> keys. This is because this setup is basically a glorified spreadsheet, and I
> don't want more than one cell corresponding to a particular tuple of
> idlink.id and anno.id (the conceptual rows and cols). Since a primary key
> defines an index, I thought putting indexes on idlink_id and anno_id was
> redundant. However, it looks like (unsurprisingly) the index corresponding
> to the primary key is across both columns, which may not be what is wanted
> for the aforesaid join

Actually it is what is wanted - that is good.

> So, should I add indexes on the individual foreign key cols idlink_id
> and anno_id after all?

I doubt that would help.

The bottom line may be that you're dealing with hundreds of millions
of rows here, so things are going to take a long time.  Of course you
can always get more/faster memory, a bigger I/O subsystem, faster
processors...  and it could be that with detailed study there are
optimizations that could be done even without spending money, but I
think I'm about tapped out on what I can do over an Internet mailing
list.

...Robert



#### Re: experiments in query optimization

From:
Faheem Mitha
Date:


On Thu, 1 Apr 2010, Robert Haas wrote:

> On Thu, Apr 1, 2010 at 2:15 PM, Faheem Mitha <> wrote:

>> I had set the foreign keys in question (on the geno table) to be
>> primary keys. This is because this setup is basically a glorified
>> spreadsheet, and I don't want more than one cell corresponding to a
>> particular tuple of idlink.id and anno.id (the conceptual rows and
>> cols). Since a primary key defines an index, I thought putting indexes
>> on idlink_id and anno_id was redundant. However, it looks like
>> (unsurprisingly) the index corresponding to the primary key is across
>> both columns, which may not be what is wanted for the aforesaid join

> Actually it is what is wanted - that is good.

I see.

>> So, should I add indexes on the individual foreign key cols idlink_id
>> and anno_id after all?
>
> I doubt that would help.

You're sure of this?

> The bottom line may be that you're dealing with hundreds of millions of
> rows here, so things are going to take a long time.  Of course you can
> always get more/faster memory, a bigger I/O subsystem, faster
> processors...  and it could be that with detailed study there are
> optimizations that could be done even without spending money, but I
> think I'm about tapped out on what I can do over an Internet mailing
> list.

Thanks for your assistance, Robert. It's been educational.

Regards, Faheem.



#### Re: experiments in query optimization

From:
Eliot Gable
Date:

On Thu, Apr 1, 2010 at 3:01 PM, Faheem Mitha <> wrote:

>>> So, should I add indexes on the individual foreign key cols idlink_id
>>> and anno_id after all?
>>
>> I doubt that would help.
>
> You're sure of this?

It is always best to test and be certain.

#### Re: experiments in query optimization

From:
Faheem Mitha
Date:


On Thu, 1 Apr 2010, Eliot Gable wrote:

> On Thu, Apr 1, 2010 at 3:01 PM, Faheem Mitha <> wrote:
>
>>>> So, should I add indexes on the individual foreign key cols idlink_id
>>>> and anno_id after all?
>>>
>>> I doubt that would help.
>>
>> You're sure of this?
>
> It is always best to test and be certain.

Fair enough. I may also try disabling hash joins and see what happens...
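[Editor's note: that experiment might look like the following; planner method settings are session-local hints for testing, not something to change globally.]

```sql
SET enable_hashjoin = off;   -- affects the current session only

EXPLAIN ANALYZE
SELECT geno.idlink_id, geno.anno_id
FROM   geno
INNER JOIN idlink ON geno.idlink_id = idlink.id;

RESET enable_hashjoin;       -- restore the default
```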

Regards, Faheem.