Thread: Question on disk contention

Question on disk contention

From
"Charles Clavadetscher"
Date:
Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries executing at the same time, they will cause
contentionin
 
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles




Re: Question on disk contention

From
Melvin Davidson
Date:


On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries executing at the same time, they will cause contention in
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles



>Could you elaborate a little more on the meaning of "contention in the disk"?
>What is it that happens?

To simplify, you have two users/jobs, both wanting the exact same information. So the system instructs the disk to get
that information from the disk, which causes the disk head to "seek" to the position of the first eligible row and
continues positioning to other eligible rows. Now the job is not exclusive, so the system temporarily switches to the
other job, which causes the disk to go back to the first row and work from there. The switching back and forth continues,
so that instead of one job finishing quickly, they both have to take turns waiting for needed information. That takes
a lot longer,

Try this, Select a table that has a lot of rows, ideally 1M+. Then start a query with a WHERE clause and see how long
it takes. Then submit the same query from 5 separate connections simultaneously and see how long that takes.

--
Melvin Davidson
Maj. Database & Exploration Specialist

Universe Exploration Command – UXC

Employment by invitation only!

Re: Question on disk contention

From
Ron
Date:
On 05/31/2018 08:52 AM, Melvin Davidson wrote:


On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries executing at the same time, they will cause contention in
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles



>Could you elaborate a little more on the meaning of "contention in the disk"?
>What is it that happens?

To simplify, you have two users/jobs, both wanting the exact same information. So the system instructs the disk to get
that information from the disk, which causes the disk head to "seek" to the position of the first eligible row and
continues positioning to other eligible rows. Now the job is not exclusive, so the system temporarily switches to the
other job, which causes the disk to go back to the first row and work from there. The switching back and forth continues,
so that instead of one job finishing quickly, they both have to take turns waiting for needed information. That takes
a lot longer,

Try this, Select a table that has a lot of rows, ideally 1M+. Then start a query with a WHERE clause and see how long
it takes. Then submit the same query from 5 separate connections simultaneously and see how long that takes.

Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?


--
Angular momentum makes the world go 'round.

Re: Question on disk contention

From
Melvin Davidson
Date:


On Thu, May 31, 2018 at 10:04 AM, Ron <ronljohnsonjr@gmail.com> wrote:
On 05/31/2018 08:52 AM, Melvin Davidson wrote:


On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries executing at the same time, they will cause contention in
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles



>Could you elaborate a little more on the meaning of "contention in the disk"?
>What is it that happens?

To simplify, you have two users/jobs, both wanting the exact same information. So the system instructs the disk to get
that information from the disk, which causes the disk head to "seek" to the position of the first eligible row and
continues positioning to other eligible rows. Now the job is not exclusive, so the system temporarily switches to the
other job, which causes the disk to go back to the first row and work from there. The switching back and forth continues,
so that instead of one job finishing quickly, they both have to take turns waiting for needed information. That takes
a lot longer,

Try this, Select a table that has a lot of rows, ideally 1M+. Then start a query with a WHERE clause and see how long
it takes. Then submit the same query from 5 separate connections simultaneously and see how long that takes.

Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?


--
Angular momentum makes the world go 'round.


> Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?
It does, but the cache is for each connection/job. They are not shared.
--
Melvin Davidson
Maj. Database & Exploration Specialist

Universe Exploration Command – UXC

Employment by invitation only!

RE: Question on disk contention

From
Igor Neyman
Date:



> Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?

It does, but the cache is for each connection/job. They are not shared.

--

Melvin Davidson
Maj. Database & Exploration Specialist

Universe Exploration Command – UXC

Employment by invitation only!

 

 

 

That is simply not true: shared_buffers have this name for a reason. What’s not shared is work_mem used for sorting, etc…

Also OS cache is shared too.

 

Regards,

Igor Neyman

 

Re: Question on disk contention

From
Fabio Pardi
Date:
As far as I know, the OS cache is shared, and shared_buffers too.

Back to the matter of contention, your statement i think might be true only in cases when you are querying data which
doesnot fit in RAM. 
 

Under those circumstances, the OS or Postgres might need to evict blocks from RAM to make room to new blocks fetched
fromdisk, while the concurrent query is trying to do the same. Then the disk contention might happen. 
 


regards,

fabio pardi

On 31/05/18 16:09, Melvin Davidson wrote:
> 
> 
> On Thu, May 31, 2018 at 10:04 AM, Ron <ronljohnsonjr@gmail.com <mailto:ronljohnsonjr@gmail.com>> wrote:
> 
>     On 05/31/2018 08:52 AM, Melvin Davidson wrote:
>>
>>
>>     On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <clavadetscher@swisspug.org
<mailto:clavadetscher@swisspug.org>>wrote:
 
>>
>>         Hi Melvin
>>
>>         As an answer to a previous post you wrote:
>>
>>         "Also, your main problem is that when you have two exact same queries executing at the same time, they will
causecontention in
 
>>         the disk, and neither one will make much progress."
>>
>>         Could you elaborate a little more on the meaning of "contention in the disk"?
>>         What is it that happens?
>>
>>         Thank you and have a good day.
>>         Regards
>>         Charles
>>
>>
>>
>>     >Could you elaborate a little more on the meaning of "contention in the disk"?
>>     >What is it that happens?
>>
>>     To simplify, you have two users/jobs, both wanting the exact same information. So the system instructs the disk
toget
 
>>     that information from the disk, which causes the disk head to "seek" to the position of the first eligible row
and
>>     continues positioning to other eligible rows. Now the job is not exclusive, so the system temporarily switches
tothe
 
>>     other job, which causes the disk to go back to the first row and work from there. The switching back and forth
continues,
>>     so that instead of one job finishing quickly, they both have to take turns waiting for needed information. That
takes
>>     a lot longer,
>>
>>     Try this, Select a table that has a lot of rows, ideally 1M+. Then start a query with a WHERE clause and see how
long
>>     it takes. Then submit the same query from 5 separate connections simultaneously and see how long that takes.
> 
>     Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?
> 
> 
>     -- 
>     Angular momentum makes the world go 'round.
> 
> 
> 
>> Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?
> It does, but the cache is for each connection/job. They are not shared.
> -- 
> *Melvin Davidson**
> Maj. Database & Exploration Specialist**
> Universe Exploration Command – UXC***
> Employment by invitation only!


Re: Question on disk contention

From
Fabio Pardi
Date:
For the sake of completeness I think i have to slightly correct myself.

While I cannot find it in the documentation, (if somebody has pointers, please provide them) on my personal notes I
foundthat 'if a scan is already in progress, and a new scan starts, then the new scan will start where the existing
scanis.'
 

In other words, the new scan will make use of the existing data retrieval, and then retrieve the data it misses.

I did some tests which look like my notes are right.

I created a table:

 \dt+ big_series 
                      List of relations
  Schema   |    Name    | Type  | Owner | Size  | Description 
-----------+------------+-------+-------+-------+-------------
 pv_public | big_series | table | user | 24 GB | 
(1 row)


 \d big_series ;
 Table "pv_public.big_series"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 


containing 700+ million records


On a 4 cores VM, with 10GB RAM running Postgres 9.6.3: 

If i run a single query, then it takes around 3.5 minutes

running i=50 queries in parallel, which do

select * from big_series where a = '$i' 

and start with 2 seconds difference one to another,

then the times are:

output.0: Execution time: 213519.445 ms
output.10: Execution time: 223256.190 ms
output.11: Execution time: 224113.198 ms
output.12: Execution time: 225147.653 ms
output.13: Execution time: 226101.934 ms
output.14: Execution time: 227028.992 ms
output.15: Execution time: 228622.188 ms
output.16: Execution time: 229584.695 ms
output.17: Execution time: 229795.122 ms
output.18: Execution time: 229949.648 ms
output.19: Execution time: 229974.346 ms
output.1: Execution time: 214274.906 ms
output.20: Execution time: 230001.553 ms
output.21: Execution time: 230138.985 ms
output.22: Execution time: 230080.078 ms
output.23: Execution time: 230135.255 ms
output.24: Execution time: 230393.351 ms
output.25: Execution time: 230467.203 ms
output.26: Execution time: 230651.052 ms
output.27: Execution time: 230603.229 ms
output.28: Execution time: 230502.608 ms
output.29: Execution time: 230692.864 ms
output.2: Execution time: 215348.711 ms
output.30: Execution time: 230630.022 ms
output.31: Execution time: 230501.983 ms
output.32: Execution time: 230213.728 ms
output.33: Execution time: 229992.756 ms
output.34: Execution time: 229515.631 ms
output.35: Execution time: 228769.224 ms
output.36: Execution time: 228341.051 ms
output.37: Execution time: 227881.148 ms
output.38: Execution time: 226883.588 ms
output.39: Execution time: 225697.683 ms
output.3: Execution time: 216344.473 ms
output.40: Execution time: 224860.213 ms
output.41: Execution time: 222169.478 ms
output.42: Execution time: 221259.683 ms
output.43: Execution time: 220444.522 ms
output.44: Execution time: 219244.326 ms
output.45: Execution time: 218170.045 ms
output.46: Execution time: 217346.484 ms
output.47: Execution time: 216860.823 ms
output.48: Execution time: 216431.684 ms
output.49: Execution time: 216286.271 ms
output.4: Execution time: 217365.770 ms
output.5: Execution time: 218376.528 ms
output.6: Execution time: 219395.868 ms
output.7: Execution time: 220501.178 ms
output.8: Execution time: 221297.149 ms
output.9: Execution time: 222157.996 ms




Please correct me if my notes are wrong, or provide pointers to the documentation.


regards,

fabio pardi


On 31/05/18 16:20, Fabio Pardi wrote:
> As far as I know, the OS cache is shared, and shared_buffers too.
> 
> Back to the matter of contention, your statement i think might be true only in cases when you are querying data which
doesnot fit in RAM. 
 
> 
> Under those circumstances, the OS or Postgres might need to evict blocks from RAM to make room to new blocks fetched
fromdisk, while the concurrent query is trying to do the same. Then the disk contention might happen. 
 
> 
> 
> regards,
> 
> fabio pardi
> 
> On 31/05/18 16:09, Melvin Davidson wrote:
>>
>>
>> On Thu, May 31, 2018 at 10:04 AM, Ron <ronljohnsonjr@gmail.com <mailto:ronljohnsonjr@gmail.com>> wrote:
>>
>>     On 05/31/2018 08:52 AM, Melvin Davidson wrote:
>>>
>>>
>>>     On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <clavadetscher@swisspug.org
<mailto:clavadetscher@swisspug.org>>wrote:
 
>>>
>>>         Hi Melvin
>>>
>>>         As an answer to a previous post you wrote:
>>>
>>>         "Also, your main problem is that when you have two exact same queries executing at the same time, they will
causecontention in
 
>>>         the disk, and neither one will make much progress."
>>>
>>>         Could you elaborate a little more on the meaning of "contention in the disk"?
>>>         What is it that happens?
>>>
>>>         Thank you and have a good day.
>>>         Regards
>>>         Charles
>>>
>>>
>>>
>>>     >Could you elaborate a little more on the meaning of "contention in the disk"?
>>>     >What is it that happens?
>>>
>>>     To simplify, you have two users/jobs, both wanting the exact same information. So the system instructs the disk
toget
 
>>>     that information from the disk, which causes the disk head to "seek" to the position of the first eligible row
and
>>>     continues positioning to other eligible rows. Now the job is not exclusive, so the system temporarily switches
tothe
 
>>>     other job, which causes the disk to go back to the first row and work from there. The switching back and forth
continues,
>>>     so that instead of one job finishing quickly, they both have to take turns waiting for needed information. That
takes
>>>     a lot longer,
>>>
>>>     Try this, Select a table that has a lot of rows, ideally 1M+. Then start a query with a WHERE clause and see
howlong
 
>>>     it takes. Then submit the same query from 5 separate connections simultaneously and see how long that takes.
>>
>>     Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?
>>
>>
>>     -- 
>>     Angular momentum makes the world go 'round.
>>
>>
>>
>>> Why isn't the OS caching the disk blocks, and why isn't Postgres using the cached data?
>> It does, but the cache is for each connection/job. They are not shared.
>> -- 
>> *Melvin Davidson**
>> Maj. Database & Exploration Specialist**
>> Universe Exploration Command – UXC***
>> Employment by invitation only!
> 


RE: Question on disk contention

From
"Charles Clavadetscher"
Date:

Hi Melvin

 

From: Melvin Davidson [mailto:melvin6925@gmail.com]
Sent: Donnerstag, 31.
Mai 2018 15:53
To: Charles Clavadetscher <clavadetscher@swisspug.org>
Cc: pgsql-generallists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Question on disk contention

 

 

 

On Thu, May 31, 2018 at 1:13 AM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:

Hi Melvin

As an answer to a previous post you wrote:

"Also, your main problem is that when you have two exact same queries executing at the same time, they will cause contention in
the disk, and neither one will make much progress."

Could you elaborate a little more on the meaning of "contention in the disk"?
What is it that happens?

Thank you and have a good day.
Regards
Charles


>Could you elaborate a little more on the meaning of "contention in the disk"?
>What is it that happens?

 

To simplify, you have two users/jobs, both wanting the exact same information. So the system instructs the disk to get

that information from the disk, which causes the disk head to "seek" to the position of the first eligible row and

continues positioning to other eligible rows. Now the job is not exclusive, so the system temporarily switches to the

other job, which causes the disk to go back to the first row and work from there. The switching back and forth continues,

so that instead of one job finishing quickly, they both have to take turns waiting for needed information. That takes

a lot longer,

 

Try this, Select a table that has a lot of rows, ideally 1M+. Then start a query with a WHERE clause and see how long

it takes. Then submit the same query from 5 separate connections simultaneously and see how long that takes.

 

Thank you very much for your answer.

Regards

Charles

 


--

Melvin Davidson
Maj. Database & Exploration Specialist

Universe Exploration Command – UXC

Employment by invitation only!