Re: Auto-vacuum is not running in 9.1.12 - Mailing list pgsql-hackers

From Prakash Itnal
Subject Re: Auto-vacuum is not running in 9.1.12
Msg-id CAHC5u7-1WvP+WQeyJdvbiR62NZeDEkhkzGraM=a94Mai8zoSiA@mail.gmail.com
In response to Re: Auto-vacuum is not running in 9.1.12  (Prakash Itnal <prakash074@gmail.com>)
List pgsql-hackers
Hi Tom/Alvaro,

Kindly let us know whether the correction provided in the previous mail is acceptable. The current code already handles scenario-1, but it is still vulnerable to scenario-2.

From previous mail:
Scenario-1: current_time (2015) -> changed_to_past (1995) -> stays-here-for-half-day -> corrected to current_time (2015)
Scenario-2: current_time (2015) -> changed_to_future (2020) -> stays-here-for-half-day -> corrected to current_time (2015)

We are waiting for your response.

On Sun, Jun 21, 2015 at 2:56 PM, Prakash Itnal <prakash074@gmail.com> wrote:
Hi,

To my understanding, it will probably not open the door to worse situations. Please correct me if my understanding below is incorrect.

The latch will wake up in the following three situations:
a) Socket error (=> result is set to a negative number)
b) Timeout (=> result is set to TIMEOUT)
c) Some event arrived on the socket (=> result is set to a non-zero value if the caller registered for the arrived event; otherwise no value is set)
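
As a rough illustration of the three cases above, the classification can be modelled on the return value of poll(). This is only a sketch: the function name classify_wakeup and the WAKE_* codes are hypothetical and are not the identifiers or flag values used in unix_latch.c.

```c
#include <poll.h>

/* Hypothetical result codes for this sketch, not PostgreSQL's WL_* flags. */
enum { WAKE_ERROR = -1, WAKE_NONE = 0, WAKE_TIMEOUT = 1, WAKE_SOCKET = 2 };

/*
 * rc      : return value of poll()
 * revents : events poll() reported for the socket
 * wanted  : events the caller registered for
 */
static int
classify_wakeup(int rc, short revents, short wanted)
{
    if (rc < 0)
        return WAKE_ERROR;      /* (a) poll() failed */
    if (rc == 0)
        return WAKE_TIMEOUT;    /* (b) the timeout expired */
    if (revents & wanted)
        return WAKE_SOCKET;     /* (c) a registered event arrived */
    return WAKE_NONE;           /* woken by an event the caller did not register for */
}
```

The last branch is the "result is zero" case discussed next: the wait was interrupted, but nothing the caller asked for has happened.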

Given the above conditions, the result can be zero only if an unregistered event wakes the latch (*). In that case, the current implementation recomputes the remaining sleep time. This calculation makes the situation worse if the time goes backwards.
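
To make the arithmetic concrete, here is a minimal sketch of such a recalculation, assuming the remaining sleep is simply the original timeout minus the wall-clock time elapsed since the wait began. The function name and the millisecond units are assumptions made for illustration, not the actual code in unix_latch.c.

```c
#include <stdint.h>

/*
 * Remaining sleep after a wakeup with no registered event, computed
 * naively from wall-clock timestamps. All values are in milliseconds.
 */
static int64_t
naive_remaining_ms(int64_t timeout_ms, int64_t start_ms, int64_t cur_ms)
{
    /* Negative if the clock was set back after the wait started. */
    int64_t elapsed_ms = cur_ms - start_ms;

    return timeout_ms - elapsed_ms;
}
```

With a 60000 ms timeout and 10000 ms really elapsed, the function returns 50000. If the clock is stepped back by one hour in the meantime (start_ms = 3600000, cur_ms = 10000), the elapsed time becomes -3590000 and the function returns 3650000, i.e. the sleep grows from one minute to about an hour.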

The time difference between cur_time (current time) and start_time (time when the latch wait started) should never be negative, because cur_time is never earlier than start_time under normal conditions.

    delta_timeout = cur_time - start_time;

The difference can be negative only if the time shifts to the past, so such a shift can be detected. And if it can be detected, can it also be corrected? I think we can correct it and prevent long sleeps caused by time shifts.

Currently I treat it as TIMEOUT, though conceptually it is not one. The ideal solution would be to leave this decision to the caller of WaitLatch(). With my limited knowledge of the postgres code, I think TIMEOUT would be fine.
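
The idea can be sketched as follows. This is not the patch itself: the function name, the bool return convention and the millisecond units are assumptions chosen to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Recompute the remaining sleep after a wakeup that carried no
 * registered event. Returns true and stores the remaining time if the
 * wait should continue; returns false if the wait should end as a timeout.
 */
static bool
remaining_or_timeout(int64_t timeout_ms, int64_t start_ms, int64_t cur_ms,
                     int64_t *remaining_ms)
{
    int64_t delta_timeout = cur_ms - start_ms;

    if (delta_timeout < 0)
        return false;   /* clock moved to the past: report a timeout */
    if (delta_timeout >= timeout_ms)
        return false;   /* the timeout has really elapsed */

    *remaining_ms = timeout_ms - delta_timeout;
    return true;
}
```

In this sketch a forward clock jump simply looks like an elapsed timeout, so the wait ends early rather than late. Whether ending the wait with a timeout on a backward jump is acceptable to every caller is the open question.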


(*) The above description holds only for a timed wait. If the latch is started with a blocking wait (no timeout), the above logic does not apply.

On Sat, Jun 20, 2015 at 10:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Prakash Itnal <prakash074@gmail.com> writes:
> Sorry for the late response. The current patch only fixes the scenario-1
> listed below. It will not address the scenario-2. Also we need a fix in
> unix_latch.c where the remaining sleep time is evaluated, if latch is woken
> by other events (or result=0). Here too it is possible that the latch might go
> into a long sleep if the time shifts to the past.

Forcing WL_TIMEOUT if the clock goes backwards seems like quite a bad
idea to me.  That seems like a great way to make a bad situation worse,
ie it induces failures where there were none before.

                        regards, tom lane



--
Cheers,
Prakash



--
Cheers,
Prakash
