Thread: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

[PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
dinesh kumar
Date:

Hi Team,
 
Would like to propose a new DIAGNOSTICS attribute, which returns the no.of rows got skipped during the FOR UPDATE SKIP LOCKED;
 
Using this attribute, we can have more control on parallel operations like,
 
IF SKIPPED_ROW_COUNT =0 THEN
<<Treat me as, a complete transaction, and do below stuff>>
ELSE
<<Got only few tuples than required, and do below stuff>>
END IF;
 
Kindly let me know your inputs/suggestions on this.
 

Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
Tom Lane
Date:
dinesh kumar <dineshkumar02@gmail.com> writes:
> Would like to propose a new DIAGNOSTICS attribute, which returns the no.of
> rows got skipped during the FOR UPDATE SKIP LOCKED;

I'm concerned that there may not be any implementation-independent
definition of this.  That is, the query plan might or might not reject
rows before the locking step is reached, which would result in
random-looking changes in the output of the proposed counter.

Constraining the query plan might fix that, but only at unacceptable
performance costs, especially since those constraints would have to apply
to every plan ever generated (since the query planner can't know whether
you will inquire about this counter value later).

> Using this attribute, we can have more control on parallel operations like,

> IF SKIPPED_ROW_COUNT =0 THEN
> <<Treat me as, a complete transaction, and do below stuff>>
> ELSE
> <<Got only few tuples than required, and do below stuff>>
> END IF;

Um ... so what?  This is not a use-case.
        regards, tom lane



Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
dinesh kumar
Date:
On Mon, Oct 12, 2015 at 9:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
dinesh kumar <dineshkumar02@gmail.com> writes:
> Would like to propose a new DIAGNOSTICS attribute, which returns the no.of
> rows got skipped during the FOR UPDATE SKIP LOCKED;

I'm concerned that there may not be any implementation-independent
definition of this.  That is, the query plan might or might not reject
rows before the locking step is reached, which would result in
random-looking changes in the output of the proposed counter.

Constraining the query plan might fix that, but only at unacceptable
performance costs, especially since those constraints would have to apply
to every plan ever generated (since the query planner can't know whether
you will inquire about this counter value later).


Thanks Tom. Understood.
 
> Using this attribute, we can have more control on parallel operations like,

> IF SKIPPED_ROW_COUNT =0 THEN
> <<Treat me as, a complete transaction, and do below stuff>>
> ELSE
> <<Got only few tuples than required, and do below stuff>>
> END IF;

Um ... so what?  This is not a use-case.


In my view, "How one can be sure that, he obtained all the tuples with SKIP LOCKED". If the end user has this counter value, he may proceed with a different approach with partially locked tuples.

                        regards, tom lane



--

Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
"David G. Johnston"
Date:

> Using this attribute, we can have more control on parallel operations like,

> IF SKIPPED_ROW_COUNT =0 THEN
> <<Treat me as, a complete transaction, and do below stuff>>
> ELSE
> <<Got only few tuples than required, and do below stuff>>
> END IF;

Um ... so what?  This is not a use-case.


In my view, "How one can be sure that, he obtained all the tuples with SKIP LOCKED". If the end user has this counter value, he may proceed with a different approach with partially locked tuples.


​Can you be more specific?  In most cases I can come up with (queues, basically) where skipped locked is running the processing performing the query is going to re-query the database on the next tick regardless of whether they thought they say only some or all of the potential rows on the prior pass.

David J.

Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
dinesh kumar
Date:
On Tue, Oct 13, 2015 at 5:53 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

> Using this attribute, we can have more control on parallel operations like,

> IF SKIPPED_ROW_COUNT =0 THEN
> <<Treat me as, a complete transaction, and do below stuff>>
> ELSE
> <<Got only few tuples than required, and do below stuff>>
> END IF;

Um ... so what?  This is not a use-case.


In my view, "How one can be sure that, he obtained all the tuples with SKIP LOCKED". If the end user has this counter value, he may proceed with a different approach with partially locked tuples.


​Can you be more specific?  In most cases I can come up with (queues, basically) where skipped locked is running the processing performing the query is going to re-query the database on the next tick regardless of whether they thought they say only some or all of the potential rows on the prior pass.


Sure,

In an existing wait policies like WAIT(default) and NO WAIT,
one can be sure to determine(Using ROW_COUNT daignostics counter),
how many required tuples he processed in a transaction.
But this is not case when it comes to SKIP LOCKED. My concern is,
how we come to know that, our SKIP LOCKED missed few tuples due to lock, OR it's processed all the tuples.
I understood that, the name itself defining SKIP the LOCKED rows, but we are not measuring it.

I wrote the patch and it's working as below. But I haven't extended this to other planners yet.

postgres=# DO
$$
DECLARE
rc INT;
BEGIN
PERFORM * FROM test WHERE mod(t,2)=0 FOR UPDATE SKIP LOCKED;
GET DIAGNOSTICS rc=SKIPPED_ROW_COUNT;
RAISE NOTICE 'Skipped : %', rc;
GET DIAGNOSTICS rc=ROW_COUNT;
RAISE NOTICE 'Processed : %', rc;
END;
$$
;
NOTICE:  Skipped : 2
NOTICE:  Processed : 3
DO


Once we measured the SKIP LOCKED, then we can only consider
to re-process the Skipped !=0 transactions rather doing every transaction again.

In my view, SKIP LOCKED is a nice feature, which gives only the available OR unlocked tuples.
But those are not the complete required tuples for the given SQL statement. Isn't it ?!

Let me know, if I am still not clear about this.

David J.




--

Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
Robert Haas
Date:
On Tue, Oct 13, 2015 at 10:37 PM, dinesh kumar <dineshkumar02@gmail.com> wrote:
> In an existing wait policies like WAIT(default) and NO WAIT,
> one can be sure to determine(Using ROW_COUNT daignostics counter),
> how many required tuples he processed in a transaction.
> But this is not case when it comes to SKIP LOCKED.

Sure it is.  You didn't process the ones that you skipped.  This is no
different than if you say WHERE a = 5.  Depending on plan choice and
table contents, you may have "skipped" a large number of rows where a
!= 5, or you may have skipped none at all.

> In my view, SKIP LOCKED is a nice feature, which gives only the available OR
> unlocked tuples.
> But those are not the complete required tuples for the given SQL statement.
> Isn't it ?!

They better be.  If you wanted the locked tuples, you shouldn't have
asked to skip them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
dinesh kumar
Date:
Hi Robert,
 
On Wed, Oct 14, 2015 at 12:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Oct 13, 2015 at 10:37 PM, dinesh kumar <dineshkumar02@gmail.com> wrote:
> In an existing wait policies like WAIT(default) and NO WAIT,
> one can be sure to determine(Using ROW_COUNT daignostics counter),
> how many required tuples he processed in a transaction.
> But this is not case when it comes to SKIP LOCKED.

Sure it is.  You didn't process the ones that you skipped.  This is no
different than if you say WHERE a = 5.  Depending on plan choice and
table contents, you may have "skipped" a large number of rows where a
!= 5, or you may have skipped none at all.
 
Yes, True.
 
But, using SKIP LOCKED we may bypass the tuples where a = 5, If those were locked by parallel operations.
 

> In my view, SKIP LOCKED is a nice feature, which gives only the available OR
> unlocked tuples.
> But those are not the complete required tuples for the given SQL statement.
> Isn't it ?!

They better be.
 
Agreed.
 
  If you wanted the locked tuples, you shouldn't have
asked to skip them.
 
Kindly let me know if I am going in a wrong way.
 
I see this feature as an add on to do the parallel DML operations.
There won't be any problem, if operations are mutually exclusive.
I mean, each session operates on unique set of tuples.
 
In the above case, we don't even need of SKIP LOCKED wait policy.
 
But, when it comes to mutually depend operations, isn't it nice to provide,
how much were locked by the other sessions. OR atlest a HINT to the other session like,
 
GET DIAGNOSTICS var = DID_I_MISS_ANYTHING_FROM_OTHER_SESSIONS;
 
I agree that, adding counter will take a performance hit.
Rather going to my actual proposal on providing the counter value,
isn't it good to provide a boolean type HINT, if we miss atleast a single tuple.
 
Let me know your thoughts.
 
Thanks in advance.
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--

Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
Robert Haas
Date:
On Wed, Oct 14, 2015 at 6:28 PM, dinesh kumar <dineshkumar02@gmail.com> wrote:
> I see this feature as an add on to do the parallel DML operations.
> There won't be any problem, if operations are mutually exclusive.
> I mean, each session operates on unique set of tuples.
>
> In the above case, we don't even need of SKIP LOCKED wait policy.
>
> But, when it comes to mutually depend operations, isn't it nice to provide,
> how much were locked by the other sessions. OR atlest a HINT to the other
> session like,
>
> GET DIAGNOSTICS var = DID_I_MISS_ANYTHING_FROM_OTHER_SESSIONS;
>
> I agree that, adding counter will take a performance hit.
> Rather going to my actual proposal on providing the counter value,
> isn't it good to provide a boolean type HINT, if we miss atleast a single
> tuple.

Suppose there are 5 locked rows and 5 unlocked rows in the heap and you do this:

select * from t1 for share skip locked limit 5

The Boolean you propose will be false if the first 5 rows in physical
order are locked, and otherwise it will be false.  But there's no
difference between those two scenarios from the perspective of the
application.  Here's another example:

with foo as (select * from t1 for share skip locked) select * from foo
where a = 2;

If foo contains any locked rows at all, this will return true,
regardless of whether a = 2.

It's true that, for a lot of normal-ish queries, LockRows is applied
late enough that your proposed Boolean would return the intended
answer.  But there are a bunch of exceptions, like the ones shown
above, and there might be more in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PROPOSAL] DIAGNOSTICS = SKIPPED_ROW_COUNT

From
dinesh kumar
Date:
On Wed, Oct 14, 2015 at 4:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 14, 2015 at 6:28 PM, dinesh kumar <dineshkumar02@gmail.com> wrote:
> I see this feature as an add on to do the parallel DML operations.
> There won't be any problem, if operations are mutually exclusive.
> I mean, each session operates on unique set of tuples.
>
> In the above case, we don't even need of SKIP LOCKED wait policy.
>
> But, when it comes to mutually depend operations, isn't it nice to provide,
> how much were locked by the other sessions. OR atlest a HINT to the other
> session like,
>
> GET DIAGNOSTICS var = DID_I_MISS_ANYTHING_FROM_OTHER_SESSIONS;
>
> I agree that, adding counter will take a performance hit.
> Rather going to my actual proposal on providing the counter value,
> isn't it good to provide a boolean type HINT, if we miss atleast a single
> tuple.

Suppose there are 5 locked rows and 5 unlocked rows in the heap and you do this:

select * from t1 for share skip locked limit 5

The Boolean you propose will be false if the first 5 rows in physical
order are locked, and otherwise it will be false.  But there's no
difference between those two scenarios from the perspective of the
application.  Here's another example:

with foo as (select * from t1 for share skip locked) select * from foo
where a = 2;

If foo contains any locked rows at all, this will return true,
regardless of whether a = 2.

It's true that, for a lot of normal-ish queries, LockRows is applied
late enough that your proposed Boolean would return the intended
answer.  But there are a bunch of exceptions, like the ones shown
above, and there might be more in the future.


Hi Robert,

As usual, a great guidance from you. Thanks :-)

But I'm still trying to see, is there a way we can implement this for all use cases.
Will update this thread with my findings.
 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--