Thread: Performance question

Performance question

From
Anil Menon
Date:
Hello,

I would like to ask from your experience which would be the best "generic" method for checking if row sets of a certain condition exists in a PLPGSQL function.

I know of 4 methods so far (please feel free to add if I missed out any others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
  do X
else
  do y
end if
Cons : It seems doing a count(*) is not the best option for PG

2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
  do X
else
  do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a plpgsql->sql->plpgsql switches

3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
  do X
else
  do y
end if

Seems to remove the above (item 2) issues (if any)

4) using exists
if exists ( select 1 from table where <<condition>> ) then
 do x
else
 do y
end if


My question is what would be the best (in terms of performance) method to use? My gut feeling is to use option 4 for PG. Am I right or is there any other method?

Thanks in advance
Anil

Re: Performance question

From
Francisco Olarte
Date:
Hi Anil:

On Wed, Nov 19, 2014 at 5:26 PM, Anil Menon <gakmenon@gmail.com> wrote:
Cons : It seems doing a count(*) is not the best option for PG

For this and some of the following options,  if you are going to just test for existence, I would consider adding a limit 1 somewehere on the query, to let the optimizer know you only need one and it will abort the scan on first hit. Probabley not needed if you are going to give a query which uses an unique index, but it shouldn't hurt.

Francisco Olarte.

Re: Performance question

From
Adrian Klaver
Date:
On 11/19/2014 08:26 AM, Anil Menon wrote:
> Hello,
>
> I would like to ask from your experience which would be the best
> "generic" method for checking if row sets of a certain condition exists
> in a PLPGSQL function.
>
> I know of 4 methods so far (please feel free to add if I missed out any
> others)
>
> 1) get a count (my previous experience with ORCL shaped this option)
>
> select count(*) into vcnt
> from table
> where <<condition>>
> if vcnt >0 then
>    do X
> else
>    do y
> end if
> Cons : It seems doing a count(*) is not the best option for PG


Well that would depend on the table size, whether it was 100 rows vs
1,000,000 rows

>
> 2) Use a non-count option
> select primary_key_Col into vcnt
> from table
> where <<condition>>
> if found then
>    do X
> else
>    do y
> end if
> Cons :Some people seems not to prefer this as (AFAIU) it causes a
> plpgsql->sql->plpgsql switches

plpgsql is fairly tightly coupled to SQL, so I have not really seen any
problems. But then I am not working on large datasets.

>
> 3) using perform
> perform primary_key_Col into vcnt
> from table
> where <<condition>>
> if found then
>    do X
> else
>    do y
> end if
>
> Seems to remove the above (item 2) issues (if any)

AFAIK, you cannot do the above as written. PERFORM does not return a result:

http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

It would have to be more like:

perform primary_key_Col from table where <<condition>>

>
> 4) using exists
> if exists ( select 1 from table where <<condition>> ) then
>   do x
> else
>   do y
> end if
>
>
> My question is what would be the best (in terms of performance) method
> to use? My gut feeling is to use option 4 for PG. Am I right or is there
> any other method?

All of the above is context specific. To know for sure you will need to
test on actual data.

>
> Thanks in advance
> Anil


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Performance question

From
Anil Menon
Date:
Thanks Adrian


On Thu, Nov 20, 2014 at 3:46 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/19/2014 08:26 AM, Anil Menon wrote:
Hello,

I would like to ask from your experience which would be the best
"generic" method for checking if row sets of a certain condition exists
in a PLPGSQL function.

I know of 4 methods so far (please feel free to add if I missed out any
others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
   do X
else
   do y
end if
Cons : It seems doing a count(*) is not the best option for PG


Well that would depend on the table size, whether it was 100 rows vs 1,000,000 rows



​The table is estimated/guesstimated to be ~900 million rows (~30Ma day​, 90 days history, though initially it would be ~30M), though the <<where>> part of the query would return between 0 and ~2 rows


 

2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
   do X
else
   do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switches

plpgsql is fairly tightly coupled to SQL, so I have not really seen any problems. But then I am not working on large datasets.

​I think that ~900M rows would constitute a large data set most likely
 


3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
   do X
else
   do y
end if

Seems to remove the above (item 2) issues (if any)

AFAIK, you cannot do the above as written. PERFORM does not return a result:

http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

It would have to be more like:

perform primary_key_Col from table where <<condition>>


​You are absolutely right - my bad​.

4) using exists
if exists ( select 1 from table where <<condition>> ) then
  do x
else
  do y
end if


My question is what would be the best (in terms of performance) method
to use? My gut feeling is to use option 4 for PG. Am I right or is there
any other method?

All of the above is context specific. To know for sure you will need to test on actual data.

​Absolutely right, just that I want to ensure that I follow the most optimal method before the DB goes into production, after which priorities change on what needs to be changed.​
 


--
Adrian Klaver
adrian.klaver@aklaver.com

​I guess the best answer would be "its very context specific​", but picking the brains of experienced resources helps :-)

Thanks again
Anil​

Re: Performance question

From
Albe Laurenz
Date:
Anil Menon wrote:
> I would like to ask from your experience which would be the best "generic" method for checking if row
> sets of a certain condition exists in a PLPGSQL function.
> 
> I know of 4 methods so far (please feel free to add if I missed out any others)
[...]

Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?

It may be that rows are added or removed between the check and the
corresponding action.

Yours,
Laurenz Albe

Re: Performance question

From
Anil Menon
Date:
Thanks Laurenz, very good point!

Luckily (phew!) the business scenario is such that race conditions cannot occur (and the transaction table is append only). There is business workflow to address duplicates but
1) it occurs extremely rarely (it would be a deliberate sabotage if it occurs)
2) there is no impact on business

Yours
Anil



On Fri, Nov 21, 2014 at 5:16 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Anil Menon wrote:
> I would like to ask from your experience which would be the best "generic" method for checking if row
> sets of a certain condition exists in a PLPGSQL function.
>
> I know of 4 methods so far (please feel free to add if I missed out any others)
[...]

Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?

It may be that rows are added or removed between the check and the
corresponding action.

Yours,
Laurenz Albe