Thread: Performance question
Hello,
I would like to ask from your experience which would be the best "generic" method for checking if row sets of a certain condition exists in a PLPGSQL function.
I know of 4 methods so far (please feel free to add if I missed out any others)
I know of 4 methods so far (please feel free to add if I missed out any others)
1) get a count (my previous experience with ORCL shaped this option)
select count(*) into vcnt
from table
where <<condition>>
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
end if
Cons : It seems doing a count(*) is not the best option for PG
2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
from table
where <<condition>>
if found then
do X
else
do y
end if
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a plpgsql->sql->plpgsql switches
3) using perform
4) using exists3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
from table
where <<condition>>
if found then
do X
else
do y
end if
end if
Seems to remove the above (item 2) issues (if any)
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end if
end if
My question is what would be the best (in terms of performance) method to use? My gut feeling is to use option 4 for PG. Am I right or is there any other method?
Thanks in advance
Anil
Hi Anil:
On Wed, Nov 19, 2014 at 5:26 PM, Anil Menon <gakmenon@gmail.com> wrote:
Cons : It seems doing a count(*) is not the best option for PG
For this and some of the following options, if you are going to just test for existence, I would consider adding a limit 1 somewehere on the query, to let the optimizer know you only need one and it will abort the scan on first hit. Probabley not needed if you are going to give a query which uses an unique index, but it shouldn't hurt.
Francisco Olarte.
On 11/19/2014 08:26 AM, Anil Menon wrote: > Hello, > > I would like to ask from your experience which would be the best > "generic" method for checking if row sets of a certain condition exists > in a PLPGSQL function. > > I know of 4 methods so far (please feel free to add if I missed out any > others) > > 1) get a count (my previous experience with ORCL shaped this option) > > select count(*) into vcnt > from table > where <<condition>> > if vcnt >0 then > do X > else > do y > end if > Cons : It seems doing a count(*) is not the best option for PG Well that would depend on the table size, whether it was 100 rows vs 1,000,000 rows > > 2) Use a non-count option > select primary_key_Col into vcnt > from table > where <<condition>> > if found then > do X > else > do y > end if > Cons :Some people seems not to prefer this as (AFAIU) it causes a > plpgsql->sql->plpgsql switches plpgsql is fairly tightly coupled to SQL, so I have not really seen any problems. But then I am not working on large datasets. > > 3) using perform > perform primary_key_Col into vcnt > from table > where <<condition>> > if found then > do X > else > do y > end if > > Seems to remove the above (item 2) issues (if any) AFAIK, you cannot do the above as written. PERFORM does not return a result: http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT It would have to be more like: perform primary_key_Col from table where <<condition>> > > 4) using exists > if exists ( select 1 from table where <<condition>> ) then > do x > else > do y > end if > > > My question is what would be the best (in terms of performance) method > to use? My gut feeling is to use option 4 for PG. Am I right or is there > any other method? All of the above is context specific. To know for sure you will need to test on actual data. > > Thanks in advance > Anil -- Adrian Klaver adrian.klaver@aklaver.com
Thanks Adrian
On Thu, Nov 20, 2014 at 3:46 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
The table is estimated/guesstimated to be ~900 million rows (~30Ma day, 90 days history, though initially it would be ~30M), though the <<where>> part of the query would return between 0 and ~2 rows
On 11/19/2014 08:26 AM, Anil Menon wrote:Hello,
I would like to ask from your experience which would be the best
"generic" method for checking if row sets of a certain condition exists
in a PLPGSQL function.
I know of 4 methods so far (please feel free to add if I missed out any
others)
1) get a count (my previous experience with ORCL shaped this option)
select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
Cons : It seems doing a count(*) is not the best option for PG
Well that would depend on the table size, whether it was 100 rows vs 1,000,000 rows
The table is estimated/guesstimated to be ~900 million rows (~30Ma day, 90 days history, though initially it would be ~30M), though the <<where>> part of the query would return between 0 and ~2 rows
2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switches
plpgsql is fairly tightly coupled to SQL, so I have not really seen any problems. But then I am not working on large datasets.
I think that ~900M rows would constitute a large data set most likely
3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Seems to remove the above (item 2) issues (if any)
AFAIK, you cannot do the above as written. PERFORM does not return a result:
http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT
It would have to be more like:
perform primary_key_Col from table where <<condition>>
You are absolutely right - my bad.
4) using exists
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end if
My question is what would be the best (in terms of performance) method
to use? My gut feeling is to use option 4 for PG. Am I right or is there
any other method?
All of the above is context specific. To know for sure you will need to test on actual data.
Absolutely right, just that I want to ensure that I follow the most optimal method before the DB goes into production, after which priorities change on what needs to be changed.
--
Adrian Klaver
adrian.klaver@aklaver.com
I guess the best answer would be "its very context specific", but picking the brains of experienced resources helps :-)
Thanks again
Thanks again
Anil
Anil Menon wrote: > I would like to ask from your experience which would be the best "generic" method for checking if row > sets of a certain condition exists in a PLPGSQL function. > > I know of 4 methods so far (please feel free to add if I missed out any others) [...] Are you aware that all of these methods have a race condition unless you use isolation level READ STABILITY or better? It may be that rows are added or removed between the check and the corresponding action. Yours, Laurenz Albe
Thanks Laurenz, very good point!
Luckily (phew!) the business scenario is such that race conditions cannot occur (and the transaction table is append only). There is business workflow to address duplicates but
1) it occurs extremely rarely (it would be a deliberate sabotage if it occurs)
2) there is no impact on business
Yours
Anil
Anil
On Fri, Nov 21, 2014 at 5:16 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Anil Menon wrote:
> I would like to ask from your experience which would be the best "generic" method for checking if row
> sets of a certain condition exists in a PLPGSQL function.
>
> I know of 4 methods so far (please feel free to add if I missed out any others)
[...]
Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?
It may be that rows are added or removed between the check and the
corresponding action.
Yours,
Laurenz Albe