Thread: Mechanics of Select

Mechanics of Select

From

Willem Buitendyk

Date:

10 February 2008, 22:13:25

I have the following function that returns the first day of the next
month from whatever date is inserted.  If I use this as part of a select
statement then it takes almost twice as long to perform.  Is this
because for each scanned record this function is being called?  If so
any ideas how I could make this only occur once?

For instance:

select * from track where datetime >= '2007-04-01' and datetime <
'2007-05-01'; takes about 30 ms to return 650K rows.

select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month'2007-04-01'; takes about 55 ms to return 650K rows

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
  RETURNS date AS
$BODY$
declare
inputmonth1 integer;
inputyear1 integer;
inputmonth2 integer;
inputyear2 integer;
resultdate date;
BEGIN
inputmonth1 = extract(month from inputdate)::integer;
inputyear1 = extract(year from inputdate)::integer;

if inputmonth1 = 12 then
inputyear2 = inputyear1 + 1;
else
inputyear2 = inputyear1;
end if;

if inputmonth1 = 12 then
inputmonth2 = 1;
else
inputmonth2 = inputmonth1 + 1;
end if;

resultdate = (inputyear2)::text || '-' || (inputmonth2)::text || '-' ||
'01';
resultdate = to_date(resultdate::text,'yyyy-MM-DD');

RETURN resultdate;
END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;

Re: Mechanics of Select

From

Greg Smith

Date:

10 February 2008, 22:31:01

On Sun, 10 Feb 2008, Willem Buitendyk wrote:

> I have the following function that returns the first day of the next month
> from whatever date is inserted.

See if you can do this with date_trunc instead to avoid calling a
function, which avoids the whole thing.  The first day of next month is:

select date_trunc('month',now())+interval '1 month';

I'd be curious how the runtime using that compares with the plpgsql
version you've done.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Mechanics of Select

From

Bill Moran

Date:

10 February 2008, 22:37:29

Willem Buitendyk <willem@pcfish.ca> wrote:
>
> I have the following function that returns the first day of the next
> month from whatever date is inserted.  If I use this as part of a select
> statement then it takes almost twice as long to perform.  Is this
> because for each scanned record this function is being called?

An explain of the query would help you answer that question.

>  If so
> any ideas how I could make this only occur once?

Don't mark it as VOLITILE.  Sounds like an IMMUTABLE function to me.
PostgreSQL is doing exactly what you told it to do.

> For instance:
>
> select * from track where datetime >= '2007-04-01' and datetime <
> '2007-05-01'; takes about 30 ms to return 650K rows.
>
> select * from track where datetime >= '2007-04-01' and datetime <
> first_day_next_month'2007-04-01'; takes about 55 ms to return 650K rows
>
> CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
>   RETURNS date AS
> $BODY$
> declare
> inputmonth1 integer;
> inputyear1 integer;
> inputmonth2 integer;
> inputyear2 integer;
> resultdate date;
> BEGIN
> inputmonth1 = extract(month from inputdate)::integer;
> inputyear1 = extract(year from inputdate)::integer;
>
> if inputmonth1 = 12 then
> inputyear2 = inputyear1 + 1;
> else
> inputyear2 = inputyear1;
> end if;
>
> if inputmonth1 = 12 then
> inputmonth2 = 1;
> else
> inputmonth2 = inputmonth1 + 1;
> end if;
>
> resultdate = (inputyear2)::text || '-' || (inputmonth2)::text || '-' ||
> '01';
> resultdate = to_date(resultdate::text,'yyyy-MM-DD');
>
> RETURN resultdate;
> END;
> $BODY$
>   LANGUAGE 'plpgsql' VOLATILE
>   COST 100;
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster


--
Bill Moran
http://www.potentialtech.com

Re: Mechanics of Select

From

Tom Lane

Date:

10 February 2008, 22:45:21

Willem Buitendyk <willem@pcfish.ca> writes:
> I have the following function that returns the first day of the next
> month from whatever date is inserted.  If I use this as part of a select
> statement then it takes almost twice as long to perform.  Is this
> because for each scanned record this function is being called?  If so
> any ideas how I could make this only occur once?

Don't declare it VOLATILE, when (AFAICS) you have no reason to.
It would be evaluated only once if it were marked IMMUTABLE.

BTW, I think you're doing it the hard way --- this could almost
certainly be a one-liner if you were using the available date
arithmetic facilities.

            regards, tom lane

Re: Mechanics of Select

From

brian

Date:

10 February 2008, 22:47:42

Willem Buitendyk wrote:
> I have the following function that returns the first day of the next
> month from whatever date is inserted.  If I use this as part of a select
> statement then it takes almost twice as long to perform.  Is this
> because for each scanned record this function is being called?  If so
> any ideas how I could make this only occur once?
>
> For instance:
>
> select * from track where datetime >= '2007-04-01' and datetime <
> '2007-05-01'; takes about 30 ms to return 650K rows.
>
> select * from track where datetime >= '2007-04-01' and datetime <
> first_day_next_month'2007-04-01'; takes about 55 ms to return 650K rows
>
> CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
>  RETURNS date AS
> $BODY$
> declare
> inputmonth1 integer;
> inputyear1 integer;
> inputmonth2 integer;
> inputyear2 integer;
> resultdate date;
> BEGIN
> inputmonth1 = extract(month from inputdate)::integer;  inputyear1 =
> extract(year from inputdate)::integer;
>
> if inputmonth1 = 12 then
> inputyear2 = inputyear1 + 1;
> else
> inputyear2 = inputyear1;
> end if;
>
> if inputmonth1 = 12 then
> inputmonth2 = 1;
> else
> inputmonth2 = inputmonth1 + 1;
> end if;
>
> resultdate = (inputyear2)::text || '-' || (inputmonth2)::text || '-' ||
> '01';
> resultdate = to_date(resultdate::text,'yyyy-MM-DD');
>
> RETURN resultdate;
> END;
> $BODY$
>  LANGUAGE 'plpgsql' VOLATILE
>  COST 100;
>


Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
  RETURNS date AS
$BODY$
DECLARE
    resultdate date;
BEGIN
    SELECT INTO resultdate to_date(to_char((inputdate + interval \
'1 month'), 'yyyy-MM') || '-01', 'yyyy-mm-dd');
    RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql';

Mind the wrap.

Re: Mechanics of Select

From

Alban Hertroys

Date:

10 February 2008, 23:23:41

On Feb 11, 2008, at 12:43 AM, brian wrote:
> Try:
>
> CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
>  RETURNS date AS
> $BODY$
> DECLARE
>     resultdate date;
> BEGIN
>     SELECT INTO resultdate to_date(to_char((inputdate + interval \
> '1 month'), 'yyyy-MM') || '-01', 'yyyy-mm-dd');
>     RETURN resultdate;
> END;
> $BODY$
> LANGUAGE 'plpgsql';

No need for the variable or the SELECT, and it's an immutable
function, so better define that. Besides that it's probably better to
use the date_trunc function here.

Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
  RETURNS date AS
$BODY$
BEGIN
    RETURN date_trunc('month', inputdate + interval '1 month');
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

And with that I wonder why you'd even need a function :)

Another thing I've taught myself is to prefix local variables and
parameters in functions, so that they can NEVER accidentally match a
column name that you use in a query (I usually use '_'). Otherwise
you can get silly queries like "SELECT * FROM table WHERE x = x" that
look perfectly fine while you're writing them down, being perfectly
able to make the distinction between *variable x* and *column x* in
your mind.

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,47af8f8e167321323610058!

Re: Mechanics of Select

From

Willem Buitendyk

Date:

11 February 2008, 01:45:40

As others have suggested my big problem with the function I wrote was
that I had made it Volatile instead of Immutable (it is no doubt
suffering from code bloat as well).  That made all the difference.
Curiously though - I tried it just with the date_trunc function and it
was just as slow as my old Volatile function.

select * from track where datetime >= '2007-04-01' and datetime <
date_trunc('month','2007-04-01'::timestamp)+interval '1 month'; was
about 55s
select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month('2007-04-01'); was about 36s

cheers

Greg Smith wrote:
> On Sun, 10 Feb 2008, Willem Buitendyk wrote:
>
>> I have the following function that returns the first day of the next
>> month from whatever date is inserted.
>
> See if you can do this with date_trunc instead to avoid calling a
> function, which avoids the whole thing.  The first day of next month is:
>
> select date_trunc('month',now())+interval '1 month';
>
> I'd be curious how the runtime using that compares with the plpgsql
> version you've done.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>

Re: Mechanics of Select

From

Alban Hertroys

Date:

11 February 2008, 09:17:33

On Feb 11, 2008, at 2:23 AM, Willem Buitendyk wrote:

> As others have suggested my big problem with the function I wrote
> was that I had made it Volatile instead of Immutable (it is no
> doubt suffering from code bloat as well).  That made all the
> difference. Curiously though - I tried it just with the date_trunc
> function and it was just as slow as my old Volatile function.
>
> select * from track where datetime >= '2007-04-01' and datetime <
> date_trunc('month','2007-04-01'::timestamp)+interval '1 month'; was
> about 55s

That's probably because '2007-04-01'::timestamp can be at different
time zones depending on client configuration and hence is volatile.

If you need a timestamp you probably want to use the servers TZ,
which you can specify using: timestamp at <your timezone>

Are you always entering the first day of a month for start date? In
that case you can leave out the entire date_trunc as the interval
already calculates the correct length internally:

template1=> select '2007-04-01'::date + interval '1 month';
       ?column?
---------------------
  2007-05-01 00:00:00

> select * from track where datetime >= '2007-04-01' and datetime <
> first_day_next_month('2007-04-01'); was about 36s

Also, specifying dates like this without specifying their format is a
bad habit in my book. You're in trouble as soon as the date format
for your database changes (different system, for example). I suggest
you use to_date('2007-04-01', 'YYYY-MM-DD') instead.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,47b015f9167323996417255!

Re: Mechanics of Select

From

Bill Moran

Date:

11 February 2008, 11:59:25

In response to Alban Hertroys <dalroi@solfertje.student.utwente.nl>:

> On Feb 11, 2008, at 12:43 AM, brian wrote:
> > Try:
> >
> > CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
> >  RETURNS date AS
> > $BODY$
> > DECLARE
> >     resultdate date;
> > BEGIN
> >     SELECT INTO resultdate to_date(to_char((inputdate + interval \
> > '1 month'), 'yyyy-MM') || '-01', 'yyyy-mm-dd');
> >     RETURN resultdate;
> > END;
> > $BODY$
> > LANGUAGE 'plpgsql';
>
>
> No need for the variable or the SELECT, and it's an immutable
> function, so better define that. Besides that it's probably better to
> use the date_trunc function here.
>
> Try:
>
> CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
>   RETURNS date AS
> $BODY$
> BEGIN
>     RETURN date_trunc('month', inputdate + interval '1 month');
> END;
> $BODY$
> LANGUAGE 'plpgsql' IMMUTABLE;
>
> And with that I wonder why you'd even need a function :)

Because it's clear what the function does by the name.  It becomes
self-documenting, and ginormous queries will be easier to grok with
a function called first_day_next_month().

--
Bill Moran
http://www.potentialtech.com

Re: Mechanics of Select

From

"Scott Marlowe"

Date:

11 February 2008, 22:21:59

On Feb 11, 2008 3:56 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:
> On Feb 11, 2008, at 2:23 AM, Willem Buitendyk wrote:
>
> > As others have suggested my big problem with the function I wrote
> > was that I had made it Volatile instead of Immutable (it is no
> > doubt suffering from code bloat as well).  That made all the
> > difference. Curiously though - I tried it just with the date_trunc
> > function and it was just as slow as my old Volatile function.
> >
> > select * from track where datetime >= '2007-04-01' and datetime <
> > date_trunc('month','2007-04-01'::timestamp)+interval '1 month'; was
> > about 55s
>
> That's probably because '2007-04-01'::timestamp can be at different
> time zones depending on client configuration and hence is volatile.
>
> If you need a timestamp you probably want to use the servers TZ,
> which you can specify using: timestamp at <your timezone>

No, straight up timestamps shouldn't have this problem, only timestamptz.

I'd suggest trying an index on the date_trunc function here and see if
that helped.