Re: width_bucket function for timestamps - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: width_bucket function for timestamps
Date
Msg-id 20061009155346.GY72517@nasby.net
Whole thread Raw
In response to width_bucket function for timestamps  (Jeremy Drake <pgsql@jdrake.com>)
Responses Re: width_bucket function for timestamps  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Sinte we already have width_bucket, I'd argue this should go in core. If
someone's feeling adventurous, there should probably be a double
precision version as well. Hrm... and maybe text...

Doesn't the backend already have something like this for calculating
histograms?

On Sun, Oct 08, 2006 at 10:30:47PM -0700, Jeremy Drake wrote:
> I just came across this code I wrote about a year ago which implements a
> function equivilant to width_bucket for timestamps.
> 
> I wrote this when I was trying to plot some data over time, and I had more
> points than I needed.  This function allowed me to create a pre-determined
> number of "bins" to average the data inside of so that I could get a sane
> number of points.  Part of the problem was that there were so many data
> points, that a sql implementation of the function (or plpgsql, I forget,
> it was a year ago) was painfully slow.  This C function provided much
> better performance than any other means at my disposal.
> 
> I wanted to share this code since it may be useful for someone else, but I
> don't know exactly what to do with it.  So I am putting it out there, and
> asking what the proper home for such a function might be.  I believe it
> would be generally useful for people, but it is so small that it hardly
> seems like a reasonable pgFoundry project.  Maybe there is a home for such
> a thing in the core distribution in a future release?
> 
> The code can be found at
> http://www.jdrake.com/postgresql/bintimestamp.tar.gz for a buildable PGXS
> module, or I attached just the C code.  There is no documentation, the
> parameters work the same as the width_bucket function.  The code is not
> necessarily the most readable in the world, I was trying to get as much
> speed out of it as possible, since I was calling it over a million times
> as a group by value.
> 
> Thanks for any pointers...
> 
> -- 
> Fortune's Office Door Sign of the Week:
> 
>     Incorrigible punster -- Do not incorrige.

> /*****************************************************************************
>  * file:        $RCSfile: bintimestamp.c,v $ $Revision: 1.1 $
>  * module:      timestamp
>  * authors:     jeremyd
>  * last mod:    $Author: jeremyd $ at $Date: 2005/10/28 20:26:38 $
>  * 
>  * created:     Fri Oct 28 13:26:38 PDT 2005
>  * 
>  *****************************************************************************/
> 
> #include <string.h>
> #include <math.h>
> #include "postgres.h"
> 
> #include "fmgr.h"
> #include "libpq/pqformat.h"
> #include "utils/builtins.h"
> #include "funcapi.h"
> #include "utils/timestamp.h"
> 
> #ifndef JROUND
> #    define JROUND(x) (x)
> #endif
> 
> Datum timestamp_get_bin_size(PG_FUNCTION_ARGS);
> Datum timestamp_bin(PG_FUNCTION_ARGS);
> 
> PG_FUNCTION_INFO_V1(timestamp_get_bin_size);
> Datum
> timestamp_get_bin_size(PG_FUNCTION_ARGS)
> {
>     Timestamp start = PG_GETARG_TIMESTAMP(0);
>     Timestamp stop = PG_GETARG_TIMESTAMP(1);
>     int32 nbuckets = PG_GETARG_INT32(2);
>     Interval * retval = (Interval *)palloc (sizeof(Interval));
> 
>     if (!retval)
>     {
>         ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), errmsg("insufficient memory for Interval allocation")));
>         PG_RETURN_NULL();
>     }
> 
>     memset (retval, 0, sizeof(Interval));
> 
>     retval->time = JROUND ((stop - start) / nbuckets);
> 
>     PG_RETURN_INTERVAL_P(retval);
> }
> 
> PG_FUNCTION_INFO_V1(timestamp_bin);
> Datum
> timestamp_bin(PG_FUNCTION_ARGS)
> {
>     /*Timestamp op = PG_GETARG_TIMESTAMP(0);*/
>     Timestamp start = PG_GETARG_TIMESTAMP(1);
>     /*Timestamp stop = PG_GETARG_TIMESTAMP(2);*/
>     Timestamp binsz;
>     /*int32 nbuckets = PG_GETARG_INT32(3)*/;
> 
>     binsz = (PG_GETARG_TIMESTAMP(2) - start) / PG_GETARG_INT32(3);
> 
>     PG_RETURN_TIMESTAMP(JROUND((int)((PG_GETARG_TIMESTAMP(0) - start) / binsz) * binsz + start));
> }

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly


-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


pgsql-hackers by date:

Previous
From: "Mark Woodward"
Date:
Subject: Re: Upgrading a database dump/restore
Next
From: Tom Lane
Date:
Subject: Re: width_bucket function for timestamps