Thread: extracting min date and grouping

extracting min date and grouping

From
"Keith Worthington"
Date:
Hi All,

I have a table with a bunch of measurement data that I need to summarize.  I
would like to use the min date for my output along with a sum of the
quantites.  At some point this will all be written into a target table.

This is the data.
IPADB=# SELECT * FROM inventory.tbl_scanner;
   scan_timestamp    | item_id | quantity | employee_id | void
---------------------+---------+----------+-------------+------
 2004-12-20 16:09:47 | SEB12   |      555 | 116         | f
 2004-12-20 16:10:03 | B346.0  |      555 | 116         | f
 2004-12-20 16:10:11 | B346.5  |      888 | 116         | f
 2004-12-20 16:09:33 | SAC38   |       66 | 116         | f
 2004-12-19 09:05:29 | SNAP50  |     2255 | 116         | f
 2004-12-19 09:05:39 | RSN2222 |      525 | 116         | f
 2004-12-19 09:05:49 | SAC38   |      658 | 116         | f
(7 rows)

I can get the min date but isn't there an easier/faster way?
IPADB=# SELECT CAST(extract(year from min(scan_timestamp)) || extract(month
from min(scan_timestamp)) || extract(day from min(scan_timestamp)) AS date) AS
inv_date FROM inventory.tbl_scanner;
  inv_date
------------
 2004-12-19
(1 row)

The group and the sum is straightforward but I get this
IPADB=# SELECT CAST(extract(year from min(scan_timestamp)) || extract(month
from min(scan_timestamp)) || extract(day from min(scan_timestamp)) AS date) AS
inventory_date, item_id, sum(quantity) as total FROM inventory.tbl_scanner
GROUP BY item_id;
 inventory_date | item_id | total
----------------+---------+-------
 2004-12-20     | B346.0  |   555
 2004-12-20     | B346.5  |   888
 2004-12-19     | RSN2222 |   525
 2004-12-19     | SAC38   |   724
 2004-12-20     | SEB12   |   555
 2004-12-19     | SNAP50  |  2255
(6 rows)

When what I really want is this.
 inventory_date | item_id | total
----------------+---------+-------
 2004-12-19     | B346.0  |   555
 2004-12-19     | B346.5  |   888
 2004-12-19     | RSN2222 |   525
 2004-12-19     | SAC38   |   724
 2004-12-19     | SEB12   |   555
 2004-12-19     | SNAP50  |  2255

Any help is appreciated.

Kind Regards,
Keith

______________________________________________
99main Internet Services http://www.99main.com


Re: extracting min date and grouping

From
Michael Fuhr
Date:
On Tue, Dec 21, 2004 at 10:37:56PM -0500, Keith Worthington wrote:

>    scan_timestamp    | item_id | quantity | employee_id | void
> ---------------------+---------+----------+-------------+------
>  2004-12-20 16:09:47 | SEB12   |      555 | 116         | f
>  2004-12-20 16:10:03 | B346.0  |      555 | 116         | f
>  2004-12-20 16:10:11 | B346.5  |      888 | 116         | f
>  2004-12-20 16:09:33 | SAC38   |       66 | 116         | f
>  2004-12-19 09:05:29 | SNAP50  |     2255 | 116         | f
>  2004-12-19 09:05:39 | RSN2222 |      525 | 116         | f
>  2004-12-19 09:05:49 | SAC38   |      658 | 116         | f

It would be helpful to see the SQL statements used to create and
populate the sample table -- that way readers can paste them into
their database and experiment with the queries you're after.

> I can get the min date but isn't there an easier/faster way?
> IPADB=# SELECT CAST(extract(year from min(scan_timestamp)) || extract(month
> from min(scan_timestamp)) || extract(day from min(scan_timestamp)) AS date) AS
> inv_date FROM inventory.tbl_scanner;

SELECT MIN(scan_timestamp)::DATE AS inv_date FROM inventory.tbl_scanner;

If the table is large then you could speed up the operation by
creating an index on DATE(scan_timestamp) and using ORDER BY and
LIMIT (a PostgreSQL trick to speed up certain aggregate queries):

CREATE INDEX tbl_scanner_date_idx
  ON inventory.tbl_scanner (DATE(scan_timestamp));

SELECT DATE(scan_timestamp) AS inv_date
FROM inventory.tbl_scanner
ORDER BY DATE(scan_timestamp) LIMIT 1;

If you use EXPLAIN ANALYZE on the various queries, you should see
that the last one uses the index while the others do a sequential
scan.

> The group and the sum is straightforward but I get this
> IPADB=# SELECT CAST(extract(year from min(scan_timestamp)) || extract(month
> from min(scan_timestamp)) || extract(day from min(scan_timestamp)) AS date) AS
> inventory_date, item_id, sum(quantity) as total FROM inventory.tbl_scanner
> GROUP BY item_id;
>  inventory_date | item_id | total
> ----------------+---------+-------
>  2004-12-20     | B346.0  |   555
>  2004-12-20     | B346.5  |   888
>  2004-12-19     | RSN2222 |   525
>  2004-12-19     | SAC38   |   724
>  2004-12-20     | SEB12   |   555
>  2004-12-19     | SNAP50  |  2255
> (6 rows)
>
> When what I really want is this.
>  inventory_date | item_id | total
> ----------------+---------+-------
>  2004-12-19     | B346.0  |   555
>  2004-12-19     | B346.5  |   888
>  2004-12-19     | RSN2222 |   525
>  2004-12-19     | SAC38   |   724
>  2004-12-19     | SEB12   |   555
>  2004-12-19     | SNAP50  |  2255

Here's one way to get that result for your example data set -- it
gets the table's minimum date in a subselect and cross joins it to
the table's records:

SELECT inventory_date, item_id, SUM(quantity) AS total
FROM (SELECT MIN(scan_timestamp)::DATE AS inventory_date
      FROM inventory.tbl_scanner) AS s
CROSS JOIN inventory.tbl_scanner
GROUP BY inventory_date, item_id
ORDER BY item_id;

Why do you want all records to have an inventory_date of 2004-12-19?
Because that's the beginning of an inventory period?  If so, how
is that beginning date determined, i.e., how do you know when one
period ends and another begins?  If we knew more about what you're
trying to do then maybe we could suggest improvements.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/