Thread: query by partial timestamp
I have a column of type TIMESTAMP, I'd like to query all records from 2011. If it were text I could use a partial such as: WHERE text ~ '2011' There must be a simple way to pull the year part out of a timestamp format. Thanks in advance.
On 08/01/2013 22:19, Kirk Wythers wrote: > I have a column of type TIMESTAMP, I'd like to query all records from > 2011. If it were text I could use a partial such as: > > WHERE text ~ '2011' > > There must be a simple way to pull the year part out of a timestamp > format. Thanks in advance. You want the extract() function. Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
On 08/01/2013 22:26, Raymond O'Donnell wrote: > On 08/01/2013 22:19, Kirk Wythers wrote: >> I have a column of type TIMESTAMP, I'd like to query all records from >> 2011. If it were text I could use a partial such as: >> >> WHERE text ~ '2011' >> >> There must be a simple way to pull the year part out of a timestamp >> format. Thanks in advance. > > You want the extract() function. Sorry, meant to include the reference: http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
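Spelled out, a minimal untested sketch of the extract() approach (the table readings and column ts are invented names for the example):

SELECT *
FROM readings
WHERE extract(YEAR FROM ts) = 2011;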
On 01/08/2013 03:39 PM, Raymond O'Donnell wrote: > On 08/01/2013 22:26, Raymond O'Donnell wrote: >> On 08/01/2013 22:19, Kirk Wythers wrote: >>> I have a column of type TIMESTAMP, I'd like to query all records from >>> 2011. If it were text I could use a partial such as: >>> >>> WHERE text ~ '2011' >>> >>> There must be a simple way to pull the year part out of a timestamp >>> format. Thanks in advance. >> >> You want the extract() function. > > Sorry, meant to include the reference: > > http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT > > Ray. > Ray, I thought you were simply encouraging the OP to learn to fish :)
On 01/08/2013 02:19 PM, Kirk Wythers wrote: > I have a column of type TIMESTAMP, I'd like to query all records from 2011. If it were text I could use a partial such as: > > WHERE > text ~ '2011' > > There must be a simple way to pull the year part out of a timestamp format. Thanks in advance. > > > > As others pointed out, you can do this with extract(...). BUT, if your timestamp column is indexed (and if the index will constrain your records to a sufficiently small subset of the table that use of indexes is warranted) you may be better off using date comparisons. I doubt the planner will use the indexes otherwise. Cheers, Steve
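To see whether the planner will actually use an index for a given predicate, EXPLAIN is the quickest check; an untested sketch with the same invented names (a table readings with a timestamp column ts):

EXPLAIN
SELECT *
FROM readings
WHERE ts >= '2011-01-01' AND ts < '2012-01-01';
-- an Index Scan or Bitmap Index Scan on the ts index means the index is being used;
-- a Seq Scan means the planner chose (or was forced) to read the whole table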
On Tuesday, January 8, 2013 at 09:26, Raymond O'Donnell wrote: >On 08/01/2013 22:19, Kirk Wythers wrote: >>I have a column of type TIMESTAMP, I'd like to query all records from >>2011. If it were text I could use a partial such as: >> >>WHERE text ~ '2011' >> >>There must be a simple way to pull the year part out of a timestamp >>format. Thanks in advance. > >You want the extract() function. > From my perspective there are at least three ways to attack this problem: (I have not tested these, so apologies for the stupid syntax errors.) 1. SELECT ... WHERE 2011 = extract(YEAR FROM col_of_type_timestamp); 2. SELECT ... WHERE '2011-01-01'::TIMESTAMP <= col_of_type_timestamp AND col_of_type_timestamp <= '2011-12-31'::TIMESTAMP; 3. SELECT ... WHERE (col_of_type_timestamp, col_of_type_timestamp) OVERLAPS (DATE '2011-01-01', DATE '2012-01-01'); Is this the full list? So... generalizing the original question: which approach would yield the best performance and/or compliance with SQL standards? I note Steve Crawford has (strongly) hinted that direct date comparison is more likely to use an index (when available) so I suspect this is the way to go, but would an index based on extract(YEAR...) negate this difference? Regards Gavan Schneider
Gavan Schneider <pg-gts@snkmail.com> writes: > From my perspective there are at least three ways to attack > this problem: > (I have not tested these, so apologies for the stupid syntax errors.) > 1. SELECT ... WHERE 2011 = extract(YEAR FROM col_of_type_timestamp); > 2. SELECT ... WHERE > '2011-01-01'::TIMESTAMP <= col_of_type_timestamp > AND col_of_type_timestamp <= '2011-12-31'::TIMESTAMP; > 3. SELECT ... WHERE > (col_of_type_timestamp, col_of_type_timestamp) OVERLAPS > (DATE '2011-01-01', DATE '2012-01-01'); > Is this the full list? Another possibility is date_trunc, viz 4. SELECT ... WHERE date_trunc('year', col_of_type_timestamp) = '2011-01-01' You could also use BETWEEN, but that's just syntactic sugar for method 2. Note that in both methods 2 and 3 it's easy to get the edge cases wrong; in particular I think your version of method 2 gives the wrong answer for later-than-midnight times on 2011-12-31, while #3 might (not sure) give the wrong answer for exactly midnight on 2012-01-01. These things are fixable of course with a bit of care. Personally I'd go with col >= '2011-01-01' AND col < '2012-01-01' > So... generalizing the original question: which approach would > yield the best performance and/or compliance with SQL standards? > I note Steve Crawford has (strongly) hinted that direct date > comparison is more likely to use an index (when available) so I > suspect this is the way to go, but would an index based on > extract(YEAR...) negate this difference? Method 3 is not indexable at all and is unlikely to become so --- the SQL standard's definition of OVERLAPS is squirrely enough that people haven't bothered to think about optimizing it. Method 2 works well with a plain btree index on the timestamp column. You can get method 1 to be indexed if you create a functional index on "extract(year from col)"; but since the index would have pretty much no other use than answering this exact type of query, that's not a very attractive alternative. Method 4 is like method 1 --- you'd need a specialized index. Note that in any case an index is not going to be helpful if the query would need to fetch more than a few percent of the table. The OP didn't suggest how many years his data covers, but it's quite possible that pulling a full year's worth of data will read enough of the table that there's no point in worrying about whether an index could be used anyway. Another thing to think about is whether you'll have related sorts of queries that aren't about full years --- maybe sometimes you need a month's worth of data, for example. The BETWEEN-style query and a btree index will adapt easily to non-year intervals, while the EXTRACT approach will not, and date_trunc is rather limited as well. regards, tom lane
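Spelling out the functional-index route Tom mentions, an untested sketch (readings and ts are invented names; this also assumes ts is timestamp without time zone, since extract() on timestamptz is not immutable and cannot be indexed directly):

CREATE INDEX readings_year_idx ON readings ((extract(YEAR FROM ts)));

-- with the expression index in place, the extract() form becomes indexable
SELECT *
FROM readings
WHERE extract(YEAR FROM ts) = 2011;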
On Jan 8, 2013 6:15 PM, "Kirk Wythers" <wythe001@umn.edu> wrote:
>
>
> On Jan 8, 2013, at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> The OP didn't
>> suggest how many years his data covers, but it's quite possible that
>> pulling a full year's worth of data will read enough of the table that
>> there's no point in worrying about whether an index could be used
>> anyway.
>
>
> There are only a few years worth of data, 2008 - 2012. However, the data consists of 15 min measurements and when renormalized (un-pivoted) is several hundred million records. It is conceivable that someone will want to query by month, or even hour of the day.
If that's the case, you may want to look at creating a date dimension and possibly a time dimension for your data analysis (there's a good one to start with on the PostgreSQL wiki). I would highly recommend that you take a look at some dimensional modeling concepts (Kimball is a good place to start).
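A very small, untested sketch of what a date dimension might look like (names and column set are invented; a real Kimball-style dimension carries many more attributes):

CREATE TABLE dim_date (
    date_key    date PRIMARY KEY,
    year        int NOT NULL,
    month       int NOT NULL,
    day         int NOT NULL,
    day_of_week int NOT NULL
);

-- populate one row per calendar day covering the data
INSERT INTO dim_date
SELECT d::date,
       extract(year  FROM d)::int,
       extract(month FROM d)::int,
       extract(day   FROM d)::int,
       extract(dow   FROM d)::int
FROM generate_series('2008-01-01'::timestamp, '2012-12-31'::timestamp, '1 day') AS g(d);

-- example use: filter the fact table by attributes of the dimension
SELECT r.*
FROM readings r
JOIN dim_date d ON d.date_key = r.ts::date
WHERE d.year = 2011;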
Also, you may want to look at partitioning the data if it's several hundred million rows.
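An untested sketch of inheritance-based partitioning as supported in the 9.x releases current at the time (names invented; later releases add declarative partitioning):

CREATE TABLE readings (
    ts    timestamp NOT NULL,
    value numeric
);

-- one child table per year, constrained so the planner can exclude it
CREATE TABLE readings_2011 (
    CHECK (ts >= '2011-01-01' AND ts < '2012-01-01')
) INHERITS (readings);

CREATE INDEX readings_2011_ts_idx ON readings_2011 (ts);

-- with constraint_exclusion enabled, a query such as
--   SELECT ... FROM readings WHERE ts >= '2011-01-01' AND ts < '2012-01-01'
-- only has to touch readings_2011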
On Jan 8, 2013, at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The OP didn't suggest how many years his data covers, but it's quite possible that pulling a full year's worth of data will read enough of the table that there's no point in worrying about whether an index could be used anyway.
There are only a few years worth of data, 2008 - 2012. However, the data consists of 15 min measurements and when renormalized (un-pivoted) is several hundred million records. It is conceivable that someone will want to query by month, or even hour of the day.
As another poster mentioned, you may want to consider partitioning the table not only for performance but also for eventual archiving/purging of the data.
As long as we are looking at a variety of alternatives, appropriate construction of partial indexes and of the query *might* be of value, but as Tom and I mentioned previously, indexes become more of a hindrance than a help once you start writing queries that access too much of the table, so the planner won't use them in those cases.
Cheers,
Steve
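An untested sketch of the partial-index idea Steve raises (names invented; the query's WHERE clause has to imply the index predicate for the planner to consider it):

CREATE INDEX readings_2011_partial_idx ON readings (ts)
    WHERE ts >= '2011-01-01' AND ts < '2012-01-01';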
On 1/8/13, Gavan Schneider <pg-gts@snkmail.com> wrote: > 2. SELECT ... WHERE > '2011-01-01'::TIMESTAMP <= col_of_type_timestamp > AND col_of_type_timestamp <= > '2011-12-31'::TIMESTAMP; This won't quite work, because '2011-12-31'::TIMESTAMP is the same as 2011-12-31 00:00:00.00000, so records timestamped later in the day on the 31st would not get selected. SELECT ... WHERE '2011-01-01'::TIMESTAMP <= col_of_type_timestamp AND col_of_type_timestamp < '2012-01-01'::TIMESTAMP; would get all records with a 2011 timestamp. -- Mike Nolan
It is probably not the most efficient, but I often use this syntax, which reads better: SELECT ... WHERE col_type_timestamp::date BETWEEN '2011-01-01' AND '2011-12-31'. This can use an index, provided the index is built on the cast expression (col_type_timestamp::date) rather than on the raw timestamp column. -- Mike Nolan
On Tue, Jan 08, 2013 at 04:19:59PM -0600, Kirk Wythers wrote: > I have a column of type TIMESTAMP, I'd like to query all records from 2011. If it were text I could use a partial such as: > > WHERE > text ~ '2011' > > There must be a simple way to pull the year part out of a timestamp format. Thanks in advance. using partial checks (like extract, date_part, or even casting the field to date) will have problems with index usage. the best way to handle it is to write the parameters using date arithmetic, like: where column >= '2011-01-01' and column < '2012-01-01' do not be tempted to do: where column >= '2011-01-01' and column <= '2011-12-31' which is a very bad idea, and will silently miss rows (any timestamp later than midnight on 2011-12-31 falls outside that range). More on index usage: http://www.depesz.com/2010/09/09/why-is-my-index-not-being-used/ Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
On Wednesday, January 9, 2013 at 04:42, Michael Nolan wrote: >On 1/8/13, Gavan Schneider wrote: > >>2. SELECT ... WHERE >>'2011-01-01'::TIMESTAMP <= col_of_type_timestamp >>AND col_of_type_timestamp <= >>'2011-12-31'::TIMESTAMP; > >This won't quite work, because '2011-12-31'::TIMESTAMP >is the same as 2011-12-31 00:00:00.00000 >so records timestamped later in the day on the 31st would not get selected > >SELECT ... WHERE >'2011-01-01'::TIMESTAMP <= col_of_type_timestamp >AND col_of_type_timestamp < '2012-01-01'::TIMESTAMP; > >would get all records with a 2011 timestamp. > Thank you. I was wondering where Tom and Depesz were coming from when they both said that testing less than or equal to '2011-12-31'::TIMESTAMP would miss data. I was giving it a rest before re-reading, testing, and/or asking 'the right question'. You have supplied the missing part to my puzzle. Mostly I use DATE so have not had much practice wrestling the TIMESTAMP edge cases. I also prefer the closed-open range tests as you suggest, especially as they are the 'only way to go' when grouping data on a monthly basis. My only 'defense' is that I tried to craft my examples as close as possible to the OP statement and not introduce the 'next year' unless forced... lame I know. :) Regards Gavan Schneider
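For the monthly grouping Gavan mentions, date_trunc() pairs naturally with the closed-open style; an untested sketch with the same invented names (readings, ts):

SELECT date_trunc('month', ts) AS month, count(*)
FROM readings
WHERE ts >= '2011-01-01' AND ts < '2012-01-01'
GROUP BY 1
ORDER BY 1;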