Re: WIP: Make timestamptz_out less slow. - Mailing list pgsql-hackers

From David Rowley
Subject Re: WIP: Make timestamptz_out less slow.
Date
Msg-id CAKJS1f8GvjqFx7SaLTvSLyzfw_SGmcC3MCGUb0=N9n=Xq9Ax6w@mail.gmail.com
In response to Re: WIP: Make timestamptz_out less slow.  (Peter Geoghegan <pg@heroku.com>)
Responses Re: WIP: Make timestamptz_out less slow.  (Peter Geoghegan <pg@heroku.com>)
Re: WIP: Make timestamptz_out less slow.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 5 November 2015 at 13:20, Peter Geoghegan <pg@heroku.com> wrote:
On Fri, Sep 11, 2015 at 9:00 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Attached is also timestamp_delta.patch which changes pg_ltostr() to use the
> INT_MIN special case that pg_ltoa also uses. I didn't make a similar change
> to pg_ltostr_zeropad() as I'd need to add even more special code to prepend
> the correct number of zeros before the 2147483648. This zero-padding issue
> was why I originally came up with the alternative way to handle the negative
> numbers. I had just thought that it was best to keep both functions as
> similar as possible.

I've started looking at this -- "timestamp_out_speedup_2015-09-12.patch".

> I've not done anything yet to remove the #ifdef HAVE_INT64_TIMESTAMP from
> AppendSeconds(). The only way I can think to handle this is to just make
> fsec_t unconditionally an int64 (since with float datetimes the precision is
> 10 decimal digits after the dec point, this does not fit in int32). I'd go
> off and do this, but this means I need to write an int64 version of
> pg_ltostr(). Should I do it this way?

Have you thought about *just* having an int64 pg_ltostr()? Are you
aware of an appreciable overhead from having int32 callers just rely
on a promotion to int64?


I'd not thought of that. It would certainly add unnecessary overhead on 32-bit machines.
To be honest, I don't really like the idea of making fsec_t an int64; there are just too many places around the code base that would need to be updated. Plus, with float datetimes we'd need to multiply the fractional seconds by 1000000000, which is based on the MAX_TIME_PRECISION setting as defined when HAVE_INT64_TIMESTAMP is undefined.

 
I have a few minor nitpicks about the patch.

No need for "negative number" comment here -- just use
"Assert(precision >= 0)" code:

+ * Note 'precision' must not be a negative number.
+ * Note callers should assume cp will not be NUL terminated.
  */
-static void
+static char *
 AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros)
 {
+#ifdef HAVE_INT64_TIMESTAMP
+

I would just use a period/full stop here:

+ * Note str is not NUL terminated, callers are responsible for NUL terminating
+ * str themselves.


Do you mean changing the comment to "Note str is not NUL terminated. Callers are responsible for NUL terminating str themselves."?
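
Either way, the intended usage pattern is roughly this (illustrative sketch only; here pg_ltostr() is taken to return a pointer just past the last digit it wrote, and the buffer size is arbitrary):

    char        buf[16];
    char       *end;

    end = pg_ltostr(buf, 42);       /* writes "42", no terminator */
    *end = '\0';                    /* the caller adds the NUL */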

 
Don't think you need so many "Note:" specifiers here:

+ * Note: Callers should ensure that 'padding' is above zero.
+ * Note: This function is optimized for the case where the number is not too
+ *      big to fit inside of the specified padding.
+ * Note: Caller must ensure that 'str' points to enough memory to hold the
+        result (at least 12 bytes, counting a leading sign and trailing NUL,
+        or padding + 1 bytes, whichever is larger).
+ */
+char *
+pg_ltostr_zeropad(char *str, int32 value, int32 padding)


Yes, that's a bit ugly. How about Assert(padding > 0), just like above? That gets rid of one of the Notes.
The optimized part is probably not that important anyway. I just mentioned it to try to reduce the chances of someone using it where the padding is always too small.
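
Concretely, the shape I mean is something like this simplified, standalone sketch (not the patch code itself; plain C types, assert(), and the function name are just for illustration):

    #include <assert.h>
    #include <stdint.h>

    static char *
    ltostr_zeropad_sketch(char *str, int32_t value, int32_t padding)
    {
        char       *cp;
        char       *end;
        int32_t     num = value;

        assert(padding > 0);
        assert(num >= 0);           /* negatives omitted to keep the sketch short */

        end = str + padding;

        /* write the digits right to left into the zero-padded field */
        for (cp = end - 1; cp >= str; cp--)
        {
            *cp = '0' + (num % 10);
            num /= 10;
        }

        assert(num == 0);           /* values wider than 'padding' are out of scope here */

        return end;                 /* not NUL terminated; that's the caller's job */
    }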
 
Not so sure about this, within the new pg_ltostr_zeropad() function:

+   /*
+    * Handle negative numbers in a special way. We can't just append a '-'
+    * prefix and reverse the sign as on two's complement machines negative
+    * numbers can be 1 further from 0 than positive numbers, we do it this way
+    * so we properly handle the smallest possible value.
+    */
+   if (num < 0)
+   {
      ...
+   }
+   else
+   {
      ...
+   }

Maybe it's better to just special-case INT_MIN and then do an Abs()?
Actually, would any likely caller of pg_ltostr_zeropad() actually care
if INT_MIN was a case that you cannot handle, and assert against? You
could perhaps reasonably make it the caller's problem. Haven't made up
my mind if your timestamp_delta.patch is better than that; cannot
immediately see a problem with putting it on the caller.


I can make that case work the same way pg_ltoa() does for pg_ltostr(); that's not a problem, and I did that in the delta patch. With pg_ltostr_zeropad() I'd need to add some corner-case code to prepend the correct number of zeros onto -2147483648. That's likely not going to look very nice, and it will probably end up as about the same number of lines of code as what's in the patch already. If the argument is performance, I've already done tests showing it's not faster. I quite like my negative number handling, since it's code that gets exercised 50% of the time rather than 1 in 2^32 of the time, assuming all numbers have an equal chance of being passed.
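
To show what I mean by that, here is a simplified, standalone sketch (not the actual patch code): the digit loop runs on the negative value itself, so -2147483648 never has to be negated.

    #include <stdint.h>

    static char *
    ltostr_sketch(char *str, int32_t value)
    {
        char       *start = str;
        char       *end;

        if (value < 0)
        {
            *str++ = '-';
            start = str;

            /* build digits from negative remainders; negating INT32_MIN would overflow */
            do
            {
                *str++ = '0' - (value % 10);
                value /= 10;
            } while (value != 0);
        }
        else
        {
            do
            {
                *str++ = '0' + (value % 10);
                value /= 10;
            } while (value != 0);
        }

        end = str;

        /* digits were produced least significant first; reverse them in place */
        for (str--; start < str; start++, str--)
        {
            char        swap = *start;

            *start = *str;
            *str = swap;
        }

        return end;             /* points past the last digit; not NUL terminated */
    }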
 
More generally, maybe routines like EncodeDateTime() ought now to use
the Abs() macro.


You might be right about that. Then we could just have pg_ltostr() handle unsigned only. I didn't do that because I was trying to minimise what I was changing in the patch, and I thought it would be less likely to make it in if I started adding calls to Abs() around the code base.

Perhaps it's good to have pg_ltostr() there so that we have a good, fast way to build generic strings in Postgres incrementally. I'd say we'd have a much less reusable function if we only supported unsigned int, as we don't have a database type based on unsigned int.
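
For example, the kind of incremental building it allows looks like this (purely illustrative; the variable names are made up, and pg_ltostr_zeropad() is taken to return a pointer just past what it wrote):

    char        buf[64];
    char       *cp = buf;

    cp = pg_ltostr_zeropad(cp, hour, 2);
    *cp++ = ':';
    cp = pg_ltostr_zeropad(cp, min, 2);
    *cp++ = ':';
    cp = pg_ltostr_zeropad(cp, sec, 2);
    *cp = '\0';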
 
Those are all of my nitpicks. :-)


Many thanks for looking at this. Nitpicks are fine :)
 
> I also did some tests on server grade hardware, and the performance increase
> is looking a bit better.
>
> create table ts (ts timestamp not null);
> insert into ts select generate_series('2010-01-01 00:00:00', '2011-01-01
> 00:00:00', '1 sec'::interval);
> vacuum freeze ts;

On my laptop, I saw a 2.4X - 2.6X speedup for this case. That seems
pretty nice to me.

Will make another pass over this soon.

Thanks

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
