Thread: Slim down integer formatting
Folks, Please find attached a patch to do $subject. It's down to a one table lookup and 3 instructions. In covering the int64 versions, I swiped a light weight division from the Ryu stuff. I'm pretty sure that what I did is not how to do #includes, but it's a PoC. What would be a better way to do this? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachment
On Tue, Jul 27, 2021 at 9:51 AM David Fetter <david@fetter.org> wrote: > > In covering the int64 versions, I swiped a light weight division from > the Ryu stuff. I'm pretty sure that what I did is not how to do > #includes, but it's a PoC. What would be a better way to do this? > That patch didn't apply for me (on latest source) so I've attached an equivalent with those changes, that does apply, and also tweaks the Makefile include path to address that #include issue. Regards, Greg Nancarrow Fujitsu Australia
Attachment
On Tue, Jul 27, 2021 at 12:28:22PM +1000, Greg Nancarrow wrote: > That patch didn't apply for me (on latest source) so I've attached an > equivalent with those changes, that does apply, and also tweaks the > Makefile include path to address that #include issue. When applying some micro-benchmarking to stress those APIs, how much does this change things? At the end of the day, this also comes down to an evaluation of pg_ulltoa_n() and pg_ultoa_n(). #include "common/int.h" +#include "d2s_intrinsics.h" Er, are you sure about this part? The first version of the patch did that in a different, also incorrect, way. -- Michael
Attachment
On Tue, 27 Jul 2021 at 14:42, Michael Paquier <michael@paquier.xyz> wrote: > When applying some micro-benchmarking to stress those APIs, how much > does this change things? At the end of the day, this also comes down > to an evaluation of pg_ulltoa_n() and pg_ultoa_n(). I'd suggest something like creating a table with, say 1 million INTs and testing the performance of copy <table> to '/dev/null'; Repeat for BIGINT David
On Tue, Jul 27, 2021 at 12:42 PM Michael Paquier <michael@paquier.xyz> wrote: > > #include "common/int.h" > +#include "d2s_intrinsics.h" > Er, are you sure about this part? The first version of the patch did > that in a different, also incorrect, way. Er, I was just trying to help out, so at least the patch could be applied (whether the patch has merit is a different story). Are you saying that it's incorrect to include that header file in this source, or that's the wrong way to do it? (i.e. it's wrong to adjust the makefile include path to pickup the location where that header is located and use #include "<header>"? That header is in src/common, which is not on the default include path). The method I used certainly works, but you have objections? Can you clarify what you say is incorrect? Regards, Greg Nancarrow Fujitsu Australia
On Tue, 27 Jul 2021 at 15:08, Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Jul 27, 2021 at 12:42 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > #include "common/int.h" > > +#include "d2s_intrinsics.h" > > Er, are you sure about this part? The first version of the patch did > > that in a different, also incorrect, way. > > Er, I was just trying to help out, so at least the patch could be > applied (whether the patch has merit is a different story). > Are you saying that it's incorrect to include that header file in this > source, or that's the wrong way to do it? (i.e. it's wrong to adjust > the makefile include path to pickup the location where that header is > located and use #include "<header>"? That header is in src/common, > which is not on the default include path). > The method I used certainly works, but you have objections? > Can you clarify what you say is incorrect? I think the mistake is that the header file is not in src/include/common. For some reason, it's ended up with all the .c files in src/common. I imagine Andrew did this because he didn't ever expect anything else to have a use for these. He indicates that in [1]. Maybe Andrew can confirm? [1] https://www.postgresql.org/message-id/87mup9192t.fsf%40news-spur.riddles.org.uk David
>>>>> "David" == David Rowley <dgrowleyml@gmail.com> writes: David> I think the mistake is that the header file is not in David> src/include/common. For some reason, it's ended up with all the David> .c files in src/common. David> I imagine Andrew did this because he didn't ever expect anything David> else to have a use for these. He indicates that in [1]. David> Maybe Andrew can confirm? It's not that anything else wouldn't have a use for those, it's that anything else SHOULDN'T have a use for those because they are straight imports from upstream Ryu code, they aren't guaranteed to work outside the ranges of values required by Ryu, and if we decided to import a newer copy of Ryu then it would be annoying if any other code was broken as a result. In short, please don't include d2s_intrinsics.h from anywhere other than d2s.c -- Andrew.
On Tue, Jul 27, 2021 at 3:08 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > > In short, please don't include d2s_intrinsics.h from anywhere other than > d2s.c > Thanks for the clarification. The patch author will need to take this into account for the patch's div1e8() usage. Regards, Greg Nancarrow Fujitsu Australia
On 2021-Jul-26, David Fetter wrote: > Folks, > > Please find attached a patch to do $subject. It's down to a one table > lookup and 3 instructions. So how much faster is it than the original? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
On Wed, 28 Jul 2021 at 01:44, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > So how much faster is it than the original? I only did some very quick tests. They're a bit noisey. The results indicate an average speedup of 1.7%, but the noise level is above that, so unsure. create table a (a int); insert into a select a from generate_series(1,1000000)a; vacuum freeze a; bench.sql: copy a to '/dev/null'; master @ 93a0bf239 drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres latency average = 153.815 ms latency average = 152.955 ms latency average = 147.491 ms master + v2 patch drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres latency average = 144.749 ms latency average = 151.525 ms latency average = 150.392 ms David
On Wed, Jul 28, 2021 at 01:17:43PM +1200, David Rowley wrote: > On Wed, 28 Jul 2021 at 01:44, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > So how much faster is it than the original? > > I only did some very quick tests. They're a bit noisey. The results > indicate an average speedup of 1.7%, but the noise level is above > that, so unsure. > > create table a (a int); > insert into a select a from generate_series(1,1000000)a; > vacuum freeze a; > > bench.sql: copy a to '/dev/null'; > > master @ 93a0bf239 > drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres > latency average = 153.815 ms > latency average = 152.955 ms > latency average = 147.491 ms > > master + v2 patch > drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres > latency average = 144.749 ms > latency average = 151.525 ms > latency average = 150.392 ms Thanks for testing this! I got a few promising results early on with -O0, and the technique seemed like a neat way to do things. I generated a million int4s intended to be uniformly distributed across the range of int4, and similarly across int8. int4: patch 6feebcb6b44631c3dc435e971bd80c2dd218a5ab latency average: 362.149 ms 359.933 ms latency stddev: 3.44 ms 3.40 ms int8: patch 6feebcb6b44631c3dc435e971bd80c2dd218a5ab latency average: 434.944 ms 422.270 ms latency stddev: 3.23 ms 4.02 ms when compiled with -O2: int4: patch 6feebcb6b44631c3dc435e971bd80c2dd218a5ab latency average: 167.262 ms 148.673 ms latency stddev: 6.26 ms 1.28 ms i.e. it was actually slower, at least over the 10 runs I did. I assume that "uniform distribution across the range" is a bad case scenario for ints, but I was a little surprised to measure worse performance. Interestingly, what I got for int8s generated to be uniform across their range was int8: patch 6feebcb6b44631c3dc435e971bd80c2dd218a5ab latency average: 171.737 ms 174.013 ms latency stddev: 1.94 ms 6.84 ms which doesn't look like a difference to me. Intuitively, I'd expect us to get things in the neighborhood of 1 a lot more often than things in the neighborhood of 1 << (30 or 60). Do we have some idea of the distribution, or at least of the distribution family, that we should expect for ints? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, 28 Jul 2021 at 14:25, David Fetter <david@fetter.org> wrote: > Intuitively, I'd expect us to get things in the neighborhood of 1 a > lot more often than things in the neighborhood of 1 << (30 or 60). Do > we have some idea of the distribution, or at least of the distribution > family, that we should expect for ints? serial and bigserial are generally going to start with smaller numbers. Larger and longer lived databases those numbers could end up on the larger side. serial and bigserial should be a fairly large portion of the use case for integer types, so anything that slows down int4out and int8out for lower order numbers is not a good idea. I think it would have to be a very small slowdown on the low order numbers vs a large speedup for higher order numbers for us to even consider it. David