Re: Non-decimal integer literals - Mailing list pgsql-hackers

From David Rowley
Subject Re: Non-decimal integer literals
Date
Msg-id CAApHDvpBTHL58j6UmnUyrz=5h15NhY30_AvZ_5fQ7LJMhMcyxA@mail.gmail.com
Whole thread Raw
In response to Re: Non-decimal integer literals  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers
On Sat, 26 Nov 2022 at 05:13, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 24.11.22 10:13, David Rowley wrote:
> > I
> > remember many years ago and several jobs ago when working with SQL
> > Server being able to speed up importing data using hexadecimal
> > DATETIMEs. I can't think why else you might want to represent a
> > DATETIME as a hexstring, so I assumed this was a large part of the use
> > case for INTs in PostgreSQL. Are you telling me that better
> > performance is not something anyone will want out of this feature?
>
> This isn't about datetimes but about integers.

I'm aware. My aim was to show that hex is commonly used as a more
efficient way of getting integer numbers in and out of computers.

Likely it's better for me to quantify this performance increase claim
with some actual performance results.

Here's master (@f0cd57f85) doing copy ab2 from '/tmp/ab.csv';

ab2 is a table with no indexes and just 2 int columns.

  16.55%  postgres          [.] CopyReadLine
   7.82%  postgres          [.] pg_strtoint32
   7.60%  postgres          [.] CopyReadAttributesText
   7.06%  postgres          [.] NextCopyFrom
   4.40%  postgres          [.] CopyFrom

The copy completes in 2512.5278 ms (average time over 10 runs)

Patching master with your v11 patch and copying in hex numbers instead
of decimal numbers shows:

  14.39%  postgres          [.] CopyReadLine
   8.60%  postgres          [.] pg_strtoint32
   6.95%  postgres          [.] NextCopyFrom
   6.79%  postgres          [.] CopyReadAttributesText
   4.81%  postgres          [.] CopyFrom

This shows that we're spending proportionally less time in
CopyReadLine() and proportionally more time in pg_strtoint32(). There
are probably two things going on there, CopyReadLine is likely faster
due to having to read fewer bytes and pg_strtoint32() is likely slower
due to additional branching and code size.

This (copy ab2 from '/tmp/abhex.csv') saw an average time of 2720.1387
ms over 10 runs.

Patching master with your v11 patch +
more_efficient_hex_oct_and_binary_processing.diff

  15.68%  postgres          [.] CopyReadLine
   7.75%  postgres          [.] NextCopyFrom
   7.73%  postgres          [.] pg_strtoint32
   6.25%  postgres          [.] CopyReadAttributesText
   4.76%  postgres          [.] CopyFrom

The average time to import the hex version of the csv file comes down
to 2385.7298 ms over 10 runs.

I didn't run any tests to see how much the performance of importing
the decimal representation slowed down from the v11 patch. I assume
there will be a small performance hit due to the extra processing done
in pg_strtoint32()

David



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: fixing CREATEROLE
Next
From: Michael Paquier
Date:
Subject: Re: Allow file inclusion in pg_hba and pg_ident files