Why is pq_begintypsend so slow? - Mailing list pgsql-hackers

From Jeff Janes
Subject Why is pq_begintypsend so slow?
Date
Msg-id CAMkU=1whbRDUwa4eayD9+59K-coxO9senDkPRbTn3cg0pUz4AQ@mail.gmail.com
Whole thread Raw
Responses Re: Why is pq_begintypsend so slow?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I was using COPY recently and was wondering why BINARY format is not much (if any) faster than the default format.  Once I switched from mostly exporting ints to mostly exporting double precisions (7e6 rows of 100 columns, randomly generated), it was faster, but not by as much as I intuitively thought it should be.  

Running 'perf top' to profile a "COPY BINARY .. TO '/dev/null'" on a AWS m5.large machine running Ubuntu 18.04, with self compiled PostgreSQL:

PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit

I saw that the hotspot was pq_begintypsend at 20%, which was twice the percentage as the next place winner (AllocSetAlloc).  If I drill down into teh function, I see something like the below.  I don't really speak assembly, but usually when I see an assembly instruction being especially hot and not being the inner most instruction in a loop, I blame it on CPU cache misses.  But everything being touched here should already be well cached, since  initStringInfo has just got done setting it up. And if not for that, then the by the 2nd invocation of appendStringInfoCharMacro it certainly should be in the cache, yet that one is even slower than the 1st appendStringInfoCharMacro.

Why is this such a bottleneck?

pq_begintypsend  /usr/local/pgsql/bin/postgres

 0.15 |    push  %rbx
 0.09 |    mov  %rdi,%rbx
   |       initStringInfo(buf);
 3.03 |    callq initStringInfo
   |       /* Reserve four bytes for the bytea length word */
   |       appendStringInfoCharMacro(buf, '\0');
   |    movslq 0x8(%rbx),%rax
 1.05 |    lea  0x1(%rax),%edx
 0.72 |    cmp  0xc(%rbx),%edx
   |    jge  b0
 2.92 |    mov  (%rbx),%rdx
   |    movb  $0x0,(%rdx,%rax,1)
13.76 |    mov  0x8(%rbx),%eax
 0.81 |    mov  (%rbx),%rdx
 0.52 |    add  $0x1,%eax
 0.12 |    mov  %eax,0x8(%rbx)
 2.85 |    cltq
 0.01 |    movb  $0x0,(%rdx,%rax,1)
   |       appendStringInfoCharMacro(buf, '\0');
10.65 |    movslq 0x8(%rbx),%rax
   |    lea  0x1(%rax),%edx
 0.90 |    cmp  0xc(%rbx),%edx
   |    jge  ca
 0.54 | 42:  mov  (%rbx),%rdx
 1.84 |    movb  $0x0,(%rdx,%rax,1)
13.88 |    mov  0x8(%rbx),%eax
 0.03 |    mov  (%rbx),%rdx
   |    add  $0x1,%eax
 0.33 |    mov  %eax,0x8(%rbx)
 2.60 |    cltq
 0.06 |    movb  $0x0,(%rdx,%rax,1)
   |       appendStringInfoCharMacro(buf, '\0');
 3.21 |    movslq 0x8(%rbx),%rax
 0.23 |    lea  0x1(%rax),%edx
 1.74 |    cmp  0xc(%rbx),%edx
   |    jge  e0
 0.21 | 67:  mov  (%rbx),%rdx
 1.18 |    movb  $0x0,(%rdx,%rax,1)
 9.29 |    mov  0x8(%rbx),%eax
 0.18 |    mov  (%rbx),%rdx
   |    add  $0x1,%eax
 0.19 |    mov  %eax,0x8(%rbx)
 3.14 |    cltq
 0.12 |    movb  $0x0,(%rdx,%rax,1)
   |       appendStringInfoCharMacro(buf, '\0');
 5.29 |    movslq 0x8(%rbx),%rax
 0.03 |    lea  0x1(%rax),%edx
 1.45 |    cmp  0xc(%rbx),%edx
   |    jge  f6
 0.41 | 8c:  mov  (%rbx),%rdx

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: Make autovacuum sort tables in descending order of xid_age
Next
From: Tom Lane
Date:
Subject: Re: Why is pq_begintypsend so slow?