Re: [ADMIN] Processing very large TEXT columns (300MB+) using C/libpq - Mailing list pgsql-admin

From Geoff Winkless
Subject Re: [ADMIN] Processing very large TEXT columns (300MB+) using C/libpq
Msg-id CAEzk6ffoE052b_PMYnf_y+u=2ckn5WTYisxmPuzHj5SWmi8r_A@mail.gmail.com
In response to Re: [ADMIN] Processing very large TEXT columns (300MB+) using C/libpq  (Bear Giles <bgiles@coyotesong.com>)
Responses Re: [ADMIN] Processing very large TEXT columns (300MB+) using C/libpq  (Geoff Winkless <pgsqladmin@geoff.dj>)
List pgsql-admin
On 21 Oct 2017 12:32, "Bear Giles" <bgiles@coyotesong.com> wrote:

> In that case you must put a read lock on the string that covers the loop. If you're in
> a multi-threaded environment and not using locks when appropriate then all bets are off.

You reckon a compiler can decide to blow up your code by making
assumptions like that?

Your loop could set a var for a state machine in a processing thread
to modify the string. That doesn't preclude correct locking behaviour.

If you think that's too contrived then forget threads, you could make
a shared library call that the compiler can't assess at compile-time
that could change the string.

Yes, in either case, using strlen to check for that is poor code, but
the compiler can't assume you're not using poor code.

This argument is pretty pointless. The only way to be sure of avoiding
the problem is to assume that the compiler won't optimize bad code, and
to write the loop accordingly.

FWIW gcc 4.8.5 with -O3 doesn't optimize away strlen even in code this simple:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char **argv)
{
  int i;
  char *buff;

  buff = malloc(strlen(argv[1]) + 1);   /* +1 for the terminating NUL */
  for (i = 0; i < strlen(argv[1]); i++) {
    buff[i] = argv[1][i];
  }
  buff[i] = '\0';
  printf("%s", buff);
  return 0;
}


.L3:
        movzbl  0(%rbp,%rbx), %edx
        movb    %dl, (%r12,%rbx)
        movq    8(%r13), %rbp
        addq    $1, %rbx
.L2:
        movq    %rbp, %rdi
        call    strlen
        cmpq    %rax, %rbx
        jb      .L3

However, it _does_ optimize this code:

int main (int argc, char **argv)
{
  int i;
  char *buff;
  char *buff2;

  buff2 = strdup(argv[1]);
  buff = malloc(strlen(buff2) + 1);   /* +1 for the terminating NUL */
  for (i = 0; i < strlen(buff2); i++) {
    buff[i] = buff2[i];
  }
  buff[i] = '\0';
  printf("%s", buff);
  return 0;
}

I assume that's because it can be certain at compile time that, since
both buff and buff2 are local, nothing else is going to modify the
source string (without some stack smashing, anyway).
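
For comparison, here is a minimal sketch of the same copy with the
strlen call hoisted out of the loop by hand, so the result no longer
depends on what the optimizer can prove. The helper name copy_hoisted
is hypothetical and does not appear anywhere in this thread; the sketch
also assumes the source string is not modified while the loop runs.

```c
#include <stdlib.h>
#include <string.h>

/* copy_hoisted: hypothetical helper, not from the thread.
 * Copies src into a freshly allocated buffer, calling strlen once.
 * Assumes nothing modifies src while the loop runs.
 * Returns NULL if allocation fails; the caller frees the result. */
char *copy_hoisted(const char *src)
{
    size_t len = strlen(src);      /* evaluated once, before the loop */
    char *dst = malloc(len + 1);   /* +1 for the terminating NUL */
    size_t i;

    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++)      /* loop condition no longer calls strlen */
        dst[i] = src[i];
    dst[len] = '\0';

    return dst;
}
```

With the length cached in a local, the loop does len iterations whether
or not the compiler can show that the string is unchanged.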

Geoff

