Re: performance modality in 7.1 for large text attributes? - Mailing list pgsql-hackers

From Paul A Vixie
Subject Re: performance modality in 7.1 for large text attributes?
Date
Msg-id 200012200043.QAA34847@redpaul.mfnx.net
In response to Re: performance modality in 7.1 for large text attributes?  (Thomas Lockhart <lockhart@alumni.caltech.edu>)
List pgsql-hackers
> > anybody know what i could be doing wrong?  (i'm also wondering why SELECT
> > takes ~250ms whereas INSERT takes ~70ms... seems counterintuitive, unless
> > TOAST is doing a LOT better than i think.)
> 
> I would think that this is entirely due to planning the query. An INSERT
> has no decisions to make, whereas a SELECT must decide among a variety
> of possible plans. To hand-optimize selects, you can set some parameters
> to force only some kinds of plans (such as index scan) but in general
> you will need to remember to unset them afterwards or you run the risk
> of bizarrely inappropriate plans for other queries in the same session.

since every "pgcat" invocation is its own sql session, i have no worries
about that.  what i don't know is how to set those options.  i'm rerunning
my test with a PRIMARY KEY on the column i'm searching on, and will report
results here soon.  it appears that 60ms is still the average INSERT time
(which is fine, btw), but the PRIMARY KEY just about doubles the disk I/O
per INSERT, and the postgres server process uses 4% of the CPU rather than
the 0.5% it used without the PRIMARY KEY.
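
(for the record, my reading of the docs is that those options are
session-level SET variables.  the fragment below is an untested sketch,
and "enable_seqscan" is my guess at the name, not something i've verified;
since each pgcat run is its own session, nothing needs unsetting afterward.)

        /*
         * untested sketch: steer the planner toward an index scan for
         * this session before issuing the SELECT.  "enable_seqscan" is
         * my reading of the docs, not a name i have verified.
         */
        res = PQexec(conn, "SET enable_seqscan = off;");
        if (PQresultStatus(res) != PGRES_COMMAND_OK) {
                fprintf(stderr, "%s: SET: %s", progname,
                        PQresultErrorMessage(res));
                goto done;
        }
        PQclear(res);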

> > furthermore, are there any plans to offer a better libpq interface to
> > INSERT?  the things i'm doing now to quote the text, and the extra copy
> > i'm maintaining, are painful.
> 
> What exactly are you looking for in "better"? Is it just the quoting
> issue (a longstanding problem which persists for historical reasons :(

well, the programmatic interface to SELECT is just about perfect.  i can
construct the command and send it over, then check the result to see how
many tuples and fields i got back, and then i can get the value in its
native form as a big block of goo.
        /* Send the query. */
        if (snprintf(cmd, sizeof cmd, "SELECT %s FROM %s WHERE %s = %s%s%s",
                     text, table, key, p, value, p) >= sizeof cmd) {
                fprintf(stderr, "%s: snprintf overflow\n", progname);
                goto done;
        }
        res = PQexec(conn, cmd);
        if (PQresultStatus(res) != PGRES_TUPLES_OK) {
                fprintf(stderr, "%s: \"%s\": %s", progname, cmd,
                        PQresultErrorMessage(res));
                goto done;
        }
        if (PQnfields(res) != 1) {
                fprintf(stderr, "%s: \"%s\": %d fields?\n",
                        progname, cmd, PQnfields(res));
                goto done;
        }
        if (PQntuples(res) != 1) {
                fprintf(stderr, "%s: \"%s\": %d tuples?\n",
                        progname, cmd, PQntuples(res));
                goto done;
        }

        /* Output the result. */
        pch = '\0';
        for (p = PQgetvalue(res, 0, 0), ch = '\0'; (ch = *p) != '\0'; p++) {
                putc(ch, fp);
                pch = ch;
        }
        if (pch != '\n')
                putc('\n', fp);
        status = 0;
 

for INSERT, though, there is no analogue.  i guess i'm looking for functions
which might have names like PQinsert() and PQaddtuple().  instead, i've got
        /*
         * Read the file to find out how large it will be when quoted.
         * If it's not a regular file, make a copy while reading, then switch.
         */
        ...

        /* Construct the INSERT command. */
        n = snprintf(cmd, sizeof cmd,
                     "INSERT INTO %s ( %s, %s ) VALUES ( %s%s%s, '",
                     table, key, text, p, value, p);
        if (n >= sizeof cmd) {
                fprintf(stderr, "%s: snprintf overflow\n", progname);
                goto done;
        }
        t = malloc(n + size + sizeof "');");
        if (t == NULL) {
                perror("malloc");
                goto done;
        }
        strcpy(t, cmd);
        tp = t + n;
        while ((ch = getc(fp)) != EOF) {
                if (ch == '\\' || ch == '\'')
                        *tp++ = '\\';
                *tp++ = ch;
        }
        *tp++ = '\'';
        *tp++ = ')';
        *tp++ = ';';
        *tp++ = '\0';

        /* Send the command. */
        res = PQexec(conn, t);
        if (PQresultStatus(res) != PGRES_COMMAND_OK) {
                fprintf(stderr, "%s: \"%s\": %s", progname, t,
                        PQresultErrorMessage(res));
                goto done;
        }
        if (strcmp(PQcmdTuples(res), "1") != 0) {
                fprintf(stderr, "%s: \"%s...\": '%s' tuples? (%s)\n",
                        progname, cmd, PQcmdTuples(res), PQcmdStatus(res));
                goto done;
        }
        status = 0;
 

which is really, really painful.  the large "text" is a great idea, but the
old "lo_" API actually had some things going for it.

> Possible causes of the 1.5s "mode" (at least as a starting point):
> 
> o task scheduling on your test machine (not likely??)
> 
> o swapping/thrashing on your test machine (not likely??)

the machine was idle other than for this test.  it's a two-processor
freebsd machine.

> o WAL fsync() log commits and cleanup (aggregate throughput is great,
> but every once in a while someone waits while the paperwork gets done.
> Waiting may be due to processor resource competition)
> 
> o Underlying file system bookkeeping from the kernel. e.g. flushing
> buffers to disk etc etc.

i'm going to make a 500MB MFS partition for /usr/local/pgsql/data/base if
the PRIMARY KEY idea doesn't work out, just to rule out actuator noise.

