Thread: doc/FAQ* files are hosed

doc/FAQ* files are hosed

From
Tom Lane
Date:
Comparing the current-CVS FAQ files with those in the REL6_4 branch
is depressing --- almost every one is messed up in one branch or the
other.

The main FAQ is newer in the 6.4 branch than in the current branch; the
6.4 version seems to have numerous small fixes that are not in the main
branch.  FAQ_CVS is also newer in 6.4; it's been de-HTMLized in 6.4
(good) but not in current.  FAQ_HPUX, FAQ_Irix, and FAQ_Linux are
completely trashed in the 6.4 branch --- all three have been overwritten
with some generic document.

I'd check in some fixes, but a look at the CVS logs makes me think that
part of the problem is too many cooks stirring the broth already.
Whoever is primary maintainer of these files probably should take
responsibility for getting the branches back in sync.
        regards, tom lane


Re: [HACKERS] doc/FAQ* files are hosed

From
Bruce Momjian
Date:
> Comparing the current-CVS FAQ files with those in the REL6_4 branch
> is depressing --- almost every one is messed up in one branch or the
> other.
> 
> The main FAQ is newer in the 6.4 branch than in the current branch; the
> 6.4 version seems to have numerous small fixes that are not in the main
> branch.  FAQ_CVS is also newer in 6.4; it's been de-HTMLized in 6.4
> (good) but not in current.  FAQ_HPUX, FAQ_Irix, and FAQ_Linux are
> completely trashed in the 6.4 branch --- all three have been overwritten
> with some generic document.

I must have done that during the 6.4.2 release process.  I decided to
grab them from the web site, but it looks like I got the wrong pages
somehow.

They get overwritten with every release, so I have another chance to get
it right for 6.5.


This does not apply to the FAQ_CVS.  That's someone else.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] doc/FAQ* files are hosed

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> Comparing the current-CVS FAQ files with those in the REL6_4 branch
>> is depressing --- almost every one is messed up in one branch or the
>> other.

> They get overwritten with every release, so I have another chance to get
> it right for 6.5.

Oh?  Aren't the CVS versions considered the master copies?  If not,
where *are* the master copies?
        regards, tom lane


Postgres Speed or lack thereof

From
John Holland
Date:
Hello -

I've been lurking on this list for a little while. I have just done a
little tinkering with Postgres 6.4.2, comparing it to Oracle and mySQL on
Linux. It would appear that just doing a lot of inserts (ie 40000) in a
loop is enormously slower in Postgres in two ways that I tried it.
One - using a loop in Java that makes a JDBC call to insert.
Two - using plpgsql as a comparison to PL/SQL.

Perhaps these are bad techniques and a C based proc would do better?

I really like the idea of an open source DB and am impressed with a lot I
see about PostgreSQL - however the speed difference is pretty bad -
4.5 minutes versus about 20 seconds.

Is that just the way it goes? Are there options that would make it faster
that are not the defaults?

Any reaction would be appreciated.

John Holland
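In round numbers, John's timings work out to roughly 150 inserts/sec against 2000 inserts/sec, a 13.5x gap (a quick back-of-envelope check on the figures above):

```python
# Throughput implied by "40000 inserts, 4.5 minutes vs ~20 seconds".
inserts = 40000
postgres_secs = 4.5 * 60        # 270 seconds
oracle_secs = 20.0

pg_rate = inserts / postgres_secs     # ~148 inserts/sec
ora_rate = inserts / oracle_secs      # 2000 inserts/sec
slowdown = postgres_secs / oracle_secs  # 13.5x
```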



Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
John Holland wrote:
> 
> Hello -
> 
> I've been lurking on this list for a little while. I have just done a
> little tinkering with Postgres 6.4.2, comparing it to Oracle and mySQL on
> Linux. It would appear that just doing a lot of inserts (ie 40000) in a
> loop is enormously slower in Postgres in two ways that I tried it.
> One - using a loop in Java that makes a JDBC call to insert.
> Two - using plpgsql as a comparison to PL/SQL.
> 
> Perhaps these are bad techniques and a C based proc would do better?
> 
> I really like the idea of an open source DB and am impressed with a lot I
> see about PostgreSQL - however the speed difference is pretty bad -
> 4.5 minutes versus about 20 seconds.
> 
> Is that just the way it goes? Are there options that would make it faster
> that are not the defaults?

Oracle uses chained transaction mode (i.e. all queries run in a
single transaction until an explicit COMMIT/ABORT); MySQL has no
transactions at all...

PostgreSQL uses un-chained transaction mode: each query runs in
its own transaction - 40000 transaction commits!!!

Try to use BEGIN/END to run all inserts in a single transaction,
and please let us know the results.

Vadim
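Vadim's point can be put as a toy cost model. The per-operation costs below are made-up illustrative numbers, not measurements from this thread: if every INSERT pays a commit cost, that cost dominates; one surrounding transaction pays it once.

```python
# Toy cost model for per-statement commits (illustrative numbers only).
def total_time(n_inserts, t_work, t_commit, one_transaction):
    # Every INSERT pays parse/plan/execute time t_work; every COMMIT
    # pays t_commit (dominated by an fsync when -F is not used).
    commits = 1 if one_transaction else n_inserts
    return n_inserts * t_work + commits * t_commit

# 40000 inserts, 0.5 ms of work each, 5 ms per synced commit:
unchained = total_time(40000, 0.0005, 0.005, one_transaction=False)  # ~220 s
chained = total_time(40000, 0.0005, 0.005, one_transaction=True)     # ~20 s
```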


Re: [HACKERS] doc/FAQ* files are hosed

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> Comparing the current-CVS FAQ files with those in the REL6_4 branch
> >> is depressing --- almost every one is messed up in one branch or the
> >> other.
> 
> > They get overwritten with every release, so I have another chance to get
> > it right for 6.5.
> 
> Oh?  Aren't the CVS versions considered the master copies?  If not,
> where *are* the master copies?

The master copies are on the web site.  In fact, people send e-mail to a
special account here, and they get transferred to the web site
automatically.




Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> John Holland wrote:
>> I've been lurking on this list for a little while. I have just done a
>> little tinkering with Postgres 6.4.2, comparing it to Oracle and mySQL on
>> Linux. It would appear that just doing a lot of inserts (ie 40000) in a
>> loop is enormously slower in Postgres in two ways that I tried it.
>> One - using a loop in Java that makes a JDBC call to insert.
>> Two - using plpgsql as a comparison to PL/SQL.
>> 
>> I really like the idea of an open source DB and am impressed with a lot I
>> see about PostgreSQL - however the speed difference is pretty bad -
>> 4.5 minutes versus about 20 seconds.

> Try to use BEGIN/END to run all inserts in a single transaction,
> and please let us know the results.

I tried this myself and found that wrapping BEGIN/END around a series of
INSERT statements didn't make much difference at all.

On an HP 715 (75MHz RISC box, not very heavy iron by current standards),
I inserted about 13000 records into an initially-empty table having 38
columns (just because it's data I had handy...).  I timed it at:

Individual INSERT statements (as generated by pg_dump -d):    33 inserts/sec
Same with BEGIN/END wrapped around 'em:                       34 inserts/sec
Single COPY statement (as generated by pg_dump without -d): 1400 inserts/sec

This was for a simple table with no indexes.  In reality, this table
type has four b-tree indexes on different columns in our normal usage.
The COPY speed dropped to 325 inserts/sec when I had the indexes in
place.  I didn't bother trying the INSERTs that way.

The conventional wisdom is that you should use COPY for bulk loading,
and this result supports it...
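In ratio terms, the rates above work out to:

```python
# Relative speeds from the measurements above (HP 715, 38-column table).
insert_rate = 33       # inserts/sec, individual INSERT statements
begin_end_rate = 34    # inserts/sec, wrapped in BEGIN/END
copy_rate = 1400       # rows/sec, single COPY, no indexes
copy_indexed = 325     # rows/sec, COPY with four b-tree indexes in place

copy_speedup = copy_rate // insert_rate     # COPY beats INSERT ~42x
insert_load_secs = round(13000 / insert_rate)  # ~394 s to INSERT 13000 rows
```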

John didn't say what hardware he's using, so I don't know how comparable
my result is to his 150 inserts/sec --- that might have been on a table
with many fewer columns, or maybe his machine is just faster.

As for where the time is going: "top" showed that the time for the
INSERT ops was almost all going into backend CPU time.  My guess is
that most of the cost is in parsing and planning the INSERT statements.
Pre-planned statement skeletons might help, but I think the real answer
will be to find a way to avoid constructing and parsing SQL statements
for simple operations like INSERT.  (One reason I'm interested in the
idea of a CORBA interface is that it might help here.)

My guess is that Oracle and mySQL have APIs that allow the construction
of an SQL INSERT command to be bypassed, and that's why they show up
better on this operation.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom
Date:
On Sun, 17 Jan 1999, Tom Lane wrote:

> I tried this myself and found that wrapping BEGIN/END around a series of
> INSERT statements didn't make much difference at all.
 Using what API?  Some APIs control autocommit for you, so execing
"BEGIN" and "END" commands may not do anything.  You might want to check
your API for a way to set autocommit off.  In fact, I would suggest that
you never use BEGIN and END except via the CLI.

Tom



Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Tom <tom@sdf.com> writes:
> On Sun, 17 Jan 1999, Tom Lane wrote:
>> I tried this myself and found that wrapping BEGIN/END around a series of
>> INSERT statements didn't make much difference at all.

>   Using what API?

Sorry, I neglected to specify that it was psql (being driven by
hand-trimmed pg_dump scripts).

> Some APIs control autocommit for you, so execing
> "BEGIN" and "END" commands may not do anything.

That's not the case for psql.  If it were, I would have measured no
difference in speed, rather than a small difference.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom
Date:
On Sun, 17 Jan 1999, Tom Lane wrote:

> Tom <tom@sdf.com> writes:
> > On Sun, 17 Jan 1999, Tom Lane wrote:
> >> I tried this myself and found that wrapping BEGIN/END around a series of
> >> INSERT statements didn't make much difference at all.
> 
> >   Using what API?
> 
> Sorry, I neglected to specify that it was psql (being driven by
> hand-trimmed pg_dump scripts).
 This also wouldn't give you any benefit from "prepare", which would
speed up repeated executions of the same statement.  PostgreSQL does allow
statements to be prepared, right?  If it doesn't, chances are COPY has a
way of doing it.  It saves a lot of parsing and planning time.

> > Some APIs control autocommit for you, so execing
> > "BEGIN" and "END" commands may not do anything.
> 
> That's not the case for psql.  If it were, I would have measured no
> difference in speed, rather than a small difference.
> 
>             regards, tom lane

Tom



Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> > Try to use BEGIN/END to run all inserts in a single transaction,
> > and please let us know the results.
> 
> I tried this myself and found that wrapping BEGIN/END around a series of
> INSERT statements didn't make much difference at all.
> 
> On an HP 715 (75MHz RISC box, not very heavy iron by current standards),
> I inserted about 13000 records into an initially-empty table having 38
> columns (just because it's data I had handy...).  I timed it at:

38 columns - so much!

> Individual INSERT statements (as generated by pg_dump -d):
>         33 inserts/sec
> Same with BEGIN/END wrapped around 'em:
>         34 inserts/sec
> Single COPY statement (as generated by pg_dump without -d):
>         1400 inserts/sec
> 
...
> 
> John didn't say what hardware he's using, so I don't know how comparable
> my result is to his 150 inserts/sec --- that might have been on a table
> with many fewer columns, or maybe his machine is just faster.
> 
> As for where the time is going: "top" showed that the time for the
> INSERT ops was almost all going into backend CPU time.  My guess is
> that most of the cost is in parsing and planning the INSERT statements.
> Pre-planned statement skeletons might help, but I think the real answer
> will be to find a way to avoid constructing and parsing SQL statements
> for simple operations like INSERT.  (One reason I'm interested in the
> idea of a CORBA interface is that it might help here.)
> 
> My guess is that Oracle and mySQL have APIs that allow the construction
> of an SQL INSERT command to be bypassed, and that's why they show up
> better on this operation.

For the table with single int4 column & 2000 INSERTs I have:

BEGIN/END:        3.5sec        ~600 inserts/sec
Without:          151sec        ~13 inserts/sec

(IDE disk!)

I guess that you don't use the -F flag..?

Vadim


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
I wrote:
> As for where the time is going: "top" showed that the time for the
> INSERT ops was almost all going into backend CPU time.  My guess is
> that most of the cost is in parsing and planning the INSERT
> statements.

Having an idle hour this evening, I thought it'd be interesting to build
a backend with profiling enabled, so as to confirm or deny the above
guess.  It seems that indeed a lot of time is being wasted, but where
it's being wasted might surprise you!

I profiled a backend built from this morning's current CVS sources,
being driven by a single psql run reading a series of exactly 5000
INSERT commands.  The first few are:

INSERT INTO "fehistory_1" values (64,14,252,'D','-','-','s',1,4027,NULL,'S',1,4088,NULL,32,'Thu Jan 07 15:15:00 1999 EST','Fri Jan 08 08:30:38 1999 EST',4027,32,NULL,NULL,46,NULL);

INSERT INTO "fehistory_1" values (65,14,252,'P','-','-','S',1,4086,NULL,NULL,NULL,NULL,NULL,32,'Fri Jan 08 16:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,45,NULL);

INSERT INTO "fehistory_1" values (66,14,135,'P','-','-','S',1,13619,NULL,'s',1,12967,NULL,100,'Fri Jan 08 02:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,44,NULL);

and it goes on for a long while in that vein :-).  Total runtime was
278.09 CPU seconds in the backend, 307 seconds elapsed realtime.  The
table being inserted into has no indexes and was initially empty.

The profiler is GNU gprof, which provides both flat and structured
profile data.  The top few entries in the flat profile are:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 34.96     97.21    97.21                             _mcount
  7.18    117.19    19.98  8047816     0.00     0.00  tree_cut
  6.56    135.43    18.24   962187     0.02     0.02  recv
  3.55    145.31     9.88  8234278     0.00     0.00  tree_insert
  3.49    155.02     9.71  4292947     0.00     0.01  malloc
  3.31    164.22     9.20  1965056     0.00     0.01  SearchSysCache
  3.24    173.23     9.01  4291775     0.00     0.01  free
  2.44    180.01     6.78  6171667     0.00     0.00  tree_concatenate
  2.17    186.04     6.03  8234069     0.00     0.00  tree_delete
  1.07    189.02     2.98    10250     0.29     0.29  write
  1.05    191.94     2.92                             _psort_cmp
  0.95    194.57     2.63  2983301     0.00     0.00  newNode
  0.92    197.14     2.57  4251762     0.00     0.00  OrderedElemPop
  0.88    199.60     2.46  2746172     0.00     0.00  comphash
  0.82    201.88     2.28  4252495     0.00     0.01  AllocSetAlloc
  0.77    204.02     2.14  1965056     0.00     0.00  SearchSysCacheTuple
  0.73    206.05     2.03   265000     0.01     0.01  yylex
  0.69    207.96     1.91                             $$remU
  0.68    209.86     1.90  4252500     0.00     0.00  OrderedElemPushHead
  0.62    211.58     1.72  4247473     0.00     0.00  palloc
  0.60    213.26     1.68    25004     0.07     0.07  send
  0.56    214.81     1.55  1965056     0.00     0.00  CatalogCacheComputeHashIndex
  0.55    216.33     1.52     5000     0.30     1.07  yyparse
  0.54    217.82     1.49  4251762     0.00     0.01  AllocSetFree
  0.53    219.29     1.47  4252494     0.00     0.00  MemoryContextAlloc
  0.51    220.71     1.42  4252500     0.00     0.00  OrderedElemPushInto
  0.49    222.07     1.36  3430844     0.00     0.00  strncmp
  0.48    223.41     1.34  4205327     0.00     0.00  OrderedSetGetHead
  0.47    224.73     1.32                             elog
  0.45    225.99     1.26  1964976     0.00     0.00  DLRemove
  0.43    227.19     1.20  2510857     0.00     0.00  strcmp
  0.43    228.38     1.19  4200327     0.00     0.00  OrderedElemGetBase
  0.42    229.56     1.18  4245740     0.00     0.01  PortalHeapMemoryAlloc
  0.38    230.62     1.06  4252500     0.00     0.00  OrderedElemPush
  0.37    231.66     1.04  4205327     0.00     0.00  AllocSetGetFirst
  0.36    232.66     1.00  1965145     0.00     0.00  DLAddHead

_mcount is part of the profiling apparatus and can be ignored (although
the fact that it's so high up implies that we're executing an awful lot
of short routine calls, because mcount runs once per entry and exit of
profilable routines).  tree_cut, tree_insert, tree_concatenate, and
tree_delete are subroutines of malloc/free.  Altogether, malloc/free
and friends accounted for 61.39 seconds out of the 180.88 non-overhead
CPU seconds in this run.

In other words, we're spending a third of our time mallocing and freeing
memory.  A tad high, what?

Actually, it's worse than that, because AllocSetAlloc,
PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
routines represent our own bookkeeping layer atop malloc/free.
That's another 18.66 seconds spent in these top routines, which means
that we are real close to expending half the backend's runtime on
memory bookkeeping.  This needs work.
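Putting numbers on that, from the figures just quoted:

```python
# Fractions of backend CPU time spent on memory management.
total = 180.88          # non-overhead CPU seconds in the run
malloc_free = 61.39     # tree_* routines plus malloc and free
bookkeeping = 18.66     # AllocSet*/OrderedElem*/PortalHeapMemoryAlloc layer

third = round(malloc_free / total, 2)                   # ~0.34: a third
near_half = round((malloc_free + bookkeeping) / total, 2)  # ~0.44: close to half
```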

The other thing that jumps out here is the unreasonably high position of
recv(), which is called 962187 times.  The script being read by psql was
only 957186 characters.  Evidently we're invoking a kernel recv() call
once per character read from the frontend.  I suspect this is an
inefficiency introduced by Magnus Hagander's recent rewrite of backend
libpq (see, I told you there was a reason for using stdio ;-)).  We're
gonna have to do something about that, though it's not as critical as
the memory-allocation issue.  It also appears that send() is now being
invoked multiple times per backend reply, which is going to create
inefficiencies outside the backend (ie, multiple packets per reply).
On a test case with a lot of SELECTs that would show up more than it
does here.
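A sketch of the obvious fix (an assumed interface, not the actual backend libpq code): buffer the input so one kernel read serves thousands of per-character fetches.

```python
# Buffered reads: one recv()-sized read per chunk instead of one per
# character.  io.BytesIO stands in for the frontend connection.
import io

class BufferedReceiver:
    def __init__(self, src, bufsize=8192):
        self.src = src
        self.bufsize = bufsize
        self.buf = b""
        self.pos = 0
        self.recv_calls = 0     # counts simulated kernel reads

    def getc(self):
        if self.pos >= len(self.buf):
            self.buf = self.src.read(self.bufsize)  # one "recv()"
            self.recv_calls += 1
            self.pos = 0
            if not self.buf:
                return b""      # EOF
        c = self.buf[self.pos:self.pos + 1]
        self.pos += 1
        return c

# Read a 957186-character script one character at a time:
r = BufferedReceiver(io.BytesIO(b"x" * 957186))
n = 0
while r.getc():
    n += 1
# 117 full reads plus one empty read at EOF, instead of ~960000 recv()s
```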


Moving on to the dynamic profile, the most interesting items are:

index % time    self  children    called     name
-----------------------------------------------
                0.05  118.06       1/1           DoBackend [7]
[8]     65.3    0.05  118.06       1         PostgresMain [8]
                0.04   57.64    5000/5000        pg_exec_query [9]
                0.01   39.83    5000/5000        CommitTransactionCommand [11]
                0.02   18.86    5001/5001        ReadCommand [27]
                0.01    1.18    5000/5000        StartTransactionCommand [118]
                0.01    0.36    5001/5001        ReadyForQuery [190]
                0.04    0.00    5001/5001        EnableNotifyInterrupt [385]
                0.04    0.00    5000/5000        strspn [388]
                0.00    0.01       1/1           InitPostgres [495]
                0.00    0.00    5000/548573      strlen [172]
                0.00    0.00    5001/93349       memset [366]
                0.00    0.00       1/1           read_pg_options [601]
                0.00    0.00       2/2           pq_putint [637]
                0.00    0.00       1/15002       pq_putnchar [123]
                0.00    0.00       2/10013       getenv [241]
                0.00    0.00       1/1           SetPgUserName [683]
                0.00    0.00       1/1           pq_init [687]
                0.00    0.00       1/1           proc_exit [704]
                0.00    0.00       1/1           pq_close [721]
                0.00    0.00       1/5002        getpgusername [521]
                0.00    0.00       1/2           FindExec [722]
                0.00    0.00       5/8           getopt [728]
                0.00    0.00    5001/5001        DisableNotifyInterrupt [761]
                0.00    0.00       9/34          pqsignal [806]
                0.00    0.00       2/5           atoi [837]
                0.00    0.00       1/2           SetProcessingMode [868]
                0.00    0.00       1/10002       TransactionFlushEnabled [749]
                0.00    0.00       1/1           sigsetjmp [934]
 

("self" is the amount of time (CPU sec) spent directly in the named function,
while "children" is the amount of time spent in its callees, pro-rated
by the number of calls.  For example, strlen was actually measured to
use 0.47 sec in the whole run, but since PostgresMain called it less
than 1% of all the times it was called, PostgresMain gets blamed for
less than 0.01 sec of that total.)
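Worked out for the strlen example:

```python
# gprof pro-rates a callee's time by the caller's share of its calls.
strlen_total = 0.47             # seconds measured in strlen over the run
calls_from_postgresmain = 5000
total_calls = 548573

blamed = strlen_total * calls_from_postgresmain / total_calls
# ~0.0043 sec, which indeed rounds down to 0.00 in the graph
```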

The interesting thing here is the relative times indicated for
pg_exec_query, CommitTransactionCommand, and ReadCommand.  We already
found out why ReadCommand is so slow (recv() per character) ... but why
is CommitTransactionCommand so high, when I already discovered that
doing only one commit doesn't help much?  Patience, we'll get there.

-----------------------------------------------
                0.03   57.61    5000/5000        pg_exec_query [9]
[10]    31.9    0.03   57.61    5000         pg_exec_query_dest [10]
                0.09   39.12    5000/5000        pg_parse_and_plan [13]
                0.00   17.37    5000/5000        ProcessQuery [36]
                0.02    0.60    5000/5000        SetQuerySnapshot [159]
                0.01    0.34    5000/5000        CommandCounterIncrement [196]
                0.02    0.05   10000/4291775     free [22]

-----------------------------------------------
                0.09   39.12    5000/5000        pg_exec_query_dest [10]
[13]    21.7    0.09   39.12    5000         pg_parse_and_plan [13]
                0.02   19.28    5000/5000        parser [26]
                0.03   17.63    5000/5000        planner [34]
                0.00    1.98    5000/5000        QueryRewrite [90]
                0.02    0.05   10000/4292947     malloc [21]
                0.02    0.05   10000/4291775     free [22]
                0.00    0.02    5000/497779      lappend [91]
                0.01    0.00    5000/5000        IsAbortedTransactionBlockState [529]
                0.00    0.00    5000/72779       length [342]
 

In other words, parsing, planning, and executing an INSERT each take
about the same amount of time.

-----------------------------------------------
                0.01   39.83    5000/5000        PostgresMain [8]
[11]    22.0    0.01   39.83    5000         CommitTransactionCommand [11]
                0.02   39.81    5000/5000        CommitTransaction [12]

-----------------------------------------------
                0.02   39.81    5000/5000        CommitTransactionCommand [11]
[12]    22.0    0.02   39.81    5000         CommitTransaction [12]
                0.00   38.12    5000/5000        AtCommit_Memory [16]
                0.01    1.10    5000/5000        RecordTransactionCommit [122]
                0.00    0.30    5000/5000        AtCommit_Locks [211]
                0.02    0.10    5000/5000        AtEOXact_portals [286]
                0.03    0.07    5000/5000        DestroyTempRels [305]
                0.03    0.00    5000/5000        _lo_commit [425]
                0.01    0.00    5000/5000        AtCommit_Notify [522]
                0.01    0.00    5000/5000        CloseSequences [523]
                0.01    0.00    5000/5000        RelationPurgeLocalRelation [531]
                0.00    0.00    5000/10000       AtCommit_Cache [750]

-----------------------------------------------
                0.00   38.12    5000/5000        CommitTransaction [12]
[16]    21.1    0.00   38.12    5000         AtCommit_Memory [16]
                0.01   38.11    5000/5000        EndPortalAllocMode [17]
                0.00    0.00    5000/10552       MemoryContextSwitchTo [747]

-----------------------------------------------
                0.01   38.11    5000/5000        AtCommit_Memory [16]
[17]    21.1    0.01   38.11    5000         EndPortalAllocMode [17]
                0.54   37.55    5000/5000        AllocSetReset [18]
                0.01    0.00    5000/10000       PortalHeapMemoryGetVariableMemory [449]
                0.01    0.00    5000/5000        FixedStackPop [528]
                0.00    0.00    5000/51434       MemoryContextFree [367]

-----------------------------------------------
                0.54   37.55    5000/5000        EndPortalAllocMode [17]
[18]    21.1    0.54   37.55    5000         AllocSetReset [18]
                1.47   32.51 4200327/4251762     AllocSetFree [20]
                1.04    2.53 4205327/4205327     AllocSetGetFirst [64]
 

In other words, essentially *all* of the CPU time spent in
CommitTransaction is spent freeing memory.  That's probably why
ganging the transactions doesn't help --- it's the same number of
memory blocks getting allocated and freed.
-----------------------------------------------
                0.02   19.28    5000/5000        pg_parse_and_plan [13]
[26]    10.7    0.02   19.28    5000         parser [26]
                0.02   13.88    5000/5000        parse_analyze [41]
                1.52    3.81    5000/5000        yyparse [54]
                0.01    0.01    5000/5000        init_io [474]
                0.00    0.01    5000/420496      pstrdup [124]
                0.01    0.00    5000/5000        parser_init [533]
 

Thomas might know why parse_analyze is taking so much time compared to
the rest of the parsing machinery...


I won't bother the list with the rest of the profile, although I'll be
happy to send it to anyone who wants to see all of it.  Our immediate
performance problems seem pretty clear, however: time to rethink memory
management.
        regards, tom lane
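The "rethink memory management" direction can be sketched as a region-style pool (a hypothetical toy in Python, not PostgreSQL code): allocations are tracked by the pool, and commit releases them all with one cheap reset instead of one free() per block.

```python
# Region/pool allocation: free everything at once at end of transaction.
class Pool:
    def __init__(self):
        self.blocks = []

    def alloc(self, size):
        block = bytearray(size)   # stand-in for carving from a region
        self.blocks.append(block)
        return block

    def reset(self):
        self.blocks.clear()       # drop all allocations in one step

pool = Pool()
for _ in range(1000):
    pool.alloc(64)
pool.reset()   # one reset instead of 1000 individual free()s
```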


Re: [HACKERS] Postgres Speed or lack thereof

From
Hannu Krosing
Date:
Tom Lane wrote:
> 
> The other thing that jumps out here is the unreasonably high position of
> recv(), which is called 962187 times.  The script being read by psql was
> only 957186 characters.  Evidently we're invoking a kernel recv() call
> once per character read from the frontend.  I suspect this is an
> inefficiency introduced by Magnus Hagander's recent rewrite of backend
> libpq (see, I told you there was a reason for using stdio ;-)).

At least part of the problem is also the fe-be protocol itself, maybe 
not on the backend receiving side, but certainly while the front-end is 
receiving.

The main problem is that you must very often wait for a specific end 
character (instead of sending first the length and then reading the 
required amount in one chunk), and it requires some trickery to do it
efficiently without reading each character separately.

I once did the fe-be protocol in python (an interpreted scripting 
language, see http://www.python.org), it was for v6.2.  I was quite 
amazed by the baroqueness of the protocol - it uses a mix of three 
techniques: 1. wait for an EOT char, 2. receive a pascal-style string,
and 3. get a record consisting of a mix of 1 and 2.
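The length-prefixed style can be sketched like this (a toy framing scheme, not the actual fe-be wire format): a fixed-size length header, then exactly that many payload bytes, so the reader never scans for a terminator.

```python
# Length-prefixed framing: 4-byte big-endian length, then the payload.
import struct

def pack_frame(payload: bytes) -> bytes:
    return struct.pack("!I", len(payload)) + payload

def read_frame(buf: bytes):
    (n,) = struct.unpack("!I", buf[:4])   # read the length header
    return buf[4:4 + n], buf[4 + n:]      # payload, remaining bytes

frame = pack_frame(b"SELECT 1")
payload, rest = read_frame(frame + b"extra")
```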

>  We're
> gonna have to do something about that, though it's not as critical as
> the memory-allocation issue.  It also appears that send() is now being
> invoked multiple times per backend reply, which is going to create
> inefficiencies outside the backend (ie, multiple packets per reply).
> On a test case with a lot of SELECTs that would show up more than it
> does here.

I am sure that we need to change the protocol sometime soon, if we 
want to increase the performance. I have contemplated something like 
the X-window protocol, that seems to be quite well designed.

And of course we need some sort of CLI that can do prepared statements 
and that can use binary fields (not only the file interface).

Currently we have some of it, but only in SPI.

Unfortunately most inserts are not done using SPI :(

It may be the CORBA interface that somebody may be working on, or 
it may be something simpler.

---------------------
Hannu Krosing


Re: [HACKERS] Postgres Speed or lack thereof

From
John Holland
Date:

I tried to do this.....

What I've done is create a plpgsql function that does the inserts.
There is a begin/end in the function of course... I also tried doing it
with a BEGIN WORK before calling the function in psql, and it did seem
that this worked as it should - i.e. a ROLLBACK WORK would get rid of the
40000 inserts.

Still, it took about 3.5 minutes, and Oracle does it in about 30 seconds.

factor of 7.....

On the plus side for Postgres, trying to find some queries against the
data for other speed tests, I found that it seemed to respond quickly -
not much thinking to do with one table of integers I guess. Actually I 
couldn't come up with a query that would take very long - need another
table I suppose.

I found that UPDATE TEST SET A = B -1 took only 20 seconds for 40000
records.

The real plus is that Oracle couldn't do that transaction because it was
running out of rollback segment space - something I have fixed in the past
but I recall being a pain to fix - need to make bigger rollback datafiles
or whatever.


Any suggested speed tests would be appreciated. 

> 
> Try to use BEGIN/END to run all inserts in a single transaction,
> and please let us know the results.
> 
> Vadim
> 



Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Hannu Krosing <hannu@trust.ee> writes:
> I am sure that we need to change the protocol sometime soon, if we 
> want to increase the performance.

The immediate problem is just that we've given back performance that we
had a week ago.  stdio-based libpq didn't have this problem, because it
dealt with the kernel a bufferload at a time not a character at a time.

Also, while the fe/be protocol is certainly a bit crufty, I'm not
convinced that a well-designed frontend is going to suffer any real
performance problem from parsing the protocol.  The trick again is to
read into a buffer and parse from the buffer.  The last time I profiled
frontend libpq, it turned out to be spending all its time in memory
allocation not parsing.  (Which I fixed, btw.)

> And of course we need some sort of CLI that can do prepared statements 
> and that can use binary fields (not only the file interface).

> Currently we have some of it, but only in SPI.

SPI?  I missed something ... what's that?
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Michael Meskes
Date:
On Sun, Jan 17, 1999 at 07:23:15PM -0800, Tom wrote:
>   This also wouldn't give you any benefit from "prepare", which would
> speed up repeated executions of the same statement.  PostgreSQL does allow
> statements to be prepared, right?  If it doesn't, chances are COPY has a

Since when?  I thought we were still looking for someone to implement it,
i.e. at least the PREPARE statement.

Michael
-- 
Michael Meskes                         | Go SF 49ers!
Th.-Heuss-Str. 61, D-41812 Erkelenz    | Go Rhein Fire!
Tel.: (+49) 2431/72651                 | Use Debian GNU/Linux!
Email: Michael.Meskes@gmx.net          | Use PostgreSQL!


Re: [HACKERS] Postgres Speed or lack thereof

From
Michael Meskes
Date:
On Sun, Jan 17, 1999 at 09:08:28PM -0500, John Holland wrote:
> I've been lurking on this list for a little while. I have just done a
> little tinkering with Postgres 6.4.2, comparing it to Oracle and mySQL on
> Linux. It would appear that just doing a lot of inserts (ie 40000) in a
> loop is enormously slower in Postgres in two ways that I tried it.
> One - using a loop in Java that makes a JDBC call to insert.
> Two - using plpgsql as a comparison to PL/SQL.

Did you use the '-F' option? I tried somethings similar from C and found
similar results when calling fsync() after each write. Once I used -F it was
way faster and comparable to Oracle.

> Perhaps these are bad techniques and a C based proc would do better?

Feel free to try src/interfaces/ecpg/test/perftest.c

Michael


Re: [HACKERS] Postgres Speed or lack thereof

From
Hannu Krosing
Date:
Tom Lane wrote:
> 
> Also, while the fe/be protocol is certainly a bit crufty, I'm not
> convinced that a well-designed frontend is going to suffer any real
> performance problem from parsing the protocol.  The trick again is to
> read into a buffer and parse from the buffer.

That's also what I did, but still a better protocol would have helped.

> > And of course we need some sort of CLI that can do prepared statements
> > and that can use binary fields (not only the file interface).
> 
> > Currently we have some of it, but only in SPI.
> 
> SPI?  I missed something ... what's that?

I think it stands for Server Programming Interface (?). Vadim (?) added 
it as the interface for C and other programming language functions 
to execute queries inside the backend. AFAIK, it follows the standard 
prepare-bind-execute-fetch model for its queries.

It is quite well documented in the standard PG docs.

-------------------
Hannu Krosing


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> For the table with single int4 column & 2000 INSERTs I have:

> BEGIN/END:        3.5sec        ~600insert/sec
> Without:          151sec        ~13insert/sec

> (IDE disk!)

> I guess that you don't use -F flag..?

*Do* use -F, you meant of course.  You're right, I do.  With an
fsync after every transaction, I'm sure my test would've been much
slower.

However, with a UPS and an OS that only crashes about once every
other year, I feel pretty secure using -F ...
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
Michael Meskes wrote:
> 
> On Sun, Jan 17, 1999 at 07:23:15PM -0800, Tom wrote:
> >   This also wouldn't you give you any benefit from "prepare" that would
> > speed up repeated executions of the same statement.  PostgreSQL does allow
> > statements to be prepared right?  If it doesn't, chances are COPY has a
> 
> Since when? I thought we're still looking for someone to implement it, i.e.
> at least the PREPARE statement.

1. PG can handle prepared queries.
   Parser & Planner can parse/plan (i.e. prepare) query with
   parameters once, so such prepared query can be executed
   many times by executor without parser/planner invocation.
2. We have to implement some interface to use this ability.

Vadim


Prepared statements (was: Postgres Speed or lack thereof)

From
Taral
Date:
On Mon, 18 Jan 1999, Vadim Mikheev wrote:
>1. PG can handle prepared queries.
>   Parser & Planner can parse/plan (i.e. prepare) query with
>   parameters once, so such prepared query can be executed 
>   many times by executor without parser/planner invocation.
>2. We have to implement some interface to use this ability.

Yes, I need prepared statements for CORBA, too.

Taral


Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> > For the table with single int4 column & 2000 INSERTs I have:
> 
> > BEGIN/END:            3.5sec          ~600insert/sec
> > Without:            151sec            ~13insert/sec
> 
> > (IDE disk!)
> 
> > I guess that you don't use -F flag..?
> 
> *Do* use -F, you meant of course.  You're right, I do.  With an

Actually no -:)

> fsync after every transaction, I'm sure my test would've been much
> slower.

Just use BEGIN/END to run all queries in single transaction -:)

> However, with a UPS and an OS that only crashes about once every
> other year, I feel pretty secure using -F ...

However, it's easy to crash Postgres itself and lose
committed transactions -:(

Vadim


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> However, with a UPS and an OS that only crashes about once every
>> other year, I feel pretty secure using -F ...

> However, it's easy to crash Postgres itself and lose
> committed transactions -:(

Surely not?  The docs say (and a quick look at the code confirms)
that -F suppresses calls to fsync(2).  It does not suppress writes.
Thus, a commit will still write the data out to kernel disk buffers.
All that fsync does is force the kernel to execute immediate disk
writes for those buffers.  If I don't fsync, and the backend crashes,
the modified file data is still in kernel disk buffers and the kernel
is still responsible for seeing that those dirty buffers get written
out eventually.

(Of course, errors inside Postgres might cause it to write bogus
data, but -F isn't going to help or hurt for that.)

Not using -F means that you don't trust your OS, your hardware,
and/or your power supply.  It has nothing to do with whether you
trust Postgres.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> > Tom Lane wrote:
> >> However, with a UPS and an OS that only crashes about once every
> >> other year, I feel pretty secure using -F ...
> 
> > However, it's easy to crash Postgres itself and lose
> > committed transactions -:(
> 
> Surely not?  The docs say (and a quick look at the code confirms)
> that -F suppresses calls to fsync(2).  It does not suppress writes.
> Thus, a commit will still write the data out to kernel disk buffers.
> All that fsync does is force the kernel to execute immediate disk
> writes for those buffers.  If I don't fsync, and the backend crashes,
> the modified file data is still in kernel disk buffers and the kernel
> is still responsible for seeing that those dirty buffers get written
> out eventually.

You're right.

Vadim


Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
> Not using -F means that you don't trust your OS, your hardware,
> and/or your power supply.  It has nothing to do with whether you
> trust Postgres.

I couldn't have said it better.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Postgres Speed or lack thereof

From
Michael Meskes
Date:
On Tue, Jan 19, 1999 at 09:15:01AM +0700, Vadim Mikheev wrote:
> 1. PG can handle prepared queries.
>    Parser & Planner can parse/plan (i.e. prepare) query with
>    parameters once, so such prepared query can be executed 
>    many times by executor without parser/planner invocation.
> 2. We have to implement some interface to use this ability.

I see. Thanks.

Michael
-- 
Michael Meskes                         | Go SF 49ers!
Th.-Heuss-Str. 61, D-41812 Erkelenz    | Go Rhein Fire!
Tel.: (+49) 2431/72651                 | Use Debian GNU/Linux!
Email: Michael.Meskes@gmx.net          | Use PostgreSQL!


Re: [HACKERS] Postgres Speed or lack thereof

From
John Holland
Date:
Can you explain the -F flag? When is it passed, and to what? Can this be done
in plpgsql? How?


john
On Mon, 18 Jan 1999, Tom Lane wrote:

> Vadim Mikheev <vadim@krs.ru> writes:
> > For the table with single int4 column & 2000 INSERTs I have:
> 
> > BEGIN/END:        3.5sec        ~600insert/sec
> > Without:          151sec        ~13insert/sec
> 
> > (IDE disk!)
> 
> > I guess that you don't use -F flag..?
> 
> *Do* use -F, you meant of course.  You're right, I do.  With an
> fsync after every transaction, I'm sure my test would've been much
> slower.
> 
> However, with a UPS and an OS that only crashes about once every
> other year, I feel pretty secure using -F ...
> 
>             regards, tom lane
> 



Re: [HACKERS] Postgres Speed or lack thereof

From
Roberto Joao Lopes Garcia
Date:
At 11:29 19/01/99 -0500, you wrote:
>can you explain the -F flag? when is it passed to what? Can this be done
>in plpgsql? how?

Or passed by libpq in a C program that needs to insert (as fast as
possible) a lot of new rows? 
>
>
>john
>On Mon, 18 Jan 1999, Tom Lane wrote:
>
>> Vadim Mikheev <vadim@krs.ru> writes:
>> > For the table with single int4 column & 2000 INSERTs I have:
>> 
>> > BEGIN/END:        3.5sec        ~600insert/sec
>> > Without:          151sec        ~13insert/sec
>> 
>> > (IDE disk!)
>> 
>> > I guess that you don't use -F flag..?
>> 
>> *Do* use -F, you meant of course.  You're right, I do.  With an
>> fsync after every transaction, I'm sure my test would've been much
>> slower.
>> 
>> However, with a UPS and an OS that only crashes about once every
>> other year, I feel pretty secure using -F ...
>> 
>>             regards, tom lane
>> 
>
>
>
>
------------------------------------------------------------------
Eng. Roberto João Lopes Garcia         E-mail: roberto@mha.com.br
F. 55 11 848 9906   FAX  55 11 848 9955

MHA Engenharia Ltda
E-mail: mha@mha.com.br    WWW: http://www.mha.com.br

Av Maria Coelho Aguiar, 215 Bloco D     2 Andar
Centro Empresarial de Sao Paulo
Sao Paulo - BRASIL - 05805 000
-------------------------------------------------------------------



Re: [HACKERS] Prepared statements (was: Postgres Speed or lack thereof)

From
Peter T Mount
Date:
On Mon, 18 Jan 1999, Taral wrote:

> On Mon, 18 Jan 1999, Vadim Mikheev wrote:
> >1. PG can handle prepared queries.
> >   Parser & Planner can parse/plan (i.e. prepare) query with
> >   parameters once, so such prepared query can be executed 
> >   many times by executor without parser/planner invocation.
> >2. We have to implement some interface to use this ability.
> 
> Yes, I need prepared statements for CORBA, too.

This would improve JDBC's PreparedStatement as well.

--
Peter T Mount, peter@retep.org.uk
Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC FAQ: http://www.retep.org.uk/postgres
Java PDF Generator: http://www.retep.org.uk/pdf



Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
John Holland <jholland@isr.umd.edu> writes:
> can you explain the -F flag? when is it passed to what?

-F is a command-line flag passed to the backend at backend startup.
Since backends are normally started by the postmaster, what you
really do in practice is to start the postmaster with "-o -F".
For example my postmaster start script looks like

nohup postmaster -i -o "-F" >server.log 2>&1 </dev/null &

What the -F switch actually does is to disable calls to fsync(2),
thereby allowing modified file blocks to hang around in kernel
memory for a little while (up to 30 seconds in most Unixes)
rather than being force-written to disk as soon as each transaction
commits.  If the same block of the database file gets modified
again within that time window (very likely under a repeated-update
load), you just saved a disk write.  On the other hand, if your OS
crashes or your power goes out in those 30 sec, you just lost a
database update that you thought you had committed.

I'm not sure I believe the argument that omitting -F buys very much
safety, even if you do not trust your power company.  Murphy's law
says that a power flicker will happen in the middle of committing
a transaction, not during the 30-second-max window between when you
could've had the data flushed to disk if only you'd used fsync()
and when the swapper process will fsync it on its own.  And in that
case you have a corrupted database anyway.  So my theory is you use
a reliable OS, and get yourself a UPS if your power company isn't
reliable (lord knows mine ain't), and back up your database as often
as you can.  -F buys enough speed that it's worth the small extra risk.

There are competent experts with conflicting opinions, however ;-)

See doc/README.fsync for more about -F and the implications of
using it.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
> John Holland <jholland@isr.umd.edu> writes:
> > can you explain the -F flag? when is it passed to what?
> 
> -F is a command-line flag passed to the backend at backend startup.
> Since backends are normally started by the postmaster, what you
> really do in practice is to start the postmaster with "-o -F".
> For example my postmaster start script looks like
> 
> nohup postmaster -i -o "-F" >server.log 2>&1 </dev/null &
> 
> What the -F switch actually does is to disable calls to fsync(2),
> thereby allowing modified file blocks to hang around in kernel
> memory for a little while (up to 30 seconds in most Unixes)
> rather than being force-written to disk as soon as each transaction
> commits.  If the same block of the database file gets modified
> again within that time window (very likely under a repeated-update
> load), you just saved a disk write.  On the other hand, if your OS
> crashes or your power goes out in those 30 sec, you just lost a
> database update that you thought you had committed.
> 
> I'm not sure I believe the argument that omitting -F buys very much
> safety, even if you do not trust your power company.  Murphy's law
> says that a power flicker will happen in the middle of committing
> a transaction, not during the 30-second-max window between when you
> could've had the data flushed to disk if only you'd used fsync()
> and when the swapper process will fsync it on its own.  And in that
> case you have a corrupted database anyway.  So my theory is you use
> a reliable OS, and get yourself a UPS if your power company isn't
> reliable (lord knows mine ain't), and back up your database as often
> as you can.  -F buys enough speed that it's worth the small extra risk.
> 
> There are competent experts with conflicting opinions, however ;-)
> 
> See doc/README.fsync for more about -F and the implications of
> using it.

Well said, and I think we have to address this shortcoming.  Vadim has
been reluctant to turn off fsync by default.  I have been trying to come
up with some soft-fsync on my own but haven't hit on anything he agrees
with yet.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Performance test with -F (Was Postgres Speed or lack thereof)

From
Roberto Joao Lopes Garcia
Date:
>For example my postmaster start script looks like
>
>nohup postmaster -i -o "-F" >server.log 2>&1 </dev/null &
>

I tried the -F option and the results are below. I use a C program (libpq)
that reads from 3 dbf files and loads them into 3 tables in PostgreSQL 4.0.0.
Each row in a dbf file generates a SQL INSERT command. I'm not using BEGIN
TRANSACTION and COMMIT to make all the inserts just one transaction.

Test without -F option:

table 1: 697 rows = 4' 39"
table 2: 22760 rows = 1h 9' 37"
table 3: 22758 rows = 43' 44"
index/vacuum/grant all tables = 12' 58"
Total time: 2h 10' 58"

Test with -F option (the dbf files were updated and grew a little):

table 1: 698 rows = 1' 52"
table 2: 22798 rows = 15' 38"
table 3: 22796 rows = 7' 42"
index/vacuum/grant all tables = 11' 39"
Total time: 36' 51"  ~= 72% reduction in time

This will allow me to update the PostgreSQL tables from the dbf files 2 times
a day instead of once at night!

Thank you to those who pointed me to the -F option. I will try BEGIN
TRANSACTION and COMMIT to make all the inserts just one transaction.

Roberto




------------------------------------------------------------------
Eng. Roberto João Lopes Garcia         E-mail: roberto@mha.com.br
F. 55 11 848 9906   FAX  55 11 848 9955

MHA Engenharia Ltda
E-mail: mha@mha.com.br    WWW: http://www.mha.com.br

Av Maria Coelho Aguiar, 215 Bloco D     2 Andar
Centro Empresarial de Sao Paulo
Sao Paulo - BRASIL - 05805 000
-------------------------------------------------------------------



Re: [HACKERS] Postgres Speed or lack thereof

From
Todd Graham Lewis
Date:
On Wed, 20 Jan 1999, Tom Lane wrote:

> What the -F switch actually does is to disable calls to fsync(2),
> thereby allowing modified file blocks to hang around in kernel
> memory for a little while (up to 30 seconds in most Unixes)
> rather than being force-written to disk as soon as each transaction
> commits.  If the same block of the database file gets modified
> again within that time window (very likely under a repeated-update
> load), you just saved a disk write. 

Your efficiencies from running asynchronously run far beyond that.
By delaying writes, you allow the kernel to order them and pass them in
large chunks to the drive, getting very good efficiency.  Compare ten
aggregated writes (the asynchronous case) to ten individual writes
(the synchronous case): in the former case, the disk arm needs to move
over the platter once, and you pay the cost of initiating a write once,
whereas in the latter case the disk arm needs to do ten seeks, and you
pay the cost of initiating ten different disk operations, servicing ten
interrupts instead of one, etc.  You save a lot more than just optimizing
away the cancelled writes.

> I'm not sure I believe the argument that omitting -F buys very much
> safety, even if you do not trust your power company.  Murphy's law
> says that a power flicker will happen in the middle of committing
> a transaction, not during the 30-second-max window between when you
> could've had the data flushed to disk if only you'd used fsync()
> and when the swapper process will fsync it on its own.  And in that
> case you have a corrupted database anyway.

That depends on how you process your transactions, I guess.  It is
possible to perform transactions safely, such that no interruption can
violate your db's integrity.  I have not read and understood enough
of psql's transaction engine to know how safe it is in the face of
interruption, but there's no inherent reason that it need be unsafe.

The real way to solve this problem is to build a file system which is
more suitable for use in this way than a traditional unix fs.  You're
working around the coarse-grained-ness of fsync() no matter what you do,
and there is no good solution until you move away from fsync() and towards
something like fsync(block) or fsync(transaction).

--
Todd Graham Lewis            32°49'N, 83°36'W         (800) 719-4664, x2804
******Linux******         MindSpring Enterprises      tlewis@mindspring.net

"Those who write the code make the rules."                -- Jamie Zawinski



Re: [HACKERS] Postgres Speed or lack thereof

From
John Ryan
Date:
Todd Graham Lewis wrote:
> 
> On Wed, 20 Jan 1999, Tom Lane wrote:
> 
> > What the -F switch actually does is to disable calls to fsync(2),
> > thereby allowing modified file blocks to hang around in kernel

Could you use an mmapped file and flush it according to some algorithm?
Hard on memory, but fast.

-- 
John Ryan
CEO Ryan Net Works         Visit us at www.cybertrace.com
john@cybertrace.com              (703) 998-1751 (w)
3515-A S. Stafford St.           (703) 820-6304 (f)
Arlington, VA 22206-1191         (703) 626-4130 (c)



Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Having an idle hour this evening, I thought it'd be interesting to build
> a backend with profiling enabled, so as to confirm or deny the above
> guess.  It seems that indeed a lot of time is being wasted, but where
> it's being wasted might surprise you!
...
> In other words, we're spending a third of our time mallocing and freeing
> memory.  A tad high, what?
> 
> Actually, it's worse than that, because AllocSetAlloc,
> PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
> routines represent our own bookkeeping layer atop malloc/free.
> That's another 18.66 seconds spent in these top routines, which means
> that we are real close to expending half the backend's runtime on
> memory bookkeeping.  This needs work.
>

Yes, it's surprising!

I added some debug code to palloc/pfree and it shows that for
INSERT:

1. 80% of allocations are made for <= 32 bytes.
2. pfree is used for 25% of them only (others are freed
   after statement/transaction is done).

Note that our mmgr adds 16 bytes to each allocation
(+ some bytes in malloc) - a great overhead, yes?

I added code to allocate a few big (16K-64K) blocks
of memory for these small allocations, to speed up
palloc by skipping AllocSetAlloc/malloc. The new code
doesn't free allocated memory (to make bookkeeping fast),
but keeping in mind 2. above and the memory overhead,
this seems an appropriate thing to do. The new code also
speeds up freeing when the statement/transaction is done,
because only a few blocks have to be freed now.

I did 5000 INSERTs (into tables with 3 ints and 33 ints)
with BEGIN/END, -F and -B 512 (I ran postgres directly,
without the postmaster). User times:

                        old         new
-----------------------------------------
table with 3 ints     9.7 sec     7.6 sec
table with 33 ints    59.5 sec    39.9 sec

So, the new code is 20%-30% faster. Process sizes are the same.

Tom, could you run new code under profiling?

There are still some things to do:

1. SELECT/UPDATE/DELETE often palloc/pfree tuples
   (sizes are > 32 bytes), but pfree now requires an
   additional lookup to see whether memory was allocated by
   AllocSetAlloc or by the new code. We can avoid this.

2. Index scans palloc/pfree an IndexResult for each
   tuple returned by the index. This one has been annoying me
   for a long time. IndexResult should be part of the
   IndexScan structure...

3. psort uses a leftist structure (16 bytes) when
   disk is used for sorting. Obviously, big-block
   allocation should be used by the lselect code itself.

4. Actually, the new mode shouldn't be used by the Executor
   in some cases.

I'll address this in a few days...

BTW, look at memutils.h: the new code is like the "tunable"
mode described there.

> 
> In other words, essentially *all* of the CPU time spent in
> CommitTransaction is spent freeing memory.  That's probably why
> ganging the transactions doesn't help --- it's the same number of
> memory blocks getting allocated and freed.

It shows that we should get rid of system malloc/free and do
everything in mmgr itself - this would allow us to free memory
contexts much faster at statement/transaction end.

Vadim


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> In other words, we're spending a third of our time mallocing and freeing
>> memory.  A tad high, what?

> I added some debug code to palloc/pfree and it shows that for
> INSERT:
> 1. 80% of allocations are made for <= 32 bytes.
> 2. pfree is used for 25% of them only (others are freed
>    after statement/transaction is done).

I had suspected the former (because most of the allocs are
parser nodes) and knew the latter from profile call counts.

> Note that our mmgr adds 16 bytes to each allocation
> (+ some bytes in malloc) - a great overhead, yes?

Youch ... for a list-node-sized request, that's a lot of overhead.

I did some more profiling this weekend, after Magnus Hagander and I
repaired libpq's little problem with calling recv() once per input byte.
Unsurprisingly, memory allocation is still 50% of backend CPU time in
my test consisting of a long series of INSERT statements.  It's less
than that, but still an interesting number, in two other test scenarios
I set up:

(1) SELECT the whole contents of a 6500-row, 38-column table; repeat
    5 times.  Memory allocation takes 8.74 of 54.45 backend CPU sec,
    or 16%.

(2) Load a database from a pg_dump script.  Script size is 1.3MB.
    This is of course mostly CREATE TABLE, COPY from stdin, CREATE
    INDEX.  Memory allocation takes 11.08 of 38.25 sec = 29%.

I believe that the reason memory alloc is such a large percentage of the
INSERT scenario is that this is the only one where lots of SQL parsing
is going on.  In the SELECT scenario, most of the palloc/pfree traffic
is temporary field value representations inside printtup (more about
that below).  In the database load scenario, it seems that btree index
building is the biggest hit; the heaviest callers of palloc are:
                0.00    0.01   10642/438019      _bt_searchr [86]
                0.00    0.01   13531/438019      heap_formtuple [149]
                0.00    0.01   13656/438019      CopyFrom [24]
                0.00    0.01   14196/438019      textin [247]
                0.01    0.01   18275/438019      datetime_in [42]
                0.01    0.03   32595/438019      bpcharin [182]
                0.01    0.03   38882/438019      index_formtuple [79]
                0.01    0.03   39028/438019      _bt_formitem [212]
                0.07    0.16  204564/438019      _bt_setsortkey [78]
[88]     1.3    0.14    0.34  438019             palloc [88]

> I added code to allocate a few big (16K-64K) blocks of memory for
> these small allocations to speed up palloc by skiping
> AllocSetAlloc/malloc. New code don't free allocated memory (to make
> bookkeeping fast) but keeping in mind 2. above and memory overhead it
> seems as appropriate thing to do. These code also speed up freeing
> when statement/transaction is done, because of only a few blocks have
> to be freed now.

This is pretty much what I had in mind, but I think that it is not OK
to leave everything to be freed at statement end.  In particular I've
just been looking at printtup(), the guts of SELECT output, and I see
that each field value is palloc'd (by a type-specific output routine)
and then pfree'd by printtup after emitting it.  If you make the pfrees
all no-ops then this means the temporary space needed for SELECT is
at least equal to the total volume of output data ... not good for
huge tables.  (COPY in and out also seem to palloc/pfree each field
value that goes past.)

However, I think we could get around this by making more use of memory
contexts.  The space allocated inside printtup could be palloc'd from a
"temporary" context that gets flushed every so often (say, after every
few hundred tuples of output).  All the pallocs done inside the parser
could go to a "statement" context that is flushed at the end of each
statement.  It'd probably also be a good idea to have two kinds of
memory contexts, the current kind where each block is individually
free-able and the new kind where we only keep track of how to free the
whole context's contents at once.  That way, any particular routine that
was going to allocate a *large* chunk of memory would have a way to free
that chunk explicitly when done with it (but would have to remember
which context it got it from in order to do the free).  For the typical
list-node-sized request, we wouldn't bother with the extra bookkeeping.

I think that palloc() could be defined as always giving you the bulk-
allocated kind of chunk.  pfree() would become a no-op and eventually
could be removed entirely.  Anyone who wants the other kind of chunk
would call a different pair of routines, where the controlling memory
context has to be named at both alloc and free time.  (This approach
gets rid of your concern about the cost of finding which kind of context
a block has been allocated in inside of pfree; we expect the caller
to know that.)

Getting this right is probably going to take some thought and work,
but it looks worthwhile from a performance standpoint.  Maybe for 6.6?

> It shows that we should get rid of system malloc/free and do
> all things in mmgr itself - this would allow us much faster
> free memory contexts at statement/transaction end.

I don't think we can or should stop using malloc(), but we can
ask it for large blocks and do our own allocations inside those
blocks --- was that what you meant?
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Goran Thyni
Date:
Tom Lane wrote:
> I don't think we can or should stop using malloc(), but we can
> ask it for large blocks and do our own allocations inside those
> blocks --- was that what you meant?

I will just jump in with a small idea.
The Gnome crowd is investigating using alloca instead of malloc/free
where applicable. It is a huge saving in CPU cycles where it can be used.
regards,
-- 
-----------------
Göran Thyni
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!


Re: [HACKERS] Postgres Speed or lack thereof

From
The Hermit Hacker
Date:
BUGS
     The alloca() function is machine dependent; its use is discouraged.


On Sun, 24 Jan 1999, Goran Thyni wrote:

> Tom Lane wrote:
> > I don't think we can or should stop using malloc(), but we can
> > ask it for large blocks and do our own allocations inside those
> > blocks --- was that what you meant?
> 
> I will just jump in with a small idea.
> The Gnome crowd is investigating using alloca instead of malloc/free
> where applicable. It is a huge save in CPU cycles where it can be used.
> 
>     regards,
> -- 
> -----------------
> Göran Thyni
> This is Penguin Country. On a quiet night you can hear Windows NT
> reboot!
> 

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



alloca (was: Postgres Speed or lack thereof)

From
Goran Thyni
Date:
The Hermit Hacker wrote:
> BUGS
>      The alloca() function is machine dependent; its use is discouraged.

Gain a big potential speed boost, lose some portability.
I think most modern unices have a good alloca; DOS and Mac
don't, but who's porting the server there?

Any unices out there missing alloca?
mvh,
-- 
-----------------
Göran Thyni
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!


Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> > Note that our mmgr adds 16 bytes to each allocation
> > (+ some bytes in malloc) - a great overhead, yes?
> 
> Youch ... for a list-node-sized request, that's a lot of overhead.

And lists are very often used by planner and - never pfreed.

> Getting this right is probably going to take some thought and work,
> but it looks worthwhile from a performance standpoint.  Maybe for 6.6?

Yes - I have to return to the MVCC stuff...
So, I consider my exercises with mmgr as a vacation from MVCC -:)

> > It shows that we should get rid of system malloc/free and do
> > all things in mmgr itself - this would allow us much faster
> > free memory contexts at statement/transaction end.
> 
> I don't think we can or should stop using malloc(), but we can
> ask it for large blocks and do our own allocations inside those
> blocks --- was that what you meant?

No. We could ask brk() for large blocks.
The problem is where to handle dynamic allocations.
As long as they are handled by malloc, we can't put
them in the proper blocks of the current memory context.
But with our own handling, malloc would become useless.

Vadim


Re: [HACKERS] Postgres Speed or lack thereof

From
Vince Vielhaber
Date:
On 25-Jan-99 Vadim Mikheev wrote:
> 
> Yes - I have to return to MVCC stuff...
> So, I consider my exercizes with mmgr as vacation from MVCC -:)

Dumb question.  What is MVCC?  It almost sounds like an M$ compiler.

Vince.
-- 
==========================================================================
Vince Vielhaber -- KA8CSH   email: vev@michvhf.com   flame-mail: /dev/null
      # include <std/disclaimers.h>
TEAM-OS2  Online Searchable Campground Listings   http://www.camping-usa.com
     "There is no outfit less entitled to lecture me about bloat
      than the federal government"  -- Tony Snow
==========================================================================




Re: alloca (was: Postgres Speed or lack thereof)

From
The Hermit Hacker
Date:
On Sun, 24 Jan 1999, Goran Thyni wrote:

> The Hermit Hacker wrote:
> > BUGS
> >      The alloca() function is machine dependent; its use is discouraged.
> 
> Gain a big potential speed boost, loose some portability.
> I think most modern unices has a good alloca, DOS and Mac
> don't, but who's porting the server there?
> 
> Any unices out there missing alloca?

Make you a deal...

You build the code such that "#ifdef HAVE_ALLOCA" is true, so that those
platforms that either don't support it, or have broken support for it,
aren't affected, and you are most welcome to work on it...

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Re: alloca (was: Postgres Speed or lack thereof)

From
Bruce Momjian
Date:
> On Sun, 24 Jan 1999, Goran Thyni wrote:
> 
> > The Hermit Hacker wrote:
> > > BUGS
> > >      The alloca() function is machine dependent; its use is discouraged.
> > 
> > Gain a big potential speed boost, lose some portability.
> > I think most modern unices have a good alloca; DOS and Mac
> > don't, but who's porting the server there?
> > 
> > Any unices out there missing alloca?
> 
> Make you a deal...
> 
> You build the code such that "#ifdef HAVE_ALLOCA" is true, so that those
> platforms that either don't support it, or have broken support for it,
> aren't affected, and you are most welcome to work on it...

As far as I know, alloca is only useful for memory that is used by the
current function or its children.  I don't think we have many cases
where we could use that.

--
Bruce Momjian                        |  http://www.op.net/~candle
maillist@candle.pha.pa.us            |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: alloca (was: Postgres Speed or lack thereof)

From
Tom Lane
Date:
>>>> The Hermit Hacker wrote:
>> You build the code such that "#ifdef HAVE_ALLOCA" is true, so that those
>> platforms that either don't support it, or have broken support for it,
>> aren't affected, and you are most welcome to work on it...

alloca is treated as a builtin function by gcc, so it should work on
any platform if you use gcc.  If your compiler is not gcc then alloca
is *very* iffy --- as far as I know, it is not specified by any
recognized standard.  So I concur with Marc's opinion: a speedup on
alloca-supporting platforms is nice, but it is not acceptable to break
the code on machines without it.

Bruce Momjian <maillist@candle.pha.pa.us> writes:
> As far as I know, alloca is only useful for memory that is used by the
> current function or its children.  I don't think we have many cases
> where we could use that.

Right, storage allocated by alloca is part of the calling function's
stack frame --- it goes away when that function exits, no sooner and
no later.

I have noticed some places in Postgres where routines palloc storage
that they pfree before exiting.  alloca would be a fine substitute
for that usage.  But as far as I've seen in profiling, the primary
callers of palloc are allocating storage that they will *not* free,
but indeed will return to their caller --- the type-specific conversion
routines all work that way, for example.  So I'm doubtful that adding
alloca to our tool arsenal would be worth the portability headaches
it would cost.

I have been toying with the notion of nested memory contexts, which
would be created at the start of certain major routines and deleted
when they exit.  palloc would allocate storage from the most closely
nested context, and thus would mean "allocate storage that will live
through the current major operation, whatever it is".  This is still
a very half-baked notion and I can't say that it will work, but I
thought I'd throw it into the thread...
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> I don't think we can or should stop using malloc(), but we can
>> ask it for large blocks and do our own allocations inside those
>> blocks --- was that what you meant?

> No. We could ask brk() for large blocks.

I think that would be a bad idea.  brk() is a Unix-ism; I doubt it's
supported on Win NT, for example.  malloc() is a lot more portable.

Another potential portability issue is whether malloc() will coexist
with calling brk() ourselves.  (It *ought* to, but I can believe that
the feature might be broken on some platforms, since it's so seldom
exercised...)  We can't stop all uses of malloc(), because parts of the
C library use it --- stdio, qsort, putenv all do on my machine.

If we're going to grab large chunks and keep them, then any small
inefficiency in doing the grabbing isn't really worth worrying about;
so I don't see the need to bypass malloc() for that.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> > Tom Lane wrote:
> >> I don't think we can or should stop using malloc(), but we can
> >> ask it for large blocks and do our own allocations inside those
> >> blocks --- was that what you meant?
> 
> > No. We could ask brk() for large blocks.
> 
> I think that would be a bad idea.  brk() is a Unix-ism; I doubt it's
> supported on Win NT, for example.  malloc() is a lot more portable.
> 
> Another potential portability issue is whether malloc() will coexist
> with calling brk() ourselves.  (It *ought* to, but I can believe that
> the feature might be broken on some platforms, since it's so seldom
> exercised...)  We can't stop all uses of malloc(), because parts of the
> C library use it --- stdio, qsort, putenv all do on my machine.
> 
> If we're going to grab large chunks and keep them, then any small
> inefficiency in doing the grabbing isn't really worth worrying about;
> so I don't see the need to bypass malloc() for that.

Ok, I agreed.

Vadim


Re: [HACKERS] Re: alloca (was: Postgres Speed or lack thereof)

From
"Thomas G. Lockhart"
Date:
> Right, storage allocated by alloca is part of the calling function's
> stack frame --- it goes away when that function exits, no sooner and
> no later.

And as I'm sure many already know, there are limits on stack frame size
on many architectures, and limits on total stack size on every platform.
The available size is not in the same league as malloc()-style heap
allocation.
                   - Tom


Re: [HACKERS] Re: alloca (was: Postgres Speed or lack thereof)

From
Goran Thyni
Date:
Bruce Momjian wrote:
> As far as I know, alloca is only useful for memory that is used by the
> current function or its children.  I don't think we have many cases
> where we could use that.

Perhaps not in so many obvious places, but
with some restructuring perhaps.
I plan to look into this too, time permitting, don't hold your breath.
regards,
-- 
-----------------
Göran Thyni
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!




Re: alloca (was: Postgres Speed or lack thereof)

From
Goran Thyni
Date:
The Hermit Hacker wrote:
> Make you a deal...
>
> You build the code such that "#ifdef HAVE_ALLOCA" is true, so that those
> platforms that either don't support it, or have broken support for it,
> aren't affected, and you are most welcome to work on it...

Yeah right, I should keep my mouth shut. :-)

It is a big job, starting with getting profiling working on
my box, analyzing lots of code and then coming up with some optimizations.

We can start with the patch below to detect alloca() in autoconf.

    best regards,
--
-----------------
Göran Thyni
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!

diff -rc pgsql-orig/src/configure.in pgsql/src/configure.in
*** pgsql-orig/src/configure.in    Mon Jan 18 07:01:04 1999
--- pgsql/src/configure.in    Tue Jan 26 15:03:12 1999
***************
*** 692,697 ****
--- 692,698 ----
  AC_FUNC_MEMCMP
  AC_TYPE_SIGNAL
  AC_FUNC_VPRINTF
+ AC_FUNC_ALLOCA
  AC_CHECK_FUNCS(tzset memmove sigsetjmp kill sysconf fpclass)
  AC_CHECK_FUNCS(fp_class fp_class_d class)
  AC_CHECK_FUNCS(sigprocmask waitpid setsid fcvt)
diff -rc pgsql-orig/src/include/config.h.in pgsql/src/include/config.h.in
*** pgsql-orig/src/include/config.h.in    Mon Jan 18 07:02:26 1999
--- pgsql/src/include/config.h.in    Tue Jan 26 15:05:37 1999
***************
*** 115,120 ****
--- 115,123 ----
  # define gettimeofday(a,b) gettimeofday(a)
  #endif

+ /* Set to 1 if you have alloca() */
+ #undef HAVE_ALLOCA
+
  /* Set to 1 if you have snprintf() */
  #undef HAVE_SNPRINTF


Re: alloca (was: Postgres Speed or lack thereof)

From
The Hermit Hacker
Date:


Before this patch is applied, have we determined whether or not it's even
a useful thing to pursue?  I thought the general consensus was that, for
us, it wasn't...
On Tue, 26 Jan 1999, Goran Thyni wrote:

> The Hermit Hacker wrote:
> > Make you a deal...
> > 
> > You build the code such that "#ifdef HAVE_ALLOCA" is true, so that those
> > platforms that either don't support it, or have broken support for it,
> > aren't affected, and you are most welcome to work on it...
> 
> Yeah right, I should keep my mouth shut. :-)
> 
> It is a big job, starting with getting profiling working on
> my box, analyzing lots of code and then coming up with some optimizations.
> 
> We can start with the patch below to detect alloca() in autoconf.
> 
>     best regards,
> -- 
> -----------------
> Göran Thyni
> This is Penguin Country. On a quiet night you can hear Windows NT
> reboot!

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Re: alloca (was: Postgres Speed or lack thereof)

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> Before this patch is applied, have we determined whether or not it's even
> a useful thing to pursue?  I thought the general consensus was that, for
> us, it wasn't...

If we can demonstrate through profiling that there's someplace where
using alloca offers a worthwhile speedup, then let's go for it.

But, until we find such a hotspot, I'd be inclined not to spend
configure cycles on testing for alloca...
        regards, tom lane


Re: alloca (was: Postgres Speed or lack thereof)

From
Goran Thyni
Date:
The Hermit Hacker wrote:
> Before this patch is applied, have we determined whether or not it's even
> a useful thing to pursue?  I thought the general consensus was that, for
> us, it wasn't...

I would have to investigate further before making any final judgement.
But having the test in configure/config.h can't hurt, should I or
someone else find a worthwhile speedup using alloca, can it?

If you want something more substantial you have to wait,
I just posted the patch to make it available to other
eager investigators.
happy hacking,
-- 
-----------------
Göran Thyni
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!


Re: alloca (was: Postgres Speed or lack thereof)

From
The Hermit Hacker
Date:
On Wed, 27 Jan 1999, Goran Thyni wrote:

> The Hermit Hacker wrote:
> Before this patch is applied, have we determined whether or not it's even
> a useful thing to pursue?  I thought the general consensus was that, for
> us, it wasn't...
> 
> I would have to investigate further before making any final judgement.
> But having the test in configure/config.h can't hurt, should I or
> someone else find a worthwhile speedup using alloca, can it?
> 
> If you want something more substantial you have to wait,
> I just posted the patch to make it available to other
> eager investigators.

Let's go with the wait scenario... one thing I've always hated with other
packages and configure is watching them run forever checking for things
that are never used, but checking "just 'cause it can" *roll eyes*

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Postgres Speed or lack thereof

From
Massimo Dal Zotto
Date:
> 
> Tom Lane wrote:
> > 
> > > Note that our mmgr adds 16 bytes to each allocation
> > > (+ some bytes in malloc) - a great overhead, yes?
> > 
> > Youch ... for a list-node-sized request, that's a lot of overhead.
> 
> And lists are very often used by planner and - never pfreed.
> 
> > Getting this right is probably going to take some thought and work,
> > but it looks worthwhile from a performance standpoint.  Maybe for 6.6?
> 
> Yes - I have to return to MVCC stuff...
> So, I consider my exercises with mmgr as vacation from MVCC -:)
> 
> > > It shows that we should get rid of system malloc/free and do
> > > all things in mmgr itself - this would allow us much faster
> > > free memory contexts at statement/transaction end.
> > 
> > I don't think we can or should stop using malloc(), but we can
> > ask it for large blocks and do our own allocations inside those
> > blocks --- was that what you meant?
> 
> No. We could ask brk() for large blocks.
> The problem is where to handle dynamic allocations.
> As long as they are handled by malloc we can't put
> them in proper blocks of current memory context.
> But with our own handling, malloc would become useless.
> 
> Vadim

We could use 4 methods for dynamic allocation:

1)    malloc/free - for persistent storage allocation

2)    palloc/pfree - for storage belonging to some context and which we
      can keep track of and free explicitly

3)    fast_palloc - for storage which is impossible, too difficult or too
      expensive to keep track of. This storage should be allocated with
      fast and simple inline code from bigger chunks allocated with
      palloc. This storage would never be freed explicitly, so that code
      could be simple and fast, but the big chunks would be freed
      automatically at the end of the transaction.

4)    fast_talloc - we could introduce a `tuple' context handled like
      fast_palloc for storage used only while processing one tuple.
      This storage could be fast allocated from a few big chunks
      allocated with palloc and freed explicitly after the tuple has
      been processed. This could avoid the overhead of many
      malloc/palloc while reducing the overall memory usage for
      transactions which process many rows. The total cost per tuple
      could be one palloc and one pfree. We could also simply reuse the
      chunks for every tuple and pfree them only at the end of the
      transaction. This would cost one palloc/pfree per transaction.

This would require revising the code and changing palloc/pfree with the new
functions where appropriate, but this could be done gradually because the
old palloc/pfree are always safe.

-- 
Massimo Dal Zotto

+----------------------------------------------------------------------+
|  Massimo Dal Zotto               email: dz@cs.unitn.it               |
|  Via Marconi, 141                phone: ++39-0461534251              |
|  38057 Pergine Valsugana (TN)      www: http://www.cs.unitn.it/~dz/  |
|  Italy                             pgp: finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Massimo Dal Zotto <dz@cs.unitn.it> writes:
> We could use 4 methods for dynamic allocation:

This sounds pretty close to what I was thinking, too.  We already have
the notion of "memory contexts" which can have different alloc/free
implementations, so there's already a structure to put these different
alloc methods into.

However, we probably don't really want to touch each individual palloc()
call in the system to change it to a MemoryContextAlloc(some-context, size)
call.  Not only would that be horribly tedious, but an awful lot of
these calls are made for the purpose of creating an object that the
requesting routine intends to pass back to its caller.  In that
situation, the requesting routine doesn't really *know* what the
lifetime of the storage needs to be --- it's the caller that determines
what's going to happen.

I think what we want is to define palloc() as allocating
not-individually-freeable storage from the current context, which is set
by the most closely nested main control routine (parser, planner,
executor, etc).  That should handle the majority of the existing code
and thus minimize the number of places that have to be touched.
pfree() goes away, eventually, and can be made a no-op meanwhile.

This assumes that the number of places that really *want* to free
storage earlier than end of context is much smaller than the number
of places that don't want to be bothered --- therefore, it makes more
sense to redefine palloc() as the not-separately-freeable kind and
change the places that do want to control their storage lifetime,
instead of changing all the other places.

> 4)    fast_talloc - we could introduce a `tuple' context handled like
>     fast_palloc for storage used only while processing one tuple.

Right, that could be the most closely nested context for operations that
iterate over tuples.  A further idea I had is to define allocations
from that context as being only good for one tuple, but to not bother to
physically clear the context after every single tuple --- do it every
ten or hundred or so tuples, instead.  The amount of storage wasted
won't be very large, and it cuts the per-tuple time cost even more.
(Of course, if resetting a context to empty only takes a few
instructions, you might as well just do it every time.  I'm not sure
offhand if that will be true or not.)
        regards, tom lane


tough locale bug

From
Goran Thyni
Date:
I have found a bug and tried to trace the source but now I need some
pointers.

ENV: Pgsql 6.4.2 (and current) compiled with locale (using Swedish
locale)

BUG:
A small table: CREATE TABLE x(txt text);

goran=> SELECT * FROM x;
txt   
------
abc   
åäö   
Grön
Gunnar
GUNNAR
Göran 
göran 
GÖRAN 
(8 rows)

but:

goran=> select * from x WHERE txt LIKE 'G%';
txt   
------
Grön
Gunnar
GUNNAR
(3 rows)

the same goes for: "select * from x WHERE txt ~ '^G'", which is what the
above LIKE-stmt becomes in the backend.
Otherwise regex works correctly.

To sum up:
A case-sensitive regex anchored with '^', where the first char
following the match is a non-ASCII char, gives misses.

I have tracked it down to the following; 3 functions are called to test
the expression:
1. textregexeq OK for 5 instances
2. text_ge('G','G') also OK
3. text_lt('G','G\0xFF') this is not correct!

Case 3 does not work with strcoll() in varstr_cmp().
If I change strcoll() to strncmp() it works as expected,
but that probably breaks sorting etc. big time.

Any suggestions how to proceed?
TIA,
-- 
---------------------------------------------
Göran Thyni, JMS Bildbasen, Kiruna
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!



Re: [HACKERS] Postgres Speed or lack thereof

From
jwieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> However, we probably don't really want to touch each individual palloc()
> call in the system to change it to a MemoryContextAlloc(some-context, size)
> call.  Not only would that be horribly tedious, but an awful lot of
> these calls are made for the purpose of creating an object that the
> requesting routine intends to pass back to its caller.  In that
> situation, the requesting routine doesn't really *know* what the
> lifetime of the storage needs to be --- it's the caller that determines
> what's going to happen.

    There are about 900 calls of palloc() in the backend code. It
    is much less than I expected (we have over 200,000  lines  of
    code).

    So I vote for doing it ONCE HIGH QUALITY, instead of half way
    but easy to do.

    And I don't think you could avoid at least  looking  at  each
    individual  palloc().  You  defined  nesting levels, where at
    their end the allocated memory gets freed.   But  there  are
    many  places  where  deeper  nested functions put things into
    structures that are  held  in  upper  levels  and  thus  live
    longer.  These  upper  levels  continue to use the things the
    callees  placed in there.


Jan


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
jwieck@debis.com (Jan Wieck) writes:
> Tom Lane wrote:
>> However, we probably don't really want to touch each individual palloc()
>> call in the system ...

>     There are about 900 calls of palloc() in the backend code. It
>     is much less than I expected (we have over 200,000  lines  of
>     code).

Much less than I thought, also.  And (grep|wc...) over 300 of these
calls are in the utils/adt code, ie, they are for passing back the
results of type conversion functions.  We'll only need to think through
how that should work once, and then it's just a mechanical edit to all
the ADT files.

>     So I vote for doing it ONCE HIGH QUALITY, instead of half way
>     but easy to do.

OK, I agree.

>     And I don't think you could avoid at least  looking  at  each
>     individual  palloc().  You  defined  nesting levels, where at
>     their end the allocated memory gets freed.   But  there  are
>     many  places  where  deeper  nested functions put things into
>     structures that are  held  in  upper  levels  and  thus  live
>     longer.  These  upper  levels  continue to use the things the
>     callees  placed in there.

What about my prior point that the bottom-level function may not know
how long the caller needs the storage?  Will we have to add a "memory
context to put result in" parameter to a lot of routines?  Ugh, but
maybe it's the only way.

I've noticed that everyone else contributing to this thread has been
thinking in terms of inventing multiple allocation functions with
different names, ie a routine might have to call "palloc" or
"fast_palloc" or "tmp_palloc" or whatever depending on what behavior it
wants.  I really think we are better off to stick to the structure that
we already have (cf. include/nodes/memnodes.h and include/utils/mcxt.h),
in which there's one set of interface routines that take a "context"
parameter, and the context determines the semantics of allocation.
(The interface routines probably ought to be macros, not out-of-line
code, but that's a minor optimization.)  This will make it easier to
handle cases where one routine has to tell another one how to allocate
its result space: you pass a MemoryContext parameter.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
jwieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

>
> jwieck@debis.com (Jan Wieck) writes:
> > Tom Lane wrote:
> >> However, we probably don't really want to touch each individual palloc()
> >> call in the system ...
>
> >     There are about 900 calls of palloc() in the backend code. It
> >     is much less than I expected (we have over 200,000  lines  of
> >     code).
>
> Much less than I thought, also.  And (grep|wc...) over 300 of these
> calls are in the utils/adt code, ie, they are for passing back the
> results of type conversion functions.  We'll only need to think through
> how that should work once, and then it's just a mechanical edit to all
> the ADT files.

    Exactly  these  300  ones  (except  for 20 or 30 of them) are
    those that the caller needs :-)

> What about my prior point that the bottom-level function may not know
> how long the caller needs the storage?  Will we have to add a "memory
> context to put result in" parameter to a lot of routines?  Ugh, but
> maybe it's the only way.

    I've browsed a little through the code (well, the candidates I
    thought are good choices for optimization). The problem there
    is, that the order of allocation hasn't anything to  do  with
    the  lifecycle  of  the  objects.  So  if  we  want to handle
    allocations different depending on the object lifecycle,  the
    allocators  (mainly their callers) would have to switch often
    between different contexts.

    It wouldn't make the code much  more  readable  if  between
    most   of   the  statements  some  MemoryContextStumbleOver()
    appears. And many memory contexts mean,  on  the  other  hand,
    many  concurrently read/written pages. Thus I would expect it
    to affect the caching of the CPU.

>
> I've noticed that everyone else contributing to this thread has been
> thinking in terms of inventing multiple allocation functions with
> different names, ie a routine might have to call "palloc" or
> "fast_palloc" or "tmp_palloc" or whatever depending on what behavior it
> wants.

    So  did  I.

>        I really think we are better off to stick to the structure that
> we already have (cf. include/nodes/memnodes.h and include/utils/mcxt.h),
> in which there's one set of interface routines that take a "context"
> parameter, and the context determines the semantics of allocation.
> (The interface routines probably ought to be macros, not out-of-line
> code, but that's a minor optimization.)  This will make it easier to
> handle cases where one routine has to tell another one how to allocate
> its result space: you pass a MemoryContext parameter.

    I came to the  same  conclusion.  So  I  continued  with  the
    approach  of  bigger  chunks handled in palloc(). What I have
    now is  something,  that  gains  about  10%  speedup  at  the
    regression  test,  while  memory consumption (visibly watched
    with top(1)) seems not to raise compared against old version.

    Since  the  bigger  blocks  are  built on top of the existing
    memory context model, it would not conflict with any enhanced
    usage we could make with it.

    I include a patch at the end - please comment.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #


diff -cr src.orig/backend/utils/mmgr/mcxt.c src/backend/utils/mmgr/mcxt.c
*** src.orig/backend/utils/mmgr/mcxt.c    Thu Jan 28 16:47:39 1999
--- src/backend/utils/mmgr/mcxt.c    Fri Jan 29 20:15:41 1999
***************
*** 106,111 ****
--- 106,112 ----
  static struct GlobalMemory TopGlobalMemoryData = {
      T_GlobalMemory,                /* NodeTag                tag          */
      &GlobalContextMethodsData,    /* ContextMethods        method      */
+     NULL,                        /* char* smallchunk_block    */
      {{0}},                        /* uninitialized OrderedSetData allocSetD */
      "TopGlobal",                /* char* name       */
      {0}                            /* uninitialized OrderedElemData elemD */
diff -cr src.orig/backend/utils/mmgr/palloc.c src/backend/utils/mmgr/palloc.c
*** src.orig/backend/utils/mmgr/palloc.c    Thu Jan 28 16:47:39 1999
--- src/backend/utils/mmgr/palloc.c    Fri Jan 29 21:57:06 1999
***************
*** 24,29 ****
--- 24,57 ----

  #include "utils/palloc.h"

+
+ typedef struct PallocBlock {
+     MemoryContext    mcxt;
+     int                refcount;
+     char           *unused_chunk;
+     char           *freeptr;
+     char           *endptr;
+ } PallocBlock;
+
+ typedef struct PallocChunk {
+     PallocBlock        *blk;
+     Size            size;
+ } PallocChunk;
+
+
+ #define PALLOC_BLOCKSIZE        \
+             (8192 - sizeof(OrderedElemData) - sizeof(Size))
+ #define PALLOC_CHUNK_LIMIT        512
+ #define PALLOC_SIZE_ALIGN(s)    ((s + 15) & ~15)
+
+ #define PALLOC_CHUNK_BLKPTR(c)    (((PallocChunk *)(c))[-1].blk)
+ #define PALLOC_CHUNK_SIZE(c)    (((PallocChunk *)(c))[-1].size)
+
+ #define PALLOC_CHUNK_HDRSZ        MAXALIGN(sizeof(PallocChunk))
+ #define PALLOC_BLOCK_HDRSZ        MAXALIGN(sizeof(PallocBlock))
+
+ #define PALLOC_FREESPACE(b)        ((b)->endptr - (b)->freeptr)
+
  /* ----------------------------------------------------------------
   *        User library functions
   * ----------------------------------------------------------------
***************
*** 66,72 ****
  #ifdef PALLOC_IS_MALLOC
      return malloc(size);
  #else
!     return MemoryContextAlloc(CurrentMemoryContext, size);
  #endif     /* PALLOC_IS_MALLOC */
  }

--- 94,168 ----
  #ifdef PALLOC_IS_MALLOC
      return malloc(size);
  #else
!     PallocBlock        *block;
!     char            *chunk;
!     Size            asize = PALLOC_SIZE_ALIGN(size);
!
!     if (asize >= PALLOC_CHUNK_LIMIT)
!     {
!         chunk = (char *)MemoryContextAlloc(CurrentMemoryContext, size +
!                                                         PALLOC_CHUNK_HDRSZ);
!         chunk += PALLOC_CHUNK_HDRSZ;
!         PALLOC_CHUNK_BLKPTR(chunk) = NULL;
!         return (void *)chunk;
!     }
!
!     block = (PallocBlock *)(CurrentMemoryContext->smallchunk_block);
!
!     if (block != NULL && PALLOC_FREESPACE(block) < asize)
!     {
!         char    *prev = NULL;
!         Size    chunk_size;
!
!         chunk = block->unused_chunk;
!         while (chunk)
!         {
!             chunk_size = PALLOC_CHUNK_SIZE(chunk);
!             if (asize == chunk_size)
!             {
!                 if (prev == NULL)
!                     block->unused_chunk = (char *)PALLOC_CHUNK_BLKPTR(chunk);
!                 else
!                     PALLOC_CHUNK_BLKPTR(prev) = PALLOC_CHUNK_BLKPTR(chunk);
!
!                 block->refcount++;
!                 PALLOC_CHUNK_BLKPTR(chunk) = block;
!                 /*
!                 PALLOC_CHUNK_SIZE(chunk) = size;
!                 */
!                 return (void *)chunk;
!             }
!             prev = chunk;
!             chunk = (char *)PALLOC_CHUNK_BLKPTR(chunk);
!         }
!
!         block = NULL;
!     }
!
!     if (block == NULL)
!     {
!         block = (PallocBlock *)MemoryContextAlloc(CurrentMemoryContext,
!                                     PALLOC_BLOCKSIZE);
!         block->mcxt = CurrentMemoryContext;
!         block->unused_chunk = NULL;
!         block->refcount = 0;
!         block->freeptr = ((char *)block) + PALLOC_BLOCK_HDRSZ +
!                                             PALLOC_CHUNK_HDRSZ;
!         block->endptr = ((char *)block) + PALLOC_BLOCKSIZE;
!
!         CurrentMemoryContext->smallchunk_block = (void *)block;
!     }
!
!     chunk = block->freeptr;
!     block->freeptr += PALLOC_CHUNK_HDRSZ + asize;
!     block->refcount++;
!     PALLOC_CHUNK_BLKPTR(chunk) = block;
!     PALLOC_CHUNK_SIZE(chunk) = asize;
!
!     if (block->freeptr >= block->endptr)
!         block->mcxt->smallchunk_block = NULL;
!
!     return (void *)chunk;
  #endif     /* PALLOC_IS_MALLOC */
  }

***************
*** 76,82 ****
  #ifdef PALLOC_IS_MALLOC
      free(pointer);
  #else
!     MemoryContextFree(CurrentMemoryContext, pointer);
  #endif     /* PALLOC_IS_MALLOC */
  }

--- 172,202 ----
  #ifdef PALLOC_IS_MALLOC
      free(pointer);
  #else
!     PallocBlock        *block = PALLOC_CHUNK_BLKPTR(pointer);
!
!     if (block == NULL)
!     {
!         MemoryContextFree(CurrentMemoryContext, (char *)pointer - PALLOC_CHUNK_HDRSZ);
!         return;
!     }
!
!     PALLOC_CHUNK_BLKPTR(pointer) = (PallocBlock *)(block->unused_chunk);
!     block->unused_chunk = (char *)pointer;
!
!     block->refcount--;
!     if (block->refcount == 0)
!     {
!         if (block == (PallocBlock *)(block->mcxt->smallchunk_block))
!         {
!             block->freeptr = ((char *)block) + PALLOC_BLOCK_HDRSZ +
!                                                 PALLOC_CHUNK_HDRSZ;
!             block->unused_chunk = NULL;
!         }
!         else
!         {
!             MemoryContextFree(block->mcxt, (void *)block);
!         }
!     }
  #endif     /* PALLOC_IS_MALLOC */
  }

***************
*** 100,106 ****
  #ifdef PALLOC_IS_MALLOC
      return realloc(pointer, size);
  #else
!     return MemoryContextRealloc(CurrentMemoryContext, pointer, size);
  #endif
  }

--- 220,248 ----
  #ifdef PALLOC_IS_MALLOC
      return realloc(pointer, size);
  #else
!     PallocBlock        *block = PALLOC_CHUNK_BLKPTR(pointer);
!     char             *new;
!     Size            tocopy;
!
!     if (block == NULL)
!     {
!         new = (char *)MemoryContextRealloc(CurrentMemoryContext,
!                                 (char *)pointer - PALLOC_CHUNK_HDRSZ,
!                                 size + PALLOC_CHUNK_HDRSZ);
!         new += PALLOC_CHUNK_HDRSZ;
!         PALLOC_CHUNK_BLKPTR(new) = NULL;
!         return (void *)new;
!     }
!     else
!     {
!         new = palloc(size);
!
!         tocopy = PALLOC_CHUNK_SIZE(pointer) > size ?
!                             size : PALLOC_CHUNK_SIZE(pointer);
!         memcpy(new, pointer, tocopy);
!         pfree(pointer);
!         return (void *)new;
!     }
  #endif
  }

***************
*** 117,119 ****
--- 259,263 ----

      return nstr;
  }
+
+
diff -cr src.orig/backend/utils/mmgr/portalmem.c src/backend/utils/mmgr/portalmem.c
*** src.orig/backend/utils/mmgr/portalmem.c    Thu Jan 28 16:47:39 1999
--- src/backend/utils/mmgr/portalmem.c    Fri Jan 29 20:15:41 1999
***************
*** 390,395 ****
--- 390,396 ----
      NodeSetTag((Node *) &portal->variable, T_PortalVariableMemory);
      AllocSetInit(&portal->variable.setData, DynamicAllocMode, (Size) 0);
      portal->variable.method = &PortalVariableContextMethodsData;
+     portal->variable.smallchunk_block = NULL;

      /*
       * initialize portal heap context
***************
*** 399,404 ****
--- 400,406 ----
      FixedStackInit(&portal->heap.stackData,
                     offsetof(HeapMemoryBlockData, itemData));
      portal->heap.method = &PortalHeapContextMethodsData;
+     portal->heap.smallchunk_block = NULL;

      /*
       * set bogus portal name
***************
*** 756,761 ****
--- 758,764 ----
      NodeSetTag((Node *) &portal->variable, T_PortalVariableMemory);
      AllocSetInit(&portal->variable.setData, DynamicAllocMode, (Size) 0);
      portal->variable.method = &PortalVariableContextMethodsData;
+     portal->variable.smallchunk_block = NULL;

      /* initialize portal heap context */
      NodeSetTag((Node *) &portal->heap, T_PortalHeapMemory);
***************
*** 763,768 ****
--- 766,772 ----
      FixedStackInit(&portal->heap.stackData,
                     offsetof(HeapMemoryBlockData, itemData));
      portal->heap.method = &PortalHeapContextMethodsData;
+     portal->heap.smallchunk_block = NULL;

      /* initialize portal name */
      length = 1 + strlen(name);
***************
*** 918,923 ****
--- 922,928 ----

      /* free current mode */
      AllocSetReset(&HEAPMEMBLOCK(context)->setData);
+     context->smallchunk_block = NULL;
      MemoryContextFree((MemoryContext) PortalHeapMemoryGetVariableMemory(context),
                        context->block);

diff -cr src.orig/include/nodes/memnodes.h src/include/nodes/memnodes.h
*** src.orig/include/nodes/memnodes.h    Thu Jan 28 16:47:39 1999
--- src/include/nodes/memnodes.h    Fri Jan 29 20:15:41 1999
***************
*** 60,65 ****
--- 60,66 ----
  {
      NodeTag        type;
      MemoryContextMethods method;
+     void        *smallchunk_block;
  }           *MemoryContext;

  /* think about doing this right some time but we'll have explicit fields
***************
*** 68,73 ****
--- 69,75 ----
  {
      NodeTag        type;
      MemoryContextMethods method;
+     void        *smallchunk_block;
      AllocSetData setData;
      char       *name;
      OrderedElemData elemData;
***************
*** 79,84 ****
--- 81,87 ----
  {
      NodeTag        type;
      MemoryContextMethods method;
+     void        *smallchunk_block;
      AllocSetData setData;
  }           *PortalVariableMemory;

***************
*** 86,91 ****
--- 89,95 ----
  {
      NodeTag        type;
      MemoryContextMethods method;
+     void        *smallchunk_block;
      Pointer        block;
      FixedStackData stackData;
  }           *PortalHeapMemory;

Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
> I've noticed that everyone else contributing to this thread has been
> thinking in terms of inventing multiple allocation functions with
> different names, ie a routine might have to call "palloc" or
> "fast_palloc" or "tmp_palloc" or whatever depending on what behavior it
> wants.  I really think we are better off to stick to the structure that
> we already have (cf. include/nodes/memnodes.h and include/utils/mcxt.h),
> in which there's one set of interface routines that take a "context"
> parameter, and the context determines the semantics of allocation.
> (The interface routines probably ought to be macros, not out-of-line
> code, but that's a minor optimization.)  This will make it easier to
> handle cases where one routine has to tell another one how to allocate
> its result space: you pass a MemoryContext parameter.

I agree, inline is good.  I can imagine something like makeNode is going
to be tough.

I am going to give the parser as an example.  It palloc's all sorts of
things for rangetables, strings, structures, lists, but eventually just
exports a huge Query structure.

If we could somehow just copy the Query structure and free all the
other parser palloc allocations, that would be great.

Same thing with the optimizer.  It makes all sorts of lists, but
eventually just exports Plans.  All the other stuff can be pfree'ed.

It is almost as though we could copy the Query structure to another
memory context, and just free all palloc'ed memory.

What we could try is to keep the various memory contexts like cache
context unchanged, but allow pallocs in the standard memory context to
be allocated from a pool and free'ed automatically.  Our standard memory
context is overused and doesn't need the pfree overhead.  We could make
palloc a macro that tested for the current context, and did standard
pallocs from a free memory pool.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Patch (was: tough locale bug)

From
Goran Thyni
Date:
Goran Thyni wrote:
> 3. text_lt('G','G\0xFF') this is not correct!
>
> Case 3 does not work with strcoll(), in varstr_cmp().
> If I change strcoll() to strncmp() it works as expected,
> but it probably breaks sorting etc big time.

Here is a hack which solves the problem without breaking sorting, AFAIK.
I have read a lot of code, but have not found any cleaner way to fix
this.
A cleaner solution would be to find the highest char in the locale's
charset, but I found no portable way to do that; any pointers
appreciated.

It is not a beauty but it works.
Please apply to -current since it fixes the bug,
unless someone has a better suggestion.

    best regards,
--
-----------------
Göran Thyni
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!

diff -cr cvs/pgsql/src/backend/utils/adt/varlena.c cvswork/pgsql/src/backend/utils/adt/varlena.c
*** cvs/pgsql/src/backend/utils/adt/varlena.c    Mon Dec 14 07:01:37 1998
--- cvswork/pgsql/src/backend/utils/adt/varlena.c    Sun Jan 31 16:32:52 1999
***************
*** 496,528 ****
  varstr_cmp(char *arg1, int len1, char *arg2, int len2)
  {
      int            result;
!     char       *a1p,
!                *a2p;
!
  #ifdef USE_LOCALE
!     a1p = (unsigned char *) palloc(len1 + 1);
!     a2p = (unsigned char *) palloc(len2 + 1);
!
!     memcpy(a1p, arg1, len1);
!     *(a1p + len1) = '\0';
!     memcpy(a2p, arg2, len2);
!     *(a2p + len2) = '\0';
!
!     result = strcoll(a1p, a2p);
!
!     pfree(a1p);
!     pfree(a2p);
!
! #else
!
!     a1p = arg1;
!     a2p = arg2;
!
!     result = strncmp(a1p, a2p, Min(len1, len2));
!     if ((result == 0) && (len1 != len2))
!         result = (len1 < len2) ? -1 : 1;
  #endif
!
      return result;
  }    /* varstr_cmp() */

--- 496,524 ----
  varstr_cmp(char *arg1, int len1, char *arg2, int len2)
  {
      int            result;
!     char       *a1p, *a2p;
  #ifdef USE_LOCALE
!     if ((int)arg2[len2 - 1] != -1)
!       {
!         a1p = (unsigned char *) palloc(len1 + 1);
!         a2p = (unsigned char *) palloc(len2 + 1);
!         memcpy(a1p, arg1, len1);
!         *(a1p + len1) = '\0';
!         memcpy(a2p, arg2, len2);
!         *(a2p + len2) = '\0';
!         result = strcoll(a1p, a2p);
!         pfree(a1p);
!         pfree(a2p);
!       }
!     else
  #endif
!       {
!         a1p = arg1;
!         a2p = arg2;
!         result = strncmp(a1p, a2p, Min(len1, len2));
!         if ((result == 0) && (len1 != len2))
!         result = (len1 < len2) ? -1 : 1;
!       }
      return result;
  }    /* varstr_cmp() */


Re: [HACKERS] Patch (was: tough locale bug)

From
"Thomas G. Lockhart"
Date:
>   #ifdef USE_LOCALE
> !       if ((int)arg2[len2 - 1] != -1)

Sorry, I'm not as familiar with the LOCALE code as I should be. Why are
we testing explicitly for -1? Is there something special about a
character with all bits set in typical locale support code?

Regards.
                       - Tom


Re: [HACKERS] Patch (was: tough locale bug)

From
Oleg Broytmann
Date:
Hi!

On Mon, 1 Feb 1999, Thomas G. Lockhart wrote:
> >   #ifdef USE_LOCALE
> > !       if ((int)arg2[len2 - 1] != -1)
> 
> Sorry, I'm not as familiar with the LOCALE code as I should be. Why are
> we testing explicitly for -1? Is there something special about a
> character with all bits set in typical locale support code?
  The other way around - in windows-1251 it is a pretty standard character.

Oleg.
----
    Oleg Broytmann  National Research Surgery Centre  http://sun.med.ru/~phd/
          Programmers don't die, they just GOSUB without RETURN.



Re: [HACKERS] Patch (was: tough locale bug)

From
Goran Thyni
Date:
"Thomas G. Lockhart" wrote:
> 
> >   #ifdef USE_LOCALE
> > !       if ((int)arg2[len2 - 1] != -1)
> 
> Sorry, I'm not as familiar with the LOCALE code as I should be. Why are
> we testing explicitly for -1? Is there something special about a
> character with all bits set in typical locale support code?
> 
> Regards.
> 
>                         - Tom

It looks a bit cryptic, yes, but it is not specific to locale.
The planner appends a "(char) -1" (alternate syntax '\xFF') to the
test string in some cases of "<=" on text (e.g. in "txt ~ '^G'").
This works OK with strncmp, which compares char by char as unsigned
ints, but locale uses lookup tables where '\xFF' is not guaranteed to
be greater than or equal to any other unsigned 8-bit character.

Therefore I did a fallback to strncmp (instead of the locale-aware
strcoll) for this special case.
It is not the perfect solution and may break in a few cases, but as
it is now it breaks most usage of LIKE and ~ when locale is enabled.

If we could find, in a portable way, the last usable character in the
charset in use, that would be nice, but I have not found any way to
do that.
regards,
-- 
---------------------------------------------
Göran Thyni, JMS Bildbasen, Kiruna
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!
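As a rough illustration of the fallback described in this thread: compare bytewise (strncmp semantics) when the second argument ends in the planner's (char) -1 sentinel, and only otherwise trust the locale's strcoll() tables. The helper names below are made up for illustration; the actual patch tests "(int) arg2[len2 - 1] != -1" inside varstr_cmp(), and the unsigned-char comparison here is just a portable way to say the same thing.

```c
#include <string.h>

/* strncmp semantics: compare byte by byte as unsigned chars, with the
 * shorter string winning on a tie.  '\xFF' always sorts last this way. */
static int
bytewise_cmp(const char *a, int len1, const char *b, int len2)
{
	int			result = strncmp(a, b, len1 < len2 ? len1 : len2);

	if (result == 0 && len1 != len2)
		result = (len1 < len2) ? -1 : 1;
	return result;
}

/* Does the planner's (char) -1 upper-bound sentinel end this string?
 * If so, strcoll()'s locale tables cannot be trusted to sort it last,
 * and the caller should fall back to bytewise_cmp(). */
static int
ends_in_sentinel(const char *s, int len)
{
	return len > 0 && (unsigned char) s[len - 1] == 0xFF;
}
```

With this split, "G" < "G\xFF" holds regardless of locale, which is what the planner's generated range condition relies on.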


Re: [HACKERS] Patch (was: tough locale bug)

From
Goran Thyni
Date:
Oleg Broytmann wrote:
> 
> Hi!
> 
> On Mon, 1 Feb 1999, Thomas G. Lockhart wrote:
> > >   #ifdef USE_LOCALE
> > > !       if ((int)arg2[len2 - 1] != -1)
> >
> > Sorry, I'm not as familiar with the LOCALE code as I should be. Why are
> > we testing explicitly for -1? Is there something special about a
> > character with all bits set in typical locale support code?
> 
>    Other way around - in windows-1251 it is pretty standard character.

OK,
and it is probably not the last character in the alphabet either. 
In that case we need another solution, I am afraid.
Anyone got any ideas on how to resolve this?
regards,
-- 
---------------------------------------------
Göran Thyni, JMS Bildbasen, Kiruna
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!


Re: [HACKERS] Patch (was: tough locale bug)

From
Oleg Broytmann
Date:
Hi!

On Mon, 1 Feb 1999, Goran Thyni wrote:
> >    Other way around - in windows-1251 it is pretty standard character.
> and it is probably not the last character in the alphabet either. 
  In win-1251 it is the last character in the alphabet. In koi8-r it is not.

Oleg.
----
    Oleg Broytmann     http://members.xoom.com/phd2/     phd2@earthling.net
          Programmers don't die, they just GOSUB without RETURN.



Re: [HACKERS] Patch (was: tough locale bug)

From
Goran Thyni
Date:
Oleg Broytmann wrote:
> 
> Hi!
> 
> On Mon, 1 Feb 1999, Goran Thyni wrote:
> > >    Other way around - in windows-1251 it is pretty standard character.
> > and it is probably not the last character in the alphabet either.
> 
>    In win-1251 it is the last character in the alphabet. In koi8-r it is not.

I posted a new patch that corrects the problem in a way that
doesn't break koi8-r (and possibly others). I hope it will get
applied before beta starts.
best regards,
-- 
---------------------------------------------
Göran Thyni, JMS Bildbasen, Kiruna
This is Penguin Country. On a quiet night you can hear Windows NT
reboot!


Re: [HACKERS] Patch (was: tough locale bug)

From
Oleg Broytmann
Date:
Hi!

On Tue, 2 Feb 1999, Goran Thyni wrote:
> I posted a new patch that corrects the problem in a way that
> doesn't break koi8-r (and possibly others). I hope it will get
> applied before beta starts.
  I saw the patch. I'll try to apply it and test it.

  BTW, if you are looking into locale, could you please supply a locale
regression test for your locale? Look into ...src/test/locale, copy the
koi8-r directory, and supply your test data.

Oleg.
----
    Oleg Broytmann  National Research Surgery Centre  http://sun.med.ru/~phd/
          Programmers don't die, they just GOSUB without RETURN.



Re: [HACKERS] Postgres Speed or lack thereof

From
jwieck@debis.com (Jan Wieck)
Date:
I wrote:

>
>     I came to the  same  conclusion.  So  I  continued  with  the
>     approach  of  bigger  chunks handled in palloc(). What I have
>     now is  something,  that  gains  about  10%  speedup  at  the
>     regression  test,  while  memory consumption (visibly watched
>     with top(1)) seems not to raise compared against old version.
>
>     Since  the  bigger  blocks  are  built on top of the existing
>     memory context model, it would not conflict with any enhanced
>     usage we could make with it.
>
>     I include a patch at the end - please comment.
>
>
> Jan

    Did anyone play around with it? I've had it installed now for
    some days and it works well so far.

    How close are we to v6.5 BETA? Should I apply it to  CURRENT?

> diff -cr src.orig/backend/utils/mmgr/mcxt.c src/backend/utils/mmgr/mcxt.c
> *** src.orig/backend/utils/mmgr/mcxt.c     Thu Jan 28 16:47:39 1999
> --- src/backend/utils/mmgr/mcxt.c     Fri Jan 29 20:15:41 1999
> ***************
> *** 106,111 ****
> --- 106,112 ----
>   static struct GlobalMemory TopGlobalMemoryData = {
>    T_GlobalMemory,                    /* NodeTag                    tag         */
>    &GlobalContextMethodsData,    /* ContextMethods        method      */
> +  NULL,                              /* char* smallchunk_block     */
>    {{0}},                             /* uninitialized OrderedSetData allocSetD */
>    "TopGlobal",                  /* char* name     */
>    {0}                                /* uninitialized OrderedElemData elemD */
> diff -cr src.orig/backend/utils/mmgr/palloc.c src/backend/utils/mmgr/palloc.c
> *** src.orig/backend/utils/mmgr/palloc.c   Thu Jan 28 16:47:39 1999
> --- src/backend/utils/mmgr/palloc.c   Fri Jan 29 21:57:06 1999
> ***************
> *** 24,29 ****
> --- 24,57 ----
>
>   #include "utils/palloc.h"
>
> +
> + typedef struct PallocBlock {
> +  MemoryContext  mcxt;
> +  int                 refcount;
> +  char         *unused_chunk;
> +  char         *freeptr;
> +  char         *endptr;
> + } PallocBlock;
> +
> + typedef struct PallocChunk {
> +  PallocBlock         *blk;
> +  Size           size;
> + } PallocChunk;
> +
> +
> + #define PALLOC_BLOCKSIZE       \
> +            (8192 - sizeof(OrderedElemData) - sizeof(Size))
> + #define PALLOC_CHUNK_LIMIT          512
> + #define PALLOC_SIZE_ALIGN(s)   ((s + 15) & ~15)
> +
> + #define PALLOC_CHUNK_BLKPTR(c) (((PallocChunk *)(c))[-1].blk)
> + #define PALLOC_CHUNK_SIZE(c)   (((PallocChunk *)(c))[-1].size)
> +
> + #define PALLOC_CHUNK_HDRSZ          MAXALIGN(sizeof(PallocChunk))
> + #define PALLOC_BLOCK_HDRSZ          MAXALIGN(sizeof(PallocBlock))
> +
> + #define PALLOC_FREESPACE(b)         ((b)->endptr - (b)->freeptr)
> +
>   /* ----------------------------------------------------------------
>    *         User library functions
>    * ----------------------------------------------------------------
> ***************
> *** 66,72 ****
>   #ifdef PALLOC_IS_MALLOC
>    return malloc(size);
>   #else
> !  return MemoryContextAlloc(CurrentMemoryContext, size);
>   #endif      /* PALLOC_IS_MALLOC */
>   }
>
> --- 94,168 ----
>   #ifdef PALLOC_IS_MALLOC
>    return malloc(size);
>   #else
> !  PallocBlock         *block;
> !  char           *chunk;
> !  Size           asize = PALLOC_SIZE_ALIGN(size);
> !
> !  if (asize >= PALLOC_CHUNK_LIMIT)
> !  {
> !       chunk = (char *)MemoryContextAlloc(CurrentMemoryContext, size +
> !                                                                   PALLOC_CHUNK_HDRSZ);
> !       chunk += PALLOC_CHUNK_HDRSZ;
> !       PALLOC_CHUNK_BLKPTR(chunk) = NULL;
> !       return (void *)chunk;
> !  }
> !
> !  block = (PallocBlock *)(CurrentMemoryContext->smallchunk_block);
> !
> !  if (block != NULL && PALLOC_FREESPACE(block) < asize)
> !  {
> !       char *prev = NULL;
> !       Size chunk_size;
> !
> !       chunk = block->unused_chunk;
> !       while (chunk)
> !       {
> !            chunk_size = PALLOC_CHUNK_SIZE(chunk);
> !            if (asize == chunk_size)
> !            {
> !                 if (prev == NULL)
> !                      block->unused_chunk = (char *)PALLOC_CHUNK_BLKPTR(chunk);
> !                 else
> !                      PALLOC_CHUNK_BLKPTR(prev) = PALLOC_CHUNK_BLKPTR(chunk);
> !
> !                 block->refcount++;
> !                 PALLOC_CHUNK_BLKPTR(chunk) = block;
> !                 /*
> !                 PALLOC_CHUNK_SIZE(chunk) = size;
> !                 */
> !                 return (void *)chunk;
> !            }
> !            prev = chunk;
> !            chunk = (char *)PALLOC_CHUNK_BLKPTR(chunk);
> !       }
> !
> !       block = NULL;
> !  }
> !
> !  if (block == NULL)
> !  {
> !       block = (PallocBlock *)MemoryContextAlloc(CurrentMemoryContext,
> !                                          PALLOC_BLOCKSIZE);
> !       block->mcxt = CurrentMemoryContext;
> !       block->unused_chunk = NULL;
> !       block->refcount = 0;
> !       block->freeptr = ((char *)block) + PALLOC_BLOCK_HDRSZ +
> !                                                    PALLOC_CHUNK_HDRSZ;
> !       block->endptr = ((char *)block) + PALLOC_BLOCKSIZE;
> !
> !       CurrentMemoryContext->smallchunk_block = (void *)block;
> !  }
> !
> !  chunk = block->freeptr;
> !  block->freeptr += PALLOC_CHUNK_HDRSZ + asize;
> !  block->refcount++;
> !  PALLOC_CHUNK_BLKPTR(chunk) = block;
> !  PALLOC_CHUNK_SIZE(chunk) = asize;
> !
> !  if (block->freeptr >= block->endptr)
> !       block->mcxt->smallchunk_block = NULL;
> !
> !  return (void *)chunk;
>   #endif      /* PALLOC_IS_MALLOC */
>   }
>
> ***************
> *** 76,82 ****
>   #ifdef PALLOC_IS_MALLOC
>    free(pointer);
>   #else
> !  MemoryContextFree(CurrentMemoryContext, pointer);
>   #endif      /* PALLOC_IS_MALLOC */
>   }
>
> --- 172,202 ----
>   #ifdef PALLOC_IS_MALLOC
>    free(pointer);
>   #else
> !  PallocBlock         *block = PALLOC_CHUNK_BLKPTR(pointer);
> !
> !  if (block == NULL)
> !  {
> !       MemoryContextFree(CurrentMemoryContext, (char *)pointer - PALLOC_CHUNK_HDRSZ);
> !       return;
> !  }
> !
> !  PALLOC_CHUNK_BLKPTR(pointer) = (PallocBlock *)(block->unused_chunk);
> !  block->unused_chunk = (char *)pointer;
> !
> !  block->refcount--;
> !  if (block->refcount == 0)
> !  {
> !       if (block == (PallocBlock *)(block->mcxt->smallchunk_block))
> !       {
> !            block->freeptr = ((char *)block) + PALLOC_BLOCK_HDRSZ +
> !                                                         PALLOC_CHUNK_HDRSZ;
> !            block->unused_chunk = NULL;
> !       }
> !       else
> !       {
> !            MemoryContextFree(block->mcxt, (void *)block);
> !       }
> !  }
>   #endif      /* PALLOC_IS_MALLOC */
>   }
>
> ***************
> *** 100,106 ****
>   #ifdef PALLOC_IS_MALLOC
>    return realloc(pointer, size);
>   #else
> !  return MemoryContextRealloc(CurrentMemoryContext, pointer, size);
>   #endif
>   }
>
> --- 220,248 ----
>   #ifdef PALLOC_IS_MALLOC
>    return realloc(pointer, size);
>   #else
> !  PallocBlock         *block = PALLOC_CHUNK_BLKPTR(pointer);
> !  char                *new;
> !  Size           tocopy;
> !
> !  if (block == NULL)
> !  {
> !       new = (char *)MemoryContextRealloc(CurrentMemoryContext,
> !                                     (char *)pointer - PALLOC_CHUNK_HDRSZ,
> !                                     size + PALLOC_CHUNK_HDRSZ);
> !       new += PALLOC_CHUNK_HDRSZ;
> !       PALLOC_CHUNK_BLKPTR(new) = NULL;
> !       return (void *)new;
> !  }
> !  else
> !  {
> !       new = palloc(size);
> !
> !       tocopy = PALLOC_CHUNK_SIZE(pointer) > size ?
> !                                size : PALLOC_CHUNK_SIZE(pointer);
> !       memcpy(new, pointer, tocopy);
> !       pfree(pointer);
> !       return (void *)new;
> !  }
>   #endif
>   }
>
> ***************
> *** 117,119 ****
> --- 259,263 ----
>
>    return nstr;
>   }
> +
> +
> diff -cr src.orig/backend/utils/mmgr/portalmem.c src/backend/utils/mmgr/portalmem.c
> *** src.orig/backend/utils/mmgr/portalmem.c     Thu Jan 28 16:47:39 1999
> --- src/backend/utils/mmgr/portalmem.c     Fri Jan 29 20:15:41 1999
> ***************
> *** 390,395 ****
> --- 390,396 ----
>    NodeSetTag((Node *) &portal->variable, T_PortalVariableMemory);
>    AllocSetInit(&portal->variable.setData, DynamicAllocMode, (Size) 0);
>    portal->variable.method = &PortalVariableContextMethodsData;
> +  portal->variable.smallchunk_block = NULL;
>
>    /*
>     * initialize portal heap context
> ***************
> *** 399,404 ****
> --- 400,406 ----
>    FixedStackInit(&portal->heap.stackData,
>                      offsetof(HeapMemoryBlockData, itemData));
>    portal->heap.method = &PortalHeapContextMethodsData;
> +  portal->heap.smallchunk_block = NULL;
>
>    /*
>     * set bogus portal name
> ***************
> *** 756,761 ****
> --- 758,764 ----
>    NodeSetTag((Node *) &portal->variable, T_PortalVariableMemory);
>    AllocSetInit(&portal->variable.setData, DynamicAllocMode, (Size) 0);
>    portal->variable.method = &PortalVariableContextMethodsData;
> +  portal->variable.smallchunk_block = NULL;
>
>    /* initialize portal heap context */
>    NodeSetTag((Node *) &portal->heap, T_PortalHeapMemory);
> ***************
> *** 763,768 ****
> --- 766,772 ----
>    FixedStackInit(&portal->heap.stackData,
>                      offsetof(HeapMemoryBlockData, itemData));
>    portal->heap.method = &PortalHeapContextMethodsData;
> +  portal->heap.smallchunk_block = NULL;
>
>    /* initialize portal name */
>    length = 1 + strlen(name);
> ***************
> *** 918,923 ****
> --- 922,928 ----
>
>    /* free current mode */
>    AllocSetReset(&HEAPMEMBLOCK(context)->setData);
> +  context->smallchunk_block = NULL;
>    MemoryContextFree((MemoryContext) PortalHeapMemoryGetVariableMemory(context),
>                          context->block);
>
> diff -cr src.orig/include/nodes/memnodes.h src/include/nodes/memnodes.h
> *** src.orig/include/nodes/memnodes.h Thu Jan 28 16:47:39 1999
> --- src/include/nodes/memnodes.h Fri Jan 29 20:15:41 1999
> ***************
> *** 60,65 ****
> --- 60,66 ----
>   {
>    NodeTag        type;
>    MemoryContextMethods method;
> +  void      *smallchunk_block;
>   }             *MemoryContext;
>
>   /* think about doing this right some time but we'll have explicit fields
> ***************
> *** 68,73 ****
> --- 69,75 ----
>   {
>    NodeTag        type;
>    MemoryContextMethods method;
> +  void      *smallchunk_block;
>    AllocSetData setData;
>    char    *name;
>    OrderedElemData elemData;
> ***************
> *** 79,84 ****
> --- 81,87 ----
>   {
>    NodeTag        type;
>    MemoryContextMethods method;
> +  void      *smallchunk_block;
>    AllocSetData setData;
>   }             *PortalVariableMemory;
>
> ***************
> *** 86,91 ****
> --- 89,95 ----
>   {
>    NodeTag        type;
>    MemoryContextMethods method;
> +  void      *smallchunk_block;
>    Pointer        block;
>    FixedStackData stackData;
>   }             *PortalHeapMemory;
>
>


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
jwieck@debis.com (Jan Wieck) writes:
>     [ patch to handle small-block allocation specially ]
>     Did anyone play around with it? I've had it installed now for
>     some days and it works well so far.
>     How close are we to v6.5 BETA? Should I apply it to  CURRENT?

Well, I think it's a band-aid until we can redesign the memory manager
properly, but it could be a useful band-aid ... might as well put it in.

The only thing that bothers me about the code itself is the lack of
comments.  In particular it'd be nice to mention somewhere that freeptr
is actually one chunk header past the start of the free space --- took
a while to puzzle that out.
        regards, tom lane
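The layout Tom is puzzling out works like this: every pointer handed to the caller has a chunk header stored immediately before it, reached with a negative array index, which is also why block->freeptr is kept one chunk header past the start of the free space. A simplified sketch, not the actual patch code (ChunkHdr, chunk_alloc, and chunk_free are made-up names):

```c
#include <stdlib.h>

/* Header stored just *before* each user-visible pointer. */
typedef struct ChunkHdr
{
	void	   *blk;			/* owning block; NULL = large, unpooled chunk */
	size_t		size;			/* usable size of the chunk */
} ChunkHdr;

/* Reach the header through a negative index, as the patch's
 * PALLOC_CHUNK_BLKPTR / PALLOC_CHUNK_SIZE macros do. */
#define CHUNK_BLKPTR(c) (((ChunkHdr *) (c))[-1].blk)
#define CHUNK_SIZE(c)   (((ChunkHdr *) (c))[-1].size)

/* Allocate raw memory with room for the header in front, then return
 * the user pointer that sits just past it. */
static void *
chunk_alloc(size_t size)
{
	char	   *raw = malloc(sizeof(ChunkHdr) + size);
	char	   *user = raw + sizeof(ChunkHdr);

	CHUNK_BLKPTR(user) = NULL;
	CHUNK_SIZE(user) = size;
	return user;
}

static void
chunk_free(void *user)
{
	/* Step back over the header to recover the malloc'ed pointer. */
	free((char *) user - sizeof(ChunkHdr));
}
```

Because the header lives behind the returned pointer, pfree() can find the owning block from the pointer alone, with no separate lookup table.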


Re: [HACKERS] Postgres Speed or lack thereof

From
jwieck@debis.com (Jan Wieck)
Date:
>
> jwieck@debis.com (Jan Wieck) writes:
> >     [ patch to handle small-block allocation specially ]
> >     Did anyone play around with it? I've had it installed now for
> >     some days and it works well so far.
> >     How close are we to v6.5 BETA? Should I apply it to  CURRENT?
>
> Well, I think it's a band-aid until we can redesign the memory manager
> properly, but it could be a useful band-aid ... might as well put it in.
>
> The only thing that bothers me about the code itself is lack of
> comments.  In particular it'd be nice to mention somewhere that freeptr
> is actually one chunkheader past the start of the free space --- took
> a while to puzzle that out.

    Tricky, eh?  The AllocSet...() functions do it the same way, so
    it just continues with that style.

    But you're right, and when moving it in I'll surely add
    comments.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Postgres Speed or lack thereof

From
The Hermit Hacker
Date:
On Tue, 2 Feb 1999, Jan Wieck wrote:

> I wrote:
> 
> >
> >     I came to the  same  conclusion.  So  I  continued  with  the
> >     approach  of  bigger  chunks handled in palloc(). What I have
> >     now is  something,  that  gains  about  10%  speedup  at  the
> >     regression  test,  while  memory consumption (visibly watched
> >     with top(1)) seems not to raise compared against old version.
> >
> >     Since  the  bigger  blocks  are  built on top of the existing
> >     memory context model, it would not conflict with any enhanced
> >     usage we could make with it.
> >
> >     I include a patch at the end - please comment.
> >
> >
> > Jan
> 
>     Did anyone play around with it? I've had it installed now for
>     some days and it works well so far.
> 
>     How close are we to v6.5 BETA? Should I apply it to  CURRENT?

BETA starts upon Vadim's word...I'd say go for it...


Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
> >     Since  the  bigger  blocks  are  built on top of the existing
> >     memory context model, it would not conflict with any enhanced
> >     usage we could make with it.
> >
> >     I include a patch at the end - please comment.
> >
> >
> > Jan
> 
>     Did anyone play around with it? I've had it installed now for
>     some days and it works well so far.
> 
>     How close are we to v6.5 BETA? Should I apply it to  CURRENT?


Good question.  I am done with temp tables, so Vadim makes the call on
when to start beta.
--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Postgres Speed or lack thereof

From
jwieck@debis.com (Jan Wieck)
Date:
>
> > >     Since  the  bigger  blocks  are  built on top of the existing
> > >     memory context model, it would not conflict with any enhanced
> > >     usage we could make with it.
> > >
> > >     I include a patch at the end - please comment.
> > >
> > >
> > > Jan
> >
> >     Did anyone play around with it? I've had it installed now for
> >     some days and it work's well so far.
> >
> >     How close are we to v6.5 BETA? Should I apply it to  CURRENT?
>
>
> Good question.  I am done with temp tables, so Vadim makes the call on
> when to start beta.

    Now I'm placing another LOCK request (waiting for Vadim).

    The problem with constraints in ExecRelCheck() I've just found
    not only affects COPY.  It also occurs with INSERT...SELECT or
    big UPDATEs.  Constraints with many affected tuples eat up
    backend memory quickly until out of swap space.

    This fix must go into CURRENT and REL6_4 before we release
    either v6.5 or v6.4.3.  There are actually reports from users
    not able to reload dumps made with v6.4.2.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
> > Good question.  I am done with temp tables, so Vadim makes the call on
> > when to start beta.
> 
>     Now I'm placing another LOCK request (waiting for Vadim).
> 
>     The  problem  with  constraints  in  ExecRelCheck() I've just
>     found  not  only  affects   COPY.   It   also   occurs   with
>     INSERT...SELECT   or  big  UPDATE's.  Constraints  with  many
>     affected tuples eat up backend memory quickly  until  out  of
>     swap space.
> 
>     This fix must go into CURRENT and REL6_4 before we release
>     either v6.5 or v6.4.3. There are already reports from users
>     unable to reload dumps made with v6.4.2.

You can keep applying fixes like this until near the final release date.  I
have patches in my mailbox to apply today too.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Patch (was: tough locale bug)

From
Bruce Momjian
Date:
Do you want this applied?


[Charset iso-8859-1 unsupported, filtering to ASCII...]
> Goran Thyni wrote:
> > 3. text_lt('G','G\0xFF') this is not correct!
> > 
> > Case 3 does not work with strcoll(), in varstr_cmp().
> > If I change strcoll() to strncmp() it works as expected,
> > but it probably breaks sorting etc. big time.
> 
> Here is a hack which solves the problem without breaking sorting AFAIK.
> I have read a lot of code, but have not found any cleaner way to fix
> this.
> A cleaner solution would be to find the highest char in the locale charset,
> but I found no portable way to do that; any pointers appreciated.
> 
> It is not a beauty but it works. 
> Please apply to -current since it fixes the bug, 
> unless someone has a better suggestion.
> 
>     best regards,
> -- 
> -----------------
> G_ran Thyni
> This is Penguin Country. On a quiet night you can hear Windows NT
> reboot!

> diff -cr cvs/pgsql/src/backend/utils/adt/varlena.c cvswork/pgsql/src/backend/utils/adt/varlena.c
> *** cvs/pgsql/src/backend/utils/adt/varlena.c    Mon Dec 14 07:01:37 1998
> --- cvswork/pgsql/src/backend/utils/adt/varlena.c    Sun Jan 31 16:32:52 1999
> ***************
> *** 496,528 ****
>   varstr_cmp(char *arg1, int len1, char *arg2, int len2)
>   {
>       int            result;
> !     char       *a1p,
> !                *a2p;
> ! 
>   #ifdef USE_LOCALE
> !     a1p = (unsigned char *) palloc(len1 + 1);
> !     a2p = (unsigned char *) palloc(len2 + 1);
> ! 
> !     memcpy(a1p, arg1, len1);
> !     *(a1p + len1) = '\0';
> !     memcpy(a2p, arg2, len2);
> !     *(a2p + len2) = '\0';
> ! 
> !     result = strcoll(a1p, a2p);
> ! 
> !     pfree(a1p);
> !     pfree(a2p);
> ! 
> ! #else
> ! 
> !     a1p = arg1;
> !     a2p = arg2;
> ! 
> !     result = strncmp(a1p, a2p, Min(len1, len2));
> !     if ((result == 0) && (len1 != len2))
> !         result = (len1 < len2) ? -1 : 1;
>   #endif
> ! 
>       return result;
>   }    /* varstr_cmp() */
>   
> --- 496,524 ----
>   varstr_cmp(char *arg1, int len1, char *arg2, int len2)
>   {
>       int            result;
> !     char       *a1p, *a2p;
>   #ifdef USE_LOCALE
> !     if ((int)arg2[len2 - 1] != -1)
> !       {
> !         a1p = (unsigned char *) palloc(len1 + 1);
> !         a2p = (unsigned char *) palloc(len2 + 1);
> !         memcpy(a1p, arg1, len1);
> !         *(a1p + len1) = '\0';
> !         memcpy(a2p, arg2, len2);
> !         *(a2p + len2) = '\0';
> !         result = strcoll(a1p, a2p);
> !         pfree(a1p);
> !         pfree(a2p);
> !       }
> !     else
>   #endif
> !       {
> !         a1p = arg1;
> !         a2p = arg2;
> !         result = strncmp(a1p, a2p, Min(len1, len2));
> !         if ((result == 0) && (len1 != len2))
> !         result = (len1 < len2) ? -1 : 1;
> !       }
>       return result;
>   }    /* varstr_cmp() */
>   




Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
Tom, I assume this has all been corrected, right?




> I wrote:
> > As for where the time is going: "top" showed that the time for the
> > INSERT ops was almost all going into backend CPU time.  My guess is
> > that most of the cost is in parsing and planning the INSERT
> > statements.
> 
> Having an idle hour this evening, I thought it'd be interesting to build
> a backend with profiling enabled, so as to confirm or deny the above
> guess.  It seems that indeed a lot of time is being wasted, but where
> it's being wasted might surprise you!
> 
> I profiled a backend built from this morning's current CVS sources,
> being driven by a single psql run reading a series of exactly 5000
> INSERT commands.  The first few are:
> 
> INSERT INTO "fehistory_1" values (64,14,252,'D','-','-','s',1,4027,NULL,'S',1,4088,NULL,32,'Thu Jan 07 15:15:00 1999 EST','Fri Jan 08 08:30:38 1999 EST',4027,32,NULL,NULL,46,NULL);
> INSERT INTO "fehistory_1" values (65,14,252,'P','-','-','S',1,4086,NULL,NULL,NULL,NULL,NULL,32,'Fri Jan 08 16:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,45,NULL);
> INSERT INTO "fehistory_1" values (66,14,135,'P','-','-','S',1,13619,NULL,'s',1,12967,NULL,100,'Fri Jan 08 02:00:00 1999 EST',NULL,NULL,NULL,NULL,NULL,44,NULL);
> 
> and it goes on for a long while in that vein :-).  Total runtime was
> 278.09 CPU seconds in the backend, 307 seconds elapsed realtime.  The
> table being inserted into has no indexes and was initially empty.
> 
> The profiler is GNU gprof, which provides both flat and structured
> profile data.  The top few entries in the flat profile are:
> 
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  34.96     97.21    97.21                             _mcount
>   7.18    117.19    19.98  8047816     0.00     0.00  tree_cut
>   6.56    135.43    18.24   962187     0.02     0.02  recv
>   3.55    145.31     9.88  8234278     0.00     0.00  tree_insert
>   3.49    155.02     9.71  4292947     0.00     0.01  malloc
>   3.31    164.22     9.20  1965056     0.00     0.01  SearchSysCache
>   3.24    173.23     9.01  4291775     0.00     0.01  free
>   2.44    180.01     6.78  6171667     0.00     0.00  tree_concatenate
>   2.17    186.04     6.03  8234069     0.00     0.00  tree_delete
>   1.07    189.02     2.98    10250     0.29     0.29  write
>   1.05    191.94     2.92                             _psort_cmp
>   0.95    194.57     2.63  2983301     0.00     0.00  newNode
>   0.92    197.14     2.57  4251762     0.00     0.00  OrderedElemPop
>   0.88    199.60     2.46  2746172     0.00     0.00  comphash
>   0.82    201.88     2.28  4252495     0.00     0.01  AllocSetAlloc
>   0.77    204.02     2.14  1965056     0.00     0.00  SearchSysCacheTuple
>   0.73    206.05     2.03   265000     0.01     0.01  yylex
>   0.69    207.96     1.91                             $$remU
>   0.68    209.86     1.90  4252500     0.00     0.00  OrderedElemPushHead
>   0.62    211.58     1.72  4247473     0.00     0.00  palloc
>   0.60    213.26     1.68    25004     0.07     0.07  send
>   0.56    214.81     1.55  1965056     0.00     0.00  CatalogCacheComputeHashIndex
>   0.55    216.33     1.52     5000     0.30     1.07  yyparse
>   0.54    217.82     1.49  4251762     0.00     0.01  AllocSetFree
>   0.53    219.29     1.47  4252494     0.00     0.00  MemoryContextAlloc
>   0.51    220.71     1.42  4252500     0.00     0.00  OrderedElemPushInto
>   0.49    222.07     1.36  3430844     0.00     0.00  strncmp
>   0.48    223.41     1.34  4205327     0.00     0.00  OrderedSetGetHead
>   0.47    224.73     1.32                             elog
>   0.45    225.99     1.26  1964976     0.00     0.00  DLRemove
>   0.43    227.19     1.20  2510857     0.00     0.00  strcmp
>   0.43    228.38     1.19  4200327     0.00     0.00  OrderedElemGetBase
>   0.42    229.56     1.18  4245740     0.00     0.01  PortalHeapMemoryAlloc
>   0.38    230.62     1.06  4252500     0.00     0.00  OrderedElemPush
>   0.37    231.66     1.04  4205327     0.00     0.00  AllocSetGetFirst
>   0.36    232.66     1.00  1965145     0.00     0.00  DLAddHead
> 
> _mcount is part of the profiling apparatus and can be ignored (although
> the fact that it's so high up implies that we're executing an awful lot
> of short routine calls, because mcount runs once per entry and exit of
> profilable routines).  tree_cut, tree_insert, tree_concatenate, and
> tree_delete are subroutines of malloc/free.  Altogether, malloc/free
> and friends accounted for 61.39 seconds out of the 180.88 non-overhead
> CPU seconds in this run.
> 
> In other words, we're spending a third of our time mallocing and freeing
> memory.  A tad high, what?
> 
> Actually, it's worse than that, because AllocSetAlloc,
> PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
> routines represent our own bookkeeping layer atop malloc/free.
> That's another 18.66 seconds spent in these top routines, which means
> that we are real close to expending half the backend's runtime on
> memory bookkeeping.  This needs work.
> 
> The other thing that jumps out here is the unreasonably high position of
> recv(), which is called 962187 times.  The script being read by psql was
> only 957186 characters.  Evidently we're invoking a kernel recv() call
> once per character read from the frontend.  I suspect this is an
> inefficiency introduced by Magnus Hagander's recent rewrite of backend
> libpq (see, I told you there was a reason for using stdio ;-)).  We're
> gonna have to do something about that, though it's not as critical as
> the memory-allocation issue.  It also appears that send() is now being
> invoked multiple times per backend reply, which is going to create
> inefficiencies outside the backend (ie, multiple packets per reply).
> On a test case with a lot of SELECTs that would show up more than it
> does here.
> 
> 
> Moving on to the dynamic profile, the most interesting items are:
> 
> index % time    self  children    called     name
> -----------------------------------------------
>                 0.05  118.06       1/1           DoBackend [7]
> [8]     65.3    0.05  118.06       1         PostgresMain [8]
>                 0.04   57.64    5000/5000        pg_exec_query [9]
>                 0.01   39.83    5000/5000        CommitTransactionCommand [11]
>                 0.02   18.86    5001/5001        ReadCommand [27]
>                 0.01    1.18    5000/5000        StartTransactionCommand [118]
>                 0.01    0.36    5001/5001        ReadyForQuery [190]
>                 0.04    0.00    5001/5001        EnableNotifyInterrupt [385]
>                 0.04    0.00    5000/5000        strspn [388]
>                 0.00    0.01       1/1           InitPostgres [495]
>                 0.00    0.00    5000/548573      strlen [172]
>                 0.00    0.00    5001/93349       memset [366]
>                 0.00    0.00       1/1           read_pg_options [601]
>                 0.00    0.00       2/2           pq_putint [637]
>                 0.00    0.00       1/15002       pq_putnchar [123]
>                 0.00    0.00       2/10013       getenv [241]
>                 0.00    0.00       1/1           SetPgUserName [683]
>                 0.00    0.00       1/1           pq_init [687]
>                 0.00    0.00       1/1           proc_exit [704]
>                 0.00    0.00       1/1           pq_close [721]
>                 0.00    0.00       1/5002        getpgusername [521]
>                 0.00    0.00       1/2           FindExec [722]
>                 0.00    0.00       5/8           getopt [728]
>                 0.00    0.00    5001/5001        DisableNotifyInterrupt [761]
>                 0.00    0.00       9/34          pqsignal [806]
>                 0.00    0.00       2/5           atoi [837]
>                 0.00    0.00       1/2           SetProcessingMode [868]
>                 0.00    0.00       1/10002       TransactionFlushEnabled [749]
>                 0.00    0.00       1/1           sigsetjmp [934]
> 
> ("self" is the amount of time (CPU sec) spent directly in the named function,
> while "children" is the amount of time spent in its callees, pro-rated
> by the number of calls.  For example, strlen was actually measured to
> use 0.47 sec in the whole run, but since PostgresMain called it less
> than 1% of all the times it was called, PostgresMain gets blamed for
> less than 0.01 sec of that total.)
> 
> The interesting thing here is the relative times indicated for
> pg_exec_query, CommitTransactionCommand, and ReadCommand.  We already
> found out why ReadCommand is so slow (recv() per character) ... but why
> is CommitTransactionCommand so high, when I already discovered that
> doing only one commit doesn't help much?  Patience, we'll get there.
> 
> -----------------------------------------------
>                 0.03   57.61    5000/5000        pg_exec_query [9]
> [10]    31.9    0.03   57.61    5000         pg_exec_query_dest [10]
>                 0.09   39.12    5000/5000        pg_parse_and_plan [13]
>                 0.00   17.37    5000/5000        ProcessQuery [36]
>                 0.02    0.60    5000/5000        SetQuerySnapshot [159]
>                 0.01    0.34    5000/5000        CommandCounterIncrement [196]
>                 0.02    0.05   10000/4291775     free [22]
> -----------------------------------------------
>                 0.09   39.12    5000/5000        pg_exec_query_dest [10]
> [13]    21.7    0.09   39.12    5000         pg_parse_and_plan [13]
>                 0.02   19.28    5000/5000        parser [26]
>                 0.03   17.63    5000/5000        planner [34]
>                 0.00    1.98    5000/5000        QueryRewrite [90]
>                 0.02    0.05   10000/4292947     malloc [21]
>                 0.02    0.05   10000/4291775     free [22]
>                 0.00    0.02    5000/497779      lappend [91]
>                 0.01    0.00    5000/5000        IsAbortedTransactionBlockState [529]
>                 0.00    0.00    5000/72779       length [342]
> 
> In other words, parsing, planning, and executing an INSERT each take
> about the same amount of time.
> 
> -----------------------------------------------
>                 0.01   39.83    5000/5000        PostgresMain [8]
> [11]    22.0    0.01   39.83    5000         CommitTransactionCommand [11]
>                 0.02   39.81    5000/5000        CommitTransaction [12]
> -----------------------------------------------
>                 0.02   39.81    5000/5000        CommitTransactionCommand [11]
> [12]    22.0    0.02   39.81    5000         CommitTransaction [12]
>                 0.00   38.12    5000/5000        AtCommit_Memory [16]
>                 0.01    1.10    5000/5000        RecordTransactionCommit [122]
>                 0.00    0.30    5000/5000        AtCommit_Locks [211]
>                 0.02    0.10    5000/5000        AtEOXact_portals [286]
>                 0.03    0.07    5000/5000        DestroyTempRels [305]
>                 0.03    0.00    5000/5000        _lo_commit [425]
>                 0.01    0.00    5000/5000        AtCommit_Notify [522]
>                 0.01    0.00    5000/5000        CloseSequences [523]
>                 0.01    0.00    5000/5000        RelationPurgeLocalRelation [531]
>                 0.00    0.00    5000/10000       AtCommit_Cache [750]
> -----------------------------------------------
>                 0.00   38.12    5000/5000        CommitTransaction [12]
> [16]    21.1    0.00   38.12    5000         AtCommit_Memory [16]
>                 0.01   38.11    5000/5000        EndPortalAllocMode [17]
>                 0.00    0.00    5000/10552       MemoryContextSwitchTo [747]
> -----------------------------------------------
>                 0.01   38.11    5000/5000        AtCommit_Memory [16]
> [17]    21.1    0.01   38.11    5000         EndPortalAllocMode [17]
>                 0.54   37.55    5000/5000        AllocSetReset [18]
>                 0.01    0.00    5000/10000       PortalHeapMemoryGetVariableMemory [449]
>                 0.01    0.00    5000/5000        FixedStackPop [528]
>                 0.00    0.00    5000/51434       MemoryContextFree [367]
> -----------------------------------------------
>                 0.54   37.55    5000/5000        EndPortalAllocMode [17]
> [18]    21.1    0.54   37.55    5000         AllocSetReset [18]
>                 1.47   32.51 4200327/4251762     AllocSetFree [20]
>                 1.04    2.53 4205327/4205327     AllocSetGetFirst [64]
> 
> In other words, essentially *all* of the CPU time spent in
> CommitTransaction is spent freeing memory.  That's probably why
> ganging the transactions doesn't help --- it's the same number of
> memory blocks getting allocated and freed.
> 
>                 0.02   19.28    5000/5000        pg_parse_and_plan [13]
> [26]    10.7    0.02   19.28    5000         parser [26]
>                 0.02   13.88    5000/5000        parse_analyze [41]
>                 1.52    3.81    5000/5000        yyparse [54]
>                 0.01    0.01    5000/5000        init_io [474]
>                 0.00    0.01    5000/420496      pstrdup [124]
>                 0.01    0.00    5000/5000        parser_init [533]
> 
> Thomas might know why parse_analyze is taking so much time compared to
> the rest of the parsing machinery...
> 
> 
> I won't bother the list with the rest of the profile, although I'll be
> happy to send it to anyone who wants to see all of it.  Our immediate
> performance problems seem pretty clear, however: time to rethink memory
> management.
> 
>             regards, tom lane
> 
> 




Re: [HACKERS] Patch (was: tough locale bug)

From
Bruce Momjian
Date:
As I remember, this was fixed already, without your patch, right?


[Charset iso-8859-1 unsupported, filtering to ASCII...]
> Goran Thyni wrote:
> > 3. text_lt('G','G\0xFF') this is not correct!
> > 
> > Case 3 does not work with strcoll(), in varstr_cmp().
> > If I change strcoll() to strncmp() it works as expected,
> > but it probably breaks sorting etc. big time.
> 
> Here is a hack which solves the problem without breaking sorting AFAIK.
> I have read a lot of code, but have not found any cleaner way to fix
> this.
> A cleaner solution would be to find the highest char in the locale charset,
> but I found no portable way to do that; any pointers appreciated.
> 
> It is not a beauty but it works. 
> Please apply to -current since it fixes the bug, 
> unless someone has a better suggestion.
> 
>     best regards,
> -- 
> -----------------
> G_ran Thyni
> This is Penguin Country. On a quiet night you can hear Windows NT
> reboot!

> diff -cr cvs/pgsql/src/backend/utils/adt/varlena.c cvswork/pgsql/src/backend/utils/adt/varlena.c
> *** cvs/pgsql/src/backend/utils/adt/varlena.c    Mon Dec 14 07:01:37 1998
> --- cvswork/pgsql/src/backend/utils/adt/varlena.c    Sun Jan 31 16:32:52 1999
> ***************
> *** 496,528 ****
>   varstr_cmp(char *arg1, int len1, char *arg2, int len2)
>   {
>       int            result;
> !     char       *a1p,
> !                *a2p;
> ! 
>   #ifdef USE_LOCALE
> !     a1p = (unsigned char *) palloc(len1 + 1);
> !     a2p = (unsigned char *) palloc(len2 + 1);
> ! 
> !     memcpy(a1p, arg1, len1);
> !     *(a1p + len1) = '\0';
> !     memcpy(a2p, arg2, len2);
> !     *(a2p + len2) = '\0';
> ! 
> !     result = strcoll(a1p, a2p);
> ! 
> !     pfree(a1p);
> !     pfree(a2p);
> ! 
> ! #else
> ! 
> !     a1p = arg1;
> !     a2p = arg2;
> ! 
> !     result = strncmp(a1p, a2p, Min(len1, len2));
> !     if ((result == 0) && (len1 != len2))
> !         result = (len1 < len2) ? -1 : 1;
>   #endif
> ! 
>       return result;
>   }    /* varstr_cmp() */
>   
> --- 496,524 ----
>   varstr_cmp(char *arg1, int len1, char *arg2, int len2)
>   {
>       int            result;
> !     char       *a1p, *a2p;
>   #ifdef USE_LOCALE
> !     if ((int)arg2[len2 - 1] != -1)
> !       {
> !         a1p = (unsigned char *) palloc(len1 + 1);
> !         a2p = (unsigned char *) palloc(len2 + 1);
> !         memcpy(a1p, arg1, len1);
> !         *(a1p + len1) = '\0';
> !         memcpy(a2p, arg2, len2);
> !         *(a2p + len2) = '\0';
> !         result = strcoll(a1p, a2p);
> !         pfree(a1p);
> !         pfree(a2p);
> !       }
> !     else
>   #endif
> !       {
> !         a1p = arg1;
> !         a2p = arg2;
> !         result = strncmp(a1p, a2p, Min(len1, len2));
> !         if ((result == 0) && (len1 != len2))
> !         result = (len1 < len2) ? -1 : 1;
> !       }
>       return result;
>   }    /* varstr_cmp() */
>   




Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
Can someone remind me where we left this memory context stuff?



> > I've noticed that everyone else contributing to this thread has been
> > thinking in terms of inventing multiple allocation functions with
> > different names, ie a routine might have to call "palloc" or
> > "fast_palloc" or "tmp_palloc" or whatever depending on what behavior it
> > wants.  I really think we are better off to stick to the structure that
> > we already have (cf. include/nodes/memnodes.h and include/utils/mcxt.h),
> > in which there's one set of interface routines that take a "context"
> > parameter, and the context determines the semantics of allocation.
> > (The interface routines probably ought to be macros, not out-of-line
> > code, but that's a minor optimization.)  This will make it easier to
> > handle cases where one routine has to tell another one how to allocate
> > its result space: you pass a MemoryContext parameter.
> 
> I agree, inline is good.  I can imagine something like makeNode is going
> to be tough.
> 
> I am going to give the parser as an example.  It palloc's all sorts of
> things for rangetables, strings, structures, lists, but eventually just
> exports a huge Query structure.
> 
> If we could somehow just copy the Query structure and free all the
> other parser palloc allocations, that would be great.
> 
> Same thing with the optimizer.  It makes all sorts of lists, but
> eventually just exports Plan's.  All the other stuff can be pfree'ed.
> 
> It is almost as though we could copy the Query structure to another
> memory context, and just free all palloc'ed memory.
> 
> What we could try is to keep the various memory contexts like cache
> context unchanged, but allow pallocs in the standard memory context to
> be allocated from a pool and free'ed automatically.  Our standard memory
> context is overused and doesn't need the pfree overhead.  We could make
> palloc a macro that tested for the current context, and did standard
> pallocs from a free memory pool.
> 
> -- 
>   Bruce Momjian                        |  http://www.op.net/~candle
>   maillist@candle.pha.pa.us            |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 
> 




Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, I assume this has all been corrected, right?

The recv() problem is fixed, and Jan took a swipe at reducing the cost
of palloc/pfree.  I have not done a profile since Jan's changes to see
how successful he was.

I would like to revisit memory management at some point, but it won't
happen for 6.5.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Can someone remind me where we left this memory context stuff?

Jan put in some changes to allow palloc()'d space to be freed more
cheaply.  The more sweeping rewrite that I was suggesting is not done,
but perhaps we can reconsider it for 6.6 or sometime later.  I don't
think we need to worry about it for 6.5.
        regards, tom lane


Re: [HACKERS] Postgres Speed or lack thereof

From
jwieck@debis.com (Jan Wieck)
Date:
>
> Can someone remind me where we left this memory context stuff?
>

    I've changed AllocSetAlloc() and friends to manage small
    allocations together in bigger blocks, where the sizes are
    rounded up to powers of two and freed chunks in the blocks
    are reused in place.  The blocks themselves are only free'd
    when they become completely empty.

    Then I've changed palloc() etc. into macros (which started
    the discussion on mcxt.h - sorry).

    Overall speedup from the changes (for regression) is
    about 10-15%.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
> >
> > Can someone remind me where we left this memory context stuff?
> >
> 
>     I've changed AllocSetAlloc() and friends to manage small
>     allocations together in bigger blocks, where the sizes are
>     rounded up to powers of two and freed chunks in the blocks
>     are reused in place.  The blocks themselves are only free'd
>     when they become completely empty.
> 
>     Then I've changed palloc() etc. into macros (which started
>     the discussion on mcxt.h - sorry).
> 
>     Overall speedup from the changes (for regression) is
>     about 10-15%.
> 

Great.



Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:

This is from January 24, 1999.  Is it still relevant, and something that
should be on the TODO list?


> Tom Lane wrote:
> > 
> > Having an idle hour this evening, I thought it'd be interesting to build
> > a backend with profiling enabled, so as to confirm or deny the above
> > guess.  It seems that indeed a lot of time is being wasted, but where
> > it's being wasted might surprise you!
> ...
> > In other words, we're spending a third of our time mallocing and freeing
> > memory.  A tad high, what?
> > 
> > Actually, it's worse than that, because AllocSetAlloc,
> > PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
> > routines represent our own bookkeeping layer atop malloc/free.
> > That's another 18.66 seconds spent in these top routines, which means
> > that we are real close to expending half the backend's runtime on
> > memory bookkeeping.  This needs work.
> >
> 
> Yes, it's surprising!
> 
> I added some debug code to palloc/pfree and it shows that for
> INSERT:
> 
> 1. 80% of allocations are made for <= 32 bytes.
> 2. pfree is used for only 25% of them (others are freed
>    after the statement/transaction is done).
> 
> Note that our mmgr adds 16 bytes to each allocation
> (+ some bytes in malloc) - a great overhead, yes?
> 
> I added code to allocate a few big (16K-64K) blocks
> of memory for these small allocations to speed up 
> palloc by skipping AllocSetAlloc/malloc. The new code
> doesn't free allocated memory (to make bookkeeping fast),
> but keeping in mind 2. above and the memory overhead
> it seems an appropriate thing to do. This code also
> speeds up freeing when a statement/transaction is done,
> because only a few blocks have to be freed now.
> 
> I did 5000 INSERTS (into tables with 3 ints and 33 ints)
> with BEGIN/END, -F and -B 512 (I run postgres directly,
> without postmaster). User times:
> 
>             old        new
> -----------------------------------------
> table with 3 ints     9.7 sec     7.6 sec
> table with 33 ints    59.5 sec    39.9 sec
> 
> So, the new code is 20%-30% faster. Process sizes are the same.
> 
> Tom, could you run new code under profiling?
> 
> There are still some things to do:
> 
> 1. SELECT/UPDATE/DELETE often palloc/pfree tuples
>    (sizes are > 32 bytes), but pfree now requires an
>    additional lookup to see whether memory was allocated
>    by AllocSetAlloc or the new code. We can avoid this.
> 
> 2. Index scans palloc/pfree an IndexResult for each
>    tuple returned by the index. This one has been annoying
>    me for a long time. IndexResult should be part of the
>    IndexScan structure...
> 
> 3. psort uses leftist structure (16 bytes) when
>    disk is used for sorting. Obviously, big block
>    allocation should be used by lselect code itself.
> 
> 4. Actually, new mode shouldn't be used by Executor
>    in some cases.
> 
> I'll address this in a few days...
> 
> BTW, look at memutils.h: new code is like "tunable"
> mode described there.
> 
> > 
> > In other words, essentially *all* of the CPU time spent in
> > CommitTransaction is spent freeing memory.  That's probably why
> > ganging the transactions doesn't help --- it's the same number of
> > memory blocks getting allocated and freed.
> 
> It shows that we should get rid of system malloc/free and do
> all things in mmgr itself - this would allow us to free memory
> contexts much faster at statement/transaction end.
> 
> Vadim
> 
> 




Re: [HACKERS] Postgres Speed or lack thereof

From
Bruce Momjian
Date:
I have added this to the TODO list:

* improve dynamic memory allocation by introducing tuple-context memory allocation
* add pooled memory allocation where allocations are freed only as a group



> We could use 4 methods for dynamic allocation:
> 
> 1)    malloc/free - for persistent storage allocation
> 
> 2)    palloc/pfree - for storage belonging to some context and
>     which we can keep track of and free explicitly
> 
> 3)    fast_palloc - for storage which is impossible, too difficult or too
>     expensive to keep track of. This storage should be allocated with
>     fast and simple inline code from bigger chunks allocated with palloc.
>     This storage would never be freed explicitly, so that code could be
>     simple and fast, but the big chunks would be freed automatically at
>     the end of the transaction.
> 
> 4)    fast_talloc - we could introduce a `tuple' context handled like
>     fast_palloc for storage used only while processing one tuple.
>     This storage could be fast allocated from few big chunks allocated
>     with palloc and freed explicitly after the tuple has been processed.
>     This could avoid the overhead of many malloc/palloc while reducing
>     the overall memory usage for transaction which process many rows.
>     The total cost per tuple could be one palloc and one pfree.
>     We could also simply reuse the chunks for every tuple and pfree them 
>     only at the end of the transaction. This would cost one palloc/pfree
>     per transaction.
> 
> This would require revising the code and replacing palloc/pfree with the new
> functions where appropriate, but this could be done gradually because the
> old palloc/pfree are always safe.
> 
> -- 
> Massimo Dal Zotto
> 
> +----------------------------------------------------------------------+
> |  Massimo Dal Zotto               email: dz@cs.unitn.it               |
> |  Via Marconi, 141                phone: ++39-0461534251              |
> |  38057 Pergine Valsugana (TN)      www: http://www.cs.unitn.it/~dz/  |
> |  Italy                             pgp: finger dz@tango.cs.unitn.it  |
> +----------------------------------------------------------------------+
> 
> 


-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Postgres Speed or lack thereof

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> This is from January 24, 1999.  Is it still relevant, and something that
> should be on the TODO list?

Jan's rewrite of alloc sets has probably taken care of the speed issue.
Of course we still need to look at how memory allocation is being
*used*, in order to fix the within-statement memory leak problems.
But that's not what I was complaining about here.
        regards, tom lane

>> Tom Lane wrote:
>>>> 
>>>> Having an idle hour this evening, I thought it'd be interesting to build
>>>> a backend with profiling enabled, so as to confirm or deny the above
>>>> guess.  It seems that indeed a lot of time is being wasted, but where
>>>> it's being wasted might surprise you!
>> ...
>>>> In other words, we're spending a third of our time mallocing and freeing
>>>> memory.  A tad high, what?
>>>> 
>>>> Actually, it's worse than that, because AllocSetAlloc,
>>>> PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
>>>> routines represent our own bookkeeping layer atop malloc/free.
>>>> That's another 18.66 seconds spent in these top routines, which means
>>>> that we are real close to expending half the backend's runtime on
>>>> memory bookkeeping.  This needs work.
>>>> 
>> 
>> Yes, it's surprising!
>> 
>> I added some debug code to palloc/pfree and it shows that for
>> INSERT:
>> 
>> 1. 80% of allocations are made for <= 32 bytes.
>> 2. pfree is used for 25% of them only (others are freed
>> after statement/transaction is done).
>> 
>> Note that our mmgr adds 16 bytes to each allocation
>> (+ some bytes in malloc) - a great overhead, yes?
>> 
>> I added code to allocate a few big (16K-64K) blocks
>> of memory for these small allocations, to speed up
>> palloc by skipping AllocSetAlloc/malloc. The new code
>> doesn't free allocated memory (to keep bookkeeping fast),
>> but keeping in mind 2. above and the memory overhead,
>> this seems an appropriate thing to do. This code also
>> speeds up freeing when the statement/transaction is done,
>> because only a few blocks have to be freed now.
>> 
>> I did 5000 INSERTS (into tables with 3 ints and 33 ints)
>> with BEGIN/END, -F and -B 512 (I run postgres directly,
>> without postmaster). User times:
>> 
>> old        new
>> -----------------------------------------
>> table with 3 ints     9.7 sec     7.6 sec
>> table with 33 ints    59.5 sec    39.9 sec
>> 
>> So, the new code is 20%-30% faster. Process sizes are the same.
>> 
>> Tom, could you run new code under profiling?
>> 
>> There are still some things to do:
>> 
>> 1. SELECT/UPDATE/DELETE often palloc/pfree tuples
>> (sizes are > 32 bytes), but pfree now requires an
>> additional lookup to see whether memory was allocated by
>> AllocSetAlloc or the new code. We can avoid this.
>> 
>> 2. Index scans palloc/pfree an IndexResult for each
>> tuple returned by the index. This one has been annoying me
>> for a long time. IndexResult should be part of the
>> IndexScan structure...
>> 
>> 3. psort uses a leftist structure (16 bytes) when
>> the disk is used for sorting. Obviously, big-block
>> allocation should be used by the lselect code itself.
>> 
>> 4. Actually, the new mode shouldn't be used by the
>> Executor in some cases.
>> 
>> I'll address this in a few days...
>> 
>> BTW, look at memutils.h: new code is like "tunable"
>> mode described there.
>> 
>>>> 
>>>> In other words, essentially *all* of the CPU time spent in
>>>> CommitTransaction is spent freeing memory.  That's probably why
>>>> ganging the transactions doesn't help --- it's the same number of
>>>> memory blocks getting allocated and freed.
>> 
>> It shows that we should get rid of system malloc/free and do
>> everything in mmgr itself - this would let us free memory
>> contexts much faster at statement/transaction end.
>> 
>> Vadim
>> 
>> 


> -- 
>   Bruce Momjian                        |  http://www.op.net/~candle
>   maillist@candle.pha.pa.us            |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
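The "tuple context" idea from the TODO list (reuse the chunks for every tuple and pfree them only at the end of the transaction) can be sketched as a scratch buffer that is rewound, not freed, between tuples, so a whole transaction costs one malloc and one free. This is a hypothetical sketch; `TupleContext` and the function names are illustrative, not PostgreSQL's real interface, and a real implementation would chain additional chunks instead of failing when the buffer fills.

```c
#include <assert.h>
#include <stdlib.h>

#define TUPLE_CTX_SIZE 65536

typedef struct {
    char  *buf;     /* one big chunk, allocated once per transaction */
    size_t used;
} TupleContext;

/* Called once at transaction start: the only malloc in this scheme. */
static int tuple_ctx_init(TupleContext *ctx)
{
    ctx->buf = malloc(TUPLE_CTX_SIZE);
    ctx->used = 0;
    return ctx->buf != NULL;
}

/* Bump-allocate per-tuple scratch storage out of the big chunk. */
static void *tuple_alloc(TupleContext *ctx, size_t size)
{
    size = (size + 7) & ~(size_t) 7;        /* 8-byte alignment */
    if (ctx->used + size > TUPLE_CTX_SIZE)
        return NULL;            /* real code would chain another chunk here */
    void *p = ctx->buf + ctx->used;
    ctx->used += size;
    return p;
}

/* Called after each tuple: reuse the chunk instead of freeing it. */
static void tuple_ctx_rewind(TupleContext *ctx)
{
    ctx->used = 0;
}

/* Called once at transaction end: the only free in this scheme. */
static void tuple_ctx_destroy(TupleContext *ctx)
{
    free(ctx->buf);
    ctx->buf = NULL;
}
```

Rewinding between tuples is what bounds memory use for statements that process many rows, which is the problem the fast_talloc proposal was aimed at.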