Thread: memory dilemma

memory dilemma

From
Karel Zak - Zakkr
Date:
Hi,


I have a dilemma with PG's memory. I am finishing the to_char()
implementation and I am trying to use an internal cache buffer in these
routines. The cache is used when a to_char() format-picture (which goes to
the internal to_char parser) is equal to the previous one, so the to_char
parser is skipped. It is very good, because speed rises by about 20%.

The problem is how to implement this cache (a rough sketch of the check
itself follows the list):
via palloc - it is the standard in PG, but there is a problem: the memory
contents are not persistent across transactions. And I don't know how to
check when the memory has been freed (lost) so that the routine must
allocate it again (when a transaction finishes, PG's memory management does
not zero (reset) the memory, so a stale "if (buffer)" check still looks
like good memory).

via malloc - (used now). It is good, because the buffer is persistent. This
variant is used, for example, by the regexp utils in PG now. But is it nice?

via a static buffer - but how long? Terrible. Or set some default size for
this buffer, and if the format-picture is bigger, use palloc'd memory and do
not use the cache buffer. (This is my favourite variant.)

not use a cache - hmm... but I like fast routines (my current to_char()
implementation is faster (20-50%) than the current date_part()).
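
A rough sketch of the check itself (the names here are only illustrative,
not the real code); the whole dilemma is where prev_picture and the parsed
result should live:

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative only: the last format-picture seen.  Whether this lives
     * in palloc'd, malloc'd, or static memory is exactly the question above. */
    static char *prev_picture = NULL;

    /* true when the incoming picture equals the previous one, so the
     * to_char parser can be skipped and the cached parse reused */
    static bool
    picture_is_cached(const char *picture)
    {
        return prev_picture != NULL && strcmp(prev_picture, picture) == 0;
    }

    /* remember the picture for the next call (malloc variant, for the sketch;
     * the palloc and static-buffer variants would differ only here) */
    static void
    remember_picture(const char *picture)
    {
        free(prev_picture);
        prev_picture = malloc(strlen(picture) + 1);
        if (prev_picture)
            strcpy(prev_picture, picture);
    }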
 

Any idea? Please.    
                    Karel

PS. IMHO - should we add to PostgreSQL some memory management that is
*persistent across transactions*?

----------------------------------------------------------------------
Karel Zak <zakkr@zf.jcu.cz>              http://home.zf.jcu.cz/~zakkr/

Docs:        http://docs.linux.cz                    (big docs archive)    
Kim Project: http://home.zf.jcu.cz/~zakkr/kim/        (process manager)
FTP:         ftp://ftp2.zf.jcu.cz/users/zakkr/        (C/ncurses/PgSQL)
-----------------------------------------------------------------------



Re: [HACKERS] memory dilemma

From
Don Baccus
Date:
At 01:40 PM 12/27/99 +0100, Karel Zak - Zakkr wrote:

>    not use cache - hmm.. but I like fast routines (my current
>    to_char() implementation is faster (20-50%) than current 
>    date_part()).

While fast routines are nice indeed, isn't it true in practice
that to_char() times will be swamped by the amount of time to
parse, plan, and execute a query in most cases?

Trivial cases like "select to_char('now'::datetime,...)" can't in
general be cached anyway, since 'now' is always changing...

Your caching code needs to guarantee that it can't leak memory
in any circumstance.  In environments where database servers
run 24/7 that's far more important than minor increases in
speed.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.
 


Re: [HACKERS] memory dilemma

From
Karel Zak - Zakkr
Date:
On Thu, 9 Dec 1999, Don Baccus wrote:

> At 01:40 PM 12/27/99 +0100, Karel Zak - Zakkr wrote:
> 
> >    not use cache - hmm.. but I like fast routines (my current
> >    to_char() implementation is faster (20-50%) than current 
> >    date_part()).
> 
> While fast routines are nice indeed, isn't it true in practice
> that to_char() times will be swamped by the amount of time to
> parse, plan, and execute a query in most cases?

Sorry, but that is not a good argument. If some routine in the query path
spends a lot of time, is it not interesting to write the other routines fast?
No, we must try to rewrite the slow part into a faster version.

*Very* simple test over 10000 rows:

$ time psql test -c  "select date_part('second', d)
from dtest;"  -o /dev/null

real    0m0.504s
user    0m0.100s
sys     0m0.000s

$ time psql test -c  "select to_char(d, 'SI') from
dtest;"  -o /dev/null

real    0m0.288s
user    0m0.100s
sys     0m0.000s


> Trivial cases like "select to_char('now'::datetime,...)" can't in
> general be cached anyway, since 'now' is always changing...

No, you did not understand me. I want to cache the 'format-picture':

run 10000 x        select to_char(datetime, 'HH24:MI:SI FMMonth YYYY'); 

Yes, the 'datetime' value is always changing, but 'HH24:MI:SI FMMonth YYYY'
is not, and yet this format-picture must always be parsed. It is terrible to
call the to_char() parser every time if I can use a cache for it.

> Your caching code needs to guarantee that it can't leak memory
> in any circumstance.  In environments where database servers
> run 24/7 that's far more important than minor increases in
> speed.


Yes, I agree - robust SQL is more important, but always saying "speed is not
interesting, we only need robustness" is the way to robust-snail-SQL.

I want nice-robust-fast-SQL :-)                    Karel



Re: [HACKERS] memory dilemma

From
Tom Lane
Date:
Karel Zak - Zakkr <zakkr@zf.jcu.cz> writes:
>     not use cache - hmm.. but I like fast routines (my current
>     to_char() implementation is faster (20-50%) than current 
>     date_part()).

I like that one.  Anything else is a potential memory leak, and I really
find it hard to believe that the speed of to_char() itself is going to
be a critical factor in a real-world application.  You have client-to-
backend communication, parsing, planning, I/O, etc that are all going
to swamp out the cost of a single function.
        regards, tom lane


Re: [HACKERS] memory dilemma

From
Tom Lane
Date:
Karel Zak - Zakkr <zakkr@zf.jcu.cz> writes:
> *Very* simpl test over 10000 rows:

> $ time psql test -c  "select date_part('second', d)
> from dtest;"  -o /dev/null

> real    0m0.504s
> user    0m0.100s
> sys     0m0.000s

> $ time psql test -c  "select to_char(d, 'SI') from
> dtest;"  -o /dev/null

> real    0m0.288s
> user    0m0.100s
> sys     0m0.000s

That isn't necessarily an impressive demonstration --- what is the data
type of your "d" column?  Four of the six variants of date_part() are
implemented as SQL functions, which naturally adds a lot of overhead...
        regards, tom lane


Re: [HACKERS] memory dilemma

From
Don Baccus
Date:
At 02:26 PM 12/27/99 +0100, Karel Zak - Zakkr wrote:

>Sorry, but that is not a good argument. If some routine in the query path
>spends a lot of time, is it not interesting to write the other routines fast?
>No, we must try to rewrite the slow part into a faster version.
>
>*Very* simple test over 10000 rows:
>
>$ time psql test -c  "select date_part('second', d)
>from dtest;"  -o /dev/null
>
>real    0m0.504s
>user    0m0.100s
>sys     0m0.000s
>
>$ time psql test -c  "select to_char(d, 'SI') from
>dtest;"  -o /dev/null
>
>real    0m0.288s
>user    0m0.100s
>sys     0m0.000s

This would seem to be a great argument to investigate why
date_part's so much slower.  However, it says nothing about
the time savings of caching vs. not caching.

A more interesting comparison, more germane to the point under
discussion, would be:

time psql test -c "select d from dtest;"

In other words, how much overhead does "to_char" add?  That's what
you need to look at if you want to measure whether or not caching's
worth it.  Caching the parse of the format string will save a 
percentage of the to_char overhead, but a test like the above
will at least help you get a handle on how much overhead the
format string parse adds.

>> Your caching code needs to guarantee that it can't leak memory
>> in any circumstance.  In environments where database servers
>> run 24/7 that's far more important than minor increases in
>> speed.

>Yes, I agree - robust SQL is more important, but always saying "speed is not
>interesting, we only need robustness" is the way to robust-snail-SQL.

Which, of course, isn't what I said...after all, I've spent most of
my adult life writing highly optimizing compilers.  I merely asked
if a typical query wouldn't swamp any savings that caching the
parse of a format string might yield.

>I want nice-robust-fast-SQL :-)

Sure, but given the great disparity between "date_part" and your
initial "to_char" implementation, more people might see a more
significant speed-up if you spent time speeding up "date_part"...
of course, if caching greatly speeds up queries using "to_char"
then it's probably worth doing, but at least try to measure first
before adding the complication.

At least, that's how I tend to work...




- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.
 


Re: [HACKERS] memory dilemma

From
Karel Zak - Zakkr
Date:

On Mon, 27 Dec 1999, Tom Lane wrote:

> That isn't necessarily an impressive demonstration --- what is the data
> type of your "d" column?  Four of the six variants of date_part() are
> implemented as SQL functions, which naturally adds a lot of overhead...


Sorry, I will describe the problem better now.
The test table 'tab':
CREATE TABLE tab (d datetime);
The 'tab' table contains _random_ datetime values (generated via my program
rand_datetime - it is in PG's contrib/dateformat/test). The table has
10000 rows.

Test:

time psql test -c "select d from tab;" -o /dev/null

real    0m0.530s
user    0m0.060s
sys     0m0.020s

time psql test -c "select date_part('second', d) from tab;" -o /dev/null

real    0m0.494s
user    0m0.060s
sys     0m0.030s

time psql test -c "select to_char(d, 'SS') from tab;" -o /dev/null

real    0m0.368s
user    0m0.080s
sys     0m0.000s

(to_char() is a little slower now than in the previous test, because I
rewrote some parts.)
This comparison does *not* show the cache effect. This test (probably) shows
better searching and datetime-part extraction in to_char().


The cache has an effect for long and complicated format-pictures in to_char().

With cache (the cache is implemented via malloc/free):
~~~~~~~~~~
time psql test -c "select to_char(d, 'HH12:MI:SS YYYY FMMonth Day') from
tab;" -o /dev/null

real    0m0.545s
user    0m0.060s
sys     0m0.010s

Without cache:
~~~~~~~~~~~~~
time psql test -c "select to_char(d, 'HH12:MI:SS YYYY FMMonth Day') from
tab;" -o /dev/null

real    0m0.638s
user    0m0.060s
sys     0m0.010s 

Hmm... my internal to_char() parser is very fast (only 0.100s for 10000
calls) :-))


Thanks for all the suggestions. I will finally use a cache via a static buffer
in to_char(), and if the format-picture is bigger than this buffer, to_char()
will work as if without a cache. This solution eliminates the memory leak - the
same approach is used in the current datetime routines. It is a good compromise,
roughly like the sketch below.
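
Roughly (only a sketch with made-up names, not the real code):

    #include <stdbool.h>
    #include <string.h>

    #define DCH_CACHE_SIZE 128                /* default size of the cache buffer */

    /* toy stand-in for one parsed element of the picture */
    typedef struct { char action; } FormatNode;

    static char       cache_picture[DCH_CACHE_SIZE + 1];  /* last picture seen */
    static FormatNode cache_nodes[DCH_CACHE_SIZE + 1];    /* its parsed form   */
    static bool       cache_valid = false;

    /* toy parser: one node per character (the real parser is more complex) */
    static void
    parse_picture(const char *picture, FormatNode *out, int maxnodes)
    {
        int i;

        for (i = 0; i < maxnodes - 1 && picture[i] != '\0'; i++)
            out[i].action = picture[i];
        out[i].action = '\0';                 /* end marker */
    }

    /*
     * Return the parsed picture.  A short picture is cached in the static
     * buffers; a picture longer than the buffer is parsed into 'scratch'
     * (memory the caller allocated just for this call), exactly as if
     * there were no cache.
     */
    static const FormatNode *
    get_picture(const char *picture, FormatNode *scratch, int scratchlen)
    {
        if (strlen(picture) > DCH_CACHE_SIZE)
        {
            parse_picture(picture, scratch, scratchlen);  /* uncached path */
            return scratch;
        }

        if (!cache_valid || strcmp(picture, cache_picture) != 0)
        {
            /* cache miss: run the parser once and remember the picture */
            parse_picture(picture, cache_nodes, DCH_CACHE_SIZE + 1);
            strcpy(cache_picture, picture);
            cache_valid = true;
        }
        /* cache hit: the parser is skipped */
        return cache_nodes;
    }

The cached copy lives only in fixed-size static storage, so there is nothing
to free and nothing that can leak, no matter how a transaction ends.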

I plan to make small changes in the datetime routines in the future. to_char()
is probably faster because it uses a better search algorithm (it has a simple
index for the scanned keyword array; a sketch of the idea is below).
date_part() will be fast too :-)
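
The "simple index" is roughly this idea (only an illustrative sketch with a
toy keyword table, not the real code): index the keyword array by its first
letter, so the scan starts at the right slice instead of walking the whole
array:

    #include <string.h>

    /* toy keyword table, sorted by name (the real table is much larger) */
    typedef struct { const char *name; int id; } KeyWord;

    static const KeyWord keywords[] = {
        {"DD", 1}, {"HH12", 2}, {"HH24", 3}, {"MI", 4},
        {"MM", 5}, {"SS", 6}, {"YYYY", 7}
    };
    #define N_KEYWORDS ((int) (sizeof(keywords) / sizeof(keywords[0])))

    /* kw_index[c - 'A'] = position of the first keyword starting with the
     * letter c, or -1 if there is none; built once at startup */
    static int kw_index['Z' - 'A' + 1];

    static void
    build_kw_index(void)
    {
        int i;

        for (i = 0; i < 'Z' - 'A' + 1; i++)
            kw_index[i] = -1;
        for (i = N_KEYWORDS - 1; i >= 0; i--)
            kw_index[keywords[i].name[0] - 'A'] = i;
    }

    /* find the keyword at the start of 'p': the scan begins at the slice
     * for p's first letter instead of walking the whole array */
    static const KeyWord *
    find_keyword(const char *p)
    {
        int i;

        if (*p < 'A' || *p > 'Z')
            return NULL;
        for (i = kw_index[*p - 'A'];
             i >= 0 && i < N_KEYWORDS && keywords[i].name[0] == *p;
             i++)
        {
            if (strncmp(p, keywords[i].name, strlen(keywords[i].name)) == 0)
                return &keywords[i];
        }
        return NULL;
    }

build_kw_index() runs once; after that, find_keyword() touches only the
keywords that share the first letter of the picture element.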

-
A last (PG novice) question - what problems appear if PG is compiled with
(gcc) -O3 optimization? And why are 'inline' function declarations not used
in PG?
                        Karel
 



Re: [HACKERS] memory dilemma

From
Don Baccus
Date:
At 11:28 AM 12/28/99 +0100, Karel Zak - Zakkr wrote:
>Thanks for all the suggestions. I will finally use a cache via a static buffer
>in to_char(), and if the format-picture is bigger than this buffer, to_char()
>will work as if without a cache. This solution eliminates the memory leak - the
>same approach is used in the current datetime routines. It is a good compromise.

Seems simple and safe, yes.  My objection was really due to my concern
over memory leaks.  The "to_char()" function will be a great help to
those porting over Oracle applications to Postgres.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.