Thread: memory dilemma
Hi,

I have a dilemma with PG's memory management. I am finishing my to_char() implementation, and I try to use an internal cache buffer in these routines. The cache is used when a to_char() format-picture (the string that goes to the internal to_char parser) is equal to the previous one, so the to_char parser is skipped. It is very good, because speed rises (20%). The problem is how to implement this cache:

via palloc - the standard in PG, but it is a problem, because the memory contents are not persistent across transactions. And I don't know how to check when the memory has been freed (lost) so that the routine must reallocate it (when a transaction finishes, PG's memory management does not zero (reset) the memory, so an "if (buffer)" test still looks like good memory).

via malloc - (now used). It is good, because the buffer is persistent. This variant is used, for example, in the regexp utils in PG now. But is it nice?

via a static buffer - but how long? Terrible. Or set some default size for this buffer, and if the format-picture is bigger, use palloced memory and do not use the cache buffer. (It is my favourite variant.)

not use cache - hmm... but I like fast routines (my current to_char() implementation is faster (20-50%) than the current date_part()).

Any idea? Please.

Karel

PS. IMHO - how about adding some *across-transactions persistent* memory management to PostgreSQL?

----------------------------------------------------------------------
Karel Zak <zakkr@zf.jcu.cz>             http://home.zf.jcu.cz/~zakkr/

Docs:        http://docs.linux.cz                  (big docs archive)
Kim Project: http://home.zf.jcu.cz/~zakkr/kim/      (process manager)
FTP:         ftp://ftp2.zf.jcu.cz/users/zakkr/      (C/ncurses/PgSQL)
-----------------------------------------------------------------------
At 01:40 PM 12/27/99 +0100, Karel Zak - Zakkr wrote:

> not use cache - hmm... but I like fast routines (my current
> to_char() implementation is faster (20-50%) than the current
> date_part()).

While fast routines are nice indeed, isn't it true in practice that to_char() times will be swamped by the amount of time to parse, plan, and execute a query in most cases?

Trivial cases like "select to_char('now'::datetime,...)" can't in general be cached anyway, since 'now' is always changing...

Your caching code needs to guarantee that it can't leak memory in any circumstance. In environments where database servers run 24/7 that's far more important than minor increases in speed.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at http://donb.photo.net.
On Thu, 9 Dec 1999, Don Baccus wrote:

> At 01:40 PM 12/27/99 +0100, Karel Zak - Zakkr wrote:
>
> > not use cache - hmm... but I like fast routines (my current
> > to_char() implementation is faster (20-50%) than the current
> > date_part()).
>
> While fast routines are nice indeed, isn't it true in practice
> that to_char() times will be swamped by the amount of time to
> parse, plan, and execute a query in most cases?

Sorry, but that is not a good argument. If a routine in the query path spends time, is it not interesting to write a faster one? We should try to rewrite the slow parts into faster versions.

A *very* simple test over 10000 rows:

$ time psql test -c "select date_part('second', d) from dtest;" -o /dev/null

real 0m0.504s
user 0m0.100s
sys 0m0.000s

$ time psql test -c "select to_char(d, 'SI') from dtest;" -o /dev/null

real 0m0.288s
user 0m0.100s
sys 0m0.000s

> Trivial cases like "select to_char('now'::datetime,...)" can't in
> general be cached anyway, since 'now' is always changing...

No, you misunderstand me. I want to cache the 'format-picture':

run 10000 x select to_char(datetime, 'HH24:MI:SI FMMonth YYYY');

Yes, 'datetime' can change on every row, but 'HH24:MI:SI FMMonth YYYY' does not, and yet this format-picture must be parsed every time. It is terrible to always call the to_char() parser when I could use a cache for it.

> Your caching code needs to guarantee that it can't leak memory
> in any circumstance. In environments where database servers
> run 24/7 that's far more important than minor increases in
> speed.

Yes, I agree - robust SQL is more important, but always saying "speed is not interesting, we need robustness only" is the way to robust-snail-SQL. I want nice-robust-fast-SQL :-)

Karel
Karel Zak - Zakkr <zakkr@zf.jcu.cz> writes:

> not use cache - hmm... but I like fast routines (my current
> to_char() implementation is faster (20-50%) than the current
> date_part()).

I like that one. Anything else is a potential memory leak, and I really find it hard to believe that the speed of to_char() itself is going to be a critical factor in a real-world application. You have client-to-backend communication, parsing, planning, I/O, etc. that are all going to swamp out the cost of a single function.

regards, tom lane
Karel Zak - Zakkr <zakkr@zf.jcu.cz> writes:

> A *very* simple test over 10000 rows:
>
> $ time psql test -c "select date_part('second', d) from dtest;" -o /dev/null
>
> real 0m0.504s
> user 0m0.100s
> sys 0m0.000s
>
> $ time psql test -c "select to_char(d, 'SI') from dtest;" -o /dev/null
>
> real 0m0.288s
> user 0m0.100s
> sys 0m0.000s

That isn't necessarily an impressive demonstration --- what is the data type of your "d" column? Four of the six variants of date_part() are implemented as SQL functions, which naturally adds a lot of overhead...

regards, tom lane
At 02:26 PM 12/27/99 +0100, Karel Zak - Zakkr wrote:

> Sorry, but that is not a good argument. If a routine in the query path
> spends time, is it not interesting to write a faster one? We should
> try to rewrite the slow parts into faster versions.
>
> A *very* simple test over 10000 rows:
>
> $ time psql test -c "select date_part('second', d) from dtest;" -o /dev/null
>
> real 0m0.504s
> user 0m0.100s
> sys 0m0.000s
>
> $ time psql test -c "select to_char(d, 'SI') from dtest;" -o /dev/null
>
> real 0m0.288s
> user 0m0.100s
> sys 0m0.000s

This would seem to be a great argument for investigating why date_part is so much slower. However, it says nothing about the time saved by caching vs. not caching. A more interesting comparison, more germane to the point under discussion, would be:

time psql test -c "select d from dtest;"

In other words, how much overhead does "to_char" add? That's what you need to look at if you want to measure whether or not caching's worth it. Caching the parse of the format string will save a percentage of the to_char overhead, but a test like the above will at least help you get a handle on how much overhead the format string parse adds.

> > Your caching code needs to guarantee that it can't leak memory
> > in any circumstance. In environments where database servers
> > run 24/7 that's far more important than minor increases in
> > speed.
>
> Yes, I agree - robust SQL is more important, but always saying "speed
> is not interesting, we need robustness only" is the way to
> robust-snail-SQL.

Which, of course, isn't what I said... after all, I've spent most of my adult life writing highly optimizing compilers. I merely asked if a typical query wouldn't swamp any savings that caching the parse of a format string might yield.

> I want nice-robust-fast-SQL :-)

Sure, but given the great disparity between "date_part" and your initial "to_char" implementation, more people might see a more significant speed-up if you spent time speeding up "date_part"...
Of course, if caching greatly speeds up queries using "to_char" then it's probably worth doing, but at least try to measure first before adding the complication. At least, that's how I tend to work...

- Don Baccus, Portland OR <dhogaza@pacifier.com>
On Mon, 27 Dec 1999, Tom Lane wrote:

> That isn't necessarily an impressive demonstration --- what is the data
> type of your "d" column? Four of the six variants of date_part() are
> implemented as SQL functions, which naturally adds a lot of overhead...

Sorry, I will describe the problem better now. The test table 'tab':

CREATE TABLE tab (d datetime);

The table 'tab' contains _random_ datetime values (generated via my program rand_datetime - it is in PG's contrib/dateformat/test). The table holds 10000 rows.

Test:

time psql test -c "select d from tab;" -o /dev/null

real 0m0.530s
user 0m0.060s
sys 0m0.020s

time psql test -c "select date_part('second', d) from tab;" -o /dev/null

real 0m0.494s
user 0m0.060s
sys 0m0.030s

time psql test -c "select to_char(d, 'SS') from tab;" -o /dev/null

real 0m0.368s
user 0m0.080s
sys 0m0.000s

(to_char() is a little slower now than in the previous test, because I rewrote some parts.)

This comparison does *not* show the cache effect. It (probably) shows better searching and datetime part extraction in to_char(). The cache has an effect for long and complicated format-pictures in to_char().

With cache (implemented via malloc/free):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

time psql test -c "select to_char(d, 'HH12:MI:SS YYYY FMMonth Day') from tab;" -o /dev/null

real 0m0.545s
user 0m0.060s
sys 0m0.010s

Without cache:
~~~~~~~~~~~~~

time psql test -c "select to_char(d, 'HH12:MI:SS YYYY FMMonth Day') from tab;" -o /dev/null

real 0m0.638s
user 0m0.060s
sys 0m0.010s

Hmm... my internal to_char() parser is very fast (only 0.100s for 10000 calls) :-))

Thanks for all the suggestions. I will finally use a static buffer for the to_char() cache, and if a format-picture is bigger than this buffer, to_char() will work as if without the cache. This solution eliminates the memory leak - it is the approach used in the current datetime routines. It is a good compromise.

I plan small changes in the datetime routines in the future. to_char() is probably faster because it uses a better search algorithm (it has a simple index for the scanned array).
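[Editor's note: the static-buffer compromise Karel settles on can be sketched as below. A sketch only: the buffer size, function name, and return convention are assumptions, not the real dateformat code. Pictures longer than the buffer simply bypass the cache, so nothing is ever malloced and nothing can leak - the tradeoff is that oversized pictures are reparsed on every call.]

```c
#include <assert.h>
#include <string.h>

#define FORMAT_CACHE_SIZE 128   /* assumed default; sized to cover typical pictures */

static char format_cache[FORMAT_CACHE_SIZE + 1];
static int  cache_valid = 0;

/* Returns 1 on a cache hit (the parser may be skipped), 0 otherwise.
 * Pictures longer than the buffer are never cached, so there is no
 * dynamic allocation at all and therefore nothing that can leak. */
int
format_cache_check(const char *picture)
{
    if (strlen(picture) > FORMAT_CACHE_SIZE)
        return 0;                        /* too big: work without the cache */

    if (cache_valid && strcmp(format_cache, picture) == 0)
        return 1;                        /* same picture as the last call */

    strcpy(format_cache, picture);       /* remember it for the next call */
    cache_valid = 1;
    return 0;
}
```

Unlike the malloc variant, this has no cleanup obligation: a static buffer exists for the life of the backend, which is exactly the 24/7 robustness property Don asked for.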
date_part() will be fast too :-)

A last (PG novice) question - what problems would appear if PG were compiled with (gcc) -O3 optimization? And why are 'inline' function declarations not used in PG?

Karel
At 11:28 AM 12/28/99 +0100, Karel Zak - Zakkr wrote:

> Thanks for all the suggestions. I will finally use a static buffer for
> the to_char() cache, and if a format-picture is bigger than this buffer,
> to_char() will work as if without the cache. This solution eliminates
> the memory leak - it is the approach used in the current datetime
> routines. It is a good compromise.

Seems simple and safe, yes. My objection was really due to my concern over memory leaks.

The "to_char()" function will be a great help to those porting Oracle applications over to Postgres.

- Don Baccus, Portland OR <dhogaza@pacifier.com>