Thread: Re: [PATCHES] serverlog rotation/functions

Re: [PATCHES] serverlog rotation/functions

From
Bruce Momjian
Date:
OK, I talked to Tom about this patch and I understand the issues now.

I think the best solution will be to have the postmaster start a child
process that can read the guc variables and create a log file based on
it contents.  The child would be responsible to create a new log file
every X seconds as specified in the postgresql.conf file.

Backends wishing to read the log file would call a function to list the
contents of a directory.  We can do this by creating a generic backend
super-user-only function that can list the contents of any directory. 
Then you would use an API to read a specific file, similar to the log
reading functions you already have.  In fact, you could set up the API
so reading a directory would return a list of directory names so you
don't need a separate directory listing function.  (Does your API return
file contents as one big text string or as lines.  If you return it as
one big text string, the directory idea will not work and you will need
a separate function.)

So, we have:
o   use pipe and dup to copy stdout and stderr to your    postmaster childo   new guc variables to specify log
destinationand rotation timeso   a server-side function to force a log rotateo   API to list a directory and show file
contents

With this functionality, you can have clients listing the log directory
and choosing the most recent log file or previous ones.  You could even
configure it to remove old log files.

Both Tom and I believe this is a more generic and reliable solution.

---------------------------------------------------------------------------

Andreas Pflug wrote:
> Bruce Momjian wrote:
> > 
> > Also there are no documenttion changes.
> 
> Here are the missing docs, freshly created against cvs.
> 
> Regards,
> Andreas
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: [PATCHES] serverlog rotation/functions

From
Andreas Pflug
Date:
Bruce Momjian wrote:
> OK, I talked to Tom about this patch and I understand the issues now.
> 
> I think the best solution will be to have the postmaster start a child
> process that can read the guc variables and create a log file based on
> it contents.  The child would be responsible to create a new log file
> every X seconds as specified in the postgresql.conf file.
> 

OK, next round.

> Backends wishing to read the log file would call a function to list the
> contents of a directory.  We can do this by creating a generic backend
> super-user-only function that can list the contents of any directory. 

If logfiles are located using directory listings, this would mean we 
should have well-known filename patterns instead of the current template 
stuff. So only the serverlog directory name would be selectable, if that  makes still sense (I'd opt to drop that).

> Then you would use an API to read a specific file, similar to the log
> reading functions you already have.  In fact, you could set up the API
> so reading a directory would return a list of directory names so you
> don't need a separate directory listing function.  (Does your API return
> file contents as one big text string or as lines.  If you return it as
> one big text string, the directory idea will not work and you will need
> a separate function.)

We'd better have distinct functions. The current is for reading large 
chunks of data without interpretation of contents, and dir listing might  need tweaking for win32.
Now I wonder if it's better to have generic superuser-only functions for 
list/read/write/delete (this would enable client-side editing of 
postgresql.conf), or tie the functions to logfiles only (in which case 
the logfile functions shouldn't use filenames but logfile timestamps).

I still believe the log process should write the current logfile 
filename to shared-mem (write-only), so backends can prevent deleting it.


> 
> So, we have:
> 
>     o   use pipe and dup to copy stdout and stderr to your
>         postmaster child
>     o   new guc variables to specify log destination and rotation times
>     o   a server-side function to force a log rotate
>     o   API to list a directory and show file contents

I'll start on the log process.

Regards,
Andreas




Re: [PATCHES] serverlog rotation/functions

From
Bruce Momjian
Date:
Andreas Pflug wrote:
> Bruce Momjian wrote:
> > OK, I talked to Tom about this patch and I understand the issues now.
> > 
> > I think the best solution will be to have the postmaster start a child
> > process that can read the guc variables and create a log file based on
> > it contents.  The child would be responsible to create a new log file
> > every X seconds as specified in the postgresql.conf file.
> > 
> 
> OK, next round.
> 
> > Backends wishing to read the log file would call a function to list the
> > contents of a directory.  We can do this by creating a generic backend
> > super-user-only function that can list the contents of any directory. 
> 
> If logfiles are located using directory listings, this would mean we 
> should have well-known filename patterns instead of the current template 
> stuff. So only the serverlog directory name would be selectable, if that 
>   makes still sense (I'd opt to drop that).

I don't see any reason to have a pattern though I suppose if you mix
pgsql log files in with other log files it might be a problem.  One idea
would be for the client-side program to do some processing like this:
SELECT *FROM dir_listing('/var/log') AS dirWHERE dir LIKE 'pgsql_%'

or something like that where the client pulls apart the directory
specifiction like this:
log_dest = '/var/log/postgresql.log.%Y-%m-%d_%H%M%S'

You do something that splits the value into directory name and file name
and removes every letter after %.
/var/logpostgresql.log.%-%-%_%%%

Another idea is to allow filename wildcards in the listing so it
becomes:
SELECT *FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir

While that is nice, it doesn't match the functionality of opendir so we
are perhaps better with one that doesn't handle wildcards and we just do
the wildcard processing in the WHERE clause.

> > Then you would use an API to read a specific file, similar to the log
> > reading functions you already have.  In fact, you could set up the API
> > so reading a directory would return a list of directory names so you
> > don't need a separate directory listing function.  (Does your API return
> > file contents as one big text string or as lines.  If you return it as
> > one big text string, the directory idea will not work and you will need
> > a separate function.)
> 
> We'd better have distinct functions. The current is for reading large 
> chunks of data without interpretation of contents, and dir listing might 
>   need tweaking for win32.

OK.

> Now I wonder if it's better to have generic superuser-only functions for 
> list/read/write/delete (this would enable client-side editing of 
> postgresql.conf), or tie the functions to logfiles only (in which case 
> the logfile functions shouldn't use filenames but logfile timestamps).

Agreed, that was one of my goals.  Right now the superuser can read any
file with COPY (read it as text lines into a text table).  Allowing
directory listings is a natural extension.

> I still believe the log process should write the current logfile 
> filename to shared-mem (write-only), so backends can prevent deleting it.

Once we do this there will not be any backend writing to those files.
(We will need the log subprocess pid in shared memory so backends can
send signals to it.)  I am not sure how we will do file deletes but I
think each backend will have to do the delete itself rather than try to
pass the file name to the log process.  I think we will have to assume
the log file names increase in ordering so we know which one is the
current one.  I can't think if a cleaner way to communicate this to the
backends except perhaps as you suggest as shared memory areas that
contains the name, but we will need a lock so the backends don't read it
while it is changing.  That would be a nice feature.

> > So, we have:
> > 
> >     o   use pipe and dup to copy stdout and stderr to your
> >         postmaster child
> >     o   new guc variables to specify log destination and rotation times
> >     o   a server-side function to force a log rotate
> >     o   API to list a directory and show file contents
> 
> I'll start on the log process.

Sorry to be dumping more work on you but I think this is a better
directory to go for this feature.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: [PATCHES] serverlog rotation/functions

From
Andreas Pflug
Date:
Bruce Momjian wrote:

> 
> I don't see any reason to have a pattern though I suppose if you mix
> pgsql log files in with other log files it might be a problem.  One idea
> would be for the client-side program to do some processing like this:
> 
>     SELECT *
>     FROM dir_listing('/var/log') AS dir
>     WHERE dir LIKE 'pgsql_%'
> 
> or something like that where the client pulls apart the directory
> specifiction like this:
> 
>     log_dest = '/var/log/postgresql.log.%Y-%m-%d_%H%M%S'
> 
> You do something that splits the value into directory name and file name
> and removes every letter after %.
> 
>     /var/log
>     postgresql.log.%-%-%_%%%
> 
> Another idea is to allow filename wildcards in the listing so it
> becomes:
> 
>     SELECT *
>     FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir
> 
> While that is nice, it doesn't match the functionality of opendir so we
> are perhaps better with one that doesn't handle wildcards and we just do
> the wildcard processing in the WHERE clause.

Uh, this looks ugly.

How about
pg_logfile_list() RETURNS setof timestamp -- to list available logfiles
pg_logfile_filename(timestamp) to return filename for that logfile

and generic
pg_dir(wildcard_text)
pg_file_length(filename_text)
pg_file_read(filename_text, offs, len)
pg_file_write(filename_text, contents_text)
pg_file_delete(filename)

I'd like to have the logfile api straigt forward. I finally would like 
all server logging to go into a non-configurable DataDir subdirectory 
pg_log with filenames mangled by internal rules. Maybe it's a good idea 
to have the pid in the filename too, to detect postmaster restarts.

> 
> Once we do this there will not be any backend writing to those files.

Of course not, only the logging subprocess may write.

> (We will need the log subprocess pid in shared memory so backends can
> send signals to it.) 

Yes, because the inherited SysLoggerPID might become invalid in case the 
logger process crashes and is recreated.
 I am not sure how we will do file deletes but I
> think each backend will have to do the delete itself rather than try to
> pass the file name to the log process. 

Agreed.
 I think we will have to assume
> the log file names increase in ordering so we know which one is the
> current one.   I can't think if a cleaner way to communicate this to the
> backends except perhaps as you suggest as shared memory areas that
> contains the name, but we will need a lock so the backends don't read it
> while it is changing.  That would be a nice feature.

We can omit this, if we supply only a native function. In that case, 
it's up to the admin not to shoot himself into the foot.

> 
> Sorry to be dumping more work on you but I think this is a better
> directory to go for this feature.

Well that's the pain with you cvs committers guys :-)

Regards,
Andreas


Re: [PATCHES] serverlog rotation/functions

From
Bruce Momjian
Date:
Andreas Pflug wrote:
> > You do something that splits the value into directory name and file name
> > and removes every letter after %.
> > 
> >     /var/log
> >     postgresql.log.%-%-%_%%%
> > 
> > Another idea is to allow filename wildcards in the listing so it
> > becomes:
> > 
> >     SELECT *
> >     FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir
> > 
> > While that is nice, it doesn't match the functionality of opendir so we
> > are perhaps better with one that doesn't handle wildcards and we just do
> > the wildcard processing in the WHERE clause.
> 
> Uh, this looks ugly.
> 
> How about
> pg_logfile_list() RETURNS setof timestamp -- to list available logfiles
> pg_logfile_filename(timestamp) to return filename for that logfile

I don't see the need to return timestamps. If you select any empty
directory, you can just return the file names.  The only reason you
might need a pattern is to distinguish pg log files from other log
files.  If you want, create a server-side function that returns the file
name with the strftime() patterns converted to '*'.

> and generic
> pg_dir(wildcard_text)

Maybe pg_dir_ls().

OK, it would be nice if we could do a sed operation like:
sed 's/%./*/g'

but I don't know a way to do that without defining a function or pulling
in a procedural language, but if we could do it we could do:
pg_dir(echo log_destination | sed 's/%./*/g')

I think we will need a server-side function predefined to do this
conversion and we will call it:
pg_dir(glob_log_filename(log_destination))

> pg_file_length(filename_text)
> pg_file_read(filename_text, offs, len)
> pg_file_write(filename_text, contents_text)
> pg_file_delete(filename)

_delete should be _unlink.

> I'd like to have the logfile api straigt forward. I finally would like 
> all server logging to go into a non-configurable DataDir subdirectory 
> pg_log with filenames mangled by internal rules. Maybe it's a good idea 
> to have the pid in the filename too, to detect postmaster restarts.

This will not work.  What if there isn't sufficient room or
administrators want the log files somewher else?  I don't see the value
in hardcoding the location.  log_destination tells us enough.

> > Once we do this there will not be any backend writing to those files.
> 
> Of course not, only the logging subprocess may write.
> 
> > (We will need the log subprocess pid in shared memory so backends can
> > send signals to it.) 
> 
> Yes, because the inherited SysLoggerPID might become invalid in case the 
> logger process crashes and is recreated.

OK.

>   I am not sure how we will do file deletes but I
> > think each backend will have to do the delete itself rather than try to
> > pass the file name to the log process. 
> 
> Agreed.
> 
>   I think we will have to assume
> > the log file names increase in ordering so we know which one is the
> > current one.   I can't think if a cleaner way to communicate this to the
> > backends except perhaps as you suggest as shared memory areas that
> > contains the name, but we will need a lock so the backends don't read it
> > while it is changing.  That would be a nice feature.
> 
> We can omit this, if we supply only a native function. In that case, 
> it's up to the admin not to shoot himself into the foot.

Yes, we can improve this in 7.6 if we want.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: serverlog rotation/functions

From
Andreas Pflug
Date:
Bruce Momjian wrote:
> Andreas Pflug wrote:
> 
>>>You do something that splits the value into directory name and file name
>>>and removes every letter after %.
>>>
>>>    /var/log
>>>    postgresql.log.%-%-%_%%%
>>>
>>>Another idea is to allow filename wildcards in the listing so it
>>>becomes:
>>>
>>>    SELECT *
>>>    FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir
>>>
>>>While that is nice, it doesn't match the functionality of opendir so we
>>>are perhaps better with one that doesn't handle wildcards and we just do
>>>the wildcard processing in the WHERE clause.
>>
>>Uh, this looks ugly.
>>
>>How about
>>pg_logfile_list() RETURNS setof timestamp -- to list available logfiles
>>pg_logfile_filename(timestamp) to return filename for that logfile
> 
> 
> I don't see the need to return timestamps. If you select any empty
> directory, you can just return the file names.  The only reason you
> might need a pattern is to distinguish pg log files from other log
> files.  If you want, create a server-side function that returns the file
> name with the strftime() patterns converted to '*'.
> 
> 
>>and generic
>>pg_dir(wildcard_text)
> 
> 
> Maybe pg_dir_ls().
> 
> OK, it would be nice if we could do a sed operation like:
> 
>     sed 's/%./*/g'
> 
> but I don't know a way to do that without defining a function or pulling
> in a procedural language, but if we could do it we could do:
> 
>     pg_dir(echo log_destination | sed 's/%./*/g')
> 

Argggg.... ever used sed on win32?!? And how should the timestamp be 
represented in client tools? Date/time interpretation is always a source 
of problems, so *please* let the server do that.

Rethinking all this, I'd like the pg_logfile_list to return a complex type:

CREATE TYPE pg_logfile_list AS (filedate timestamp,filename text,backendpid int,inuse bool)

and

pg_logfile_list() RETURNS SETOF pg_logfile_list

which would enable

SELECT  filename,        pg_file_unlink(filename)  FROM  pg_logfile_list() WHERE  filedate < current_timestamp - '3
months'::interval       AND NOT inuse
 

In-use check is easy for the backend, if the syslog process publishes 
the current logfile's timestamp in sharedmem.

We can use a GUC variable for the log_directory (not log_destination); 
anyway, I'd like the filenames to be selected by the server.


Regards,
Andreas


Re: serverlog rotation/functions

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> In-use check is easy for the backend, if the syslog process publishes 
> the current logfile's timestamp in sharedmem.

You really haven't absorbed any of the objections I've raised, have you?
I don't want the log process connected to shared mem at *all*, and see
no particularly good reason why it should be.

> We can use a GUC variable for the log_directory (not log_destination); 
> anyway, I'd like the filenames to be selected by the server.

The directory should definitely be a GUC variable.  The individual
filenames should probably be of the form <prefix><timestamp>, where
the server dictates the format of the timestamp (and we choose it so
that the names sort correctly).  We could let the prefix be
user-selectable or make it hard-wired; I don't have a strong feeling
about that either way.
        regards, tom lane


Re: serverlog rotation/functions

From
Bruce Momjian
Date:
Andreas Pflug wrote:
> > OK, it would be nice if we could do a sed operation like:
> > 
> >     sed 's/%./*/g'
> > 
> > but I don't know a way to do that without defining a function or pulling
> > in a procedural language, but if we could do it we could do:
> > 
> >     pg_dir(echo log_destination | sed 's/%./*/g')
> > 
> 
> Argggg.... ever used sed on win32?!? And how should the timestamp be 
> represented in client tools? Date/time interpretation is always a source 
> of problems, so *please* let the server do that.

I am thinking of these all being server-side functions.

> Rethinking all this, I'd like the pg_logfile_list to return a complex type:
> 
> CREATE TYPE pg_logfile_list AS (
>     filedate timestamp,
>     filename text,
>     backendpid int,
>     inuse bool)
> 
> and
> 
> pg_logfile_list() RETURNS SETOF pg_logfile_list
> 
> which would enable
> 
> SELECT  filename,
>          pg_file_unlink(filename)
>    FROM  pg_logfile_list()
>   WHERE  filedate < current_timestamp - '3 months'::interval
>          AND NOT inuse
> 
> In-use check is easy for the backend, if the syslog process publishes 
> the current logfile's timestamp in sharedmem.
> 
> We can use a GUC variable for the log_directory (not log_destination); 
> anyway, I'd like the filenames to be selected by the server.

This seems quite involved.  Can we get the basic functionality I
described first?  Also I am not sure how all this information is going
to be passed from the logging process to the backend requesting the
information, and it seems overly complicated.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: serverlog rotation/functions

From
Andreas Pflug
Date:
Bruce Momjian wrote:

> 
> This seems quite involved.  Can we get the basic functionality I
> described first?

On the way.

> Also I am not sure how all this information is going
> to be passed from the logging process to the backend requesting the
> information, and it seems overly complicated.

There's *no* information passing from the logging process, with the 
single exception of the latest logfile timestamp (if allowed). I'd 
rather like to have that information from the logger, to be safe in case 
the system time was manipulated and the last logfile is not the current one.
The rest is just a reworked version of pg_dir_ls, with internal 
knowledge of how the timestamp is formatted.

Regards,
Andreas






Re: serverlog rotation/functions

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> 
>>In-use check is easy for the backend, if the syslog process publishes 
>>the current logfile's timestamp in sharedmem.
> 
> 
> You really haven't absorbed any of the objections I've raised, have you?
> I don't want the log process connected to shared mem at *all*, and see
> no particularly good reason why it should be.

Why shouldn't the process announce the logfile timestamp and its pid 
*writeonly*, so other backends know about it?

At least the pid must be distributed like this, just as bgwriter does.
I understand perfectly that postmaster and logger are very critical 
processes, so they should be dependent on as few resources as possible. 
The logger works without shmem in general, but how to reach it if its 
pid is unknown?


> The directory should definitely be a GUC variable.  The individual
> filenames should probably be of the form <prefix><timestamp>, where
> the server dictates the format of the timestamp (and we choose it so
> that the names sort correctly).  We could let the prefix be
> user-selectable or make it hard-wired; I don't have a strong feeling
> about that either way.

Agreed.

Regard,
Andreas


Re: serverlog rotation/functions

From
Bruce Momjian
Date:
Andreas Pflug wrote:
> Bruce Momjian wrote:
> 
> > 
> > This seems quite involved.  Can we get the basic functionality I
> > described first?
> 
> On the way.
> 
> > Also I am not sure how all this information is going
> > to be passed from the logging process to the backend requesting the
> > information, and it seems overly complicated.
> 
> There's *no* information passing from the logging process, with the 
> single exception of the latest logfile timestamp (if allowed). I'd 
> rather like to have that information from the logger, to be safe in case 
> the system time was manipulated and the last logfile is not the current one.
> The rest is just a reworked version of pg_dir_ls, with internal 
> knowledge of how the timestamp is formatted.

Oh, so you are hardcoding the logfile name so you can interpret the
timestamp from that?  It seems cleaner to allow the admin to specify
whatever log pattern the want.

However, you idea of expiring the log files based on timestamp values is
pretty powerful.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073