Thread: Re: [PATCHES] serverlog rotation/functions
OK, I talked to Tom about this patch and I understand the issues now. I think the best solution will be to have the postmaster start a child process that can read the guc variables and create a log file based on it contents. The child would be responsible to create a new log file every X seconds as specified in the postgresql.conf file. Backends wishing to read the log file would call a function to list the contents of a directory. We can do this by creating a generic backend super-user-only function that can list the contents of any directory. Then you would use an API to read a specific file, similar to the log reading functions you already have. In fact, you could set up the API so reading a directory would return a list of directory names so you don't need a separate directory listing function. (Does your API return file contents as one big text string or as lines. If you return it as one big text string, the directory idea will not work and you will need a separate function.) So, we have: o use pipe and dup to copy stdout and stderr to your postmaster childo new guc variables to specify log destinationand rotation timeso a server-side function to force a log rotateo API to list a directory and show file contents With this functionality, you can have clients listing the log directory and choosing the most recent log file or previous ones. You could even configure it to remove old log files. Both Tom and I believe this is a more generic and reliable solution. --------------------------------------------------------------------------- Andreas Pflug wrote: > Bruce Momjian wrote: > > > > Also there are no documenttion changes. > > Here are the missing docs, freshly created against cvs. > > Regards, > Andreas > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian wrote: > OK, I talked to Tom about this patch and I understand the issues now. > > I think the best solution will be to have the postmaster start a child > process that can read the guc variables and create a log file based on > it contents. The child would be responsible to create a new log file > every X seconds as specified in the postgresql.conf file. > OK, next round. > Backends wishing to read the log file would call a function to list the > contents of a directory. We can do this by creating a generic backend > super-user-only function that can list the contents of any directory. If logfiles are located using directory listings, this would mean we should have well-known filename patterns instead of the current template stuff. So only the serverlog directory name would be selectable, if that makes still sense (I'd opt to drop that). > Then you would use an API to read a specific file, similar to the log > reading functions you already have. In fact, you could set up the API > so reading a directory would return a list of directory names so you > don't need a separate directory listing function. (Does your API return > file contents as one big text string or as lines. If you return it as > one big text string, the directory idea will not work and you will need > a separate function.) We'd better have distinct functions. The current is for reading large chunks of data without interpretation of contents, and dir listing might need tweaking for win32. Now I wonder if it's better to have generic superuser-only functions for list/read/write/delete (this would enable client-side editing of postgresql.conf), or tie the functions to logfiles only (in which case the logfile functions shouldn't use filenames but logfile timestamps). I still believe the log process should write the current logfile filename to shared-mem (write-only), so backends can prevent deleting it. > > So, we have: > > o use pipe and dup to copy stdout and stderr to your > postmaster child > o new guc variables to specify log destination and rotation times > o a server-side function to force a log rotate > o API to list a directory and show file contents I'll start on the log process. Regards, Andreas
Andreas Pflug wrote: > Bruce Momjian wrote: > > OK, I talked to Tom about this patch and I understand the issues now. > > > > I think the best solution will be to have the postmaster start a child > > process that can read the guc variables and create a log file based on > > it contents. The child would be responsible to create a new log file > > every X seconds as specified in the postgresql.conf file. > > > > OK, next round. > > > Backends wishing to read the log file would call a function to list the > > contents of a directory. We can do this by creating a generic backend > > super-user-only function that can list the contents of any directory. > > If logfiles are located using directory listings, this would mean we > should have well-known filename patterns instead of the current template > stuff. So only the serverlog directory name would be selectable, if that > makes still sense (I'd opt to drop that). I don't see any reason to have a pattern though I suppose if you mix pgsql log files in with other log files it might be a problem. One idea would be for the client-side program to do some processing like this: SELECT *FROM dir_listing('/var/log') AS dirWHERE dir LIKE 'pgsql_%' or something like that where the client pulls apart the directory specifiction like this: log_dest = '/var/log/postgresql.log.%Y-%m-%d_%H%M%S' You do something that splits the value into directory name and file name and removes every letter after %. /var/logpostgresql.log.%-%-%_%%% Another idea is to allow filename wildcards in the listing so it becomes: SELECT *FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir While that is nice, it doesn't match the functionality of opendir so we are perhaps better with one that doesn't handle wildcards and we just do the wildcard processing in the WHERE clause. > > Then you would use an API to read a specific file, similar to the log > > reading functions you already have. In fact, you could set up the API > > so reading a directory would return a list of directory names so you > > don't need a separate directory listing function. (Does your API return > > file contents as one big text string or as lines. If you return it as > > one big text string, the directory idea will not work and you will need > > a separate function.) > > We'd better have distinct functions. The current is for reading large > chunks of data without interpretation of contents, and dir listing might > need tweaking for win32. OK. > Now I wonder if it's better to have generic superuser-only functions for > list/read/write/delete (this would enable client-side editing of > postgresql.conf), or tie the functions to logfiles only (in which case > the logfile functions shouldn't use filenames but logfile timestamps). Agreed, that was one of my goals. Right now the superuser can read any file with COPY (read it as text lines into a text table). Allowing directory listings is a natural extension. > I still believe the log process should write the current logfile > filename to shared-mem (write-only), so backends can prevent deleting it. Once we do this there will not be any backend writing to those files. (We will need the log subprocess pid in shared memory so backends can send signals to it.) I am not sure how we will do file deletes but I think each backend will have to do the delete itself rather than try to pass the file name to the log process. I think we will have to assume the log file names increase in ordering so we know which one is the current one. I can't think if a cleaner way to communicate this to the backends except perhaps as you suggest as shared memory areas that contains the name, but we will need a lock so the backends don't read it while it is changing. That would be a nice feature. > > So, we have: > > > > o use pipe and dup to copy stdout and stderr to your > > postmaster child > > o new guc variables to specify log destination and rotation times > > o a server-side function to force a log rotate > > o API to list a directory and show file contents > > I'll start on the log process. Sorry to be dumping more work on you but I think this is a better directory to go for this feature. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian wrote: > > I don't see any reason to have a pattern though I suppose if you mix > pgsql log files in with other log files it might be a problem. One idea > would be for the client-side program to do some processing like this: > > SELECT * > FROM dir_listing('/var/log') AS dir > WHERE dir LIKE 'pgsql_%' > > or something like that where the client pulls apart the directory > specifiction like this: > > log_dest = '/var/log/postgresql.log.%Y-%m-%d_%H%M%S' > > You do something that splits the value into directory name and file name > and removes every letter after %. > > /var/log > postgresql.log.%-%-%_%%% > > Another idea is to allow filename wildcards in the listing so it > becomes: > > SELECT * > FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir > > While that is nice, it doesn't match the functionality of opendir so we > are perhaps better with one that doesn't handle wildcards and we just do > the wildcard processing in the WHERE clause. Uh, this looks ugly. How about pg_logfile_list() RETURNS setof timestamp -- to list available logfiles pg_logfile_filename(timestamp) to return filename for that logfile and generic pg_dir(wildcard_text) pg_file_length(filename_text) pg_file_read(filename_text, offs, len) pg_file_write(filename_text, contents_text) pg_file_delete(filename) I'd like to have the logfile api straigt forward. I finally would like all server logging to go into a non-configurable DataDir subdirectory pg_log with filenames mangled by internal rules. Maybe it's a good idea to have the pid in the filename too, to detect postmaster restarts. > > Once we do this there will not be any backend writing to those files. Of course not, only the logging subprocess may write. > (We will need the log subprocess pid in shared memory so backends can > send signals to it.) Yes, because the inherited SysLoggerPID might become invalid in case the logger process crashes and is recreated. I am not sure how we will do file deletes but I > think each backend will have to do the delete itself rather than try to > pass the file name to the log process. Agreed. I think we will have to assume > the log file names increase in ordering so we know which one is the > current one. I can't think if a cleaner way to communicate this to the > backends except perhaps as you suggest as shared memory areas that > contains the name, but we will need a lock so the backends don't read it > while it is changing. That would be a nice feature. We can omit this, if we supply only a native function. In that case, it's up to the admin not to shoot himself into the foot. > > Sorry to be dumping more work on you but I think this is a better > directory to go for this feature. Well that's the pain with you cvs committers guys :-) Regards, Andreas
Andreas Pflug wrote: > > You do something that splits the value into directory name and file name > > and removes every letter after %. > > > > /var/log > > postgresql.log.%-%-%_%%% > > > > Another idea is to allow filename wildcards in the listing so it > > becomes: > > > > SELECT * > > FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir > > > > While that is nice, it doesn't match the functionality of opendir so we > > are perhaps better with one that doesn't handle wildcards and we just do > > the wildcard processing in the WHERE clause. > > Uh, this looks ugly. > > How about > pg_logfile_list() RETURNS setof timestamp -- to list available logfiles > pg_logfile_filename(timestamp) to return filename for that logfile I don't see the need to return timestamps. If you select any empty directory, you can just return the file names. The only reason you might need a pattern is to distinguish pg log files from other log files. If you want, create a server-side function that returns the file name with the strftime() patterns converted to '*'. > and generic > pg_dir(wildcard_text) Maybe pg_dir_ls(). OK, it would be nice if we could do a sed operation like: sed 's/%./*/g' but I don't know a way to do that without defining a function or pulling in a procedural language, but if we could do it we could do: pg_dir(echo log_destination | sed 's/%./*/g') I think we will need a server-side function predefined to do this conversion and we will call it: pg_dir(glob_log_filename(log_destination)) > pg_file_length(filename_text) > pg_file_read(filename_text, offs, len) > pg_file_write(filename_text, contents_text) > pg_file_delete(filename) _delete should be _unlink. > I'd like to have the logfile api straigt forward. I finally would like > all server logging to go into a non-configurable DataDir subdirectory > pg_log with filenames mangled by internal rules. Maybe it's a good idea > to have the pid in the filename too, to detect postmaster restarts. This will not work. What if there isn't sufficient room or administrators want the log files somewher else? I don't see the value in hardcoding the location. log_destination tells us enough. > > Once we do this there will not be any backend writing to those files. > > Of course not, only the logging subprocess may write. > > > (We will need the log subprocess pid in shared memory so backends can > > send signals to it.) > > Yes, because the inherited SysLoggerPID might become invalid in case the > logger process crashes and is recreated. OK. > I am not sure how we will do file deletes but I > > think each backend will have to do the delete itself rather than try to > > pass the file name to the log process. > > Agreed. > > I think we will have to assume > > the log file names increase in ordering so we know which one is the > > current one. I can't think if a cleaner way to communicate this to the > > backends except perhaps as you suggest as shared memory areas that > > contains the name, but we will need a lock so the backends don't read it > > while it is changing. That would be a nice feature. > > We can omit this, if we supply only a native function. In that case, > it's up to the admin not to shoot himself into the foot. Yes, we can improve this in 7.6 if we want. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian wrote: > Andreas Pflug wrote: > >>>You do something that splits the value into directory name and file name >>>and removes every letter after %. >>> >>> /var/log >>> postgresql.log.%-%-%_%%% >>> >>>Another idea is to allow filename wildcards in the listing so it >>>becomes: >>> >>> SELECT * >>> FROM dir_listing('/var/log/postgresql.log.*-*-*_***') AS dir >>> >>>While that is nice, it doesn't match the functionality of opendir so we >>>are perhaps better with one that doesn't handle wildcards and we just do >>>the wildcard processing in the WHERE clause. >> >>Uh, this looks ugly. >> >>How about >>pg_logfile_list() RETURNS setof timestamp -- to list available logfiles >>pg_logfile_filename(timestamp) to return filename for that logfile > > > I don't see the need to return timestamps. If you select any empty > directory, you can just return the file names. The only reason you > might need a pattern is to distinguish pg log files from other log > files. If you want, create a server-side function that returns the file > name with the strftime() patterns converted to '*'. > > >>and generic >>pg_dir(wildcard_text) > > > Maybe pg_dir_ls(). > > OK, it would be nice if we could do a sed operation like: > > sed 's/%./*/g' > > but I don't know a way to do that without defining a function or pulling > in a procedural language, but if we could do it we could do: > > pg_dir(echo log_destination | sed 's/%./*/g') > Argggg.... ever used sed on win32?!? And how should the timestamp be represented in client tools? Date/time interpretation is always a source of problems, so *please* let the server do that. Rethinking all this, I'd like the pg_logfile_list to return a complex type: CREATE TYPE pg_logfile_list AS (filedate timestamp,filename text,backendpid int,inuse bool) and pg_logfile_list() RETURNS SETOF pg_logfile_list which would enable SELECT filename, pg_file_unlink(filename) FROM pg_logfile_list() WHERE filedate < current_timestamp - '3 months'::interval AND NOT inuse In-use check is easy for the backend, if the syslog process publishes the current logfile's timestamp in sharedmem. We can use a GUC variable for the log_directory (not log_destination); anyway, I'd like the filenames to be selected by the server. Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > In-use check is easy for the backend, if the syslog process publishes > the current logfile's timestamp in sharedmem. You really haven't absorbed any of the objections I've raised, have you? I don't want the log process connected to shared mem at *all*, and see no particularly good reason why it should be. > We can use a GUC variable for the log_directory (not log_destination); > anyway, I'd like the filenames to be selected by the server. The directory should definitely be a GUC variable. The individual filenames should probably be of the form <prefix><timestamp>, where the server dictates the format of the timestamp (and we choose it so that the names sort correctly). We could let the prefix be user-selectable or make it hard-wired; I don't have a strong feeling about that either way. regards, tom lane
Andreas Pflug wrote: > > OK, it would be nice if we could do a sed operation like: > > > > sed 's/%./*/g' > > > > but I don't know a way to do that without defining a function or pulling > > in a procedural language, but if we could do it we could do: > > > > pg_dir(echo log_destination | sed 's/%./*/g') > > > > Argggg.... ever used sed on win32?!? And how should the timestamp be > represented in client tools? Date/time interpretation is always a source > of problems, so *please* let the server do that. I am thinking of these all being server-side functions. > Rethinking all this, I'd like the pg_logfile_list to return a complex type: > > CREATE TYPE pg_logfile_list AS ( > filedate timestamp, > filename text, > backendpid int, > inuse bool) > > and > > pg_logfile_list() RETURNS SETOF pg_logfile_list > > which would enable > > SELECT filename, > pg_file_unlink(filename) > FROM pg_logfile_list() > WHERE filedate < current_timestamp - '3 months'::interval > AND NOT inuse > > In-use check is easy for the backend, if the syslog process publishes > the current logfile's timestamp in sharedmem. > > We can use a GUC variable for the log_directory (not log_destination); > anyway, I'd like the filenames to be selected by the server. This seems quite involved. Can we get the basic functionality I described first? Also I am not sure how all this information is going to be passed from the logging process to the backend requesting the information, and it seems overly complicated. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian wrote: > > This seems quite involved. Can we get the basic functionality I > described first? On the way. > Also I am not sure how all this information is going > to be passed from the logging process to the backend requesting the > information, and it seems overly complicated. There's *no* information passing from the logging process, with the single exception of the latest logfile timestamp (if allowed). I'd rather like to have that information from the logger, to be safe in case the system time was manipulated and the last logfile is not the current one. The rest is just a reworked version of pg_dir_ls, with internal knowledge of how the timestamp is formatted. Regards, Andreas
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >>In-use check is easy for the backend, if the syslog process publishes >>the current logfile's timestamp in sharedmem. > > > You really haven't absorbed any of the objections I've raised, have you? > I don't want the log process connected to shared mem at *all*, and see > no particularly good reason why it should be. Why shouldn't the process announce the logfile timestamp and its pid *writeonly*, so other backends know about it? At least the pid must be distributed like this, just as bgwriter does. I understand perfectly that postmaster and logger are very critical processes, so they should be dependent on as few resources as possible. The logger works without shmem in general, but how to reach it if its pid is unknown? > The directory should definitely be a GUC variable. The individual > filenames should probably be of the form <prefix><timestamp>, where > the server dictates the format of the timestamp (and we choose it so > that the names sort correctly). We could let the prefix be > user-selectable or make it hard-wired; I don't have a strong feeling > about that either way. Agreed. Regard, Andreas
Andreas Pflug wrote: > Bruce Momjian wrote: > > > > > This seems quite involved. Can we get the basic functionality I > > described first? > > On the way. > > > Also I am not sure how all this information is going > > to be passed from the logging process to the backend requesting the > > information, and it seems overly complicated. > > There's *no* information passing from the logging process, with the > single exception of the latest logfile timestamp (if allowed). I'd > rather like to have that information from the logger, to be safe in case > the system time was manipulated and the last logfile is not the current one. > The rest is just a reworked version of pg_dir_ls, with internal > knowledge of how the timestamp is formatted. Oh, so you are hardcoding the logfile name so you can interpret the timestamp from that? It seems cleaner to allow the admin to specify whatever log pattern the want. However, you idea of expiring the log files based on timestamp values is pretty powerful. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073