Thread: logger table

logger table

From
Philipp Kraus
Date:
Hello,

I need some ideas for creating a PG-based logger. I have a job which can run more than once, so the PK is at the moment jobid & cycle number.
The inserts into this table happen in parallel, with the same username from different hosts (clustering). The user calls the executable "myprint" and the message is inserted into this table, but at the moment I don't know a good structure for the table. Each print call can be a different length, so I think a text field is a good choice, but I don't know how to create a good PK value.
IMHO a sequence can create problems because I'm logged in with the same user on multiple hosts, and a hash key like SHA1 based on the content is not a good choice either, because the content is not unique, so I can get key collisions.
I would like each "print" call to create its own record in the table, but how can I create a good key value without problems under parallel access? I think there can be more than 1000 inserts each second.

Can anybody post a good idea?

Thanks

Phil
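
A minimal sketch of the kind of table being described, assuming a plain bigserial surrogate key next to jobid and cycle (the table and column names are illustrative, not taken from the post):

-- Hypothetical layout for the logger table described above;
-- "id" is a bigserial surrogate key, so concurrent sessions on
-- different hosts never receive the same value.
CREATE TABLE job_log (
    id        bigserial   PRIMARY KEY,
    jobid     bigint      NOT NULL,
    cycle     integer     NOT NULL,
    host      text        DEFAULT inet_client_addr()::text,  -- NULL for local socket connections
    message   text        NOT NULL,
    logged_at timestamptz NOT NULL DEFAULT now()
);

-- Lookups per job and cycle stay fast with a secondary index.
CREATE INDEX job_log_jobid_cycle_idx ON job_log (jobid, cycle);

With such a layout, (jobid, cycle) remains an ordinary indexed attribute pair rather than the primary key, and key generation is left entirely to the sequence behind the id column.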

Re: logger table

From
Alejandro Carrillo
Date:
Have you tried pg_audit?
https://github.com/jcasanov/pg_audit


From: Philipp Kraus <philipp.kraus@flashpixx.de>
To: pgsql-general@postgresql.org
Sent: Sunday, December 23, 2012, 22:01
Subject: [GENERAL] logger table

Hello,

I need some ideas for creating a PG-based logger. I have a job which can run more than once, so the PK is at the moment jobid & cycle number.
The inserts into this table happen in parallel, with the same username from different hosts (clustering). The user calls the executable "myprint" and the message is inserted into this table, but at the moment I don't know a good structure for the table. Each print call can be a different length, so I think a text field is a good choice, but I don't know how to create a good PK value.
IMHO a sequence can create problems because I'm logged in with the same user on multiple hosts, and a hash key like SHA1 based on the content is not a good choice either, because the content is not unique, so I can get key collisions.
I would like each "print" call to create its own record in the table, but how can I create a good key value without problems under parallel access? I think there can be more than 1000 inserts each second.

Can anybody post a good idea?

Thanks

Phil



Re: logger table

From
John R Pierce
Date:
On 12/23/2012 7:01 PM, Philipp Kraus wrote:
> I don't know how to create a good PK value. IMHO a sequence can create problems because I'm logged in with the
> same user on multiple hosts,

Why is that a reason? Sequences work no matter how many clients there
are.
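
To illustrate the point, assuming a table like the hypothetical job_log sketched above: every session, on whatever host, gets a distinct id from the sequence behind the bigserial column.

INSERT INTO job_log (jobid, cycle, message)
VALUES (42, 0, 'started iteration')   -- placeholder values
RETURNING id;                         -- distinct per insert, even when run concurrently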

> PK is at the moment jobid & cycle number.

How is this jobid assigned? Is this something external? How are you
keeping track of the cycle number for a given job?





Re: logger table

From
Jason Dusek
Date:
2012/12/24 Philipp Kraus <philipp.kraus@flashpixx.de>:
> I need some ideas for creating a PG-based logger. I have a
> job which can run more than once, so the PK is at the
> moment jobid & cycle number. The inserts into this table
> happen in parallel, with the same username from different
> hosts (clustering). The user calls the executable "myprint"
> and the message is inserted into this table, but at the
> moment I don't know a good structure for the table. Each
> print call can be a different length, so I think a text
> field is a good choice, but I don't know how to create a
> good PK value. IMHO a sequence can create problems because
> I'm logged in with the same user on multiple hosts, and a
> hash key like SHA1 based on the content is not a good
> choice either, because the content is not unique, so I can
> get key collisions. I would like each "print" call to
> create its own record in the table, but how can I create a
> good key value without problems under parallel access? I
> think there can be more than 1000 inserts each second.
>
> Can anybody post a good idea?

Why is it necessary to have a primary key? What is the "cycle
number"?

For what it is worth, I put all my syslog in PG and have so far
been fine without primary keys. (I keep only an hour there at a
time, though, and it's only a few hundred megs.)
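
One way to implement that kind of hourly retention, assuming a timestamp column such as the logged_at sketched earlier, is a periodic delete:

-- Run periodically (e.g. from cron) to keep roughly one hour of log rows.
DELETE FROM job_log
WHERE logged_at < now() - interval '1 hour';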

In the past, I have had trouble maintaining a high TPS while
having lots (hundreds) of connected clients; maybe you'll want
to use a connection pool.

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B


Re: logger table

From
Philipp Kraus
Date:
On 25.12.2012 17:19, Jason Dusek wrote:
> 2012/12/24 Philipp Kraus <philipp.kraus@flashpixx.de>:
>> I need some ideas for creating a PG-based logger. I have a
>> job which can run more than once, so the PK is at the
>> moment jobid & cycle number. The inserts into this table
>> happen in parallel, with the same username from different
>> hosts (clustering). The user calls the executable "myprint"
>> and the message is inserted into this table, but at the
>> moment I don't know a good structure for the table. Each
>> print call can be a different length, so I think a text
>> field is a good choice, but I don't know how to create a
>> good PK value. IMHO a sequence can create problems because
>> I'm logged in with the same user on multiple hosts, and a
>> hash key like SHA1 based on the content is not a good
>> choice either, because the content is not unique, so I can
>> get key collisions. I would like each "print" call to
>> create its own record in the table, but how can I create a
>> good key value without problems under parallel access? I
>> think there can be more than 1000 inserts each second.
>>
>> Can anybody post a good idea?
>
> Why is it necessary to have a primary key? What is the "cycle
> number"?

The cycle number is an incrementing number, starting at 0 and running up to cycle-1.


> For what it is worth, I put all my syslog in PG and have so far
> been fine without primary keys. (I keep only an hour there at a
> time, though, and it's only a few hundred megs.)
>
> In the past, I have had trouble maintaining a high TPS while
> having lots (hundreds) of connected clients; maybe you'll want
> to use a connection pool.

I use a connection pool at the moment. I have an MPI process:

for (std::size_t i = 0; i < cycle; ++i)
    for (std::size_t n = 0; n < iterations; ++n)
    {
        // ...
        log_to_pg_table(i, "log message");   // one insert per print call
        // ...
        mpi::barrier();                      // all clients synchronize after each iteration
    }

so the clients are synchronized on each inner loop. The primary key is
also an ordering number: a message that is pushed earlier gets a lower
index than a message that is pushed later to the table. So with a
primary key I can say that only the messages within an iteration are
unordered.
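
Following that reasoning, and still using the illustrative job_log layout from above, the insert that a hypothetical log_to_pg_table(i, "log message") would issue and a readback that relies on the sequence for ordering could look roughly like this:

-- Per print call: one parameterized insert
-- ($1 = jobid, $2 = cycle number i, $3 = message text).
INSERT INTO job_log (jobid, cycle, message)
VALUES ($1, $2, $3);

-- Readback: the bigserial id reflects the order in which rows were
-- inserted, so with the barrier above, messages from earlier iterations
-- get lower ids; only rows written in parallel within one iteration are
-- in arbitrary relative order.
SELECT id, cycle, message
FROM job_log
WHERE jobid = $1
ORDER BY id;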