Re: performance for high-volume log insertion - Mailing list pgsql-performance

From david@lang.hm
Subject Re: performance for high-volume log insertion
Msg-id alpine.DEB.1.10.0904210948070.12662@asgard.lang.hm
In response to Re: performance for high-volume log insertion  (Stephen Frost <sfrost@snowman.net>)
Responses Re: performance for high-volume log insertion  (Stephen Frost <sfrost@snowman.net>)
List pgsql-performance
On Tue, 21 Apr 2009, Stephen Frost wrote:

> * Ben Chobot (bench@silentmedia.com) wrote:
>> On Mon, 20 Apr 2009, david@lang.hm wrote:
>>> one huge advantage of putting the sql into the configuration is the
>>> ability to work around other users of the database.
>>
>> +1 on this. We've always found tools much easier to work with when they
>> could be adapted to our schema, as opposed to changing our process for
>> the tool.
>
> I think we're all in agreement that we should allow the user to define
> their schema and support loading the data into it.  The question has
> been if the user really needs the flexibility to define arbitrary SQL to
> be used to do the inserts.
>
> Something I'm a bit confused about, still, is if this is really even a
> problem.  It sounds like rsyslog already allows arbitrary SQL in the
> config file with some kind of escape syntax for the variables.  Why not
> just keep that, but split it into a prepared query (where you change the
> variables to $NUM vars for the prepared statement) and an array of
> values (to pass to PQexecPrepared)?
>
> If you already know how to figure out what the variables in the
> arbitrary SQL statement are, this shouldn't be any more limiting than
> today, except where a prepared query can't have a variable argument but
> a non-prepared query can (eg, table name).  You could deal with that
> with some kind of configuration variable that lets the user choose to
> use prepared queries or not though, or some additional syntax that
> indicates certain variables shouldn't be translated to $NUM vars (eg:
> $*blah instead of $blah).

I think the key thing is that rsyslog today doesn't know anything about
SQL variables; it just builds a string that the user and the database
agree looks like a SQL statement.

An added headache is that the rsyslog config does not have the concept of
arrays (the closest it has is one special-case hack that lets you
specify one variable multiple times).

If the performance win from prepared statements is significant, then it's
probably worth the complication of changing these things.
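
To make the split Stephen describes concrete, here is a rough sketch of
what the translation step might look like. This is not rsyslog code, and
it is in Python only for brevity. The $name / $*name placeholder syntax is
the hypothetical one from his message, and split_template, bind_values,
and the "logs" table are names made up for illustration.

```python
import re

# Hypothetical placeholder syntax, taken from the quoted suggestion:
#   $name   becomes a numbered parameter ($1, $2, ...)
#   $*name  is substituted literally into the SQL text (e.g. a table name)
_PLACEHOLDER = re.compile(r"\$(\*?)([A-Za-z_][A-Za-z0-9_]*)")


def split_template(template, literals):
    """Return (sql, param_names) for a user-supplied template string."""
    param_names = []

    def replace(match):
        star, name = match.groups()
        if star:
            # Literal substitution: the caller must make sure this value
            # is a safe identifier, since it is not sent as a parameter.
            return literals[name]
        if name not in param_names:
            param_names.append(name)
        # A variable used more than once reuses its original number.
        return "$%d" % (param_names.index(name) + 1)

    return _PLACEHOLDER.sub(replace, template), param_names


def bind_values(param_names, record):
    """Build the ordered value array that goes with the prepared query."""
    return [record[name] for name in param_names]


template = "INSERT INTO $*table (host, msg, ts) VALUES ($host, $msg, $ts)"
sql, names = split_template(template, {"table": "logs"})
print(sql)    # INSERT INTO logs (host, msg, ts) VALUES ($1, $2, $3)
print(names)  # ['host', 'msg', 'ts']
```

The resulting sql string is what would be prepared once, and the list
from bind_values is what would be handed to something like PQexecPrepared
for each log record. Doing the translation when the config is loaded would
keep the per-message work down to building the value array.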

David Lang
