Thread: Extracting SQL from logs in a usable format

Extracting SQL from logs in a usable format

From
Chris Ernst
Date:
Hi all,

I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system.  Does such a thing already exist or should I start writing my
own log parser?

Thank you for your help.

Chris Ernst
eSoft, Inc.

Re: Extracting SQL from logs in a usable format

From
Rory Campbell-Lange
Date:
On 18/12/09, Chris Ernst (cernst@esoft.com) wrote:
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system.  Does such a thing already exist or should I start writing my
> own log parser?

I'm sure there are a number of log replay systems already in existence.
Perhaps you could use Londiste, and introduce a lag to the replication
process if it is required?
http://skytools.projects.postgresql.org/doc/londiste.ref.html#toc17

A very simple way of doing this is to log all of the SQL statements by
setting the postgresql.conf parameter log_min_duration_statement to 0.
You can then easily parse out the SQL statements from the log file.
I have done this before quite successfully when wishing to replay a long
set of SQL statements to test un upgrade of a Postgresql server.

Rory

--
Rory Campbell-Lange
Director
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928

Re: Extracting SQL from logs in a usable format

From
Chris Ernst
Date:
Rory Campbell-Lange wrote:
> On 18/12/09, Chris Ernst (cernst@esoft.com) wrote:
>> I have a project where I need to be able to capture every query from a
>> production system into a file such that I can "replay" them on a staging
>> system.  Does such a thing already exist or should I start writing my
>> own log parser?
>
> I'm sure there are a number of log replay systems already in existence.
> Perhaps you could use Londiste, and introduce a lag to the replication
> process if it is required?
> http://skytools.projects.postgresql.org/doc/londiste.ref.html#toc17
>
> A very simple way of doing this is to log all of the SQL statements by
> setting the postgresql.conf parameter log_min_duration_statement to 0.
> You can then easily parse out the SQL statements from the log file.
> I have done this before quite successfully when wishing to replay a long
> set of SQL statements to test un upgrade of a Postgresql server.

Hi Rory,

Thank you for the quick reply.

Londiste isn't really an option as it (apparently) would only get
INSERT, UPDATE and DELETE queries.  I would want to capture every query
that is run, including SELECTs.  Plus, the production master is already
running slony1, and I don't think they will play nice together.

My goal is to be able to replay a set of actual production traffic on
the staging server, starting from a snapshot at the point where the
statement logging began.  Then make some changes (tweak settings,
upgrade versions, make DDL changes, etc.) and rerun the same set of
statements to analyze the results with pgFouine.

I started writing my own log parser to pull the statements from the
postgres logs, but as I get in to the details, it's not quite as
straight forward as I had thought.  Keeping track of which client
connections have prepared queries, merging the correct arguments and
maintaining concurrency are all critical and far from trivial.

Basically I'm curious if anyone has already created something that does
this or am I treading into uncharted waters?  I've been googling around
a bit and haven't come up with anything yet.


Thanks again,

    - Chris

Re: Extracting SQL from logs in a usable format

From
Greg Smith
Date:
Chris Ernst wrote:
> I started writing my own log parser to pull the statements from the
> postgres logs, but as I get in to the details, it's not quite as
> straight forward as I had thought.  Keeping track of which client
> connections have prepared queries, merging the correct arguments and
> maintaining concurrency are all critical and far from trivial.
>
I hope you're using the CVS format logs, which should make the job a lot
easier than the standard text one.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Extracting SQL from logs in a usable format

From
Filip Rembiałkowski
Date:

Tsung [http://tsung.erlang-projects.org/] has a nice pg proxy module. It records your "query stream" and is able to replay it later.



2009/12/18 Chris Ernst <cernst@esoft.com>
Hi all,

I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system.  Does such a thing already exist or should I start writing my
own log parser?

Thank you for your help.

Chris Ernst
eSoft, Inc.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/

Re: Extracting SQL from logs in a usable format

From
Chris Ernst
Date:
Greg Smith wrote:
> Chris Ernst wrote:
>> I started writing my own log parser to pull the statements from the
>> postgres logs, but as I get in to the details, it's not quite as
>> straight forward as I had thought.  Keeping track of which client
>> connections have prepared queries, merging the correct arguments and
>> maintaining concurrency are all critical and far from trivial.
>>
> I hope you're using the CVS format logs, which should make the job a lot
> easier than the standard text one.


HOLY COW!  I hadn't even thought of that.  Excellent point.  Thank you,
Greg.

    - Chris


Re: Extracting SQL from logs in a usable format

From
Chris Ernst
Date:
Hmm.. That does look very interesting.  The only thing that concerns me
is where it says it supports "Basic Queries (Extended queries not yet
supported)".  I'm not sure what is meant by "Extended queries".  Any idea?

Thank you for the pointer, Filip.  I'll check it out.

    - Chris

Filip Rembiałkowski wrote:
>
> Tsung [http://tsung.erlang-projects.org/] has a nice pg proxy module. It
> records your "query stream" and is able to replay it later.
>
>
>
> 2009/12/18 Chris Ernst <cernst@esoft.com <mailto:cernst@esoft.com>>
>
>     Hi all,
>
>     I have a project where I need to be able to capture every query from a
>     production system into a file such that I can "replay" them on a staging
>     system.  Does such a thing already exist or should I start writing my
>     own log parser?
>
>     Thank you for your help.
>
>     Chris Ernst
>     eSoft, Inc.
>
>     --
>     Sent via pgsql-general mailing list (pgsql-general@postgresql.org
>     <mailto:pgsql-general@postgresql.org>)
>     To make changes to your subscription:
>     http://www.postgresql.org/mailpref/pgsql-general
>
>
>
>
> --
> Filip Rembiałkowski
> JID,mailto:filip.rembialkowski@gmail.com
> <mailto:filip.rembialkowski@gmail.com>
> http://filip.rembialkowski.net/


Re: Extracting SQL from logs in a usable format

From
"Albe Laurenz"
Date:
Chris Ernst wrote:
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system.  Does such a thing already exist or should I start writing my
> own log parser?

I am currently developing such a beast, it is currently still quite alpha.
If you are interested I can send you a copy.
I'll try to publish it once it is sufficiently beta.

Yours,
Laurenz Albe

Re: Extracting SQL from logs in a usable format

From
Shoaib Mir
Date:
On Sun, Dec 20, 2009 at 9:37 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Chris Ernst wrote:
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system.  Does such a thing already exist or should I start writing my
> own log parser?


I am not sure if its still available but there used to a nice tool for doing the same, I guess it was named as "Playr" by myyearbook.

Regards,
--
Shoaib Mir
http://shoaibmir.wordpress.com/

Re: Extracting SQL from logs in a usable format

From
"JGuillaume (ioguix) de Rorthais"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey guys,

Albe Laurenz a écrit :
> Chris Ernst wrote:
>> I have a project where I need to be able to capture every query
>> from a production system into a file such that I can "replay"
>> them on a staging system.  Does such a thing already exist or
>> should I start writing my own log parser?

I have the same problem. Our goal would be to capture from a 8.1 a
representative period and replay against a 8.4 to find out every
possible issues.

> I am currently developing such a beast, it is currently still quite
> alpha. If you are interested I can send you a copy. I'll try to
> publish it once it is sufficiently beta.

Interesting project, but but I have one big issue under 8.1 and
advanced query (prepare / bind / execute): we cannot extract values of
parameters from the logs with <8.2. So I am not able to parse /
rebuilt query from logs under 8.1.

I started something as well, based on tcpdump/tshark output (tshark
- -VT text ...). My project is in pre-alpha step, but at least I can
extract both simple queries and advanced queries w/ params. The only
known limitations with this approach are :
 - cannot extract from SSL connections
 - ISTM tshark only support PostgreSQL V3 protocol. So only work on
> 7.2  IIRC

> Yours, Laurenz Albe

- --
Jehan-Guillaume (ioguix) de Rorthais
www.dalibo.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkst9/EACgkQxWGfaAgowiJPNgCgia285amuwCXX2nl4/LaNSofR
N1wAnA8kuFKnP0vzAx/PCamheD/iKmNu
=ihow
-----END PGP SIGNATURE-----


Re: Extracting SQL from logs in a usable format

From
"JGuillaume (ioguix) de Rorthais"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Shoaib Mir a écrit :
> On Sun, Dec 20, 2009 at 9:37 AM, Albe Laurenz
> <laurenz.albe@wien.gv.at <mailto:laurenz.albe@wien.gv.at>> wrote:
>
> Chris Ernst wrote:
>> I have a project where I need to be able to capture every
> query from a
>> production system into a file such that I can "replay" them on
> a staging
>> system.  Does such a thing already exist or should I start
> writing my
>> own log parser?
>
>
> I am not sure if its still available but there used to a nice tool
> for doing the same, I guess it was named as "Playr" by myyearbook.

AFAIK, it's a dead project and its home at myyearbook is unavailable.
However, you can still access their download page:
https://area51.myyearbook.com/downloads/

I did find the svn repo with some googling some time ago...

I did some quick investigation on it, it seems it doesn't work with
extended queries neither.

> Regards, -- Shoaib Mir http://shoaibmir.wordpress.com/

- --
Jehan-Guillaume (ioguix) de Rorthais
www.dalibo.com

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkst+d8ACgkQxWGfaAgowiKA8gCcDhiCPC6pZCghVIuVePd0s3lo
GtMAoKFV5YldPH8QjdYGMRZ+Mq0Io/Dk
=+ANY
-----END PGP SIGNATURE-----


Re: Extracting SQL from logs in a usable format

From
hubert depesz lubaczewski
Date:
On Fri, Dec 18, 2009 at 02:08:07PM -0700, Chris Ernst wrote:
> Hi all,
>
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system.  Does such a thing already exist or should I start writing my
> own log parser?

log to CSV format.

But:
1. not always all parts of the query will be logged in query itself
(prepared statements)
2. replying queries on 2nd machine doesn't quarantee that you will get
the same data afterwards.

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: Extracting SQL from logs in a usable format

From
Craig Ringer
Date:
On 20/12/2009 7:59 PM, hubert depesz lubaczewski wrote:
> On Fri, Dec 18, 2009 at 02:08:07PM -0700, Chris Ernst wrote:
>> Hi all,
>>
>> I have a project where I need to be able to capture every query from a
>> production system into a file such that I can "replay" them on a staging
>> system.  Does such a thing already exist or should I start writing my
>> own log parser?
>
> log to CSV format.
>
> But:
> 1. not always all parts of the query will be logged in query itself
> (prepared statements)
> 2. replying queries on 2nd machine doesn't quarantee that you will get
> the same data afterwards.

... because of global settings (DATESTYLE etc) that may affect
interpretation of the data, and because the log order of statements
can't accurately represent concurrent execution.

With the same server settings, the same starting values for sequences
etc, no time-based function use, no non-deterministic function use (eg:
random()) and no non-deterministic interactions between concurrent
transactions, you should be able to get data that's the same when
examined at the SQL level. It might not be in the same order, though,
and it certainly won't be the same on-disk.

So ... why do you need this replay? What sorts of limitations can you
live with?

It sounds like concurrency is a concern, and that's one that will give
you pain, because the Pg logs don't record statement start _and_ end
time, nor do they record at what points along the execution timeline the
backend got a chance to do work. So it's hard to know about lock
acquisition order, among other things.

--
Craig Ringer

Re: Extracting SQL from logs in a usable format

From
"Albe Laurenz"
Date:
JGuillaume (ioguix) de Rorthais wrote:
>> I am currently developing such a beast, it is currently still quite
>> alpha. If you are interested I can send you a copy. I'll try to
>> publish it once it is sufficiently beta.
>
> Interesting project, but but I have one big issue under 8.1 and
> advanced query (prepare / bind / execute): we cannot extract values of
> parameters from the logs with <8.2. So I am not able to parse /
> rebuilt query from logs under 8.1.

Hmm, that doesn't bother me. 8.1 is pretty old now, and who knows when
my program will be stable :^)

> I started something as well, based on tcpdump/tshark output (tshark
> - -VT text ...). My project is in pre-alpha step, but at least I can
> extract both simple queries and advanced queries w/ params. The only
> known limitations with this approach are :
>  - cannot extract from SSL connections
>  - ISTM tshark only support PostgreSQL V3 protocol. So only work on
> > 7.2  IIRC

I guess each approach has some limitations.
The limitations I encountered for log parsing:
- COPY data are not logged.
- Fast Path API calls are not logged (that includes large object functions).
- Unless you have log_min_messages at DEBUG2 or better, you cannot
  determine when exactly a prepared statement was parsed.

Yours,
Laurenz Albe

Re: Extracting SQL from logs in a usable format

From
Dimitri Fontaine
Date:
Le 19 déc. 2009 à 16:20, Chris Ernst a écrit :

> Hmm.. That does look very interesting.  The only thing that concerns me
> is where it says it supports "Basic Queries (Extended queries not yet
> supported)".  I'm not sure what is meant by "Extended queries".  Any idea?

I think it refers to the Extended Query support in the frontend / backend protocol, as in the documentation:
  http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

  The extended query protocol breaks down the above-described simple query protocol into multiple steps. The results of
preparatorysteps can be re-used multiple times for improved efficiency. Furthermore, additional features are available,
suchas the possibility of supplying data values as separate parameters instead of having to insert them directly into a
querystring 

So that's for parse/bind/execute communications, which are used in prepare/execute and queryParam I think.
--
dim

Re: Extracting SQL from logs in a usable format

From
Chris Ernst
Date:
Dimitri Fontaine wrote:
> Le 19 déc. 2009 à 16:20, Chris Ernst a écrit :
>
>> Hmm.. That does look very interesting.  The only thing that concerns me
>> is where it says it supports "Basic Queries (Extended queries not yet
>> supported)".  I'm not sure what is meant by "Extended queries".  Any idea?
>
> I think it refers to the Extended Query support in the frontend / backend protocol, as in the documentation:
>   http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY
>
>   The extended query protocol breaks down the above-described simple query protocol into multiple steps. The results
ofpreparatory steps can be re-used multiple times for improved efficiency. Furthermore, additional features are
available,such as the possibility of supplying data values as separate parameters instead of having to insert them
directlyinto a query string 
>
> So that's for parse/bind/execute communications, which are used in prepare/execute and queryParam I think.

Ouch!  You're right.  And that's would be a deal killer for me.  About
90% of the traffic is prepared queries that are run over and over with
different parameters.

    - Chris

Re: Extracting SQL from logs in a usable format

From
Dimitri Fontaine
Date:
Le 21 déc. 2009 à 15:24, Chris Ernst a écrit :
> Ouch!  You're right.  And that's would be a deal killer for me.  About
> 90% of the traffic is prepared queries that are run over and over with
> different parameters.

The driver project and code are now there it seems:
  http://frihjul.net/pgsql
  http://github.com/noss/pgsql/tree

Maybe you could ask the author about supporting the extended protocol, a quick browsing tonight shows me
prepare/executesupport. I'm sure if improvements in the pgsql driver would translate to improvements in the tsung
supportof it, but it should be about it. 

Regards,
--
dim