Thread: WIP: Transportable Optimizer Mode

WIP: Transportable Optimizer Mode

From
Simon Riggs
Date:
I've written a contrib module that allows planning information to be
extracted from one server and transported to another server to allow SQL
analysis: Transportable Optimizer Mode (TOM).

TOM works, yet is incomplete because of the lack of a stats hook within
the server. So TOM is the application which would seek to use the recent
patch requesting stats hooks in Postgres.

I'm posting this as background to later patches that will
* re-introduce a stats hook
* complete TOM so that it works fully with the stats hook

That way we can all see where I'm going with this, and why, as well as
providing a test harness for the stats hook patches.

The main purpose of all of this is to literally "get data for Tom" -
enhance the ability of experts to diagnose SQL query plans without
requiring access to the data on remote servers. I see this as an
enabling tool that will provide more information about optimizer issues
and opportunities.

Comments welcome.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: WIP: Transportable Optimizer Mode

From
chris
Date:
simon@2ndquadrant.com (Simon Riggs) writes:
> Comments welcome.

A minor change: I'd suggest creating these views as TEMP views, that
way they go away automatically.

diff -r1.1 create_tom_tables.sql
===================================================================
RCS file: get_data_for_tom.sql,v
retrieving revision 1.1
diff -r1.1 get_data_for_tom.sql
8,9c8
< DROP VIEW IF EXISTS tom_get_stats;
< CREATE OR REPLACE VIEW tom_get_stats AS
---
> CREATE OR REPLACE TEMP VIEW tom_get_stats AS
54,55c53
< DROP VIEW IF EXISTS tom_get_pg_class;
< CREATE OR REPLACE VIEW tom_get_pg_class AS
---
> CREATE OR REPLACE TEMP VIEW tom_get_pg_class AS
71,73c69,71
< ORDER BY n.nspname
< ;
< \copy (select * from get_tom_pg_class) TO 'tom_pg_class.data'
---
> ORDER BY n.nspname;
>
> \copy (select * from tom_get_pg_class) TO 'tom_pg_class.data'
===================================================================

Note also there's a table name fix in there.

Here are some patches to the README file:

===================================================================
RCS file: README,v
retrieving revision 1.1
diff -u -r1.1 README
--- README    2008/07/30 18:15:20    1.1
+++ README    2008/07/30 18:21:29
@@ -52,27 +52,31 @@
 Actions on Target database:
 ===========================

-5. Create user tables (and load data if required)
+5  Determine the name of the database and schema you intend to use for TOM

-    e.g.
-    psql -f user_tables.sql
+       TOMSCHEMA=tom
+       TESTDB=testtomdb
+       export TOMSCHEMA TESTDB

-6. Create TOM tables
+6. Create user tables (and load data if required)

-    psql -c "create schema tom;"
+    e.g.
+    createdb ${TESTDB}
+    psql -f user_tables.sql -d ${TESTDB}

-    psql -f create_tom_tables.sql --single-transaction -v TOM_SCHEMA='tom'
+6. Create TOM tables

-    Use the current schema
+    psql -c "create schema ${TOMSCHEMA};" -d ${TESTDB}
+    psql -f create_tom_tables.sql --single-transaction -v TOM_SCHEMA='${TOMSCHEMA}' -d ${TESTDB}

 7. Load TOM data into target database

     e.g.
-    psql -f load_data_for_tom.sql --single-transaction  -v TOM_SCHEMA='tom'
+    psql -f load_data_for_tom.sql --single-transaction  -v TOM_SCHEMA='${TOMSCHEMA}' -d ${TESTDB}

 8. Create mapping between target catalog and source stats

-    psql -f get_mapping_for_tom.sql -v TOM_SCHEMA=tom
+    psql -f get_mapping_for_tom.sql -v TOM_SCHEMA=${TOMSCHEMA} -d ${TESTDB}

     We need to be able to match up objects between source and target. This
     script matches up objects that have matching schema and tablenames. If
@@ -80,7 +84,7 @@

 9. Setup TOM plugin

-    LOAD '$libdir/plugins/tom';
+    psql -c "LOAD '\$libdir/tom_hooks';" -d ${TESTDB}

 10. EXPLAIN your queries
-----------------------------

Note, particularly, the change to the plugin name.

As a further inclusion, here's a test script that I wrote up to automate the actions:
--------------------
#!/bin/sh
# $Id$
source ~/dbs/pgsql83.sh env    # A cbbrowne-ism to set up access to my favorite DB cluster...

# These all provide overridable defaults
SOURCEDB=${SOURCEDB:-"postgres"}         # which database does the data come from
SOURCESCHEMA=${SOURCESCHEMA:-"public"}   # which schema is in use (e.g. - what schema to pg_dump)
TESTDB=${TESTDB:-"tomtestdb"}            # target database for TOM test
TOMSCHEMA=${TOMSCHEMA:-"tom"}            # schema to use for TOM tables

psql -d ${SOURCEDB} -c "analyze;"
psql -d ${SOURCEDB} -f get_data_for_tom.sql

dropdb ${TESTDB}
createdb ${TESTDB}
psql -d ${TESTDB} -c "create schema ${TOMSCHEMA};"

pg_dump -n ${SOURCESCHEMA} -s ${SOURCEDB} | psql -d ${TESTDB}

psql -d ${TESTDB} -f create_tom_tables.sql --single-transaction -v TOM_SCHEMA=${TOMSCHEMA}
psql -d ${TESTDB} -f load_data_for_tom.sql --single-transaction -v TOM_SCHEMA=${TOMSCHEMA}
psql -d ${TESTDB} -f get_mapping_for_tom.sql -v TOM_SCHEMA=${TOMSCHEMA}
psql -d ${TESTDB} -c "load '\$libdir/tom_hooks';"
--------------------

It's not clear to me that the plugin is actually working.

When I run EXPLAIN against tables in "tomtestdb", I get results
consistent with an unanalyzed table.  So possibly the "hook" isn't
being used.  Perhaps I'm using it wrongly; perhaps what I have
documented above may suggest to you what's broken.

Mind you, the logs *are* suggesting that they are using the plugin:

LOG:  plugin_get_relation_info relationObjectId = 30026
STATEMENT:  explain analyze select * from test_table ;

That is the OID for test_table.  But here's what I see:

tomtestdb=# explain select * from public.test_table ;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..63.00 rows=1 width=104)
(1 row)

tomtestdb=# \c postgres

That seems consistent with an empty table.  I switch to (on the same
backend) the "source" DB:

You are now connected to database "postgres".
postgres=# explain select * from public.test_table ;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..124.62 rows=6162 width=60)
(1 row)

--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules  of the  Evil Overlord  #145. "My  dungeon cell  decor  will not
feature exposed pipes.  While they add to the  gloomy atmosphere, they
are good  conductors of vibrations and  a lot of  prisoners know Morse
code." <http://www.eviloverlord.com/>

Re: WIP: Transportable Optimizer Mode

From
Simon Riggs
Date:
Chris,

Thanks for all of those changes... added as suggested (in next version)

On Wed, 2008-07-30 at 14:58 -0400, chris wrote:

> It's not clear to me that the plugin is actually working.
>
> When I run EXPLAIN against tables in "tomtestdb", I get results
> consistent with an unanalyzed table.  So possibly the "hook" isn't
> being used.  Perhaps I'm using it wrongly; perhaps what I have
> documented above may suggest to you what's broken.

The plugin works, but it currently extracts 0 for number of tuples. So
you need to override that to give a different answer.

I was going to add an SQL function to estimate the number of tuples in
the same way as the optimizer does. That way we get the same answer from
the EXPLAIN as we would have got on the main server and we don't need to
run select count(*) against each table (unless we want to).

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: WIP: Transportable Optimizer Mode

From
Simon Riggs
Date:
On Sat, 2008-08-02 at 20:12 +0100, Simon Riggs wrote:
> Chris,
>
> Thanks for all of those changes... added as suggested (in next version)
>
> On Wed, 2008-07-30 at 14:58 -0400, chris wrote:
>
> > It's not clear to me that the plugin is actually working.
> >
> > When I run EXPLAIN against tables in "tomtestdb", I get results
> > consistent with an unanalyzed table.  So possibly the "hook" isn't
> > being used.  Perhaps I'm using it wrongly; perhaps what I have
> > documented above may suggest to you what's broken.
>
> The plugin works, but it currently extracts 0 for number of tuples. So
> you need to override that to give a different answer.
>
> I was going to add an SQL function to estimate the number of tuples in
> the same way as the optimizer does. That way we get the same answer from
> the EXPLAIN as we would have got on the main server and we don't need to
> run select count(*) against each table (unless we want to).

New version submitted together with stats hook patch on other thread.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support