Thread: Valid query times out when run from bash script

Valid query times out when run from bash script

From
"W. David Jarvis"
Date:
Hello all -

I've been trying to get a bash script set-up that creates a DB, does a
join, writes the join to file, and then cleans itself up afterwards.
Everything within the script runs fine when entered at an actual
prompt, but when I run the script the SELECT query runs indefinitely.
I'm not sure what the cause is—the script is being run on a box
connecting to its own local postgres installation, so it can't be a
connection issue. Any thoughts?

The bash script is included below; postgres version is 9.0.4. Any help
very much appreciated :)

#!/bin/bash

: ${1?"ERROR: Incorrect number of arguments (files have not been
properly specified). Proper format is 'mta-join old_mta_file
new_mta_file desired_output_filename.'"}
: ${2?"ERROR: Incorrect number of arguments (files have not been
properly specified). Proper format is 'mta-join old_mta_file
new_mta_file desired_output_filename.'"}
: ${3?"ERROR: Incorrect number of arguments (files have not been
properly specified). Proper format is 'mta-join old_mta_file
new_mta_file desired_output_filename.'"}

OLD_MTA_FILE=$1
NEW_MTA_FILE=$2
OUTPUT_FILE=$3

# Figure out how many days we need columns for
export NEW_MTA_COLWIDTH=`head -1 $NEW_MTA_FILE | grep ',' -o | wc -l`
NEW_MTA_COLWIDTH=($NEW_MTA_COLWIDTH-7)/4

# Assemble the extra SQL for the above
ADDITIONAL_CREATION_FIELDS=""
ADDITIONAL_SELECTION_FIELDS=""
for (( c=0; c<=$NEW_MTA_COLWIDTH; c++ ))
do
        ADDITIONAL_CREATION_FIELDS=$ADDITIONAL_CREATION_FIELDS"DAY_"$c"
varchar(255), event_count_"$c" numeric(20,10), conversions_"$c"
numeric(20,10), revenue_"$c" numeric(20,10), "
        ADDITIONAL_SELECTION_FIELDS=$ADDITIONAL_SELECTION_FIELDS"new_mta.DAY_"$c",
new_mta.event_count_"$c", new_mta.conversions_"$c",
new_mta.revenue_"$c", "
done

# Let's get rid of that extra comma at the end.

ADDITIONAL_CREATION_FIELDS=${ADDITIONAL_CREATION_FIELDS:0:${#ADDITIONAL_CREATION_FIELDS}-2}
ADDITIONAL_SELECTION_FIELDS=${ADDITIONAL_SELECTION_FIELDS:0:${#ADDITIONAL_SELECTION_FIELDS}-2}

echo -n "Creating database mta-join..."
createdb mta-join
echo -e "Done!\n"

echo "Creating table new_mta..."
# PSQL commands start here
psql mta-join<<EOF
CREATE TABLE new_mta (
report_date             date,
campaign_id             integer,
tracking_campaign_id    integer,
placement_id            integer,
creative_id             integer,
package_id              integer,
audience_id             integer,
$ADDITIONAL_CREATION_FIELDS);
EOF
echo -e "Done!\n"

echo -n "Inserting new mta file into new_mta table..."
psql mta-join<<EOF
\copy new_mta from '$NEW_MTA_FILE' with delimiter ','
\q
EOF
echo -e "Done!\n"

echo "Creating table old_mta..."
# PSQL commands start here
psql mta-join<<EOF
CREATE TABLE old_mta (
report_date_day         date,
report_date_week        date,
report_date_month       date,
campaign_name           varchar(255),
package_name            varchar(255),
audience_name           varchar(255),
inventory_provider_name varchar(255),
placement_name          varchar(255),
creative_size           varchar(255),
creative_name           varchar(255),
impressions             bigint,
data_cost               numeric(20,10),
media_cost              numeric(20,10),
gross_cost              numeric(20,10),
clicks                  integer,
lta_click_actions       integer,
lta_view_actions        integer,
lta_click_revenue       integer,
lta_view_revenue        integer,
mta_actions             numeric(20,10),
mta_revenue             integer,
mta_action_count        integer,
mta_seconds             integer,
campaign_id             integer,
placement_id            bigint,
creative_id             bigint,
package_id              bigint,
audience_id             integer);
\q
EOF
echo -e "Done!\n"

# Upload old MTA file into table old_mta
echo -n "Inserting old mta file into old_mta table..."
psql mta-join<<EOF
\COPY old_mta from '$OLD_MTA_FILE' with delimiter ',' CSV header;
EOF
echo -e "Done!\n"

# Create a bunch of indexes
echo -n "Creating table indexes for faster querying..."
psql mta-join<<EOF
CREATE INDEX campaign_id_index_old ON old_mta (campaign_id);
CREATE INDEX campaign_id_index_new ON new_mta (campaign_id);
CREATE INDEX placement_id_index_old ON old_mta (placement_id);
CREATE INDEX placement_id_index_new ON new_mta (placement_id);
CREATE INDEX creative_id_index_old ON old_mta (creative_id);
CREATE INDEX creative_id_index_new ON new_mta (creative_id);
CREATE INDEX package_id_index_old ON old_mta (package_id);
CREATE INDEX package_id_index_new ON new_mta (package_id);
CREATE INDEX audience_id_index_old ON old_mta (audience_id);
CREATE INDEX audience_id_index_new ON old_mta (audience_id);
\q
EOF
echo -e "Done!\n"

echo "Writing join to file..."
psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta
WHERE report_date = report_date_day AND new_mta.campaign_id =
old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id
AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id =
old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to
stdout;' > $OUTPUT_FILE
echo -e "Done!\n"

# Clearing the newly created tables
echo "Cleaning up postgres installation..."
psql mta-join<<EOF
DROP TABLE new_mta;
DROP TABLE old_mta;
EOF

# Drop the mta-join database
dropdb mta-join
echo -e "Done!\r"

Re: Valid query times out when run from bash script

From
"hb@101-factory.eu"
Date:
run the script with bash -v or
-vvv for extra detailed vebose logging.

see whats wrong, most of the times a
matter using the right closure of the statements with ' or "

Henk Bronk



On 27 mrt. 2012, at 20:37, "W. David Jarvis" <william.d.jarvis@gmail.com> wrote:

> Hello all -
>
> I've been trying to get a bash script set-up that creates a DB, does a
> join, writes the join to file, and then cleans itself up afterwards.
> Everything within the script runs fine when entered at an actual
> prompt, but when I run the script the SELECT query runs indefinitely.
> I'm not sure what the cause is—the script is being run on a box
> connecting to its own local postgres installation, so it can't be a
> connection issue. Any thoughts?
>
> The bash script is included below; postgres version is 9.0.4. Any help
> very much appreciated :)
>
> #!/bin/bash
>
> : ${1?"ERROR: Incorrect number of arguments (files have not been
> properly specified). Proper format is 'mta-join old_mta_file
> new_mta_file desired_output_filename.'"}
> : ${2?"ERROR: Incorrect number of arguments (files have not been
> properly specified). Proper format is 'mta-join old_mta_file
> new_mta_file desired_output_filename.'"}
> : ${3?"ERROR: Incorrect number of arguments (files have not been
> properly specified). Proper format is 'mta-join old_mta_file
> new_mta_file desired_output_filename.'"}
>
> OLD_MTA_FILE=$1
> NEW_MTA_FILE=$2
> OUTPUT_FILE=$3
>
> # Figure out how many days we need columns for
> export NEW_MTA_COLWIDTH=`head -1 $NEW_MTA_FILE | grep ',' -o | wc -l`
> NEW_MTA_COLWIDTH=($NEW_MTA_COLWIDTH-7)/4
>
> # Assemble the extra SQL for the above
> ADDITIONAL_CREATION_FIELDS=""
> ADDITIONAL_SELECTION_FIELDS=""
> for (( c=0; c<=$NEW_MTA_COLWIDTH; c++ ))
> do
>         ADDITIONAL_CREATION_FIELDS=$ADDITIONAL_CREATION_FIELDS"DAY_"$c"
> varchar(255), event_count_"$c" numeric(20,10), conversions_"$c"
> numeric(20,10), revenue_"$c" numeric(20,10), "
>         ADDITIONAL_SELECTION_FIELDS=$ADDITIONAL_SELECTION_FIELDS"new_mta.DAY_"$c",
> new_mta.event_count_"$c", new_mta.conversions_"$c",
> new_mta.revenue_"$c", "
> done
>
> # Let's get rid of that extra comma at the end.
>
> ADDITIONAL_CREATION_FIELDS=${ADDITIONAL_CREATION_FIELDS:0:${#ADDITIONAL_CREATION_FIELDS}-2}
> ADDITIONAL_SELECTION_FIELDS=${ADDITIONAL_SELECTION_FIELDS:0:${#ADDITIONAL_SELECTION_FIELDS}-2}
>
> echo -n "Creating database mta-join..."
> createdb mta-join
> echo -e "Done!\n"
>
> echo "Creating table new_mta..."
> # PSQL commands start here
> psql mta-join<<EOF
> CREATE TABLE new_mta (
> report_date             date,
> campaign_id             integer,
> tracking_campaign_id    integer,
> placement_id            integer,
> creative_id             integer,
> package_id              integer,
> audience_id             integer,
> $ADDITIONAL_CREATION_FIELDS);
> EOF
> echo -e "Done!\n"
>
> echo -n "Inserting new mta file into new_mta table..."
> psql mta-join<<EOF
> \copy new_mta from '$NEW_MTA_FILE' with delimiter ','
> \q
> EOF
> echo -e "Done!\n"
>
> echo "Creating table old_mta..."
> # PSQL commands start here
> psql mta-join<<EOF
> CREATE TABLE old_mta (
> report_date_day         date,
> report_date_week        date,
> report_date_month       date,
> campaign_name           varchar(255),
> package_name            varchar(255),
> audience_name           varchar(255),
> inventory_provider_name varchar(255),
> placement_name          varchar(255),
> creative_size           varchar(255),
> creative_name           varchar(255),
> impressions             bigint,
> data_cost               numeric(20,10),
> media_cost              numeric(20,10),
> gross_cost              numeric(20,10),
> clicks                  integer,
> lta_click_actions       integer,
> lta_view_actions        integer,
> lta_click_revenue       integer,
> lta_view_revenue        integer,
> mta_actions             numeric(20,10),
> mta_revenue             integer,
> mta_action_count        integer,
> mta_seconds             integer,
> campaign_id             integer,
> placement_id            bigint,
> creative_id             bigint,
> package_id              bigint,
> audience_id             integer);
> \q
> EOF
> echo -e "Done!\n"
>
> # Upload old MTA file into table old_mta
> echo -n "Inserting old mta file into old_mta table..."
> psql mta-join<<EOF
> \COPY old_mta from '$OLD_MTA_FILE' with delimiter ',' CSV header;
> EOF
> echo -e "Done!\n"
>
> # Create a bunch of indexes
> echo -n "Creating table indexes for faster querying..."
> psql mta-join<<EOF
> CREATE INDEX campaign_id_index_old ON old_mta (campaign_id);
> CREATE INDEX campaign_id_index_new ON new_mta (campaign_id);
> CREATE INDEX placement_id_index_old ON old_mta (placement_id);
> CREATE INDEX placement_id_index_new ON new_mta (placement_id);
> CREATE INDEX creative_id_index_old ON old_mta (creative_id);
> CREATE INDEX creative_id_index_new ON new_mta (creative_id);
> CREATE INDEX package_id_index_old ON old_mta (package_id);
> CREATE INDEX package_id_index_new ON new_mta (package_id);
> CREATE INDEX audience_id_index_old ON old_mta (audience_id);
> CREATE INDEX audience_id_index_new ON old_mta (audience_id);
> \q
> EOF
> echo -e "Done!\n"
>
> echo "Writing join to file..."
> psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta
> WHERE report_date = report_date_day AND new_mta.campaign_id =
> old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id
> AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id =
> old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to
> stdout;' > $OUTPUT_FILE
> echo -e "Done!\n"
>
> # Clearing the newly created tables
> echo "Cleaning up postgres installation..."
> psql mta-join<<EOF
> DROP TABLE new_mta;
> DROP TABLE old_mta;
> EOF
>
> # Drop the mta-join database
> dropdb mta-join
> echo -e "Done!\r"
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Valid query times out when run from bash script

From
Steve Crawford
Date:
On 03/27/2012 11:37 AM, W. David Jarvis wrote:
> Hello all -
>
> I've been trying to get a bash script...but when I run the script the SELECT query runs indefinitely.
>
> ...
> # Create a bunch of indexes
> echo -n "Creating table indexes for faster querying..."
> psql mta-join<<EOF
> CREATE INDEX campaign_id_index_old ON old_mta (campaign_id);
> ...
> CREATE INDEX audience_id_index_new ON old_mta (audience_id);
> ...
> psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta
> WHERE report_date = report_date_day AND new_mta.campaign_id =
> old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id
> AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id =
> old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to
> stdout;'>  $OUTPUT_FILE
>
Some things to try:

1. Change "psql" to "echo" and then run the statement that gets dumped
out by hand. See what happens.

2. Change your psql command to be an "EXPLAIN...", run the script and
examine the output of the explain.

Some thoughts:

1. You immediately run a select after creating, importing and making a
bunch of indexes. Is it possible that autovacuum hasn't yet analyzed the
table and you are getting a bad query plan? If so (or to make sure it
doesn't happen), update your script to include an ANALYZE of the
appropriate tables before you run the SELECT.

2. Run analyze anyway. It is likely that most of the indexes you create
are unused. Unless your data changes so much that the planner would
choose different indexes for different imports, you can probably
eliminate the steps of creating the unused indexes.

3. You can probably wrap all the steps into a single connection for a
small speed improvement. Running everything within a single connection
would allow you to use temporary tables which are unlogged. Alternately,
since you are on 9.1, you could leave the script alone and create an
unlogged table "CREATE UNLOGGED TABLE...". Both temporary and unlogged
tables don't write to the WAL so you get a speed improvement in return
for the data being at risk in a crash. In this case, the database is
just doing some temporary processing steps that are deleted anyway so
you don't need crash safety.

Cheers,
Steve


Re: Valid query times out when run from bash script

From
Raymond O'Donnell
Date:
On 27/03/2012 19:37, W. David Jarvis wrote:
> Hello all -
>
> I've been trying to get a bash script set-up that creates a DB, does a
> join, writes the join to file, and then cleans itself up afterwards.
> Everything within the script runs fine when entered at an actual
> prompt, but when I run the script the SELECT query runs indefinitely.
> I'm not sure what the cause is—the script is being run on a box
> connecting to its own local postgres installation, so it can't be a
> connection issue. Any thoughts?

Only a guess, but would the createdb line, or indeed any of the psql
invocations, be prompting for a password? - That would cause the script
to hang indefinitely.

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

Re: Valid query times out when run from bash script

From
"W. David Jarvis"
Date:
All -

Thanks for the prompt responses.  I've actually narrowed it down a bit
and it seems that something earlier (specifically, the for loop that
assembles $ADDITIONAL_CREATION_FIELDS and
$ADDITIONAL_SELECTION_FIELDS) in the script is causing Bash to write
extremely slowly (if I subset the query out itself into an independent
Bash script it executes appropriately). Since this is primarily a Bash
issue rather than a psql issue I may take my search elsewhere.

 - D

On Tue, Mar 27, 2012 at 1:20 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:
> On 03/27/2012 11:37 AM, W. David Jarvis wrote:
>>
>> Hello all -
>>
>> I've been trying to get a bash script...but when I run the script the
>> SELECT query runs indefinitely.
>>
>> ...
>>
>> # Create a bunch of indexes
>> echo -n "Creating table indexes for faster querying..."
>> psql mta-join<<EOF
>> CREATE INDEX campaign_id_index_old ON old_mta (campaign_id);
>> ...
>>
>> CREATE INDEX audience_id_index_new ON old_mta (audience_id);
>> ...
>>
>> psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta
>> WHERE report_date = report_date_day AND new_mta.campaign_id =
>> old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id
>> AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id =
>> old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to
>> stdout;'>  $OUTPUT_FILE
>>
> Some things to try:
>
> 1. Change "psql" to "echo" and then run the statement that gets dumped out
> by hand. See what happens.
>
> 2. Change your psql command to be an "EXPLAIN...", run the script and
> examine the output of the explain.
>
> Some thoughts:
>
> 1. You immediately run a select after creating, importing and making a bunch
> of indexes. Is it possible that autovacuum hasn't yet analyzed the table and
> you are getting a bad query plan? If so (or to make sure it doesn't happen),
> update your script to include an ANALYZE of the appropriate tables before
> you run the SELECT.
>
> 2. Run analyze anyway. It is likely that most of the indexes you create are
> unused. Unless your data changes so much that the planner would choose
> different indexes for different imports, you can probably eliminate the
> steps of creating the unused indexes.
>
> 3. You can probably wrap all the steps into a single connection for a small
> speed improvement. Running everything within a single connection would allow
> you to use temporary tables which are unlogged. Alternately, since you are
> on 9.1, you could leave the script alone and create an unlogged table
> "CREATE UNLOGGED TABLE...". Both temporary and unlogged tables don't write
> to the WAL so you get a speed improvement in return for the data being at
> risk in a crash. In this case, the database is just doing some temporary
> processing steps that are deleted anyway so you don't need crash safety.
>
> Cheers,
> Steve
>



--
W. David Jarvis
M: 203.918.2328

Re: Valid query times out when run from bash script

From
"W. David Jarvis"
Date:
No prompt, seems to be using .pgpass without a problem.

On Tue, Mar 27, 2012 at 1:59 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 27/03/2012 19:37, W. David Jarvis wrote:
>> Hello all -
>>
>> I've been trying to get a bash script set-up that creates a DB, does a
>> join, writes the join to file, and then cleans itself up afterwards.
>> Everything within the script runs fine when entered at an actual
>> prompt, but when I run the script the SELECT query runs indefinitely.
>> I'm not sure what the cause is—the script is being run on a box
>> connecting to its own local postgres installation, so it can't be a
>> connection issue. Any thoughts?
>
> Only a guess, but would the createdb line, or indeed any of the psql
> invocations, be prompting for a password? - That would cause the script
> to hang indefinitely.
>
> Ray.
>
>
> --
> Raymond O'Donnell :: Galway :: Ireland
> rod@iol.ie



--
W. David Jarvis
M: 203.918.2328