Thread: Valid query times out when run from bash script
Hello all - I've been trying to get a bash script set-up that creates a DB, does a join, writes the join to file, and then cleans itself up afterwards. Everything within the script runs fine when entered at an actual prompt, but when I run the script the SELECT query runs indefinitely. I'm not sure what the cause is—the script is being run on a box connecting to its own local postgres installation, so it can't be a connection issue. Any thoughts? The bash script is included below; postgres version is 9.0.4. Any help very much appreciated :) #!/bin/bash : ${1?"ERROR: Incorrect number of arguments (files have not been properly specified). Proper format is 'mta-join old_mta_file new_mta_file desired_output_filename.'"} : ${2?"ERROR: Incorrect number of arguments (files have not been properly specified). Proper format is 'mta-join old_mta_file new_mta_file desired_output_filename.'"} : ${3?"ERROR: Incorrect number of arguments (files have not been properly specified). Proper format is 'mta-join old_mta_file new_mta_file desired_output_filename.'"} OLD_MTA_FILE=$1 NEW_MTA_FILE=$2 OUTPUT_FILE=$3 # Figure out how many days we need columns for export NEW_MTA_COLWIDTH=`head -1 $NEW_MTA_FILE | grep ',' -o | wc -l` NEW_MTA_COLWIDTH=($NEW_MTA_COLWIDTH-7)/4 # Assemble the extra SQL for the above ADDITIONAL_CREATION_FIELDS="" ADDITIONAL_SELECTION_FIELDS="" for (( c=0; c<=$NEW_MTA_COLWIDTH; c++ )) do ADDITIONAL_CREATION_FIELDS=$ADDITIONAL_CREATION_FIELDS"DAY_"$c" varchar(255), event_count_"$c" numeric(20,10), conversions_"$c" numeric(20,10), revenue_"$c" numeric(20,10), " ADDITIONAL_SELECTION_FIELDS=$ADDITIONAL_SELECTION_FIELDS"new_mta.DAY_"$c", new_mta.event_count_"$c", new_mta.conversions_"$c", new_mta.revenue_"$c", " done # Let's get rid of that extra comma at the end. ADDITIONAL_CREATION_FIELDS=${ADDITIONAL_CREATION_FIELDS:0:${#ADDITIONAL_CREATION_FIELDS}-2} ADDITIONAL_SELECTION_FIELDS=${ADDITIONAL_SELECTION_FIELDS:0:${#ADDITIONAL_SELECTION_FIELDS}-2} echo -n "Creating database mta-join..." createdb mta-join echo -e "Done!\n" echo "Creating table new_mta..." # PSQL commands start here psql mta-join<<EOF CREATE TABLE new_mta ( report_date date, campaign_id integer, tracking_campaign_id integer, placement_id integer, creative_id integer, package_id integer, audience_id integer, $ADDITIONAL_CREATION_FIELDS); EOF echo -e "Done!\n" echo -n "Inserting new mta file into new_mta table..." psql mta-join<<EOF \copy new_mta from '$NEW_MTA_FILE' with delimiter ',' \q EOF echo -e "Done!\n" echo "Creating table old_mta..." # PSQL commands start here psql mta-join<<EOF CREATE TABLE old_mta ( report_date_day date, report_date_week date, report_date_month date, campaign_name varchar(255), package_name varchar(255), audience_name varchar(255), inventory_provider_name varchar(255), placement_name varchar(255), creative_size varchar(255), creative_name varchar(255), impressions bigint, data_cost numeric(20,10), media_cost numeric(20,10), gross_cost numeric(20,10), clicks integer, lta_click_actions integer, lta_view_actions integer, lta_click_revenue integer, lta_view_revenue integer, mta_actions numeric(20,10), mta_revenue integer, mta_action_count integer, mta_seconds integer, campaign_id integer, placement_id bigint, creative_id bigint, package_id bigint, audience_id integer); \q EOF echo -e "Done!\n" # Upload old MTA file into table old_mta echo -n "Inserting old mta file into old_mta table..." psql mta-join<<EOF \COPY old_mta from '$OLD_MTA_FILE' with delimiter ',' CSV header; EOF echo -e "Done!\n" # Create a bunch of indexes echo -n "Creating table indexes for faster querying..." psql mta-join<<EOF CREATE INDEX campaign_id_index_old ON old_mta (campaign_id); CREATE INDEX campaign_id_index_new ON new_mta (campaign_id); CREATE INDEX placement_id_index_old ON old_mta (placement_id); CREATE INDEX placement_id_index_new ON new_mta (placement_id); CREATE INDEX creative_id_index_old ON old_mta (creative_id); CREATE INDEX creative_id_index_new ON new_mta (creative_id); CREATE INDEX package_id_index_old ON old_mta (package_id); CREATE INDEX package_id_index_new ON new_mta (package_id); CREATE INDEX audience_id_index_old ON old_mta (audience_id); CREATE INDEX audience_id_index_new ON old_mta (audience_id); \q EOF echo -e "Done!\n" echo "Writing join to file..." psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta WHERE report_date = report_date_day AND new_mta.campaign_id = old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id = old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to stdout;' > $OUTPUT_FILE echo -e "Done!\n" # Clearing the newly created tables echo "Cleaning up postgres installation..." psql mta-join<<EOF DROP TABLE new_mta; DROP TABLE old_mta; EOF # Drop the mta-join database dropdb mta-join echo -e "Done!\r"
run the script with bash -v or -vvv for extra detailed vebose logging. see whats wrong, most of the times a matter using the right closure of the statements with ' or " Henk Bronk On 27 mrt. 2012, at 20:37, "W. David Jarvis" <william.d.jarvis@gmail.com> wrote: > Hello all - > > I've been trying to get a bash script set-up that creates a DB, does a > join, writes the join to file, and then cleans itself up afterwards. > Everything within the script runs fine when entered at an actual > prompt, but when I run the script the SELECT query runs indefinitely. > I'm not sure what the cause is—the script is being run on a box > connecting to its own local postgres installation, so it can't be a > connection issue. Any thoughts? > > The bash script is included below; postgres version is 9.0.4. Any help > very much appreciated :) > > #!/bin/bash > > : ${1?"ERROR: Incorrect number of arguments (files have not been > properly specified). Proper format is 'mta-join old_mta_file > new_mta_file desired_output_filename.'"} > : ${2?"ERROR: Incorrect number of arguments (files have not been > properly specified). Proper format is 'mta-join old_mta_file > new_mta_file desired_output_filename.'"} > : ${3?"ERROR: Incorrect number of arguments (files have not been > properly specified). Proper format is 'mta-join old_mta_file > new_mta_file desired_output_filename.'"} > > OLD_MTA_FILE=$1 > NEW_MTA_FILE=$2 > OUTPUT_FILE=$3 > > # Figure out how many days we need columns for > export NEW_MTA_COLWIDTH=`head -1 $NEW_MTA_FILE | grep ',' -o | wc -l` > NEW_MTA_COLWIDTH=($NEW_MTA_COLWIDTH-7)/4 > > # Assemble the extra SQL for the above > ADDITIONAL_CREATION_FIELDS="" > ADDITIONAL_SELECTION_FIELDS="" > for (( c=0; c<=$NEW_MTA_COLWIDTH; c++ )) > do > ADDITIONAL_CREATION_FIELDS=$ADDITIONAL_CREATION_FIELDS"DAY_"$c" > varchar(255), event_count_"$c" numeric(20,10), conversions_"$c" > numeric(20,10), revenue_"$c" numeric(20,10), " > ADDITIONAL_SELECTION_FIELDS=$ADDITIONAL_SELECTION_FIELDS"new_mta.DAY_"$c", > new_mta.event_count_"$c", new_mta.conversions_"$c", > new_mta.revenue_"$c", " > done > > # Let's get rid of that extra comma at the end. > > ADDITIONAL_CREATION_FIELDS=${ADDITIONAL_CREATION_FIELDS:0:${#ADDITIONAL_CREATION_FIELDS}-2} > ADDITIONAL_SELECTION_FIELDS=${ADDITIONAL_SELECTION_FIELDS:0:${#ADDITIONAL_SELECTION_FIELDS}-2} > > echo -n "Creating database mta-join..." > createdb mta-join > echo -e "Done!\n" > > echo "Creating table new_mta..." > # PSQL commands start here > psql mta-join<<EOF > CREATE TABLE new_mta ( > report_date date, > campaign_id integer, > tracking_campaign_id integer, > placement_id integer, > creative_id integer, > package_id integer, > audience_id integer, > $ADDITIONAL_CREATION_FIELDS); > EOF > echo -e "Done!\n" > > echo -n "Inserting new mta file into new_mta table..." > psql mta-join<<EOF > \copy new_mta from '$NEW_MTA_FILE' with delimiter ',' > \q > EOF > echo -e "Done!\n" > > echo "Creating table old_mta..." > # PSQL commands start here > psql mta-join<<EOF > CREATE TABLE old_mta ( > report_date_day date, > report_date_week date, > report_date_month date, > campaign_name varchar(255), > package_name varchar(255), > audience_name varchar(255), > inventory_provider_name varchar(255), > placement_name varchar(255), > creative_size varchar(255), > creative_name varchar(255), > impressions bigint, > data_cost numeric(20,10), > media_cost numeric(20,10), > gross_cost numeric(20,10), > clicks integer, > lta_click_actions integer, > lta_view_actions integer, > lta_click_revenue integer, > lta_view_revenue integer, > mta_actions numeric(20,10), > mta_revenue integer, > mta_action_count integer, > mta_seconds integer, > campaign_id integer, > placement_id bigint, > creative_id bigint, > package_id bigint, > audience_id integer); > \q > EOF > echo -e "Done!\n" > > # Upload old MTA file into table old_mta > echo -n "Inserting old mta file into old_mta table..." > psql mta-join<<EOF > \COPY old_mta from '$OLD_MTA_FILE' with delimiter ',' CSV header; > EOF > echo -e "Done!\n" > > # Create a bunch of indexes > echo -n "Creating table indexes for faster querying..." > psql mta-join<<EOF > CREATE INDEX campaign_id_index_old ON old_mta (campaign_id); > CREATE INDEX campaign_id_index_new ON new_mta (campaign_id); > CREATE INDEX placement_id_index_old ON old_mta (placement_id); > CREATE INDEX placement_id_index_new ON new_mta (placement_id); > CREATE INDEX creative_id_index_old ON old_mta (creative_id); > CREATE INDEX creative_id_index_new ON new_mta (creative_id); > CREATE INDEX package_id_index_old ON old_mta (package_id); > CREATE INDEX package_id_index_new ON new_mta (package_id); > CREATE INDEX audience_id_index_old ON old_mta (audience_id); > CREATE INDEX audience_id_index_new ON old_mta (audience_id); > \q > EOF > echo -e "Done!\n" > > echo "Writing join to file..." > psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta > WHERE report_date = report_date_day AND new_mta.campaign_id = > old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id > AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id = > old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to > stdout;' > $OUTPUT_FILE > echo -e "Done!\n" > > # Clearing the newly created tables > echo "Cleaning up postgres installation..." > psql mta-join<<EOF > DROP TABLE new_mta; > DROP TABLE old_mta; > EOF > > # Drop the mta-join database > dropdb mta-join > echo -e "Done!\r" > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
On 03/27/2012 11:37 AM, W. David Jarvis wrote: > Hello all - > > I've been trying to get a bash script...but when I run the script the SELECT query runs indefinitely. > > ... > # Create a bunch of indexes > echo -n "Creating table indexes for faster querying..." > psql mta-join<<EOF > CREATE INDEX campaign_id_index_old ON old_mta (campaign_id); > ... > CREATE INDEX audience_id_index_new ON old_mta (audience_id); > ... > psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta > WHERE report_date = report_date_day AND new_mta.campaign_id = > old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id > AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id = > old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to > stdout;'> $OUTPUT_FILE > Some things to try: 1. Change "psql" to "echo" and then run the statement that gets dumped out by hand. See what happens. 2. Change your psql command to be an "EXPLAIN...", run the script and examine the output of the explain. Some thoughts: 1. You immediately run a select after creating, importing and making a bunch of indexes. Is it possible that autovacuum hasn't yet analyzed the table and you are getting a bad query plan? If so (or to make sure it doesn't happen), update your script to include an ANALYZE of the appropriate tables before you run the SELECT. 2. Run analyze anyway. It is likely that most of the indexes you create are unused. Unless your data changes so much that the planner would choose different indexes for different imports, you can probably eliminate the steps of creating the unused indexes. 3. You can probably wrap all the steps into a single connection for a small speed improvement. Running everything within a single connection would allow you to use temporary tables which are unlogged. Alternately, since you are on 9.1, you could leave the script alone and create an unlogged table "CREATE UNLOGGED TABLE...". Both temporary and unlogged tables don't write to the WAL so you get a speed improvement in return for the data being at risk in a crash. In this case, the database is just doing some temporary processing steps that are deleted anyway so you don't need crash safety. Cheers, Steve
On 27/03/2012 19:37, W. David Jarvis wrote: > Hello all - > > I've been trying to get a bash script set-up that creates a DB, does a > join, writes the join to file, and then cleans itself up afterwards. > Everything within the script runs fine when entered at an actual > prompt, but when I run the script the SELECT query runs indefinitely. > I'm not sure what the cause is—the script is being run on a box > connecting to its own local postgres installation, so it can't be a > connection issue. Any thoughts? Only a guess, but would the createdb line, or indeed any of the psql invocations, be prompting for a password? - That would cause the script to hang indefinitely. Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
All - Thanks for the prompt responses. I've actually narrowed it down a bit and it seems that something earlier (specifically, the for loop that assembles $ADDITIONAL_CREATION_FIELDS and $ADDITIONAL_SELECTION_FIELDS) in the script is causing Bash to write extremely slowly (if I subset the query out itself into an independent Bash script it executes appropriately). Since this is primarily a Bash issue rather than a psql issue I may take my search elsewhere. - D On Tue, Mar 27, 2012 at 1:20 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > On 03/27/2012 11:37 AM, W. David Jarvis wrote: >> >> Hello all - >> >> I've been trying to get a bash script...but when I run the script the >> SELECT query runs indefinitely. >> >> ... >> >> # Create a bunch of indexes >> echo -n "Creating table indexes for faster querying..." >> psql mta-join<<EOF >> CREATE INDEX campaign_id_index_old ON old_mta (campaign_id); >> ... >> >> CREATE INDEX audience_id_index_new ON old_mta (audience_id); >> ... >> >> psql -d mta-join -t -A -F ',' -c 'copy (SELECT * FROM new_mta, old_mta >> WHERE report_date = report_date_day AND new_mta.campaign_id = >> old_mta.campaign_id AND new_mta.placement_id = old_mta.placement_id >> AND new_mta.creative_id = old_mta.creative_id AND new_mta.package_id = >> old_mta.package_id AND new_mta.audience_id = old_mta.audience_id) to >> stdout;'> $OUTPUT_FILE >> > Some things to try: > > 1. Change "psql" to "echo" and then run the statement that gets dumped out > by hand. See what happens. > > 2. Change your psql command to be an "EXPLAIN...", run the script and > examine the output of the explain. > > Some thoughts: > > 1. You immediately run a select after creating, importing and making a bunch > of indexes. Is it possible that autovacuum hasn't yet analyzed the table and > you are getting a bad query plan? If so (or to make sure it doesn't happen), > update your script to include an ANALYZE of the appropriate tables before > you run the SELECT. > > 2. Run analyze anyway. It is likely that most of the indexes you create are > unused. Unless your data changes so much that the planner would choose > different indexes for different imports, you can probably eliminate the > steps of creating the unused indexes. > > 3. You can probably wrap all the steps into a single connection for a small > speed improvement. Running everything within a single connection would allow > you to use temporary tables which are unlogged. Alternately, since you are > on 9.1, you could leave the script alone and create an unlogged table > "CREATE UNLOGGED TABLE...". Both temporary and unlogged tables don't write > to the WAL so you get a speed improvement in return for the data being at > risk in a crash. In this case, the database is just doing some temporary > processing steps that are deleted anyway so you don't need crash safety. > > Cheers, > Steve > -- W. David Jarvis M: 203.918.2328
No prompt, seems to be using .pgpass without a problem. On Tue, Mar 27, 2012 at 1:59 PM, Raymond O'Donnell <rod@iol.ie> wrote: > On 27/03/2012 19:37, W. David Jarvis wrote: >> Hello all - >> >> I've been trying to get a bash script set-up that creates a DB, does a >> join, writes the join to file, and then cleans itself up afterwards. >> Everything within the script runs fine when entered at an actual >> prompt, but when I run the script the SELECT query runs indefinitely. >> I'm not sure what the cause is—the script is being run on a box >> connecting to its own local postgres installation, so it can't be a >> connection issue. Any thoughts? > > Only a guess, but would the createdb line, or indeed any of the psql > invocations, be prompting for a password? - That would cause the script > to hang indefinitely. > > Ray. > > > -- > Raymond O'Donnell :: Galway :: Ireland > rod@iol.ie -- W. David Jarvis M: 203.918.2328