Thread: Work table

Work table

From
Robert James
Date:
I'm using Postgres for data analysis (interactive and batch).  I need
to focus the analysis on a subset of one table and, for both
performance and simplicity, have a function that loads that subset
into another table (DELETE FROM another_table; INSERT INTO
another_table SELECT ...).

Oddly enough, although the SELECT itself is very quick (< 1 s), the
DELETE and INSERT can take over a minute! I can't figure out why.
another_table is simple: it has only 7 fields.  Two of those fields
are indexed, each with a standard single-column index.  There are no
triggers on it.

What is the cause of this behavior? What should I do to make this
faster? Is there a recommended workaround?

(I'm hesitant to drop another_table and recreate it each time, since
many views depend on it.)


Re: Work table

From
Thomas Kellerer
Date:
Robert James wrote on 27.10.2013 20:47:
> I'm using Postgres for data analysis (interactive and batch).  I need
> to focus the analysis on a subset of one table, and, for both
> performance and simplicity, have a function which loads that subset
> into another table (DELETE FROM another_table; INSERT INTO
> another_table SELECT ...).
>
> Oddly enough, although the SELECT itself is very quick (< 1 s), the
> DELETE and INSERT can take over a minute! I can't figure out why.
> another_table is simple: it has only 7 fields.  Two of those fields
> are indexed, using a simple one field standard index.  There are no
> triggers on it.
>
> What is the cause of this behavior? What should I do to make this
> faster? Is there a recommended work around?
>
> (I'm hesitant to drop another_table and recreate it each time, since
> many views depend on it.)

DELETE can be a quite lengthy operation - especially with a large number of rows.

If you use TRUNCATE instead, this will be *much* quicker, with the additional
benefit that if you INSERT the rows in the same transaction, the INSERT will
require much less I/O because it's not WAL-logged (this optimization applies
when wal_level is set to minimal).
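
In SQL, the pattern looks roughly like this (a sketch - the SELECT body is
your existing subset query, and the WAL savings assume wal_level = minimal):

```sql
BEGIN;

-- TRUNCATE drops the table's storage in one step instead of
-- marking each row dead the way DELETE does.
TRUNCATE another_table;

-- Because the table was truncated in this same transaction,
-- Postgres can skip most WAL logging for this INSERT
-- (when wal_level = minimal).
INSERT INTO another_table
SELECT ... ;  -- your existing subset query

COMMIT;
```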




Re: Work table

From
Robert James
Date:
On 10/27/13, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Robert James wrote on 27.10.2013 20:47:
>> I'm using Postgres for data analysis (interactive and batch).  I need
>> to focus the analysis on a subset of one table, and, for both
>> performance and simplicity, have a function which loads that subset
>> into another table (DELETE FROM another_table; INSERT INTO
>> another_table SELECT ...).
>>
>> Oddly enough, although the SELECT itself is very quick (< 1 s), the
>> DELETE and INSERT can take over a minute! I can't figure out why.
>> another_table is simple: it has only 7 fields.  Two of those fields
>> are indexed, using a simple one field standard index.  There are no
>> triggers on it.
>>
>> What is the cause of this behavior? What should I do to make this
>> faster? Is there a recommended work around?
>>
>> (I'm hesitant to drop another_table and recreate it each time, since
>> many views depend on it.)
>
> DELETE can be a quite lengthy thing to do - especially with a large number
> of rows.
>
> If you use TRUNCATE instead, this will be *much* quicker with the additional
> benefit,
> that if you INSERT the rows in the same transaction, the INSERT will require
> much less
> I/O because it's not logged.
>

Changing DELETE to TRUNCATE and putting it all in a transaction
brought the time down to 40 seconds.  But this is still awfully slow
when the SELECT takes under a second.

Is there another problem here? Perhaps something to do with
triggering autovacuum?

Or should I be using a different type of table for work tables (a RAM-only table)?


Re: Work table

From
Adrian Klaver
Date:
On 10/27/2013 02:23 PM, Robert James wrote:
> On 10/27/13, Thomas Kellerer <spam_eater@gmx.net> wrote:
>> Robert James wrote on 27.10.2013 20:47:
>>> I'm using Postgres for data analysis (interactive and batch).  I need
>>> to focus the analysis on a subset of one table, and, for both
>>> performance and simplicity, have a function which loads that subset
>>> into another table (DELETE FROM another_table; INSERT INTO
>>> another_table SELECT ...).
>>>
>>> Oddly enough, although the SELECT itself is very quick (< 1 s), the
>>> DELETE and INSERT can take over a minute! I can't figure out why.
>>> another_table is simple: it has only 7 fields.  Two of those fields
>>> are indexed, using a simple one field standard index.  There are no
>>> triggers on it.
>>>
>>> What is the cause of this behavior? What should I do to make this
>>> faster? Is there a recommended work around?
>>>
>>> (I'm hesitant to drop another_table and recreate it each time, since
>>> many views depend on it.)
>>
>> DELETE can be a quite lengthy thing to do - especially with a large number
>> of rows.
>>
>> If you use TRUNCATE instead, this will be *much* quicker with the additional
>> benefit,
>> that if you INSERT the rows in the same transaction, the INSERT will require
>> much less
>> I/O because it's not logged.
>>
>
> Changing DELETE to TRUNCATE and putting it all in a transaction
> brought the time down to 40 seconds.  But this is still awfully slow,
> when the SELECT is under a second.
>
> Is there another problem here? Perhaps something to do with
> triggering autovacuum?

Is there a FK relationship involved?

Could we see the schema for another_table?

>
> Or should I be using a different type of table for work tables? (RAM only table)
>
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: Work table

From
Robert James
Date:
On 10/27/13, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On 10/27/2013 02:23 PM, Robert James wrote:
>> On 10/27/13, Thomas Kellerer <spam_eater@gmx.net> wrote:
>>> Robert James wrote on 27.10.2013 20:47:
>>>> I'm using Postgres for data analysis (interactive and batch).  I need
>>>> to focus the analysis on a subset of one table, and, for both
>>>> performance and simplicity, have a function which loads that subset
>>>> into another table (DELETE FROM another_table; INSERT INTO
>>>> another_table SELECT ...).
>>>>
>>>> Oddly enough, although the SELECT itself is very quick (< 1 s), the
>>>> DELETE and INSERT can take over a minute! I can't figure out why.
>>>> another_table is simple: it has only 7 fields.  Two of those fields
>>>> are indexed, using a simple one field standard index.  There are no
>>>> triggers on it.
>>>>
>>>> What is the cause of this behavior? What should I do to make this
>>>> faster? Is there a recommended work around?
>>>>
>>>> (I'm hesitant to drop another_table and recreate it each time, since
>>>> many views depend on it.)
>>>
>>> DELETE can be a quite lengthy thing to do - especially with a large
>>> number
>>> of rows.
>>>
>>> If you use TRUNCATE instead, this will be *much* quicker with the
>>> additional
>>> benefit,
>>> that if you INSERT the rows in the same transaction, the INSERT will
>>> require
>>> much less
>>> I/O because it's not logged.
>>>
>>
>> Changing DELETE to TRUNCATE and putting it all in a transaction
>> brought the time down to 40 seconds.  But this is still awfully slow,
>> when the SELECT is under a second.
>>
>> Is there another problem here? Perhaps something to do with
>> triggering autovacuum?
>
> Is there a FK relationship involved?
>
> Could we see the schema for another_table?


1. No FK.

2. I removed the indexes from the table.  TRUNCATE now takes only
40 ms, but the INSERT still takes 10 s!

3. ALTER TABLE another_table SET (autovacuum_enabled = true,
toast.autovacuum_enabled = true); didn't seem to make a difference.

4. Here's the schema:


CREATE TABLE another_table
(
  id serial NOT NULL,
  eventtime timestamp without time zone NOT NULL,
  reporter character varying NOT NULL,
  loc character varying NOT NULL,
  city character varying NOT NULL,
  stanza character varying,
  purdue character varying,
  CONSTRAINT segment_pkey PRIMARY KEY (id)
);


Re: Work table

From
Adrian Klaver
Date:
On 10/27/2013 02:48 PM, Robert James wrote:
> On 10/27/13, Adrian Klaver <adrian.klaver@gmail.com> wrote:


>>> Is there another problem here? Perhaps something to do with
>>> triggering autovacuum?
>>
>> Is there a FK relationship involved?
>>
>> Could we see the schema for another_table?
>
>
> 1. No FK
> 2. I removed the indexes from the table
> TRUNCATE takes only 40 ms, but the INSERT still takes 10s!

So how many records are we talking about?

Also complete this sentence :)

INSERT INTO
another_table SELECT ...

In other words what is the SELECT statement for the INSERT?

Also, you mentioned the above was in a function. What is the function
body and how is it being called?

> 3. ALTER TABLE another_table SET (autovacuum_enabled = true,
> toast.autovacuum_enabled = true); didn't seem to make a difference
>
> 4. Here's the schema:
>
>
> CREATE TABLE another_table
> (
>    id serial NOT NULL,
>    eventtime timestamp without time zone NOT NULL,
>    reporter character varying NOT NULL,
>    loc character varying NOT NULL,
>    city character varying NOT NULL,
>    stanza character varying,
>    purdue character varying,
>    CONSTRAINT segment_pkey PRIMARY KEY (id)
> )
>
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: Work table

From
Eelke Klein
Date:

2013/10/27 Robert James <srobertjames@gmail.com>
On 10/27/13, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Robert James wrote on 27.10.2013 20:47:
>> I'm using Postgres for data analysis (interactive and batch).  I need
>> to focus the analysis on a subset of one table, and, for both
>> performance and simplicity, have a function which loads that subset
>> into another table (DELETE FROM another_table; INSERT INTO
>> another_table SELECT ...).
>>
>> Oddly enough, although the SELECT itself is very quick (< 1 s), the
>> DELETE and INSERT can take over a minute! I can't figure out why.
>> another_table is simple: it has only 7 fields.  Two of those fields
>> are indexed, using a simple one field standard index.  There are no
>> triggers on it.
>>
>> What is the cause of this behavior? What should I do to make this
>> faster? Is there a recommended work around?
>>
>> (I'm hesitant to drop another_table and recreate it each time, since
>> many views depend on it.)
>
> DELETE can be a quite lengthy thing to do - especially with a large number
> of rows.
>
> If you use TRUNCATE instead, this will be *much* quicker with the additional
> benefit,
> that if you INSERT the rows in the same transaction, the INSERT will require
> much less
> I/O because it's not logged.
>

Changing DELETE to TRUNCATE and putting it all in a transaction
brought the time down to 40 seconds.  But this is still awfully slow,
when the SELECT is under a second.

How many rows are being inserted?
 

Is there another problem here? Perhaps something to do with
triggering autovacuum?

Or should I be using a different type of table for work tables? (RAM only table)



You could use a TEMP or UNLOGGED table, depending on how long you need it to stay around (for these types of tables, data won't be forced to disk before the COMMIT returns).
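
For example (a sketch - the work-table names are placeholders; both variants
skip WAL at the cost of crash safety):

```sql
-- UNLOGGED: persists across sessions, but is emptied after a crash
-- and is not replicated to standbys.
CREATE UNLOGGED TABLE work_table (LIKE another_table INCLUDING ALL);

-- TEMP: visible only to this session and dropped automatically at
-- session end (or at commit, if you add ON COMMIT DROP).
CREATE TEMP TABLE work_session_table (LIKE another_table INCLUDING ALL);
```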


Re: Work table

From
Robert James
Date:
It could be more than 1 million rows.

The SELECT is:

SELECT *
    FROM another_table
    WHERE
        eventtime > (SELECT e FROM tags WHERE id = $1) AND
        eventtime < (SELECT e FROM tags WHERE id = $2)
;

$1 and $2 are integers.

The SELECT ran just now, returning over 1 million rows, in 1.6 seconds.

Inserting into the work table causes weird behavior - it takes over a
minute, the Postgres backend's CPU climbs to 100%, and subsequent
queries sometimes seem to slow down too.  After a lot of these,
Postgres sometimes becomes unresponsive until I restart it.

The function is just a wrapper that sets $1 and $2.  I get the same
behavior when I run just its SQL, with no function.
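
(For reference, the wrapper is roughly this shape - the function name and
the source table name here are illustrative, not the real ones:)

```sql
CREATE OR REPLACE FUNCTION load_work_table(tag_a integer, tag_b integer)
RETURNS void AS $$
BEGIN
    TRUNCATE another_table;
    INSERT INTO another_table
    SELECT *
    FROM source_table  -- placeholder for the table being subsetted
    WHERE eventtime > (SELECT e FROM tags WHERE id = tag_a)
      AND eventtime < (SELECT e FROM tags WHERE id = tag_b);
END;
$$ LANGUAGE plpgsql;
```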


On 10/27/13, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On 10/27/2013 02:48 PM, Robert James wrote:
>> On 10/27/13, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>
>
>>>> Is there another problem here? Perhaps something to do with
>>>> triggering autovacuum?
>>>
>>> Is there a FK relationship involved?
>>>
>>> Could we see the schema for another_table?
>>
>>
>> 1. No FK
>> 2. I removed the indexes from the table
>> TRUNCATE takes only 40 ms, but the INSERT still takes 10s!
>
> So how many records are we talking about?
>
> Also complete this sentence :)
>
> INSERT INTO
> another_table SELECT ...
>
> In other words what is the SELECT statement for the INSERT?
>
> Also, you mentioned the above was in a function. What is the function
> body and how is it being called?
>
>> 3. ALTER TABLE another_table SET (autovacuum_enabled = true,
>> toast.autovacuum_enabled = true); didn't seem to make a difference
>>
>> 4. Here's the schema:
>>
>>
>> CREATE TABLE another_table
>> (
>>    id serial NOT NULL,
>>    eventtime timestamp without time zone NOT NULL,
>>    reporter character varying NOT NULL,
>>    loc character varying NOT NULL,
>>    city character varying NOT NULL,
>>    stanza character varying,
>>    purdue character varying,
>>    CONSTRAINT segment_pkey PRIMARY KEY (id)
>> )
>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@gmail.com
>


Re: Work table

From
Adrian Klaver
Date:
On 10/28/2013 12:50 PM, Robert James wrote:
> It could be > 1 Million rows.

Well, that would be a reason.

>
> SELECT is:
>
> SELECT *
>     FROM another_table
>     WHERE
>         eventtime > (SELECT e FROM tags WHERE id = $1) AND
>         eventtime < (SELECT e FROM tags WHERE id = $2)
> ;
>
> $1 and $2 are integers.
>
> SELECT ran just now, returning >1Million rows, in 1.6 seconds.

Not surprising - there is less overhead for a SELECT than for an INSERT.

>
> Inserting into work table causes weird behavior - it takes over a
> minute, PG CPU climbs to 100%, but then other subsequent queries
> sometimes seem to slow down too.  After a lot of these, sometimes PG
> acts irresponsive until I restart it.

Below you say it takes 10s.

>
> The function is just a wrapper to set $1 and $2.  I get the same
> behavior when I try just its SQL, no function.
>
>
> On 10/27/13, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> On 10/27/2013 02:48 PM, Robert James wrote:
>>> On 10/27/13, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>>
>>
>>>>> Is there another problem here? Perhaps something to do with
>>>>> triggering autovacuum?
>>>>
>>>> Is there a FK relationship involved?
>>>>
>>>> Could we see the schema for another_table?
>>>
>>>
>>> 1. No FK
>>> 2. I removed the indexes from the table
>>> TRUNCATE takes only 40 ms, but the INSERT still takes 10s!
>>
>> So how many records are we talking about?
>>
>> Also complete this sentence :)
>>
>> INSERT INTO
>> another_table SELECT ...
>>
>> In other words what is the SELECT statement for the INSERT?
>>
>> Also, you mentioned the above was in a function. What is the function
>> body and how is it being called?
>>
>>> 3. ALTER TABLE another_table SET (autovacuum_enabled = true,
>>> toast.autovacuum_enabled = true); didn't seem to make a difference
>>>
>>> 4. Here's the schema:
>>>
>>>
>>> CREATE TABLE another_table
>>> (
>>>     id serial NOT NULL,
>>>     eventtime timestamp without time zone NOT NULL,
>>>     reporter character varying NOT NULL,
>>>     loc character varying NOT NULL,
>>>     city character varying NOT NULL,
>>>     stanza character varying,
>>>     purdue character varying,
>>>     CONSTRAINT segment_pkey PRIMARY KEY (id)
>>> )
>>>
>>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@gmail.com
>>
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: Work table

From
Bill Moran
Date:
On Mon, 28 Oct 2013 15:50:42 -0400 Robert James <srobertjames@gmail.com> wrote:

> It could be > 1 Million rows.
>
> SELECT is:
>
> SELECT *
>     FROM another_table
>     WHERE
>         eventtime > (SELECT e FROM tags WHERE id = $1) AND
>         eventtime < (SELECT e FROM tags WHERE id = $2)
> ;
>
> $1 and $2 are integers.
>
> SELECT ran just now, returning >1Million rows, in 1.6 seconds.
>
> Inserting into work table causes weird behavior - it takes over a
> minute, PG CPU climbs to 100%, but then other subsequent queries
> sometimes seem to slow down too.  After a lot of these, sometimes PG
> acts irresponsive until I restart it.

Depending on the nature of your hardware, this might not be weird ...

For example, if you have enough RAM that all of another_table fits
in RAM, and (comparatively) slow disks, the SELECT would run
considerably faster than an insert of the same size.

--
Bill Moran <wmoran@potentialtech.com>