Thread: Index slow down insertions...

Index slow down insertions...

From
Ioannis Anagnostopoulos
Date:
Hello,

Our postgres 9.0 DB has one table (the important one) where the bulk of
insertions is happening. We are looking more or less at around 15K to
20K insertions per minute and my measurements give me a rate of 0.60 to
1 msec per insertion. A summary of the table where the insertions are
happening is as follows:

-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  msg_id
bigint NOT NULL DEFAULT
nextval('feed_all_y2012m07.messages_msg_id_seq'::regclass),
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  msg_type
smallint NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  obj_id
integer NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
msg_date_rec timestamp without time zone NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  msg_text
text NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
msg_expanded boolean NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  msg_time
time without time zone,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
pos_accuracy boolean NOT NULL DEFAULT false,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  pos_raim
boolean NOT NULL DEFAULT false,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  pos_lon
integer NOT NULL DEFAULT (181 * 600000),
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:  pos_lat
integer NOT NULL DEFAULT (91 * 60000),
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
pos_georef1 character varying(2) NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
pos_georef2 character varying(2) NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
pos_georef3 character varying(2) NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
pos_georef4 character varying(2) NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
pos_point geometry,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_speed smallint NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_course smallint NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_heading smallint NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_second smallint NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_radio integer NOT NULL,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_status ais_server.nav_status NOT NULL DEFAULT
'NOT_DEFINED'::ais_server.nav_status,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_turn smallint NOT NULL DEFAULT 128,
-- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
ship_maneuver smallint NOT NULL,
   CONSTRAINT ship_a_pos_messages_wk0_pkey PRIMARY KEY (msg_id )

The table is created in table space "Data" while its indexes in table
space "Index" (a different HD). Now once the database is empty the
configuration is flying but of course this is not the case always. 5
days later and around 55,000,000 rows later the insertions are literally
so slow that the application server has to drop inserts in order to keep
up. To be precise we are looking now at 1 insertion every 5 to 10,
sometimes 25 msec!!

After lots of tuning both on the postgres server and the stored procs,
after installing 18G Ram and appropriately changing the shared_buffers,
working_mem etc, we realized that our index hard disk had 100%
utilization and essentially it was talking longer to update the indexes
than to update the table. Well I took a radical approach and dropped all
the indexes and... miracle, the db got back in to life, insertion went
back to a healthy 0.70msec but of course now I have no indexes. It is my
belief that I am doing something fundamentally wrong with the index
creation as 4 indexes cannot really bring a database to a halt. Here are
the indexes I was using:

CREATE INDEX idx_ship_a_pos_messages_wk0_date_pos
   ON feed_all_y2012m07.ship_a_pos_messages_wk0
   USING btree
   (msg_date_rec , pos_georef1 , pos_georef2 , pos_georef3 , pos_georef4 )
TABLESPACE index;

CREATE INDEX idx_ship_a_pos_messages_wk0_date_rec
   ON feed_all_y2012m07.ship_a_pos_messages_wk0
   USING btree
   (msg_date_rec )
TABLESPACE index;

CREATE INDEX idx_ship_a_pos_messages_wk0_object
   ON feed_all_y2012m07.ship_a_pos_messages_wk0
   USING btree
   (obj_id , msg_type , msg_text , msg_date_rec )
TABLESPACE index;

CREATE INDEX idx_ship_a_pos_messages_wk0_pos
   ON feed_all_y2012m07.ship_a_pos_messages_wk0
   USING btree
   (pos_georef1 , pos_georef2 , pos_georef3 , pos_georef4 )
TABLESPACE index;

As I have run out of ideas any help will be really appreciated. For the
time being i can live without indexes but sooner or later people will
need to access the live data. I don't even dare to think what will
happen to the database if I only introduce a spatial GIS index that I
need. Question: Is there any possibility that I must include the primary
key into my index to "help" during indexing? If I remember well MS-SQL
has such a "feature".

Kind Regards
Yiannis


Re: Index slow down insertions...

From
Ioannis Anagnostopoulos
Date:
On 15/07/2012 02:14, Ioannis Anagnostopoulos wrote:
> Hello,
>
> Our postgres 9.0 DB has one table (the important one) where the bulk
> of insertions is happening. We are looking more or less at around 15K
> to 20K insertions per minute and my measurements give me a rate of
> 0.60 to 1 msec per insertion. A summary of the table where the
> insertions are happening is as follows:
>
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages: msg_id
> bigint NOT NULL DEFAULT
> nextval('feed_all_y2012m07.messages_msg_id_seq'::regclass),
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> msg_type smallint NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages: obj_id
> integer NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> msg_date_rec timestamp without time zone NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> msg_text text NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> msg_expanded boolean NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> msg_time time without time zone,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> pos_accuracy boolean NOT NULL DEFAULT false,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> pos_raim boolean NOT NULL DEFAULT false,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages: pos_lon
> integer NOT NULL DEFAULT (181 * 600000),
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages: pos_lat
> integer NOT NULL DEFAULT (91 * 60000),
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> pos_georef1 character varying(2) NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> pos_georef2 character varying(2) NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> pos_georef3 character varying(2) NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> pos_georef4 character varying(2) NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> pos_point geometry,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_speed smallint NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_course smallint NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_heading smallint NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_second smallint NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_radio integer NOT NULL,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_status ais_server.nav_status NOT NULL DEFAULT
> 'NOT_DEFINED'::ais_server.nav_status,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_turn smallint NOT NULL DEFAULT 128,
> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
> ship_maneuver smallint NOT NULL,
>   CONSTRAINT ship_a_pos_messages_wk0_pkey PRIMARY KEY (msg_id )
>
> The table is created in table space "Data" while its indexes in table
> space "Index" (a different HD). Now once the database is empty the
> configuration is flying but of course this is not the case always. 5
> days later and around 55,000,000 rows later the insertions are
> literally so slow that the application server has to drop inserts in
> order to keep up. To be precise we are looking now at 1 insertion
> every 5 to 10, sometimes 25 msec!!
>
> After lots of tuning both on the postgres server and the stored procs,
> after installing 18G Ram and appropriately changing the
> shared_buffers, working_mem etc, we realized that our index hard disk
> had 100% utilization and essentially it was talking longer to update
> the indexes than to update the table. Well I took a radical approach
> and dropped all the indexes and... miracle, the db got back in to
> life, insertion went back to a healthy 0.70msec but of course now I
> have no indexes. It is my belief that I am doing something
> fundamentally wrong with the index creation as 4 indexes cannot really
> bring a database to a halt. Here are the indexes I was using:
>
> CREATE INDEX idx_ship_a_pos_messages_wk0_date_pos
>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>   USING btree
>   (msg_date_rec , pos_georef1 , pos_georef2 , pos_georef3 , pos_georef4 )
> TABLESPACE index;
>
> CREATE INDEX idx_ship_a_pos_messages_wk0_date_rec
>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>   USING btree
>   (msg_date_rec )
> TABLESPACE index;
>
> CREATE INDEX idx_ship_a_pos_messages_wk0_object
>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>   USING btree
>   (obj_id , msg_type , msg_text , msg_date_rec )
> TABLESPACE index;
>
> CREATE INDEX idx_ship_a_pos_messages_wk0_pos
>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>   USING btree
>   (pos_georef1 , pos_georef2 , pos_georef3 , pos_georef4 )
> TABLESPACE index;
>
> As I have run out of ideas any help will be really appreciated. For
> the time being i can live without indexes but sooner or later people
> will need to access the live data. I don't even dare to think what
> will happen to the database if I only introduce a spatial GIS index
> that I need. Question: Is there any possibility that I must include
> the primary key into my index to "help" during indexing? If I remember
> well MS-SQL has such a "feature".
>
> Kind Regards
> Yiannis
>
>
Some more information regarding this "problem". I start to believe that
the problem is mainly due to the autovacum that happens to prevent
wraparound. As our database is heavily used with inserts, wraparounds
are happing very often. The vacuums that are triggered to deal with the
situation have an adverse effect on the index HD. In essence as the
database covers 12 months of data an autovacuum to prevent wrap around
is more or less constantly present starving the actual data insertion
process from index HD resources (especially when those indexes are quite
a lot as I said in my previous post). Now, given the fact that only the
"current" month is updated with inserts while the previous months are
essentially ready-only(static) I think that moving the indexes of the
past months to an archive HD or dropping those that are not necessary
any more would probably solve the problem. Does my theory hold any water?

Kind Regards
Yiannis


Re: Index slow down insertions...

From
Ioannis Anagnostopoulos
Date:
On 16/07/2012 11:24, Ioannis Anagnostopoulos wrote:
> On 15/07/2012 02:14, Ioannis Anagnostopoulos wrote:
>> Hello,
>>
>> Our postgres 9.0 DB has one table (the important one) where the bulk
>> of insertions is happening. We are looking more or less at around 15K
>> to 20K insertions per minute and my measurements give me a rate of
>> 0.60 to 1 msec per insertion. A summary of the table where the
>> insertions are happening is as follows:
>>
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages: msg_id
>> bigint NOT NULL DEFAULT
>> nextval('feed_all_y2012m07.messages_msg_id_seq'::regclass),
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> msg_type smallint NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages: obj_id
>> integer NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> msg_date_rec timestamp without time zone NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> msg_text text NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> msg_expanded boolean NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> msg_time time without time zone,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_accuracy boolean NOT NULL DEFAULT false,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_raim boolean NOT NULL DEFAULT false,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_lon integer NOT NULL DEFAULT (181 * 600000),
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_lat integer NOT NULL DEFAULT (91 * 60000),
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_georef1 character varying(2) NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_georef2 character varying(2) NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_georef3 character varying(2) NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_georef4 character varying(2) NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> pos_point geometry,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_speed smallint NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_course smallint NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_heading smallint NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_second smallint NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_radio integer NOT NULL,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_status ais_server.nav_status NOT NULL DEFAULT
>> 'NOT_DEFINED'::ais_server.nav_status,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_turn smallint NOT NULL DEFAULT 128,
>> -- Inherited from table feed_all_y2012m07.ship_a_pos_messages:
>> ship_maneuver smallint NOT NULL,
>>   CONSTRAINT ship_a_pos_messages_wk0_pkey PRIMARY KEY (msg_id )
>>
>> The table is created in table space "Data" while its indexes in table
>> space "Index" (a different HD). Now once the database is empty the
>> configuration is flying but of course this is not the case always. 5
>> days later and around 55,000,000 rows later the insertions are
>> literally so slow that the application server has to drop inserts in
>> order to keep up. To be precise we are looking now at 1 insertion
>> every 5 to 10, sometimes 25 msec!!
>>
>> After lots of tuning both on the postgres server and the stored
>> procs, after installing 18G Ram and appropriately changing the
>> shared_buffers, working_mem etc, we realized that our index hard disk
>> had 100% utilization and essentially it was talking longer to update
>> the indexes than to update the table. Well I took a radical approach
>> and dropped all the indexes and... miracle, the db got back in to
>> life, insertion went back to a healthy 0.70msec but of course now I
>> have no indexes. It is my belief that I am doing something
>> fundamentally wrong with the index creation as 4 indexes cannot
>> really bring a database to a halt. Here are the indexes I was using:
>>
>> CREATE INDEX idx_ship_a_pos_messages_wk0_date_pos
>>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>>   USING btree
>>   (msg_date_rec , pos_georef1 , pos_georef2 , pos_georef3 ,
>> pos_georef4 )
>> TABLESPACE index;
>>
>> CREATE INDEX idx_ship_a_pos_messages_wk0_date_rec
>>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>>   USING btree
>>   (msg_date_rec )
>> TABLESPACE index;
>>
>> CREATE INDEX idx_ship_a_pos_messages_wk0_object
>>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>>   USING btree
>>   (obj_id , msg_type , msg_text , msg_date_rec )
>> TABLESPACE index;
>>
>> CREATE INDEX idx_ship_a_pos_messages_wk0_pos
>>   ON feed_all_y2012m07.ship_a_pos_messages_wk0
>>   USING btree
>>   (pos_georef1 , pos_georef2 , pos_georef3 , pos_georef4 )
>> TABLESPACE index;
>>
>> As I have run out of ideas any help will be really appreciated. For
>> the time being i can live without indexes but sooner or later people
>> will need to access the live data. I don't even dare to think what
>> will happen to the database if I only introduce a spatial GIS index
>> that I need. Question: Is there any possibility that I must include
>> the primary key into my index to "help" during indexing? If I
>> remember well MS-SQL has such a "feature".
>>
>> Kind Regards
>> Yiannis
>>
>>
> Some more information regarding this "problem". I start to believe
> that the problem is mainly due to the autovacum that happens to
> prevent wraparound. As our database is heavily used with inserts,
> wraparounds are happing very often. The vacuums that are triggered to
> deal with the situation have an adverse effect on the index HD. In
> essence as the database covers 12 months of data an autovacuum to
> prevent wrap around is more or less constantly present starving the
> actual data insertion process from index HD resources (especially when
> those indexes are quite a lot as I said in my previous post). Now,
> given the fact that only the "current" month is updated with inserts
> while the previous months are essentially ready-only(static) I think
> that moving the indexes of the past months to an archive HD or
> dropping those that are not necessary any more would probably solve
> the problem. Does my theory hold any water?
>
> Kind Regards
> Yiannis
>
>
Hello again, sorry for topping up the thread but I think that the more
information I provide you the more likely it is to get an answer. So as
I go along, I have stripped completely the database from additional
indexes, those that possible delay the insertion process, of course
maintaining the pkey and 2 or three absolutely mandatory indexes for my
select queries. As a result I have a sleek and steady performance of
around 0.70 msec per insertion. However I have now closed a full circle
as I have a fast database but when I try to "select", making optimum
usage of the left over indexes, the insertion process slows down. Yes my
selections are huge (they are not slow, just huge as it is about
geographical points etc) but I am asking if there is anyway that I can
"prioritise" the insertions over the "selections".  These "selections"
are happening anyway as batch process during night so I don't really
mind if they will take 2 or 5 hours, as long as they are ready at 9.00am
next day. Again any advice will be highly appreciated.

Kind Regards
Yiannis

Re: Index slow down insertions...

From
"Kevin Grittner"
Date:
Ioannis Anagnostopoulos <ioannis@anatec.com> wrote:

> I have stripped completely the database from additional indexes,
> those that possible delay the insertion process, of course
> maintaining the pkey and 2 or three absolutely mandatory indexes
> for my select queries. As a result I have a sleek and steady
> performance of around 0.70 msec per insertion.

Not bad!

> However I have now closed a full circle as I have a fast database
> but when I try to "select", making optimum usage of the left over
> indexes, the insertion process slows down. Yes my selections are
> huge (they are not slow, just huge as it is about geographical
> points etc) but I am asking if there is anyway that I can
> "prioritise" the insertions over the "selections".  These
> "selections" are happening anyway as batch process during night so
> I don't really mind if they will take 2 or 5 hours, as long as
> they are ready at 9.00am next day.

You could try adding back indexes on the most critical columns, one
at a time.  You might want to try single-column indexes, rather than
the wide ones you had before.  The narrower keys may cut the cost of
maintaining the indexes enough to tolerate a few, and PostgreSQL can
often combine multiple indexes using "bitmap index scans".

You could also play with "nice" and "ionice" to reduce priority of
the "select" processes, but watch any such attempt very carefully
until you see what the impact really is.

Since you seem to be relatively satisfied with where you are now,
you should make small changes and be prepared to revert them if
insert performance drops off too much.

-Kevin