Improvement for query planner? (no, not about count(*) again ;-)) - Mailing list pgsql-bugs

From Tobias Völk
Subject Improvement for query planner? (no, not about count(*) again ;-))
Date
Msg-id 1jxEsu-0bGx160@fwd35.t-online.de
Whole thread Raw
Responses Re: Improvement for query planner? (no, not about count(*) again ;-))
Re: Improvement for query planner? (no, not about count(*) again ;-))
List pgsql-bugs

Hello Postgres-Community,

 

 

I’ve got a table games(name1 text, name2 text) with 1.3x10^9 rows consisting of two two text columns for the names of players who’ve played a game, duplicate rows are possible, there’s no primary key since this table was just intended as a temporary storage for my data until further processing.

 

The length of a name is usually not more than 20 characters, shorter most of the time.

I’ve asked postgres to make an unlogged newtable(name text primary key) consisting of the unqiue names and executed:

 

Insert into newtable(name) select name1 from games on conflict do nothing;

(and later on intended to do the same for the second column)

 

However after hours it still wasn’t done, used only 1 cpu core to the max and read with 5 MB/s from my fast SSD.

So I stopped it.

I’ve also tried inserting (select name1 from games union select name2 from games) but it always wanted to do it using sorting.

But either the sorting or the preperations for the sorting were again only done using 1 core to the max and reading with 5 MB/s.

 

Couldn’t find a fast query for my problem.

 

So I wrote a java-program which read the whole table at a fetchsize of about 4 million and inserted the names into a HashSet.

And surprisingly after only a few minutes the program was already 25% done o.O

 

My question is, why isn’t postgres nearly this fast? Why doesn’t it just create a HashSet in RAM and read full speed from the disk?

I even created a hash index but it kept using it’s primary key b-tree and then I read that hash indices somehow don’t support checking for uniqueness.

 

Best regards, Tobi

 


Virus-free. www.avast.com

pgsql-bugs by date:

Previous
From: Andy Fan
Date:
Subject: Re: Reported type mismatch improperly
Next
From: Thomas Munro
Date:
Subject: Re: psql has some accessibility issues on Windows