On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote:
> it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms
> on your system. There's a bunch of opportunities for more improvements,
> as the hash table implementation is pretty naive/silly, the on-disk
> format is wasteful and so on.
>
> But before spending more time on that, it'd be interesting to know what
> would be a competitive timing. I mean, what would be "good enough"? What
> timings are achievable with graph databases?
Your hashset is now almost exactly as fast as the corresponding roaringbitmap query, +/- 1 ms on my machine.
I tested Neo4j and the results are surprising; it appears to be significantly *slower*.
However, I've probably misunderstood something, maybe I need to add some index or something.
Even so, it's interesting it's apparently not fast "by default".
The query I tested:
MATCH (user:User {id: '5867'})-[:FRIENDS_WITH*3..3]->(fof)
RETURN COUNT(DISTINCT fof)
Here is how I loaded the data into it:
% pwd
/Users/joel/Library/Application Support/Neo4j
Desktop/Application/relate-data/dbmss/dbms-3837aa22-c830-4dcf-8668-ef8e302263c7
% head import/*
==> import/friendships.csv <==
1,13,FRIENDS_WITH
1,11,FRIENDS_WITH
1,6,FRIENDS_WITH
1,3,FRIENDS_WITH
1,4,FRIENDS_WITH
1,5,FRIENDS_WITH
1,15,FRIENDS_WITH
1,14,FRIENDS_WITH
1,7,FRIENDS_WITH
1,8,FRIENDS_WITH
==> import/friendships_header.csv <==
:START_ID(User),:END_ID(User),:TYPE
==> import/users.csv <==
1,User
2,User
3,User
4,User
5,User
6,User
7,User
8,User
9,User
10,User
==> import/users_header.csv <==
id:ID(User),:LABEL
% ./bin/neo4j-admin database import full --overwrite-destination --nodes=User=import/users_header.csv,import/users.csv
--relationships=FRIENDS_WIDTH=import/friendships_header.csv,import/friendships.csvneo4j
/Joel