I have constructed a Neo4j graph based on MovieLens100k with the following content:
Nodes:
(:User)
,(:Movie)
,(:Genre)
Relations:
(:User)-[:RATED]->(:Movie)
,(:Movie)-[:HAS_GENRE]->(:Genre)
I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User)
and that the meta-path to follow is the following:
path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)
Suppose to have computed such quantities:
N_ij = Number of path of type path1 starting from (:User)
with id=i and ending in (:User)
with id=j
Then PathSim for user with id 1 and id 2 is:
PathSim = 2N_12/(N_11+N_22)
To compute such quantity in Neo4j my Cypher query (1) is:
"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie)
starting from (:User)
with id=1, i.e. query (2):
"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
By taking into account "trivial" paths consisting of going from (:User)
to (:Movie)
and coming back from the same exact half-path.
However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.
Is there a way to account for "trivial" paths by sort of modifying query (1)?
I have constructed a Neo4j graph based on MovieLens100k with the following content:
Nodes:
(:User)
,(:Movie)
,(:Genre)
Relations:
(:User)-[:RATED]->(:Movie)
,(:Movie)-[:HAS_GENRE]->(:Genre)
I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User)
and that the meta-path to follow is the following:
path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)
Suppose to have computed such quantities:
N_ij = Number of path of type path1 starting from (:User)
with id=i and ending in (:User)
with id=j
Then PathSim for user with id 1 and id 2 is:
PathSim = 2N_12/(N_11+N_22)
To compute such quantity in Neo4j my Cypher query (1) is:
"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie)
starting from (:User)
with id=1, i.e. query (2):
"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
By taking into account "trivial" paths consisting of going from (:User)
to (:Movie)
and coming back from the same exact half-path.
However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.
Is there a way to account for "trivial" paths by sort of modifying query (1)?
Share Improve this question edited Mar 19 at 11:45 Emanuele Maduli asked Mar 19 at 11:28 Emanuele MaduliEmanuele Maduli 11 bronze badge1 Answer
Reset to default 1Neo4j will filter out any MATCH result row that repeats the same relationship. That is why (1) did not do what you expected.
Assuming that:
N_11
is the number of(:User {id: "1"})-[:RATED]->(:Movie)
paths, andUser
nodeid
properties have string values (to allow my query to use them as map keys),
this query may do what you want:
MATCH (u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(*) as ct
WITH
COLLECT(DISTINCT u1) as u1s,
COLLECT(DISTINCT u2) AS u2s,
COLLECT({headUser: u1.id, tailUser: u2.id, ct: ct}) AS pairs
UNWIND apoc.coll.intersection(u1s, u2s) AS user
WITH user, COUNT { (user)-[:RATED]->(:Movie) } AS userCt, pairs
WITH COLLECT(user.id) AS userIds, COLLECT(userCt) AS userCts, pairs
WITH apoc.map.fromLists(userIds, userCts) AS userCtMap, pairs
UNWIND pairs as pair
RETURN
pair.headUser AS headUser,
pair.tailUser AS tailUser,
(2.0 * pair.ct)/(userCtMap[pair.headUser] + userCtMap[pair.tailUser]) AS pathSim
This query calculates all distinct users found by the MATCH
clause and maps each user's id
to its number of outgoing -[:RATED]->(:Movie)
relationships. It then uses that mapping to calculate the denominator of your pathSim formula.