Random Walk
Glossary
 Directed

Directed trait. The algorithm is welldefined on a directed graph.
 Directed

Directed trait. The algorithm ignores the direction of the graph.
 Directed

Directed trait. The algorithm does not run on a directed graph.
 Undirected

Undirected trait. The algorithm is welldefined on an undirected graph.
 Undirected

Undirected trait. The algorithm ignores the undirectedness of the graph.
 Heterogeneous nodes

Heterogeneous nodes fully supported. The algorithm has the ability to distinguish between nodes of different types.
 Heterogeneous nodes

Heterogeneous nodes allowed. The algorithm treats all selected nodes similarly regardless of their label.
 Heterogeneous relationships

Heterogeneous relationships fully supported. The algorithm has the ability to distinguish between relationships of different types.
 Heterogeneous relationships

Heterogeneous relationships allowed. The algorithm treats all selected relationships similarly regardless of their type.
 Weighted relationships

Weighted trait. The algorithm supports a relationship property to be used as weight, specified via the relationshipWeightProperty configuration parameter.
 Weighted relationships

Weighted trait. The algorithm treats each relationship as equally important, discarding the value of any relationship weight.
Random Walk is an algorithm that provides random paths in a graph.
A random walk simulates a traversal of the graph in which the traversed relationships are chosen at random.
In a classic random walk, each relationship has the same, possibly weighted, probability of being picked.
This probability is not influenced by the previously visited nodes.
The random walk implementation of the Neo4j Graph Data Science library supports the concept of second order random walks.
This method tries to model the transition probability based on the currently visited node v
, the node t
visited before the current one, and the node x
which is the target of a candidate relationship.
Random walks are thus influenced by two parameters: the returnFactor
and the inOutFactor
:

The
returnFactor
is used ift
equalsx
, i.e., the random walk returns to the previously visited node. 
The
inOutFactor
is used if the distance fromt
tox
is equal to 2, i.e., the walk traverses further away from the nodet
The probabilities for traversing a relationship during a random walk can be further influenced by specifying a relationshipWeightProperty
.
A relationship property value greater than 1 will increase the likelihood of a relationship being traversed, a property value between 0 and 1 will decrease that probability.
To obtain a random walk where the transition probability is independent of the previously visited nodes both the returnFactor and the inOutFactor can be set to 1.0.

Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Memory Estimation. 
Syntax
CALL gds.randomWalk.stream(
graphName: String,
configuration: Map
)
YIELD
nodeIds: List of Integer,
path: Path
Name  Type  Default  Optional  Description 

graphName 
String 

no 
The name of a graph stored in the catalog. 
configuration 
Map 

yes 
Configuration for algorithmspecifics and/or graph filtering. 
Name  Type  Default  Optional  Description 

List of String 

yes 
Filter the named graph using the given node labels. Nodes with any of the given labels will be included. 

List of String 

yes 
Filter the named graph using the given relationship types. Relationships with any of the given types will be included. 

Integer 

yes 
The number of concurrent threads used for running the algorithm. 

String 

yes 
An ID that can be provided to more easily track the algorithm’s progress. 

Boolean 

yes 
If disabled the progress percentage will not be logged. 

sourceNodes 
List of Integer 

yes 
The list of nodes from which to do a random walk. 
walkLength 
Integer 

yes 
The number of steps in a single random walk. 
walksPerNode 
Integer 

yes 
The number of random walks generated for each node. 
inOutFactor 
Float 

yes 
Tendency of the random walk to stay close to the start node or fan out in the graph. Higher value means stay local. 
returnFactor 
Float 

yes 
Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency. 
String 

yes 
Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted. 

randomSeed 
Integer 

yes 
Seed value for the random number generator used to generate the random walks. 
walkBufferSize 
Integer 

yes 
The number of random walks to complete before starting training. 
Name  Type  Description 


List of Integer 
The nodes of the random walk. 

Path 
A 
CALL gds.randomWalk.stats(
graphName: String,
configuration: Map
)
YIELD
preProcessingMillis: Integer,
computeMillis: Integer,
configuration: Map
Name  Type  Default  Optional  Description 

graphName 
String 

no 
The name of a graph stored in the catalog. 
configuration 
Map 

yes 
Configuration for algorithmspecifics and/or graph filtering. 
Name  Type  Default  Optional  Description 

List of String 

yes 
Filter the named graph using the given node labels. Nodes with any of the given labels will be included. 

List of String 

yes 
Filter the named graph using the given relationship types. Relationships with any of the given types will be included. 

Integer 

yes 
The number of concurrent threads used for running the algorithm. 

String 

yes 
An ID that can be provided to more easily track the algorithm’s progress. 

Boolean 

yes 
If disabled the progress percentage will not be logged. 

sourceNodes 
List of Integer 

yes 
The list of nodes from which to do a random walk. 
walkLength 
Integer 

yes 
The number of steps in a single random walk. 
walksPerNode 
Integer 

yes 
The number of random walks generated for each node. 
inOutFactor 
Float 

yes 
Tendency of the random walk to stay close to the start node or fan out in the graph. Higher value means stay local. 
returnFactor 
Float 

yes 
Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency. 
String 

yes 
Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted. 

randomSeed 
Integer 

yes 
Seed value for the random number generator used to generate the random walks. 
walkBufferSize 
Integer 

yes 
The number of random walks to complete before starting training. 
Name  Type  Description 


Integer 
Milliseconds for preprocessing the data. 

Integer 
Milliseconds for running the algorithm. 

Map 
The configuration used for running the algorithm. 
CALL gds.randomWalk.mutate(
graphName: String,
configuration: Map
)
YIELD
nodePropertiesWritten: Integer,
preProcessingMillis: Integer,
computeMillis: Integer,
mutateMillis: Integer,
postProcessingMillis: Integer,
configuration: Map
Name  Type  Default  Optional  Description 

graphName 
String 

no 
The name of a graph stored in the catalog. 
configuration 
Map 

yes 
Configuration for algorithmspecifics and/or graph filtering. 
Name  Type  Default  Optional  Description 

mutateProperty 
String 

no 
The node property in the GDS graph to which the count of random walk visits is written. 
sourceNodes 
List of Integer 

yes 
The list of nodes from which to do a random walk. 
walkLength 
Integer 

yes 
The number of steps in a single random walk. 
walksPerNode 
Integer 

yes 
The number of random walks generated for each node. 
inOutFactor 
Float 

yes 
Tendency of the random walk to stay close to the start node or fan out in the graph. Higher value means stay local. 
returnFactor 
Float 

yes 
Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency. 
String 

yes 
Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted. 

randomSeed 
Integer 

yes 
Seed value for the random number generator used to generate the random walks. 
walkBufferSize 
Integer 

yes 
The number of random walks to complete before starting training. 
Name  Type  Description 


Integer 
Number of properties added to the projected graph. 

Integer 
Milliseconds for preprocessing the data. 

Integer 
Milliseconds for running the algorithm. 

Integer 
Milliseconds for adding properties to the projected graph. 

Integer 
Unused. 

Map 
The configuration used for running the algorithm. 
Examples
All the examples below should be run in an empty database. The examples use Cypher projections as the norm. Native projections will be deprecated in a future release. 
Consider the graph created by the following Cypher statement:
CREATE (home:Page {name: 'Home'}),
(about:Page {name: 'About'}),
(product:Page {name: 'Product'}),
(links:Page {name: 'Links'}),
(a:Page {name: 'Site A'}),
(b:Page {name: 'Site B'}),
(c:Page {name: 'Site C'}),
(d:Page {name: 'Site D'}),
(home)[:LINKS]>(about),
(about)[:LINKS]>(home),
(product)[:LINKS]>(home),
(home)[:LINKS]>(product),
(links)[:LINKS]>(home),
(home)[:LINKS]>(links),
(links)[:LINKS]>(a),
(a)[:LINKS]>(home),
(links)[:LINKS]>(b),
(b)[:LINKS]>(home),
(links)[:LINKS]>(c),
(c)[:LINKS]>(home),
(links)[:LINKS]>(d),
(d)[:LINKS]>(home)
MATCH (source:Page)[r:LINKS]>(target:Page)
RETURN gds.graph.project(
'myGraph',
source,
target,
{},
{ undirectedRelationshipTypes: ['*'] }
)
Without specified source nodes
myGraph
CALL gds.randomWalk.stream(
'myGraph',
{
walkLength: 3,
walksPerNode: 1,
randomSeed: 42,
concurrency: 1
}
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path)  node.name ] AS pages
nodeIds  pages 

[0, 6, 0] 
["Home", "Site C", "Home"] 
[3, 5, 3] 
["Links", "Site B", "Links"] 
[2, 0, 1] 
["Product", "Home", "About"] 
[1, 0, 4] 
["About", "Home", "Site A"] 
[7, 3, 0] 
["Site D", "Links", "Home"] 
[6, 0, 2] 
["Site C", "Home", "Product"] 
[5, 0, 7] 
["Site B", "Home", "Site D"] 
[4, 0, 2] 
["Site A", "Home", "Product"] 
With specified source nodes
myGraph
with specified sourceNodesMATCH (page:Page)
WHERE page.name IN ['Home', 'About']
WITH COLLECT(page) as sourceNodes
CALL gds.randomWalk.stream(
'myGraph',
{
sourceNodes: sourceNodes,
walkLength: 3,
walksPerNode: 1,
randomSeed: 42,
concurrency: 1
}
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path)  node.name ] AS pages
nodeIds  pages 

[0, 6, 0] 
["Home", "Site C", "Home"] 
[1, 0, 4] 
["About", "Home", "Site A"] 
Stats
myGraph
CALL gds.randomWalk.stats(
'myGraph',
{
walkLength: 3,
walksPerNode: 1,
randomSeed: 42,
concurrency: 1
}
)
preProcessingMillis  computeMillis  configuration 

0 
1 
{randomSeed=42, walkLength=3, jobId=b77f31476683424986334db7da03f24d, sourceNodes=[], walksPerNode=1, inOutFactor=1.0, nodeLabels=[], sudo=false, relationshipTypes=[], walkBufferSize=1000, returnFactor=1.0, concurrency=1} 
Mutate
myGraph
CALL gds.randomWalk.mutate(
'myGraph',
{
walkLength: 3,
walksPerNode: 1,
randomSeed: 19,
concurrency: 1,
mutateProperty: 'randomWalksCount'
}
)
YIELD nodePropertiesWritten, configuration
RETURN nodePropertiesWritten, configuration.mutateProperty AS mutateProperty
nodePropertiesWritten  mutateProperty 

8 
"randomWalksCount" 
Then we can then stream the mutated node property to see on how many RandomWalks it appeared:
CALL gds.graph.nodeProperty.stream('myGraph', 'randomWalksCount')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).name AS page, propertyValue AS randomWalksCount
ORDER BY page ASC
page  randomWalksCount 

"About" 
1 
"Home" 
6 
"Links" 
6 
"Product" 
1 
"Site A" 
3 
"Site B" 
1 
"Site C" 
4 
"Site D" 
2 