SNFtool: Similarity Network Fusion



Package ‘SNFtool’
June 11, 2021
Type Package
Title Similarity Network Fusion
Version 2.3.1
Date 2021-06-10
Author Bo Wang, Aziz Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains, Anna Goldenberg
Maintainer Benjamin Brew
Imports ExPosition, alluvial
Description Similarity Network Fusion takes multiple views of a network and fuses them together to construct an overall status matrix. The input to our algorithm can be feature vectors, pairwise distances, or pairwise similarities. The learned status matrix can then be used for retrieval, clustering, and classification.
License GPL
NeedsCompilation no
Repository CRAN
Date/Publication 2021-06-11 08:40:15 UTC

R topics documented:
affinityMatrix
calNMI
chiDist2
concordanceNetworkNMI
Data1
Data2
dataL
displayClusters
displayClustersWithHeatmap
dist2
estimateNumberOfClustersGivenGraph
getColorsForGroups
groupPredict
heatmapPlus
label
plotAlluvial
rankFeaturesByNMI
SNF
spectralClustering
standardNormalization
Index

affinityMatrix

Affinity matrix calculation

Description Computes affinity matrix from a generic distance matrix

Usage affinityMatrix(diff, K = 20, sigma = 0.5)

Arguments
diff    Distance matrix
K       Number of nearest neighbors
sigma   Variance for local model

Value Returns an affinity matrix that represents the neighborhood graph of the data points.

Author(s) Dr. Anna Goldenberg, Bo Wang, Aziz Mezlini, Feyyaz Demir

References
B Wang, A Mezlini, F Demir, M Fiume, Z Tu, M Brudno, B Haibe-Kains, A Goldenberg (2014) Similarity Network Fusion: a fast and effective method to aggregate multiple data types on a genome wide scale. Nature Methods. Online. Jan 26, 2014


Examples

## First, set all the parameters:
K = 20;      ## number of neighbors, must be greater than 1. usually (10~30)
alpha = 0.5; ## hyperparameter, usually (0.3~0.8)
T = 20;      ## Number of Iterations, usually (10~50)

## Data1 is of size n x d_1,
## where n is the number of patients, d_1 is the number of genes,
## Data2 is of size n x d_2,
## where n is the number of patients, d_2 is the number of methylation
data(Data1)
data(Data2)

## Calculate distance matrices (here we calculate Euclidean Distance,
## you can use other distance, e.g. correlation)
Dist1 = (dist2(as.matrix(Data1),as.matrix(Data1)))^(1/2)
Dist2 = (dist2(as.matrix(Data2),as.matrix(Data2)))^(1/2)

## Next, construct similarity graphs
W1 = affinityMatrix(Dist1, K, alpha)
W2 = affinityMatrix(Dist2, K, alpha)
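
The entry above describes turning a distance matrix into an affinity matrix using K nearest neighbors and a sigma parameter. The base R sketch below illustrates one way such a locally scaled Gaussian affinity can be built. It is not the SNFtool implementation: the helper name knn_affinity and the exact scaling formula are assumptions made for illustration only.

```r
# Hypothetical helper (not part of SNFtool): Gaussian affinity whose bandwidth
# adapts to the local neighbourhood of each pair of points.
knn_affinity <- function(dist_mat, K = 20, sigma = 0.5) {
  dist_mat <- as.matrix(dist_mat)
  n <- nrow(dist_mat)
  K <- min(K, n - 1)
  # Mean distance from each point to its K nearest neighbours
  # (position 1 after sorting is the zero self-distance, so it is skipped).
  knn_mean <- apply(dist_mat, 1, function(d) mean(sort(d)[2:(K + 1)]))
  # Assumed local scale for a pair: average of both neighbourhood scales
  # and the pair's own distance.
  local_scale <- (outer(knn_mean, knn_mean, "+") + dist_mat) / 3
  local_scale[local_scale < .Machine$double.eps] <- .Machine$double.eps
  W <- exp(-dist_mat^2 / (2 * (sigma * local_scale)^2))
  (W + t(W)) / 2  # enforce exact symmetry
}

# Toy data: two well-separated groups of 20 points each.
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
D <- as.matrix(dist(X))
W <- knn_affinity(D, K = 5, sigma = 0.5)
```

In this toy setting, affinities inside each group are on average much larger than affinities between the groups, which is the property the similarity graphs in the examples rely on.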

calNMI

Mutual Information calculation

Description Calculate the mutual information between vectors x and y.

Usage calNMI(x, y)

Arguments
x    a vector
y    a vector

Value Returns the mutual information between vectors x and y.

Author(s) Dr. Anna Goldenberg, Bo Wang, Aziz Mezlini, Feyyaz Demir


References
B Wang, A Mezlini, F Demir, M Fiume, Z Tu, M Brudno, B Haibe-Kains, A Goldenberg (2014) Similarity Network Fusion: a fast and effective method to aggregate multiple data types on a genome wide scale. Nature Methods. Online. Jan 26, 2014

Examples

# How to use SNF with multiple views

# Load views into list "dataL"
data(dataL)
data(label)

# Set the other parameters
K = 20      # number of neighbours
alpha = 0.5 # hyperparameter in affinityMatrix
T = 20      # number of iterations of SNF

# Normalize the features in each of the views if necessary
# dataL = lapply(dataL, standardNormalization)

# Calculate the distances for each view
distL = lapply(dataL, function(x) (dist2(x, x))^(1/2))

# Construct the similarity graphs
affinityL = lapply(distL, function(x) affinityMatrix(x, K, alpha))

# Example of how to use SNF to perform subtyping
# Construct the fused network
W = SNF(affinityL, K, T)
# Perform clustering on the fused network.
clustering = spectralClustering(W,3);
# Use NMI to measure the goodness of the obtained labels.
NMI = calNMI(clustering,label);
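
The example above scores a clustering against known labels with NMI. As a rough illustration of what such a score measures, the base R sketch below computes mutual information normalised by the geometric mean of the two entropies, which is one common definition. Whether calNMI uses exactly this normalisation is an assumption; nmi_sketch is a hypothetical name and not part of SNFtool.

```r
# Hypothetical helper (not the SNFtool implementation): normalised mutual
# information between two label vectors of equal length.
nmi_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  joint <- table(x, y) / length(x)   # joint distribution of label pairs
  px <- rowSums(joint)               # marginal distribution of x
  py <- colSums(joint)               # marginal distribution of y
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }
  nz <- joint > 0                    # skip empty cells to avoid log(0)
  mi <- sum(joint[nz] * log(joint[nz] / outer(px, py)[nz]))
  hx <- entropy(px)
  hy <- entropy(py)
  if (hx == 0 || hy == 0) return(0)  # a constant labelling carries no information
  max(0, mi / sqrt(hx * hy))
}

# Identical partitions with swapped label names versus unrelated partitions.
same_partition <- nmi_sketch(c(1, 1, 1, 2, 2, 2), c(2, 2, 2, 1, 1, 1))
unrelated      <- nmi_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Under this definition the score does not depend on how the clusters are named, only on how the samples are partitioned.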

chiDist2

Pairwise Chi-squared distances

Description Wrapper for the function chi2Dist imported from the ‘ExPosition’ package. Computes the Chi-squared distances between all pairs of given data points.

Usage chiDist2(A)

Arguments
A    A data matrix where each row is a different data point

Value
Returns an N x N matrix where N is the number of rows in A. Element (i,j) is the squared Chi-squared distance between the ith and jth data points in A.

Author(s) Dr. Anna Goldenberg, Bo Wang, Aziz Mezlini, Feyyaz Demir

Examples

## Data1 is of size n x d_1,
## where n is the number of patients, d_1 is the number of genes,
## Data2 is of size n x d_2,
## where n is the number of patients, d_2 is the number of methylation
data(Data1)
data(Data2)

## Calculate distance matrices (here we calculate the Chi-squared distance,
## you can use other distances, e.g. correlation)
Dist1 = chiDist2(as.matrix(Data1))
Dist2 = chiDist2(as.matrix(Data2))
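
For readers unfamiliar with the Chi-squared distance, the base R sketch below shows the textbook definition used in correspondence analysis: a squared Euclidean distance between row profiles, with each column weighted by the inverse of its mass. It assumes a non-negative matrix with no empty rows or columns. The helper name chi2_sketch is hypothetical, and agreement with the ExPosition implementation is not claimed.

```r
# Hypothetical helper (not the ExPosition or SNFtool code): squared
# Chi-squared distances between the rows of a non-negative matrix.
chi2_sketch <- function(A) {
  A <- as.matrix(A)
  stopifnot(all(A >= 0), all(rowSums(A) > 0), all(colSums(A) > 0))
  profiles <- A / rowSums(A)       # row profiles: each row sums to 1
  col_mass <- colSums(A) / sum(A)  # column masses, used as inverse weights
  scaled <- sweep(profiles, 2, sqrt(col_mass), "/")
  as.matrix(dist(scaled))^2        # squared weighted Euclidean distance
}

# Rows 1 and 2 are proportional, so their profiles coincide.
counts <- matrix(c(10,  5,  1,
                   20, 10,  2,
                    1,  5, 10), nrow = 3, byrow = TRUE)
D_chi <- chi2_sketch(counts)
```

Because the distance compares profiles and not raw magnitudes, rows that differ only by a constant factor end up at distance zero.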

concordanceNetworkNMI

Concordance Network NMI calculation

Description
Given a list of affinity matrices, Wall, and the number of clusters, C, returns a matrix containing the NMIs between the cluster assignments made with spectral clustering on all matrices provided.

Usage concordanceNetworkNMI(Wall, C)

Arguments
Wall    List of matrices. Each element of the list is a square, symmetric matrix that shows affinities of the data points from a certain view.
C       Number of clusters

Value Returns a matrix containing the NMIs between the cluster assignments obtained with spectral clustering, showing the concordance between the fused network and each individual network.


Author(s) Dr. Anna Goldenberg, Bo Wang, Aziz Mezlini, Feyyaz Demir
Examples

# How to use SNF with multiple views

# Load views into list "dataL"
data(dataL)
data(label)

# Set the other parameters
K = 20      # number of neighbours
alpha = 0.5 # hyperparameter in affinityMatrix
T = 20      # number of iterations of SNF

# Normalize the features in each of the views.
#dataL = lapply(dataL, standardNormalization)

# Calculate the distances for each view
distL = lapply(dataL, function(x) (dist2(x, x)^(1/2)))

# Construct the similarity graphs
affinityL = lapply(distL, function(x) affinityMatrix(x, K, alpha))

# an example of how to use concordanceNetworkNMI
Concordance_matrix = concordanceNetworkNMI(affinityL, 3);

## The output, Concordance_matrix,
## shows the concordance between the fused network and each individual network.

Data1

Data1

Description Data1 dataset used to demonstrate the use of SNFtool.

Usage data(Data1)

Format A data frame with 200 observations on the following 2 variables.
V1    a numeric vector
V2    a numeric vector


Examples data(Data1)

Data2

Data2

Description Data2 dataset used to demonstrate the use of SNFtool.

Usage data(Data2)

Format A data frame with 200 observations on the following 2 variables.
V3    a numeric vector
V4    a numeric vector

Examples data(Data2)

dataL

dataL

Description Dataset used to provide an example of predicting the new labels with label propagation.

Usage data(dataL)

Format The format is:
List of 2
 $ : num [1:600, 1:76] 0.0659 0.0491 0.0342 0.0623 0.062 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:600] "V1" "V2" "V3" "V4" ...
  .. ..$ : NULL
 $ : int [1:600, 1:240] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:600] "V1" "V2" "V3" "V4" ...
  .. ..$ : NULL

Examples data(dataL)


displayClusters

Plot given similarity matrix by clusters

Description Visualize the clusters in given similarity matrix

Usage displayClusters(W, group)

Arguments
W        Similarity matrix
group    A vector containing the labels for each sample in W.

Value Plots given similarity matrix with patients ordered to form clusters.

Author(s) Dr. Anna Goldenberg, Bo Wang, Aziz Mezlini, Feyyaz Demir

Examples

## First, set all the parameters:
K = 20;      # number of neighbors, usually (10~30)
alpha = 0.5; # hyperparameter, usually (0.3~0.8)
T = 10;      # Number of Iterations, usually (10~20)

## Data1 is of size n x d_1,
## where n is the number of patients, d_1 is the number of genes,
## Data2 is of size n x d_2,
## where n is the number of patients, d_2 is the number of methylation
data(Data1)
data(Data2)

## Here, the simulation data (SNFdata) has two data types. They are complementary to each other.
## And two data types have the same number of points.
## The first half data belongs to the first cluster; the rest belongs to the second cluster.
truelabel = c(matrix(1,100,1),matrix(2,100,1)); ## the ground truth of the simulated data

## Calculate distance matrices
## (here we calculate Euclidean Distance, you can use other distance, e.g,correlation)

## If the data are all continuous values, we recommend the users to perform
## standard normalization before using SNF,
## though it is optional depending on the data the users want to use.
# Data1 = standardNormalization(Data1);
# Data2 = standardNormalization(Data2);

## Calculate the pair-wise distance;
## If the data is continuous, we recommend to use the function "dist2" as follows
Dist1 = (dist2(as.matrix(Data1),as.matrix(Data1)))^(1/2)
Dist2 = (dist2(as.matrix(Data2),as.matrix(Data2)))^(1/2)

## next, construct similarity graphs
W1 = affinityMatrix(Dist1, K, alpha)
W2 = affinityMatrix(Dist2, K, alpha)

## These similarity graphs have complementary information about clusters.
displayClusters(W1, truelabel);
displayClusters(W2, truelabel);
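
The Value section above says the similarity matrix is plotted with samples ordered to form clusters. The base R sketch below illustrates that idea on toy data: rows and columns are reordered by group label so that each cluster appears as a block along the diagonal. The helper order_by_group and the toy matrix are assumptions for illustration. Any extra processing done by displayClusters itself is not reproduced here.

```r
# Hypothetical helper (not part of SNFtool): reorder a square similarity
# matrix so that samples with the same label are adjacent.
order_by_group <- function(W, group) {
  stopifnot(nrow(W) == ncol(W), length(group) == nrow(W))
  ord <- order(group)
  W[ord, ord, drop = FALSE]
}

# Toy similarity: 1 within a group, 0.1 between groups, samples shuffled.
set.seed(1)
group <- sample(rep(1:2, each = 5))
W_toy <- ifelse(outer(group, group, "=="), 1, 0.1)
W_sorted <- order_by_group(W_toy, group)

# After reordering, the two clusters show up as dark diagonal blocks.
image(seq_len(nrow(W_sorted)), seq_len(ncol(W_sorted)), W_sorted,
      xlab = "Samples", ylab = "Samples",
      col = grey(seq(1, 0, length.out = 100)))
```

Ordering by label only rearranges the matrix. No clustering is performed, so the quality of the blocks depends entirely on the labels supplied.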

displayClustersWithHeatmap

Display the similarity matrix by clusters with some sample information

Description Visualize the clusters present in the given similarity matrix as well as some sample information.

Usage displayClustersWithHeatmap(W, group, ColSideColors=NULL, ...)

Arguments
W                Similarity matrix
group            A numeric vector containing the group information for each sample in W, such as the result of the spectralClustering function. The order should correspond to the sample order in W.
ColSideColors    (optional) character vector of length ncol(x) containing the color names for a horizontal side bar that may be used to annotate the columns of x, used by the heatmap function, OR a character matrix with number of rows matching number of rows in x. Each column is plotted as a row similar to heatmap()’s ColSideColors by the heatmap.plus function.
...              other parameters that can be passed on to the heatmap function (if ColSideColors is NULL or a vector) or to the heatmap.plus function (if ColSideColors is a matrix)


Details
The heatmap or heatmap.plus function is used to display the similarity matrix. For representation purposes, the diagonal of the similarity matrix is set to the median value of W, the matrix is normalised, and W = W + t(W) is applied. No clustering method is run in this representation; the samples are ordered according to the group labels given in the group argument.
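
The preprocessing steps listed in the Details can be sketched in base R as follows. This is an illustration under stated assumptions, not the package code: "normalised" is taken here to mean that each row is divided by its sum, and the helper name prepare_for_heatmap is hypothetical.

```r
# Hypothetical helper (not part of SNFtool): prepare a similarity matrix
# for display following the steps described in the Details.
prepare_for_heatmap <- function(W, group) {
  W <- as.matrix(W)
  stopifnot(nrow(W) == ncol(W), length(group) == nrow(W))
  diag(W) <- median(W)  # replace the dominant diagonal by the median value
  W <- W / rowSums(W)   # assumption: "normalised" means each row sums to 1
  W <- W + t(W)         # restore symmetry after row normalisation
  ord <- order(group)   # order samples by group label, no clustering
  W[ord, ord, drop = FALSE]
}

# Toy symmetric similarity matrix with positive entries.
set.seed(1)
n <- 8
M <- matrix(runif(n * n), n, n)
W_toy <- (M + t(M)) / 2
group <- rep(1:2, times = 4)
W_plot <- prepare_for_heatmap(W_toy, group)

# Display without reordering or rescaling by the heatmap function itself.
heatmap(W_plot, Rowv = NA, Colv = NA, scale = "none")
```

Setting the diagonal to the median keeps self-similarities from dominating the colour scale, which would otherwise wash out the off-diagonal structure.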

Value
Plots the similarity matrix using the heatmap function. Samples are ordered by the clusters provided in the group argument, and sample information is displayed with a color bar if the ColSideColors argument is supplied.

Author(s) Florence Cavalli

Examples
## First, set all the parameters:
K = 20;      # number of neighbors, usually (10~30)
alpha = 0.5; # hyperparameter, usually (0.3~0.8)
T = 20;      # Number of Iterations, usually (10~20)

## Data1 is of size n x d_1,
## where n is the number of patients, d_1 is the number of genes,
## Data2 is of size n x d_2,
## where n is the number of patients, d_2 is the number of methylation
data(Data1)
data(Data2)

## Here, the simulation data (SNFdata) has two data types. They are complementary to each other.
## And two data types have the same number of points.
## The first half data belongs to the first cluster; the rest belongs to the second cluster.
truelabel = c(matrix(1,100,1),matrix(2,100,1)); ## the ground truth of the simulated data

## Calculate distance matrices
## (here we calculate Euclidean Distance, you can use other distance, e.g,correlation)

## If the data are all continuous values, we recommend the users to perform
## standard normalization before using SNF,
## though it is optional depending on the data the users want to use.
# Data1 = standardNormalization(Data1);
# Data2 = standardNormalization(Data2);

## Calculate the pair-wise distance;
## If the data is continuous, we recommend to use the function "dist2" as follows
Dist1 = (dist2(as.matrix(Data1),as.matrix(Data1)))^(1/2)
Dist2 = (dist2(as.matrix(Data2),as.matrix(Data2)))^(1/2)

## next, construct similarity graphs
W1 = affinityMatrix(Dist1, K, alpha)
