Overview of SiDCo
SiDCo (SIgned Distance COrrelation) calculates pairwise distance correlation coefficients between all columns of a .csv or .xlsx datasheet. The primary use of SiDCo is in metabolomics and lipidomics, although this site provides seamless application of signed distance correlation and partial distance correlation to any dataset.
The main advantage of distance correlation is its ability to quantify linear and non-linear correlations simultaneously, while allowing comparisons of matrices of different dimensions through the calculation of distance covariances. Because of this ability, distance correlation can be used to calculate one-to-all correlations (linear and non-linear correlations between each feature and all the other features) or one-to-one correlations (pairwise correlations between individual features). Both options are available in SiDCo.
SiDCo capitalizes on the Gaussian Graphical Model (GGM) method to determine pairwise associations while removing the confounding effects of other variables. The GGM method calculates the inverse of the distance covariance matrix to remove orthogonal contributions without any matrix shrinkage. If the distance covariance matrix is singular, this inverse is calculated using the Moore–Penrose inverse.
SiDCo is implemented in Python with an RShiny front-end. Two analytical tabs allow users to choose between signed distance correlation and partial distance correlation:
Tab dCor:
Tab pdCor:
In both cases, data are preprocessed by z-score normalization of each feature across all samples. Any missing values must be imputed by the user prior to analysis, or SiDCo will not function. The running time of the distance calculation typically scales with N², where N is the sample size. For typical datasets in metabolomics and lipidomics (~500 × 500), both dCor and pdCor calculations take less than 2 minutes. For extremely large datasets of more than 1M elements, calculations can be time-consuming.
SiDCo workflow.
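The calculation described in this overview can be illustrated with a minimal sketch. It is not SiDCo's own code: the function names are hypothetical, NumPy is assumed to be available, and each feature is assumed to be a non-constant 1-D sample without missing values. The sketch z-scores each feature, builds the N × N pairwise distance matrices (the source of the N² cost), double-centers them, and attaches the sign of Pearson's correlation to the resulting distance correlation.

```python
import numpy as np


def zscore(x):
    """Z-score normalize a 1-D sample (assumes the sample is not constant)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def centered_distances(x):
    """Pairwise absolute-difference matrix of a 1-D sample, double-centered."""
    d = np.abs(x[:, None] - x[None, :])  # N x N matrix, hence the N^2 cost
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    a = centered_distances(zscore(x))
    b = centered_distances(zscore(y))
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dvar2_x = (a * a).mean()   # squared distance variance of x
    dvar2_y = (b * b).mean()   # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    # max() guards against tiny negative values caused by rounding
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


def signed_distance_correlation(x, y):
    """Distance correlation carrying the sign of Pearson's r (illustrative)."""
    r = np.corrcoef(x, y)[0, 1]
    sign = -1.0 if r < 0 else 1.0
    return sign * distance_correlation(x, y)


x = np.linspace(-1.0, 1.0, 50)
linear = signed_distance_correlation(x, -x)      # perfectly linear, decreasing
nonlinear = distance_correlation(x, x ** 2)      # non-linear, Pearson r near 0
```

Here the decreasing linear pair yields a signed coefficient of magnitude 1 with a negative sign, while the quadratic pair still yields a clearly non-zero distance correlation even though its Pearson correlation is essentially zero.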
Preparing your data for SiDCo
SiDCo calculates distance correlation between features listed in columns using data across rows. The SiDCo input must be a single .csv or .xlsx file with features (for example, metabolites or lipids) in columns and samples in rows. The file should contain column names in the top row and row names in the first column (column A). Additional information can be included, and the user can specify the start row and start column as well as the stop row. All numeric data should be below and to the right of the specified start row and start column. If there is any non-numeric data to the right of the specified start column, the analysis will abort. Because distance correlation calculations cannot work with data that have missing values, users should impute missing data with the method most appropriate for their dataset prior to using SiDCo.
Sample Data
The sample datasets are provided in both allowed input formats (.csv and .xlsx) with features (metabolites or lipids) in columns and samples in rows. Note that column A contains group names and row 1 contains feature names. To calculate distance correlations separately for each group, the user should enter for Group 1: Start Column: B; First Row: 2 (or -1, indicating the first data row); Last Row: 31. For the analysis of Group 2, the user should enter: Start Column: B; First Row: 32; Last Row: 46 (or -1, indicating the last row).
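Before uploading, it can help to check locally that the selected block is fully numeric and has no missing values. The following is a minimal sketch using only the Python standard library. The helper names (`column_index`, `select_block`) are hypothetical and are not part of SiDCo; the sketch assumes a comma-delimited file with feature names in row 1 and follows the Start Column / First Row / Last Row convention described above, including -1 for the first data row or the last row.

```python
import csv
import io
import math


def column_index(letter):
    """Convert a spreadsheet column letter ('A', 'B', ..., 'AA') to a 0-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def select_block(csv_text, start_col="B", first_row=2, last_row=-1):
    """Return (feature_names, data) for the chosen block, or raise ValueError.

    Rows are 1-based as in a spreadsheet; row 1 is assumed to hold feature names.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    col = column_index(start_col)
    start = 2 if first_row == -1 else first_row       # -1: first data row
    stop = len(rows) if last_row == -1 else last_row  # -1: last row
    names = rows[0][col:]
    data = []
    for row_number in range(start, stop + 1):
        cells = rows[row_number - 1][col:]
        if len(cells) != len(names):
            raise ValueError(f"row {row_number}: expected {len(names)} values")
        values = []
        for name, cell in zip(names, cells):
            try:
                value = float(cell)
            except ValueError:
                raise ValueError(
                    f"row {row_number}, feature {name!r}: non-numeric or missing value {cell!r}"
                ) from None
            if not math.isfinite(value):
                raise ValueError(f"row {row_number}, feature {name!r}: value is not finite")
            values.append(value)
        data.append(values)
    return names, data


# Tiny illustrative dataset: group names in column A, feature names in row 1.
sample = "group,m1,m2\nA,1.0,2.0\nA,3.0,4.0\nB,5.0,6.0\n"
group1_names, group1 = select_block(sample, "B", 2, 3)
group2_names, group2 = select_block(sample, "B", 4, -1)
```

A `ValueError` from this check points to the row and feature that would need to be corrected or imputed before analysis.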
When troubleshooting, please review this list of common reasons for SiDCo failing to run. If you are still experiencing difficulties, please contact ldomic@uottawa.ca for further assistance. Please include your input dataset and a description of the problem that you experienced. We will reproduce the problem and provide you with a solution.
- My file loads but does not output any analysis
SiDCo accepts comma-delimited .csv files as input. Tab-delimited files (.txt) can be read but not analyzed and will not produce any results. Please convert your input data into .csv format before running SiDCo. Additionally, ensure that all values to the right of and below your start cell are numeric. Data can start from any row and column; however, all data must be numeric to the right of and down from the user-defined start point. Make sure that there are no missing data in your input, as they will prevent SiDCo calculations.
- All obtained values are zero
Revise your tolerance settings, i.e., the correlation threshold and the p-value threshold. SiDCo sets to zero all values that are below the correlation threshold or above the specified p-value. If you prefer to see all values, enter 0 for the distance correlation threshold and 1 for the p-value threshold.
- Why are there no negative values for one-to-all distance correlations?
Because Pearson correlation cannot be calculated for vectors of different lengths, it is not possible to determine a linear sign in the one-to-all distance calculation.
- Why are there no Pearson and Spearman values for one-to-all correlation?
Pearson and Spearman correlations cannot be calculated for the one-to-all set and thus cannot be included in this output.
- I get results from the dCor tab but not from the pdCor tab
pdCor analysis is based on the inversion of the distance covariance matrix. If this matrix is singular, inversion is not possible. This occurs when the input has fewer samples than features or when some features can be obtained as a linear combination of other features in the dataset. To address these issues, add more sample measurements or reduce the number of features in your pdCor analysis.
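The role of the matrix inverse in the pdCor item above can be illustrated with a minimal sketch. It is not SiDCo's own code: the function name `partial_from_covariance` is hypothetical, NumPy is assumed, and the rescaling used is the standard GGM identity, partial correlation (i, j) = -P[i, j] / sqrt(P[i, i] · P[j, j]) for precision matrix P, which is assumed here rather than taken from the SiDCo source. The sketch reports whether the input matrix is rank-deficient and falls back to the Moore–Penrose pseudo-inverse in that case, as mentioned in the overview.

```python
import numpy as np


def partial_from_covariance(cov):
    """Partial correlations from a covariance-like matrix via its (pseudo-)inverse.

    Returns (partial, singular), where singular flags a rank-deficient input.
    Assumes the diagonal of the (pseudo-)inverse is strictly positive.
    """
    cov = np.asarray(cov, dtype=float)
    singular = bool(np.linalg.matrix_rank(cov) < cov.shape[0])
    # Ordinary inverse when possible, Moore-Penrose pseudo-inverse otherwise.
    precision = np.linalg.pinv(cov) if singular else np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial, singular


# Illustrative chain: feature 0 relates to feature 2 only through feature 1.
chain = np.array([
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
])
partial, singular = partial_from_covariance(chain)

# A rank-deficient matrix: the second feature duplicates the first.
_, duplicated_is_singular = partial_from_covariance([[1.0, 1.0], [1.0, 1.0]])
```

In the chain example the marginal association between features 0 and 2 is 0.25, but their partial association is zero once feature 1 is accounted for, which is the confounding-removal effect the GGM method is used for. The second call shows how a redundant feature is flagged as a singular input.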
Cite your use of SiDCo in a publication
F. Monti, D. Stewart, A. Surendra, I. Alecu, T. Nguyen-Tran, S. A. L. Bennett, M. Čuperlović-Culf, Signed Distance Correlation (SiDCo): an online implementation of distance correlation and partial distance correlation for data-driven network analysis, Bioinformatics, Volume 39, Issue 5, May 2023, btad210,
https://doi.org/10.1093/bioinformatics/btad210
Public Server
SiDCo: https://complimet.ca/SiDCo/
Software License
SiDCo is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License v3 (or later versions) as published by the Free Software Foundation. As per the GNU General Public License, SiDCo is distributed as a bioinformatic tool to assist users WITHOUT ANY WARRANTY and without any implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. All limitations of warranty are indicated in the GNU General Public License.
Remember that the sign of the coefficient comes from Pearson's correlation.
E.g., a high coefficient with a negative sign does NOT mean a significant negative trend.
It only indicates a strong correlation for which a negative overall linear trend was also detected.