Overview of SIDCO


The SiDCo (SIgned Distance COrrelation) application calculates pairwise distance correlation coefficients between all columns of a data sheet. SiDCo was developed primarily for metabolomics and lipidomics, but this site applies signed distance correlation and partial distance correlation to any numeric data set.

The main advantage of distance correlation is its ability to detect non-linear correlations while also allowing comparison of matrices of different dimensions through the calculation of distance covariances. Because of this, distance correlation can be used to calculate one-to-all or one-to-one correlations, both of which are provided as options on this site.
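As an illustration of the idea, the following is a minimal sketch of the standard (biased) sample distance correlation written with NumPy. It is not SiDCo's own code; the function names `_centered_distances` and `distance_correlation` are hypothetical. Because the calculation works on sample-by-sample distance matrices, the two inputs only need the same number of rows (samples), so a single feature can be compared with a block of several features.

```python
import numpy as np

def _centered_distances(x):
    """Double-centered Euclidean distance matrix between samples (rows)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a vector as a one-column matrix
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; x and y share the sample axis."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()      # squared distance covariance
    dvar_x = (a * a).mean()     # squared distance variance of x
    dvar_y = (b * b).mean()     # squared distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

# A purely quadratic relationship: Pearson is close to zero,
# while the distance correlation is clearly positive.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]
dcor_one_to_one = distance_correlation(x, y)
# One-to-all style call: one feature against a two-column block.
dcor_one_to_all = distance_correlation(x, np.column_stack([y, np.sin(3 * x)]))
```

The double-centering step builds a full sample-by-sample matrix, so memory use grows with the square of the number of samples.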


The Gaussian Graphical Model (GGM) method determines pairwise associations with the confounding effects of other variables removed. The GGM method relies on the inverse of the distance covariance matrix to remove orthogonal contributions. If the distance covariance matrix is singular, the inverse is calculated using the Moore-Penrose pseudoinverse.
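The inversion step can be sketched as follows. This is a generic partial-correlation calculation from a symmetric covariance-like matrix, not SiDCo's actual implementation; the function name `partial_correlations` is hypothetical, and how SiDCo builds the distance covariance matrix itself is not shown here. `numpy.linalg.pinv` returns the ordinary inverse for a non-singular matrix and the Moore-Penrose pseudoinverse otherwise.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlations from the (pseudo)inverse of a covariance matrix."""
    precision = np.linalg.pinv(np.asarray(cov, dtype=float))
    d = np.sqrt(np.diag(precision))
    # Standard GGM relation: pcor_ij = -P_ij / sqrt(P_ii * P_jj)
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy chain x1 - x2 - x3: x1 and x3 are related only through x2,
# so their partial correlation should vanish.
cov = np.array([[1.00, 0.50, 0.25],
                [0.50, 1.00, 0.50],
                [0.25, 0.50, 1.00]])
pcor = partial_correlations(cov)
```

In this toy example the direct x1-x3 entry of `pcor` is zero up to rounding even though the two variables have a non-zero plain correlation. The sketch assumes the diagonal of the (pseudo)inverse is strictly positive; a degenerate input would need extra handling.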


SiDCo is implemented in Python with an RShiny front end and includes:


Tab dCor:

  • distance correlation and p-value calculation, either between each feature and all the other features combined (n_i to {n_j : j ≠ i}), in the sense of one-to-all (correlation with the network), or pairwise between individual features (n_i to n_j for each j ≠ i), in the sense of one-to-one
  • sign of the Pearson correlation between features, as an indication of the overall trend in the distance correlation. The sign only indicates the overall linear trend; it does not suggest a significant linear correlation

    Output includes:
    - For one-to-all comparison, the output is an Excel file that includes the distance correlation value for each feature to all the other features and the corresponding p-value.
    - For one-to-one calculation, the output is an Excel file that includes signed distance correlation values and their p-values, as well as Pearson and Spearman correlation values and their p-values, each as a separate spreadsheet.
    Distance correlation values are set to zero if their absolute value is below the threshold value or their corresponding p-value is above the user-defined significance level.
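The signing and zeroing rules described for this tab can be sketched in a few lines of plain Python. The function name `sign_and_filter` and the default `threshold` and `alpha` values are illustrative assumptions, not SiDCo's defaults.

```python
def sign_and_filter(dcor, pearson_r, p_value, threshold=0.3, alpha=0.05):
    """Attach the Pearson sign to a distance correlation, then zero it out
    if it is below the threshold or not significant at level alpha."""
    signed = -dcor if pearson_r < 0 else dcor
    if abs(signed) < threshold or p_value > alpha:
        return 0.0
    return signed

# (distance correlation, Pearson r, p-value) for three hypothetical pairs
pairs = [(0.8, -0.1, 0.01), (0.2, 0.5, 0.01), (0.9, 0.4, 0.20)]
filtered = [sign_and_filter(d, r, p) for d, r, p in pairs]
```

The first pair keeps a negative sign even though its Pearson coefficient is weak, which is why the sign should be read only as the direction of the overall linear trend.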


Tab pCor:

  • Partial distance correlation calculated using the Gaussian Graphical Model (GGM), with p-values determined from the cumulative normal distribution function of Fisher z-transformed correlations.

    Output includes:
    - Excel spreadsheet with partial distance correlation values and corresponding p-values.
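A p-value of this kind can be sketched with the standard library alone. The scaling by the square root of (number of samples minus number of controlled variables minus 3) is the textbook standard error for Fisher-transformed partial correlations and is an assumption here, since the exact scaling used by SiDCo is not stated; the function name `fisher_z_p_value` is hypothetical.

```python
import math

def fisher_z_p_value(r, n_samples, n_controlled=0):
    """Two-sided p-value for a (partial) correlation via Fisher's z-transform."""
    dof = n_samples - n_controlled - 3
    if dof <= 0:
        raise ValueError("not enough samples for the Fisher z approximation")
    # Keep r strictly inside (-1, 1) so that atanh stays finite.
    r = max(min(r, 1.0 - 1e-12), -1.0 + 1e-12)
    z = math.atanh(r) * math.sqrt(dof)
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

p_example = fisher_z_p_value(0.5, n_samples=30)
```

For a partial correlation conditioned on all other features, `n_controlled` would typically be the number of features minus two.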


In both cases, data are preprocessed with z-score normalization of each feature across all samples. It is recommended that the dataset be imputed prior to analysis. Any remaining missing values will be filled with one fifth of the lowest measured value for the feature.
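A minimal sketch of this preprocessing with NumPy is shown below. It assumes that missing values are encoded as NaN, that imputation happens before normalization, and that the population standard deviation is used; none of these details are confirmed by the description above, and the function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(data):
    """Fill missing values with min/5 per feature, then z-score each column."""
    x = np.array(data, dtype=float)  # rows = samples, columns = features
    for j in range(x.shape[1]):
        col = x[:, j]                # view into x, so edits apply in place
        missing = np.isnan(col)
        if missing.any():
            col[missing] = np.nanmin(col) / 5.0
    std = x.std(axis=0)
    std[std == 0] = 1.0              # leave constant features at zero
    return (x - x.mean(axis=0)) / std

raw = [[1.0, 10.0],
       [float("nan"), 20.0],
       [3.0, 30.0]]
z = preprocess(raw)
```

A column that is missing in every sample has no lowest measured value and would stay NaN, so such features should be removed beforehand.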




SIDCO workflow.

Preparing your data for SIDCO


SIDCO input must be a single file in .xlsx or .csv format with features (metabolites or lipids) in columns and samples in rows. The file should contain column names in the top row, and all data below and to the right of the specified start column must be numeric. Any non-numeric data to the right of the specified start column will cause the calculation to be aborted. Distance correlation cannot be calculated for data with missing values. It is therefore recommended that users impute any missing data with the method most appropriate for their dataset before using SIDCO. Any remaining missing values will be imputed with one fifth of the lowest measured value for the feature (assuming that the value is missing because it is below the level of detection for the feature).


Sample Data


The provided example datasets illustrate the allowed input formats. In these datasets, column A contains sample group names and row 1 contains feature names. To calculate distance correlation for Group 1, the user enters: Start Column: B; First Row: 2 (or -1, indicating the first data row); Last Row: 31. To analyze Group 2, the user enters: Start Column: B; First Row: 32; Last Row: 46 (or -1, indicating the last row).
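To make the Start Column, First Row and Last Row settings concrete, the sketch below translates them into a slice of a table held as a list of rows, with the header as the first row. The helper names `column_letter_to_index` and `select_block` are hypothetical, and the interpretation of -1 follows the description above (first data row or last row).

```python
def column_letter_to_index(letter):
    """Convert a spreadsheet column letter (A, B, ..., AA) to a 0-based index."""
    letter = letter.strip().upper()
    if not letter:
        raise ValueError("empty column letter")
    index = 0
    for ch in letter:
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column letter: {letter!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1

def select_block(rows, start_column, first_row, last_row):
    """Return (feature names, numeric block) for 1-based spreadsheet rows."""
    col = column_letter_to_index(start_column)
    first = 2 if first_row == -1 else first_row          # row 1 is the header
    last = len(rows) if last_row == -1 else last_row
    header = rows[0][col:]
    block = [[float(v) for v in row[col:]] for row in rows[first - 1:last]]
    return header, block

# Toy table shaped like the example: 30 Group 1 rows, then 15 Group 2 rows.
rows = ([["group", "f1", "f2"]]
        + [["G1", "1.0", "2.0"]] * 30
        + [["G2", "3.0", "4.0"]] * 15)
names, group1 = select_block(rows, "B", 2, 31)
_, group2 = select_block(rows, "B", 32, -1)
```

The call to `float` fails on any non-numeric cell in the selected block, which mirrors the requirement that everything below and to the right of the start point be numeric.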




  1. exampleinput.xlsx
  2. exampleinput.csv

When troubleshooting, please review this list of common reasons for SIDCO failing to run. If you are still experiencing difficulties running our tool, please contact ldomic@uottawa.ca for further assistance. Please include your input dataset and a description of the problem that you experienced. We will reproduce the problem and provide you with a solution.

  1. My file loads, but when I try to download the output the system reports a problem with the output

    SIDCO only accepts comma-delimited (.csv) or .xlsx files as input. Tab-delimited files will be read but will not produce any results; please convert such data into .csv format before running SIDCO. Additionally, make sure that your column and row settings point to the numeric data that you are trying to analyze. Data can start at any row and column, but everything to the right of and below the user-defined start point must be numeric.

  2. All obtained values are zero

    Check your tolerance, i.e. the threshold settings in the input. SIDCO sets to zero any values that are below the correlation threshold or above the p-value limit.

  3. Why are there no negative correlation values in the one-to-all output?

    Pearson correlation cannot be calculated for vectors of different lengths, so the sign cannot be determined in this calculation without averaging or sampling, which could bias the result.
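The input problems described in the first item can be checked locally before uploading. The sketch below is one possible pre-flight check using plain Python; the helper names `looks_tab_delimited` and `find_non_numeric` are hypothetical, and the choice to treat empty cells as missing values rather than as errors is an assumption.

```python
def looks_tab_delimited(first_line):
    """Heuristic: more tabs than commas in the header suggests a TSV file."""
    return first_line.count("\t") > first_line.count(",")

def find_non_numeric(rows, start_col_index, first_row, last_row):
    """List (row, column index, value) for cells that cannot be read as numbers.

    rows includes the header as spreadsheet row 1; first_row and last_row are
    1-based and inclusive; start_col_index is 0-based.
    """
    problems = []
    for r in range(first_row, last_row + 1):
        cells = rows[r - 1][start_col_index:]
        for c, value in enumerate(cells, start=start_col_index):
            if value is None or str(value).strip() == "":
                continue  # empty cell: treated here as a missing value
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append((r, c, value))
    return problems

table = [["group", "f1", "f2"],
         ["g1", "1.0", "2"],
         ["g1", "x", "3"],
         ["g1", "", "4"]]
bad_cells = find_non_numeric(table, 1, 2, 4)
```

An empty result from `find_non_numeric` together with a comma-delimited header suggests that the file and the row and column settings are consistent.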

Contact Us

ldomic@uottawa.ca


Cite your use of SIDCO in a publication

F. Monti, D. Stewart, A. Surendra, I. Alecu, T. Nguyen-Tran, S.A.L. Bennett, M. Čuperlović-Culf, Signed Distance Correlation (SiDCo): an online implementation of distance correlation and partial distance correlation for data driven network identification. bioRxiv, 2022


Public Server

SIDCO: https://complimet.ca/sidco/


Software License

SIDCO is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License v3 (or later versions) as published by the Free Software Foundation. As per the GNU General Public License, SIDCO is distributed as a bioinformatic tool to assist users WITHOUT ANY WARRANTY and without any implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. All limitations of warranty are indicated in the GNU General Public License.


Remember that the sign of the coefficient comes from Pearson's correlation.

E.g., a high coefficient with a negative sign does NOT mean a significant negative trend.

It only indicates a strong correlation for which some negative overall linear trend was also detected.
