Software

Semantics-related software

RDF2Graph

Try it in a Galaxy environment: RDF2Graph
Code on GitHub: Get the code.

Reference: van Dam, Jesse CJ, et al. “RDF2Graph a tool to recover, understand and validate the ontology of an RDF resource.” Journal of Biomedical Semantics 6.1 (2015): 1.


Empusa, a code generator for the development of ontologies

Code on GitLab: Get the code.
Reference: To be submitted


SAPP, Semantic Annotation Platform with Provenance

Code on GitLab: Get the code.
Executables: Get it here
Binaries: Get it here
Documentation: Get it here

Reference: Koehorst, J. J. et al. Comparison of 432 Pseudomonas strains through integration of genomic, functional, metabolic and expression data. Sci. Rep. 6, 38699; doi: 10.1038/srep38699 (2016).


Matrix analysis and statistical software

Matlab code for Horn’s parallel analysis. Get the code.
Matlab code for Tracy-Widom testing. Get the code.

Reference: Saccenti, E.; Timmerman, M. E., Reconsidering Horn’s parallel analysis from a Random Matrix Theory point of view. Psychometrika 2016. Get the paper.


Matlab code to generate the normalization moment for the Tracy-Widom distribution for autoscaled real matrices (correlation case). Get the code.

Reference: Saccenti, E.; Smilde, A. K.; Westerhuis, J. A.; Hendriks, M. M. W. B., Tracy–Widom statistic for the largest eigenvalue of autoscaled real matrices. Journal of Chemometrics 2011, 25, 644-652. Get the paper.


Matlab code for sample size determination in PCA. Get the code.

Reference: Saccenti, E.; Timmerman, M. E., Approaches to Sample Size Determination for Multivariate Data: Applications to PCA and PLS-DA of Omics Data. Journal of Proteome Research 2016, 15, 2379-2393. Get the paper.

 


Matlab code for Group-wise principal component analysis (GPCA). Get the code. (available in the MEDA toolbox)

Reference: Camacho, J.; Rodriguez-Gomez, R.; Saccenti, E. Group-wise principal component analysis for exploratory data analysis. Journal of Computational and Graphical Statistics 2016, in press. Get the paper.


Network analysis

R code for the PCLRC algorithm. Get the code.

Reference: Saccenti, E.; Suarez-Diez, M.; Luchinat, C.; Santucci, C.; Tenori, L., Probabilistic Networks of Blood Metabolites in Healthy Subjects As Indicators of Latent Cardiovascular Risk. Journal of Proteome Research 2014, 14, 1101-1111. Get the paper.

R code for the DECA algorithm. Get the code.

Reference: Venkatasubramanian, P. B.; Toydemir, G.; de Wit, N.; Saccenti, E.; Martins dos Santos, V. A. P.; van Baarlen, P.; Wells, J.; Mes, J., Use of Microarray Datasets to generate Caco-2-dedicated Networks and to identify Reporter Genes of Specific Pathway Activity. Submitted.

 

 

Data sets

Koehorst, J. J. et al. Comparison of 432 Pseudomonas strains through integration of genomic, functional, metabolic and expression data. Sci. Rep. 6, 38699; doi: 10.1038/srep38699 (2016).
The dataset can be found here


Benis, Nirupama; Interactions and functionalities of the gut revealed by computational approaches; Thesis (available soon)
Supplementary data; Data for Chapter 3
Supplementary data; Data for Chapter 6


Saccenti, E.; Camacho, J., Determining the number of components in principal components analysis: A comparison of statistical, crossvalidation and approximated methods. Chemometrics and Intelligent Laboratory Systems 2015, 149, Part A, 99-116. Get the paper.
Data generated under a spiked model with normal distribution; Data for scheme A
Data with correlated variables – Peres-Neto simulation; Data for scheme B
Spectroscopic-like data with varying levels of homoscedastic noise; Data for scheme C
Same as A but with skewed data distribution; Data for scheme D
Same as C but with heteroscedastic noise; Data for scheme E