SEGtool

An R Package For Specifically Expressed Gene Detection

Zhang Qiong

Email: zhangqiong@hust.edu.cn

Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China

SEGtool is a self-adaptive, high-accuracy R package for detecting specifically expressed genes (SEGs, also known as tissue-specific genes). SEGs are essentially outliers in a given condition (for example, a treatment or a tissue). To detect such outliers, SEGtool combines a modified fuzzy c-means (FCM) algorithm, the Jaccard index, and greedy annealing. SEGtool can intelligently detect both high and low SEGs in a numeric expression dataset. It provides an HTML results page that gives an overview of the SEG information for the entire dataset.
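As general background on one of the building blocks named above, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The following is a minimal base-R sketch; the function name `jaccard_index` and the tissue names are illustrative assumptions, not part of the SEGtool API, and the way SEGtool applies the index internally is not shown here.

```r
# Illustrative helper (hypothetical name, not part of SEGtool):
# Jaccard index = size of intersection / size of union of two sets.
jaccard_index <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  if (union_size == 0) return(NA_real_)  # undefined for two empty sets
  length(intersect(a, b)) / union_size
}

# Made-up example: two groups of tissues in which a gene looks highly expressed.
group_1 <- c("liver", "kidney", "heart")
group_2 <- c("liver", "kidney", "brain")
jaccard_index(group_1, group_2)  # 2 shared / 4 in total = 0.5
```

A value of 1 means the two groups are identical, and a value of 0 means they share no members.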

The SEGtool package is easy to use and suitable for different types of expression data. It requires as input a numeric matrix of gene expression values, e.g. a processed microarray dataset (RMA- or MAS-processed values) or RNA-seq expression values (RPM, RPKM, FPKM, RSEM, etc.).
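The input format described above can be sketched in base R as follows. The gene names, tissue names, and values are made up for illustration; in practice the table would typically be read from a file, for example with `read.table()`. The checks at the end only verify that the object is a complete numeric matrix, as required above; they are not SEGtool functions.

```r
# Toy expression table: rows are genes, columns are tissues (made-up values).
expr_df <- data.frame(
  liver  = c(120.5, 0.8, 15.2),
  kidney = c(3.1, 0.9, 14.8),
  brain  = c(2.7, 95.4, 16.0),
  row.names = c("geneA", "geneB", "geneC")
)

# Convert to a numeric matrix, keeping gene and tissue names.
expr_mat <- as.matrix(expr_df)

# Sanity checks before any downstream analysis.
stopifnot(is.matrix(expr_mat), is.numeric(expr_mat), !anyNA(expr_mat))
dim(expr_mat)  # 3 genes x 3 tissues
```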

Citation:

SEGtool: a specifically expressed gene detection tool and applications in human tissue and single-cell sequencing data. Qiong Zhang, Wei Liu, Chunjie Liu, Sheng-Yan Lin, An-Yuan Guo. Briefings in Bioinformatics, 2017; doi: 10.1093/bib/bbx074.

SEGtool could be used for:

Package and Manual Download

SEGtool R Package: SEGtool_1.3.tar.gz
SEGtool Manual: SEGtool_Manual.pdf
EBI test expression dataset
GTEx test expression dataset
Single Cell Sequence (SCS) test expression dataset

System requirements

This package runs on UNIX/Linux and was developed under R 3.1.1 on Ubuntu 12.04. On Windows, the package cannot use multiple CPU cores to speed up the analysis and runs on a single core only.
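The platform difference described above matches how fork-based parallelism in R's `parallel` package behaves in general. The sketch below is an illustration under that assumption and does not show SEGtool's internal code; the helper name `choose_cores` and the toy workload are hypothetical.

```r
library(parallel)  # ships with R

# Hypothetical helper (not part of SEGtool): pick a worker count that is
# safe on the current platform.
choose_cores <- function(requested = 4L) {
  if (.Platform$OS.type == "windows") {
    return(1L)  # fork-based multi-core execution is unavailable on Windows
  }
  available <- detectCores()
  if (is.na(available)) available <- 1L
  as.integer(max(1L, min(requested, available)))
}

n_cores <- choose_cores(4L)

# mclapply() forks worker processes on UNIX-like systems; with mc.cores = 1
# it runs serially, which is the only setting it accepts on Windows.
squares <- mclapply(1:4, function(i) i^2, mc.cores = n_cores)
unlist(squares)
```

Capping the request at the number of detected cores avoids oversubscribing a small machine.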

Memory usage depends on the size of the input dataset. On an E7-4820 machine using 4 cores, SEGtool took 336 MB of RAM and 4 min to process the EBI test dataset (60533 genes in 39 tissues; each value is a float with 4 significant figures). With default parameters, SEGtool used 336 MB of RAM and 8 min to complete the SEG analysis on the GTEx dataset (56238 genes in 53 tissues). Because drawing an expression plot for each SEG is time-consuming, the default option does not draw these plots.

To run the package, R and the following R packages are required:

ggplot2 (required for single-gene expression plots)

hwriterPlus (required for the HTML report on UNIX/Linux)

parallel (ships with the standard R distribution; required for multi-CPU runs on UNIX/Linux)

pheatmap (required for heatmaps)

svglite (required for the HTML report)

All required packages can be downloaded from CRAN or Bioconductor, or via the hyperlinks above.
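Before installing SEGtool, it can help to check which of the packages listed above are already present. The following is a minimal sketch using base R only. The commented-out installation lines assume that the missing packages are available from your configured repositories and that the SEGtool source archive has been downloaded to the working directory.

```r
# Packages named in the requirements list.
required <- c("ggplot2", "hwriterPlus", "parallel", "pheatmap", "svglite")

# requireNamespace() returns FALSE instead of raising an error when a
# package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from a repository
} else {
  message("All required packages are installed.")
}

# Installing SEGtool itself from the downloaded source archive
# (assumes the file is in the working directory):
# install.packages("SEGtool_1.3.tar.gz", repos = NULL, type = "source")
```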

Procedures

  • Specific expression pattern detection, using a Tukey-biweight-modified fuzzy c-means (FCM) clustering algorithm
  • Principal Component Analysis (PCA) for the samples with SEGs
  • Cluster analysis for genes and samples
  • Representation of SEGs in different samples
  • Plotting of all analysis results
  • HTML report generation
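The first procedure refers to Tukey's biweight. As general background, the biweight function gives values near the centre of the data a weight close to 1 and gives distant outliers a weight of 0. The sketch below illustrates that idea only; the function name, the tuning constant `c = 5`, and the small offset `eps` are illustrative assumptions, and the way SEGtool combines the biweight with FCM is not reproduced here.

```r
# Tukey biweight weights (hypothetical helper, not part of SEGtool):
# values near the centre get a weight close to 1, values further than
# `c` scaled deviations away get weight 0.
tukey_biweight_weights <- function(x, c = 5, eps = 1e-4) {
  centre <- median(x)
  spread <- median(abs(x - centre))  # median absolute deviation (unscaled)
  u <- (x - centre) / (c * spread + eps)
  ifelse(abs(u) < 1, (1 - u^2)^2, 0)
}

# Made-up expression values for one gene across six tissues;
# the last tissue is an obvious outlier.
expr <- c(10.2, 9.8, 10.5, 10.1, 9.9, 85.0)
w <- tukey_biweight_weights(expr)
round(w, 3)

# One-step robust mean: the outlier has weight 0 and does not contribute.
sum(w * expr) / sum(w)
```

In this toy example the zero weight is what marks the sixth tissue as an outlier relative to the others.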

Workflow of SEGtool

Procedure

Here is an example demonstrating how to use SEGtool in the R environment.

A simple example of SEGtool usage, click here

Demo result Download (The SEGtool analysis result of EBI dataset)

Demo_result.zip

Demo HTML result (The SEGtool analysis result of EBI dataset)

Demo HTML result for SEGtool output, click here