BMC Bioinformatics

Software

BioMed Central

Open Access

STEM: a tool for the analysis of short time series gene expression data

Jason Ernst* and Ziv Bar-Joseph

Address: Center for Automated Learning and Discovery, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA

Email: Jason Ernst* - jernst@cs.cmu.edu; Ziv Bar-Joseph - zivbj@cs.cmu.edu

* Corresponding author

Published: 05 April 2006

BMC Bioinformatics 2006, 7:191

doi:10.1186/1471-2105-7-191

This article is available from: http://wendang.chazidian.com/1471-2105/7/191

Received: 12 December 2005

Accepted: 05 April 2006

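© 2006 Ernst and Bar-Joseph; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.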

Abstract

Background: Time series microarray experiments are widely used to study dynamical biological processes. Due to the cost of microarray experiments, and also in some cases the limited availability of biological material, about 80% of microarray time series experiments are short (3–8 time points). Previously, short time series gene expression data has been mainly analyzed using more general gene expression analysis tools not designed for the unique challenges and opportunities inherent in short time series gene expression data.

Results: We introduce the Short Time-series Expression Miner (STEM), the first software program specifically designed for the analysis of short time series microarray gene expression data. STEM implements unique methods to cluster, compare, and visualize such data. STEM also supports efficient and statistically rigorous biological interpretations of short time series data through its integration with the Gene Ontology.

Conclusion: The unique algorithms STEM implements to cluster and compare short time series gene expression data, combined with its visualization capabilities and integration with the Gene Ontology, should make STEM useful in the analysis of data from a significant portion of all microarray studies. STEM is available for download for free to academic and non-profit users.

Background

Microarray time series gene expression experiments are widely used to study a range of biological processes such as the cell cycle [1], development [2], and immune response [3]. Based on an analysis of the Gene Expression Omnibus [4], approximately a third of all microarray studies involve time series experiments with three or more time points, and of these time series experiments over 80% contain no more than eight time points (Figure 1). In many cases experimental costs prevent data from more time points from being collected. In some studies, particularly clinical studies, the availability of biological material can limit the number of time points collected. Thus, even if the price of microarray experiments were to go down, short time series expression experiments would remain prevalent.

Figure 1

Distribution of microarray experiments by type. Summary of the 786 microarray datasets for human, mouse, rat, and yeast in the Gene Expression Omnibus as of August 2005. As can be seen, 27.5% of the sets are time series experiments with 3–8 time points. All of these sets were labeled as either time, development, or age in the database. An additional 1% contains other types of sequential experiments, including dose or temperature response, with 3–8 different levels.

Note: "short time series" here means a series that involves relatively few time points.

In this paper we introduce the Short Time-series Expression Miner (STEM), the first software application designed specifically for the analysis of short time series gene expression datasets (3–8 time points). Data from short time series gene expression experiments poses unique challenges. In these experiments thousands of genes are being profiled simultaneously while the number of time points is few. In such cases many genes will have the same expression pattern just by random chance. Furthermore, as with any time series experiment, there are usually few, if any, full time series repeats from which to gain statistical power. STEM uses a method of analysis that takes advantage of the number of genes being large and the number of time points being few to identify statistically significant temporal expression profiles and the genes associated with these profiles [5]. STEM also supports Gene Ontology (GO) [6] enrichment analyses for sets of genes having the same temporal expression pattern, providing the means for an efficient and statistically rigorous biological interpretation of significant temporal expression patterns. The integration of STEM with GO is bidirectional. STEM can easily determine and visualize the behavior of genes belonging to a given GO category, identifying which temporal expression profiles were enriched for genes in that category. Finally, STEM also supports the ability to compare temporal responses of genes across experimental conditions.
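The remark that many genes share an expression pattern purely by chance can be made concrete with a back-of-the-envelope sketch. It rests on a deliberately coarse encoding that is an assumption of this example, not something taken from the paper: each profile is reduced to "up", "flat", or "down" between consecutive time points, and every coarse pattern is treated as equally likely. The function names are hypothetical.

```python
def coarse_pattern_count(num_time_points: int, moves_per_step: int = 3) -> int:
    """Count coarse patterns when each step between time points is up, flat, or down.

    Assumption (not from the paper): a profile is summarized only by the
    direction of change between consecutive time points.
    """
    if num_time_points < 2:
        raise ValueError("need at least two time points")
    return moves_per_step ** (num_time_points - 1)


def expected_genes_per_pattern(num_genes: int, num_time_points: int) -> float:
    """Average genes per coarse pattern if patterns were assigned uniformly at random."""
    return num_genes / coarse_pattern_count(num_time_points)


if __name__ == "__main__":
    # Compare short series of different lengths for a few thousand genes.
    for points in (3, 5, 8):
        print(points, coarse_pattern_count(points), expected_genes_per_pattern(3000, points))
```

Under these assumptions, five time points give only 3^4 = 81 coarse patterns, so a few thousand genes would average dozens of genes per pattern even with no biological signal. The number of patterns grows quickly with the number of time points, which is why chance agreement matters most for short series.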

The novel clustering algorithm which STEM implements for short time series expression data is briefly reviewed in the Implementation section. For a detailed discussion of the clustering algorithm, including experimental results on simulated data and a comparison with the k-means clustering algorithm on real biological data using GO, we refer the reader to [5]. The main focus of this paper is on STEM's integration with GO, its support for comparing data sets across experimental conditions, its visualization capabilities, and a comparison with related software.

To date, researchers analyzing short time series expression data relied mainly on two types of software. The first is general gene expression analysis software implementing methods which do not take advantage of the sequential information in time series data. The second is gene expression time series analysis software implementing methods primarily designed for longer time series. General methods for gene expression analysis that are frequently applied to time series expression data include popular clustering methods such as hierarchical clustering [7], k-means clustering [8], and self-organizing maps [9]. These standard clustering methods ignore the temporal dependency among successive time points. Specifically, if we were to randomly permute the order of time points, the results of these methods would not change. Two software packages available for clustering time series gene expression that implement methods that take advantage of the temporal dependency of time points are the Graphical Query Language (GQL) [10] and the Cluster Analysis of Gene Expression Dynamics (CAGED) [11]. GQL implements a clustering algorithm based on a mixture of hidden Markov models. CAGED implements a clustering algorithm based on autoregressive equations. Unlike STEM, these methods generally require the estimation of many parameters and are thus less appropriate for short time series data. Also unlike STEM, both standard clustering methods and previously suggested temporal analysis methods do not differentiate between real and random patterns. This is a particular problem for short time series expression data since, as mentioned above, many genes may have the same expression pattern by random chance. A detailed comparison of STEM with the software implementing methods of analysis primarily designed for longer time series appears in the Discussion section of this paper.
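The observation that standard clustering methods are blind to the order of time points can be checked directly. The sketch below assumes Euclidean distance as the dissimilarity measure (a common choice for k-means and hierarchical clustering, but an assumption here) and uses made-up profiles. It applies the same reordering of time points to every gene and verifies that all pairwise distances, and therefore any clustering built only on them, are unaffected.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(profiles):
    """All pairwise distances, in a fixed gene-pair order."""
    n = len(profiles)
    return [euclidean(profiles[i], profiles[j]) for i in range(n) for j in range(i + 1, n)]


def reorder_time_points(profiles, order):
    """Apply the same column (time point) reordering to every gene."""
    return [[profile[k] for k in order] for profile in profiles]


def distances_unchanged(profiles, order, tol=1e-9):
    """True if shuffling the time points leaves every pairwise distance the same."""
    before = pairwise_distances(profiles)
    after = pairwise_distances(reorder_time_points(profiles, order))
    return all(math.isclose(b, a, abs_tol=tol) for b, a in zip(before, after))


# Made-up log-ratio profiles over five time points (illustrative values only).
example_profiles = [
    [0.0, 0.5, 1.2, 1.8, 2.1],
    [0.0, -0.4, -1.0, -0.2, 0.3],
    [0.0, 0.6, 1.0, 1.9, 2.4],
]

if __name__ == "__main__":
    print(distances_unchanged(example_profiles, [3, 0, 4, 1, 2]))
```

Because the distance is a sum over time points, the order of the terms does not matter. A method that models temporal dependency would, by contrast, generally give different results after such a shuffle.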

STEM is freely available for download at [12] for non-commercial research purposes. A comprehensive and detailed manual is also available at [12] and as Additional file 1 to this paper.

Implementation

STEM is implemented entirely in Java and will work with any operating system supporting Java 1.4 or later. Portions of the interface of STEM are implemented using a third party library, the Java Piccolo toolkit from the University of Maryland [13]. STEM also makes use of external Gene Ontology and gene annotation files. STEM can download these files directly from the websites of the Gene Ontology [14] or European Bioinformatics Institute [15].

Figure 2

STEM input interface. The image shows the STEM input interface, which is divided into four sections. In the top section a user specifies the gene expression data and normalization options. In the second section a user specifies the gene annotation source; in this case the annotations are selected to be Human annotations from the European Bioinformatics Institute. In the third section a user specifies to either use the STEM clustering method or k-means, and can also change various parameter settings. The fourth section of the interface contains the execute button.

A user of STEM first specifies a tab delimited gene expression data file as input to STEM. Next, the user specifies a gene annotation source, and may adjust default parameters through the input interface shown in Figure 2. Following the input phase, the STEM clustering algorithm executes and a new window appears displaying the clustering results (Figure 3). From this new window, a user has the option to specify a comparison data set.

Figure 3

Example model profiles overview interface. The example data is drawn from an experiment measuring the response of gastric epithelial cells infected with the vacA-mutant strain of the pathogen Helicobacter pylori [3]. The data was sampled at five time points: 0 h, 0.5 h, 3 h, 6 h, and 12 h. The data set was filtered to contain only the 2989 genes with no missing data (though STEM can handle missing data without filtering, see Additional file 1) that exhibited a 0.8 log base two fold increase or decrease for at least one time point. The number in the top left-hand corner of a profile box is the profile ID number. The colored profiles had a statistically significant number of genes assigned. Non-white profiles of the same color represent profiles grouped into a single cluster. By clicking on one of the buttons along the bottom of the window, a dialog window appears by which the profiles can be reordered by various criteria. Another button displays a table of all genes passing filter and the profile to which they were assigned. Clicking on a profile box brings up detailed information about the profile (Figure 5).

Figure 4

Model profiles reordered interface. The profiles from Figure 3 are reordered based on actual size based p-value enrichment for genes being annotated as belonging to the GO category DNA metabolism. For each profile the number of DNA metabolism genes assigned to it and the enrichment p-value appear in the lower left corner of the profile box.

The novel clustering algorithm that STEM implements takes advantage of there being only a few time points in a dataset. The clustering algorithm first selects a set of distinct and representative temporal expression profiles (which we will refer to as model profiles from now on). These model profiles are selected independent of the data. The procedure for selecting the model profiles, and theoretical guarantees that the model profiles selected are representative and distinct, appear in [5]. See Figure 3 for an example of a set of model profiles. The clustering algorithm then assigns each gene passing the filtering criteria (see Additional file 1 for details on gene filtering) to the model profile that most closely matches the gene's expression profile as determined by the correlation coefficient. Since the model profiles were selected independent of the data, the algorithm can then determine which profiles have a statistically significantly higher number of genes assigned using a permutation test. This test determines an assignment of genes to model profiles using a large number of permutations of the time points (or columns). It then uses standard hypothesis testing to determine which model profiles have significantly more genes assigned under the true ordering of time points compared to the average number assigned to the model profile in the permutation runs. Significant model profiles can either be analyzed independently, or grouped together based on similarity to form clusters of significant profiles.

Based on a reviewer's suggestion, STEM now also provides an implementation of the k-means clustering algorithm. A user thus has the option to compare directly within STEM the results of STEM's novel clustering method with those produced using k-means. A user who still prefers the k-means clustering methodology for clustering short time series data, or is interested in using k-means to cluster other types of data for which the STEM clustering method does not apply, may still be interested in using STEM's implementation of k-means in order to leverage STEM's visualization capabilities and integration with GO. The results and discussion of STEM in this paper are presented using STEM's novel clustering method. For details on using the k-means clustering algorithm with STEM see Additional file 1.
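The assignment and permutation-test steps of STEM's clustering method, as described in this section, can be illustrated with a minimal sketch. It is not STEM's Java implementation: the model profiles and gene values are made up, all orderings of the time points are enumerated rather than sampled, and the binomial upper-tail probability is only one assumed way to carry out the "standard hypothesis testing" the text mentions. All function names are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def assign_gene(gene, model_profiles):
    """Index of the model profile with the highest correlation to the gene."""
    return max(range(len(model_profiles)), key=lambda i: pearson(gene, model_profiles[i]))


def count_assignments(genes, model_profiles):
    """Number of genes assigned to each model profile."""
    counts = [0] * len(model_profiles)
    for gene in genes:
        counts[assign_gene(gene, model_profiles)] += 1
    return counts


def expected_counts(genes, model_profiles):
    """Average per-profile count over every reordering of the time points."""
    orders = list(itertools.permutations(range(len(model_profiles[0]))))
    totals = [0] * len(model_profiles)
    for order in orders:
        shuffled = [[gene[k] for k in order] for gene in genes]
        for i, count in enumerate(count_assignments(shuffled, model_profiles)):
            totals[i] += count
    return [total / len(orders) for total in totals]


def binomial_upper_tail(observed, n, p):
    """P(X >= observed) for X ~ Binomial(n, p); an assumed choice of test."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(observed, n + 1))


# Made-up model profiles and genes over four time points (illustrative only).
model_profiles = [[0, 1, 2, 3], [0, -1, -2, -3], [0, 1, 0, 1]]
genes = [
    [0.0, 1.1, 2.0, 2.9],
    [0.0, 0.9, 2.2, 3.1],
    [0.0, 1.0, 1.8, 3.3],
    [0.0, -1.0, -2.1, -3.0],
]

if __name__ == "__main__":
    observed = count_assignments(genes, model_profiles)
    expected = expected_counts(genes, model_profiles)
    for i, (obs, exp) in enumerate(zip(observed, expected)):
        p_value = binomial_upper_tail(obs, len(genes), exp / len(genes))
        print(i, obs, round(exp, 2), round(p_value, 4))
```

The key design point carried over from the text is that the model profiles are fixed before looking at the data, so the same profiles can be reused unchanged for every permuted data set. A real analysis would involve thousands of genes and would need a correction for testing many profiles at once, which this sketch omits.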

Results

Model profiles overview interface

A screenshot of the main interface window of STEM appears in Figure 3. In this window each box corresponds to one of the model temporal expression profiles. Clicking on a profile box displays a new window, described in the next subsection, with detailed information about the profile. The colored profiles have a statistically significant number of genes assigned. Colored profiles which have the same color are all similar to each other (based on correlation coefficients, see Additional file 1 for more details). These profiles are grouped together to form a cluster of significant profiles. By default profiles on the main window are ordered such that significant profiles appear before non-significant profiles, and among significant profiles those profiles of the same color appear next to each other. The profiles can be reordered based on the number of genes assigned, the number of genes expected, or their significance p-value. Additionally, as we discuss below, the profiles can also be reordered based on their relevance to a given GO category (Figure 4), a user defined gene set, or profile(s) from a comparison experiment. When the profiles are reordered, relevant information appears in the profile boxes.

The model overview screen is designed such that by default a user can visualize all profiles simultaneously, but as a result each profile box needs to be relatively small. At times however, a user will be interested in focusing on a
