BMC Bioinformatics
Software
BioMed Central
Open Access
STEM: a tool for the analysis of short time series gene expression data
Jason Ernst* and Ziv Bar-Joseph
Address: Center for Automated Learning and Discovery, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
Email: Jason Ernst* - jernst@cs.cmu.edu; Ziv Bar-Joseph - zivbj@cs.cmu.edu
* Corresponding author
Published: 05 April 2006
BMC Bioinformatics 2006, 7:191
doi:10.1186/1471-2105-7-191
This article is available from: http://wendang.chazidian.com/1471-2105/7/191
Received: 12 December 2005
Accepted: 05 April 2006
© 2006 Ernst and Bar-Joseph; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Time series microarray experiments are widely used to study dynamical biological processes. Due to the cost of microarray experiments, and in some cases the limited availability of biological material, about 80% of microarray time series experiments are short (3–8 time points). Previously, short time series gene expression data has mainly been analyzed using more general gene expression analysis tools not designed for the unique challenges and opportunities inherent in short time series gene expression data.
Results: We introduce the Short Time-series Expression Miner (STEM), the first software program specifically designed for the analysis of short time series microarray gene expression data. STEM implements unique methods to cluster, compare, and visualize such data. STEM also supports efficient and statistically rigorous biological interpretations of short time series data through its integration with the Gene Ontology.
Conclusion: The unique algorithms STEM implements to cluster and compare short time series gene expression data, combined with its visualization capabilities and integration with the Gene Ontology, should make STEM useful in the analysis of data from a significant portion of all microarray studies. STEM is available for download for free to academic and non-profit users.
Background
Microarray time series gene expression experiments are widely used to study a range of biological processes such as the cell cycle [1], development [2], and immune response [3]. Based on an analysis of the Gene Expression Omnibus [4], approximately a third of all microarray studies involve time series experiments with three or more time points, and of these time series experiments over 80% contain no more than eight time points (Figure 1). In many cases experimental costs prevent data from more time points from being collected. In some studies, particularly clinical studies, the availability of biological material can limit the number of time points collected. Thus, even if the price of microarray experiments were to go down, short time series expression experiments would remain prevalent.
Figure 1
Distribution of microarray experiments by type. Summary of the 786 microarray datasets for human, mouse, rat, and yeast in the Gene Expression Omnibus as of August 2005. As can be seen, 27.5% of the sets are time series experiments with 3–8 time points. All of these sets were labeled as either time, development, or age in the database. An additional 1% contain other types of sequential experiments, including dose or temperature response, with 3–8 different levels.
In this paper we introduce the Short Time-series Expression Miner (STEM), the first software application designed specifically for the analysis of short time series gene expression datasets (3–8 time points). Data from short time series gene expression experiments poses unique challenges. In these experiments thousands of genes are profiled simultaneously while the number of time points is small. In such cases many genes will have the same expression pattern just by random chance. Furthermore, as with any time series experiment, there are usually few, if any, full time series repeats from which to gain statistical power. STEM uses a method of analysis that takes advantage of the number of genes being large and the number of time points being few to identify statistically significant temporal expression profiles and the genes associated with these profiles [5]. STEM also supports Gene Ontology (GO) [6] enrichment analyses for sets of genes having the same temporal expression pattern, providing the means for an efficient and statistically rigorous biological interpretation of significant temporal expression patterns. The integration of STEM with GO is bidirectional. STEM can easily determine and visualize the behavior of genes belonging to a given GO category, identifying which temporal expression profiles were enriched for genes in that category. Finally, STEM also supports the ability to compare temporal responses of genes across experimental conditions.
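To make the idea of a GO enrichment analysis for one temporal profile concrete, the following is a minimal sketch. It assumes a hypergeometric test, which is a common choice for this kind of analysis; this section does not state which test STEM itself applies. The function name and all gene counts below are hypothetical and chosen only for illustration.

```python
from math import comb

def hypergeom_enrichment_pvalue(n_total, n_category, n_profile, n_overlap):
    """Upper-tail hypergeometric p-value (an assumed test, not taken from STEM).

    Probability of seeing at least n_overlap category genes when n_profile
    genes are drawn without replacement from n_total genes, of which
    n_category are annotated with the GO category.
    """
    upper = min(n_category, n_profile)
    denom = comb(n_total, n_profile)
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_profile - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom

# Hypothetical counts: 10 genes in total, 3 annotated with the category,
# 4 assigned to the profile of interest, 2 of those 4 annotated.
p_small = hypergeom_enrichment_pvalue(10, 3, 4, 2)

# Asking for "at least 0" annotated genes covers every outcome.
p_all = hypergeom_enrichment_pvalue(10, 3, 4, 0)
```

A small p-value suggests that the profile contains more genes from the category than would be expected if genes were assigned to profiles without regard to annotation.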
The novel clustering algorithm which STEM implements for short time series expression data is briefly reviewed in the Implementation section. For a detailed discussion of the clustering algorithm, including experimental results on simulated data and a comparison with the k-means clustering algorithm on real biological data using GO, we refer the reader to [5]. The main focus of this paper is on STEM's integration with GO, its support for comparing data sets across experimental conditions, its visualization capabilities, and a comparison with related software.
To date, researchers analyzing short time series expression data relied mainly on two types of software. The first is general gene expression analysis software implementing methods which do not take advantage of the sequential information in time series data. The second is gene expression time series analysis software implementing methods primarily designed for longer time series. General methods for gene expression analysis that are frequently applied to time series expression data include popular clustering methods such as hierarchical clustering [7], k-means clustering [8], and self-organizing maps [9]. These standard clustering methods ignore the temporal dependency among successive time points. Specifically, if we were to randomly permute the order of time points, the results of these methods would not change. Two software packages available for clustering time series gene expression that implement methods that take advantage of the temporal dependency of time points are the Graphical Query Language (GQL) [10] and the Cluster Analysis of Gene Expression Dynamics (CAGED) [11]. GQL implements a clustering algorithm based on a mixture of hidden Markov models. CAGED implements a clustering algorithm based on autoregressive equations. Unlike STEM, these methods generally require the estimation of many parameters and are thus less appropriate for short time series data. Also unlike STEM, both standard clustering methods and previously suggested temporal analysis methods do not differentiate between real and random patterns. This is a particular problem for short time series expression data since, as mentioned above, many genes may have the same expression pattern by random chance. A detailed comparison of STEM with the software implementing methods of analysis primarily designed for longer time series appears in the Discussion section of this paper.
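The claim that standard clustering methods ignore the order of time points can be checked on a toy example. The sketch below uses Euclidean distance, on which k-means and many hierarchical clustering setups rely; the two gene profiles and the shuffled column order are made-up values for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def permute(profile, order):
    """Reorder the time points (columns) of one profile."""
    return [profile[i] for i in order]

# Two hypothetical genes measured at five time points.
gene_a = [0.0, 1.0, 2.0, 3.0, 4.0]  # steadily increasing
gene_b = [0.0, 2.0, 1.0, 4.0, 3.0]  # zig-zag pattern

# One arbitrary reordering of the time points, applied to every gene.
order = [3, 0, 4, 1, 2]

d_original = euclidean(gene_a, gene_b)
d_shuffled = euclidean(permute(gene_a, order), permute(gene_b, order))

# The shuffled version of gene_a is no longer monotone, yet the
# distance between the two genes is the same before and after.
shuffled_a = permute(gene_a, order)
```

Because every pairwise distance is preserved when the same permutation is applied to all genes, a method driven only by these distances produces the same grouping of genes for any ordering of the time points, which is the sense in which temporal information is unused.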
STEM is freely available for download at [12] for non-commercial research purposes. A comprehensive and detailed manual is also available at [12] and as Additional file 1 to this paper.
Implementation
STEM is implemented entirely in Java and will work with any operating system supporting Java 1.4 or later. Portions of the interface of STEM are implemented using a third party library, the Java Piccolo toolkit from the University of Maryland [13]. STEM also makes use of external Gene Ontology and gene annotation files. STEM can download these files directly from the websites of the Gene Ontology [14] or European Bioinformatics Institute [15].
Figure 2
STEM input interface. The image shows the STEM input interface, which is divided into four sections. In the top section a user specifies the gene expression data and normalization options. In the second section a user specifies the gene annotation source; in this case the annotations are selected to be Human annotations from the European Bioinformatics Institute. In the third section a user specifies whether to use the STEM clustering method or k-means, and can also change various parameter settings. The fourth section of the interface contains the execute button.
A user of STEM first specifies a tab delimited gene expression data file as input to STEM. Next, the user specifies a gene annotation source, and may adjust default parameters through the input interface shown in Figure 2. Following the input phase, the STEM clustering algorithm executes and a new window appears displaying the clustering results (Figure 3). From this new window, a user has the option to specify a comparison data set.
The novel clustering algorithm that STEM implements takes advantage of there being only a few time points in a dataset. The clustering algorithm first selects a set of distinct and representative temporal expression profiles (which we will refer to as model profiles from now on). These model profiles are selected independent of the data. The procedure for selecting the model profiles, and theoretical guarantees that the model profiles selected are representative and distinct, appear in [5]. See Figure 3 for an example of a set of model profiles. The clustering algorithm then assigns each gene passing the filtering criteria (see Additional file 1 for details on gene filtering) to the model profile that most closely matches the gene's expression profile as determined by the correlation coefficient. Since the model profiles were selected independent of the data, the algorithm can then use a permutation test to determine which profiles have a statistically significant higher number of genes assigned. This test determines an assignment of genes to model profiles using a large number of permutations of the time points (or columns). It then uses standard hypothesis testing to determine which model profiles have significantly more genes assigned under the true ordering of time points compared to the average number assigned to the model profile in the permutation runs. Significant model profiles can either be analyzed independently, or grouped together based on similarity to form clusters of significant profiles.
Based on a reviewer's suggestion, STEM now also provides an implementation of the k-means clustering algorithm. A user thus has the option to compare directly within STEM the results of STEM's novel clustering method with those produced using k-means. A user who still prefers the k-means clustering methodology for clustering short time series data, or is interested in using k-means to cluster other types of data for which the STEM clustering method does not apply, may still be interested in using STEM's implementation of k-means in order to leverage STEM's visualization capabilities and integration with GO. The results and discussion of STEM in this paper are presented using STEM's novel clustering method. For details on using the k-means clustering algorithm with STEM see Additional file 1.
Figure 3
Example model profiles overview interface. The example data is drawn from an experiment measuring the response of gastric epithelial cells infected with the vacA-mutant strain of the pathogen Helicobacter pylori [3]. The data was sampled at five time points: 0 h, 0.5 h, 3 h, 6 h, and 12 h. The data set was filtered to contain only the 2989 genes with no missing data (though STEM can handle missing data without filtering, see Additional file 1) that exhibited a 0.8 log base two fold increase or decrease for at least one time point. The number in the top left-hand corner of a profile box is the profile ID number. The colored profiles had a statistically significant number of genes assigned. Non-white profiles of the same color represent profiles grouped into a single cluster. Clicking on one of the buttons along the bottom of the window opens a dialog window through which the profiles can be reordered by various criteria. Another button displays a table of all genes passing the filter and the profile to which they were assigned. Clicking on a profile box brings up detailed information about the profile (Figure 5).
Figure 4
Model profiles reordered interface. The profiles from Figure 3 are reordered based on actual size based p-value enrichment for genes being annotated as belonging to the GO category DNA metabolism. For each profile the number of DNA metabolism genes assigned to it and the enrichment p-value appear in the lower left corner of the profile box.
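The two steps described in the Implementation section, assigning each gene to its best-correlated model profile and then comparing the observed counts against counts obtained under permuted time points, can be sketched as follows. This is a simplified illustration under stated assumptions, not STEM's actual code: the three model profiles and six genes are made up, all permutations of the time points are enumerated (feasible only because the series is short), and a binomial upper tail stands in for the "standard hypothesis testing" that the text does not spell out. All function names are hypothetical.

```python
from itertools import permutations
from math import comb, sqrt

def pearson(x, y):
    """Pearson correlation; a flat profile is treated as correlation 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def assign(genes, profiles):
    """Number of genes whose best-correlated model profile is each profile."""
    counts = [0] * len(profiles)
    for g in genes:
        best = max(range(len(profiles)), key=lambda i: pearson(g, profiles[i]))
        counts[best] += 1
    return counts

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p); the assumed significance test."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def permutation_test(genes, profiles):
    """Observed counts, permutation-average counts, and p-values per profile."""
    n_time = len(profiles[0])
    observed = assign(genes, profiles)
    totals = [0] * len(profiles)
    n_perm = 0
    for order in permutations(range(n_time)):
        # Apply the same reordering of time points to every gene.
        shuffled = [[g[i] for i in order] for g in genes]
        for i, c in enumerate(assign(shuffled, profiles)):
            totals[i] += c
        n_perm += 1
    expected = [t / n_perm for t in totals]
    n = len(genes)
    pvalues = [binom_tail(n, e / n, o) for e, o in zip(expected, observed)]
    return observed, expected, pvalues

# Hypothetical model profiles over four time points: up, down, transient peak.
profiles = [[0, 1, 2, 3], [0, -1, -2, -3], [0, 1, 1, 0]]

# Hypothetical gene profiles (already expressed relative to time point 0).
genes = [
    [0.0, 0.9, 2.1, 3.0],
    [0.0, 1.2, 1.9, 3.3],
    [0.0, 0.8, 2.2, 2.9],
    [0.0, 1.1, 2.0, 3.1],
    [0.0, -1.0, -2.1, -2.8],
    [0.0, 1.0, 0.9, 0.1],
]

observed, expected, pvalues = permutation_test(genes, profiles)
# observed == [4, 1, 1]: four genes match "up", one "down", one "peak".
```

Because the model profiles are fixed before looking at the data, the permuted assignments give a baseline for how many genes each profile would attract by chance, and a profile is flagged only when its observed count clearly exceeds that baseline. In practice a correction for testing many profiles would also be needed; it is omitted here for brevity.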
Results
Model profiles overview interface
A screenshot of the main interface window of STEM appears in Figure 3. In this window each box corresponds to one of the model temporal expression profiles. Clicking on a profile box displays a new window, described in the next subsection, with detailed information about the profile. The colored profiles have a statistically significant number of genes assigned. Colored profiles which have the same color are all similar to each other (based on correlation coefficients, see Additional file 1 for more details). These profiles are grouped together to form a cluster of significant profiles. By default, profiles on the main window are ordered such that significant profiles appear before non-significant profiles, and among significant profiles those profiles of the same color appear next to each other. The profiles can be reordered based on the number of genes assigned, the number of genes expected, or their significance p-value. Additionally, as we discuss below, the profiles can also be reordered based on their relevance to a given GO category (Figure 4), a user defined gene set, or profile(s) from a comparison experiment. When the profiles are reordered, relevant information appears in the profile boxes.
The model overview screen is designed such that by default a user can visualize all profiles simultaneously, but as a result each profile box needs to be relatively small. At times, however, a user will be interested in focusing on a