BINF702 BIOLOGICAL DATA ANALYSIS - SPRING 2017
Instructor - Jeff Solka, 540-809-9799 (Cell), jlsolka@gmail.com
Office Hours - By Appointment
Schedule - Mondays 4:30 p.m. - 7:10 p.m. in Prince William Campus, Occoquan Rm. 204B
Texts - Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics), Hardcover, August 12, 2013
Peter Dalgaard, Introductory Statistics With R (required)
Wim P. Krijnen, Applied Statistics for Bioinformatics using R, freely available under the GNU Free Documentation License
Grading - Grades will be based on a take-home midterm, a take-home final, and an independent final project with an associated 8-10 page paper and a 15-minute class presentation. All assessments will be open book and notes. Each of these will contribute to your grade as follows:
Take home midterm (40%), Take home final (40%), Final Project and Paper (10%), and Final Project and Paper Presentation (10%)
Students will be allowed to work in teams of 2 on their projects.
Weekly homework assignments will be provided but will not be graded. Solutions to the homework assignments will be provided each week. Grading will be on the following scale: 97-100 (A+), 93-96 (A), 90-92 (A-), 87-89 (B+), 80-86 (B), less than 80 (C). Student averages will be rounded to the closest integer to determine final letter grades.
Guidance on the Course Project
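As a rough illustration of how the weights and the letter scale combine, here is a minimal Python sketch. The function names, the dictionary keys, and the sample scores are hypothetical. It also assumes that every component is scored on a 0-100 scale and that an average ending in exactly .5 rounds up; the syllabus states only that averages are rounded to the closest integer.

```python
import math

# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {"midterm": 40, "final": 40, "project_paper": 10, "presentation": 10}

# Lower bound of each letter band on the stated scale, highest first.
SCALE = [(97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (80, "B")]


def course_average(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(average):
    """Round to the closest integer, then look up the letter band."""
    rounded = math.floor(average + 0.5)  # assumption: halves round up
    for lower_bound, letter in SCALE:
        if rounded >= lower_bound:
            return letter
    return "C"  # anything below 80


# Hypothetical student scores, for illustration only.
scores = {"midterm": 88, "final": 92, "project_paper": 95, "presentation": 90}
average = course_average(scores)
print(average, letter_grade(average))  # prints: 90.5 A-
```

Because the midterm and final together carry 80% of the weight, the project paper and presentation can move the final average by at most a few points.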
Project Proposal: Student teams must prepare a brief proposal, 2-4 pages, describing the independent project and must submit this proposal no later than March 14, 2016. The proposal should be divided into four sections:
1. Background and objectives: A description of the background of the biological system and the question(s) that you hope to answer. In many cases this might involve reinvestigating a dataset that was already covered in the literature by other authors, e.g., the Golub data.
2. Computational methods: The computational methods that you intend to use to answer the question(s) in your proposal.
3. Discussion: A brief description of how you plan to evaluate the biological significance of the results of your computer analysis. It is very important in science to motivate your audience to care about your work by conveying its impact or significance.
4. Several references describing the background of your proposed project.
The proposal will not be graded, because its sole purpose is to determine whether the objectives of the project are reasonable and interesting.
Please note that the final project should be designed to test a biological hypothesis.
Final Report: The final report should be in the form of a scientific paper, divided into the following sections: (1) Abstract, (2) Background and objectives, (3) Computational methods, (4) Results and discussion, (5) Conclusions, (6) A brief description of how the conclusions of your analyses could be tested using biochemical or genetic techniques, (7) References.
References: Please follow the Cell Journal guidelines for references EXACTLY. I highly recommend that you use a referencing and bibliography software package like EndNote, Zotero, or BibTeX. (It will make your life much easier!) References in the text should include the authors' names and dates:
- One author: (Pearson, 1996)
- Two authors: (Smith and Waterman, 1981)
- Three or more authors: (Altschul et al., 1990)
- Multiple references: (Pearson, 1996; Smith and Waterman, 1981; Altschul et al., 1990)
The references in the bibliography should also adhere to the Cell Journal format:
- Journal article: Lipman, D.J., Pearson, W.R. (1985). Rapid and sensitive protein similarity searches. Science 227, 1435-1441.
- Book chapter: Schuler, G.D. (1998). Sequence alignment and database searching. In: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, AD Baxevanis and BFF Ouellette, eds. Wiley Interscience, New York, NY.
Organization: Please try to organize the information and the interpretations as clearly as possible. It is unreasonable to expect the reader to hunt through large numbers of pages to find data supporting a specific conclusion. There are two acceptable ways of organizing the figures. First, the data and text can be integrated into the body of the paper. Second, the data can be compiled into a series of clearly labeled appendices.
Figures: Every figure should have a caption adequately describing the contents of the figure without having to resort to reading the main text. There must be at least 5 figures created by the student, and at least 4 of them must be created in R.
Length: The final report should be 10-12 pages double-spaced, not including computer output or references.
Presentation: The last lecture session will be devoted to oral presentations of the projects.
Course Content
Week 1 1/23/17
Biological Data Handout
R Syntax and Development Handout
Golub Handout
Golub Paper
Krijnen Chapter 1 Exercises
Week 2 1/30/17
Data Display and Descriptive Statistics
Microarrays
Krijnen Chapter 2 Exercises
Week 3 2/6/17
Probability Problems Worksheet
Week 4 2/13/17
Statistical Distributions
Krijnen Chapter 3 Exercises
Week 5 2/20/17
Estimation and Inference
Krijnen Chapter 4 Exercises
Week 6 2/27/17
Linear Models
Krijnen Chapter 5 Exercises
ISLR Chpt. 3 Lab
Week 7 3/6/17
Microarray Analysis
Krijnen Chapter 6 Exercises
Take home Midterm Provided
Spring Break 3/13/17 -- 3/19/17
Week 8 3/20/17
Cluster Analysis
Krijnen Chapter 7 Exercises
ISLR Chpt. 10 Lab
Week 9 3/27/17
Classification Methods I
Krijnen Chapter 8 Exercises
Take home Midterm Due
Week 10 4/3/17
Classification Methods II
ISLR Chpt. 4 Lab
Week 11 4/10/17
RShiny and Sequence Analysis
Krijnen Chapter 9 Exercises
Week 12 4/17/17
Running at Scale
RShiny Lab; Running at Scale Lab
Week 13 4/24/17 Student Presentations
Take Home Final Provided
Week 14 5/1/17 Student Presentations
5/8/17 Take Home Final Due