Practicum in Statistical Computing (strongly recommended for MS-A3SR students)

Course Number: APSTA-GE.2352
Course Title: Practicum in Statistical Computing
Number of Credits: 1-2 (2-credit version involves additional support for intermediate programming)
Meeting Pattern: 1 hour per week, 14 weeks.
Course time: Tuesdays, 9:15-10:15am (1- or 2-credit version); an additional 50-minute session will be
arranged for those taking the course for 2 credits. Intermediate programming concepts are discussed
in this session, and online resources will be utilized prior to each class.
Instructor: Marc Scott
Course Description (~250 words or less):
This course will introduce the student to modern statistical programming and simulation using the
language R. The core skills are oriented around first understanding variables, data structures, program
flow (e.g., conditional execution, looping) and functional programming, then applying these skills to
answer interesting statistical questions involving the comparison of groups, which is core to statistical
practice. Most statistical analysis will be motivated via simulations, rather than mathematical theory.
The course content (programming and data analysis) requires significant outside reading and
programming.
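To give a flavor of what "motivated via simulations" can look like for a comparison of two groups, here is a minimal sketch of a randomization test in R. The function name `randomization_test`, its arguments, and the simulated data are hypothetical and chosen only for illustration; they are not taken from the course materials or the textbook.

```r
# Hypothetical helper (not from the course): two-sided randomization test
# for a difference in means between exactly two groups.
randomization_test <- function(y, group, n_perm = 2000) {
  group <- as.factor(group)
  stopifnot(nlevels(group) == 2, length(y) == length(group))

  # Difference in group means: second factor level minus first factor level
  mean_diff <- function(g) {
    m <- tapply(y, g, mean)
    as.numeric(m[2] - m[1])
  }

  observed <- mean_diff(group)

  # Shuffle the group labels and recompute the statistic each time
  permuted <- replicate(n_perm, mean_diff(sample(group)))

  # Share of shuffles at least as extreme as the observed difference;
  # the +1 terms count the observed labelling itself
  p_value <- (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)

  list(observed = observed, p_value = p_value)
}

# Simulated example data (an assumption, purely for illustration)
set.seed(1)
scores <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))
condition <- rep(c("control", "treatment"), each = 20)
result <- randomization_test(scores, condition)
print(result)
```

The p-value here is estimated by re-randomizing labels rather than derived from a sampling distribution, which is the sense in which simulation stands in for mathematical theory.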
Course Notes:
• Class sessions will consist roughly of four distinct parts: 1) Introduction of a programming
concept; 2) Relating that concept to solving a statistical question; 3) Exercise in class; 4)
Question and Answer regarding the exercise and brief discussion of homework assignment.
• Starting with the third class, students will be expected to make short presentations of how they
“solved” the homework exercise. These presentations will be assigned on a random basis
(anyone could be asked to present in any class).
• We require this course for MS-A3SR students who have not had formal instruction in a
computer science course such as “Introduction to Programming in Java or C” or who have no
experience with the programming language R.
• A natural sequel to this course is APSTA-GE 2017, Educational Data Science Practicum.
Course Co-requisites/Expectations:
• If the student has little prior experience with statistics, they must take APSTA-GE 2003
concurrently.
• An online R course introducing the key concepts is REQUIRED BEFORE THE FIRST CLASS. Go to
https://www.datacamp.com/courses/free-introduction-to-r and, after completing the course,
generate a PDF of certification to be HANDED IN.
• Programming, and particularly debugging, requires substantial persistence and creative
exploration and problem-solving skills. For the student who is new to this type of work, we
suggest spending some time prior to the first class exploring basic programming (any language)
with online tutorials such as those developed by the Khan Academy.
Learning Objectives:
By the end of the course, students will be able to:
1. Analyze a statistical question involving the comparison of groups using modern statistical simulation
tools.
2. Build a small library of interrelated functions in R that combine to perform the analysis and present
the results in a graphical or tabular manner.
3. Use modern, structured programming techniques, as well as self-documenting code.
4. Debug small libraries of functions.
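As a rough illustration of objectives 2 and 3, the sketch below builds three small, commented functions that work together to produce a "rough" density estimate with boxcar weights (one of the lab topics in the outline) and present it as a table. All function names, the bandwidth `h`, and the simulated data are hypothetical; this is one possible design, not the course's reference solution.

```r
# Boxcar (uniform) kernel: weight 1/2 on [-1, 1] and zero elsewhere
boxcar_weight <- function(u) {
  ifelse(abs(u) <= 1, 0.5, 0)
}

# Rough density estimate at each point of `grid`, using half-width `h`:
# f(g) = mean(K((x - g) / h)) / h
boxcar_density <- function(x, grid, h) {
  stopifnot(is.numeric(x), h > 0)
  sapply(grid, function(g) mean(boxcar_weight((x - g) / h)) / h)
}

# Present the estimate in tabular form on an evenly spaced grid
density_table <- function(x, h, n_points = 5) {
  grid <- seq(min(x) - h, max(x) + h, length.out = n_points)
  data.frame(value = grid, density = boxcar_density(x, grid, h))
}

# Simulated data (an assumption, purely for illustration)
set.seed(2)
heights <- rnorm(100, mean = 170, sd = 10)
print(density_table(heights, h = 5))
```

Keeping the kernel, the estimator, and the presentation in separate functions means each piece can be tested and debugged on its own, and a different kernel could be swapped in without touching the table code.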
Course Format: (Lecture, lab, seminar, recitation or combination); Fall offering
1 pt version: one combined lecture and lab session each week
2 pt version: same as 1 pt version, with an additional hour scheduled for questions regarding web-based
instruction in intermediate programming: https://www.datacamp.com/courses/intermediate-r and
https://www.datacamp.com/courses/writing-functions-in-r (the fees for using these resources are
currently being negotiated, but are expected to be under $25 for the term).
Course Outline (list of lectures/topics each session)
Weeks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
Topics:
• R as a mathematical scratchpad: scalars, vectors, matrices
• Basic data structures (e.g., data frame); transforming variables; missing data
• Functional programming; loops and conditional operations
• Density estimation (statistical concepts)
• Scatterplot smoothing (statistical concepts)
• Comparing two groups
• Comparison of homebrewed scatterplot smoothers
• Comparing more than two groups
• Randomization tests (intro)
• Bootstrap (parametric & nonparametric)
• NO CLASS – READING WEEK
Lab Activity:
• Quick arithmetic using vectors and matrices (e.g., sweeping)
• Descriptive statistics (means, variances, boxplots, histograms, scatterplots)
• Extending the plot function with a user-defined “wrapper” plot function.
• Write your own “rough” density estimation routine using boxcar weights.
• Program running means & medians to smooth.
• Simple linear regression (program via optim & minimize SSR rather than math); program linear fit that
minimizes MAD, not SSR.
Preparation and Assignment:
• Chapter 1 AND bring a laptop to this and every class!
• Chapter 2
• HW DUE: https://www.datacamp.com/courses/free-introduction-to-r
• Chapter 2
• HW DUE: Prob 2.2 (ab; c; de)
• Chapter 3
• HW DUE: Prob 3.2abc; 3.3 (ab; cd)
• Chapter 4
• HW DUE: Prob 3.1 (regular; compare to homebrewed boxcar and polygon.freq [in agricolae lib])
• PROJ 1 DUE: variant on prob 4.2
• Chapter 5
• HW DUE: 5.2 (split)
• Chapter 6
• Chapter 7
• HW DUE: 6.1 (team); 6.2 (team)
• PROJ 2 DUE: power analysis (sim & AUC)
• HW DUE: Prob 7.1 using parametric t-test & 7.2 using Levene Test [in library car]
• PROJ 3 DUE: problems 7.1, 7.2, 7.3, 7.4
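The lab on fitting a line "via optim" rather than by formula can be sketched roughly as follows. The helper names (`line_loss`, `fit_line`, `ssr`, `mad_loss`) and the simulated data are hypothetical, and "MAD" is read here as the mean absolute deviation of the residuals, which is an assumption about the outline's intent. `optim` is called with its default Nelder-Mead method, so the fitted values are numerical approximations.

```r
# Generic objective: apply `loss` to the residuals of the line a + b * x
line_loss <- function(par, x, y, loss) {
  residuals <- y - (par[1] + par[2] * x)
  loss(residuals)
}

ssr <- function(r) sum(r^2)            # sum of squared residuals
mad_loss <- function(r) mean(abs(r))   # assumed reading of "MAD"

# Fit a line by numerically minimizing the chosen loss with optim()
fit_line <- function(x, y, loss) {
  fit <- optim(par = c(0, 0), fn = line_loss, x = x, y = y, loss = loss)
  setNames(fit$par, c("intercept", "slope"))
}

# Simulated data (an assumption, purely for illustration)
set.seed(3)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)

ssr_fit <- fit_line(x, y, ssr)
mad_fit <- fit_line(x, y, mad_loss)

# Compare both numerical fits with the closed-form least-squares answer
print(rbind(optim_ssr = ssr_fit, optim_mad = mad_fit, lm = coef(lm(y ~ x))))
```

Passing the loss as an argument means that moving from SSR to MAD changes a single function rather than the fitting code, which is the kind of modular design the project rubric asks about.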
Course Requirements
There will be 3 short projects, all involving writing R code. Students are encouraged to work together to
learn concepts.
Evaluation for this course will be weighted as follows:
• Three projects (equally weighted): 60%
• Class presentation: 20%
• Class participation: 20%
ASSIGNMENT AND GRADING DETAILS
Projects:
The three projects will be assessed for excellence in: quality of the code (well-commented; functional);
organization of the code/writing; and reproducibility/flexibility/extensibility of the code (how modular is
the design? Could the structure be reused for a slightly different problem?). To receive maximum credit
for each project, all three requirements must be satisfied.
Class Participation:
This course is highly interactive, both in terms of working and learning in teams and as a classroom.
However, interaction takes a variety of forms, ranging from one-on-one discussions to group
presentations, so that different skills are emphasized at different times. The evaluation of class
participation uses a flexible scale so that everyone can achieve the highest measure. For each class
meeting, 1 = present, 2 = responsive, 3 = active, and the overall participation grade is obtained by summing
over the class sessions.
The following system can be used to convert evaluation scales used in this course to letter grades:
Letter grade: A, B, C, D
Projects: Strong, Moderate, Weak, Inadequate
Class presentation: Exemplary, Useful, Inadequate
Class participation: Active, Responsive, Present
Required Readings and/or Text (a partial reading list is acceptable)
Zieffler, Harring, Long (2011). Comparing Groups: Randomization and Bootstrap Methods Using R.
Wiley.
There will be a number of readings, particularly user manuals and tutorials available from the web.
Academic Integrity:
All students are responsible for understanding and complying with the NYU Steinhardt Statement
on Academic Integrity. A copy is available at: http://steinhardt.nyu.edu/policies/academic_integrity.
Students with Disabilities:
Students with physical or learning disabilities are required to register with the Moses Center for
Students with Disabilities, 726 Broadway, 2nd Floor (212-998-4980; online at
http://www.nyu.edu/csd) and are required to present a letter from the Center to the instructor at the
start of the semester in order to be considered for appropriate accommodation.