Intralinguis,cvaria,onintheaudio descrip,onofashortfilm AnnaMatamala UniversitatAutònomadeBarcelona TransMediaCataloniaresearchgroup [email protected] AIETI.AlcaládeHenares,8-10.03.17 FFI2015-64038-P(MINECO/FEDER,UE),2014SGR0027 Overview • Audiodescrip,onresearch • Corpusstudiesandaudiodescrip,on • TheVIWproject • Acomparisonofstudentsandprofessionals’audio descrip,ons • Conclusions 2 AD research • Theore,calapproaches(narratology,cogni,vestudies) • Experimentalresearch • Researchontechnology,training,etc. • Descrip,veapproachesmostlybasedoncasestudies 3 Audio description and corpora • TIWO(Salway2007) • TRACCE(JiménezHurtadoetal.2010) • MPIIMovieDescrip,ondataset(Rohrbachetal.2015) • PearTreeProject(MazurandKruger2012),inspiredby Chafe(1980) • Reviersetal.(2015) • VIW(Matamala&Villegas2016) 4 The VIW project • Shortfilmcommissionedtoafilmdirector(guidelines basedonliteraturereview):“Whathappenswhile---”,by NúriaNia,inEnglish. • DubbedintoCatalanandSpanishinprofessionalstudio • Professionalaudiodescrip,ons(10x3)andstudents’ audiodescrip,ons(10+7).Total:+30,000words. 5 The corpus hZp://pagines.uab.cat/viw/ LinkedtoUAB’sopenaccessrepository 6 7 8 9 Processing the materials mp4 txt eaffile web app Ling. Ling. Annotated Ling. Annotated textAnnotated text text conll2eaf 10 Segmenting and processing • Linguis,c,ers:AD-unit,er(sentences,chunks,tokens) andCredits,er • Token:partsofspeech,lemma,andseman,cvalues • Filmic,ers:scene,shot,sound,character,text. 11 The web app: source data • Rawmaterialperproviderandpersubcorpustoimport intoELANandintoCQPweb.Alsofilmicannota,onsas eaffile. • Visualiza,onsforpre-establishedanalyses. • Accessfrompreviouspagebutalsodirectly: hhp://transmediacatalonia.uab.cat/web/ 12 13 14 Analysing the data • Mul,pleanalyses(on-going) • FocusforAIETI:comparisonoftheSpanishprofessional subcorpusandtheSpanishstudents’subcorpus(and addi,onallytoSpanishgenerallanguagecorpora) • Quan,ta,veapproachonautoma,callytaggeddata 15 AD units, sentences, and words • ProfessionalsusemoreADunitsandshowlessvariability • Professionalsusemoresentencesandwords • MeannumberofwordsperADunit:professionals12.57, students11.42 • Meannumberofwordspersentences:similarmeans (around8.4),differentfrom20.89wordsfoundin Cumbrecorpus • Nosta,s,callysignificantdifferences 16 Word classes • Similarpercentagesforwordclasseswithslight differences • Professionals:moreadjec,vesandverbs • Students:morenounsandadposi,ons • Mostfrequent:nouns,followedbyverbs. • Higherpresencethaningenerallanguagecorpus. • Moreopen-classwordsthanclosed-classwords,in contrastwithCREAcorpus 17 20 most frequent lemmas • 70%ofmostfrequentlemmasshared. • Professionals(allwordclasses):el,uno,de,y,se,a,en,con,mirar,su, por,qué,James,Rick,caminar,móvil,lo,hacia,estar,playa. • Professionals(N,V,A,Adj):mirar,James,Rick,caminar,móvil,estar, playa,alrededor),dejarhablar,no,vaso,Jess,tener,hombre,llevar, banco,haber,mano,sonido. • Students(allwordclasses):el,uno,de,y,se,a,en,mirar,con,su,por, James,qué,Rick,caminar,Jess,no,hacia,estar,playa. • Students(N,V,A,Adj):mirar,James,Rick,caminar,Jess,no,estar, playa,lado,café,dejar,hombre,buscar,año,arena,hablar,móvil, tener,alrededor,blanco. 18 Text similarity • TedPedersen’sTextSimilarityModule (transmediacatalonia.uab.cat/web/similarity/WHW-ES-Pr) • Mostpairs(81.58%)between0.30and0.39,with10%or lessatalowerorhigherend • Lesssimilar(2professionalproviders)andmoresimilar (student-professional). 19 Semantic classes • Seman,ctagsfromSuggestedUpperMergedOntology (SUMO) • Selec,veannota,onfocusingon • Verbslinkedtospa,o-temporalsenngs,movements, communica,on,andcharacterdescrip,on • Adjec,veslinkedtocharacters’mood,hearingandsight • Nounslinkedtospa,o-temporalsenngsandcharacters • Adverbslinkedtospa,o-temporalsenngs 20 Semantic classes • Mostfrequentseman,ctags: • Loca,on(professional=12.09,students=11.80) • Human,ObjectandMo,onintheprofessional • Object,AuxandSeeinginthestudents’subcorpus • Similarpercentages. 21 Possibilities of automatic tagging • Firstapproachto: • Loca,on(70%mostfrequentloca,onnounsshared,similar percentages) • Describingcolours(students=2.27%,professionals2.14%, fourmostfrequentshared,interes,ngdescrip,onofa characteraspelirrojo/rubio) • Describingcharacters(90%ofthemostfrequentwords shared) 22 Conclusions • Needforcorpus-basedresearch • Needforopenaccessdata • Possibili,esofautoma,ctagging • Interestinanalysingvaria,onbutalsocontrastwith generallanguagecorpora • Quan,ta,vefollowedbyqualita,vestudies 23 Intralinguis,cvaria,onintheaudio descrip,onofashortfilm AnnaMatamala UniversitatAutònomadeBarcelona TransMediaCataloniaresearchgroup [email protected] AIETI.AlcaládeHenares,8-10.03.17 FFI2015-64038-P(MINECO/FEDER,UE),2014SGR0027
© Copyright 2025 Paperzz