Words, more words… and statistics

To segment words, the brain could be using statistical methods

May 19, 2016

Picking out single words in a flow of speech is no easy task and, according to linguists, to succeed in doing it the brain might use statistical methods. A group of SISSA scientists has applied a statistics-based method for word segmentation and measured its efficacy on natural language, in 9 different languages, to discover that linguistic rhythm plays an important role. The study has just been published in the journal Developmental Science.

Have you ever racked your brains trying to make out even a single word of an uninterrupted flow of speech in a language you hardly know at all? It is naïve to think that in speech there is even the smallest of pauses between one word and the next (like the space we conventionally insert between words in writing): in actual fact, speech is almost always a continuous stream of sound. However, when we listen to our native language, word “segmentation” is an effortless process.

What are, linguists wonder, the automatic cognitive mechanisms underlying this skill? Clearly, knowledge of the vocabulary helps: memory of the sound of the single words helps us to pick them out. However, many linguists argue, there are also automatic, subconscious “low-level” mechanisms that help us even when we do not recognise the words or when, as in the case of very young children, our knowledge of the language is still only rudimentary. These mechanisms, they think, rely on the statistical analysis of the frequency (estimated based on past experience) of the syllables in each language.
One indicator that could contribute to segmentation processes is “transitional probability” (TP), which provides an estimate of the likelihood of two syllables co-occurring in the same word, based on the frequency with which they are found associated in a given language. In practice, if every time I hear the syllable “TA” it is invariably followed by the syllable “DA”, then the transitional probability for “DA”, given “TA”, is 1 (the highest). If, on the other hand, whenever I hear the syllable “BU” it is followed half of the time by the syllable “DI” and half of the time by “FI”, then the transitional probability of “DI” (and “FI”), given “BU”, is 0.5, and so forth. The cognitive system could be implicitly computing this value by relying on linguistic memory, from which it would derive the frequencies.

The study conducted by Amanda Saksida, research scientist at the International School for Advanced Studies (SISSA) in Trieste, with the collaboration of Alan Langus, SISSA research fellow, under the supervision of SISSA professor Marina Nespor, used TP to segment natural language, by using two different approaches.

Based on rhythm

Saksida’s study is based on work with corpora, that is, bodies of texts specifically collected for linguistic analysis. In the case at hand, the corpora consisted of transcriptions of the “linguistic sound environment” that infants are exposed to. “We wanted to have an example of the type of linguistic environment in which a child’s language develops”, explained Saksida. “We wondered whether a low-level mechanism such as transitional probability worked with real-life language cues, which are very different from the artificial cues normally used in the laboratory, which are more schematic and free of sources of ‘noise’. Furthermore, the question was whether the same low-level cue is equally efficient in different languages”. Saksida and colleagues used corpora of no fewer than 9 different languages, and to each they applied two different TP-based models.
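To make the idea concrete, here is a minimal sketch of how transitional probability could be estimated from a stream of syllables, together with the two thresholding strategies that the next paragraph describes. It is an illustration only, not the models used in the study: the function names, the toy syllable stream, the threshold of 0.75, and the definition of a "local minimum" as a TP value strictly lower than both of its neighbours are all assumptions made for this example. TP is computed here in its usual forward form, the count of a syllable pair divided by the count of its first syllable.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate forward TP(next | current) = count(current, next) / count(current)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable has no successor, so it is excluded from the denominators.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment_absolute(syllables, tp, threshold):
    """Absolute thresholding: boundary wherever TP falls below a fixed value.

    The threshold is a free parameter here (hypothetical, not taken from the study).
    """
    if not syllables:
        return []
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words


def segment_relative(syllables, tp):
    """Relative thresholding: boundary at each local minimum of the TP sequence.

    Assumption: a local minimum is a value strictly lower than both neighbours;
    positions at the edges of the stream are compared with their single neighbour.
    """
    if not syllables:
        return []
    values = [tp[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i, b in enumerate(syllables[1:]):
        left = values[i - 1] if i > 0 else float("inf")
        right = values[i + 1] if i + 1 < len(values) else float("inf")
        if values[i] < left and values[i] < right:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words


# Toy stream built from the syllables used in the text: "TA" is always followed
# by "DA", while "BU" is followed by "DI" half of the time and "FI" the other half.
stream = "TA DA BU DI TA DA BU FI TA DA BU DI TA DA BU FI".split()
tp = transitional_probabilities(stream)
print(tp[("TA", "DA")], tp[("BU", "DI")])  # prints: 1.0 0.5
print(segment_absolute(stream, tp, 0.75))
print(segment_relative(stream, tp))
```

On this toy stream both strategies place boundaries at the same points, right after each “BU”, where the TP dips to 0.5. On real corpora the two can disagree, since a dip that counts as a local minimum need not fall below any given fixed threshold, and vice versa.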
First they calculated the TP values for each point of the language flow for all of the corpora, and then they “segmented” the flow using two different methods. The first was based on absolute thresholding: a fixed reference TP value was established, and a boundary was identified wherever the TP fell below it. The second method was based on relative thresholding: the boundaries corresponded to the local minima of the TP function.

In all cases, Saksida and colleagues found that transitional probability was an effective tool for segmentation (49% to 86% of words identified correctly) irrespective of the segmentation algorithm used, which confirms TP efficacy. Of note, while both models proved to be quite efficient, when one model was particularly successful with one language, the alternative model always performed significantly worse.

“This cross-linguistic difference suggests that each model is better suited than the other for certain languages and vice versa. We therefore conducted further analyses to understand what linguistic features correlated with the better performance of one model over the other”, explains Saksida. The crucial dimension proved to be linguistic rhythm. “We can divide European languages into two large groups based on rhythm: stress-timed and syllable-timed”. Stress-timed languages have fewer vowels and shorter words, and include English, Slovenian and German. Syllable-timed languages contain more vowels and longer words on average, and include Italian, Spanish and Finnish. A third rhythmic group, which does not exist in Europe, is based on “morae” (a part of the syllable); Japanese is one such language. This group is known as “mora-timed” and contains even more vowels than syllable-timed languages.

The absolute threshold model proved to work best on stress-timed languages, whereas relative thresholding was better for the mora-timed ones. “It’s therefore possible that the cognitive system learns to use the segmentation algorithm that is best suited to one’s native language, and that this leads to difficulties segmenting languages belonging to another rhythmic category.
Experimental studies will clearly be necessary to test this hypothesis. We know from the scientific literature that immediately after birth infants already use rhythmic information, and we think that the strategies used to choose the most appropriate segmentation could be one of the areas in which information about rhythm is most useful”.

The study is in fact unable to say whether the cognitive system (of both adults and children) really uses this type of strategy. “Our study clearly confirms that this strategy works across a wide range of languages”, concludes Saksida. “It will now serve as a guide for laboratory experiments.”

USEFUL LINKS:
• Original paper: http://goo.gl/cOk5VD

IMAGES:
• Credits: Jev55 (Flickr: https://goo.gl/yVVdJ3)

Contact:
Press office: [email protected]
Tel: (+39) 0403787644 | (+39) 366-3677586
via Bonomea, 265
34136 Trieste

More information about SISSA: www.sissa.it