Multilingual Sentiment Analysis: From Formal to Informal and Scarce Resource Languages

Siaw Ling Lo¹, Erik Cambria²*, Raymond Chiong¹, David Cornforth¹

¹ School of Design, Communication and Information Technology, The University of Newcastle, Callaghan, NSW 2308, Australia
² School of Computer Engineering, Nanyang Technological University, 639798 Singapore
*Corresponding author. Phone: (+65) 6790 4328

Abstract
The ability to analyse online user-generated content related to sentiments (e.g., thoughts and opinions) on products or policies has become a de facto skill set for many companies and organisations. Besides the challenge of understanding formal textual content, it is also necessary to take into consideration the informal and mixed linguistic nature of online social media languages, which are often coupled with localised slang as a way to express 'true' feelings. Due to the multilingual nature of social media data, analysis based on a single official language may carry the risk of not capturing the overall sentiment of online content. While efforts have been made to understand multilingual sentiment analysis based on a range of informal languages, no significant electronic resource has been built for these localised languages. This paper reviews the various current approaches and tools used for multilingual sentiment analysis, identifies challenges along this line of research, and provides several recommendations, including a framework that is particularly applicable to dealing with scarce resource languages.

Keywords: multilingual analysis, sentiment analysis, scarce resource languages, social media

1. Introduction
Sentiment analysis has been a popular research area over the past few years. It is gaining even more attention with the prevalence of social media usage, where netizens freely and openly express their views and opinions about anything, be it a product, a policy or even a picture. Although these opinions are valuable for understanding the concerns and issues on the ground, it remains a challenge to fully decipher the message and context of online user-generated content. This is mainly due to a few key issues, such as sentence parsing, named entity recognition, anaphora resolution and concept disambiguation. It is essential to comprehend the subject and topic of any content before discerning the sentiment expressed (e.g., positive or negative). To make the matter more complicated, online sharing or social media content is known to be noisy and often mixed with linguistic variations. It is thus not surprising that sentiment analysis continues to be one of the main analytics research domains, given its many challenges as well as its promise.

Sentiment analysis for a language usually depends on manually or semi-automatically constructed lexicons [1],[2], found in dictionaries or corpora [3]. The availability of these resources enables the creation of rule-based sentiment analysis or the construction of training data for classification tasks. Although English remains the main language used in various research studies in this area (e.g., see [4],[5]), there are also efforts in creating subjectivity resources for other formal languages such as Japanese [6], Chinese [7] and German [2]. However, since creating lexical or corpus resources for a new language can be very time-consuming and resource-intensive, most of the multilingual sentiment analyses on other languages [3],[8] have relied on an available English knowledge base, such as SentiWordNet [9].
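The following minimal Python sketch makes lexicon-driven, rule-based scoring concrete. The tiny polarity, negator and intensifier lists are hypothetical; real resources such as the negation and intensifier lexicons listed in Table 2 are far larger. The rules are also an illustrative assumption, not any particular system's method: a negator flips the sign of the next content word, an intensifier multiplies its weight, and both are reset at the next non-modifier word.

```python
# Toy lexicons for illustration only; the words and weights are hypothetical.
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "really": 1.5}


def lexicon_score(text: str) -> float:
    """Sum word polarities, flipping after a negator and scaling after an intensifier."""
    score = 0.0
    negate = False
    weight = 1.0
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATORS:
            negate = True
            continue
        if word in INTENSIFIERS:
            weight *= INTENSIFIERS[word]
            continue
        if word in POLARITY:
            value = POLARITY[word] * weight
            score += -value if negate else value
        # A negator or intensifier only modifies the next non-modifier word.
        negate = False
        weight = 1.0
    return score


def label(text: str) -> str:
    """Map the summed score to a coarse polarity label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("The battery is very good"))  # positive
print(label("The screen is not good."))   # negative
```

Rules like these are cheap to write once a lexicon exists. That is why the absence of such lexicons is the main bottleneck for scarce resource languages.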
While increasing effort has been made in creating resources for other formal languages, few resources are available for languages that are not commonly used in official communication or formal news reporting, owing to their informal and evolving nature. These languages often evolve from a main national language, such as English, and are broadly used by a local community in daily conversation, both in the physical and the online world. With the popularity of social media and the freedom of expression it affords, languages with localised expressions or variants of formal languages are becoming widespread in the online environment. In addition, it is not uncommon to see a few languages being mixed to form a unique language in a multicultural society. One such example is Singlish, the colloquial Singaporean English that has incorporated elements of some Chinese dialects and the Malay language [10]. To fully understand the sentiments in such languages, it is essential to analyse them alongside other formal languages. The aim of this paper is to review sentiment analysis research in a multilingual setting, by considering not just formal but also informal and scarce resource languages used on social media, especially variants of the English language. It is of interest to examine current approaches and tools used in multilingual sentiment analysis, so that challenges can be identified and recommendations can be provided.
By scarce resource languages, we refer to those with just a basic dictionary available and/or lacking developed text processing resources (such as a translation engine). The various English variants widely used on social media belong to this category. In this paper, we first assess a range of current multilingual sentiment analysis studies based on the resources used, in terms of whether a lexicon, a corpus, a translation machine or a translator is applied, before reviewing sentiment analysis research carried out on social media. It is important to examine current approaches used in analysing social media data, given that this kind of data currently dominates the online world. Most social media messages are written in an informal manner, with linguistic variations that require different considerations compared to analysing formal reviews or news corpora, which typically consist of a single official language. Social media data analysis can be treated as understanding another 'new' language with limited resources. Here, we handle sentiment analysis of social media data separately from other scarce resource languages, as the majority of research studies on scarce resource languages have focused on a single language.

In the next section, we describe current approaches used in multilingual sentiment analysis studies. We then cover other types of studies on multilingual sentiment analysis, and list resources available for different languages. This is followed by a review of sentiment analysis research carried out on social media, before touching on research done on other scarce resource languages. After that, we put forth the challenges identified and recommendations to overcome these challenges. The recommendations include some proposed solutions and a hybrid framework for dealing with scarce resource languages. Finally, we conclude the paper.

2. Current approaches used for multilingual sentiment analysis
There are mainly two approaches in sentiment analysis: subjectivity detection and polarity detection.
Subjectivity detection is about determining whether the content contains personal views and opinions as opposed to factual information. Often, these subjective expressions stem from the culture or experience of a person or community and hence can be very 'localised' and specific to a society. As a result, subjectivity is usually studied before detailed sentiment analysis is done, since it is essential to filter out factual content to gain a better understanding of issues that are shared among netizens. Polarity detection, on the other hand, is about studying subjectivity with different polarities, intensities or rankings. Some polarity analysis studies regarded an opinion as either highly positive, positive, negative or highly negative [4], while others [11] worked on human emotions such as joy or anger.

Most subjectivity and polarity analysis studies have limited themselves to English, but with the increasing popularity of online social media worldwide, it is no longer sufficient to deal with English-language content only. In fact, only 28.6% of Internet users speak English¹. It is thus essential to explore or build resources and tools in languages other than English. Moreover, Asia now has the most Internet users (48.2%), followed by Europe (18%)². As a result, there is a growing need to work on languages such as Chinese and Japanese. Multilingual subjectivity and polarity analysis research has become more widespread, and languages that have been studied include Chinese [7],[12],[13], Japanese [14], German, Spanish, French, Italian [15], Swedish [16], Arabic [17] and Romanian [3].
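The ordering described at the start of this section can be sketched as a two-stage pipeline: objective content is filtered out first, and polarity is assigned only to what remains. This is a minimal sketch. The clue lists, the threshold rule in `is_subjective` (one strong clue or two weak clues) and the simple count in `polarity` are all assumptions made for illustration. They are loosely modelled on clue lexicons that tag entries as strong or weak, such as OpinionFinder's.

```python
# Hypothetical clue lists; a real system (e.g., OpinionFinder) uses thousands
# of clues, each tagged as strongly or weakly subjective.
STRONG_CLUES = {"love", "hate", "terrible", "awesome"}
WEAK_CLUES = {"good", "bad", "think", "feel"}
POSITIVE = {"love", "awesome", "good"}
NEGATIVE = {"hate", "terrible", "bad"}


def tokens(sentence):
    return [w.strip(".,!?") for w in sentence.lower().split()]


def is_subjective(sentence):
    """Assumed rule: one strong clue, or at least two weak clues, marks a sentence as subjective."""
    words = set(tokens(sentence))
    return len(words & STRONG_CLUES) >= 1 or len(words & WEAK_CLUES) >= 2


def polarity(sentence):
    """Count positive minus negative words among the sentence tokens."""
    words = tokens(sentence)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def analyse(sentences):
    """Stage 1 filters out objective sentences; stage 2 assigns polarity to the rest."""
    return [(s, polarity(s)) for s in sentences if is_subjective(s)]


posts = [
    "The new MRT line opens at 5.30am.",
    "I love the new MRT line!",
    "I think the food court is bad.",
]
for sentence, label in analyse(posts):
    print(label, "|", sentence)
```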
This review paper looks at the various multilingual approaches taken in the areas of subjectivity and polarity analysis, and assesses how these approaches can be applied to a scarce resource language. The general approaches for both subjectivity and polarity analyses in multilingual studies are lexicon-, corpus- or translator-based, although there are also approaches that move towards research based on concepts and sentics. Sentic computing [18] incorporates common-sense reasoning to specify affective information associated with real-world objects, actions, events and people. The various multilingual sentiment analysis approaches can be found in Table 1, and their corresponding lexicons, corpora or datasets are listed in Table 2.

2.1 Subjectivity analysis
Mihalcea et al. [3] investigated both lexicon- and corpus-based approaches for multilingual subjectivity analysis (subjectivity vs. objectivity). Their lexicon-based approach takes the lemmatised forms of English terms from OpinionFinder [5], an English subjectivity analysis system, and translates them into Romanian terms using two bilingual dictionaries. They then built a rule-based subjectivity classifier using the lexicon. The subjectivity precision of the classifier was shown to be good, although its recall was low. Within the same study, corpus-based sentence-level subjectivity analysis was conducted based on a parallel corpus consisting of 107 documents from the SemCor corpus [19]. A Naïve Bayes (NB) [20] classifier was used on the Romanian training dataset, where the annotations were projected from two OpinionFinder [5] classifiers. While the highest precision for subjective classification was obtained with the rule-based classifier using the generated lexicon, the overall best F-measure of 67.85 was produced by the NB-based statistical machine learning approach.

¹ http://www.internetworldstats.com/stats7.htm
² http://www.internetworldstats.com/stats.htm
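The corpus-based variant of Mihalcea et al. [3] can be illustrated in miniature. Labels predicted on the English side of a parallel corpus are projected onto the aligned Romanian sentences, and a Naïve Bayes classifier is then trained on the Romanian side alone. In this sketch, the English classifier (`english_subjectivity`), its clue list and the sentence pairs are hypothetical stand-ins for OpinionFinder and SemCor. `MultinomialNB` is a small hand-written multinomial Naïve Bayes, not any library's implementation.

```python
import math
from collections import Counter

# Hypothetical stand-in for an English subjectivity classifier such as OpinionFinder.
CLUES = {"love", "hate", "terrible"}


def english_subjectivity(sentence):
    return "subjective" if set(sentence.lower().split()) & CLUES else "objective"


def project_labels(parallel_pairs, source_classifier):
    """Give each target sentence the label predicted for its aligned source sentence."""
    return [(tgt, source_classifier(src)) for src, tgt in parallel_pairs]


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-alpha smoothing over whitespace tokens."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n_docs = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = sum(self.counts[c].values()) + self.alpha * len(self.vocab)
            lp = math.log(self.prior[c] / self.n_docs)
            for w in doc.lower().split():
                if w in self.vocab:  # words unseen in training are ignored
                    lp += math.log((self.counts[c][w] + self.alpha) / denom)
            return lp

        return max(self.classes, key=log_posterior)


# Toy English-Romanian sentence pairs (illustrative, not taken from SemCor).
pairs = [
    ("I love this film", "Iubesc acest film"),
    ("The film was terrible", "Filmul a fost groaznic"),
    ("I hate this report", "Urăsc acest raport"),
    ("The meeting is at noon", "Ședința este la prânz"),
    ("The report has ten pages", "Raportul are zece pagini"),
]
projected = project_labels(pairs, english_subjectivity)
model = MultinomialNB().fit([t for t, _ in projected], [y for _, y in projected])
print(model.predict("Iubesc raportul"))  # subjective
```

Only the English side needs a subjectivity classifier. Once labels are projected, the Romanian model learns directly from Romanian words, so it needs no translation at prediction time.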
Ahmad et al. [21] used a local grammar approach to extract sentiment-bearing phrases within a multilingual framework (English, Arabic and Chinese). As their focus was on sentiment analysis of financial news streams, domain-specific keywords were selected by comparing the distribution of words in domain-specific documents to the distribution of words in a general language corpus. Words from the domain-specific documents whose distribution was found to be asymmetric with respect to the general corpus were assigned as keywords. These keywords, together with their local grammar patterns, were used to extract sentiment-bearing phrases. Their experimental results showed that the local grammar patterns in all three languages considered, i.e., English, Arabic and Chinese, can be used to extract sentiment-bearing phrases. This observation is important, as it demonstrates that domain-specific keywords can transcend different language typologies (Indo-European, Sino-Asiatic and Semitic). Their manual evaluation found that the accuracy of their approach is within the 60-75% range.

It is worth noting that the approaches listed above do not determine the polarity of content but focus on the construction and detection of words or phrases containing subjectivity notions. Although subjectivity analysis does not apply directly to sentiment analysis or opinion mining, it is often the first step towards improving sentiment classification results [4]. It has been shown that distinguishing subjective from objective instances is often more challenging than the subsequent polarity classification [22],[23].

2.2 Polarity analysis
Polarity analysis is carried out at different granularities. Basic analysis involves classifying the expressed opinion of a given text (e.g., at the aspect, sentence or document level) as positive, negative or neutral. More advanced analysis deals with classification at the emotion or affective level, where different emotion states such as "joy", "anger" and so on are recognised. This review paper concentrates on the methods/approaches taken with regard to whether a lexicon, corpus or translation engine is used, and hence both the analyses of positive/negative expressions and of various emotion states are considered.
In contrast to subjectivity analysis, polarity analysis is not limited to lexicon- or corpus-based approaches. While lexical resources are still used to detect the polarity in text, machine learning approaches are more common in this type of analysis. In addition, machine translation engines or translators are often used in conjunction with various English knowledge bases. Concept-based resources such as SenticNet [11] are also used for multilingual sentiment analysis.

2.2.1 Lexicon and machine learning-based polarity analysis
One of the first studies on multilingual polarity analysis can be found in the work of Yao et al. [24], who proposed a method to determine the sentiment orientation of Chinese words by using a bilingual lexicon. Their method uses the occurrence of English sentiment words in the English interpretation of a Chinese word to predict the sentiment orientation of that Chinese word. This is achieved by calculating a sentiment vector from the English word sequence, followed by classification based on the Support Vector Machine (SVM) [25] and C4.5 [26]. The best accuracy obtained in predicting the sentiment orientation of a Chinese word is above 90%, when support vectors that do not contain any polarity words are eliminated from the classification.

Kim and Hovy [2] utilised a lexical database, i.e., WordNet [27], and three sets of manually annotated positive, negative and neutral words to build a word sentiment classifier for detecting opinions in emails. Since their opinion-bearing words are in English and the target system is in German, a statistical word alignment technique, GIZA++ [28], is applied to a parallel European Parliament corpus to acquire German-English and English-German word pairs. These word pairs are then used to build a German opinion analysis system from the English opinion-bearing words without a translation system. The precision obtained is 72% for positive emails and the recall is 80% for negative emails, but the recall for positive emails and the precision for negative emails are low.
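The sentiment-vector idea of Yao et al. [24] can be sketched as follows, assuming a hypothetical bilingual lexicon and hypothetical English polarity word lists. Each Chinese word becomes a binary vector marking which polarity words occur in its English interpretation. All-zero vectors are dropped, mirroring the elimination step described above. The final `orientation` rule is a deliberately simple stand-in for the SVM and C4.5 classifiers used in the original study.

```python
# Hypothetical English polarity word lists; their order fixes the vector layout.
POS_WORDS = ["good", "happy", "excellent", "beautiful"]
NEG_WORDS = ["bad", "sad", "terrible", "ugly"]
FEATURES = POS_WORDS + NEG_WORDS

# Hypothetical bilingual-lexicon entries: Chinese word -> English interpretation.
BILINGUAL = {
    "美丽": "beautiful; pretty; good-looking",
    "悲伤": "sad; sorrowful",
    "桌子": "table; desk",
    "优秀": "excellent; outstanding; good",
}


def sentiment_vector(interpretation):
    """Binary vector: 1 where a polarity word occurs in the English interpretation."""
    words = set(interpretation.replace(";", " ").split())
    return [1 if w in words else 0 for w in FEATURES]


def build_vectors(lexicon):
    """Drop all-zero vectors, i.e. words whose interpretation matches no polarity word."""
    vectors = {}
    for word, interpretation in lexicon.items():
        vec = sentiment_vector(interpretation)
        if any(vec):
            vectors[word] = vec
    return vectors


def orientation(vec):
    """Simple stand-in for the SVM / C4.5 classifier: compare positive and negative hits."""
    pos, neg = sum(vec[:len(POS_WORDS)]), sum(vec[len(POS_WORDS):])
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"


for word, vec in build_vectors(BILINGUAL).items():
    print(word, vec, orientation(vec))
```

Here 桌子 ("table") is discarded because its interpretation contains no polarity word. This is the case the original study found harmful to keep in training.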
In a different study, Rosell and Kann [16] constructed a Swedish general-purpose polarity lexicon with a graph-based random walk approach. Using the People's Dictionary of Synonyms [29], they extracted a large number of polarity terms from a small set of seed words, through mapping from a bilingual English-Swedish dictionary. Their random walk approach takes synonymity and path length into consideration when calculating the mean polarity value of words. They also present some example words with their polarity values.

Another lexical resource for sentiment analysis in English is SentiWordNet [30], used by Denecke [9] to detect the polarity of a document within a multilingual framework. The classification here is based on three classifiers: the LingPipe Classifier³, a SentiWordNet Classifier with classification rules, and a SentiWordNet Classifier with machine learning. These classifiers were trained using the annotated movie reviews dataset from LingPipe but evaluated on two different testing datasets. The first dataset was generated from the multi-perspective question answering (MPQA) [31] corpus, with 250 positive and 250 negative sentences selected at random. The second dataset was based on German movie reviews selected from Amazon.de, with 100 positive and 100 negative reviews translated into English. Results from the study show that the machine learning-based SentiWordNet Classifier achieved the best accuracy of 66% for German movie reviews, while the other two classifiers had similar accuracies of around 52% for English and 58% for German documents. In addition, the results suggest that the accuracy of the different methods does not depend on the processed language.
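A graph-based random walk in the spirit of Rosell and Kann [16] might look like the sketch below. The synonym graph, the seed words and the way path length is taken into account are all assumptions for illustration. Path length enters through an exponential `decay` weight on each walk that reaches a seed. The text above states only that synonymity and path length are considered.

```python
import random

# Hypothetical toy synonym graph (undirected adjacency lists).
SYNONYMS = {
    "glad": ["happy", "cheerful"],
    "cheerful": ["glad", "happy"],
    "happy": ["glad", "cheerful", "fine"],
    "fine": ["happy", "okay"],
    "okay": ["fine", "poor"],
    "poor": ["okay", "bad"],
    "bad": ["poor"],
}
# Hypothetical seed words with known polarity.
SEEDS = {"happy": 1.0, "bad": -1.0}


def walk_polarity(word, graph=SYNONYMS, seeds=SEEDS, walks=1000,
                  max_steps=20, decay=0.8, seed=42):
    """Average seed polarity reached by random walks; shorter paths get more weight."""
    if word in seeds:
        return seeds[word]
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    num = den = 0.0
    for _ in range(walks):
        node = word
        for step in range(1, max_steps + 1):
            node = rng.choice(graph[node])
            if node in seeds:
                weight = decay ** (step - 1)
                num += weight * seeds[node]
                den += weight
                break
    return num / den if den else 0.0


for w in ["glad", "okay", "poor"]:
    print(w, round(walk_polarity(w), 2))
```

Words that can only reach positive seeds ("glad") score exactly 1.0. Words sitting between the two seed regions ("okay") land near zero, which is how a small seed set can be spread over a much larger vocabulary.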
Wan [32] used the English sentiment lexicon from OpinionFinder [5] for Chinese sentiment analysis by employing machine translation and ensemble techniques. Experimental results show that using an ensemble of Chinese lexicons with English reviews translated by both Google Translate and Yahoo Babel Fish can achieve an accuracy of 0.854. Wan further extended the lexicon-based approach to a corpus-based one via a co-training method using two-way translation [8], so that the English and Chinese features can be considered as two independent views of the classification problem. Labelled English reviews are used to create labelled Chinese reviews through translation. The unlabelled Chinese reviews are paired with the labelled Chinese reviews (translated from English reviews) to form the first training dataset. The second training dataset is formed from the translated unlabelled English reviews (derived from Chinese reviews) paired with the initially labelled English reviews. The classifiers from the two training datasets are then combined into a single sentiment classifier through a co-training process. The co-training approach achieves its best accuracies of 0.775 and 0.79 for the English and Chinese classifiers, respectively. This co-training approach is useful in the absence of a parallel corpus, which is covered in the next section.

³ http://alias-i.com/lingpipe/index.html
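The co-training loop can be sketched with two views of each review: an English view and a Chinese view. In Wan's setting, one of the two views is produced by machine translation. This is a simplified sketch. It assumes a hand-written Naïve Bayes per view, pre-segmented Chinese text and toy data. Each view adds only its single most confident unlabelled example per round, whereas the original method's selection schedule may differ. Combining the views by summing log posteriors is likewise an illustrative choice.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Return a function mapping a whitespace-tokenised doc to per-class log posteriors."""
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    vocab = set().union(*counts.values())
    prior = Counter(labels)

    def log_posteriors(doc):
        scores = {}
        for c in classes:
            denom = sum(counts[c].values()) + alpha * len(vocab)
            lp = math.log(prior[c] / len(labels))
            lp += sum(math.log((counts[c][w] + alpha) / denom)
                      for w in doc.split() if w in vocab)
            scores[c] = lp
        return scores

    return log_posteriors


def predict_with_margin(model, doc):
    """Best label plus the log-posterior gap to the runner-up (a confidence proxy)."""
    ranked = sorted(model(doc).items(), key=lambda kv: kv[1], reverse=True)
    margin = ranked[0][1] - ranked[1][1] if len(ranked) > 1 else float("inf")
    return ranked[0][0], margin


def co_train(labelled, unlabelled, rounds=10):
    """labelled: [(en_view, zh_view, label)]; unlabelled: [(en_view, zh_view)]."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        for view in (0, 1):  # 0 = English view, 1 = Chinese view
            if not pool:
                break
            model = train_nb([ex[view] for ex in labelled], [ex[2] for ex in labelled])
            # Each view labels the pool example it is most confident about.
            best = max(range(len(pool)),
                       key=lambda i: predict_with_margin(model, pool[i][view])[1])
            label, _ = predict_with_margin(model, pool[best][view])
            en, zh = pool.pop(best)
            labelled.append((en, zh, label))
    en_model = train_nb([ex[0] for ex in labelled], [ex[2] for ex in labelled])
    zh_model = train_nb([ex[1] for ex in labelled], [ex[2] for ex in labelled])
    return en_model, zh_model, labelled


def classify(en_model, zh_model, en_doc, zh_doc):
    """Combine the two views by summing their per-class log posteriors."""
    en_scores, zh_scores = en_model(en_doc), zh_model(zh_doc)
    return max(en_scores, key=lambda c: en_scores[c] + zh_scores[c])


# Toy data; the Chinese side is assumed to be pre-segmented into space-separated words.
LABELLED = [
    ("great phone", "好 手机", "pos"),
    ("great camera", "好 相机", "pos"),
    ("terrible phone", "差 手机", "neg"),
    ("terrible camera", "差 相机", "neg"),
]
UNLABELLED = [("great screen", "好 屏幕"), ("terrible screen", "差 屏幕")]

en_model, zh_model, grown = co_train(LABELLED, UNLABELLED)
print(grown[-2:])
print(classify(en_model, zh_model, "great", "好"))  # pos
```

In this run, the English view labels the first unlabelled review, and the Chinese view, retrained on the enlarged set, labels the second. This hand-off of examples between views is the essence of co-training.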
2.2.2 Parallel corpus-based polarity analysis
Another type of polarity analysis uses parallel corpora to learn language characteristics without the need for a translation machine or translator. Meng et al. [33] built a generative cross-lingual mixture model (CLMM) to leverage unlabelled bilingual parallel data. The CLMM utilises words from a parallel corpus to learn about word polarity. It expands the vocabulary by estimating, through likelihood maximisation, word-generation probabilities for words that are not seen in the labelled data but are present in the parallel corpus. It is shown that the classification accuracy using only English labelled data is 71%, but the accuracy improves to 83% when both English and Chinese labelled data are used. The initially lower accuracy is probably due to the limited vocabulary coverage of machine-translated data; the parallel bilingual corpus improves the classification results by allowing previously unseen sentiment words to be learned from the large unlabelled data.

Lu et al. [34] adopted a maximum entropy-based approach to jointly learn two monolingual sentiment classifiers. Their focus is to simultaneously improve the performance of sentiment classification in a pair of languages (English and Chinese) by relying on sentiment-labelled data in each language as well as unlabelled parallel text for the language pair. It is reported that the proposed approach outperforms the monolingual baselines and improves the accuracy for both languages by 3.44%-8.12%, with the best accuracy of 83.71% achieved by the English classifier using the NTCIR parallel corpora [35],[36].
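The core intuition behind this line of work is that aligned sentences let sentiment evidence flow into words never seen in the labelled data. It can be illustrated with a much simpler procedure than the CLMM. The sketch below is not the CLMM's generative mixture model. It simply averages the polarity sign of the English side over every aligned target sentence containing a given word. The English lexicon and the sentence pairs are hypothetical, and the Chinese side is assumed to be pre-segmented.

```python
from collections import defaultdict

# Hypothetical English polarity lexicon used as the only labelled signal.
EN_POLARITY = {"good": 1, "excellent": 1, "bad": -1, "awful": -1}


def learn_target_polarity(parallel_pairs, en_polarity=EN_POLARITY):
    """Score each target word by the mean polarity sign of the English sentences aligned with it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for en, tgt in parallel_pairs:
        en_score = sum(en_polarity.get(w, 0) for w in en.lower().split())
        if en_score == 0:
            continue  # no sentiment evidence on the English side
        sign = 1.0 if en_score > 0 else -1.0
        for w in set(tgt.split()):
            totals[w] += sign
            counts[w] += 1
    return {w: totals[w] / counts[w] for w in totals}


# Toy sentence-aligned pairs; the Chinese side is assumed to be pre-segmented.
PAIRS = [
    ("the food is good", "食物 很 好"),
    ("the service is excellent", "服务 很 棒"),
    ("the room is awful", "房间 很 糟糕"),
    ("the food is bad", "食物 很 差"),
]
lexicon = learn_target_polarity(PAIRS)
print(lexicon["棒"], lexicon["糟糕"], lexicon["很"])  # 1.0 -1.0 0.0
```

Here 棒 ("great") acquires a positive score purely from its aligned English sentence. Meanwhile 很 ("very"), which co-occurs with both polarities, ends up neutral.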
2.2.3 Corpus and machine learning-based polarity analysis
In contrast to the parallel corpora approach, Prettenhofer and Stein [37] used English as the source language, and German, French and Japanese as target languages, for cross-language topic and sentiment classification. Structural Correspondence Learning (SCL) [38], originally proposed for domain adaptation, was adopted in their study. Unlabelled documents from both languages, together with pivot words or pairs of words that have predictive value, were used to create a map of the cross-lingual feature space. It is shown that their approach can reduce the relative error to 59% in sentiment classification as compared to a machine translation baseline.

Boiy and Moens [39] also did not use language translation in their work. Instead, they used manually annotated texts in three languages (English, Dutch and French) to train various machine learning algorithms for classifying whether a statement is positive, negative or neutral with regard to a certain entity. They proposed a cascading framework for the three languages, but different negation rules, discourse processing and parsing tools were used for each language. This is mainly due to the different behaviours of the languages and the fact that different machine learning algorithms also work differently. Their results show that an English corpus using unigram features augmented with linguistic features achieves an accuracy of 83%, while Dutch and French texts have lower accuracies of 70% and 68%, respectively, because of the larger variety of linguistic expressions in the two languages. The best classification results for English, Dutch and French came from Multinomial Naïve Bayes (MNB), SVM and Maximum Entropy classifiers, respectively.
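The pivot-selection step of the cross-lingual SCL approach used by Prettenhofer and Stein [37] can be sketched as follows. Candidate word pairs must be frequent in both unlabelled corpora and informative about the label on the labelled source side. This sketch assumes mutual information as the informativeness score. It also uses a hypothetical list of English-German word pairs in place of a translation resource. The subsequent SCL steps (training a linear predictor per pivot and projecting through a singular value decomposition) are not shown.

```python
import math
from collections import Counter


def mutual_information(word, docs, labels):
    """MI (in nats) between the presence of `word` in a document and its label."""
    n = len(docs)
    present = [word in d.split() for d in docs]
    mi = 0.0
    for is_present in (True, False):
        p_w = sum(1 for p in present if p == is_present) / n
        for label in sorted(set(labels)):
            joint = sum(1 for p, y in zip(present, labels)
                        if p == is_present and y == label) / n
            if joint > 0:
                p_y = labels.count(label) / n
                mi += joint * math.log(joint / (p_w * p_y))
    return mi


def select_pivots(word_pairs, src_unlabelled, tgt_unlabelled, src_docs, src_labels,
                  min_freq=2, k=2):
    """Keep translation pairs frequent in both unlabelled corpora, ranked by MI with the source label."""
    src_freq = Counter(w for d in src_unlabelled for w in d.split())
    tgt_freq = Counter(w for d in tgt_unlabelled for w in d.split())
    candidates = [(s, t) for s, t in word_pairs
                  if src_freq[s] >= min_freq and tgt_freq[t] >= min_freq]
    candidates.sort(key=lambda pair: mutual_information(pair[0], src_docs, src_labels),
                    reverse=True)
    return candidates[:k]


# Toy English-German data; the word pairs stand in for a translation resource.
PAIRS = [("good", "gut"), ("bad", "schlecht"), ("book", "buch"), ("the", "das")]
EN_UNLABELLED = ["good book", "bad book", "the good plot", "the bad ending"]
DE_UNLABELLED = ["gut buch", "schlecht buch", "das gut", "das schlecht"]
EN_DOCS = ["good book", "good plot", "bad book", "bad plot"]
EN_LABELS = ["pos", "pos", "neg", "neg"]

print(select_pivots(PAIRS, EN_UNLABELLED, DE_UNLABELLED, EN_DOCS, EN_LABELS))
```

"book"/"buch" passes the frequency filter but carries no label information, so it is ranked below the sentiment-bearing pairs. This reflects the point made in Table 1 that pivots only work when they correlate with the task on the corpus at hand.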
2.2.4 Corpus-based topic modelling polarity analysis
While most corpus-based approaches are coupled with either machine translation or parallel corpora to classify the subjectivity or polarity of a given text, Boyd-Graber and Resnik [40] developed a generative topic model known as multilingual supervised Latent Dirichlet Allocation (MS-LDA). Their approach jointly models topics that are consistent across languages, and connects them to predict sentiment ratings. MS-LDA is capable of clustering thematically coherent topics together with their sentiments without requiring parallel corpora or machine translation. It is shown that the model makes better predictions when a mix of English and German data is used than when German data alone is used. This is interesting, as the approach shows the potential of leveraging another language to improve sentiment classification results.

2.2.5 Cross-lingual and machine translation polarity analysis
Another polarity analysis approach uses cross-lingual corpora for multilingual sentiment analysis. Cross-language classification uses a source language (often annotated) as the training dataset and another language, the target language, as the testing dataset. It is not uncommon for documents from the training and testing datasets to be mapped onto non-overlapping regions of the feature space when the domains of the two sources are different. Pan et al. [41] utilised an annotated sentiment corpus in English to predict sentiment polarity in Chinese. The approach uses machine translation so that two datasets in the two languages can be created as two independent views. The two views are combined in a matrix factorisation process so that training can be done simultaneously (instead of training a series of classifiers, as in a co-training approach). In addition, lexical knowledge is incorporated into the model to improve its accuracy. Three different datasets (i.e., movie, book and music reviews) were tested in the study, and the best accuracy of 84% came from the movie reviews dataset.
Similar to Pan et al. [41], Bautin et al. [42] also used lexicons, translators and two types of corpora (i.e., multilingual news streams and parallel corpora) for sentiment analysis and cross-cultural comparison. Their focus was on comparing the diversity of different languages based on a selected entity, e.g., a politician, over a time period, and they emphasised that it is essential to apply normalisation coefficients to minimise the effect of variance across languages. The Lydia sentiment system [43] was used, and certain entities were selected for cross-language sentiment analysis using 10 days of news streams. Entity sentiment (subjectivity and polarity) was calculated for each day based on the co-occurrence of the entity with sentiment words. Even though machine translation was used in the study, the accuracy was found to be largely translator-independent. In addition, the results from a news entity frequency correlation study show that English has a significant correlation with the other eight languages investigated, which confirms its pivotal role in the multi-language analysis approach.

2.2.6 Translation-based polarity analysis
One of the reasons for using a parallel corpus is the language gap and the difference in the underlying distribution between the original language and the translated language [8],[33]. While poor performance of multilingual sentiment analysis may be due to the limitations of a machine translation system, Balahur and Turchi [44] conducted extensive evaluation scenarios to show that machine translation systems are mature enough to obtain multilingual data for supervised sentiment analysis. They quantified the effect of translation quality using three different machine translation systems. Various features, algorithms and meta-classifiers were adopted for polarity detection, and they showed that a feature representation using Term Frequency-Inverse Document Frequency (TF-IDF) of unigrams and bigrams in an SVM trained with sequential minimal optimisation produces the best result.
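As a rough illustration of the feature representation reported by Balahur and Turchi [44], the sketch below computes TF-IDF weights over unigrams and bigrams using only the Python standard library. The weighting variant (raw term frequency times log(N/df), without normalisation) is an assumption, since TF-IDF has several common formulations. The SVM trained with sequential minimal optimisation is not reproduced here.

```python
import math
from collections import Counter


def unigrams_and_bigrams(tokens):
    """Unigram features plus bigram features joined by a space."""
    return list(tokens) + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf_vectors(docs):
    """One sparse {feature: weight} dict per document, using raw tf * log(N / df)."""
    features = [unigrams_and_bigrams(d.lower().split()) for d in docs]
    df = Counter(f for feats in features for f in set(feats))
    n_docs = len(docs)
    return [
        {f: count * math.log(n_docs / df[f]) for f, count in Counter(feats).items()}
        for feats in features
    ]


docs = ["not good at all", "very good film", "not bad film"]
vectors = tfidf_vectors(docs)
print(round(vectors[0]["good"], 3), round(vectors[0]["not good"], 3))  # 0.405 1.099
```

Bigrams such as "not good" receive their own weight. This lets a linear classifier capture simple negation that unigram features alone would miss.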
Hiroshi et al. [45] also explored a translation-based approach, which includes parsing and pattern discovery for multilingual sentiment analysis. Specifically, they used transfer-based machine translation technology to develop a high-precision sentiment analysis system for the Japanese language by leveraging English sentiment resources to identify relevant sentiment units. The sentiment unit polarity extraction precision was reported to be as high as 89%.

2.2.7 Concept-based polarity analysis
While lexicon-, corpus- and translator-based approaches, or a combination of these approaches, have been used extensively for subjectivity and polarity analysis, concept-based techniques are gaining popularity due to their ability to detect subtly expressed sentiments [46]-[48]. SenticNet [11] is a widely used concept-based resource. Xia et al. [49] created a localisation toolkit for SenticNet by implementing a set of concept disambiguation algorithms to discover context. In this toolkit, Google Translate is used to map between the English and Chinese languages. Various Chinese resources are also used to discover language-dependent sentiment concepts through translation. They evaluated the toolkit based on the correctly predicted polarity of the root concept, and an agreement rate of 0.901 was achieved based on annotations from two postgraduate students.

2.2.8 Summary
In short, multilingual sentiment analysis using a parallel corpus instead of machine translation has been observed to improve classification accuracy [33],[34]. On top of that, Lu et al. [34] showed that a natural parallel corpus produces a performance gain compared to pseudo-parallel data from machine translation engines. Having said that, other researchers firmly believe that machine translation technology has matured [44], and that the techniques used in translation [45] can be applied to multilingual sentiment analysis. However, neither of these approaches works well for scarce resource languages, as parallel corpora and translation machines are virtually non-existent for such languages, and manual effort is needed to create these resources before the approaches reviewed above can be adopted.
Table1.Multilingualapproachesusedinsubjectivityandpolaritystudies *L,CandTatthetableheaderindicateiftheapproachuseslexicon,corpusortranslator-based resources,respectively.ThecorrespondingresourcescanbefoundinTable2andTable3. Approach Challenges Subjectivity Bilingualdictionary • DuetoinflectedEnglishwords, translationand lemmatisedsubjectiveEnglish rule-based termsareusedtomapentriesto classifier thebilingualdictionarybutthis maylosesubjectivity Language L* C* T* Reference English, √ Romanian [3] Approach Challenges • Ambiguityofwordsenseandpart ofspeechduetoidenticalentries • Multi-wordexpressionsthat cannotmatchthedictionary entriessoword-by-wordmatching isadopted Parallelannotation • Interpretationofdifferent projectionand languagesonsubjectivityofa statisticalclassifier sentenceduetodifferentopinions ofannotatorsandlossof informationintranslation • Difficultyincapturingsubtle expressionssuchasirony Localgrammar • Wordsenseambiguityof patterndiscovery sentimentwordsextracted anddomain • Grid-basedanalysisisproposedto specifickeywords copewithmultiplenewssources andthehugevolume Polarity Lexicon-basedto • Manualannotationisusedfor buildSupport creatingsentiment-taggedChinese Vector(SV)of words sentimentwords • ItisobservedthatSVwithzero fortheSVMand elements(nomatchinallthe C4.5 positiveornegativewords)should beeliminatedtoimprovethe classifier’sresult Lexiconand • Hugemanualeffortneededfor parallelcorpus generatingalistofsentimentwithastatistical bearingwordsbasedonWordNet wordalignment • Resultsshowthattheapproach approach recognisesnegativeemailsbetter thanpositiveemails Lexiconwithrule- • Translationerrorsormissing basedclassifier translation andmachine • Ambiguitiesanddifferent learning(Simple meaningsofasynsetin LogisticClassifier) SentiWordNetarenotresolved • Limitedabilitytorecognise negativetext;maybedueto negatedstructuresnotconsidered intheclassifier Lexicon-based • Heavilydependentonthe randomwalk dictionaryofsynonymsandthe approachon 
weightderivedfromthelinks synonymswith betweenthewords seedwordsanda bilingualdictionary Lexicon-based • Cross-linguallexicontranslation Language L* C* T* Reference English, Romanian √ [3] English, Arabic, Chinese √ [21] English, Chinese √ [24] English, German √ √ [2] English, German √ √ [9] English, Swedish √ [16] English, √ √ [32] Approach withtranslation Challenges doesnotworkwellforChinese sentimentanalysis • Volumeandqualityofbilingual paralleldataiscriticaltothe performanceofthemethod Parallelcorpusbasedvialearning sentimentwords fromthecorpus Parallelcorpus• basedusingjoint trainingontwo monolingual classifiersonthe unlabelledcorpus Corpus-basedwith • domainadaptation ofSCL • Corpus-basedbut withaspectfocus • • Corpus-basedwith • MS-LDA • • Corpus-basedand 2-waytranslation withco-training • • Corpus-basedand translationwith LingPipeclassifier • • Corpus-basedwith • translationand machinelearning • Language Chinese L* C* T* Reference English, Chinese √ [33] Itisassumedthattheperspectives English ofparallelsentencesinthecorpus Chinese arethesameandshouldhavethe samesentimentpolarity √ [34] English, German, French, Japanese √ [37] English, Dutch, French √ [39] English, German, Chinese √ [40] English, Chinese √ √ [8] English, German √ √ [9] English, Spanish, French, German √ √ [44] Itisessentialtohaveataskor domainspecificcorpusforthe approach Thepragmaticcorrelationofpivot wordsorwordpairscanonlywork onadomainspecificcross-lingual corpus Manuallyannotatetrainingdatain regardtoacertainentity Majorcauseoferrorsisthe scarcityoftrainingexampleswith informallanguagesusedonblogs Variousresourcesareneededasa bridgetolinkthedifferentcorpora Qualityandtheamountofcorpora areessentialforbetter performance Mappingthatcapturesthelocal syntaxandmeaningful collocationscanimprovethe model Inaccuracyofmachinetranslation servicecausesthedifferencein featuredistribution Learningcurveofclassifiersinthe co-trainingapproach Limitedabilitytorecognise negativetext;maybedueto 
negatedstructuresnotconsidered intheclassifier Frequencyofpolarityfeaturesand subjectivitydetectionmethodsare proposedtoimproveaccuracy Translationenginesortranslators needtobeavailableforthetarget language Multipletranslateddatafrom Approach Challenges varioustranslatorsproposedin ordertominimisethetranslation error Lexiconand • Domainspecificdatasetsareused corpus-basedwith inthestudy;itisnotknownhow translationto theapproachperformsongeneral createabi-view cross-lingualclassification non-negative • Parametersusedhaveinfluence matrixtriondifferentlanguagesandthey factorisation needtobesetmanually; model suggestedtoestimatethe parametersviaavalidationset Lexiconand • Theavailabilityoftranslatorsfor corpus-basedwith thetargetlanguage translationto • Duetothescorevarianceofeach Englishto language,itisproposedthat understandthe includingnormalisation diversityofthe coefficientsforcross-language different polaritycomparisonwillhelp languages improvetheapproach Lexiconand • corpus-basedwith patterntransfer • translationto identifyasetof sentimentunits Concept-based • withtranslation • • Coverageofpatternsisimportant fortheaccuracyoftheapproach Itisessentialtounderstandthe knowledgeandtechniquesoftext translationtoderiveparsingrules andpatterns Disambiguationalgorithmsfor identifyingthecontextwithintext Manualeffortisneededasthe polarityofsomeconceptsmaybe ‘opposite’innature Translationerrors,untranslated termsandout-of-vocabulary (OOV)concepts Language L* C* T* Reference English, Chinese √ √ √ [41] English, √ Arabic, Chinese, French, German, Italian, Japanese, Korean, Spanish English, √ Japanese √ √ [42] √ √ [45] English, Chinese √ [49] Table2.Lexiconsandcorporausedinmultilingualsentimentanalysis Language English Name OpinionFinder[5] Type Subjectivity lexicon English Negationterms Valenceshifters.tff [5],[22] 244intensifier Intensifiers2.tff[5], [22] Negation lexicon Remarks Reference 6,856uniqueentries,outof [3] which990aremulti-word expressionsandwithattributes 
–strong,weakandwordsenses (verb,adj,adv) 88negationterms [32] Intensifier lexicon 244intensifiers English [32] Language English Name SentiWordNet[30] Type Polaritylexicon English Subjectivityclues Subjclueslen1HLTEMNLP05.tff [5],[22] LingPipemovies reviews4 MPQA[31] Polaritylexicon Polaritycorpus English Multi-domain sentimentcorpus [50] NTCIROpinion AnalysisPilotTask [35],[36] NTCIR8 Multilingual OpinionAnalysis Task(MOAT)5 GeneralInquirer Categories6 WordNet[51] English ReutersRCV17 English BritishNational Corpus(BNC)8 HITIR-LabTongyici Cilin[52] HowNetChinese sentimentlexicon9 Productreviews10 English English English English English English Chinese Chinese Chinese Chinese NTCIROpinion AnalysisPilotTask [36] Remarks Trioofpolarityscoresassigned (positivity,negativityand objectivityscores);thesumof thesescoresisalways1 2718Englishpositiveand4910 negativeterms Reference [9],[44] 1000positiveand1000negative reviews 535newsarticlesfrom187 differentforeignandU.S.news sources;4,958sentences(1,471 positiveand3,487negative) 8,000Amazonproductreviews (4000positive+4000negative) [9],[39], [41] [9],[33], [34] Polaritycorpus 1,737sentences(528positive and1,209negative) [33],[34] Polaritycorpus 6,223opinionunits [44] Polarity Dictionary Vocabulary lexiconswith synonymsets Financial trainingcorpus General language Lexicaldatabase [24] [40] 800,000texts,eachcontaining 200-400words [21] 77,3443Chinesewordswithin 17,817synsets 60,000Chinesewordsand 11,000sentences 886ITproductreviews(451 positiveand435negative) 4,294sentences(2,378positive and1,916negative) [53] Polaritycorpus Polaritycorpus Polaritylexicon Polaritycorpus Polaritycorpus 4 http://alias-i.com/lingpipe/index.html http://research.nii.ac.jp/ntcir/ntcir-ws8/permission/ntcir8xinhua-nyt-moat.html 6 http://www.wjh.harvard.edu/~inquirer/homecat.htm 7 http://trec.nist.gov/data/reuters/reuters.html 8 http://www.natcorp.ox.ac.uk/corpus/ 9 http://www.keenage.com/ 10 http://www.it168.com/ 5 [32] [8],[41] [21],[40] [32],[41], 
[53] [8] [33],[34] Language Chinese Name Doubanreviews11 Type Polaritycorpus Chinese OPINMINEChinese opinionannotation corpus[54] BingonlineEnglishChinese dictionary12 English-to-Chinese dictionary LDC_CE_DIC2.0 [32] Localisationfor TaiwanandBig5 Encoding(TaBE)13 Moviereviews14 ThePeople’s Dictionary15 SemCorcorpus[19] Polaritycorpus English Romanian RomanianNLP16 Various resourcesfor RomanianNLP ChineseEnglish English, German, French, Japanese English, Chinese, German English- German StarDict17 Dictionary Cross-Lingual Sentiment(CLS) dataset18 Polaritycorpus Chinese Chinese Chinese German English Swedish English Romanian Dictionary [53] Dictionary 128,366Chinesetermsand theircorrespondingEnglish terms [32] General language [21] Polaritycorpus English-Swedish dictionary Annotated subjective parallelcorpus [40] [16] AmherstSentiment Polaritycorpus Corpus[55] Ding19 Remarks Reference Movie/music/bookreviewswith [41] 1000positiveand1000negative foreachdomain AnnotationcorpusfromNTCIR- [53] 6OpinionAnalysisTask Dictionary Parallelcorpusof107 [3] documentscoveringtopicsin sports,politics,fashion, educationandothers Corpusofnewspaperarticles [3] (50millionwords),sensetagged data(39ambiguouswords), Romanian-Englishparalleltext (1millionwords),RomanianEnglishdictionary(38,000 entries) 10bilinguallexicons [24] 800,000Amazonproduct reviewsinfourlanguages(the productcategoriesarebooks, dvdsandmusic) [37] [40] 11 http://www.douban.com/ http://cn.bing.com/dict/ 13 http://sourceforge.net/projects/libtabe/ 14 http://www.cs.colorado.edu/~jbg/static/data.html 15 http://folkets-lexikon.csc.kth.se/folkets/folkets.en.html 16 http://web.eecs.umich.edu/~mihalcea/downloads.html#romanian 17 http://goldendict.org/dictionaries.php 18 http://www.uni-weimar.de/en/media/chairs/webis/corpora/webis-cls-10/ 19 https://www-user.tu-chemnitz.de/~fri/ding/ 12 [40] Language ChineseEnglish ChineseGerman Multiple Name MDBG20 Type Dictionary Remarks Reference [40] HanDe21 Dictionary [40] Universal 
Dictionary download site²² Dictionary Web volunteer contributors – 4,500 entries in Romanian [3]
3. Other work on multilingual analysis
The scope of multilingual analysis is not restricted to subjectivity and polarity analysis; it also includes cross-language document summarisation [56] and information retrieval in web search [57], among others. A list of relevant tools for multilingual sentiment analysis is shown in Table 3, as a reference for other cross-language studies. Briefly, two types of resources are shared: translators and unlabelled parallel corpora. Besides commonly used translators such as Google Translate, commercial²³ and open-source [58] tools are also covered. A number of studies have found, after manual inspection, that Yahoo Babel Fish²⁴ produces the least 'correct' translations (e.g., see [32], [44]); hence it is often used as a baseline, either for manual correction or to limit the translation bias of a human translator [44]. Parallel corpora can be a valuable asset for learning and for overcoming cultural and linguistic diversity, so that information can be shared accurately and transparently across societies that speak different languages.
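Table 3 reports BLEU-4 scores for two of the translators. To illustrate what that metric measures, the following is a minimal Python sketch of unsmoothed sentence-level BLEU-4: clipped (modified) n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenisation and uniform weights; the MT2005 figures in Table 3 are corpus-level scores taken from [32] and are not reproduced by this sketch.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights and a brevity penalty."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Effective reference length: the reference length closest to the candidate's.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)

refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision("the the the the the the the".split(), refs, 1))  # 2/7
print(bleu("the cat is on the mat".split(), refs))  # 1.0 (exact match with a reference)
```

The clipping step is what stops a degenerate output such as the repeated "the" from scoring a unigram precision of 1.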
Table 3. Tools for multilingual analysis
Type | Name | Language | Reference
Translator | PROMT eXcellent Translation (XT) Technology²⁵ | German, English, Spanish, French, Portuguese, Italian and Russian | [9]
Translator and mapping | Google Translate²⁶ | Multiple (Chinese-to-English tasks of MT2005: BLEU-4 score~ of 0.3531 [32]) | [8],[32],[53]
Translator | Yahoo Babel Fish²⁷ | Multiple (Chinese-to-English tasks of MT2005: BLEU-4 score~ of 0.1471 [32]) | [32]
Translator | Bing Translator²⁸ | Multiple | [44]
Translator | Moses [58] | Multiple | [44]
Translator | IBM WebSphere Translation Server (WTS)²⁹ | Multiple | [42]
Unlabelled parallel corpus | ISI Chinese-English Parallel Corpus [59] | English, Chinese | [33],[34]
Unlabelled parallel corpus | Parallel Corpora of 23 Official EU Languages³⁰ | Multiple | [42],[60]
Unlabelled parallel corpus | Parallel Corpora of 21 European Languages (extracted from the proceedings of the European Parliament) [61] | Multiple | [2],[40]
~ BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the quality of machine-translated text. The score is calculated using a modified form of precision that compares the candidate translation against multiple reference translations.
20 http://www.mdbg.net/chindict/chindict.php
21 http://www.handedict.de/
22 http://www.dicts.info/uddl.php
23 http://www.promt.com/
24 http://www.babelfish.com/
25 http://www.promt.com/
26 https://translate.google.com.au/
27 http://www.babelfish.com/
28 http://www.bing.com/translator/
4. Sentiment analysis on social media
In the early days, companies and governments did not realise the power of social media until they saw the influence of word-of-mouth and how quickly it could resonate with the community and inspire the launch of a protest or a campaign for a cause [62]. Since then, sentiment analysis has expanded from a research area focused on formal languages such as English to include the informal languages used on social media. In particular, the content of tweets (posts shared on Twitter) is among the most studied, owing to the ability of tweets to propagate hot topics to a large number of users over wide geographical regions within a very short time.
However, as seen in Section 2, most sentiment analysis studies to date have used resources such as lexicons and manually labelled corpora in English. The corpora used come mainly from news [3], [31], [33], [34] and reviews [8], [9], [37], [39], [41], [63], with content written in proper English. Given that social media is becoming the mainstream mode for communicating and expressing one's thoughts on a variety of issues, it is essential to analyse the structure of social media corpora and the current approaches used for sentiment analysis and opinion mining on social media.
4.1 English sentiment analysis on social media
Even though it is common for tweets to include many linguistic variations or mixed languages (especially in multicultural societies), most sentiment analysis studies still focus on English content because of the availability of resources. Pak and Paroubek [64] collected a corpus of 300,000 text posts from Twitter for objectivity and positive/negative emotion analysis. They concluded that Twitter users use syntactic structures to describe emotions or state facts, and that part-of-speech (POS) tags may be strong indicators of emotional text. In addition, POS tags are used differently when different types of emotion are expressed: positive text mostly uses superlative adverbs, such as “most” and “best”, and possessive endings, whereas negative text contains more verbs in the past tense. A multinomial Naive Bayes (MNB) classifier with n-grams and POS tags as features was tested, and the best performance was achieved using bigrams.
29 http://www-03.ibm.com/software/products/en/translation-server
30 https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis
Like Pak and Paroubek [64], Barbosa and Feng [65], Kouloumpis et al. [66] and Davidov et al. [67] followed a machine learning based approach to Twitter sentiment analysis. Barbosa and Feng [65] proposed a two-step approach to classify the sentiment of tweets using SVM classifiers with abstract features. Kouloumpis et al. [66] evaluated training data extracted from hashtags and emoticons and examined whether Twitter-specific features play an important role in Twitter sentiment analysis.
Davidov et al. [67] used a supervised k-nearest-neighbour-like classifier to classify tweets into multiple sentiment types, using hashtags and smileys as labels.
In contrast, Jiang et al. [68] classified a tweet as positive, negative or neutral with respect to a given target or entity. They argued that the context of a tweet is important for understanding the underlying sentiment, and hence related tweets should be taken into consideration rather than relying on a single tweet, which is usually too short and ambiguous for sentiment analysis. They used Pointwise Mutual Information (PMI) [69] to identify the extended target and implemented a three-level approach covering subjectivity, polarity and graph-based relationships. Their results show that the proposed approach improves the performance of target-dependent sentiment classification.
4.2 Multilingual sentiment analysis on social media
While the aforementioned studies concentrate on sentiment analysis of English content, other studies use tweets as a corpus for multilingual sentiment analysis. Volkova et al. [70] proposed an approach for bootstrapping subjectivity clues from Twitter data and evaluated it on English, Spanish and Russian Twitter streams. The approach uses the MPQA lexicon [22] to bootstrap sentiment lexicons from a large pool of unlabelled data, with a small amount of labelled data to guide the process. Terms that are strongly subjective in translation are used as seed terms in the new language, with term polarity projected from the English lexicon. However, it is challenging to classify subjective tweets that express philosophical thoughts. This is mainly because some terms are only weakly subjective and can therefore appear in both neutral and subjective tweets. In addition, terms with ambiguous word senses and contradictory polarity (depending on the context) were found to be particularly error-prone.
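The seed-projection step can be illustrated with a short Python sketch. It assumes an MPQA-style English lexicon in which each clue carries a polarity and a strong/weak subjectivity label, plus a bilingual dictionary in a hypothetical `{english_word: [target_words]}` format. Only strongly subjective clues are projected, and target words reached from clues of opposite polarity are set aside for manual review instead of being guessed, which is one way to guard against the contradicting-polarity errors mentioned above. The word lists are invented, and the subsequent bootstrapping over unlabelled tweets in [70] is not shown.

```python
from collections import defaultdict

def project_seeds(english_lexicon, bilingual_dict):
    """Project polarity from strongly subjective English clues onto target-language words.

    english_lexicon: {english_word: (polarity, strength)}, strength "strong" or "weak"
    bilingual_dict:  {english_word: [target_word, ...]}
    Returns (seeds, conflicts): seeds maps each target word to a single polarity;
    conflicts lists target words reached from English clues of opposite polarity.
    """
    votes = defaultdict(set)
    for en_word, (polarity, strength) in english_lexicon.items():
        if strength != "strong":
            continue  # weak clues also occur in neutral text, so they make noisy seeds
        for target in bilingual_dict.get(en_word, []):
            votes[target].add(polarity)
    seeds = {word: next(iter(pols)) for word, pols in votes.items() if len(pols) == 1}
    conflicts = sorted(word for word, pols in votes.items() if len(pols) > 1)
    return seeds, conflicts

# Toy MPQA-style English clues and a toy English-to-Spanish dictionary (invented).
english_lexicon = {
    "happy": ("positive", "strong"),
    "terrible": ("negative", "strong"),
    "fine": ("positive", "weak"),
    "tremendous": ("positive", "strong"),
    "dreadful": ("negative", "strong"),
}
bilingual_dict = {
    "happy": ["feliz", "contento"],
    "terrible": ["terrible", "horrible"],
    "fine": ["bien"],
    "tremendous": ["tremendo"],
    "dreadful": ["espantoso", "tremendo"],
}
seeds, conflicts = project_seeds(english_lexicon, bilingual_dict)
print(seeds)      # "bien" is absent because "fine" is only weakly subjective
print(conflicts)  # ['tremendo'] is reached from both a positive and a negative clue
```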
Balahur and Turchi [15] built a simple sentiment analysis system for English tweets and used tweets from SemEval 2013 Task 2 – Sentiment Analysis in Twitter [71] as their training and testing datasets. They also translated the datasets from English into four other languages: Italian, Spanish, French and German. They found that joint training on datasets from languages with similar structures improves on the results obtained for an individual language. While this method is attractive, as it helps to disambiguate the contextual use of specific words, it cannot eliminate the errors introduced by translation. The findings also make clear that it is essential to consider the different ways in which negation is constructed in different languages.
Cui et al. [72] did not use machine translation; instead, they focused on building emotion tokens, or a SentiLexicon, from emoticons, repeated punctuation and repeated letters. These emotion tokens are first extracted to build a co-occurrence graph, and positive and negative lexicons are then labelled through a graph propagation algorithm. The language of a tweet is identified from the Unicode blocks of its characters. If a tweet's characters come from the Basic Latin or symbols blocks, it is assigned as a Basic Latin tweet; most such tweets are in English. Characters in the Latin extended blocks often belong to Portuguese, Spanish, German, and so on. Their comparative evaluation with SentiWordNet [30] indicated that emotion tokens are helpful for both English and non-English Twitter sentiment analysis.
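The two mechanical steps in this description, extracting emotion tokens and grouping tweets by Unicode block, can be sketched in Python with the standard `re` module. The regular expressions and block boundaries below are illustrative assumptions: the "Latin extended" group is approximated by letters up to U+024F, which also covers Latin-1 Supplement letters such as "ñ" and "ã". The co-occurrence graph and propagation step of [72] are not shown.

```python
import re

# Hypothetical patterns for the three kinds of emotion token described above;
# the token definitions actually used by Cui et al. [72] may differ.
EMOTICON = re.compile(r"[:;=8][\-o']?[()\[\]DPp|*]")
REPEATED_PUNCTUATION = re.compile(r"[!?.]{2,}")
REPEATED_LETTERS = re.compile(r"\b\w*(\w)\1{2,}\w*\b")  # a character repeated 3+ times

def emotion_tokens(text):
    """Return emoticons, repeated punctuation and elongated words, in that order."""
    tokens = EMOTICON.findall(text)
    tokens += REPEATED_PUNCTUATION.findall(text)
    tokens += [m.group(0) for m in REPEATED_LETTERS.finditer(text)]
    return tokens

def script_bucket(text):
    """Coarse language grouping from the Unicode code points of letters.

    Non-letters (emoticons, punctuation, emoji) are ignored, so a tweet made of
    ASCII letters and symbols, or of symbols only, counts as Basic Latin.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if all(ord(ch) < 0x80 for ch in letters):
        return "basic_latin"
    if all(ord(ch) <= 0x24F for ch in letters):
        return "latin_extended"  # Latin-1 Supplement and Latin Extended-A/B letters
    return "other"

for tweet in ["Loved the new phone!!! soooo good :)", "Que dia horrível :( nãooo", "今天很开心 :D"]:
    print(script_bucket(tweet), emotion_tokens(tweet))
```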
4.3 Discussion
It is worth highlighting that approaches to sentiment analysis on social media are mainly based on pattern discovery, such as syntactic structures [64], Twitter features [66], [67] and emotion tokens [72], on machine learning with annotated datasets [65], [67], [68], and on translation [15], [70]. Considering the limited number of words available in a tweet and its evolving vocabulary, it is not surprising that the parallel corpus-based approach has not been adopted, in contrast to the other multilingual studies discussed in Section 2. In addition, sentence structures or grammatical rules are hardly considered, even though Pak and Paroubek [64] showed that POS tags can be a useful indicator of emotional text. POS tags are only applicable if the subject of study is a single language with proper grammatical rules, as identifying tags is not straightforward when a tweet contains a mixture of languages. In fact, Kouloumpis et al. [66] showed that POS features may not be useful for sentiment analysis, whereas other features such as emoticons and intensifiers are more useful in comparison. It is observed that none of the multilingual sentiment analysis studies takes into consideration the multiple languages that can occur within a single tweet. Instead, their focus is typically on studying the effects of different languages on a Twitter platform [70], [72] or on leveraging the available resources of one language for sentiment analysis of another [15].
5. Work on scarce resource languages
In addition to the informal languages used on social media, discussed in the previous section, this section explores studies that analyse languages with limited electronic resources, i.e., languages for which no, or very few, Natural Language Processing (NLP) tools are available. In a review paper such as this, it is important to consider and understand the research that has been done on scarce resource languages. On top of developing NLP tools for some of these languages [73], efforts have also been made in the following three areas: sentiment analysis itself [74]-[77], speech recognition [78], [79] and machine translation [80], [81]. While sentiment analysis studies along this line of research often concentrate on developing resources and approaches for a single scarce resource language, studies in the other two areas, speech recognition and machine translation, also look into constructing resources for other languages in order to support their research (e.g., through crowdsourcing [80]).
5.1 Sentiment analysis
As in multilingual sentiment analysis, subjectivity and polarity analysis have been carried out on scarce resource languages, although not extensively, owing to the limited resources available. Banea et al. [74] created a subjectivity lexicon for the Romanian language using a small set of seed words, a basic dictionary and a small raw corpus. They used a bootstrapping approach to add new related words to a candidate list, and used both PMI [82], [83] and Latent Semantic Analysis (LSA) [84] to filter noise from the lexicon. The caveat of their approach is that the LSA module needs to be trained on a sufficiently large corpus, and it is suggested that semi-automatic methods, such as those proposed by Ghani et al. [85], be used for corpus construction. Banea et al. showed that unsupervised learning with a rule-based sentence-level subjectivity classifier achieves a subjectivity F-measure of 66.2, an improvement over previously proposed semi-supervised methods.
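The PMI-based filtering step can be sketched as follows. This minimal Python version treats each document as a set of tokens, scores a bootstrapped candidate by its PMI with the event "the document contains at least one seed word", and keeps candidates above a threshold. The toy Romanian documents, the zero threshold and the seed-set formulation are assumptions for illustration; Banea et al. [74] also use LSA, which is not shown here.

```python
import math

def pmi_with_seeds(docs, seeds, candidate):
    """Document-level PMI between a candidate word and the event 'document contains
    at least one seed word': log2 P(seed, cand) / (P(seed) * P(cand))."""
    n = len(docs)
    has_seed = [bool(doc & seeds) for doc in docs]
    has_cand = [candidate in doc for doc in docs]
    n_seed, n_cand = sum(has_seed), sum(has_cand)
    n_both = sum(1 for s, c in zip(has_seed, has_cand) if s and c)
    if n_both == 0:
        return float("-inf")  # never co-occur: no evidence of relatedness
    return math.log2(n_both * n / (n_seed * n_cand))

def filter_candidates(docs, seeds, candidates, threshold=0.0):
    """Keep candidates whose PMI with the seed set exceeds the threshold."""
    scores = {c: pmi_with_seeds(docs, seeds, c) for c in candidates}
    return {c: s for c, s in scores.items() if s > threshold}

# Toy Romanian "documents" as sets of tokens (invented for illustration).
docs = [
    {"film", "frumos", "minunat"},
    {"carte", "bun", "minunat"},
    {"masă", "carte"},
    {"vreme", "masă"},
    {"fericit", "frumos"},
    {"masă", "scaun"},
]
seeds = {"frumos", "bun"}  # "beautiful", "good"
print(filter_candidates(docs, seeds, ["minunat", "masă", "fericit", "carte"]))
# {'minunat': 1.0, 'fericit': 1.0}
```

Here "carte" (book) co-occurs with a seed only as often as chance predicts (PMI 0) and "masă" (table) never does, so both are filtered out.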
Bakliwal et al. [75] constructed a Hindi subjective lexicon for polarity classification of Hindi product reviews. Using WordNet [27] and a graph-based traversal method, they built a full (adjective and adverb) subjective lexicon. Their approach starts from a small seed list with polarity and leverages the synonym and antonym relations of WordNet to expand the initial lexicon. The subjectivity lexicon is then used for review classification, achieving 79% accuracy with unigrams and polarity scores as features. Another approach, by Chowdhury and Chowdhury [76], uses both Bengali and English words to perform sentiment analysis on tweets. They applied a semi-supervised bootstrapping method to create the training corpus for machine learning classification, and achieved 93% accuracy with an SVM using unigrams and emoticons as features.
The study by Souza and Vieira [77] concentrated on sentiment analysis of Portuguese tweets using Portuguese polarity lexicons and negation models. They found that different lexicons, such as Oplexicon and SentiLex, yield different accuracies. Specifically, Oplexicon [86] performs better than SentiLex [87] owing to its more comprehensive coverage of word types and domains. A separate study by Elming et al. [88] used a robust offline-learning approach for cross-domain sentiment analysis of Danish based on a polarity lexicon. They observed significantly poorer performance when the analysis was carried across domains (i.e., from reviews in the film domain to reviews in the company domain).
As shown above, efforts in analysing sentiment in scarce resource languages are predominantly devoted to constructing polarity lexicons [74], [75], [76] or to making use of an available lexicon for sentiment classification [77], [88]. This is understandable, as lexicon-based approaches are also widely adopted in multilingual sentiment analysis (see Section 2). Part of the reason is that a polarity lexicon provides a straightforward way of assigning polarity to content based on the presence of one or more terms. This offers a viable option given the constraints on other resources, such as the availability of synonym dictionaries and translation machines.
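The combination of seed expansion and term-presence scoring described in this subsection can be sketched in Python. The sketch is loosely modelled on the synonym/antonym traversal attributed to Bakliwal et al. [75], but the breadth-first search, the depth limit and the first-label-wins rule are assumptions, and the tiny graph uses English glosses in place of Hindi WordNet entries.

```python
from collections import deque

def expand_lexicon(seeds, synonyms, antonyms, max_depth=2):
    """Breadth-first expansion of a seed polarity lexicon over a thesaurus graph.

    seeds: {word: +1 or -1}; synonyms/antonyms: {word: [related words]}.
    Synonyms inherit a word's polarity and antonyms receive the opposite one;
    a word keeps the first polarity it is given (the one closest to a seed).
    """
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue
        for relation, sign in ((synonyms, 1), (antonyms, -1)):
            for other in relation.get(word, []):
                if other not in lexicon:
                    lexicon[other] = lexicon[word] * sign
                    queue.append((other, depth + 1))
    return lexicon

def classify(tokens, lexicon):
    """Assign polarity from the summed scores of lexicon terms present in the text."""
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# English glosses stand in for Hindi words; the graph is invented for illustration.
seeds = {"good": 1, "bad": -1}
synonyms = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"], "poor": ["shoddy"]}
antonyms = {"good": ["bad"], "fine": ["faulty"]}
lexicon = expand_lexicon(seeds, synonyms, antonyms)
print(lexicon)
print(classify("the screen is nice but the battery is faulty and shoddy".split(), lexicon))  # negative
```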
5.2 Speech recognition
Thomas et al. [78] proposed training deep neural networks (DNNs) [89] for low-resource speech recognition. To overcome the limitation of insufficient training data, they used transcribed data from other languages to build multilingual acoustic models. They observed a 16% improvement with just one hour of in-domain training data, with three-quarters of the gain coming from DNN-based features.
Qian et al. [79] used a data borrowing strategy and the Subspace Gaussian Mixture Model [90] for the same problem. Even though their approach achieves an improvement of only about 1.7%, the results indicate that it is important to select languages that are linguistically similar and to tie parameters at the context-dependent state level.
5.3 Machine translation
Machine translation approaches often rely on parallel corpora to improve their accuracy and coverage. However, the limited resources available for some languages mean that developing a machine translation engine can be expensive in terms of money and effort. Human annotation and the availability of experts are required for such tasks to succeed. Ambati et al. [80] proposed an approach that leverages active learning for 'sentence selection' through crowdsourcing to enable automatic translation of low-resource language pairs. While the use of Mechanical Turk for annotation tasks has long been questioned, Ambati et al. showed that it is possible for non-experts to create parallel corpora of sufficient quality, given adequate quality assurance.
In contrast, Irvine and Callison-Burch [81] used comparable corpora to improve the accuracy of translation learned from a small parallel corpus. They used a bilingual lexicon induction technique to learn new translations from the comparable corpora within a phrase-based statistical machine translation model for six low-resource languages. Their results indicate that adding induced translations of low-frequency words can improve performance beyond inducing translations for out-of-vocabulary (OOV) words alone.
6. Challenges and recommendations
As shown in Table 1, common challenges encountered in multilingual sentiment analysis research include the word sense ambiguity problem [3], [9], [21], [49], language-specific structure (negation [15] or parsing rules [45]) and translation errors [8], [9]. Most of these challenges are also relevant to scarce resource languages, except for errors introduced by translation machines, since most of these languages do not have such machines available.
6.1 Word sense disambiguation
There are various suggestions for addressing the word sense ambiguity problem. Xia et al. [49] used Latent Dirichlet Allocation (LDA) [91] to extract the top words related to a topic, and adopted PMI [82], [83] to calculate the polarity tendency of an opinion. Banea et al. [74] suggested that LSA [84] is sufficient to calculate the similarity between an original seed and each of the candidates extracted through a bootstrapping process. Active learning [80], which has been used to improve machine translation by selecting the sentences that are most informative for the task at hand, may help in targeting phrases or improving sample selection. The phrases and samples collected in this way can be useful for a manual disambiguation annotation process and also as input for feedback learning in a machine learning approach.
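A common way to implement the sample selection that active learning provides is uncertainty sampling: annotators are asked to label the items on which the current model is least confident. The Python sketch below assumes any classifier that returns a positive-class probability; the lexicon-based scorer is a hypothetical stand-in, and the selection criteria Ambati et al. [80] used for translation are not reproduced. In the toy pool, the slang use of "sick" and the mixed-polarity sentence are the items sent for manual disambiguation.

```python
import math
from typing import Callable, List

def select_for_annotation(pool: List[str],
                          prob_positive: Callable[[str], float],
                          k: int) -> List[str]:
    """Uncertainty sampling: return the k items whose predicted probability is
    closest to 0.5, i.e. where the current model is least confident."""
    return sorted(pool, key=lambda item: abs(prob_positive(item) - 0.5))[:k]

# Hypothetical stand-in for a trained classifier: a logistic squash of lexicon counts.
POSITIVE = {"great", "love"}
NEGATIVE = {"awful", "hate"}

def toy_prob(sentence: str) -> float:
    tokens = sentence.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 / (1 + math.exp(-2 * score))

pool = [
    "I love this great phone",
    "awful battery I hate it",
    "the screen is sick",
    "love the camera but awful battery",
]
print(select_for_annotation(pool, toy_prob, 2))
# ['the screen is sick', 'love the camera but awful battery']
```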
6.2 Language structure
It is well known that different languages have their own unique ways of expression; for example, philosophical thoughts and opinions in Russian have been found to be frequently misclassified, and hence lexicon-based approaches may not be sufficient [70]. Instead, a deeper linguistic analysis is required. In addition, negation rules may differ between languages and may therefore cause unnecessary errors [15]. For scarce resource languages, some variants or dialects can be quite different in nature [92]. Given that there are a total of 48 variants of English around the world³¹, some being a mixture of languages, others being non-native pronunciations of English, as well as a host of other permutations, it is essential to understand the structure of such language variants in order to assess the best approach for leveraging the available English sentiment analysis resources.
31 http://en.wikipedia.org/wiki/List_of_dialects_of_the_English_language
6.3 Machine learning
Most scarce resource languages are used on social media, where slang or informal language and emoticons are common. A number of studies have achieved reasonably good results by including emotion tokens as features in their machine learning approaches [67], [72], [76]. Read [93] studied emoticons using text from Usenet newsgroups. He classified the text into positive and negative types with both SVM and NB classifiers, and achieved an accuracy of around 70% on the test set used. Go et al. [94] used a similar idea but constructed their corpus from tweets; the best result, 81% accuracy, was obtained with the NB classifier. These methods, however, do not perform well in identifying neutral text. A multi-level/cascading [39] or meta-classifier [44] approach has therefore been recommended for multilingual sentiment analysis, in which subjectivity analysis is done before polarity analysis.
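The cascading idea can be sketched in a few lines of Python: a subjectivity stage runs first and only subjective texts reach the polarity stage, so neutral text receives its own label instead of being forced into positive or negative. Both stages below are hypothetical cue-word stand-ins for trained classifiers such as the SVM or NB models mentioned above, and the "mixed" label for ties is an assumption of this sketch.

```python
from typing import Callable

def cascade(text: str,
            is_subjective: Callable[[str], bool],
            polarity: Callable[[str], str]) -> str:
    """Two-stage classification: only texts judged subjective reach the polarity stage."""
    if not is_subjective(text):
        return "neutral"
    return polarity(text)

# Hypothetical stand-ins for trained models.
CUES = {":)": 1, ":-)": 1, ":D": 1, "love": 1, "great": 1,
        ":(": -1, ":-(": -1, "hate": -1, "awful": -1}

def cue(token: str) -> int:
    # Emoticons are matched as written; words are matched case-insensitively.
    return CUES.get(token, CUES.get(token.lower(), 0))

def toy_subjectivity(text: str) -> bool:
    return any(cue(t) != 0 for t in text.split())

def toy_polarity(text: str) -> str:
    score = sum(cue(t) for t in text.split())
    return "positive" if score > 0 else "negative" if score < 0 else "mixed"

for text in ["Train leaves at 9 from platform 2", "love this phone :)", "battery life is awful :("]:
    print(text, "->", cascade(text, toy_subjectivity, toy_polarity))
```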
6.4 Essential resources
Subjectivity analysis cannot be accomplished without a lexicon or an annotated corpus. Even though most scarce resource languages have limited resources available, an initial annotated dictionary or lexicon is still needed before a classifier with reasonable accuracy can be built. The following are two proposed approaches for creating lexicons for scarce resource languages, depending on the availability of resources:
1. A small bilingual dictionary as the available resource
   The only way to construct a subjective lexicon is to translate an existing lexicon from another language using a bilingual dictionary. Although this mapping process can be automated, its accuracy would unfortunately be rather low, owing to the limited coverage of the initial dictionary and the context-free translation process, which can introduce many word ambiguity problems. It is essential for the created lexicon to be verified by human annotators to ensure its quality, so that it can be used as a basis for generating more resources for a given scarce resource language.
2. A small subjective lexicon as the available resource
   A set of seed words can be selected from the lexicon to extract a corpus containing the seeds via a keyword search on the content of interest. From this set of candidates, a bootstrapping method can be applied to increase the size of the lexicon, with the relatedness of candidates measured using similarity metrics such as LSA or PMI. We recommend using the bootstrapping algorithm specified in the work of Banea et al. [74] if a reasonably sized dictionary is available, or adopting the approach of Volkova et al. [70] to extract subjectivity lexicons from social media content, which is typically short and relatively unstructured.
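Approach 2 can be sketched as a bootstrapping loop in Python. In this simplified version each word is a binary term-by-sentence vector, the "keyword search" retrieves the sentences that contain any current lexicon word, and a candidate from those sentences is added when its cosine similarity to some lexicon word reaches a threshold. Plain cosine over the raw term-sentence matrix stands in for LSA (which would first apply a truncated SVD); the corpus, threshold and iteration limit are illustrative assumptions, not values from [74] or [70].

```python
import math
from collections import defaultdict

def term_sentence_index(sentences):
    """Binary term-by-sentence incidence: word -> set of sentence ids.
    This is the matrix LSA starts from (its SVD step is omitted here)."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence.split():
            index[word].add(i)
    return index

def cosine(a, b):
    """Cosine similarity of two binary vectors given as sets of sentence ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def bootstrap(seeds, sentences, threshold=0.55, iterations=3):
    index = term_sentence_index(sentences)
    lexicon = set(seeds)
    for _ in range(iterations):
        # "Keyword search": sentences containing any current lexicon word.
        hits = set().union(*(index.get(w, set()) for w in lexicon))
        candidates = {w for i in hits for w in sentences[i].split()} - lexicon
        added = {c for c in candidates
                 if max(cosine(index[c], index.get(w, set())) for w in lexicon) >= threshold}
        if not added:
            break  # nothing new passed the similarity filter
        lexicon |= added
    return lexicon

sentences = ["good nice", "good nice food", "nice lovely", "good food",
             "the food", "food court", "food stall", "nice lovely"]
print(sorted(bootstrap({"good"}, sentences, iterations=1)))  # ['good', 'nice']
print(sorted(bootstrap({"good"}, sentences)))                # ['good', 'lovely', 'nice']
```

"lovely" never co-occurs with the seed "good"; it enters only in the second round through "nice", which is the point of bootstrapping, while the frequent noun "food" stays below the threshold.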
The review in Section 4 indicates that none of the multilingual sentiment analysis studies on social media takes into account the possibility of mixed languages in the messages shared, even though such mixing is common in social media data (e.g., Singlish, with words from English, Malay and Chinese dialects in a single tweet [92], [95]). It is therefore necessary to consider a more comprehensive polarity lexicon that combines polarity lexicons for each of the languages involved. As mentioned in Section 6.2, negation rules may differ between languages. However, owing to the extensive effort required to parse a sentence in a scarce resource language, identifying the different negation terms can be a rewarding first step. These negation terms can be coupled with the combined lexicon for more accurate classification. Future work should investigate the behaviour and structure of sentences in different languages in order to construct a list of knowledge-based negation rules.
6.5 A hybrid framework
In view of the limited resources and the challenges discussed, it is worth exploring a framework that incorporates both knowledge-based techniques (e.g., polarity lexicons) and statistical methods (e.g., machine learning) [96]. The recommended hybrid framework is shown in Figure 1. It is especially applicable to scarce resource languages, for which resources such as polarity lexicons and dictionaries may not be available or comprehensive enough. As can be seen from the figure, machine learning can be used to assign polarity in such cases. Even though an annotated training dataset is required before a machine learning model can be generated, semi-supervised methods that use emoticons (see Section 6.3 and references therein) or hashtags [66], [67] to extract a preliminary dataset with polarity can be adopted before manual annotation is done. The hybrid framework can assign polarity to unseen content (i.e., when none of the words matches any term in a polarity lexicon) by learning hidden rules from the annotated data. In addition, unseen data that has been classified can be reviewed for knowledge-based rule extraction or used as a feedback system to improve machine
learning classification.
Given the resource limitations and multilingual settings, this framework can be adapted depending on the resources available and the target language(s) to be analysed. The polarity pattern mentioned in Figure 1 can be a polarity word found in a lexicon or a type of negation pattern specific to a language. Knowledge-based polarity assignment relies mainly on resources or algorithms developed through detailed analysis of the language or languages concerned. It can use a mixed-language lexicon to address the mixture of languages found in social media data and/or the knowledge-based negation rules mentioned in Section 6.4. In addition, the knowledge gained from the word sense disambiguation techniques explained in Section 6.1 can be incorporated into the knowledge-based algorithm to improve the accuracy of polarity assignment. Machine learning polarity assignment can adopt a simple model trained on a dataset labelled with emoticons, or the ensemble/cascading learning pointed out in Section 6.3. The accuracy of the proposed framework depends heavily on the final approaches implemented in its various components and on the quality of the resources available.
Figure 1. The recommended hybrid framework
6.6 Other considerations
While a manually annotated lexicon or corpus is still vital for sentiment analysis, creating a reasonably sized resource requires funding and a considerable amount of human effort. If funding is not a limiting factor, a crowdsourcing approach [80], [97] can be considered, as the quality of annotation can be improved through cross-validation and verification by several annotators. However, if crowdsourcing is not a viable option, an initial polarity corpus can be created using emotion tokens [72]. This corpus can then be combined with the lexicon built to discover more candidates through a bootstrapping [74] or SCL [37] approach. It is common to use a subjectivity lexicon for a rule-based classifier; however, a number of studies [2], [42] have shown that combining corpus-based machine learning and lexicon-based rule methods with cascading learning [39] can improve the accuracy of sentiment analysis.
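The flow of the hybrid framework can be sketched in Python under several assumptions. The knowledge-based component is a small mixed English/Malay/Singlish lexicon with a one-token negation window (hypothetical entries), and the fallback is a multinomial Naive Bayes model trained on tweets whose labels come from emoticons, in the spirit of Sections 6.3 and 6.5. Text with no lexicon match is passed to the learned model. This is one possible reading of Figure 1, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical mixed-language resources (English, Malay and Singlish entries).
LEXICON = {"good": 1, "bagus": 1, "shiok": 1, "bad": -1, "teruk": -1, "jialat": -1}
NEGATIONS = {"not", "tak", "tidak", "never"}
EMOTICON_LABELS = {":)": "positive", ":(": "negative"}

def knowledge_based(tokens):
    """Lexicon lookup with a one-token negation window; None if no term matches."""
    score, matched = 0, False
    for i, token in enumerate(tokens):
        if token in LEXICON:
            matched = True
            polarity = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                polarity = -polarity  # e.g. "tak bagus" (not good)
            score += polarity
    if not matched:
        return None
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; unseen tokens are ignored."""

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.counts = {label: Counter() for label in self.priors}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for counts in self.counts.values() for t in counts}
        return self

    def predict(self, tokens):
        def log_score(label):
            total = sum(self.counts[label].values()) + len(self.vocab)
            return math.log(self.priors[label]) + sum(
                math.log((self.counts[label][t] + 1) / total)
                for t in tokens if t in self.vocab)
        return max(self.priors, key=log_score)

def emoticon_training_set(tweets):
    """Distant supervision: label each tweet by its emoticons, then remove them."""
    docs, labels = [], []
    for tweet in tweets:
        tokens = tweet.lower().split()
        found = {EMOTICON_LABELS[t] for t in tokens if t in EMOTICON_LABELS}
        if len(found) == 1:  # skip tweets with no emoticon or conflicting ones
            docs.append([t for t in tokens if t not in EMOTICON_LABELS])
            labels.append(found.pop())
    return docs, labels

def hybrid(text, model):
    """Knowledge-based assignment first; fall back to the learned model."""
    tokens = text.lower().split()
    return knowledge_based(tokens) or model.predict(tokens)

tweets = ["love the laksa :)", "the laksa again :)", "queue so long :(", "wait so long :("]
docs, labels = emoticon_training_set(tweets)
model = NaiveBayes().fit(docs, labels)
for text in ["food not bagus", "laksa damn shiok", "so long lah", "the laksa"]:
    print(text, "->", hybrid(text, model))
```

The last two inputs contain no lexicon term, so their labels come entirely from what the emoticon-labelled tweets taught the model.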
Even though the linguistic structure of a scarce resource language is important for determining whether English resources can be adapted successfully, detailed analysis by linguistic experts is required to identify the structural differences. As a result, it is suggested that machine learning be used as an alternative, or as a litmus test, to assess whether a structural study is needed to further improve accuracy. As shown in Table 1, one of the shortcomings is the limited ability of a classifier to recognise negative text and omitted negation structures. While it may not be possible to conduct a study on the linguistic structure of a scarce resource language, it is certainly possible to manually identify some negation samples from the available corpus and to incorporate the specific pattern or structure when constructing a training dataset.
To sum up, although the lexicon-based approach is still essential for sentiment analysis, it should be expanded to include contextual awareness features, as most sentiments relate to an entity or a topic. Apart from that, the concept-based approach, which incorporates common-sense reasoning [98], is developing fast. Concept-level sentiment analysis is necessary for handling the more subtle sentiments that are often not captured or handled in current multilingual sentiment analysis research.
7. Conclusion
Sentiment analysis is an active research area, thanks to its many challenges but also its promises.
While many sentiment analysis studies have been conducted on formal languages using mainstream platforms such as news or official documents, increasing attention is now being paid to the analysis of social media content to facilitate understanding of the wellbeing of a community or the perceived image of a company/product. Social media content often contains informal or mixed languages. It is thus no longer sufficient to consider only a formal language (e.g., English) in sentiment analysis research. In this review paper, we have looked at a range of current approaches and tools used for multilingual sentiment analysis. We took into account not just formal languages but also informal and scarce resource languages. Major challenges have been identified, and we have recommended possible remedies as well as a hybrid framework for developing sentiment analysis resources, particularly for languages with limited electronic resources.
References
[1] E. Riloff and J. Wiebe, ‘Learning extraction patterns for subjective expressions’, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2003, pp. 105–112.
[2] S.-M. Kim and E. Hovy, ‘Identifying and analyzing judgment opinions’, in Proceedings of the Conference of the North American Chapter of the Association of Computational Linguistics, 2006, pp. 200–207.
[3] R. Mihalcea, C. Banea, and J. Wiebe, ‘Learning multilingual subjective language via cross-lingual projections’, in Proceedings of Annual Meeting of Association for Computational Linguistics, 2007, vol. 45, p. 976.
[4] B. Pang and L. Lee, ‘Opinion mining and sentiment analysis’, Found. Trends Inf. Retr., vol. 2, no. 1–2, pp. 1–135, 2008.
[5] T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan, ‘OpinionFinder: A system for subjectivity analysis’, in Proceedings of Conference on Empirical Methods in Natural Language Processing, 2005, pp. 34–35.
[6] H. Kanayama and T. Nasukawa, ‘Fully automatic lexicon expansion for domain-oriented sentiment analysis’, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2006, pp. 355–363.
[7] Y. Hu, J. Duan, X. Chen, B. Pei, and R. Lu, ‘A new method for sentiment classification in text retrieval’, in Proceedings of International Joint Conference on Natural Language Processing, 2005, pp. 1–9.
[8] X. Wan, ‘Co-training for cross-lingual sentiment classification’, in Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing, 2009, pp. 235–243.
[9] K. Denecke, ‘Using SentiWordNet for multilingual sentiment analysis’, in Proceedings of International Conference on Data Engineering Workshops, 2008, pp. 507–512.
[10] J. R. Leimgruber, ‘Singapore English’, Lang. Linguist. Compass, vol. 5, no. 1, pp. 47–62, 2011.
[11] E. Cambria, D. Olsher, and D. Rajagopal, ‘SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis’, in Proceedings of AAAI Conference on Artificial Intelligence, 2014, pp. 1515–1521.
[12] S. Tan and J. Zhang, ‘An empirical study of sentiment analysis for Chinese documents’, Expert Syst. Appl., vol. 34, no. 4, pp. 2622–2629, 2008.
[13] J. Zhao, L. Dong, J. Wu, and K. Xu, ‘MoodLens: an emoticon-based sentiment analysis system for Chinese tweets’, in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 1528–1531.
[14] N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, and T. Fukushima, ‘Collecting evaluative expressions for opinion extraction’, in Proceedings of International Conference on Natural Language Processing, 2005, pp. 596–605.
[15] A. Balahur and M. Turchi, ‘Improving sentiment analysis in Twitter using multilingual machine translated data’, in Proceedings of Recent Advances in Natural Language Processing, 2013, pp. 49–55.
[16] M. Rosell and V. Kann, ‘Constructing a Swedish general purpose polarity lexicon: random walks in the People's dictionary of synonyms’, in Proceedings of Swedish Language Technology Conference, 2010, pp. 19–20.
[17] M. Abdul-Mageed, M. T. Diab, and M. Korayem, ‘Subjectivity and sentiment analysis of modern standard Arabic’, in Proceedings of the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers, 2011, vol. 2, pp. 587–591.
[18] E. Cambria and A. Hussain, Sentic computing: a common-sense-based framework for concept-level sentiment analysis, vol. 1. Springer, 2015.
[19] G. A. Miller, C. Leacock, R. Tengi, and R. T. Bunker, ‘A semantic concordance’, in Proceedings of the Workshop on Human Language Technology, 1993, pp. 303–308.
[20] D. D. Lewis, ‘Naive (Bayes) at forty: The independence assumption in information retrieval’, in Proceedings of European Conference on Machine Learning, 1998, pp. 4–15.
[21] K. Ahmad, D. Cheng, and Y. Almas, ‘Multi-lingual sentiment analysis of financial news streams’, in Proceedings of the International Conference on Grid in Finance, 2006.
[22] T. Wilson, J. Wiebe, and P. Hoffmann, ‘Recognizing contextual polarity in phrase-level sentiment analysis’, in Proceedings of Conference on Empirical Methods in Natural Language Processing, 2005, pp. 347–354.
[23] A. Esuli and F. Sebastiani, ‘Determining Term Subjectivity and Term Orientation for Opinion Mining’, in Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, 2006, vol. 6, p. 2006.
[24] J. Yao, G. Wu, J. Liu, and Y. Zheng, ‘Using bilingual lexicon to judge sentiment orientation of Chinese words’, in Proceedings of IEEE International Conference on Computer and Information Technology, 2006, pp. 38–38.
[25] V. Vapnik, The nature of statistical learning theory. Springer Science & Business Media, 2000.
[26] J. R. Quinlan, C4.5: programs for machine learning. Elsevier, 2014.
[27] G. A. Miller, ‘WordNet: a lexical database for English’, Commun. ACM, vol. 38, no. 11, pp. 39–41, 1995.
[28] F. J. Och and H. Ney, ‘Improved statistical alignment models’, in Proceedings of the Annual Meeting on Association for Computational Linguistics, 2000, pp. 440–447.
[29] V. Kann and M. Rosell, ‘Free construction of a free Swedish dictionary of synonyms’, in Proceedings of the Nordic Conference on Computational Linguistics, 2005, pp. 105–110.
[30] S. Baccianella, A. Esuli, and F. Sebastiani, ‘SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining’, in Proceedings of Language Resources and Evaluation Conference, 2010, vol. 10, pp. 2200–2204.
[31] J. Wiebe, T. Wilson, and C. Cardie, ‘Annotating expressions of opinions and emotions in language’, Lang. Resour. Eval., vol. 39, no. 2–3, pp. 165–210, 2005.
[32] X. Wan, ‘Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis’, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008, pp. 553–561.
[33] X. Meng, F. Wei, X. Liu, M. Zhou, G. Xu, and H. Wang, ‘Cross-lingual mixture model for sentiment classification’, in Proceedings of the Annual Meeting of the Association for Computational Linguistics: Long Papers, 2012, vol. 1, pp. 572–581.
[34] B. Lu, C. Tan, C. Cardie, and B. K. Tsou, ‘Joint bilingual sentiment classification with unlabeled parallel corpora’, in Proceedings of the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, vol. 1, pp. 320–330.
[35] Y. Seki, D. K. Evans, L.-W. Ku, H.-H. Chen, N. Kando, and C.-Y. Lin, ‘Overview of opinion analysis pilot task at NTCIR-6’, in Proceedings of NTCIR-6 Workshop Meeting, 2007, pp. 265–278.
[36] Y. Seki, D. K. Evans, L.-W. Ku, L. Sun, H.-H. Chen, N. Kando, and C.-Y. Lin, ‘Overview of multilingual opinion analysis task at NTCIR-7’, in Proceedings of NTCIR-7 Workshop Meeting, 2008.
[37] P. Prettenhofer and B. Stein, ‘Cross-lingual adaptation using structural correspondence learning’, ACM Trans. Intell. Syst. Technol., vol. 3, no. 1, p. 13, 2011.
[38] J. Blitzer, R. McDonald, and F. Pereira, ‘Domain adaptation with structural correspondence learning’, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2006, pp. 120–128.
[39] E. Boiy and M.-F. Moens, ‘A machine learning approach to sentiment analysis in multilingual Web texts’, Inf. Retr., vol. 12, no. 5, pp. 526–558, 2009.
[40] J. Boyd-Graber and P. Resnik, ‘Holistic sentiment analysis across languages: Multilingual supervised latent Dirichlet allocation’, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2010, pp. 45–55.
[41] J. Pan, G.-R. Xue, Y. Yu, and Y. Wang, ‘Cross-lingual sentiment classification via bi-view non-negative matrix tri-factorization’, in Advances in Knowledge Discovery and Data Mining, Springer, 2011, pp. 289–300.
[42] M. Bautin, L. Vijayarenu, and S. Skiena, ‘International sentiment analysis for news and blogs’, in Proceedings of International Conference on Web and Social Media, 2008.
[43] N. Godbole, M. Srinivasaiah, and S. Skiena, ‘Large-scale sentiment analysis for news and blogs’, in Proceedings of International Conference on Web and Social Media, 2007, vol. 7, p. 21.
[44] A. Balahur and M. Turchi, ‘Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis’, Comput. Speech Lang., vol. 28, no. 1, pp. 56–75, 2014.
[45] K. Hiroshi, N. Tetsuya, and W. Hideo, ‘Deeper sentiment analysis using machine translation technology’, in Proceedings of the International Conference on Computational Linguistics, 2004, p. 494.
[46] S. Poria, E. Cambria, G. Winterstein, and G.-B. Huang, ‘Sentic patterns: Dependency-based rules for concept-level sentiment analysis’, Knowl.-Based Syst., vol. 69, pp. 45–63, 2014.
[47] E. Cambria, P. Gastaldo, F. Bisio, and R. Zunino, ‘An ELM-based model for affective analogical reasoning’, Neurocomputing, vol. 149, pp. 443–455, 2015.
[48] S. Poria, E. Cambria, A. Gelbukh, F. Bisio, and A. Hussain, ‘Sentiment data flow analysis by means of dynamic linguistic patterns’, Comput. Intell. Mag. IEEE, vol. 10, no. 4, pp. 26–36, 2015.
[49] Y. Xia, X. Li, E. Cambria, and A. Hussain, ‘A localization toolkit for SenticNet’, in Proceedings of IEEE International Conference on Data Mining Workshops, 2014, pp. 403–408.
[50] J. Blitzer, M. Dredze, and F. Pereira, ‘Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification’, in Proceedings of Annual Meeting of Association for Computational Linguistics, 2007, vol. 7, pp. 440–447.
[51] G. A. Miller, ‘Nouns in WordNet: a lexical inheritance system’, Int. J. Lexicogr., vol. 3, no. 4, pp. 245–264, 1990.
[52] W. Che, Z. Li, and T. Liu, ‘LTP: A Chinese language technology platform’, in Proceedings of the International Conference on Computational Linguistics: Demonstrations, 2010, pp. 13–16.
[53] ‘NTCIR8 MOAT Xinhua and NYT News corpus’. [Online]. Available: http://research.nii.ac.jp/ntcir/ntcir-ws8/permission/ntcir8xinhua-nyt-moat.html. [Accessed: 27-Mar-2015].
[54] R. Xu, K.-F. Wong, and Y. Xia, ‘Opinmine – opinion analysis system by CUHK for NTCIR-6 pilot task’, in Proceedings of the NTCIR-6 Workshop, 2007.
[55] N. Constant, C. Davis, C. Potts, and F. Schwarz, ‘The pragmatics of expressive content: Evidence from large corpora’, Sprache Datenverarb., vol. 33, no. 1–2, pp. 5–21, 2009.
[56] F. Boudin, S. Huet, J.-M. Torres-Moreno, and J. Torres-Moreno, ‘A graph-based approach to cross-language multi-document summarization’, Res. J. Comput. Sci. Comput. Eng. Appl. Polibits, vol. 43, pp. 113–118, 2010.
[57] J. Savoy and L. Dolamic, ‘How effective is Google's translation service in search?’, Commun. ACM, vol. 52, no. 10, pp. 139–143, 2009.
[58] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, and R. Zens, ‘Moses: Open source toolkit for statistical machine translation’, in Proceedings of the Annual Meeting on Association for Computational Linguistics: Demonstrations, 2007, pp. 177–180.
[59] D. S. Munteanu and D. Marcu, ‘Improving machine translation performance by exploiting non-parallel corpora’, Comput. Linguist., vol. 31, no. 4, pp. 477–504, 2005.
[60] ‘IBM - WebSphere Translation Server for Multiplatforms’. [Online]. Available: http://www-03.ibm.com/software/products/en/translation-server. [Accessed: 28-Mar-2015].
[61] P. Koehn, ‘Europarl: A parallel corpus for statistical machine translation’, in Proceedings of Machine Translation Summit, 2005, vol. 5, pp. 79–86.
[62] W. Zhang, T. J. Johnson, T. Seltzer, and S. L. Bichard, ‘The revolution will be networked: The influence of social networking sites on political attitudes and behavior’, Soc. Sci. Comput. Rev., 2009.
[63] ‘LingPipe Home’. [Online]. Available: http://alias-i.com/lingpipe/index.html. [Accessed: 25-Mar-2015].
[64] A. Pak and P. Paroubek, ‘Twitter as a corpus for sentiment analysis and opinion mining’, in Proceedings of Language Resources and Evaluation Conference, 2010, vol. 10, pp. 1320–1326.
[65] L. Barbosa and J. Feng, ‘Robust sentiment detection on Twitter from biased and noisy data’, in Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 2010, pp. 36–44.
[66] E. Kouloumpis, T. Wilson, and J. D. Moore, ‘Twitter sentiment analysis: The good the bad and the omg!’, in Proceedings of International Conference on Web and Social Media, 2011, vol. 11, pp. 538–541.
[67] D. Davidov, O. Tsur, and A. Rappoport, ‘Enhanced sentiment learning using Twitter hashtags and smileys’, in Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 2010, pp. 241–249.
[68] L. Jiang, M. Yu, M. Zhou, X. Liu, and T. Zhao, ‘Target-dependent Twitter sentiment classification’, in Proceedings of the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, vol. 1, pp. 151–160.
[69] Q. Su, K. Xiang, H. Wang, B. Sun, and S. Yu, ‘Using pointwise mutual information to identify implicit features in customer reviews’, in Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead, Springer, 2006, pp. 22–30.
[70] S. Volkova, T. Wilson, and D. Yarowsky, ‘Exploring sentiment in social media: Bootstrapping subjectivity clues from multilingual Twitter streams’, in Proceedings of Annual Meeting of the Association of Computational Linguistics, 2013, pp. 505–510.
[71] P. Nakov, Z. Kozareva, A. Ritter, S. Rosenthal, V. Stoyanov, and T. Wilson, ‘SemEval-2013 task 2: Sentiment analysis in Twitter’, in Proceedings of the International Workshop on Semantic Evaluation, 2013.
[72] A. Cui, M. Zhang, Y. Liu, and S. Ma, ‘Emotion tokens: Bridging the gap among multilingual Twitter sentiment analysis’, in Information Retrieval Technology, Springer, 2011, pp. 238–249.
[73] C. Monson, A. F. Llitjós, R. Aranovich, L. Levin, R. Brown, E. Peterson, J. Carbonell, and A. Lavie, ‘Building NLP systems for two resource-scarce indigenous languages: Mapudungun and Quechua’, Strateg. Dev. Mach. Transl. Minor. Lang., p. 15, 2006.
[74] C. Banea, R. Mihalcea, and J. Wiebe, ‘A bootstrapping method for building subjectivity lexicons for languages with scarce resources’, in Proceedings of Language Resources and Evaluation Conference, 2008, vol. 8, pp. 2–764.
[75] [76] [77] [78]
A.Bakliwal,P.Arora,andV.Varma,‘Hindisubjectivelexicon:AlexicalresourceforHindi polarityclassification’,inProceedingsofLanguageResourcesandEvaluationConference, 2012,pp.1189–1196. S.ChowdhuryandW.Chowdhury,‘PerformingsentimentanalysisinBanglamicroblogposts’, inProceedingsofInternationalConferenceonInformatics,Electronics&Vision,2014,pp.1–6. M.SouzaandR.Vieira,‘Sentimentanalysisontwitterdataforportugueselanguage’,in ComputationalProcessingofthePortugueseLanguage,Springer,2012,pp.241–247. S.Thomas,M.L.Seltzer,K.Church,andH.Hermansky,‘Deepneuralnetworkfeaturesand semi-supervisedtrainingforlowresourcespeechrecognition’,inProceedingsofIEEE InternationalConferenceonAcoustics,SpeechandSignalProcessing,2013,pp.6704–6708. [79] [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97] Y.Qian,D.Povey,andJ.Liu,‘State-leveldataborrowingforlow-resourcespeechrecognition basedonsubspaceGMMs.’,inProceedingsofAnnualConferenceoftheInternationalSpeech CommunicationAssociation,2011,pp.553–560. V.Ambati,S.Vogel,andJ.G.Carbonell,‘Activelearningandcrowd-sourcingformachine translation.’,inProceedingsofLanguageResourcesandEvaluationConference,2010,vol.1, p.2. A.IrvineandC.Callison-Burch,‘Combiningbilingualandcomparablecorporaforlowresource machinetranslation’,inProceedingsoftheEighthWorkshoponStatisticalMachine Translation,2013,pp.262–270. P.D.Turney,‘MiningtheWebforsynonyms:PMI-IRversusLSAonTOEFL’,Lect.Notes Comput.Sci.,pp.491–502,2001. P.D.Turney,‘Thumbsuporthumbsdown?:semanticorientationappliedtounsupervised classificationofreviews’,inProceedingsofAnnualMeetingoftheAssociationof ComputationalLinguistics,2002,pp.417–424. S.T.Dumais,G.W.Furnas,T.K.Landauer,S.Deerwester,andR.Harshman,‘Usinglatent semanticanalysistoimproveaccesstotextualinformation’,inProceedingsoftheSpecial InterestGrouponComputer-HumanInteractionconference,1988,pp.281–285. 
R.Ghani,R.Jones,andD.Mladenić,‘Miningthewebtocreateminoritylanguagecorpora’,in ProceedingsoftheInternationalConferenceonInformationandKnowledgeManagement, 2001,pp.279–286. M.Souza,R.Vieira,D.Busetti,R.Chishman,andI.M.Alves,‘Constructionofaportuguese opinionlexiconfrommultipleresources’,inProceedingsoftheBrazilianSymposiumin InformationandHumanLanguageTechnology,2011,pp.59–66. M.J.Silva,P.Carvalho,C.Costa,andL.Sarmento,‘Automaticexpansionofasocialjudgment lexiconforsentimentanalysis’,2010. J.Elming,D.Hovy,andB.Plank,‘Robustcross-domainsentimentanalysisforlow-resource languages’,inProceedingsofAnnualMeetingofAssociationforComputationalLinguistics, 2014,pp.2–7. L.Deng,G.Hinton,andB.Kingsbury,‘Newtypesofdeepneuralnetworklearningforspeech recognitionandrelatedapplications:Anoverview’,inProceedingsofIEEEInternational ConferenceonAcoustics,SpeechandSignalProcessing,2013,pp.8599–8603. D.Povey,L.Burget,M.Agarwal,P.Akyazi,F.Kai,A.Ghoshal,O.Glembek,N.Goel,M. Karafiát,andA.Rastrow,‘ThesubspaceGaussianmixturemodel—Astructuredmodelfor speechrecognition’,Comput.SpeechLang.,vol.25,no.2,pp.404–439,2011. D.M.Blei,A.Y.Ng,andM.I.Jordan,‘Latentdirichletallocation’,J.Mach.Learn.Res.,vol.3, pp.993–1022,2003. S.L.Lo,E.Cambria,R.Chiong,andD.Cornforth,‘Amultilingualsemi-supervisedapproachin derivingSinglishsenticpatternsforpolaritydetection’,Knowl.-BasedSyst.,2016. J.Read,‘Usingemoticonstoreducedependencyinmachinelearningtechniquesfor sentimentclassification’,inProceedingsoftheAssociationforComputationalLinguistics StudentResearchWorkshop,2005,pp.43–48. A.Go,R.Bhayani,andL.Huang,‘Twittersentimentclassificationusingdistantsupervision’, CS224NProj.Rep.Stanf.,pp.1–12,2009. S.L.Lo,R.Chiong,D.Cornforth,andY.Bao,‘Anunsupervisedmultilingualapproachfor identifyinghigh-valuetopicsonTwitter’,WorkingPaper,2016. E.Cambria,‘Affectivecomputingandsentimentanalysis’,IEEEIntell.Syst.,vol.31,no.2,pp. 102–107,2016. 
E.Cambria,D.Rajagopal,K.Kwok,andJ.Sepulveda,‘GECKA:gameengineforcommonsense knowledgeacquisition’,inProceedingsofAAAIFLAIRSConference,2015,pp.282–287. [98] E.Cambria,J.Fu,F.Bisio,andS.Poria,‘AffectiveSpace2:Enablingaffectiveintuitionfor concept-levelsentimentanalysis’,inProceedingsofAAAIConferenceonArtificialIntelligence, 2015,pp.508–514.