"Inside Jokes: Identifying Humorous Cartoon Captions"

Mbonteh Ekuh Jude
htw saar – Hochschule für Technik und Wirtschaft des Saarlandes
Seminar "Internet in den Medien", Sommersemester 2016

Abstract—A funny joke does not just make us laugh; it also benefits our brain in multiple ways. Motivated by the idea of creating a computational model of humor that can detect funny cartoon captions, a group of researchers at Microsoft teamed up with a cartoonist at The New Yorker magazine to understand the language used in cartoon captions and the features that make such captions funny. The study makes use of a large collection of crowd-sourced caption data submitted to the New Yorker cartoon caption contest. First, we look at how judgments of the humorousness of different captions were acquired. For each cartoon, captions classified as funnier were paired with less funny ones; the caption pairs were examined and the major differences between the funny and the less funny captions were identified. Based on this information, a classifier was built that can identify funnier captions automatically. Given captions based on the same joke, the classifier picks the funnier caption 69% of the time, and 64% of the time for arbitrary pairs of captions. Finally, the classifier was used to assist the contest judge in finding the best captions, significantly reducing the judge's workload.

I. INTRODUCTION

Humor plays a very important role in many contexts: it improves interaction in social gatherings, increases attention span, and much more, making it a vital tool in education; used correctly, it can also be of huge importance in advertising. But humor is mainly a human trait, and as such it varies across cultural and religious backgrounds. Much has been done in fields like psychology and linguistics to better understand how humor works and how this knowledge could be used to create actual humor models; as of yet, very little work has been done in computer science on developing such models, making this a new and exciting challenge for computer scientists.

In this report we identify and examine features of humorous cartoon caption pairs. The experiment is based on a large crowd-sourced data set from the New Yorker magazine's weekly contest. The New Yorker holds a weekly cartoon caption contest in which readers are shown a cartoon and asked to submit caption suggestions for it. The submitted captions are analyzed, the judge selects a shortlist of the funniest captions, and members of the editorial staff narrow the shortlist down to three finalists. All three finalist captions are then published in a later issue, and readers vote for their favorites.

Figure 1. Sample prototype diagram of the classifier; the caption inputs may hinge on the same joke or on different jokes.

Study Main Contributions

The contributions are the results of a study conducted by a group of researchers, and this report is mainly based on these results. The goal of the study was the construction of a computational model for predicting the relative humorousness of cartoon captions. A joke classifier (Figure 1) was built that predicts the relative humorousness of captions without deep image analysis or text understanding, by leveraging human tagging of scenes and automated analysis of linguistic features.
The research [1] was conducted by Robert Mankoff, a cartoonist at The New Yorker, together with Dafna Shahaf and Eric Horvitz, researchers at Microsoft. Michael Cavan, a writer at The Washington Post, later covered the study in the media (see Section VII).

For the experiment, a data set of crowd-sourced captions from the New Yorker competition was used. With the help of human judgments, different variations of the same jokes were identified and analyzed in order to find the factors affecting the level of perceived humor. Jokes within a caption were also identified automatically, and the intensity of humor of different jokes and their relation to the cartoon were quantified. A classifier was constructed which, given any two captions for a particular cartoon, determines which caption is funnier. The classifier achieves 69% accuracy for captions hinging on the same joke and 64% accuracy when comparing arbitrary pairs of captions. A Swiss-system tournament was then implemented that ranks all captions: on average, all of the judge's top-10 captions were ranked in the top 55.8%, showing that these methods can significantly reduce the workload faced by judges in selecting the best captions in a cartoon caption contest.

Feature                    %Funnier is higher
Perplexity (1-gram)        0.45**
Perplexity (2-gram)        0.49**
Perplexity (3-gram)        0.46*
Perplexity (4-gram)        0.46*
POS Perplexity (1-gram)    0.45*
POS Perplexity (2-gram)    0.52
POS Perplexity (3-gram)    0.5
POS Perplexity (4-gram)    0.51
Sentiment                  0.61*
Readability                0.57**, 0.56*
Proper Nouns               0.48
Indefinite articles        0.43
3rd Person                 0.58
Location (quarters)        0.53, 0.42*, 0.49, 0.55*

Table I. Same joke: percentage of caption pairs in which a feature has a higher numerical value in the funnier caption (asterisks indicate levels of statistical significance).

Figure 2. Example cartoon from the New Yorker contest, with the shortlist of submitted captions.
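To make the pairwise setup above concrete, here is a minimal sketch, not the authors' implementation, of a classifier that is given feature vectors for two captions and predicts which one is funnier. The random toy data and the logistic-regression model are illustrative assumptions; in the study, the features are the linguistic properties discussed in the following sections (perplexity, sentiment, readability, and so on).

```python
# Minimal sketch of a pairwise "which caption is funnier" classifier.
# Not the authors' code: the features are hypothetical stand-ins for the
# linguistic features discussed in the paper (perplexity, sentiment, ...).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: each caption is a feature vector; each training pair (a, b)
# is labeled 1 if caption a was judged funnier, else 0.
n_pairs, n_features = 200, 4
feats_a = rng.normal(size=(n_pairs, n_features))
feats_b = rng.normal(size=(n_pairs, n_features))
labels = rng.integers(0, 2, size=n_pairs)

# Represent a pair by the difference of its feature vectors, so the model
# learns which feature directions indicate the funnier caption. Without an
# intercept, swapping the two captions exactly flips the prediction.
X = feats_a - feats_b
clf = LogisticRegression(fit_intercept=False).fit(X, labels)

def funnier(fa: np.ndarray, fb: np.ndarray) -> str:
    """Return which caption of a pair the model predicts to be funnier."""
    return "a" if clf.predict((fa - fb).reshape(1, -1))[0] == 1 else "b"

print(funnier(feats_a[0], feats_b[0]))
```

Training on feature differences keeps the model antisymmetric, which matches the task: the question is never how funny a caption is in absolute terms, but only which of two captions is funnier.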
II. ANALYZING AND COMPARING CAPTIONS

For the purpose of the study, 16 New Yorker cartoons along with their submitted captions were analyzed; captions based on similar jokes were grouped using the algorithm from King et al. [2]. After the captions were grouped, the three largest clusters for each cartoon were picked and 10 entries were selected from each cluster: five captions of 5 to 6 words and five captions of 9 to 10 words, a range chosen based on the median word length of the submitted captions. Figure 2 shows an example of a cartoon and 9 captions submitted for it [3]. In the cartoon, a car salesperson attempts to sell a strange hybrid creature that appears to be part car, part animal. The judge's shortlist of captions appears under the cartoon.

A. Acquiring Judgment

To find out what makes some captions funnier, humor judgments were obtained from crowd workers recruited via Mechanical Turk (an online outsourcing platform). The workers were shown batches of five randomly drawn captions and asked to rank them by funniness; each answer thus provided 10 pairwise comparisons (#1 is funnier than #2, #3, #4 and #5; #2 is funnier than #3, #4 and #5; and so on). There were 30 batches, each ranked by 30-35 workers, for a total of 1016 tasks and 10,160 pairwise rankings. From these, pairs of captions with high agreement among the rankers were selected (80% agreement or more, similar length, ranked by at least five people). Only 35% of the unique pairs ranked by at least five people achieved 80% agreement, resulting in 754 pairs and indicating the difficulty of the task.

B. Hypotheses: Humor Indication Features

To turn the task of finding funny captions into a machine learning problem, features that could help determine whether a caption is funny were selected. These features were then evaluated: for each feature, the percentage of caption pairs in which the feature has a higher value in the funnier caption was computed. The results are presented in Tables I and II.

1) Unusual Language: Based on the hypothesis that funny captions may tend to use unusual language, a language model was used to test this supposition. The model takes an input string (a caption) and outputs how improbable, i.e., how unusual, the caption's language is. The model was built on 2GB of ClueWeb data [4], meant to represent common language, and the distinctiveness of the language was computed as the perplexity of the caption under n-gram models of orders 1 through 4. Interestingly, the higher the perplexity, the more the caption stands out; but surprisingly, funnier cartoon captions tend to use less distinctive (lower-perplexity) language, reinforcing the idea of keeping a caption simple and readable. With highly unusual language a reader may even miss the joke completely, say because they do not understand the principal joke phrase. Similar to the unusual-language feature, a part-of-speech perplexity (POS perplexity) feature was obtained by replacing the words of the caption with their corresponding parts of speech, abstracting away the caption's meaning while keeping its syntactic structure. Again, funny captions tend to have less distinctive grammar.
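As a rough illustration of how such perplexity values can be computed, the sketch below builds a bigram model with add-one smoothing over a tiny hypothetical background corpus, a stand-in for the 2GB of ClueWeb data, and scores captions with it. The study used 1- to 4-gram models over far more data; this only sketches the idea.

```python
# Minimal sketch: perplexity of a caption under a bigram language model.
# The tiny "background" corpus is a hypothetical stand-in for ClueWeb.
import math
from collections import Counter

background = "the man said that the car is a good car and the dog ran".split()

unigrams = Counter(background)
bigrams = Counter(zip(background, background[1:]))
vocab = len(unigrams)

def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

def perplexity(caption: str) -> float:
    """Per-word perplexity of a caption; higher means more unusual language."""
    words = caption.lower().split()
    log_prob = sum(math.log(bigram_prob(p, w))
                   for p, w in zip(words, words[1:]))
    return math.exp(-log_prob / max(len(words) - 1, 1))

print(perplexity("the car is a good car"))    # common phrasing, lower value
print(perplexity("the leopard is a hybrid"))  # unusual words, higher value
```

The POS-perplexity variant would apply exactly the same computation after mapping each word to its part-of-speech tag, so that only the distinctiveness of the grammar is measured.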
2) Taking Expert Advice: In an article by one of the New Yorker contest winners, Patrick House, the use of simple language is strongly emphasized [5]. He advises contestants to use "common, simple, monosyllabic words" and to "steer clear of proper nouns that could potentially alienate". Based on this advice, a readability feature was computed using a readability index function. The results obtained with this feature clearly show that readability scored a significantly higher percentage on the funniness scale, especially for captions hinging on the same joke, whereas the joke-location feature, examined next, was more relevant for captions hinging on different jokes.

3) Joke Location: To measure the impact of the location of the main joke phrase within a caption, the captions were partitioned into four quarters, quarter 1 being the beginning of the caption. Interestingly (Table I), less funny captions mostly had their joke phrase in the second quarter, that is, close to the beginning of the sentence, whereas placing the joke at the end was the most effective strategy.

4) Sentiment: A caption is considered to have negative sentiment when it contains negative words (bad, terrible, mistake, not). To determine the usefulness of negative words in a caption, results from an earlier study on humor by Mihalcea and Pulman [6] were used as a starting point. That study found humor to be mostly associated with a negative orientation (negative words). Surprisingly, in this experiment, based on the input from the crowd workers, positive cartoon captions appeared funnier than negative ones, contradicting the earlier result. A possible reason for the contrast lies in the preferences of the crowd workers: their responses were mainly based on personal humor preferences, and humor varies from person to person with factors like religious or cultural background. Another possible reason is the sentiment analyzer failing to detect the negative words in a caption.

Figure 3. Cartoon from the New Yorker contest and the tags associated with it. One list describes the general setting, or context, of the cartoon. The other describes its anomalies.

III. SAME JOKES VS DIFFERENT JOKES

There were some major differences between analyzing and comparing captions based on the same joke and captions based on different jokes. In the same-joke case, the captions examined revolve around the same humor context; in Figure 2, for example, all 9 captions revolve around the idea of the car being a hybrid. Captions based on different jokes, on the other hand, can vary completely in context, which makes them much harder to examine: finding the relationship between the cartoon and its caption, let alone narrowing it down further to the main joke phrase within the caption, becomes a far more challenging task.

To find the joke phrase for different jokes, the words with the lowest probability (highest distinctiveness) under the 4-gram model mentioned earlier were selected; for words belonging to a longer phrase, the entire phrase was selected. The method proved quite accurate at finding the main joke in a caption; see Figure 4 for an example. The phrases in red were identified through their low probability, and the purple ones are out-of-vocabulary words, another useful signal for identifying the joke. Typos are often picked up as out-of-vocabulary words; if they happen to spell a real word, they are sometimes picked up as low-probability words (see, for example, the "anti-theft devise", including the misspelling of "device"). Note also that the algorithm misses several expressions, e.g., "tiger in the tank".

Figure 4. Identifying joke phrases: phrases in red have low probability under the language model; phrases in purple do not appear in the vocabulary and are often useful for finding the joke (as well as typos).
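This joke-phrase heuristic can be approximated as in the following sketch, which is not the study's code: a toy unigram background model (an assumed stand-in for the 4-gram ClueWeb model) is used to flag out-of-vocabulary words and unusually rare words, mirroring the purple and red phrases of Figure 4.

```python
# Sketch of the joke-phrase heuristic: flag out-of-vocabulary words and
# words with low probability under a background model. A toy unigram model
# stands in for the 4-gram ClueWeb model used in the study.
from collections import Counter

# Hypothetical background counts (stand-in for a large common-language corpus).
background = Counter({"the": 50, "a": 40, "is": 30, "car": 10,
                      "good": 8, "very": 8, "man": 6, "said": 5})
total = sum(background.values())

def joke_words(caption: str, threshold: float = 0.035):
    """Return (out-of-vocabulary, low-probability) words of a caption."""
    oov, low_prob = [], []
    for word in caption.lower().split():
        if word not in background:
            oov.append(word)           # "purple" in Figure 4
        elif background[word] / total < threshold:
            low_prob.append(word)      # "red" in Figure 4
    return oov, low_prob

print(joke_words("the man said the carburetor is very good"))
# -> (['carburetor'], ['said']): 'carburetor' is out of vocabulary,
#    'said' falls below the probability threshold in this toy model.
```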
Feature                                            %Funnier is higher
Perplexity (1-gram)                                0.48
Perplexity (2-gram)                                0.51*
Perplexity (3-gram)                                0.52*
Perplexity (4-gram)                                0.4***
POS Perplexity (1-gram)                            0.45*
POS Perplexity (2-gram)                            0.5
POS Perplexity (3-gram)                            0.5
POS Perplexity (4-gram)                            0.51
Indefinite articles                                0.43
3rd Person                                         0.45**
Min Perplexity                                     0.46*****
Length of Joke Phrase                              0.42*****
Location (quarters)                                0.46, 0.41*****, 0.46*, 0.43****
Proper Nouns                                       0.35*****
Sentiment                                          0.49
Readability                                        0.52, 0.51
Similarities (context, anomaly, maxdiff, avgdiff)  0.48*, 0.48, 0.47**, 0.47**

Table II. Different jokes: percentage of caption pairs in which a feature has a higher numerical value in the funnier caption (asterisks indicate levels of statistical significance).

IV. SUMMARY OF RESULTS

Based on the experiments and analyses of the various features, Tables I and II show the results obtained for same-joke and different-joke caption pairs, respectively. From the statistics in these tables we can clearly see which features scored better on the funny-caption scale. From a lexical point of view, funny captions used simpler language, which made them more readable, and this feature had almost the same magnitude for same and different jokes. But although readability and sentiment played a role in both scenarios, they were no longer of very great significance for different jokes, as they were for same jokes. Instead, proper nouns and 3rd-person words became very significant for different jokes, with funnier captions using significantly fewer of both. Same-joke captions use similar proper nouns most of the time; the situation is different when considering multiple jokes, since many captions then rely entirely on proper nouns for the joke to be understood. For example, two of the sample captions for the cartoon in Figure 3 read:

"Oh my God, Florenz Ziegfeld has gone to heaven!"
"I'm afraid Mr. Burns is gone for the afterlife."

A possible explanation for the poor results of such captions is that names which are not universally recognizable may leave readers confused, whereas personal pronouns avoid this issue. Improved versions of the captions could read:

"Oh my God, he has gone to heaven!"
"I'm afraid he is gone for the afterlife."

Simply replacing the names with personal pronouns could significantly increase the humor value of a caption.

V. USE CASE

Finally, a classifier was built which can compare pairs of cartoon captions. The main goal was to use this computational model to reduce the workload of judges during a cartoon caption contest, saving them the time of manually going through thousands of captions. The classifier was therefore tested on whether it could explore and filter the caption submissions down to a smaller set, helping the judges find the best captions in less time.

To conduct this test, the classifier was used in a tournament algorithm, pairing captions against each other to find the best ones. Since the goal was a set of the best captions, a non-elimination tournament was employed: the Swiss-system tournament method. In this method, competitors are paired randomly in the first round; in the following rounds, they are paired based on their performance so far. The competitors were grouped by total points, edges were added between players who had not been matched yet (ensuring that no two competitors face each other more than once), and the Blossom algorithm was then used to compute a maximal matching. For each cartoon, the classifier was trained on the data of all other cartoons, simulating a new contest. Whenever two captions were paired, the classifier's prediction was computed; since many pairs of captions are effectively incomparable, ties were allowed, with a victory worth 3 points and a tie worth 1 point. Figure 5 shows some of the top-ranking (left) and bottom-ranking (right) captions for the cartoon in Figure 1. Many of the captions ranked at the very bottom were very long (15-42 words each); to make the comparison between captions easier, only captions of length 5-10 words were used.
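A single round of such a Swiss-system tournament might be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the `judge` function is a hypothetical stand-in for the trained pairwise classifier, and networkx's `max_weight_matching`, which implements a Blossom-style algorithm, pairs captions with similar point totals that have not met before.

```python
# Sketch of one Swiss-system round: pair captions with similar scores
# that have not met before, then score each pairing with the classifier.
# `judge` is a hypothetical stand-in for the trained pairwise classifier.
import networkx as nx

def judge(a: str, b: str) -> str | None:
    """Hypothetical classifier: returns the funnier caption, or None on a tie."""
    return a if len(a) < len(b) else b  # toy rule: shorter caption "wins"

def swiss_round(scores: dict[str, int], played: set[frozenset]) -> None:
    """Play one round, updating `scores` and `played` in place."""
    graph = nx.Graph()
    graph.add_nodes_from(scores)
    for a in scores:
        for b in scores:
            if a < b and frozenset((a, b)) not in played:
                # Prefer pairing captions with similar point totals.
                graph.add_edge(a, b, weight=-abs(scores[a] - scores[b]))
    # Blossom-based matching; maxcardinality pairs as many captions as possible.
    for a, b in nx.max_weight_matching(graph, maxcardinality=True):
        played.add(frozenset((a, b)))
        winner = judge(a, b)
        if winner is None:
            scores[a] += 1; scores[b] += 1   # tie: 1 point each
        else:
            scores[winner] += 3              # victory: 3 points

captions = ["Is it a hybrid?", "It runs on oats.",
            "Low mileage, high maintenance.", "Top speed: gallop."]
scores, played = {c: 0 for c in captions}, set()
for _ in range(3):
    swiss_round(scores, played)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Using the negative score difference as the edge weight makes the maximum-weight matching prefer pairings of equally ranked captions, which is the defining property of a Swiss-system round.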
Figure 5. Tournament results: high-ranking (left) and low-ranking (right) captions for the cartoon in Figure 1; only captions of length 5-10 words are used.

VI. RELATED WORKS

Unlike in fields like linguistics and psychology, the study of humor in computer science is still in its infancy, and only a few prototypes of computational humor exist. In this section we briefly review research contributions from these fields.

Psychology: Several studies in psychology have focused specifically on cartoons. For example, creativity was found to be linked to the ability to generate cartoon captions [7]. Jones et al. demonstrated the importance of the relationship between picture and caption: subjects viewed picture and caption either simultaneously or sequentially (the picture appearing five seconds before the caption, or vice versa), and more vigorous changes in heart rate were detected when the elements were shown in sequence.

Linguistics: There has also been plenty of research on theories of humor and the linguistics of humor [9], with most studies attempting to develop symbolic and linguistic models of humor.

Computer Science: The JAPE punning-riddle generator [8] is one of the first attempts at computational humor, automatically generating punning riddles; humor was achieved by mixing terms from different domains, exploiting incongruity theory. On the side of humor understanding, Mihalcea and Pulman [6] characterized the features of humorous texts.

VII. ARTICLE PRESENTATION IN MEDIA VS PAPER

Michael Cavan wrote an article based on the results of the research paper "Inside Jokes: Identifying Humorous Cartoon Captions". The article was published by The Washington Post under the title "Can robots now write quality jokes? How the notion is becoming less laughable" [10]. In this section we look at the manner in which the work was presented in the research paper in contrast to its presentation in the media. For both we consider who the target audience is, the language used, and how the technical details are elaborated.

1) Audience: It is important to point out that the main audience of the paper consists of topic experts, or individuals with a well-grounded background in the subject matter, whereas the main audience of the media article is the general public; this is clearly noticeable in the style of writing. The research paper goes into detail, explaining the methods and techniques used in the course of the experiment, and the results are presented in a scientific manner. The media article, on the other hand, simply tries to give the general public an overview of what the study was about, its results, and the impact these results could have.

2) Language and Technical Details: With the different audiences in mind, the writing style clearly varies between the two. Compare, for example:

Paper: "We developed a classifier that could pick the funnier of two captions 64% of the time and used it to find the best captions, significantly reducing the load on the cartoon contest's judges."

Media: "The tale's takeaway is intended to be that computers, no matter how knowledgeable, will never be able to fathom the linguistic kinks and quirks of the human mind."

Clearly, the paper relies on facts and statistics, and terms like "algorithm" and other computer-science vocabulary are used and explained in detail, while all these scientific details are completely omitted from the media article, making it easy for the general public to grasp the big picture of what the research was about.
VIII. CONCLUSION AND FUTURE WORK

The challenge of learning the degree of humor perceived in combinations of captions and cartoons was examined. An important set of features was extracted from the linguistic properties of captions and the interplay between captions and pictures. Using a large amount of crowd-sourced cartoon caption data from the New Yorker, a classifier was implemented that picks the funnier of two captions 64% of the time; this classifier was then used to help the contest judges find the best captions, significantly reducing their workload.

The framework developed can serve as a basis for further research in computational humor, which is still in its infancy. Future directions of work include a more detailed analysis of the context and anomalies represented in the cartoons, as well as their influence on setting up tension, questions, or confusion. Another interesting domain is humor generation for captions and cartoons. In principle, the classifier could be used toward this goal to improve an existing caption, making it possible to identify weaknesses of the caption and suggest improvements (for example, replacing a word with a simpler synonym). Furthermore, it would be important to explore the generative process employed by people in coming up with captions, for example by trying to understand the influence of the visual salience of the core context and anomalies of a scene on the focus and linguistic structure of a joke. Finally, humor, being mainly a human trait, varies across cultural and religious backgrounds; it would be an interesting challenge to explore opportunities to personalize the classifier with respect to personal taste in humor. Automated generation and recognition of humor could be useful for modulating attention, engagement, and the retention of concepts, and thus has numerous interesting applications, including use in education, health, engagement, and advertising.

Acknowledgments: The content of this work is mainly based on the research of the authors mentioned above.

REFERENCES

[1] D. Shahaf, E. Horvitz and R. Mankoff, Inside Jokes: Identifying Humorous Cartoon Captions, In KDD, 2015.
[2] B. King, R. Jha, D. R. Radev and R. Mankoff, Random walk factoid annotation for collective discourse, In ACL, pages 249-254, 2013.
[3] Tesla stock moves on April Fools' joke, http://blogs.wsj.com/moneybeat/2015/04/01/tesla-stock-moves-on-april-fools-joke, 2015.
[4] E. Gabrilovich, M. Ringgaard and A. Subramanya, FACC1: Freebase annotation of ClueWeb corpora, 2013.
[5] P. House, How to win the New Yorker cartoon caption contest, 2008.
[6] R. Mihalcea and S. Pulman, Characterizing humour: An exploration of features in humorous texts, In Computational Linguistics and Intelligent Text Processing, pages 337-347, Springer, 2007.
[7] D. M. Brodzinsky and J. Rubien, Humor production as a function of sex of subject, creativity, and cartoon content, Journal of Consulting and Clinical Psychology, 44(4):597, 1976.
[8] K. Binsted and G. Ritchie, An implemented model of punning riddles, In AAAI'94, 1994.
[9] S. Attardo, Linguistic Theories of Humor, Approaches to Semiotics, Mouton de Gruyter, 1994.
[10] M. Cavan, Can robots now write quality jokes?
How the notion is becoming less laughable, The Washington Post, https://www.washingtonpost.com/news/comic-riffs/wp/2015/09/02/can-robots-now-write-quality-jokes-how-the-notion-is-becoming-less-laughable/, September 2, 2015.