Linguistic Steganography - Cambridge Computer Lab

Linguistic Steganography: Information Hiding in Text
Stephen Clark
with Ching-Yun (Frannie) Chang
University of Cambridge Computer Laboratory
Luxembourg, September 2013
Intro
Ling Steg
Lex Sub
Information Hiding [Fridrich, 2010]
My friend Bob, until yesterday I was using binoculars for stargazing.
Today, I decided to try my new telescope. The galaxies in Leo and
Ursa Major were unbelievable! Next, I plan to check out some nebulas
and then prepare to take a few snapshots of the new comet. Although
I am satisfied with the telescope, I think I need to purchase light
pollution filters to block the xenon lights from a nearby highway to
improve the quality of my pictures. Cheers, Alice.
mfbuyiwubfstidttmnttgilaumwuniptcosnatpttafsotncaiaswttitintplpft
btxlfanhtitqompca
π = 3.141592653589793 . . .
buubdlupnpsspx
attack tomorrow
Steganography
• Steganography is a branch of security concerned with hiding
information in some cover medium
• Use of images for hiding information has been extensively studied
• Make changes to an image so that the changes are imperceptible
to an observer
• The resulting image encodes the message
• A related area is watermarking, which is concerned with hiding
information for the purposes of identification (e.g. copyright)
• or e.g. identifying Google translations
The Cover Medium
• Advantages of images
• local changes can maintain global properties of the image
• easy to make changes which are imperceptible to a human
• Disadvantages of images
• sender needs an image
• sender needs to transmit image to the receiver
• Text is everywhere - why not conceal information in a cover text?
Example using Lexical Substitution
• Cover text:
Which is why, some would say, it’s slightly odd that when no less an
authority than the chairman of the Financial Services Authority, Lord
Turner, questions the social utility of much activity in financial
markets, and also suggests that it might be no bad thing to levy a tiny
Tobin tax on all this frenetic trading in electrons, well it’s curious that
the chancellor of the exchequer (who could use a bob or two) doesn’t
lick his chops and demand a bit of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
• Data Embedding (one substitution per step):
slightly → fairly
chairman → president
utility → usefulness
curious → strange
chops → lips
bit → piece
• Stego Text:
Which is why, some would say, it’s fairly odd that when no less an
authority than the president of the Financial Services Authority, Lord
Turner, questions the social usefulness of much activity in financial
markets, and also suggests that it might be no bad thing to levy a tiny
Tobin tax on all this frenetic trading in electrons, well it’s strange
that the chancellor of the exchequer (who could use a bob or two)
doesn’t lick his lips and demand a piece of that.
• Secret bitstring: 0 1 1 0 0 0 1 0
This Talk
• Joint work with Frannie Chang
• Outline:
  • more introduction to linguistic steganography
  • a stegosystem based on lexical substitution
  • a secret sharing scheme based on adjective deletion
  • online demo
• Motivation:
• can simple NLP methods deliver a practical steganography system?
• interesting research area at the intersection of Natural Language
Processing and Computer Security
Linguistic Steganography
• Some existing work, but very little compared to images
• Concerned with linguistic transformations, rather than superficial
properties of the text (e.g. white spaces)
• Difficulty is that local changes can lead to inconsistencies:
• ungrammatical or unnatural sentences
• grammatical, natural sentences which lack coherence with respect
to the rest of the document (or the world)
Linguistic Steganography Framework
• Assume an existing cover text which will be modified (rather than
generated from scratch)
• Note that the receiver does not need a copy of the cover text
(just the code dictionary for lexical substitution)
• Trade-off between imperceptibility and payload
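The framework can be illustrated with a minimal sketch, assuming a toy code dictionary in which every word belongs to one synonym set and carries a 1-bit code. The dictionary contents, the names `CODE_DICT`, `embed` and `decode`, and the convention that the receiver knows the message length are all assumptions made for illustration; they are not the codes used in the talk's example.

```python
# Hypothetical code dictionary: word -> (synonym-set id, 1-bit code).
# The sets and codes are invented for illustration only.
CODE_DICT = {
    "slightly": ("s1", 0), "fairly": ("s1", 1),
    "curious": ("s2", 0), "strange": ("s2", 1),
    "bit": ("s3", 0), "piece": ("s3", 1),
}


def synonym_with_code(set_id, bit):
    """Return the word in synonym set `set_id` whose code equals `bit`."""
    for word, (sid, code) in CODE_DICT.items():
        if sid == set_id and code == bit:
            return word
    raise KeyError((set_id, bit))


def embed(cover_words, bits):
    """Modify the cover text so that its dictionary words spell out `bits`."""
    remaining = list(bits)
    stego = []
    for word in cover_words:
        if word in CODE_DICT and remaining:
            set_id, _ = CODE_DICT[word]
            word = synonym_with_code(set_id, remaining.pop(0))
        stego.append(word)
    if remaining:
        raise ValueError("cover text too short for the message")
    return stego


def decode(stego_words, n_bits):
    """Recover the bits from the stego text using only the code dictionary."""
    bits = [CODE_DICT[w][1] for w in stego_words if w in CODE_DICT]
    return bits[:n_bits]
```

Note that `decode` never sees the cover text, which mirrors the point that the receiver only needs the code dictionary. A larger payload would need more substitutable words per sentence, which is where the trade-off with imperceptibility appears.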
Possible Linguistic Transformations
• Lexical (e.g. synonym substitution)
• Syntactic (e.g. passive/active transformation)
• Semantic/pragmatic
• Can the transformations be applied reliably and often?
Simple Lexical Stegosystem (Winstein, 98)
Sense Ambiguity Problem
• Decoding ambiguity
⇒ use a novel form of vertex coding (later in talk)
Security Simplifications
• Assuming that the adversary is not a computer (i.e. ignoring the
possibility of steganalysis)
• Assuming that the adversary is passive rather than active
• Ignoring the source of the cover text
• Assuming that the adversary does not know the steganographic
channel (contrary to Kerckhoffs's principle)
• but opportunities for secret shared keys
Lexical Substitution Problem
The idea is a powerful one → The idea is a potent one
This computer is powerful → This computer is potent
• Some synonyms are not acceptable in context
⇒ need to check whether a synonym is applicable in a given
context (to ensure imperceptibility)
Checking Synonym Applicability
• Use the Google n-gram corpus to see if the synonym in context
has been used before (and frequently)
• Now a fairly standard NLP technique which has been used for
many similar lexical disambiguation tasks
Paradigm Shift in NLP
• 30 years ago statistical, corpus-based methods began to appear
• Now the dominant approach for all NLP problems (e.g. Google
Translate)
The Google n-gram Corpus
5-gram                           Count
the part that you were           103
the part that you will           198
the part that you wish           171
the part that you would          867
the part that your read          45
the part the Riverside County    51
the part the United States       72
the part the detective was       63
the part the next day            95
Contextual Check
He was bright and independent and proud →
He was clever and independent and proud
n-gram                              Count
was clever                          40,726
clever and                          261,766
He was clever                       1,798
was clever and                      6,188
clever and independent              86
He was clever and                   343
was clever and independent          0
clever and independent and          0
He was clever and independent       0
was clever and independent and      0
clever and independent and proud    0
f_2 = 302,492
f_3 = 8,072
f_4 = 343
f_5 = 0
Contextual Check
He was bright and independent and proud →
He was clever and independent and proud
Count(w) = \sum_{n} \log(f_n)
max is the highest n-gram Count for any synonym
Score(w) = Count(w)/max
If Score(w) ≥ threshold, w passes the contextual check
Count(clever) = \log(f_2) + \log(f_3) + \log(f_4) + \log(f_5) = 28
Score(clever) = 28/max = 0.9
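The contextual check can be sketched as follows, using the n-gram counts shown for "clever". Two details are assumptions not stated on the slides: zero frequencies are skipped (treating log(0) as 0), and natural logarithms are used. With those choices the count for "clever" comes out at about 27.5, close to but not exactly the 28 quoted above. The count of 31 for the best-scoring synonym in the usage line is purely hypothetical.

```python
import math

# Counts copied from the "Contextual Check" slide; absent n-grams count as 0.
NGRAM_COUNTS = {
    "was clever": 40726, "clever and": 261766,
    "He was clever": 1798, "was clever and": 6188,
    "clever and independent": 86, "He was clever and": 343,
}


def ngrams_containing(words, idx, n):
    """All n-grams of `words` that include the word at position `idx`."""
    starts = range(max(0, idx - n + 1), min(idx, len(words) - n) + 1)
    return [" ".join(words[s:s + n]) for s in starts]


def frequencies(words, idx, lookup, orders=(2, 3, 4, 5)):
    """f_n: summed count of the n-grams of length n containing the word."""
    return {n: sum(lookup.get(g, 0) for g in ngrams_containing(words, idx, n))
            for n in orders}


def count(freqs):
    """Count(w) = sum over n of log(f_n); assumption: zero f_n are skipped."""
    return sum(math.log(f) for f in freqs.values() if f > 0)


def passes_check(counts_by_word, word, threshold):
    """Score(w) = Count(w) / max Count over the candidate synonyms."""
    best = max(counts_by_word.values())
    return best > 0 and counts_by_word[word] / best >= threshold


sentence = "He was clever and independent and proud".split()
freqs = frequencies(sentence, sentence.index("clever"), NGRAM_COUNTS)
clever_count = count(freqs)

# Hypothetical: the best synonym has Count 31, so Score(clever) = 28/31.
print(passes_check({"clever": 28, "best_synonym": 31}, "clever", 0.9))
```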
Extensions to the Contextual Check
• Weight some n-grams more heavily than others
• Use wild-cards for unknown words
• ...
• ⇒ difficult to beat the basic system
Evaluation
• Automatic evaluation using data from the Lexical Substitution Task
(McCarthy and Navigli, SemEval 2007)
• Manual human evaluation of naturalness of the modified text
• more direct evaluation of imperceptibility for the steganography
application
• We use WordNet as the source of possible substitutes
WordNet
WordNet Search - 3.1
http://wordnetweb.princeton.edu/perl/webwn?s=newspaper&...
Key: "S:" = Show Synset (semantic) relations, "W:" = Show Word (lexical) relations
Display options for sense: (gloss) "an example sentence"
Noun
S: (n) newspaper, paper (a daily or weekly publication on folded sheets;
contains news and articles and advertisements) "he read his newspaper at
breakfast"
S: (n) newspaper, paper, newspaper publisher (a business firm that
publishes newspapers) "Murdoch owns many newspapers"
S: (n) newspaper, paper (the physical object that is the product of a
newspaper publisher) "when it began to rain he covered his head with a
newspaper"
S: (n) newspaper, newsprint (cheap paper made from wood pulp and used
for printing newspapers) "they used bales of newspaper every day"
Human Evaluation
• Evaluate imperceptibility by asking humans to rate naturalness of
sentences (1–4), in 3 conditions:
• sentence unchanged
• sentence changed by our system (with threshold of 0.95)
• sentence changed by random choice of target word and random
choice of substitute from target word’s synsets (baseline)
• Sentences are from Robert Peston’s BBC blog
• On average around 2 changes are made per sentence
Example Sentences
ORIG: Apart from anything else, big companies have the size and muscle to
derive gains by forcing their suppliers to cut prices (as shown by the furore
highlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn -
for a 2.5% rebate from its suppliers); smaller businesses lower down the food
chain simply don’t have that opportunity.
SYSTEM: Apart from anything else, large companies have the size and
muscle to derive gains by pushing their suppliers to cut prices (as evidenced
by the furore highlighted in yesterday’s Telegraph over Serco’s need - now
withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower
down the food chain simply don’t have that opportunity.
RANDOM: Apart from anything else, self-aggrandising companies have the
size and muscle to derive gains by forcing their suppliers to foreshorten prices
(as shown by the furore highlighted in yesterday’s Telegraph over Serco’s
demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller
businesses lower down the food chain simply don’t birth that chance.
Experimental Design
• 60 sentences
• 30 judges
• Latin square design with 3 groups of 10 judges
• People in the same group receive the 60 sentences under the
same set of conditions
• Each judge sees all 60 sentences, but sees each sentence only
once in one of the three conditions
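One way to realise such a Latin square is to rotate the three conditions across the groups, as in the sketch below. The rotation rule `(sentence + group) mod 3` and the condition labels are assumptions for illustration; the slides do not say how sentences were actually allocated to conditions.

```python
# Labels for the three conditions described in the human evaluation.
CONDITIONS = ("original", "system", "random")


def latin_square(n_sentences=60, n_groups=3):
    """design[g][s] is the condition in which group g sees sentence s."""
    k = len(CONDITIONS)
    return [[CONDITIONS[(s + g) % k] for s in range(n_sentences)]
            for g in range(n_groups)]


design = latin_square()
# Every judge in a group sees all 60 sentences, each in exactly one
# condition, and across the three groups each sentence appears once in
# every condition.
```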
Annotation Guidelines
Annotation Example
Results
• Average score for the original sentences is 3.67 (scale of 1–4)
• Average score for the sentences modified by our system is 3.33
• Average score for the randomly changed sentences is 2.82
• Differences between the systems are highly significant (Wilcoxon
Signed-Ranks Test)
• Payload is a few bits per sentence for this level of imperceptibility
• Threshold controls tradeoff between payload and imperceptibility
Ambiguity
Sharing
Deletion
Sense Ambiguity Problem
• Different codewords assigned to different senses of composition
leads to a decoding ambiguity
Sense Ambiguity Problem
• Represent synonymy relation in a graph
• words are nodes in the graph
• edges represent membership of the same synset
Vertex Colour Coding
• Vertex Colouring: a labelling of the graph’s nodes with colours
(codes) so that no two adjacent nodes share the same colour
Vertex Colour Coding Algorithm
• Assume synsets have no more than 4 words
• 99.6% of synsets have fewer than 8 words
• Task is to maximise the number of nodes (words) in the graph
whilst assigning a unique codeword to each node
• We propose a greedy algorithm to perform the colouring – add
edges and codes assuming some ordering of the words so that no
two adjacent nodes share the same code
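The idea can be sketched with a standard greedy graph colouring. This is not necessarily the authors' exact procedure: the alphabetical ordering, the choice to leave out a word when no code is free, and the two example synsets are assumptions made for illustration. With synsets of at most 4 words, 4 codes (2 bits) are enough to give every word in a single synset a different code.

```python
from itertools import combinations


def synonym_graph(synsets):
    """Words are nodes; an edge joins two words that share a synset."""
    graph = {}
    for synset in synsets:
        for word in synset:
            graph.setdefault(word, set())
        for a, b in combinations(synset, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_colouring(graph, n_codes=4):
    """Give each word the lowest code not already used by a neighbour.

    A word with no free code is left out of the result (assumption).
    """
    codes = {}
    for word in sorted(graph):  # assumed ordering: alphabetical
        used = {codes[nb] for nb in graph[word] if nb in codes}
        free = [c for c in range(n_codes) if c not in used]
        if free:
            codes[word] = free[0]
    return codes


# Hypothetical synsets for two senses of "composition".
synsets = [
    ["composition", "paper", "report", "theme"],
    ["composition", "constitution", "makeup", "make-up"],
]
graph = synonym_graph(synsets)
codes = greedy_colouring(graph)
codewords = {w: format(c, "02b") for w, c in codes.items()}
```

Because "composition" receives a single code that differs from every neighbour in both synsets, the receiver can decode it without knowing which sense was intended.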
Vertex (Colour) Coding Algorithm
Vertex Coding Algorithm
The Stego Lexical Substitution System
Deletion as the Transformation
• Words can often be deleted without affecting the meaning
(especially adjectives)
“Have you heard of the mysterious death of your late boarder
Mr. Enoch J. Drebber, of Cleveland?” A terrible change came
over the woman’s face as I asked the question. It was some
seconds before she could get out the single word “Yes” – and
when it did come it was in a husky, unnatural tone.
Deletion as the Transformation
• How can the receiver detect deleted words in the stego text?
• One possibility is to have more than one stego text, with different
words deleted in each
• More than one stego text leads to the idea of secret sharing
Secret Sharing
• There are two receivers, each receiving a different version of the
cover text
• Only when the receivers compare texts can the secret message be
revealed
A Secret Sharing Scheme
Secret bits: 101
Text: “Have you heard of the mysterious death of your late
boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible
change came over the woman’s face as I asked the question.
It was some seconds before she could get out the single word
“Yes” – and when it did come it was in a husky, unnatural
tone.
Embed 1st bit: 1   Target adj: mysterious   (deleted from Share0, kept in Share1)
Embed 2nd bit: 0   Target adj: terrible     (kept in Share0, deleted from Share1)
Embed 3rd bit: 1   Target adj: single       (deleted from Share0, kept in Share1)
Read off bits: 101
Share0 : “Have you heard of the death of your late boarder
Mr. Enoch J. Drebber, of Cleveland?” A terrible change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the word “Yes” – and
when it did come it was in a husky, unnatural tone.
Share1 : “Have you heard of the mysterious death of your
late boarder Mr. Enoch J. Drebber, of Cleveland?” A change
came over the woman’s face as I asked the question. It was
some seconds before she could get out the single word “Yes”
– and when it did come it was in a husky, unnatural tone.
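The scheme can be sketched in a few lines, assuming the target adjectives have already been chosen and are given as word positions. The function names, the use of `difflib` to align the two shares, and the requirement that target adjectives are never adjacent are assumptions made for illustration. The bit convention follows the example: a 1 deletes the adjective from Share0, a 0 deletes it from Share1.

```python
from difflib import SequenceMatcher


def make_shares(cover_words, target_positions, bits):
    """Produce two shares; each target adjective survives in only one."""
    bit_at = dict(zip(target_positions, bits))
    share0, share1 = [], []
    for i, word in enumerate(cover_words):
        bit = bit_at.get(i)      # None for non-target words
        if bit != 1:             # bit 1: delete from Share0
            share0.append(word)
        if bit != 0:             # bit 0: delete from Share1
            share1.append(word)
    return share0, share1


def read_bits(share0, share1):
    """Compare the two shares and read off one bit per differing word."""
    bits = []
    matcher = SequenceMatcher(a=share0, b=share1, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":      # word present only in Share1
            bits.extend([1] * (j2 - j1))
        elif tag == "delete":    # word present only in Share0
            bits.extend([0] * (i2 - i1))
        elif tag == "replace":
            raise ValueError("adjacent target words are not supported")
    return bits


# Shortened version of the example text, tokenised on whitespace.
cover = ("Have you heard of the mysterious death of your late boarder ? "
         "A terrible change came over the face . "
         "She could get out the single word").split()
targets = [cover.index(w) for w in ("mysterious", "terrible", "single")]
share0, share1 = make_shares(cover, targets, [1, 0, 1])
recovered = read_bits(share0, share1)
```

Neither share alone reveals which words are targets; the bits only emerge when the two receivers compare their texts.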
Adjective Deletion Data
• Pleonasm data for pilot study
• free gift, cold ice, final end, . . .
• Full study used human annotated data
• 1,200 sentences from the BNC marked for naturalness (yes/no)
Example Judgements (YES)
Judgement   Example sentence
Deletable   He was putting on his heavy overcoat, asked again casually if he could have a look at the glass.
Deletable   We are seeking to find out what local people want, because they must own the work themselves.
Deletable   We are just at the beginning of the worldwide epidemic and the situation is still very unstable.
Example Judgements (NO)
Judgement     Example sentence
Undeletable   He asserted that a modern artist should be in tune with his times, careful to avoid hackneyed subjects.
Undeletable   With various groups suggesting police complicity in township violence, many blacks will find little security in a larger police force.
Undeletable   There can be little doubt that such examples represent the tip of an iceberg.
Data Collection
• 30 native English speakers
• 1,200 sentences with 300 annotated by 3 judges; the rest
annotated by one
• Fleiss kappa was 0.49 (moderate agreement)
• 700 training; 200 development; 300 test
• Ratio of deletable:undeletable was roughly 2:1
Deletion Classifier
• SVM classifier with a variety of features, e.g.:
  • Google n-gram count ratios before and after deletion
  • lexical association measures between noun and adjective, e.g. PMI
  • noun and adjective entropy measures
  • ...
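As an illustration of one such association feature, pointwise mutual information between an adjective and a noun can be computed from corpus counts as in the sketch below. The function name, its parameters and the counts in the usage line are hypothetical; the slides do not specify how the feature was estimated.

```python
import math


def pmi(pair_count, adj_count, noun_count, total):
    """PMI(adj, noun) = log2( p(adj, noun) / (p(adj) * p(noun)) )."""
    p_pair = pair_count / total
    p_adj = adj_count / total
    p_noun = noun_count / total
    return math.log2(p_pair / (p_adj * p_noun))


# Hypothetical counts: the pair co-occurs five times more often than
# chance would predict, so the PMI is positive.
strong_association = pmi(pair_count=50, adj_count=100, noun_count=100, total=1000)
```

A high PMI suggests the adjective and noun form a tight unit, which is one signal a classifier could use when deciding whether the adjective can be deleted.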
Full Classifier Results on Test Set
Threshold   Pre     Rec
0.69        70.1    74.5
0.70        69.8    73.4
0.71        70.7    72.9
0.72        72.0    70.8
0.73        70.8    65.6
0.74        71.1    58.9
0.75        74.8    41.7
0.76        85.0    26.6
0.77        90.9    15.6
0.78        100     5.2
References
• Practical Linguistic Steganography using Contextual Synonym
Substitution and a Novel Vertex Coding Method
Ching-Yun Chang and Stephen Clark
To appear in Computational Linguistics
• Adjective Deletion for Linguistic Steganography and Secret Sharing
Ching-Yun Chang and Stephen Clark
Proceedings of the 24th International Conference on Computational
Linguistics (COLING-12), Mumbai, India, 2012
• The Secret’s in the Word Order: Text-to-Text Generation for Linguistic
Steganography
Ching-Yun Chang and Stephen Clark
Proceedings of the 24th International Conference on Computational
Linguistics (COLING-12), Mumbai, India, 2012
• Linguistic Steganography using Automatically Generated Paraphrases
Ching-Yun Chang and Stephen Clark
Proceedings of the Annual Meeting of the North American Association for
Computational Linguistics (NAACL-HLT-10), Los Angeles, 2010