Anti-SIFT Images Based CAPTCHA Using Versatile Characters

Anti-SIFT Images Based CAPTCHA Using Versatile
Characters
Chen-Chiung Hsieh and Zong-Yu Wu
Department of Computer Science and Engineering
Tatung University
Taipei City, Taiwan
[email protected]
Abstract—Due to vigorous development of pattern
recognition, traditional human form filling tasks would be
replaced by automated processes. However, these automation
processes are often misused for illegal behavior such as spam email or application for website account. In order to prevent
website owner from suffering the attacks of automated program,
this paper proposed an innovative image-based CAPTCHA for
distinguishing human and computer by embedding versatile
characters in the images. The proposed method makes the
characters indiscernible by automated image analysis
technologies like scale-invariant feature transform while human
can easily distinguish the location of the embedded characters.
Our designed mechanism was capable to elude such kind of
attacks. In experiments, 15 users were invited to test the system
and the success rate is 86%. If wrong operations like clicking out
of text boxes were excluded, the success rate reached 95%.
Compare the average logging time with reCAPTCHA and
HELLO CAPTCHA, the proposed method is faster than these
two methods by 32 seconds and 115 seconds, respectively.
Keywords—CAPTCHA; internet verification code; internet
attack; image analysis; SIFT
I.
image. The most difficult part for automated process is to
recognize each versatile character that is interfered by nonuniform background image. To prove the robustness of
proposed mechanism, pattern recognition technique Scaleinvariant feature transform (SIFT) were tried and the system
could successfully resist this kind of attack. Meanwhile,
human users need to complete the verification task through
mouse clicking only.
II.
Related Works
Shirali-Shahreza and Shirali-Shahreza [3] proposed
handwritten CAPTCHA man-machine distinguishing method
as shown in Fig. 1, where the alphabetic characters displayed
are handwritten style. Users need to identify alphabetic letters
in the image and input the correct answer to passed through
verification. The letters displayed in this simple background
image can be analyzed by computers. Furthermore, some of
the characters are distorted too much for user to identify and
answer in a short time.
Introduction
Throughout the years, CAPTCHA [1] has provided us with
a number of ways to answer the same question. Why would a
website want to verify this? To prevent spam emails.
CAPTCHA originally developed by Carnegie Mellon
researchers and professors in 2000 provided webmasters with
a solution to the spam that plagued their sites. By generating
an image containing a string of random, warped characters,
CAPTCHA forms stopped automated spam bots in their tracks.
Consequently, CAPTCHA [2] verification mechanism was
applied to man-machine distinguishing. It is mainly with an
automatically generated test mechanism where an individual
person can easily pass through while automated program can
hardly pass through on the contrary.
However, spammers quickly adapted their automation
software to bypass this kind of test by image analysis
technologies: remove a CAPTCHA’s background, separate
and identify individual characters, and ultimately type in the
displayed character string. In this study, text based CHPTCHA
is improved from a 1D distorted string to a 2D versatile
characters by displaying individual character with varying size,
color, style, and angle on an randomly selected background
Fig. 1. Verification mechanism using handwritten characters [2].
Baird and Bentley proposed implicit CAPTCHA method
[4], where the verification code mechanism only requires user
an easy click. First, the system shows a picture and informs
user on basis of the program to click on the correct location.
User must click on the correct location to pass through the
verification. However, such verification system is easy to be
solved by semantic analysis, so safety protection could not be
guaranteed.
Shirali-Shahreza and Shirali-Shahreza proposed drawing
CAPTCHA [5], where drawing method is used as verification
mechanism. User is required to draw a correct shape. There
are some special points (diamond shaped) given by the system,
user need to connect these special points for verification. In
the picture, other small points are used for interference, mainly
for inhibiting automatic programs. On the contrary, it also
creates inconvenience to the user and not easy for user to
detect the special point designed to construct a required shape.
978-1-4799-0604-8/13/$31.00 ©2013 IEEE
Liao [6] proposed a man-machine distinguishing
mechanism based on contents in the image, where arbitrarily
selected two blocks in the image are randomly switched. The
used image is mainly human face because face organs can be
easily identified by users as shown in Fig. 2. Thus the
switched blocks can be easily found by user.
Shirali-Shahreza and Shirali-Shahreza proposed motion
CAPTCHA [8], which is based on recognition of a human
motion, as shown in Fig. 5, where user will be presented with
a movie clip. In the movie, a person is making some motion
and there will be four choices below. Each sentence describes
a motion for selection. User is required to select the correct
answer to pass through the test so as to continue using the
website. However, such test needs much download time for
users to view the movie contents.
Fig. 2. Two blocks switched for verification. (a) Original image (b) Image
with blocks switched. Block size is of ratio 0.3*0.3 of the original image. (c)
Solid rectangles are correct answers.
In Microsoft Hotmail website, when users register for
email service, “man-machine distinguishing” is used to
distinguish human and automatic software. In the method, a
string of randomly selected alphabetic letters is displayed with
shape modified or interfered by information added graphics as
shown in Fig. 3. That verification could not be easily passed
through by automatic software, and certainly such interfering
information could also create confusion to users. Thus, user
needs to wait for images that can be more easily identified to
perform the verification.
Fig. 5. Motion CAPTCHA [7].
III.
Proposed Method
In the study, random number generation module is used to
select image and verification characters for conducting manmachine distinguishing. Then color, size, style, and angle
variation are applied to the verification characters. Finally, the
verification image is generated.
Fig. 3. Verification mechanism for Hotmail registration.
There are related studies in man-machine distinguishing
form Microsoft, such as Animal Species Image Recognition
for Restricting Access (ASIRRA) [7], where the advantage of
human over machine on image contents is applied. The test
requires user to select cat or dog image in the pictures, as
shown in Fig. 4. Microsoft collaborated with Petfinder
(www.petfinder.com), not only that more than 3 million
images of cats and dogs in Asirra database are provided, but
also fun is provided by “adopt me” link shown below pet
images in the verification process. Since the pictures are
provided by pet adopting website, the verification of filming
angle of each pet image is also significant. Although the
classification could not be conquered by automatic program, it
could also lead to difficult in identification when the angle
deviation is too great to users.
The system flow of the verification mechanism, as shown
in Fig. 6, is firstly to randomly select a background image
from an image bank. Then, five verification codes from
alphanumeric are also selected randomly but to avoid text out
of background image. The text display area is set smaller than
the original image size, with collision detection to other
anchored verification codes.
Fig. 6. System flow.
Fig. 4. Asirra verification mechanism.
A. Text Generation
Arbitrarily select a background image from a nonrestricted image database. Then randomly generate five
alphanumeric characters and embed these characters into the
image. These five alphanumeric characters with varying size,
color, style, and coordinates are displayed on web page for
user verification. In the verification process, user needs to
select characters according to the sequence of the verification
codes displayed in the text box. When a character is clicked,
its color will be changed. Verification is passed once the fifth
character is clicked. If error occurred then a verification error
will be shown and the system will generate another new
verification image again until the user passes through the
verification process.
B. Image with Verification Codes Generation
The background image arbitrarily selected from the image
database is brought into web page using JQuery Ajax. 26
uppercase and lowercase letters and 10 numeric digits are used
as the character set. Five verification codes are generated by
random and non-duplicate, wherein make variations on their
size, color, and style. Each character is also rotated from the
normal oriental direction. In addition, to avoid characters from
going beyond image borders, as shown in Fig. 7, the location
range for verification codes is in 270×270, as shown in Fig. 10.
(a)
(b)
Fig. 7. (a) Situations of characters beyond 300×300 image boundary. (b)
Inner rectangle of 270×270 is the location range for code generation.
C. Character Collision Detection
During the process of characters generation, overlapping
may occur. We adopt an array to store the upper-left corner
and bottom-right corner of the generated characters. Character
collision detection could be easily implemented by comparing
the location of newly generated character with those locations
of already generated characters stored in the array.
Figure 8(a) gives an example of the generated image for
user verification. The verification code in normal format is
displayed under the image and user needs to click these
characters according to the sequence in the verification code.
Figure 8(b) shows the four corner locations of the five
generated characters.
(a)
(b)
Fig. 8. (a) An example of generated image with verification code embedded.
The verification code is displayed under the image for user to find them in the
image. (b) The corners for “2OJzY” are listed as list a to e.
D. SIFT Character Matching
Scale-invariant feature transform (SIFT) [9, 10] is an
algorithm in computer vision to detect and describe local
features in images. Lowe's method for image feature
generation transforms an image into a large collection of key
points with feature vectors. Each key point is invariant to
image translation, scaling, rotation, and partially invariant to
illumination changes and robust to local geometric distortion.
SIFT feature point comparison method comprises four steps:
scale-space extreme value detection; feature point position
optimization; calculating feature point directionality; feature
point description.
IV.
Experimental Results
A. User Verification Test
For the test of the system, users need a computer
connected to the developed web server. 15 subjects aged
among 23 ~ 27, male and female were invited to conduct three
experiments. In Experiment one, each subject conducted 10
times the verification. For comparisons, reCAPTCHA [11]
and HELLO CAPTCHA [12] were also tested 10 times by
each person. To avoid misunderstanding caused operation
errors, each subject was told the standard verification
procedure in advance. Table I summarized the results and it
demonstrated that our method surpassed reCAPTCHA and had
the equal performance with HELLO CAPTCHA. Our
execution time was 102 seconds which was lower than
reCAPTCHA’s 134 seconds and HELLO CAPTCHA’s 217
seconds. The proposed verification code mechanism is
superior to reCAPTCHA due to the distorted characters in
reCAPTCHA were too difficult for user to read. The situation
was worse in HELLO CAPTCHA owing to occluded
characters in dynamic presentation. In addition, downloading
time was also time-consuming. Instead to require more time
for user to recognize the text so as to input verification codes,
the proposed mechanism only requires simple clicks to pass
the verification. It is able to save time spending on web.
TABLE I.
PASS RATES FOR RECAPTCHA, HELLO CAPTCHA AND
THE PROPOSED VERIFICATION MECHANISM.
TABLE II.
PASS RATES FOR ROTATED CODES OF PROPOSED METHOD
In Experiment two, the characters were generated with
rotation and the same procedure was conducted by each
subject 10 times. Table II summarizes the results which
showed that the success rate declined only one percent. It
induced that human could easily find the character even with
some degree of rotation.
To investigate the factors which caused failure,
Experiment three was conducted by asking each subject
finding the five characters 10 times without abort on error.
Hence, each subject totally needed to find 50 characters from
10 images. The character recognition rate was 95% in average.
The failure reasons includes background of similar color,
background of high illumination, background with duplicate
characters, complex background, and characters happened to
fit in background.
B. SIFT Attacks
SIFT Experiment was mainly to verify whether the
proposed mechanism can be easily conquered or not. The
more number of matched key points, the more possible be
attacked. Figure 9 gives two examples doing SIFT matching
of characters ‘w’ and ‘4’. It was found that no matter colored
image nor grey scaled image, the embedded characters in the
image could not be correctly extracted by the various
interferences in image. On the other hand, once the deployed
background image was simple enough, then the verification
codes of the proposed mechanism may not under protection.
Fig. 9. Two SIFT attacks. (a) Codes in color image. (b) Codes in grey image.
(c) Codes in simple background image.
V.
Conclusion
Along with the widespread of internet usage, people rely
on automatic mechanism to process massive data day by day.
To avoid the automatic mechanism to be misused and reduce
any improper use of operation, we need to distinguish between
man and machine so as to prevent the risk brought by
automation. Rights and interests of users could be effectively
maintained only if man-machine difference could be
differentiated.
In this paper, a novelty concept of man-machine
distinguishing mechanism is developed in combination of
various verification mechanisms is proposed. By varying the
size, color, style, angle, and position of the verification codes,
while keeping the human readability of these codes, a web
server was created with such verification mechanism to test
the feasibility of proposed method. User only needs to click
the codes according to the displayed text. From experimental
results, the pass rate by human is higher than some existing
approaches. Meanwhile, the human verification time of our
method is quite shorter than the other approaches. To test the
robustness of the proposed method, famous SIFT matching
algorithm was adopted to attack the system and the
experimental results showed that the mechanism is capable of
defending such kind of attacks.
Acknowledgment
This work was supported in part by the Division of
Research and Development, Tatung University, Taipei,
Taiwan, under grant B101-I03-035, 2013.
References
[1]
http://www.techi.com/2010/05/are-you-human-captch as-many-ways-ofasking-the-same-question/
[2] L. Ahn, M. Blum, N. J. Hopper, and J. Langford, “CAPTCHA: telling
humans and computers apart (automatically)”, in Eurocrypt '03
Advances in Cryptology, Lecture Notes in Computer Science, vol. 2656,
pp. 294–311, 2003.
[3] A. Rusu and V. Govindaraju, “Handwritten CAPTCHA: using the
difference in the abilities of humans and machines in reading
handwritten words”, in Proceedings of the 9th Int’l Workshop on
Frontiers in Handwriting Recognition, Sept 2004.
[4] H. S. Baird and J. L. Bentley, “Implicit CAPTCHAs,”, in Proceedings of
the SPIE on Document Recognition and Retrieval XII, vol. 5676, pp.
191-196, 2004.
[5] Shirali-Shahreza, “Drawing CAPTCHA,” in Proceedings of the 28th
International Conference Information Technology Interfaces (ITI 2006),
Cavtat, Dubrovnik, Croatia, pp. 475-480, June 19-22, 2006.
[6] W. H. Liao, “A CAPATCHA mechanism based on textured patterns,” A
Thesis of the Department of Computer Science, National Chengchi
University, 2007.
[7] J. Elson, J. R. Douceur, J. Howell, and J. Saul, “Asirra: a CAPTCHA
that exploits interest-aligned manual image categorization,” in Proc. of
ACM Conference on Computer and Communications Security, pp. 366374, 2007.
[8] M. Shirali-Shahreza and S. Shirali-Shahreza, “Dynamic CAPTCHA,” in
Proceedings of the 2008 International Symposium on Communications
and Information Technologies (ISCIT 2008), Vientiane, Lao, pp. 436440, Oct. 2008.
[9] D. G. Lowe, “Object recognition from local scale-invariant features”, in
Proc. of the International Conference on Computer Vision, Corfu,
Greece, pp. 1150-1157, Sep. 1999.
[10] D. G. Lowe, “Distinctive image features from scale-invariant key
points”, International Journal of Computer Vision, vol. 60, no. 2, pp. 91110, 2004.
[11] ReCAPTCHA:http//www.google.com/recaptcha.
[12] HELLO CAPTCHA:http://hellocaptcha.com/index.php.