Journal of the American Medical Informatics Association
Application of Information Technology
Volume 11
Number 1 Jan / Feb 2004
A Methodology and Implementation for Annotating
Digital Images for Context-appropriate Use in an
Academic Health Care Environment
PATRICIA A. GOEDE, BS, JASON R. LAUMAN, BS, CHRISTOPHER COCHELLA, MS, GREGORY L. KATZMAN, MD,
DAVID A. MORTON, PHD, KURT H. ALBERTINE, PHD
Abstract: Use of digital medical images has become common over the last several years, coincident with the
release of inexpensive, mega-pixel quality digital cameras and the transition to digital radiology operation by hospitals.
One problem that clinicians, medical educators, and basic scientists encounter when handling images is the difficulty of
using business and graphic arts commercial-off-the-shelf (COTS) software in multicontext authoring and interactive
teaching environments. The authors investigated and developed software-supported methodologies to help clinicians,
medical educators, and basic scientists become more efficient and effective in their digital imaging environments. The
software that the authors developed provides the ability to annotate images based on a multispecialty methodology for
annotation and visual knowledge representation. This annotation methodology is designed by consensus, with
contributions from the authors and physicians, medical educators, and basic scientists in the Departments of Radiology,
Neurobiology and Anatomy, Dermatology, and Ophthalmology at the University of Utah. The annotation methodology
functions as a foundation for creating, using, reusing, and extending dynamic annotations in a context-appropriate,
interactive digital environment. The annotation methodology supports the authoring process as well as output and
presentation mechanisms. The annotation methodology is the foundation for a Windows implementation that allows
annotated elements to be represented as structured eXtensible Markup Language and stored separate from the image(s).
J Am Med Inform Assoc. 2004;11:29–41. DOI 10.1197/jamia.M1247.
Affiliations of the authors: Department of Radiology, University of
Utah Health Sciences Center, Salt Lake City, UT (PAG, JRL, CC,
GLK); Department of Pediatrics, University of Utah Health Sciences
Center, Salt Lake City, UT (PAG, JRL, CC, KHA); Department of
Neurobiology and Anatomy, University of Utah Health Sciences Center, Salt Lake City, UT (DAM, KHA).
Research and development for this project was funded partially by
a State of Utah Center of Excellence Grant. Partial funding was also
from the George S. and Dolores Dore Eccles Foundation and
a Benning Research Grant from the Department of Radiology.
Additional support was provided by the Department of Pediatrics
and the Department of Neurobiology and Anatomy, University of
Utah Health Sciences Center.
Correspondence and reprints: Patricia Goede, University of Utah
Health Sciences Center, Department of Pediatrics, Program in
Imaging, Communication, and Collaboration, Electronic Medical
Education Resource Group, 30 N 1900 E, Salt Lake City, UT 84132-2202; e-mail: <[email protected]>.
Received for publication: 09/10/02; accepted for publication:
07/27/03.

Annotating digital images with symbols and text is a fundamental task that a clinician, medical educator, or basic
scientist must perform when preparing material for academic
use.1–3 Image annotation, in the broad sense, includes any
means that allows an author to label, point to, or otherwise
indicate some feature of the image that is to be the focus of
attention.1,4–7 Authors should be able to perform the task of
annotation quickly and easily to optimize utility, workflow,
and time management. Unfortunately, annotation is made
difficult by the lack of tools for annotation of digital media for
use in a context-appropriate setting (i.e., colleagues, students,
patients) that promotes reuse of the annotated material.
Commercial-off-the-shelf (COTS) software is commonly used
to create and present material for academic use and is general
enough to handle most tasks, but does not support the
author’s conceptual framework or workflow.1,8 Difficulties
arise for several reasons, including lack of an optimal file format
that supports reuse, lack of a methodology for annotating
digital material in a hierarchical fashion that does not embed
the annotations within the raster-based image, and lack of
mechanisms to index and catalogue annotated material for
reuse. Consequently, clinicians, medical educators, and basic
scientists amass enormous numbers of images, many of
which are duplicated.
Two unanswered questions that clinicians, medical educators,
and basic scientists have are (1) how to annotate digital
material in a widely accepted manner with a clearly defined
set of rules (methodology) that supports reuse of the
annotated images and (2) how to create material for context-appropriate reuse in lectures, case conferences, and publications, without having to maintain multiple copies or file
formats for different uses.9,37 For example, when an image
from the clinical Picture Archiving and Communications
System (PACS) is acquired, the radiologist has the option of
saving key images as Tagged Image File Format (TIFF), Joint
Photographic Experts Group (JPEG), or bitmapped image
(BMP) files (depending on vendor platform and capability).
After the image is acquired, the radiologist typically uses
raster-based COTS to annotate the image for a single use
within a given presentation (e.g., the process of applying
annotations using Photoshop for use in a PowerPoint presentation). Basic science teachers, such as gross anatomists,
encounter the same problems because they use enormous
numbers of labeled images to teach human anatomy.7,10,11
Each image may contain several annotations or groups of
annotations that are necessary to convey a certain point. Most
software applications do not have the ability to manage visual
annotations and text in the output, which leads to a cluttered,
overly annotated image. Moreover, many software applications provide a feature to manage annotations (i.e.,
visible/invisible) within the application but do not provide
the ability to manage the annotations in the output. The
annotations are embedded in the flattened, permanently
altered image. Each flattened image has one specific use
within a single context, which necessitates multiple duplications of the same annotated image, as opposed to annotating
a single image that can be presented with some, all, or none of
the annotations, thereby allowing reuse for multiple purposes
in multiple contexts. Flattened image annotations result in
a variety of undesirable side effects: repetition of work,
increased authoring effort, increased organization requirements, increased complexity, difficulty automating image
cataloging, and reduced instructional capability.
The solution is to define an annotation methodology that
is the foundation for development of a software implementation that facilitates annotation of digital image data,
tracks the inherent structure, and identifies relationships or
intellectual groupings of the annotations.7,10 The annotation
methodology forms the basis of an eXtensible Markup
Language (XML)12 schema that defines a platform-independent annotation exchange format. The annotations are stored
as vector information in the Scalable Vector Graphics (SVG)
format and remain linked to the original image.13 This
methodology keeps annotations accessible as vector data, not
embedded in the image, and maintains links between the
image and annotation information. Because the annotations
are vector based, they can be overlaid and linked to images
with similar features that are generated from other imaging
modalities. Therefore, the annotations and images become
reusable. Moreover, with vector-based image annotations,
management of multiple original versions in a proprietary
format or distribution of multiple copies of the same image is
not necessary.
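As a minimal sketch of the idea described above, the following Python builds an SVG document that references a raster image by link and carries one annotation as vector elements in its own group. The file name, coordinates, group id, and the helper name `build_overlay` are hypothetical illustrations; the actual schema used by the authors' implementation is not reproduced here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_overlay(image_href, width, height, roi_points, label):
    """Return an SVG string that links to (does not embed) the raster image."""
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": str(width), "height": str(height)})
    # The raster image is referenced by URL, so its pixels are never altered.
    ET.SubElement(svg, f"{{{SVG_NS}}}image", {
        f"{{{XLINK_NS}}}href": image_href,
        "width": str(width),
        "height": str(height),
    })
    # The annotation lives in its own group, separate from the image.
    group = ET.SubElement(svg, f"{{{SVG_NS}}}g", {"id": "annotation-1"})
    points = " ".join(f"{x},{y}" for x, y in roi_points)
    ET.SubElement(group, f"{{{SVG_NS}}}polygon",
                  {"points": points, "fill": "none", "stroke": "white"})
    x0, y0 = roi_points[0]
    text = ET.SubElement(group, f"{{{SVG_NS}}}text",
                         {"x": str(x0), "y": str(y0), "fill": "white"})
    text.text = label
    return ET.tostring(svg, encoding="unicode")


# Hypothetical example: a triangular region of interest labeled "Cyst".
svg_text = build_overlay("neck_ct.png", 512, 512,
                         [(100, 120), (160, 120), (130, 180)], "Cyst")
```

Because the polygon and label sit in a separate group, a viewer could hide or show that group without touching the linked image, which is the property the methodology relies on for reuse.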
We identified requirements for and implemented a simple,
yet effective, image annotation tool based on the annotation
methodology. The tool was designed to be simple to use, to
utilize vector-based annotations, and represent annotation
groupings. These requirements created the need for a standardized methodology to effectively communicate visual
annotations in a consistent and congruent manner while
preserving images for reuse. Therefore, the focus of this report
is to present the annotation methodology. We also define
the schema for development of a software implementation
for multispecialty visual annotation of digital images. The
software implementation also facilitates visual annotation
interactivity and context-appropriate viewing of the visually
annotated images. The annotation implementation was
developed specifically for clinicians, medical educators, and
basic scientists who required the ability to annotate images
with visual expert knowledge for viewing in an interactive,
context-appropriate, digital environment.
Design Objectives—Methods
This project had three objectives. First, we conducted
a multispecialty requirements assessment, consisting of
clinicians, medical educators, and basic scientists, to establish
a guideline for annotation and visual knowledge representation. The second objective was to establish a methodology for
annotation and visual knowledge representation and further
define a specification to be translated into compiled code
(software). The third objective was to develop, implement,
and evaluate a prototype software application for annotation
and knowledge representation that would be used by the
participants in the requirements analysis.
Requirements Assessment
Based on the desire to develop a software implementation for
annotation, we organized a requirements assessment to define
what is involved in the process of annotation, what elements
and features are needed for annotation, and how the output
would be used and in what context (e.g., is the output going to
be viewed via the Worldwide Web [WWW], is the output
targeted for publication in a peer-reviewed journal, is the
output to be viewed primarily by a group of investigators, or
will the output need to be viewed by patients or other health
professionals?). This assessment established the basic design
requirements for the annotation software implementation.
The requirements assessment was conducted within radiology (5 radiologists) and was extended to query other medical
science professionals, including two anatomists, a dermatologist, two ophthalmologists, and three medical students, to
establish guidelines and a foundation for annotation (Table 1).
Table 1. Summary of Participants in the Requirements Assessment

Specialty                      Number of Participants
Radiology                      5
Ophthalmology                  2
Neurobiology and anatomy       2
Dermatology                    1
Ophthalmology                  2
Second-year medical students   3

Activity columns, with cell values in extraction order (row alignment is not recoverable from the source layout):
Clinical Research, Basic Science Research, Clinical Activities: 1, 1, 1, 4, 1, 1
Teaching (Formal Instruction): 2*, 1, 1, 3
*One of the participants performs basic science research (lung and cancer) as well as teaches human gross anatomy.

The methods for collecting the requirements began as an
open-ended discussion with the participants who described
their requirements for annotating digital images. All of the
radiologists that participated in the requirements analysis
are practicing clinicians. One of the five radiologists also
participates in a clinical research project. The two anatomists
direct and teach the human gross anatomy course for first-year medical students. One of the anatomists also directs the
human gross anatomy course and is involved in basic science
lung and cancer research. The dermatologist is a clinician. Of
the two ophthalmologists, one is a clinician, and the other is
a basic science vision investigator. The three medical students
were enrolled in the human gross anatomy course. We invited
clinical and basic science investigators who were not part of
the formal requirements assessment to review and give
feedback on what was established in the formal requirements
assessment.
We selected participants based on use of images to convey
information (e.g., imaging results, clinical information, or
teaching) as part of their clinical, educational, or research
activities. The participants were familiar with annotation and
had extensive experience annotating images ranging from
traditional Letraset to using sophisticated image processing
software such as Adobe Photoshop,14 Adobe Illustrator,15 and
Microsoft PowerPoint.16 Only individuals that expressed the
need for a simple tool to visually annotate an image with
pointers and labels to convey information and/or multiple
output formats and reuse of the archive image participated in
the requirements assessment.
Annotation Methodology for Visual Knowledge
Representation
Concurrent with the requirements assessment was the
definition of a methodology for annotation and visual
knowledge representation. To define the Annotation Methodology, the participants were asked to describe how they
use annotated images, including what software they most
commonly use and in what contexts and types of output
(format) they require. Additionally, participants were asked
to provide detailed information on how they store annotated
images, their process for locating and retrieving digital
images, problems they encounter retrieving images and
visual knowledge, and what type of storage they use to store
images (i.e., database, fileserver, backup media). Design
interface questions, such as icon use, placement of icons,
and drop down menus that were directly related to the
software implementation were posed to the participants to
get feedback on those features.
Results
Part of the requirements assessment was to determine if the
need for annotation transcends the boundaries of clinical,
basic science research, and teaching efforts. For example, two
open-ended questions, “what is a region of interest?” and
“what is the definition of a pointer?” were asked to determine
if the concepts were different for clinicians compared with
basic scientists and educators. Examples of the top five
questions, specific to annotation, that participants responded to in the formal requirements analysis are shown
in Table 2.
The results of the requirements assessment identified the
following fundamental annotation requirements:
• Accurately identify a feature on an image
• Identify a region of interest (ROI) with a point, edge, or polygon on the image
• Label a feature or ROI with alphanumeric symbols, labels, and captions
• Adjust visible contrast and color of annotations
• Organize annotations into hierarchical groupings
• Define arbitrary groups of annotations for special purpose and context-appropriate viewing
• Support for third-party lexicons and nomenclature
• Flexibility to annotate any image file format

Table 2. Example of the Five Most Frequently Responded to Questions in Requirements Assessment
How would you identify a region of interest (ROI)?
How would you label a region of interest (ROI)?
How can you group annotations for a particular image?
Is it necessary to define arbitrary groups of annotations for special-purpose and context-appropriate viewing?
What about inclusion and support for third-party lexicons and nomenclature?

The most frequently identified requirements were to define
and outline a region of interest; label it with a symbol, label,
or caption; set the visible contrast to an annotation (e.g., black
on white or white on black); group the annotated regions
of interest in a hierarchical fashion (e.g., similar to a table of
contents); and provide support for third-party lexicons. The
participants also identified the need for a software application that would allow them to annotate images in different
file formats and not be constrained by a single image
format.
All participants in the requirements analysis identified the
need to have annotated material in a format that promotes
reuse (e.g., reuse as in cross-media publishing—journals,
conference presentations, lectures, Web viewing). Another
desirable feature was the ability to reuse annotated
material for context-appropriate viewing (e.g., clinician–
clinician, clinician–basic scientist, medical educator–
student, and eventually clinician–patient) as an interactive
resource.
Most important to the participants was the need for a tool that
provided all of the functional annotation requirements and
reduced the variety of undesirable side effects, such as
repetition of work, excessive authoring effort, organizational
requirements, and complexity. Other undesirable side effects
are difficulty automating image cataloging and limited
instructional capability.
Finally, participants were asked to identify categories into
which the annotation requirements should be placed. The
resultant four categories are (1) visual annotation, (2) textual
information, (3) presentation attributes, and (4) interactive
features. The results are summarized in Table 3 and discussed
in the following sections.
Model for Annotating Digital Images—The Process
of Annotation
Actual output from the annotation software implementation
that maps how an image may be annotated for context-appropriate viewing (multiple audiences) and reused in
academic activities (journal, conference, teaching, Web viewing) is shown in Figure 1. The annotation requirements of the
individual author included selectable options for style, color,
size, orientation, visual elements used to indicate features,
and presentation methods.
Authors expressed a fundamental need to visually annotate
areas of an image that are relevant to their instructional
audience and goal.4,7,11,17,18 Equally important was the need
to establish a method to share or discuss their images in
a given context. Additionally, investigators entering into the
task of annotation wanted to accomplish the annotation task
without being too constrained by the methodology. Lastly,
authors did not want this task to be hindered by complexity
or by cumbersome software.
The process of annotation is illustrated in Figure 2. Our
annotation methodology is divided into four sections, as
defined by the participants during the requirements analysis.
The first section is the definition of the data that are used to
construct the visual parts of the annotation, which are the
region of interest (ROI) and the pointer. The second section is
the textual data that are tied to the visual annotation and
include the symbol, label, and caption. The textual data also
include the order, which is used to prioritize and group the
annotations logically. The third section contains the presentation attributes that direct the size, color, shape, and
visibility of the annotation. The fourth section defines the
interactive features that facilitate presentation to multiple
audiences in a context-appropriate manner. Combination of
the four sections in the annotation methodology established
a simple and flexible information model that gives the
author precise means to add meaningful annotations to an
image.

Table 3. Categories of the Annotation Methodology

Visual Annotation    Textual Information    Presentation Attributes                   Interactive Features
Region of Interest   Order (list)           Size (small, default, large)              Hierarchical grouping
Pointer              Label(s)               Pointer type (none, line, edge, arrow)    Arbitrary views
Symbol               Caption(s)             Color (light, default, dark)              Rollovers
                     Lexicon(s)             Pointer location                          Zooming
                                            Visibility

Figure 1. Reuse and context-appropriate viewing of an annotated image. The annotated image is a computed tomogram (CT)
of a cross-sectional slice across the neck of a patient who had a cyst on the left side of the neck. The cyst is annotated with a light
gray polygon to create a region of interest, a light gray pointer (arrow), and a light gray label (Cyst). Other annotated structures
are the carotid sheath (dark gray annotations) and carotid artery (light gray annotations). The pop-up boxes show the hierarchical
structure of the annotations. Reuse is indicated by directing the annotated image to three outputs. Context-appropriate views are
available to many users because the annotation structure enables the users to select the annotations that are displayed or hidden.

Figure 2. Process of annotating raster images with vector annotations. The left side of the figure shows the steps to annotate an
image. The author opens an image, adds annotations through an iterative process, and saves the annotated image. The right side
of the figure shows an example of a computed tomographic (CT) image of the head of a patient who had hemorrhage metastasis in
the brain. Grouping of the annotations is shown in the box below the annotated CT image.
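The four-section information model described above can be sketched as a small set of Python data classes. This is only an illustration under stated assumptions: the class and field names are hypothetical, and the interactive features of the fourth section are treated here as viewing-time behavior rather than stored fields. The pointer type options follow the list given later in the text ("none," "line," "wedge," "arrow").

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class VisualAnnotation:
    """Section 1: the visible geometry."""
    roi: List[Point]             # point (1 vertex), line (2), or polygon (3+)
    pointer_tail: Point          # position chosen by the author


@dataclass
class TextualInformation:
    """Section 2: text tied to the visual annotation."""
    symbol: str                  # short abbreviation drawn on the image
    label: str
    caption: str = ""
    order: str = "1"             # outline position (format is an assumption)


@dataclass
class PresentationAttributes:
    """Section 3: abstract rendering options, resolved later by a style sheet."""
    size: str = "default"        # small | default | large
    color: str = "default"       # light | default | dark
    pointer_type: str = "arrow"  # none | line | wedge | arrow
    tip: str = "edge"            # center | edge
    visible: bool = True


@dataclass
class Annotation:
    visual: VisualAnnotation
    text: TextualInformation
    presentation: PresentationAttributes = field(
        default_factory=PresentationAttributes)


# Hypothetical example loosely modeled on the cyst annotation in Figure 1.
cyst = Annotation(
    VisualAnnotation(roi=[(100, 120), (160, 120), (130, 180)],
                     pointer_tail=(220, 60)),
    TextualInformation(symbol="Cyst", label="Cyst", order="1"),
)
```

Keeping the presentation options as abstract names rather than pixel values mirrors the methodology's separation between what an annotation means and how a given medium draws it.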
Section 1: Visual Annotation
Region of Interest
The visible portion of the annotation includes a region of
interest (ROI). The participants defined the ROI as a feature or
structure on an image (e.g., pathology, tumor, nerve) that
conveys a clinical or research finding. The participants
defined the need to be able to highlight (draw a point, line,
or polygon) to indicate an ROI. The ROI is most often
accompanied by a pointer and symbol that allow the author
to identify features that convey relevant information about an
image 5,17 (Fig. 3).
Pointer
The pointer for the annotation is partially defined by the
author and partially computed based on where the author
initially places it. For example, the author selects where the
tail of the pointer should appear, and an algorithm calculates
the closest point on the ROI to place the pointer tip.10 This
dual mechanism for anchoring the pointer allows the author
to make choices about the layout of visual information on the
image without relying on a totally automated and potentially
unpredictable layout algorithm.19

Figure 3. Display of regions of interest on the base of the skull. The base of the skull has three depressions (called cranial
fossae) on its inside surface. The fossae are named anterior cranial fossa (ACF), middle cranial fossa (MCF), and posterior cranial fossa
(PCF). Each fossa is highlighted by a region of interest polygon. The region of interest polygon in the ACF has a thin line and
numerous white circles, both of which designate an active polygon. When a polygon (or other type of pointer) is active, the white
circles serve as nodes to adjust the position of the region of interest line relative to landmarks in the image. For this figure, the
contents of the MCF are enabled (turned on). The annotation methodology was used to annotate points, pointers, and symbols to
identify holes (foramina) through which nerves and blood vessels exit or enter the MCF. Symbols in the MCF are FS (foramen
spinosum), FO (foramen ovale), FR (foramen rotundum), SOF (superior orbital fissure), and FL (foramen lacerum).
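The pointer anchoring described above, in which the author chooses the tail and the tip is computed as the closest point on the ROI, could be sketched as follows. This is one straightforward way to find the nearest boundary point of a point, line, or polygon ROI; it is an assumption for illustration, not necessarily the algorithm the authors used, and the function names are hypothetical.

```python
import math


def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is nearest to p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a  # degenerate segment: both ends coincide
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def pointer_tip(tail, roi):
    """Place the pointer tip on the ROI boundary nearest to the author's tail."""
    if len(roi) == 1:
        return roi[0]  # a point ROI has only one candidate
    edges = list(zip(roi, roi[1:]))
    if len(roi) > 2:
        edges.append((roi[-1], roi[0]))  # close the polygon
    candidates = [closest_point_on_segment(tail, a, b) for a, b in edges]
    return min(candidates, key=lambda q: math.dist(tail, q))


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
tip = pointer_tip((5, -5), square)  # tail placed below the bottom edge
```

With the tail below the square, the computed tip lands at the midpoint of the bottom edge, so the author controls the general layout while the software handles the exact anchoring.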
Symbol
The symbol that is customarily associated with a visual piece
of the annotation is taken from the textual information that is
derived from a lexicon or free text entry. In the annotation
software implementation, the symbol is an abbreviation,
typically derived from the label, that is six characters or
shorter. The character length of the symbol allows it to be
drawn on the image with numerous sets of other annotations,
without obscuring visual information or interfering with the
other annotations. When the symbol is used in this manner, it
may be used as a key to link the visual annotation to the
textual information. These presentation options are defined in
the presentation attributes section, below.
Section 2: Textual Information
The textual information that is defined by the annotation
methodology includes the symbol (described in the previous
section), label, and caption (Fig. 4). Providing the ability to
add textual information about the annotation enables the
author to comment or add his or her expert knowledge on
contents of an image in the form of a symbol, label, and
caption. The comments may refer to a detail of the image or
the annotated image as a whole.20 The symbol, label, and
caption are a set of information commonly used across many
fields but may have specialty-specific terminology.17,18

Figure 4. Illustration of the textual attributes enabled in Figure 3 for the middle cranial fossa. The annotated image in Figure 3
was zoomed for Figure 4 to show that the annotations remain anchored to their anatomic structure. Moreover, the annotations did
not pixelate despite zooming. Both advantageous outcomes are the result of using scalable vector graphics.
Label
The label is the word or phrase that defines the visual
annotation. For medical purposes, this label may also be
taken from a lexicon or vocabulary, which enables dictionary-style lookup in the software implementation. The lexicon-specific piece of textual information allows the annotation to
be linked to a larger body of information outside the image.
For authors who do not use lexicons during the authoring
process, the symbol may be enough to match the annotation
with external information. The annotation methodology does
not restrict or define lexicons because use of lexicons is the
author’s preference or institution’s policy. If the label is
drawn from a defined lexicon, it should at least be consistent
across the author’s work.
Caption
The caption is defined as a sentence or paragraph that
describes the annotation. The description may include references to other pieces of information that may be part of an
index or hypertext system. The caption should not contain
information about the image as a whole, which is handled
through a constant nonvisual annotation (i.e., image metadata21).
Order or Grouping
The order is a character sequence that allows the annotations
of the image to be organized in an outline format, allows the
annotations to be grouped (or nested) logically, and may
impart priority (e.g., the first annotation in the outline is the
most important). The order is not treated as an annotation
but is used to identify and set up the hierarchy that the
visual annotations fall into. This piece of textual information
is an invisible annotation that links the pieces of textual
information consisting of the symbol, label, or caption to the
image.
The ordered or grouped textual information is linked with the
image, much like the chunks of data that are embedded
within the Portable Network Graphics (PNG) format.22 This
practice is similar to the concept of a table of contents. The
textual information that defines the order or grouping of the
visual annotations is a constant, nonvisual annotation that
always exists at the first position in the outline and is a part of
the information used to create the metadata of the image.
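To illustrate how an order string can organize annotations into an outline and into nested groups, the sketch below assumes a dotted numbering such as "2.1". That format is an assumption made for this example; the text specifies only that the order is a character sequence. The labels are taken from the skull base example in Figure 3, and the function names are hypothetical.

```python
def order_key(order):
    """Turn '1.10' into (1, 10) so outline entries sort numerically."""
    return tuple(int(part) for part in order.split("."))


def outline(annotations):
    """Return indented outline lines for (order, label) pairs."""
    lines = []
    for order, label in sorted(annotations, key=lambda item: order_key(item[0])):
        depth = order.count(".")  # one level of indent per nesting level
        lines.append("  " * depth + f"{order} {label}")
    return lines


def group(annotations, prefix):
    """Select an outline entry together with everything nested beneath it."""
    key = order_key(prefix)
    return [(o, l) for o, l in annotations if order_key(o)[:len(key)] == key]


skull = [
    ("2.2", "Foramen ovale"),
    ("1", "Anterior cranial fossa"),
    ("2", "Middle cranial fossa"),
    ("2.1", "Foramen spinosum"),
    ("3", "Posterior cranial fossa"),
]

for line in outline(skull):
    print(line)
```

Selecting `group(skull, "2")` returns the middle cranial fossa and its foramina as one unit, which is the kind of grouping that lets a set of annotations be turned on or off together.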
Section 3: Presentation Attributes
The presentation attributes of the annotation methodology
define how annotations should be drawn when rendered
through presentation software.10,17 The visible parts of the
presentation attributes may also be interpreted differently,
depending on the medium (e.g., laser print, journal article, or
Web browser) or the context (e.g., clinician–basic scientist,
clinician–patient, or medical educator–student). The presentation attributes that are currently defined are size, color,
pointer type, and tip location. To accommodate context-appropriate attributes, the annotation software implementation gives the author the ability to add annotations as visual
elements with the textual information in groups that can
be viewed independently. Annotations can also be visible or
invisible (turned on/off) to optimize presentation and
management of annotated structures. Figure 5 illustrates the
presentation output from the annotation software implementation. Visible parts of an annotation can be changed for
viewing on the Web, printed in a journal or textbook, or used
in a professional presentation.
Each presentation attribute has only three or four options to
provide better control over presentation and annotation reuse.
All presentation attributes in the annotation methodology are
guidelines for the rendering and reuse of visual characteristics,
including fonts, sizes, and colors. Hypertext Markup Language (HTML) 23 has used this approach with success.
Annotation Size
The options for the annotation size attribute are “small,”
“default,” and “large.” The size options control the size and
line width of the pointer and associated text rendered with
the visual annotation. The algorithm for determining actual
pixel dimensions is processed by the software implementation of the annotation methodology.
Annotation Color
The options for annotation color are “light,” “default,” and
“dark.” The color options control the color of the polygon,
line, or point that indicate an ROI and the pointer and text
that are rendered as part of the visual annotation. The light,
default, and dark options for annotating radiographic grayscale images present the author with three options that
optimize contrast on the image. Authors who annotate full
color images may use a color palette from which to pick the
color for individual annotations, if the standard light, default,
and dark options are insufficient. Moreover, consideration is
integrated into the annotation software implementation for
individuals with color deficiencies. Such individuals have
difficulty distinguishing between colors and the contrast of
the annotations on images that are annotated with the
light, default, and dark options (e.g., incomplete dichromatopsia,
most commonly red/green color deficiency, which affects 8%
of white men, or achromatopsia, in which there is no color
differentiation or reduced visual acuity).24 The color deficiency may cause the
author/viewer to not see an annotation because of insufficient contrast.25 With this in mind, the annotation
methodology has an option that gives the author the ability
to change the color of an annotation on an individual basis.
The color to which each of the three color attributes (light, default,
and dark) maps must be defined in a separate style sheet,
offering style control to the author.
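The separation between abstract options and a style sheet can be sketched as a simple lookup. In the example below, the two style sheets, their color values, and their line widths are invented for illustration; the text states only that the mapping is defined in a separate style sheet and that the software computes actual pixel dimensions.

```python
# Hypothetical style sheets: the same abstract options resolve to different
# concrete values depending on the output medium.
WEB_STYLE = {
    "color": {"light": "#ffffff", "default": "#c0c0c0", "dark": "#000000"},
    "size": {"small": 1.0, "default": 2.0, "large": 4.0},  # line width, pixels
}
PRINT_STYLE = {
    "color": {"light": "#ffffff", "default": "#808080", "dark": "#000000"},
    "size": {"small": 0.5, "default": 1.0, "large": 2.0},  # line width, points
}


def resolve(attributes, style_sheet):
    """Map abstract presentation options to the values a renderer would use."""
    resolved = {}
    for name, option in attributes.items():
        choices = style_sheet[name]
        if option not in choices:
            raise ValueError(f"unknown {name} option: {option!r}")
        resolved[name] = choices[option]
    return resolved


annotation_attrs = {"color": "light", "size": "large"}
web = resolve(annotation_attrs, WEB_STYLE)
printed = resolve(annotation_attrs, PRINT_STYLE)
```

The stored annotation never changes; only the style sheet passed to `resolve` differs between the Web and print outputs.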
Font
The annotation methodology has an option that gives the
author the ability to select font and font size. The font is
selected from the fonts that are available in the system. The
size options are controlled and rendered similar to the visual
annotations. The algorithm for determining actual font size is
processed by the software implementation of the annotation
methodology.
Pointer Type
The pointer type options are “none,” “line,” “wedge,” and
“arrow.” Other pointer types may be added, but these four
options form the foundation for the types of pointers that
may appear with an ROI. The style sheet and software
implementation control the appearance of these pointers.
Another pointer option is the “tip” option, which controls where
the tip of the pointer appears relative to the ROI. The options
are “center” and “edge.” Using this attribute, the software
implementation determines the actual pixel location of the
pointer tip.
Section 4: Knowledge Representation: Interactive
Features, Context-appropriate Viewing, and Reuse
The participants in the requirements analysis identified the
need for a software application that achieved three goals: (1)
annotation on any type of image file format (e.g., TIFF, JPEG,
or PNG), (2) an interactive feature that provides the ability to
turn on and off sets of annotations for context-appropriate
viewing, and (3) reuse of annotated images. Therefore, these
goals were included in the software implementation of
the annotation methodology. We used existing standards
whenever possible. We incorporated the standards of the
World Wide Web Consortium (W3C) Annotation and
Collaboration Working Group (<www.W3C.org/annotation>)
into the definition framework.
An annotation and related textual information (i.e., label or
caption) consist of discrete pieces of information that, when
viewed over the WWW, are interactive. Interactivity in this
sense is defined as giving the viewer the ability to turn on/off
annotated groups on the image. Annotations and associated
textual information are viewed and controlled independently
from the image.
Figure 5. Example of the output of the fully annotated
image of the base of the skull. All of the annotations for each
of the cranial fossae are enabled (turned on).
Context-appropriate viewing of an image and related
annotations is a feature that allows the annotations on an
image to be turned on or off for a particular audience or
presentation. The annotation view attribute controls the
visibility of an annotation because the annotations are
separate from the image. Thus, the view attribute can turn
annotations on/off in a context-appropriate manner. The
options for view presentation are ‘‘all,’’ ‘‘ROI only,’’ ‘‘ROI and
symbol,’’ ‘‘ROI and label,’’ ‘‘ROI with pointer and symbol,’’
‘‘ROI with pointer and label,’’ ‘‘pointer and symbol,’’ ‘‘pointer
and label,’’ and ‘‘none.’’ Depending on the context, portions of
annotations may be viewed in a presentation, while other
portions remain hidden. An example of how an annotated
image can be viewed in an interactive manner is shown in
Figure 6.
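The view options listed above can be read as a lookup from option name to the annotation parts that are drawn. The sketch below assumes four parts (ROI, pointer, symbol, label), assumes that "all" means all four, and uses hypothetical group and annotation identifiers; it is an illustration, not the authors' data model.

```python
# Option names are taken from the text; the part sets are an assumed reading.
VIEW_COMPONENTS = {
    "all": {"roi", "pointer", "symbol", "label"},
    "ROI only": {"roi"},
    "ROI and symbol": {"roi", "symbol"},
    "ROI and label": {"roi", "label"},
    "ROI with pointer and symbol": {"roi", "pointer", "symbol"},
    "ROI with pointer and label": {"roi", "pointer", "label"},
    "pointer and symbol": {"pointer", "symbol"},
    "pointer and label": {"pointer", "label"},
    "none": set(),
}


def visible_parts(annotations, view, enabled_groups):
    """Map each annotation id to the parts that should be drawn.

    An annotation is drawn only if its group is enabled and the chosen
    view option exposes at least one part.
    """
    parts = VIEW_COMPONENTS[view]
    return {
        a["id"]: sorted(parts)
        for a in annotations
        if a["group"] in enabled_groups and parts
    }


# Hypothetical annotations on an image of the base of the skull.
annotations = [
    {"id": "fm", "group": "PCF"},          # foramen magnum
    {"id": "other-roi", "group": "other"}, # an annotation in another group
]
teaching_view = visible_parts(annotations, "ROI and label", {"PCF"})
```

Switching audiences then means changing only the view option or the set of enabled groups; the image and the stored annotations stay the same.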
Reuse is facilitated by providing an open ‘‘hook’’ to link the
image and related annotations to larger cataloging systems.
Participants in the requirements analysis, with the help of the
developers, identified the need to be able to reuse annotated
images for different purposes (e.g., publication, Web viewing,
or professional conferences).37 The software implementation,
based on the annotation methodology, gives the author the
ability to annotate an image once and reuse the annotations
or the image with the annotations. Authors can store the
archived image with the linked annotations. The images
remain unaltered because the annotations are not embedded
into the image. Therefore, the image remains in an archival
format and can be reused for other purposes or applications.
Software Implementation
Based on the above methodology for visual annotation and
knowledge representation, we developed a software implementation for the Windows16 Desktop environment.
Although the current prototype implementation was written
in Tool Command Language (Tcl/Tk)26 for Microsoft
Windows,16 Tcl/Tk provides a cross-platform scripting
environment and facilitates rapid development, prototyping,
and testing for Macintosh27 and Linux.28 Thus, the initial
software implementation for visual annotation and knowledge representation is a platform-specific application. On the
other hand, the output from the software implementation is
not platform specific. Rather, the output format uses the
Scalable Vector Graphics (SVG) format, which is an application
of the eXtensible Markup Language (XML) specification.12
The SVG format provides flexibility and interactivity when
viewed through a Web browser and can be used for output to
print material since the annotations remain as vector information overlaid onto the image.
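A minimal sketch of this kind of output, written with Python's standard library, is shown below. It references the image by link and writes each annotation as a separate vector group. The element layout, identifiers, file name, and the function name `build_svg` are assumptions for illustration and do not reproduce the schema used by the authors' implementation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_svg(image_href, width, height, annotations):
    """Build an SVG document that overlays vector annotations on an image.

    The image is referenced by link, so the image file itself is untouched.
    """
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    ET.SubElement(svg, f"{{{SVG_NS}}}image", {
        f"{{{XLINK_NS}}}href": image_href,
        "x": "0", "y": "0",
        "width": str(width), "height": str(height),
    })
    for ann in annotations:
        # One <g> per annotation so a viewer can show or hide it as a unit.
        group = ET.SubElement(svg, f"{{{SVG_NS}}}g", {"id": ann["id"]})
        cx, cy, rx, ry = ann["roi"]
        ET.SubElement(group, f"{{{SVG_NS}}}ellipse", {
            "cx": str(cx), "cy": str(cy), "rx": str(rx), "ry": str(ry),
            "fill": "none", "stroke": ann["color"],
        })
        label = ET.SubElement(group, f"{{{SVG_NS}}}text", {
            "x": str(cx + rx + 10), "y": str(cy), "fill": ann["color"],
        })
        label.text = ann["label"]
    return ET.tostring(svg, encoding="unicode")


# Hypothetical image name and annotation values.
svg_text = build_svg("skull_base.jpg", 640, 480, [
    {"id": "fm", "roi": (320, 300, 40, 30), "color": "gold",
     "label": "Foramen magnum"},
])
```

Because the ellipse and text remain vector elements, the same file can be scaled for print or toggled interactively in a browser without re-rendering the underlying image.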
The output includes metadata about the image, visual
annotations, author, lexicons, and the authoring sessions
(such as revision control). These metadata are stored within
the XML output file.
SVG facilitates extensibility, interactive Web viewing, and
reuse. SVG also allows the annotations and visual expert
knowledge (i.e., labels and captions) to remain linked to the
image, as opposed to embedding the annotations in the
image.13 To facilitate the interactivity of the annotated
images, we leveraged Adobe’s freely available SVG plug-in
(Adobe Systems, San Jose, CA) 14 for viewing annotated
images over the WWW. The flexibility of using XML allows
powerful graphics editing programs similar to Adobe
Photoshop 14 and presentation programs similar to Microsoft
PowerPoint 16 to consume the output for further editing and
other uses. Currently, Adobe Illustrator 15 can consume the
SVG output from the annotation implementation without
changing any of the visual annotations and their attributes.
Figure 6. Examples of two annotation groups (panels A and B) for the same image to show context-appropriate presentation
and use. Annotation groupings can be enabled/disabled, depending on the features that are viewed. Panel A shows the
annotation hierarchy in pop-up windows for all of the annotations. The check marks to the left of each line in the pop-up windows
mean that the annotation is enabled. Panel B shows the annotation hierarchy when only the annotations are enabled for the PCF.
The user selects which annotations are enabled.
The annotation methodology and subsequent annotation
exchange format, based on XML/SVG, support linking of
images through annotation. The attributes that are linked to
regions of interest on one image can be linked to corresponding regions of interest on other images and remain
persistent. Linked images could be composed of serial
sections generated from a single imaging modality or images
that are generated from different imaging modalities. All of
the information regarding linking between images through
their annotation sets is stored within the XML output file. An
example of how the annotations link images from two
imaging modalities is shown in Figure 7.
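The idea of linking regions of interest across images can be sketched as a small symmetric link table. The class names `RoiRef` and `LinkTable`, the file names, and the identifiers are hypothetical; the article states only that the link information is stored in the XML output, not how it is structured.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoiRef:
    """Identifies one region of interest on one image."""
    image: str
    roi_id: str


@dataclass
class LinkTable:
    """Symmetric links between ROIs on different images."""
    links: dict = field(default_factory=dict)  # RoiRef -> set of RoiRef

    def link(self, a, b):
        # Store the link in both directions so either image can be the start.
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def linked(self, ref):
        """Return the ROIs linked to ref, in a stable order."""
        return sorted(self.links.get(ref, set()),
                      key=lambda r: (r.image, r.roi_id))


# Hypothetical example: the same structure on a photograph and on a CT image.
photo_fm = RoiRef("skull_base.jpg", "fm")
ct_fm = RoiRef("head_ct.jpg", "fm")
table = LinkTable()
table.link(photo_fm, ct_fm)
```

Looking up either reference returns its counterpart, which is the behavior needed to carry a selection from one modality to the other.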
By adopting open standards such as XML and SVG in the
software implementation, the annotation methodology provides authors with the ability to save images with the
annotations linked to the images, in a structured format of
XML (SVG).12,13 The open and extensible features of SVG
promote indexing of the image with associated annotations
and textual information, thus allowing images and annotations to be cataloged in a database or asset management
system. An example of the structured output is shown in
Figure 8, which illustrates an annotated image of the posterior
cranial fossa of the skull.
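How structured output lends itself to indexing can be illustrated by reading an SVG file back and extracting one record per labeled annotation. The sample document below is a hypothetical stand-in, not the authors' output shown in Figure 8, and the record fields are assumptions.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical annotated image: one group (PCF) holding one labeled ROI.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink" width="640" height="480">
  <image xlink:href="skull_base.jpg" width="640" height="480"/>
  <g id="PCF">
    <g id="fm">
      <ellipse cx="320" cy="300" rx="40" ry="30" fill="none" stroke="gold"/>
      <text x="370" y="300">Foramen magnum</text>
    </g>
  </g>
</svg>"""


def index_records(svg_text):
    """Return one catalog record per annotation group that carries a label."""
    root = ET.fromstring(svg_text)
    image = root.find(f"{SVG}image")
    href = image.get(XLINK_HREF) if image is not None else None
    records = []
    for group in root.iter(f"{SVG}g"):
        label = group.find(f"{SVG}text")  # direct child only
        if label is not None and label.text:
            records.append({
                "image": href,
                "annotation": group.get("id"),
                "label": label.text.strip(),
            })
    return records


records = index_records(SAMPLE)
```

Records like these could be loaded into a database table or a search index, so that a query on a label returns both the image and the annotation that carries it.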
Lessons Learned
The requirements analysis to define an annotation methodology identified users of different abilities and with specific
requirements. Individuals participating in the requirements
assessment appreciated the need for a methodology but at the
same time did not like the constraints of a methodology. For
example, participants did not want to be too constrained with
the three color choices (gold, white, or black) and preferred
a full range of color options, similar to a color
palette. The individuals who annotate grayscale
images eventually decided that having the additional color
options was not necessary and decided on the original three
color choices. Balancing these competing interests presents
a unique challenge for the software implementation.
From the software developers’ and integrators’ viewpoint,
the annotation methodology and software implementation
must remain simple and extensible for authoring and, at the
same time, generate structured output that is in a standard
format, is flexible, and adheres to open standards. Annotated
image content eventually will require integration into a larger
enterprise cataloging system. By adopting open standards
for the software implementation and structured output, we
can develop a software solution that generates annotated
collections of images that can be integrated into or consumed
by cataloging systems.
Figure 7. Example of two images that are linked through their annotations (panels A and B). The regions of interest on the
base of the skull (Panel A) are applied to the computed tomography (CT; panel B) that shows corresponding soft tissue structures.
The annotation groupings applied to the CT can be enabled/disabled, depending on the features that are viewed. The
presentation and use attributes allow the user to select which annotations are enabled on either image. The images remain linked
through the annotation sets.
Figure 8. Structured output (XML) of the annotated image of the posterior cranial fossa and the foramen magnum (FM)
shown in Figure 6. The output shows the annotations in a structured XML format. The XML output is flexible and contains
additional information related to the annotations such as style, the metadata fields that are generated from annotation session,
and links to the images. The flexibility of XML facilitates indexing and cataloging, Web presentation, and consumption by other
systems such as a database or file system.
Discussion
The overall goal of the project was to define a methodology
for visual annotation of digital images that functions as the
foundation for a software implementation that fulfills the
image annotation and visual knowledge representation
requirements of clinicians, medical educators, and basic
scientists. After completing a requirements analysis, we
defined a methodology and developed a software implementation prototype that provides users with the ability to
visually annotate an image in a way that preserves the original image,
links the visual annotations and expert knowledge to the
image, enables reuse of the images and annotations, provides
interactive viewing, and supports context-appropriate presentation of annotated images.
A workflow-crippling issue identified by clinicians, medical
educators, and basic scientists is the lack of software solutions
to create visual annotations on digital images, with associated
expert knowledge, that can be shared or reused either
together or separately.3,19,29 An ideal solution is to enable
clinicians, medical educators, and basic scientists to annotate
their digital images with a software package that is simple to
use, facilitates locating and retrieving images and/or their
associated annotations, and can be integrated into the
workflow. A solution must also provide the
ability to output annotated images for context-appropriate
presentation to the WWW, print, or other digital presentation
used for professional conferences. Unlike the analog, or hard-copy,
environment (rub-on labels), the digital environment
requires a formal definition (methodology) to handle digital
image annotation. The author is not able to just ‘‘do it’’ on
a computer without formal definition of the data he or she
is handling and a user-friendly, easy-to-learn software
implementation to support the definition. Caruso et al.30
proposed using Adobe Photoshop (Adobe Systems, San
Jose, CA) for annotation of digital image data, such that the
digital image data become suitable for publication. This
approach is used widely for annotation of print-quality
images; however, because the final output is a raster image
file with the annotations also rasterized and therefore
embedded in the image, the visually annotated images are
only suitable for print and no other media (i.e., WWW
viewing and interactivity). Further, the process of annotating
with Adobe Photoshop does not link annotated information
to the image and does not lend itself to reuse of the source
material (image and annotations) for other purposes or
interaction with the annotations and context-appropriate
presentations, especially in the digital environment.4,29,30
Because raster output does not separate the visual annotations from the underlying image, the annotations cannot
be manipulated separately from the image. It is this separate manipulation that permits interactivity, reuse for multiple purposes,
multiple publishing targets (print, Web), and multiple
contexts as illustrated in the output from the annotation
implementation.
The annotation methodology began several years ago as a set
of style decisions for a raster-based, image annotation, Web
application named ArrowMagick 17 that was developed using
the ImageMagick libraries.31 The application itself was not
deployed, but the requirements and concepts that were built
into the software were extracted and reconfigured as the first
draft of the annotation methodology. Development of the
annotation methodology continued by analyzing existing
analog and digital annotation mechanisms. Traditional
annotation methods, including photographic annotation with
Letraset, physical annotation with marking utensils, the
concept of ‘‘pin-and-string’’ annotation,10 digitally annotated
magnetic resonance (MR) images,1 and map labeling in the
Geographical Information Systems (GIS) field 5,6 were used to
form the foundation for the annotation methodology.
Individual user habits with general-purpose image manipulation software (e.g., Adobe Photoshop [Adobe Systems]14)
were observed and taken into consideration while defining
a medical-based method for annotating digital images.
The annotation methodology constrains the author to a set
of artistically clean choices for presentation and authoring.
In addition, the methodology defines how the annotation
information is captured in a structured manner for reuse
and stores the annotated information in a vector format for
visual clarity.19 The annotation methodology forms the basis
of an XML schema that defines a platform-independent
annotation exchange format.12,13 The annotations are stored
as vector information in the SVG format and remain linked to
the original image. The annotation exchange format ensures
that the annotations are accessible as vector data linked to the
image, not embedded in the image.
The design decisions behind the annotation methodology
have been derived from traditional analog annotating
methods and a consensus of common practices of clinicians,
medical educators, and basic scientists. The basic annotation
element of pointer and label exists across professional fields
and conveys the meaning of ‘‘this is the focus of attention.’’
However, each professional field deviates from the basic
annotation elements to handle field-specific data. For example, in the field of academic radiology, radiologists often
identify pathology as an ROI and want the ability to point with
an associated label or caption, as often is the case in a clinical
teaching conference. On the other hand, in the field of human
gross anatomy, detailed assignment of labels is customary to
teach students relevant normal structures in a region.11
Multiple colors and shapes of labels are needed because the
color and contour of human anatomic structures are variable.
Another example is the field of basic science research for
which the ability to identify relevant structures on a digital
photograph for publication in a peer-reviewed journal,
including electronic journals, is of paramount importance.
The annotation methodology was defined to establish
a schema for software development that would provide the
author the ability to annotate an image in a manner similar to commercially available tools such as Adobe Illustrator 15 and Adobe
Photoshop,14 in which the author identifies a region of
interest on an image, then places an arrow or pointer to draw
attention to a region of interest. These features were noted,
since many authors are familiar with such applications.
The annotation methodology has been used to develop
software that enables authors to layer annotations on images
while retaining the original format of the image and
linking the annotations to the image file. The annotation
methodology and implementation allow the author to add,
track, and retain visual annotations and associated expert
knowledge as separate layers of information within the
software environment and in the output. Additionally, the
structured nature of the visual annotations and associated
textual knowledge allows the image and visual information
to be indexed and cataloged, thus facilitating location of the
annotated image for other uses.
Standards that define annotation elements (i.e., symbols,
labels, and captions) on digital images have yet to be adopted.
Associated standards for applying annotation to digital
medical images likewise do not exist. Without a standard
for digital image annotation, every image becomes a custom
annotation job. The annotation methodology presented here
is an assembled and evolved annotation exchange format that
solves the problem of standardizing digital image annotation
in health care yet has applicability to other disciplines.32
The shortcomings of the annotation methodology lie in its complexity. What
started as a definition for placing arrows and labels on
digital images grew into a complex definition of requirements for annotation of digital images produced in
a university medical center community.11 The definition
included the ability to reuse annotated images for multiple
output and to create context-appropriate presentation for
interaction with clinicians, medical educators, basic
scientists, and, in the future, patients. The shortcomings
should become less of a hindrance, however, as new
software applications implement methodologies that give
users the flexibility described in the annotation methodology. A limitation of the assessment was that only a modest
usability study was conducted to collect feedback and gather
user statistics on the implementation. An extensive usability
study is needed to determine outcomes of developing and
using a tool for visual annotation and will be the focus of
a follow-up report.
The annotation methodology, through modification of earlier
versions of the software implementation, has been presented
at several national conferences, including the Radiological
Society of North America InfoRAD (RSNA),33 Society for
Computer Applications in Radiology (SCAR),34 American
Telemedicine Association (ATA),35 and Federation of
American Societies for Experimental Biology (FASEB).36
Each of these professional meetings has provided a forum
outside the University of Utah Health Sciences Center to
collect feedback from medical academicians who are participating in complementary and competing projects. This
feedback was also evaluated and incorporated into the
current annotation methodology.
Conclusions
The annotation methodology presented in this report provides several key solutions for creating interactive digital material. Because the annotation methodology is the
culmination of multidisciplinary consensus, the methodology
is a robust standard that fulfills medical annotation requirements, at least currently. Clinicians, medical educators,
and basic scientists can collect and annotate digital images
and assemble relevant annotation groups for use in a context-appropriate environment, with colleagues, students, or patients, using the same annotated image without modifying
the original image. Furthermore, because the annotation
software implementation uses a standard, open annotation
exchange format (XML, SVG), the annotated images can be
reused in a cross-media publishing environment (e.g., print,
Web, and database), interactive teaching, and context-appropriate collaboration and communication (e.g., clinician–clinician, clinician–basic scientist, medical educator–
student). By creating a user-friendly tool that promotes
standardization of annotations, indexing, cataloging, and
vector-based interactivity, the annotation methodology
functions as the foundation for new and important solutions
for annotating digital images.
References
1. Caruso R, Postel G, McDonald C, Aronson B, Christensen J.
Software-annotated, digitally photographed, and printed
MR images: suitability for publication. Acad Radiol. 2002;
9:346–51.
2. Marshall C. Annotation: From Paper Books to the Digital
Library. Proceedings of the 1997 ACM International Conference
on Digital Libraries (DL97). Philadelphia, PA: ACM Press,
pp 131–40.
3. Davidson H, Lauman J, Goede P, Harnsberger HR. CAT:
A methodology for annotating digital teaching file images.
Scientific Program Proceedings in Radiology. 2000:698.
4. Goede P. CAT: An Annotation Methodology for the Medical Image Annotation Tool. National Center for Research
Resources (NCRR) Sponsored BioInformatics Approaches to
Neuroimaging in Clinical Research. Seattle, WA: January 25–27,
2002.
5. Wagner F, Wolff A. Map labeling heuristics: provably good and
practically useful. ACM, Annual Symposium on Computational
Geometry. 2001;3:109–11.
6. Wagner F, Tycho S, Wolff A, Kapoor V. Three rules suffice for
good label placement. Algorithmica. 2001;30(2):334–49.
7. Brinkley JF, Rosse C. The digital anatomist distributed framework and its applications to knowledge-based medical imaging.
J Am Med Inform Assoc. 1997;4:165–83.
8. Chronaki C, Zabulis X, Orphanoudakis S. I2Cnet Medical Image
Annotation Service. Med Inform, Special Issue. 1997;22:337–47.
9. Albertine KH. Use and Re-use of Content from an Imager’s
Perspective. National Center for Research Resources (NCRR)
Sponsored BioInformatics Approaches to Neuroimaging in
Clinical Research. Seattle, WA: January 25–27, 2002.
10. Lober B, Brinkley J. A portable image annotation tool for
Web-based anatomy atlases. Proc Am Med Inform Assoc.
1999.
11. Morton DA, Goede PA, Lauman JR, Albertine KH. Annotation
tool for images for human gross anatomy. FASEB J. 2002;16:A1090.
12. World Wide Web Consortium (W3C) eXtensible Markup Language (XML) Working Group. <http://www.w3c.org/xml>.
Accessed October 31, 2003.
13. World Wide Web Consortium (W3C) Scalable Vector Graphics
(SVG) Working Group. <http://www.w3c.org/svc>. Accessed
October 31, 2003.
14. Adobe Photoshop®, San Jose, Calif. <http://www.adobe.com/
products/photoshopmain.html>. Accessed October 31, 2003.
15. Adobe Illustrator®, San Jose, Calif. <http://www.adobe.com/
products/illustratormain.html>. Accessed October 31, 2003.
16. Microsoft Corp., Redmond, WA. <http://Microsoft.com/>.
Accessed October 31, 2003.
17. Heaps N, Davidson H, Lauman J, Harnsberger H. Arrow
Magick: Labeling digital radiological images on the Web.
Telemed J. 1999;5:95.
18. Albertine K, Morton D, Peterson K, Dalton M, Schultz R.
Radiologic holograms as teaching tools for human gross
anatomy. FASEB J. 2001;15:A65.
19. Lieberman H, Rosenzweig E, Singh P. Aria: an agent for annotating and retrieving images. IEEE Computer. 2001;34(7):57–62.
20. Chronaki C, Zabulis X, Orphanoudakis S. I2Cnet medical image
annotation service. Medical Informatics, Special Issue. 1997;
22(4):337–47.
21. Marshall C. Making metadata: a study of metadata creation for
a mixed physical-digital collection. 1998 ACM International
Conference on Digital Libraries (DL98).
22. Wiggins RH, Davidson HC, Harnsberger HR, Lauman JR,
Goede PA. Image file formats: past, present and future. Radiographics. 2001;21:789–98.
23. The Hypertext Markup Language (HTML). <www.w3c.org>.
Accessed October 31, 2003.
24. Joshi VG. Brightness contrast as source of error in the Ishihara
test for colour blindness. J All India Ophthalmol Soc. 1965;
13(3):83–7.
25. Aarnisalo E. Screening of red-green defects of colour vision with
pseudoisochromatic tests. Acta Ophthalmol (Copenhagen).
1979;57(3):397–408.
26. ActiveState Tcl Developer Xchange. <http://www.tcl.tk/>.
Accessed October 31, 2003.
27. Apple Computer, Cupertino, CA. <http://www.Apple.com/>.
Accessed October 31, 2003.
28. LINUX Online, Ogdensburg, NY. <http://www.Linux.org/>.
Accessed October 31, 2003.
29. Lauman J. Image annotation and re-use issues in medical
academia. Proceedings of American Society for Experimental
Biology. FASEB J. 2001;15:A67.
30. Caruso R, Postel G. Image editing with Adobe Photoshop 6.0.
Radiographics. 2002;22:993–1002.
31. ImageMagick © 1998, E. I. Du Pont de Nemours and Co., Inc.,
John Cristy.
32. Lober B. Personal Annotated Image Server (PAIS). <http://
faculty.washington.edu/lober/pais/IML1c/>. Accessed October 31, 2003.
33. Radiological Society of North America (RSNA). <http://
www.rsna.org/>. Accessed October 31, 2003.
34. Society for Computer Applications in Radiology (SCAR).
<http://www.scarnet.org/>. Accessed October 31, 2003.
35. American Telemedicine Association (ATA). <http://www.
atmetda.org/>. Accessed October 31, 2003.
36. Federation of American Societies for Experimental Biology (FASEB). <http://
www.faseb.org/>. Accessed October 31, 2003.
37. Cochella C, Lauman JR, Goede P, Harnsberger HR, Katzman GL.
A simple mechanism for sharing and transporting medical
digital case information across disparate computer language and
data storage environments. J Dig Imaging. 2001;14(2 suppl
1):187–9.