Computer Implemented Semantic Search Methodology, System and

USOO8880389B2
(12) United States Patent
(10) Patent N0.:
Io?nov
(54)
(45) Date of Patent:
COMPUTER IMPLEMENTED SEMANTIC
SEARCH METHODOLOGY, SYSTEM AND
7,769,742 B1*
8,005,841 B1 *
8 020 206 B2 *
COMPUTER PROGRAM PRODUCT FOR
(76)
US 8,880,389 B2
’
’
8/2010 Brawer et a1. ............... .. 707/709
8/2011 Walsh et a1. ..... ..
707/739
9/2011 Hubbard et a1.
726/22
8,245,135 B2*
8/2012
DETERMINING INFORMATION DENSITY IN
2001/0049700 A1*
12/2001
TEXT
2002/0052894
2002/0069100
2003/0182124
2003/0182631
A1*
A1*
AI*
A1*
5/2002
6/2002
9/2003
9/2003
2004/0267730
Inventor:
Igor Io?nov, Brooklyn, NY (US)
-
-
( * ) Notice:
-
-
-
Nov. 4, 2014
-
Cal et a1. ...... ..
715/254
Ichikura ........... ..
707/512
Bourdoncle et a1.
707/513
Arberman
705/10
Khan ~~~~~ ~~_~~~_~ ~~~~~~~~~~~~ ~~ 704/2701
Tsochantar1d1s et a1.
715/531
A1*
12/2004
Subject to any disclaimer, the term ofthis
Zoos/0015357 A1,,
100%
Patent 15 extended or adJusted under 35
U.S.C. 154(b) by 564 days.
2005/0027708 A1* 2/2005 Mueller et a1. .
707/10
2005/0278309 A1* 12/2005 Evans et a1. ..................... .. 707/3
C
(21) Appl. N0.: 13/315,298
D
'
t
l.
....... ..
707/3
707/1
t'
d
( on “me )
OTHER PUBLICATIONS
(22) Filed:
Dec. 9, 2011
(65)
Alonso, O., Germ., M. and Baeza-Yates, R., Nov. 2009. “Clustering
and Exploring Search Results using Timeline Constructions”.
Prior Publication Data
Us 2013/0151236 A1
Jun. 13, 2013
CIKM’09’ 974°“
(51) Int Cl
(Continued)
G06F 17/20
(2006-01)
Primary Examiner * Edgar Guerra-Erazo
i 5‘;
(52)
(74) Attorney, Agent, or Firm * Steven Rinehart
G06F 17/21
(2006.01)
G10L 21/00
(2013.01)
(57)
_
us CL
A method, computer program product and system are dis
USPC ______________ __ 704/9. 704/1. 7040. 7046. 704/4.
closed for determining the semantic density of textualized
704/5. 704/6. 702‘”. 70mg. 702V“). 7b4/270f
,
(58)
ABSTRACT
I
704,070 1 7704/27
digital media (a measure of how much information is con
70 4 078’
veyed in a sentence or clause relative to its length). The more
Field of Classi?cation Search
’
’
USPC
704/1i9 270 270 1 275 278
semantically dense text is, the more information it conveys in
a given space. Users input atopic, atimeline, and one or more
See ap'tlgr C 0mp1 a; seagch hist’ory ’
target web media sources for analysis. Text in the target media
'
(56)
sources is deconstructed to determine density, and a density
References Cited
rating assigned to the web media source. Over time, users can
track trends in the density of text media relative to a given
topic, and determine how much information is being con
veyed in connection with the topic, such as a political cam
U.S. PATENT DOCUMENTS
6,081,774 A *
6,154,213 A *
6,675,159
paign. Line graphs, pie charts, and other time-elapsed output
6/2000 de Hita et al. ................... .. 704/9
Renni son et a1.
..
11/2000
B1*
1/2004
7,117,207 B1*
7,260,573 B1*
7,707,161 B2*
10/2006
8/2007
4/2010
Lin et a1.
. . . . . . . . . . . . . .
. . . . . . . . ..
graphic representations of the semantic density are generated
1/1
and rendered for the user.
Kerschberg et al. .
........ .. 1/1
Jeh et a1. ..................... .. 707/708
Hall et a1. ............ .. 707/999.103
20 Claims, 8 Drawing Sheets
702
704
700
c dependencles
exist between
709
clauses?
72°
Has Bantam
changed from
prevlous scan?
Crsale one or "WIS ASG
graphs linking clauses
"0
graphically
Yes
Parse sentences In textuallzed
data on each web media SOUI'CE
ldemlfy clauses wllhln the
sentences
Store semences and clauses In
730
Archive semantic density
Ci lculabe semantic
densin score (or senlsncs
lnlunnellen in memory
Calculate semamlc
density score ier web
medla source
Render llmeelepeed graphls)
US 8,880,389 B2
Page 2
(56)
References Cited
OTHER PUBLICATIONS
U.S. PATENT DOCUMENTS
2006/0026013
2007/0038610
2007/0073745
2008/0034058
2008/0162498
2008/0263038
2008/0288442
2009/0138565
2010/0268700
2010/0318526
2013/0091090
A1 *
2/2006
A1*
2/2007 Omoigui ..
A1*
A1*
3/2007
2/2008
A1*
A1*
Cui, W., Liu, S., Tan, L., Shi, C., Song,Y., Gao, Z., Qu, H., andTong,
X., Dec. 2011. “Text?ow: Towards better understanding of evolving
topics in text”. I EEE Trans. Vis. Comput. Graph. 17, 12 (2011 ),
Kraft ..................... ..
Scott et al. ............ .. .... ..
Korman et a1. ....... .. .... ..
707/100
2412-2421~*
709/217
Liu, S., Zhou, M., Pan, S., Qian, W., Cai, W., Lian, X., Nov. 2009.
“Interactive Topic-based Visual Text Summarization and Analysis”,
7/2008 Omoigui
10/2008
Judge et al. ...... ..
A1* 10/2010 Wissner et al
707/2
‘ 709/206
, 707/706
A1*
. 707/749
A1 * 11/2008 Feigenbaum et al.
A1 *
A1 *
5/2009 Shiff et a1.
12/2010 Nakazawa et al. .
4/ 2013
Spivack et al. ........ .. .... .. 707/608
Wei, F., et al., Jul. 2010. “Tiara: AVisual Exploratory Text Analytic
System”. KDD’IO, Jul. 25-28, 2010, pp. 1-10.*
* cited by examiner
US. Patent
Nov. 4, 2014
Sheet 1 0f 8
US 8,880,389 B2
100
K
Server
116a
Media
content\\
Indexing
—1183
server @
Internet
User
112a
m
Web
server
?
DBMS
M
User
112b
Persistent
Storage
106
FIG. 1
Keywor S
108
Media
content \
118b
Server
116b
e 1c;o()ntent
\
US. Patent
Nov. 4, 2014
Sheet 2 0f8
US 8,880,389 B2
Storage
User
Keyword(s) 10ia
112g
Timeline 202a
URL 2 4a
URL
4b
E
Member
Structure
M
User
112g
Registration Q
Keyword(s) 108b
Timeline 2b
URL 2 40
Web content
m
Web media
Web media
Sentences 21 Oa-z
m
m
Clauses 212a-z
Stats 214a-f
Density
ratings
m
User
M
Density m
Graph m
Stats m
Keywords
M
AST
graphs
M
200
FIG. 2
US. Patent
Nov. 4, 2014
Sheet 3 0f8
US 8,880,389 B2
300\
A
->
COMPILING A LIST OF WEB MEDIA SOURCES @
1
—>
QUERYING THE WEB MEDIA SOURCES @
->
—> STORING THE TEXT IN COMPUTER READABLE MEMORY w —>
—>
IDENTIFYING SENTENCES WITHIN THE WEB MEDIA @
—>
—>
STORING THE SENTENCES IN AN ARRAY m
—>
—>
IDENTIFYING CLAUSES WITHIN THE SENTENCES m
—>
—>
STORING THE CLAUSES WITHIN AN ARRAY m
—>
—> QUANTIFYING SEMANTIC ELEMENTS IN THE CLAUSES m —>
—>
CALCULATING SEMANTIC DENSITY SCORE m
—>
—>
QUERYING SENTENCES FOR KEYWORD(S) i
—>
->
QUERYING WEB MEDIA SOURCES FOR INSTANCES OF THE
KEYWORD(S) ?
_’
_>
RENDERING A GRAPHIC REPRESENTATION OF SEMANTIC
DENSITY g
_’
4
V
FIG. 3
US. Patent
Nov. 4, 2014
Sheet 4 0f8
US 8,880,389 B2
I
System
1
I
400
I
|
:
Web server
102
Indexing server
120
RDBMS
104
|
:
Sentence
Synonym
Text
Acceptor
I
I
I
scanner
module
414
determiner
module
416
querier
module
418
module
420
l
l
|
|
|
Render
Density
ASG
Depdncy
module
422
storer
module
424
creator
module
426
analyzer
module
428
|
l
l
|
|
|
Block
storer
module
|l
420
Reception
module
432
l
:
l
l
|
I
l
l
l
|
|
|
|
l
l
|
|
URL
prompter
module
Calculator
module
436
l
44
I
ll
|
l
FIG. 4
Timeline
220
US. Patent
Nov. 4, 2014
500\
///
/
/
//
sentence
US 8,880,389 B2
\\\
Array 502
strlng 210 ?x
\
\
\
/
/
/
/ / /
/’
Sheet 5 0f8
‘ / /
A
\
\ \ ‘
K, ’
\ \
\\
\A
\IndepenQerTf \ Ereposition Parenthet)‘ Clause Abpqsiti Modi?e?~ .End Code
\
ciau‘ée
\58621
4 ’ 504
\
\
\
\\‘
\
\
Verb
‘Noun
\
514
91 6
\
\
510a
508a/ I
/
/
/
\
\
\ \
-ca| /
\
\
I
\
518a
/
\
I
\
\
‘512
\ z
\\
\
\
\
\
\\
/ Adverb
/
on \ \ \51 00
510b
Verb\ \
520a
\\
Subject
\ \ 522a
\
\
/
\
\
FIG. 5
US. Patent
Nov. 4, 2014
Sheet 6 0f8
US 8,880,389 B2
600
VVebpage
VVebpage
118a
118b
VVebpage
VVebpage
1189
1180
Search
Engme
@
VVebpage
VVebpage
118f
118d
VVebpage
118g
VVebpage
FIG. 6
118h
US. Patent
Nov. 4, 2014
Sheet 7 0f8
704
702
Promptand
userend
for p0|nts
timeline start
700
: -
.
706
US 8,880,389 B2
Start
/
+
Prompt user for topic
708
\A
710
o dependencies
exist between
¢
clauses?
Scan web media content
Has content
720
Changed from
Create one or more ASG
\A
graphs linking clauses
previous scan?
graphlcauy
Yes
722
712
¢
\/\
Increase one or more
\A Parse sentences in textualized
dependency identi?ers
data on each web media source
71 4
\A
716
l,
724
Identify clauses within the
or topic synonyms
sentences
found in sentence?
l
\A Store sentences and clauses in
_l\
arrays
726
\/\
730
\A
Archive semantic density
information in memory
v
‘
Calculate semantic
density score for sentence
728
\A
+
Calculate semantic
density score for web
732
r
\/\ Render time-elapsed graph(s)
media source
734
A
End
t
FIG. 7
US. Patent
Nov. 4, 2014
Sheet 8 0f8
US 8,880,389 B2
Politics [density of discussion 41.1%]
/
800
High
Medium
Low
12/12/2009
-
12/12/2010
-
12/12/2011
850
Candidate XYZ [density of discussion 32.4%]
/
High
Medium
Low
12/12/2009
-
12/12/2010
FIG. 8B
-
12/12/2011
US 8,880,389 B2
1
2
COMPUTER IMPLEMENTED SEMANTIC
operations of the computer program product comprising:
SEARCH METHODOLOGY, SYSTEM AND
compiling a list of web media sources for analysis in com
puter readable memory; querying the web media sources for
a block of text; storing the block of text in volatile computer
readable memory; identifying sentences within the block of
text; storing the sentences as strings within an array; scanning
COMPUTER PROGRAM PRODUCT FOR
DETERMINING INFORMATION DENSITY IN
TEXT
each sentence to quantify one or more of the following: a
BACKGROUND OF THE INVENTION
number of words in the sentence; a number of prepositions,
postpositions, adjectives, adverbs, verbs, nouns, and gram
1. Field of the Invention
This invention relates to information retrieval systems, and
more particularly relates to a computer implemented method
matical conjunctions, by referencing words within the sen
tence with a dictionary in computer readable memory; a num
ber of dependent clauses in the sentence; a number of
independent clauses in the sentence; a number of ellipsis, a
number of dashes (both en dashes and em dashes), and a
number of commas, semicolons, and colons; a number of
and system for determining the information density of a target
body of digital text.
2. Description of the Related Art
With the emergence of the Internet has come access to
almost limitless amounts of information. Users typically seek
subjects and predicates in the sentence; a number of apposi
access to this information with computer in logical commu
tions in the sentence; a number of syllables in each word of
the sentence by cross-referencing each word with the dictio
nary in persistent storage; and a number of alphanumeric
nication with an information-retrieval system like Google®
or other search engines. These search engines typically com
prise a graphical user interface for accepting keywords used
20
characters in the sentence; storing each quanti?ed number in
a persistent computer readable database with a time-stamp
to identify relevant information, typically in the form of
online electronic documents.
identifying the date the number(s) were quanti?ed; calculat
Typical search engines rank the importance of a document
by the number of instances that the keyword appears in the
body of text forming the document, the number of backlinks
ing a semantic density score for each web media source in the
list of web media sources, wherein the semantic density score
is a function of the quanti?ed numbers for each sentence in
25
the web media source; and storing the semantic density score
in a persistent computer readable database, the score exclu
sively associated with the web media content from which the
to the document, and the importance of the documents com
prising the backlinks. This is problematic because many
documents included in a given set of search results contain
only super?cial references to the keywords, and users are left
to manually ?lter through irrelevant results in search for the
documents conveying more denser amounts of information.
There exists no ef?cient means in the art of valuing the
context in which keywords are appearing, or of determining
the information density of sentences and clauses comprising
the keywords and it synonyms. Methods, computer program
products and systems are lacking in the art which determining
the semantic density of textualized digital media. Semantic
score was derived.
30
a topic for semantic density determination; prompting the
user with a webpage to input a timeline for semantic density
determination; prompting the user with a webpage to input
35
URLs for one or more web media sources; determining syn
onym(s) for the topic by referencing a thesaurus in computer
readable memory; querying textualized media on the web
media source for instances of the topic; querying textualized
density is a measure of how much information is conveyed in
a sentence or clause relative to its length in a given block of
text. The more semantically dense text is, the more informa
tion it conveys in a given space.
In some embodiments, the computer program product of
further comprises: prompting a user with a webpage to input
media on the web media source for instances of synonyms of
40
The semantic density of text with regard to a particular
the topic identi?ed by referencing the dictionary; storing
semantic density ratings for sentences comprising the topic;
storing semantic density ratings for sentences comprising the
topic is a useful metric to users desiring to measure trends in
synonyms of the topic; and rendering a graph on a computer
online chatter, news, such as a political campaign.
The present invention bene?cially teaches a unique com
display showing elapsed time across one axis, the graph
showing a plurality of semantic density ratings for the web
45
media identi?ed by the user.
The web media sources may be queried at predetermined
intervals of time.
The list of web media sources may consist of web media
puter implemented methodology for determining the seman
tic word density of bodies of text in digitized documents
which overcomes prior shortcomings in art.
SUMMARY OF THE INVENTION
50
sources manually identi?ed by one or more users to the web
site. The sentences may be identi?ed within the block of text
as being a string of text preceding a period immediately
From the foregoing discussion, it should be apparent that a
need exists for a method, system, and computer program
55
followed by a space.
Therein sentences may be identi?ed within the block of
text as being a string of text following a period (.) immedi
60
ately followed by a space ( ) which string of text also precedes
a period (.) immediately followed by a space ( ).
The computer program product may further comprise stor
ing a dictionary in computer readable memory, the dictionary
identifying words of a language, the dictionary characterizing
product for determining the semantic word density of a body
of text. The present invention has been developed in response
to the present state of the art; and, in particular, in response to
the problems and needs in the art that have not yet been fully
solved by currently available methods, systems, computer
program products and apparatii, which invention overcomes
many or all of the above-discussed shortcomings in the art.
Accordingly, the present invention has been developed to
provide a method, computer program product, and system for
each word as one or more of a noun, verb, adverb, adjective,
determining semantic word density.
lables in each word.
In some embodiments, a dependent clause in the sentence
A computer program product is disclosed, comprising a
computer readable medium, having computer usable pro
gram code executable to perform operations for determining
a semantic density of textualized digital web media, the
and preposition, the dictionary identifying a number of syl
65
may be identi?ed as a string of text not comprising a verb
within two commas. An independent clause in the sentence
may be identi?ed as a string of text comprising a verb within
US 8,880,389 B2
3
4
two commas. The semantic density score may further be a
ing a semantic density of textualized digital web media, the
function of the number of sentences appearing in the block of
text. A database query may be generated using the topic and
prompting a user with a graphic user interface to input one or
synonyms, wherein the sentences are stored in a database in
more keyword(s) for semantic density determination;
computer readable memory and queried with the database
prompting the user with the graphic user interface to input a
operations of the computer program product comprising:
query by a DBMS.
timeline for semantic density determination; prompting the
The computer program product may further comprise ren
dering a graph on a computer display showing elapsed time
across one axis, the graph showing a plurality of semantic
word density ratings for the web media identi?ed by the user.
user with the graphic user interface to input domain names for
one or more web media sources, the domain names compiled
into a list of web media sources for analysis in computer
readable memory; storing the blocks of text in the web media
sources in volatile computer readable memory; and identify
ing sentences within the block of text, wherein sentences are
identi?ed within the block of text as being a string of text
A system to perform operations for relaying commercial
offerings from select merchants to select consumers is also
disclosed, the system comprising: a webserver; a semantic
following a period (.) immediately followed by a space ( )
which string of text also precedes a period (.) immediately
followed by a space ( ).
The computer program product further comprises storing
indexing server; an RDBMS; a topic prompter module con
?gured to prompt a user with a graphic user interface to input
a topic for semantic density determination; a timeline
prompter module con?gured to prompt the user with a
graphic user interface to input a timeline for semantic density
the sentences as strings within an array; identifying clauses
determination; a URL prompter module con?gured to prompt
within the sentences satisfying one or more of the criteria
the user with a graphic user interface to input URLs for one or 20 from the group consisting of: a string of text enclosed ellipsis;
more web media sources, the URLs compiled into a list of
a string of text enclosed by en dashes a string of text enclosed
by em dashes; a string of text within the sentence enclosed by
web media sources for analysis in computer readable
memory; a synonym determiner module con?gured to deter
mine synonym(s) for the topic by referencing a thesaurus in
computer readable memory; a block storer module con?g
ured to store the blocks of text forming the web media sources
in volatile computer readable memory; and a sentence iden
25
ti?er module con?gured to identify sentences within the
block of text.
The system further comprises a sentence parser module
con?gured to parse each sentence to quantify one or more of
one or more of commas, semicolons, and colons; storing the
identi?ed clauses as strings within an array; calculating a
semantic density score for each web media source in the list of
web media sources, wherein the semantic density score is a
function of the number of clauses in each sentence of the web
media source; storing the semantic density score in a persis
tent computer readable database, the score exclusively asso
30
ciated with the web media content from which the score was
derived; determining synonym(s) for the keyword by refer
the following: a number of words in the sentence; a number of
encing a thesaurus in computer readable memory; querying
prepositions, postpositions, adjectives, adverbs, verbs,
the sentences for instances of the keyword and synonyms;
nouns, and grammatical conjunctions, by referencing words
within the sentence with a dictionary in computer readable
memory; a number of dependent clauses in the sentence; a
number of independent clauses in the sentence; a number of
ellipsis, a number of dashes (both en dashes and em dashes),
and a number of commas, semicolons, and colons; a number
of subjects and predicates in the sentence; a number of appo
sitions in the sentence; a number of syllables in each word of
storing semantic density ratings for sentences comprising the
35
display showing elapsed time across one axis, the graph
showing a plurality of semantic density ratings for the web
media identi?ed by the user.
Reference throughout this speci?cation to features, advan
40
the sentence by cross-referencing each word with the dictio
nary in persistent storage; and a number of alphanumeric
characters in the sentence.
The system further comprises a value storer module con
?gured to store each quanti?ed number in a persistent com
invention should be or are in any single embodiment of the
45
advantages is understood to mean that a speci?c feature,
advantage, or characteristic described in connection with an
embodiment is included in at least one embodiment of the
present invention. Thus, discussion of the features and advan
tages, and similar language, throughout this speci?cation
may, but do not necessarily, refer to the same embodiment.
50
Furthermore, the described features, advantages, and char
acteristics of the invention may be combined in any suitable
manner in one or more embodiments. One skilled in the
relevant art will recognize that the invention may be practiced
persistent computer readable database, the score exclusively
without one or more of the speci?c features or advantages of
associated with the web media content from which the score
was derived; a text querier module con?gured to query tex
tualized media on the web media source for instances of the
topic; a synonym query module con?gured to query textual
ized media on the web media source for instances of syn
55
onyms of the topic identi?ed by referencing the dictionary;
60
a particular embodiment. In other instances, additional fea
tures and advantages may be recognized in certain embodi
ments that may not be present in all embodiments of the
invention.
and a render module con?gured to render a graph on a com
puter display showing elapsed time across one axis, the graph
showing a plurality of semantic density ratings for the web
These features and advantages of the present invention will
become more fully apparent from the following description
and appended claims, or may be learned by the practice of the
invention as set forth hereinafter.
media identi?ed by the user.
A second computer program product is disclosed compris
ing a computer readable medium having computer usable
program code executable to perform operations for determin
tages, or similar language does not imply that all of the
features and advantages that may be realized with the present
invention. Rather, language referring to the features and
puter readable database with a time-stamp identifying the
date the number(s) were quanti?ed; a calculator module con
?gured to calculate a semantic density score for each web
media source in the list of web media sources, wherein the
semantic density score is a function of the quanti?ed numbers
for each sentence in the web media source; a density storer
module con?gured to store the semantic density score in a
keyword and synonyms; rendering a graph on a computer
BRIEF DESCRIPTION OF THE DRAWINGS
65
In order that the advantages of the invention will be readily
understood, a more particular description of the invention will
US 8,880,389 B2
6
5
be rendered by reference to speci?c embodiments that are
The web server 102 may also comprise a computer pro
illustrated in the appended drawings. Understanding that
these drawings depict only typical embodiments of the inven
gram running on one or more data processing devices
tion and are not therefore to be considered to be limiting of its
(DPDs), such as a server, computer workstation, router, main
frame computer, or the like. In various embodiments, the
scope, the invention will be described and explained with
additional speci?city and detail through the use of the accom
computing device well-known to those in the art and may
DPD comprises one or more processors. The processor is a
include an application-speci?c integrated circuit (“ASIC”).
A legal entity controls the web sever 102. This legal entity
may comprise any individual, organization, entity, or combi
panying drawings, in which:
FIG. 1 is a UML deployment diagram illustrating one
embodiment of a system for determining semantic word den
nation thereof, tasked with administering the server 102. The
legal entity may be a legal entity, such as a partnership,
sity in accordance with the present invention;
FIG. 2 is a data-entity relationship diagram illustrating the
interrelationships and data?ow of the operations and modules
of the present invention;
limited liability company (LLC), corporation, joint venture,
municipality, d/b/a, or the like. Information may be routed
from the sever 102 and/ or from the indexing server 120 to any
FIG. 3 is a ?owchart illustrating method steps of the present
DPD, device, person, merchant, consumer, company, organi
invention, including steps which may be performed determin
zation, or the like, in logical communication with the server
102.
ing a semantic density of textualized digital web media in
accordance with the present invention;
The server 102 communicates via the Internet 114 with the
DPDs of users 112a-b through PCs, smart phones, or other
FIG. 4 is a block diagram illustrating one embodiment of a
system which determines a semantic density of textualized
20
personal DPDs such as laptops, Palm Pilots®, digital cam
25
eras, and the like. The server 102 may be in logical commu
nication with users 112a-b through a networked environment
such as the Internet 114. The Internet 114 may comprise a
local area network (LAN) or wide area network (WAN). The
server 102 may email search results to users 112a-b using
digital web media in accordance with the present invention;
FIG. 5 is sentence content unit subdivided into clauses in
accordance with the present invention;
FIG. 6 is a Booch-relationship diagram illustrating the
relationships between the search engine, users, and web
media pages in accordance with the present invention;
variations of the Simple Mail Transfer Protocol (SMTP),
Internet Message Access Protocol (IMAP), Post Of?ce Pro
FIG. 7 is a ?owchart illustrating method steps of the present
invention, including steps which may be performed determin
tocol (POP), or other protocols well-known to those of skill in
ing a semantic density of textualized digital web media in
accordance with the present invention;
FIG. 8A is a line graph illustrated a time elapsed display of
the art.
30
database management system (RDBMS), such as Oracle,
MySQL, SQL, FireBird, IBM DB2®, or the like.
semantic density ratings for a general topic in accordance
with the present invention; and
In the shown embodiment, one or more users 112 login to
FIG. 8B is a line graph illustrated a time elapsed display of
semantic density ratings for a speci?c topic in accordance
The web server 102, in the shown embodiment, comprises
a database management system (DBMS) 104 or relational
35
with the present invention.
DETAILED DESCRIPTION OF THE INVENTION
an account they have created with the server 102, wherein the
account information is stored in computer readable database
memory 106 accessible by the server 102. In other embodi
ments, it is not necessary for the users 112a-b to have
accounts with the server 102.
Reference throughout this speci?cation to “one embodi
40
The users 112a-b may comprise any individual, organiza
ment,” “an embodiment,” or similar language means that a
tion, or group of individuals interested in the trends of seman
particular feature, structure, or characteristic described in
tic word density of a general or speci?c topic over a period of
connection with the embodiment is included in at least one
time. The users 112a-b access the web server 102 via the
embodiment of the present invention. Appearances of the
phrases “in one embodiment,” “in an embodiment,” and simi
lar language throughout this speci?cation may, but do not
Internet 114, interfacing with the server 102 using a graphic
45
necessarily, all refer to the same embodiment.
The described features, structures, or characteristics of the
The servers 116a-b, in the shown embodiment, comprises
DPDs hosting third-party website with digital media content
invention may be combined in any suitable manner in one or
more embodiments. In the following description, numerous
speci?c details are provided. One skilled in the relevant art
50
will recognize, however, that the invention may be practiced
without one or more of the speci?c details, or with other
methods, components, materials, and so forth. In other
instances, well-known structures, materials, or operations are
not shown or described in detail to avoid obscuring aspects of
the invention. The apparatus modules recited in the claims
may be con?gured to impart the recited functionality to the
user interface on a DPD local to the users 112a-b (this graphic
user interface usually comprising a website shown in an Inter
net browser on a computer display).
55
118a-b, which content 118a-b is accessible via the Internet or
another network by the server 102 and/or the server 120.
The users 112a-b submit topics for processing to the sever
102. These
The indexing server 120 scans textualized web media con
tent for text blocks, and parses text found on the web media
sources into sentences, which sentences are arrayed in per
sistent storage 106. Web media content 118a-b gathered from
web media sources on the servers 116a-b is transferred to the
server 102 and then to the indexing server 120, or altema
apparatus.
density in accordance with the present invention. The system
tively straight to the indexing server 120.
In various embodiments of the present invention, the
indexing server 120 is con?gured to identify and store in
100 comprises a web server 102, a DBMS 104, persistent
storage 106, keywords 108, web content 110, a user 11211, a
volatile or nonvolatile memory 106 copies of text on web
media sources identi?ed by the users 112a-b; or pointers or
FIG. 1 is a UML deployment diagram illustrating one
embodiment of a system 100 for determining semantic word
60
user 112b, a network 114, a server 11611, a server 116b, media 65 references to those web media sources on servers 116a-b. In
content 118a, media content 118b, and an indexing server
120.
various embodiments of the present invention, the sever 102
and/or the indexing server 120 may be con?gured to scan a
US 8,880,389 B2
7
8
network 114 and identify and store in volatile or nonvolatile
web media sources. These web media sources are queried 304
memory 106 copies of text found on a network, even at
sources not identi?ed ahead of time by the users 112a-b.
In some embodiments, this scan is done before a topic for
for textualized web media content. This textualized web
media content is stored 306 in computer readable memory
106.
analysis is submitted by the users 112a-b to the server 102. In
Sentences 210 within the web media content 118 are iden
ti?ed and parsed 308 from the rest of the web media content
other embodiments, the scan is done subsequently and only
web media sources found to contain references to the topic (or
118. The sentences 210 are stored 310 in an array datastruc
its synonyms) are parsed and stored by the indexing server
ture.
120 in memory 106.
In some embodiments, the display may be a cathode ray
Clauses 212 are identi?ed 312 and parsed from the sen
tences 210, then stored 314 in an array.
tube (“CRT”), a liquid crystal display (“LCD”), a plasma
display panel (“PDP”), or the like.
FIG. 2 is a data-entity relationship 200 diagram illustrating
Dependencies between the clauses are determined using
methodology further described below. The types of clauses
and dependencies identi?ed are used to increase the values of
variables associated with those dependencies, a process
the interrelationships and data?ow of the operations and mod
ules of the present invention. The relationship 200 comprises
through which semantic density is quanti?ed 316. A semantic
users 112a-c, web media sources 118a-b, and storage 106.
density score for the each sentence, web media source, and
The storage 106 comprises computer readable ?les, including
member structure 216, web content 110, density rating 218,
keywords 108, and AST graphs 220. Various species of data
the topic as a whole are calculated 318.
are transmitted back and forth between the users 112a-c, the
web media 118a-b, and storage 106, including keywords
108a-b, timelines 202a-b, URL (Uniform Resource Loca
tors) 204a-c, sentences 21011-2, clauses 21211-2, statistics
214a-g, density data 216, and time elapsed semantic word
graphs 218.
In some embodiments, the sentences 210 are then queried
322 for instances of the topic, or keyword(s) 108, input by a
20
instances of synonyms of the keyword(s) 108 ascertained by
cross-referencing a digital thesaurus. Alternatively, instances
of the keyword(s) 108 in the sentences 210 may be replaced
with the synonyms, and the sentences 210 re-quanti?ed 316
25
In the shown embodiment, user 11211 is prompted to upload
for semantic word density.
Finally, a graph of semantic word density for the topic or
keyword(s) 108 is displayed 324 for the user 112.
via a graphic user interface a number of pieces on information
to the server 102, including one or more keyword(s) 10811, a
timeline 20211 with a start date (i.e. start point) and an end
date, and plurality of URLs 204a-b.
user 112. The sentences 210 may subsequently be queried for
FIG. 4 is a block diagram illustrating one embodiment of a
30
system which determines a semantic density of textualized
digital web media in accordance with the present invention.
The system 400 comprises modules largely con?gured to
perform functions corresponding to the steps of the method
The keyword(s) 108 connote topics of interest to the users
112a for which the user 112a would like semantic word
density information in the form of graphs 218 (shown further
300 and computer program product embodiments. These cor
in FIGS. 8A-8B below) or statistics 214.
The keyword(s) 108 are stored in a database readable ?le
108 in storage 106.
A plurality of URLs to web media content 118a-b are
identi?ed by the user 11211 which the user 11211 would like
searched for textualized media. The URLs 204a-b are stored
in storage 106, and the indexing server 120 scans the web
media sources 118a-b identi?ed by the URLs 204a-b for
textualized web media content. This textualized web media
relations are further described in the claims.
content is parsed into sentences 21011-2 and clauses 214a-f,
which sentences 21011-2 and clauses 214a-f are timestamped
and stored in storage 106. Dependencies between the clauses
35
40
45
The system 400 may be said to form a search engine for
information analysis and summary.
are used to create statistics describing the magnitude of word
50
stored in a database ?le 218 in storage 106.
Subsequently, these statistics 214a-f can be transferred to
another user 1120 (or the same user 112a) in the form of
55
can perform semantic word searches by uploading personal
Certain unique identi?ers are associated with each charac
teristic of each sentence 210. For each sentence 210 a unique
identi?er is exclusively associated with a number of words in
60
the sentence; a number of prepositions, postpositions, adjec
tives, adverbs, verbs, nouns, and grammatical conjunctions,
storage 106.
by referencing words within the sentence with a dictionary in
computer readable memory; a number of dependent clauses
FIG. 3 is a ?owchart illustrating method 300 steps of the
present invention, including steps which may be performed
The method 300 begins 302 with a list of domain names
and/ or ULRs from a user 112 being compiled 302 into a list of
ing a string of text not preceded and/or not followed by any
text.
information to the server 102, such as name, email address,
determining a semantic density of textualized digital web
media in accordance with the present invention.
FIG. 5 is sentence content unit 210 subdivided into clauses
in accordance with the present invention. Blocks of text in the
present invention are parsed into sentences 210. Sentences
210, in some embodiments, are identi?ed as strings of
text separated by a period (.) following immediately by a
space ( ). In other embodiments, the sentences 210 compris
time-elapsed graphs 218.
age, and the like. This personal information is used to create
a member structure 216, which is stored in the database in
Density ratings 218 are stored in computer readable
memory 106 by the indexing server 120 and output in graphi
cal form to the DPD ofa user 112.
212a-z are determined as further described below in relation
In some embodiments of the present invention, users
112a-c are required to register with the server 102 before they
identi?ed by users 112. A timeline 220, with start and end
points, is uploaded by a user 112 to the acceptor module 420.
Keywords 108 and URLs 204a-c are also uploaded to the
system 400 by the user 112. The user 112 may comprise a
human being, group of human beings, or another DPD con
?gured to use the system 400 for information retrieval pur
poses.
to FIG. 4 and other subsequent Figures. These dependencies
density contained by each sentence in the respective web
media 118 (i.e. density ratings). These density ratings are
Textualized web media content 118a-c is gathered by the
block storer module 430 from web content sources at URLs
65
in the sentence; a number of independent clauses in the sen
tence; a number of ellipsis, a number of dashes (both en
dashes and em dashes), and a number of commas, semico
lons, and colons; a number of subjects and predicates in the
US 8,880,389 B2
10
readable memory and only those blocks of text satisfying the
sentence; a number of appositions in the sentence; a number
of syllables in each word of the sentence by cross-referencing
each word with the dictionary in persistent storage; and a
query execution plan relayed to the users 112 or modules
con?gured to parse and quantify identi?ers in the sentences
210 and clauses.
FIG. 6 is a Booch-relationship diagram illustrating the
interrelationships 600 between the search engine, users, and
web media pages in accordance with the present invention. In
the shown embodiment, a search engine 602, comprising the
system 400, is given information by users 112a-d who have
been made aware of speci?c webpages 118a-h which they
number of alphanumeric characters in the sentence.
The magnitude, or value assigned to these unique ideni?ers
is representative of the number of times that each character
istic in the sentence in identi?ed. These unique identi?ers,
along with unique identi?ers representing the dependencies
between sentence 210 clauses 504-510, are used in determin
ing the informative density, or semantic density of sentences
210; and; consequently, entire blocks of text and web source
documents.
The sentence 210 comprises clauses identi?ed by the sys
want processed by the search engine 602.
FIG. 7 is a ?owchart illustrating method steps of the present
invention, including steps which may be performed determin
tem 400. The sentences 210 are stored in computer readable
memory as encoded content units 210, which units are sub
ing a semantic density of textualized digital web media in
accordance with the present invention.
jected to analysis by the indexing server 120.
The keywords 108, in the shown embodiment, comprise
The method 700 begins 702 and progresses as shown. A
user 112 is prompted 704 to input a timeline with start and end
digitized search terms entered by a user of the search engine.
Typically, the keywords 108 are received by the server 102
and relayed to the indexing server 120. The keywords 108
may comprise phrases (one or more words) that relate to
content which the user wishes to search, which content typi
cally will relate to the subject matter of interest of the user. In
response to reception of the keywords 1-8, the sever 120 may
locate information archived in a content library under the
points (i.e. start and end dates). The users 112 is also
prompted 706 for keyword 108 (e.g. a topic). Web media
content identi?ed by the user 112 is scanned 708 by the
20
indexing server 120.
In the shown embodiment, if content has changed 710 from
an earlier scan, information on web media page is scanned
25
control of the sever 120.
The content library may be internal or external to the index
ing server 120. In some embodiments, the content library is in
logical communication with the indexing server 120 via a
distributed network, or through other means well-known to
those of skill in the art. In the shown embodiment, the content
units 210 are stored in a computer readable database con
trolled by the database server 101, such as IBM DB2, Oracle,
or SQL.
The keywords 108 are formulated into a database query
and then associated with a query execution plan which iden
ti?es content units within the content library that satisfy the
30
lations between words and/or clause(s).
In some embodiments of the present invention, blocks of
text identi?ed by a user 112 are searched for instances of the
35
of web media pages are searched for instances of the key
40
XPATH, and the like. The keywords 106a-c may also com
prise other identi?ers relevant to creating, or identifying, the
FIG. 8A is a line graph 800 illustrating a time elapsed
display of semantic density ratings for a general topic in
accordance with the present invention. The general topic for
which time elapsed semantic word density is shown is poli
The keywords 106a-c may be formulated into database
tics. The graph 800 shows word density, or the informative
as needed by a user with a form that is unknown until the
50
is likely to change between requests) or static (meaning the
database query is prede?ned does not change form between
requests, although the parametric data values of the query
may change).
The dynamic database queries comprising the keywords
still further embodiments, the blocks of text may be searched
for instances of synonyms of the keyword(s) 108, while in
still further embodiments, instances of the keyword(s) 108
appearing the text may be substituted out and replaced with
synonyms of the keyword(s) before performing further
semantic word density analysis and calculation.
45
queries which are dynamic (meaning the query is generated
query is received by the database server 101 and which form
keyword(s) 108 after the semantic word density fo the blocks
of text is determined. In other embodiments, large collections
word(s) 108 before semantic word density is calculated. In
may be generated using various languages including SQL,
proper query execution plan such as the title and the names of
speakers featured in the content units 108 or other forms of
additional information described above.
used in calculating the informative density, or semantic den
sity of the sentence 210 containing the clause(s). In some
embodiments, b-tree and/or ASG graphs are used to deter
mine these dependencies. In other embodiments, dictionaries
and thesauri are referenced to determine relations and corre
database query after searching textualized digital transcripts
of the audio portions of the content units 108 stored in asso
ciation with the content units 108. These database queries
again and sentences 210 are parsed 712. Clauses are identi?ed
714, and sentences are stored in arrays 716.
The dependencies between the clauses are identi?ed and
value of the discussion of politics, rising in 2009, slumping in
2010, and rising again in 2011. The sentences discussing
politics average 41.1% of the maximum achievable density
over the given 2009-2011 timeline 220.
FIG. 8B is a line graph 850 illustrating a time elapsed
55
display of semantic density ratings for a speci?c topic in
108-c may comprise more than just textualized words or
accordance with the present invention.
phrases, but also logical bitwise operations that the indexing
The speci?c topic for which time elapsed semantic word
density is shown is the name of political candidate. Examples
might include Barack Obama, Mitt Romney, and the like. The
graph 850 shows word density, or the informative value of the
server 120 infers were intended by the user 112 such as NOR,
OR, XOR, AND, NAND, and the like, as well as other infor
mation. The users 112 may be afforded a great deal of ?ex
60
ibility in searching for not just words and phrases, but com
binations of words, phrases, and the like within the web media
discussion of this candidate, falling in 2009, rising in 2010,
and falling again in 2011. The sentences discussing politics
content.
average 32.4% of the maximum achievable density over the
Web media which satis?es a database query comprising the
keywords 108 may be analyzed for informative or semantic
density only after identifying instances of the keyword(s) 108
given 2009-2011 timeline 220.
The present invention may be embodied in other speci?c
forms without departing from its spirit or essential character
in the text. The web media content may be cached in computer
istics. The described embodiments are to be considered in all
65
US 8,880,389 B2
11
12
respects only as illustrative and not restrictive. The scope of
rendering a graph on a computer display showing elapsed
the invention is, therefore, indicated by the appended claims
rather than by the foregoing description. All changes which
time across one axis, the graph showing a plurality of
come within the meaning and range of equivalency of the
the user.
claims are to be embraced within their scope.
semantic density ratings for the web media identi?ed by
5
What is claimed is:
1. A computer program product comprising a non-transi
tory computer readable medium having computer usable pro
gram code executable to perform operations for determining
4. The computer program product of claim 1, wherein the
list of web media sources consists of web media sources
manually identi?ed by one or more users to the website.
an informative score of textualized digital web media, the
operations of the computer program product comprising:
5. The computer program product of claim 1, further com
prising:
compiling a list of web media sources for analysis in com
puter readable memory;
for each sentence, creating one of an abstract semantic
graph (ASG) and an abstract syntax tree (AST); and
querying the web media sources for a block of text;
storing the block of text in volatile computer readable
memory;
identifying sentences within the block of text;
storing the sentences as strings within an array;
parsing each sentence to quantify one or more of the fol
lowing:
20
a number of words in the sentence;
a number of prepositions, postpositions, adjectives,
by referencing words within the sentence with a dic
characterizing each word as one or more of a noun, verb,
25
text not comprising a verb within two commas.
and em dashes), and a number of commas, semico
30
a number of subjects and predicates in the sentence;
a number of appositions in the sentence;
a number of syllables in each word of the sentence by
cross-referencing each word with the dictionary in
persistent storage; and
adverb, adjective, and preposition, the dictionary identifying
a number of syllables in each word.
7. The computer program product of claim 1, wherein a
dependent clause in the sentence is identi?ed as a string of
a number of dependent clauses in the sentence;
a number of independent clauses in the sentence;
a number of ellipsis, a number of dashes (both en dashes
lons, and colons;
calculating a semantic density score for each web media
source in the list of web media sources, wherein the
semantic density score is a function of the number of
clauses in each sentence of the web media source, the
dependency identi?er, and one or more of the ASG and
the AST.
6. The computer program product of claim 1, further com
prising storing a dictionary in computer readable memory, the
dictionary identifying words of a language, the dictionary
adverbs, verbs, nouns, and grammatical conjunctions,
tionary in computer readable memory;
3. The computer program product of claim 1, wherein the
web media sources are queried at predetermined intervals of
time.
8. The computer program product of claim 1, wherein an
independent clause in the sentence is identi?ed as a string of
text comprising a verb within two commas.
9. The computer program product of claim 1, where in the
35
semantic density score is further a function of the number of
sentences appearing in the block of text.
10. The computer program product of claim 1, wherein a
a number of alphanumeric characters in the sentence;
storing each quanti?ed number in a persistent computer
database query is generated using the topic and synonyms,
readable database with a time- stamp identifying the date
wherein the sentences are stored in a database in computer
readable memory and queried with the database query by a
the number(s) were quanti?ed;
calculating a semantic density score for each web media
source in the list of web media sources, wherein the
semantic density score is a function of the quanti?ed
numbers for each sentence in the web media source; and
storing the semantic density score in a persistent computer
readable database, the score exclusively associated with
40
11. The computer program product of claim 1, further
comprising rendered a graph on a computer display showing
elapsed time across one axis, the graph showing a plurality of
semantic word density ratings for the web media identi?ed by
45 the user.
12. A system to perform operations for determining a level
of discussion of a topic, the system comprising:
the web media content from which the score was
derived.
2. The computer program product of claim 1, further com
a webserver;
a semantic indexing server;
prising:
prompting a user with a webpage to input a topic for
50
semantic density determination;
density determination;
semantic density determination;
prompting the user with a webpage to input (Uniform
Resource Locators) URLs for one or more web media 55
sources;
determining synonym(s) for the topic by referencing a
thesaurus in computer readable memory;
semantic density determination;
to input URLs for one or more web media sources, the
60
URLs compiled into a list of web media sources for
analysis in computer readable memory;
querying textualized media on the web media source for
instances of synonyms of the topic identi?ed by refer
the synonyms of the topic; and
a timeline prompter module con?gured to prompt the user
with a graphic user interface to input a timeline for
a (Uniform Resource Locator) URL prompter module con
?gured to prompt the user with a graphic user interface
querying textualized media on the web media source for
encing the dictionary;
storing semantic density ratings for sentences comprising
the topic;
storing semantic density ratings for sentences comprising
an (Relational Database Management System) RDBMS;
a topic prompter module con?gured to prompt a user with
a graphic user interface to input a topic for semantic
prompting the user with a webpage to input a timeline for
instances of the topic;
(Database Management System) DBMS.
a synonym determiner module con?gured to determine
synonym(s) for the topic by referencing a thesaurus in
computer readable memory;
65
a block storer module con?gured to store the blocks of text
forming the web media sources in volatile computer
readable memory;
US 8,880,389 B2
14
13
a sentence identi?er module con?gured to identify sen
tences within the block of text;
a sentence parser module con?gured to parse each sentence
to quantify one or more of the following:
a number of words in the sentence;
wherein a dependency comprises a clause with one or more
a number of prepositions, postpositions, adjectives,
noun, adjective, adverb, or preposition in a second clause; and
characteristics from the group consisting of: a modi?er in the
?rst clause modifying a verb in a second clause; a modi?er in
a ?rst clause modifying one of a noun, verb, and adverb in
second clause; a complement in the ?rst clause of one of a
adverbs, verbs, nouns, and grammatical conjunctions,
an interjection referencing a noun in the second clause.
by referencing words within the sentence with a dic
18. The system of claim 12, further comprising rendering a
tionary in computer readable memory;
graph on a computer display showing elapsed time across one
a number of dependent clauses in the sentence;
a number of independent clauses in the sentence;
a number of ellipsis, a number of dashes (both en dashes
axis, the graph showing a plurality of semantic word density
ratings for the web media identi?ed by the user.
19. A computer program product comprising a non-transi
and em dashes), and a number of commas, semico
tory computer readable medium having computer usable pro
gram code executable to perform operations for determining
a semantic density of textualized digital web media, the
lons, and colons;
a number of subjects and predicates in the sentence;
a number of appositions in the sentence;
a number of syllables in each word of the sentence by
cross-referencing each word with the dictionary in
operations of the computer program product comprising:
prompting a user with a graphic user interface to input one
persistent storage; and
a number of alphanumeric characters in the sentence;
a value storer module con?gured to store each quanti?ed
number in a persistent computer readable database with
a time-stamp identifying the date the number(s) were
20
domain names for one or more web media sources, the
quanti?ed;
a calculator module con?gured to calculate a semantic
density score for each web media source in the list of
web media sources, wherein the semantic density score
is a function of the quanti?ed numbers for each sentence
in the web media source;
a density storer module con?gured to store the semantic
domain names compiled into a list of web media sources
25
sentences are identi?ed within the block of text as being
30
a text querier module con?gured to query textualized
35
topic;
a synonym query module con?gured to query textualized
media on the web media source for instances of syn
onyms of the topic identi?ed by referencing the dictio
40
a render module con?gured to render a graph on a com
puter display showing elapsed time across one axis, the
graph showing a plurality of semantic density ratings for
the web media identi?ed by the user.
13. The system of claim 12, wherein sentences are identi
?ed within the block of text as being a string of text following
45
50
increased for each dependency identi?ed between a ?rst
clause and a second clause in the sentence, wherein a
dependency comprises a clause with one or more char
55
acteristics from the group consisting of: a modi?er in the
?rst clause modifying a verb in a second clause; a modi
?er in a ?rst clause modifying one of a noun, verb, and
adverb in second clause; a complement in the ?rst clause
the sentence is identi?ed as a string of text not comprising a
of one of a noun, adjective, adverb, or preposition in a
second clause; and an interjection referencing a noun in
verb within two commas.
16. The system of claim 12, wherein an independent clause
the second clause;
60
for each sentence, creating one of an abstract semantic
graph (ASG) and an abstract syntax tree (AST);
verb within two commas.
17. The system of claim 12, further comprising a depen
dency analyzer module con?gured to analyze dependencies
?edbetween a ?rst clause and a second clause in the sentence,
following an agent-object-verb (AOV) order;
analyzing dependencies between the clauses and increas
ing the magnitude of a dependency identi?er exclusively
associated with the sentence, the dependency identi?er
and preposition, the dictionary identifying a number of syl
between the clauses and increasing the magnitude of a depen
dency identi?er exclusively associated with the sentence, the
dependency identi?er increased for each dependency identi
string of text enclosed ellipsis;
string of text enclosed by en dashes
string of text enclosed by em dashes;
string of text within the sentence enclosed by one or
more of commas, semicolons, and colons;
a string of text within the sentence comprising words
storing the identi?ed clauses as strings within an array;
each word as one or more of a noun, verb, adverb, adjective,
in the sentence is identi?ed as a string of text comprising a
a
a
a
a
following a subject-object-verb (SOV) word order;
space ( ).
lables in each word.
15. The system of claim 12, wherein a dependent clause in
storing the sentences as strings within an array;
identifying clauses within the sentences satisfying one or
more of the criteria from the group consisting of:
and string of text within the sentence comprising words
a period (.) immediately followed by a space ( ) which string
of text also precedes a period (.) immediately followed by a
14. The system of claim 12, further comprising storing a
dictionary in computer readable memory, the dictionary iden
tifying words of a language, the dictionary characterizing
a string of text following a period (.) immediately fol
lowed by a space () which string of text also precedes a
period (.) immediately followed by a space ( );
content from which the score was derived;
nary; and
for analysis in computer readable memory;
storing the blocks of text in the web media sources in
volatile computer readable memory;
identifying sentences within the block of text, wherein
density score in a persistent computer readable database,
the score exclusively associated with the web media
media on the web media source for instances of the
or more keyword(s) for semantic density determination;
prompting the user with the graphic user interface to input
a timeline for semantic density determination;
prompting the user with the graphic user interface to input
65
calculating a semantic density score for each web media
source in the list of web media sources, wherein the
semantic density score is a function of the number of
clauses in each sentence of the web media source, the
dependency identi?er, and one or more of the ASG and
the AST;
US 8,880,389 B2
15
storing the semantic density score in a persistent computer
readable database, the score exclusively associated with
the web media content from which the score was
derived;
determining synonym(s) for the keyword by referencing a 5
thesaurus in computer readable memory;
querying the sentences for instances of the keyword and
synonyms;
storing semantic density ratings for sentences comprising
the keyword and synonyms; and
10
rendering a graph on a computer display showing elapsed
time across one axis, the graph showing a plurality of
semantic density ratings for the web media identi?ed by
the user.
20. The computer program product of claim 19, wherein a 15
database query is generated using the keyword(s) and syn
onyms of the keyword(s), wherein the sentences are stored in
a database in computer readable memory and queried with the
database query by a (Database Management System) DBMS.
*
*
*
*
*
20