FALL 2011 LECTURES CMPSCI 120
#1: Wednesday, September 7 – First lecture introduction to the class. Hand out syllabus sheet. Cover
timeline of technology from 1900 to 2011. Emphasize difference between Internet and Web;
contributions of single individuals to major advances in tech (Tim Berners-Lee, Douglas
Engelbart, etc.). Place student birthdates into timeline appropriately to illustrate how much
change has happened within their lifetimes. Contrast with that of my Grandmother, born three
weeks before Wright Brothers’ first flight, died as metal being cut for International Space
Station. Exhort students to be kind to elders, born before the exponential change in Net tech
within their lifetimes.
#2: Friday, September 9 – Exercise in image analysis, as preparation for scavenger hunt assignment,
using historical picture of 1865 execution of conspirators in Lincoln assassination, handed out to
students. Questions include when photography started, when glass-plate photography popular
(evidenced by crack in image), when electricity became prevalent (gas lamp on wall), why a
woman is being executed (Mary Surratt), what style of uniforms are being worn (Union soldiers),
what the blurring of people walking illustrates (shutter speed), why people are carrying
umbrellas when it isn’t raining (sun protection), etc. Handed out sheet of timeline from first
lecture.
#3: Monday, September 12 – Discussion of bias and psychological issues in performing searches. First
exercise was to play the Say, Yeah! mashup video of Teen Titans characters, paying particular
attention to the audio. Students did not hear anything out of the ordinary, except most of song
is in Korean(?) with a few English words, until it was pointed out that several times the audio
sounded like “your momma told me I’m a dead guy”. After that, everyone heard it. This is an
audio example of pareidolia. Other example given was in early hominid evolution: seeing
danger in the grass and reacting to it, regardless of whether danger actually exists, is an
evolutionary advantage to the species, but means that individuals within species over time
acquire bias for seeing patterns where they do not necessarily exist. (Modern examples: seeing
faces in tacos or shower curtains.) Next, drew a line on the blackboard and asked students to
place sites or groups according to bias (NTSB at low bias, news and religious organizations at
high bias, Wikipedia somewhere in the middle, etc.). Gave example of how to use this in
performing search: actively seek unbiased sites (difficult), or actively seek biased sites and
subtract out the known bias (easy), figuring they’ve done most of the legwork against a
particular topic. Finally, listed several “rules” of bias and self-deception people are likely to
encounter on the Internet, including Poe’s Law, Hanlon’s Razor, No True Scotsman, etc., and
gave examples of correlation vs. causality. This document is available on class site. Also warned
students about “Rule 34”.
Fall 2011 Lectures in CMPSCI 120 ©Dr. William T. Verts
#4: Wednesday, September 14 – Email. Notion that all email is text, including non-text attachments.
Notion that email is like a postcard – readable by everyone at every node between sender and
receiver, so sending sensitive info such as credit card or Social Security numbers in plain text
email is a Very Bad Idea in general, unless encryption is used. Notion that emails are easy to
spoof (appear to come from someone other than actual sender). Notion that some email
systems store messages on server, others always download to local client (what are advantages
of each approach?). SPAM – Hormel canned meat product co-opted by old Monty Python skit –
makes up most emails today. Examples of “phishing” attacks, trying to get users to do
something they shouldn’t:
(1) “Your [bank, credit card, computer] account is compromised, to fix click HERE”. No
legitimate company will do this, instead they will say “To fix go to our Web site”. By looking
at underlying code, users can often tell where the click-here address is directed, always to a
site not associated with corporation being spoofed.
(2) “Dear friend, my name is Mrs. ______ and my [husband, friend, colleague] has $___ million
to get out of the country before the corrupt [government, bank, financial institution] steals
it. Please help by providing bank info, and you’ll receive [5%, 10%, 15%] as a reward.” Also
known as the Nigerian scam letter (although can come from anywhere). Once done on
actual paper, today costs very little to send a billion emails, only need one pigeon to fall for
scam.
Encryption is good, both for email and secure on-line ordering (how it works will be subject of
future lecture), but was resisted by FBI, NSA, CIA for use by private citizens. (Conflict: need
encryption for thriving e-commerce, but how to eavesdrop on drug dealers, child
pornographers, and terrorists?) Questions: Is spamming illegal? How to prosecute when sender
is in another country? How can VPNs (virtual private networks) protect communications for
companies using insecure Internet? Is it OK to send credit card numbers, etc., in text messages?
#5: Friday, September 16 – Email finish up: play Monty Python SPAM skit, show example of Nigerian
scam letter received days before. Play “State of the Internet 2009” video. Network topologies:
point-to-point (one wire between every pair of machines, fast, but does not scale well: for N
machines there are N(N-1)/2 connections, or O(N²)), star (central fast machine, dumber but cheap
satellite terminals or computers, central machine must be exceptionally fast and therefore
expensive, vulnerable to single-point-failure of central machine, “Big Brother” issues, common
trope of old science fiction view of computers), token-ring (each machine connects to only two
neighbors, passes “token” or “magic cookie” around ring so every machine gets a chance to
send messages, scales reasonably well, all machines identical and inexpensive, but break in ring
causes overall failure), Ethernet (machines “talk” to common wire, everyone “listens” for
messages addressed to them, machine needing to transmit waits until wire quiet then talks, two
machines talking at same time causes message collision, but because machines listen to their
own transmission they can detect when collisions occur [1s get changed to 0s, 0s get changed to
1s], then back off and try again at random time later, scales well but increased net traffic results
in congestion, need routers).
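The scaling problem of the point-to-point topology can be checked with a few lines of Python. This is only a sketch of the N(N-1)/2 formula above; the function name is made up for illustration.

```python
def point_to_point_links(n: int) -> int:
    """Wires needed so that every pair of n machines has its own direct link."""
    return n * (n - 1) // 2


# Doubling the number of machines roughly quadruples the wiring: O(N^2) growth.
for n in (2, 10, 100, 1000):
    print(n, "machines need", point_to_point_links(n), "links")
```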
#6: Monday, September 19 – Introduction to Scavenger Hunt project, due Wednesday, September 28.
Review of Ethernet from Friday. Need for routers/switches to separate legs of network. Outline
of typical home network configuration: TV cable splits between TV and cable modem, cable
modem can go directly to a computer (presenting 1 IP address to Internet) OR to a hub with
multiple computers (each presenting its own unique IP address to Internet) OR to a router with
multiple computers (each presenting its own unique local IP address to router, but router
presents only one IP address to Internet). Local side of router can have hubs in different rooms,
but no two computers should be more than three/four hubs from one another: router becomes
central focal point in star network (if it fails, local network crashes). Wireless router can talk to
wireless laptops just as if wire directly links between them. Printers may be connected to a local
machine (which must be always left running), OR to wireless print server (in isolated room)
which talks to the router, OR IP-enabled printers may be connected directly into the router.
Wireless: typical range 100 meters, but can be attenuated by objects (walls, chimneys, etc.).
Current standard is IEEE 802.11 (2 megabits/second), with revisions 802.11b (11 Mb/s), 802.11g
(54 Mb/s, compatible with 802.11b), 802.11a (54 Mb/s but not compatible with 802.11b or
802.11g), and 802.11n (faster using multiple simultaneous radio channels).
Student asked after class whether we would discuss “ad-hoc” networks (all described so far are
“infrastructure” networks).
Details of IPv4 addressing coming on Wednesday.
#7: Wednesday, September 21 – Introduction to bits (binary digits, values 0 or 1) and bytes (aggregates
of 8 bits, between 00000000 and 11111111, with 2⁸ = 256 unique patterns). IPv4 addresses are
four bytes, most common notation consists of decimal values 0…255 separated by periods
(examples: 0.0.0.0 through 255.255.255.255, UMass addresses are all 128.119.xxx.xxx). Old
style addresses in use from 1981 to 1993 are “classful” addresses, from class A through class E,
depending on number of networks and number of machines within networks. Examples below
show class patterns in binary, where x represents part of the network address and y represents
the machine address within the network:
class A addresses: 0xxxxxxx.yyyyyyyy.yyyyyyyy.yyyyyyyy (128 nets, 16,777,216 machines each)
class B addresses: 10xxxxxx.xxxxxxxx.yyyyyyyy.yyyyyyyy (16,384 nets, 65,536 machines each)
class C addresses: 110xxxxx.xxxxxxxx.xxxxxxxx.yyyyyyyy (2 million nets, 256 machines each)
class D addresses: 1110xxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx (multicast addresses)
class E addresses: 1111xxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx (future expansion, never used)
Problem with classful addressing is that it does not match reality. A site with 257 machines is
too large for a class C, but is wasteful of address space if moving to a class B – over 65,000
addresses are wasted. Worse for site with 70,000 machines – too big for a class B, so it must
move to a class A, where more than 16 million potential addresses are wasted.
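As a rough illustration of the class patterns and the waste they cause, the sketch below reads the class from the leading bits of the first byte and then counts the unused addresses for a site of a given size. The function names are invented for this example, and the capacities are the machine counts listed above.

```python
def address_class(dotted: str) -> str:
    """Classful category of an IPv4 address, read from the first byte's leading bits."""
    first = int(dotted.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError("first byte must be in 0..255")
    if first < 128:
        return "A"  # 0xxxxxxx
    if first < 192:
        return "B"  # 10xxxxxx
    if first < 224:
        return "C"  # 110xxxxx
    if first < 240:
        return "D"  # 1110xxxx (multicast)
    return "E"      # 1111xxxx


# Machines per network, smallest class first (figures from the notes).
CAPACITY = [("C", 256), ("B", 65_536), ("A", 16_777_216)]


def smallest_class(machines: int) -> tuple[str, int]:
    """Smallest class that fits the site, plus how many addresses go unused."""
    for name, size in CAPACITY:
        if machines <= size:
            return name, size - machines
    raise ValueError("too many machines for a single classful network")


print(address_class("128.119.240.19"))  # a UMass address
print(smallest_class(257))
print(smallest_class(70_000))
```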
Solution is to use CIDR (classless inter-domain routing) which specifies how many bits are used
in the network name and how many for the machine name – specification is with a slash and a
number of bits. For example, an address such as xxx.xxx.xxx.xxx/14 uses 14 bits for the network
identifier (2¹⁴ = 16,384) with the remaining 18 bits for the machine name (2¹⁸ = 262,144), smaller
than a class A but bigger than a class B. Class A addresses roughly correspond to
xxx.xxx.xxx.xxx/8, and last of the /8 groups were allocated in early 2011. (See
http://xkcd.com/195/ for map of /8 groups as of 2006.)
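Python's standard ipaddress module can confirm the slash arithmetic. In this sketch the network 10.0.0.0/14 is just an arbitrary example block, and the helper name describe is made up; the 128.119.0.0/16 block is inferred from the UMass address pattern mentioned earlier.

```python
import ipaddress


def describe(cidr: str) -> tuple[int, int, int]:
    """Return (network bits, machine bits, number of addresses) for a CIDR block."""
    net = ipaddress.ip_network(cidr, strict=False)
    machine_bits = net.max_prefixlen - net.prefixlen
    return net.prefixlen, machine_bits, net.num_addresses


print(describe("10.0.0.0/14"))     # 14 network bits leave 18 machine bits
print(describe("128.119.0.0/16"))  # same size as an old class B

# Membership test: is this machine inside the block?
inside = ipaddress.ip_address("128.119.240.19") in ipaddress.ip_network("128.119.0.0/16")
print(inside)
```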
All these techniques – classful addressing, CIDR, hiding networks behind routers (which present
only one IP address to the outside world, and route packets coming in to the correct machine),
served to delay the exhaustion of IPv4 addresses, but the hard limit of 4,294,967,296 unique
IPv4 addresses never went away. Solution: IPv6.
#8: Friday, September 23 – Instead of four bytes, IPv6 addresses are eight words (where a word is two
bytes, ranging between 0 and 65,535). This gives a total of 128 bits for an address instead of
just 32. With 128 bits there are 2¹²⁸ ≈ 3.4×10³⁸ unique addresses (2³² ≈ 4.3×10⁹, so IPv6 is a lot
larger than IPv4). Not likely to run out very soon (see also http://xkcd.com/865/). First
proposed in 1998, so problem has been recognized for a long time. Reasons for not adopting by
now include: cost of overhaul of hardware and software, complexity of getting newer standard
to work with old, laziness, etc. Test day in June 2011 to see what bugs may surface.
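The size difference is easy to verify, since Python integers are not limited to 32 or 64 bits. The sketch below also splits an IPv6 address into its eight 16-bit words; 2001:db8::1 is an address from the range reserved for documentation, used here purely as an example, and ipv6_words is a made-up helper name.

```python
import ipaddress

IPV4_TOTAL = 2 ** 32
IPV6_TOTAL = 2 ** 128


def ipv6_words(address: str) -> list[int]:
    """Split an IPv6 address into its eight 16-bit words."""
    exploded = ipaddress.IPv6Address(address).exploded  # expands the '::' shorthand
    return [int(group, 16) for group in exploded.split(":")]


print(f"IPv4: {IPV4_TOTAL:,} addresses")
print(f"IPv6: about {IPV6_TOTAL:.1e} addresses")
print(ipv6_words("2001:db8::1"))
```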
DNS (Domain Name System, sometimes called Domain Name Service) maps the host names in Web addresses (URLs) onto IP
addresses. IP addresses are really what are needed. For example, http://www.cs.umass.edu/
can be replaced with http://128.119.240.19/ (although browsers are twitchy about security
when you do this). DNS is designed so one of 13 or so top-level root servers world-wide (and
their proxies) examine top level domains (.EDU, .COM, etc.) and hand off requests to proper
server. Server handling .EDU domains hands off request to server handling UMass, then server
handling UMass hands off request to server handling CS, etc. Eventually, some server knows the
IP address of www.cs.umass.edu and returns it, or if address is invalid returns error message. In
practice, this would severely overload root servers, serving trillions of URL requests per day.
Instead, DNS servers cache recently requested IPs, so requests go up the chain instead of down,
until some server can either handle the request or knows who can. Root servers get very little
traffic as a result. Possible for cache to be “poisoned” to return wrong IP address. Cache entries
have a TTL (time to live) so that stale IP addresses don’t stay cached forever and can be replaced
eventually. TTL can be between seconds and weeks.
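The caching idea can be sketched as a toy class. This is not how a real DNS server is written; the class name, the upstream callable (standing in for "ask the next server up the chain"), and the injected clock are all assumptions made so the example runs without a network.

```python
class DnsCache:
    """Toy DNS cache: answers from memory until an entry's TTL runs out."""

    def __init__(self, upstream, clock):
        self._upstream = upstream  # callable: name -> (ip, ttl_seconds)
        self._clock = clock        # callable returning the current time in seconds
        self._entries = {}         # name -> (ip, expiry time)
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]       # still fresh: no traffic goes up the chain
        self.upstream_queries += 1
        ip, ttl = self._upstream(name)
        self._entries[name] = (ip, now + ttl)
        return ip


fake_time = [0.0]
cache = DnsCache(lambda name: ("128.119.240.19", 60), lambda: fake_time[0])
cache.resolve("www.cs.umass.edu")  # first request goes upstream
cache.resolve("www.cs.umass.edu")  # second request is served from the cache
fake_time[0] = 61.0                # the 60-second TTL has now expired
cache.resolve("www.cs.umass.edu")  # stale entry is replaced
print(cache.upstream_queries)
```

A poisoned cache corresponds to the upstream callable returning a wrong IP address, which the cache would then hand out until the TTL expires.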
#9: Monday, September 26, 2011 – Interpreting a URL (Uniform Resource Locator). Addresses such as
http://www.cs.umass.edu/~verts/cmpsci120/cmpsci120.html (the class site)
can be broken into sections:
http://
    The protocol. Promise that the resource observes the conventions of the Hypertext
    Transfer Protocol. Other protocols include ftp://, telnet://, etc.
www.cs.umass.edu
    The host address. Read from right-to-left. Traditional top-level domains (TLDs)
    include .edu (education), .com (commercial), .net (network), .org (organizations),
    .gov (U.S. government), and .mil (U.S. military), but new ones have been added, such
    as .aero (aeronautics), .biz (business), .xxx (porn), etc. TLDs may be suffixed by
    country codes (.us, .uk, .jp, .ru, etc.), which had all used the Roman alphabet
    until recently, when United Arab Emirates, Egypt, and Saudi Arabia started using
    Arabic characters (from right to left), and the Russian Federation started using .РФ
    (Cyrillic for Russian Federation) in addition to .ru (Russia).
~verts
    The username. When present, usernames are prefixed with the tilde (~) character to
    say “look in the account on the specified machine”.
cmpsci120/
    The folder path. The name (or names) of a path of folders to the file being fetched.
cmpsci120.html
    The actual resource (file) being fetched. This can be a text file (.txt), an image
    file (.gif, .jpg, or .png), an Acrobat file (.pdf) or a Web file (.htm or .html).
    When not specified, index.html or index.htm are assumed (the .htm extension dates
    from MS-DOS up through Windows 3.1, which only supported three-character file
    extensions; Windows 95 was the first version to support longer filenames and
    extensions).
Legal subsets of this URL include:
http://www.cs.umass.edu/
File name not specified, by default fetch index.html from Computer Science server.
http://www.cs.umass.edu/~verts/
File name not specified, by default fetch index.html from verts’ account on Computer
Science server.
http://www.cs.umass.edu/~verts/cmpsci120/
File name not specified, by default fetch index.html from cmpsci120 folder in verts’
account on Computer Science server.
http://www.cs.umass.edu/~verts/cmpsci120/cmpsci120.html
Filename is specified, fetch cmpsci120.html from cmpsci120 folder in verts’ account on
Computer Science server (overriding default value of index.html).
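The same breakdown can be tried with Python's standard urllib.parse module. This is a sketch only: the helper name break_down and the dictionary keys are invented here, the index.html default follows the notes (real servers are configurable), and a path without a trailing slash is assumed to end in a file name.

```python
from urllib.parse import urlsplit

DEFAULT_FILE = "index.html"  # assumption taken from the notes; servers may differ


def break_down(url: str) -> dict:
    """Split a URL into protocol, host, username, folder path, and file."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    user = None
    if segments and segments[0].startswith("~"):
        user = segments.pop(0)[1:]  # drop the leading tilde

    filename = DEFAULT_FILE
    if segments and not parts.path.endswith("/"):
        filename = segments.pop()   # last piece is treated as the file

    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "user": user,
        "folders": segments,
        "file": filename,
    }


print(break_down("http://www.cs.umass.edu/~verts/cmpsci120/cmpsci120.html"))
print(break_down("http://www.cs.umass.edu/~verts/"))
```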
Sidebar discussion of .pdf (Portable Document Format) files. Used for distributing printer-ready
documents when tool used to create document may be something not everyone has.
Acrobat Reader is free (Writer is not, although PDF specification is now open, so programmers
can create their own PDF files). Files can be secured to protect intellectual property – password
protected to restrict opening and/or printing and/or copying of document. Normally, PDF files
can be annotated, or include animations, sounds, etc., but PDF/A (archival) documents contain
only basic text guaranteed readable.
#10: Wednesday, September 28, 2011 – Beginning HTML. Tags and tag pairs, basic HTML Web page
layout:
<HTML>
<HEAD>
<TITLE>My Spiffy Web Page</TITLE>
</HEAD>
<BODY>
Content
</BODY>
</HTML>
Use of BGCOLOR to change background color in BODY, as in <BODY BGCOLOR="green">
(BGCOLOR is deprecated). Colors may be also in 6-digit hexadecimal (base 16), where the first
two digits encode red, the middle two encode green, and the last two encode blue. Values for
each of the three color components are bytes that range from 00 (zero) to FF (255), where
each of the two digits range between [0…9, A…F] and the leftmost of the two is “dominant”
(contributes most to the value). After 9, A=10, B=11, C=12, D=13, E=14, and F=15. Having
three bytes (24 bits) for color gives 16,777,216 unique colors encoded in exactly six characters.
To convert a byte value to hex, divide it by 16; the quotient is the leftmost digit and the
remainder is the rightmost digit. For example, the number 236 when divided by 16 gives the
quotient 14 and the remainder 12, which is EC in hexadecimal. Similarly, to convert EC back to
decimal the equation is E×16 + C×1, or 14×16 + 12 = 224 + 12 = 236.
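The divide-by-16 procedure translates directly into code. The sketch below uses made-up function names and works one byte at a time, exactly as in the 236 and EC example above.

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(value: int) -> str:
    """Convert a byte (0..255) to two hex digits: quotient first, remainder second."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be in 0..255")
    quotient, remainder = divmod(value, 16)
    return HEX_DIGITS[quotient] + HEX_DIGITS[remainder]


def hex_to_byte(pair: str) -> int:
    """Convert two hex digits back to decimal: left digit times 16 plus right digit."""
    left, right = pair.upper()
    return HEX_DIGITS.index(left) * 16 + HEX_DIGITS.index(right)


def rgb_to_hex_color(red: int, green: int, blue: int) -> str:
    """Build a 6-digit HTML color from three byte values."""
    return "#" + byte_to_hex(red) + byte_to_hex(green) + byte_to_hex(blue)


print(byte_to_hex(236))
print(hex_to_byte("EC"))
print(rgb_to_hex_color(255, 0, 0))
```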
#11: Friday, September 30, 2011 – Making a basic Web page, utilizing basic markup tags:
Boldface:       <B>…</B>
Italic:         <I>…</I>
Superscript:    <SUP>…</SUP>
Subscript:      <SUB>…</SUB>
heading tags:   <H1>…</H1> through <H6>…</H6>
Centering:      <CENTER>…</CENTER> (deprecated)
named colors:   red, green, yellow, blue, etc.
hex colors:     #FF0000, #1EA7F9, etc.
HTML entities:  &copy; equivalent to &#169;, etc.
fractions:      &frac12; &frac14; &frac34; (respectively: ½, ¼, ¾, but no others), or
                <SUP>numerator</SUP>&frasl;<SUB>denominator</SUB>
links:          <A HREF="url">link text</A>
font:           <FONT FACE="typeface" COLOR="color">text</FONT>
Both <CENTER>…</CENTER> and <FONT>…</FONT> tags are deprecated, as is the
BGCOLOR attribute of the BODY tag, meaning that the standards committees prefer that we
not use them. These items may disappear from future versions of HTML.
Why is the <FONT> tag deprecated? Example:
<CENTER><FONT FACE="___" COLOR="___"><H2>Paragraph #1</H2></FONT></CENTER>
text text text…
<CENTER><FONT FACE="___" COLOR="___"><H2>Paragraph #2</H2></FONT></CENTER>
text text text…
Embedding of style in with content leads to cluttered, hard-to-maintain code. This motivates
the use of CSS (Cascading Style Sheets) as a means to decouple style from content.
#12: Monday, October 3, 2011 – Introduction to CSS: <STYLE TYPE="text/css">…</STYLE> in
header, STYLE="___" attribute inside tags, <LINK REL="stylesheet" TYPE="text/css"
HREF="filename.css"> to pull up external file. The “cascade” of
dependencies: the STYLE attribute overrides the <STYLE> section, which overrides any
<LINK> to external style sheet files, which overrides the default settings for tags. Example
(equivalent to the non-style approach in previous lecture):
<STYLE TYPE="text/css">
H2 {text-align:center ; font-family:____ ; color:____}
P {text-align:justify ; text-indent:0.5in}
</STYLE>
…
<H2>Paragraph #1</H2>
<P>text text text…</P>
<H2>Paragraph #2</H2>
<P>text text text…</P>
or MyStyles.css contains the text:
H2 {text-align:center ; font-family:____ ; color:____}
P {text-align:justify ; text-indent:0.5in}
and HTML document contains:
<LINK REL="stylesheet" TYPE="text/css" HREF="MyStyles.css">
<STYLE TYPE="text/css">
</STYLE>
…
<H2>Paragraph #1</H2>
<P>text text text…</P>
<H2>Paragraph #2</H2>
<P>text text text…</P>
Pulling style information out of body of document “declutters” text, makes changes easier to
implement and more consistent, and many pages using the same linked style sheets all see
changes at once. Empty <STYLE>…</STYLE> sections can be omitted, but use of <STYLE>
section with <LINK> to external style sheet can be valuable, as <STYLE>…</STYLE> is then
used to override settings in external style sheet for the current document only.
Colors in CSS can be names (red, green, blue, etc.) or traditional 6-digit hex numbers (e.g.,
#1EF208), but also may be 3-digit short hex (#1AE, which by “stuttering” is equivalent to the
6-digit color #11AAEE), or can use the rgb function with either byte values 0..255 or percents
0%…100% (e.g., rgb(255,128,196) or rgb(100%,50%,75%), as preferred).
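To see that these notations describe the same kind of value, the sketch below converts each form into a tuple of three bytes. It is an illustration, not a full CSS parser: the function name is made up, named colors are not handled, and rounding percentages to the nearest byte is an assumption (browsers may round differently).

```python
import re


def css_color_to_rgb(text: str) -> tuple[int, ...]:
    """Convert #RGB, #RRGGBB, or rgb(...) notation into three byte values."""
    text = text.strip()
    if text.startswith("#"):
        digits = text[1:]
        if len(digits) == 3:
            digits = "".join(d * 2 for d in digits)  # "stutter": 1AE -> 11AAEE
        if len(digits) != 6:
            raise ValueError("expected 3 or 6 hex digits")
        return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

    match = re.fullmatch(r"rgb\(\s*([^,]+),\s*([^,]+),\s*([^,)]+)\)", text)
    if match is None:
        raise ValueError("unsupported color notation")

    def component(piece: str) -> int:
        piece = piece.strip()
        if piece.endswith("%"):
            # Assumption: scale 0%..100% onto 0..255 and round to nearest.
            return int(float(piece[:-1]) * 255 / 100 + 0.5)
        return int(piece)

    return tuple(component(piece) for piece in match.groups())


print(css_color_to_rgb("#1AE"))
print(css_color_to_rgb("#11AAEE"))
print(css_color_to_rgb("rgb(100%,50%,75%)"))
```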
#13: Wednesday, October 5, 2011 – Types of graphics files.
.BMP (bitmap). Native to Windows. Uncompressed. Four styles:
24-bit     (3 bytes per pixel, true RGB color, 16,777,216 colors possible)
256-color  (1 byte/8 bits per “pixel”, up to 256 true RGB colors in palette)
16-color   (½ byte/4 bits per “pixel”, up to 16 true RGB colors in palette)
2-color    (⅛ byte/1 bit per “pixel”, up to 2 true RGB colors in palette, often B&W)
A “palette” is a table stored in the file (takes only a small amount of storage) containing
colors picked from the 16,777,216 possible true RGB colors – the “pixels” in paletted
formats are actually indexes into this table. Due to their size, .BMP files are typically
unsuitable for use on the Web; early browsers did not support the file type at all.
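A quick calculation shows why uncompressed bitmaps are large and why the palette adds so little. This sketch ignores the file headers, uses a 640×480 image as an arbitrary example, and accounts for two details of the common Windows .BMP layout: each row is padded to a multiple of 4 bytes, and each palette entry takes 4 bytes. The function names are made up.

```python
def bmp_pixel_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes of pixel data, with each row padded up to a multiple of 4 bytes."""
    row_bytes = (width * bits_per_pixel + 31) // 32 * 4
    return row_bytes * height


def bmp_palette_bytes(bits_per_pixel: int) -> int:
    """Bytes for a full palette; 24-bit images store colors directly, so none."""
    if bits_per_pixel == 24:
        return 0
    return (2 ** bits_per_pixel) * 4


for bits in (24, 8, 4, 1):
    pixels = bmp_pixel_bytes(640, 480, bits)
    palette = bmp_palette_bytes(bits)
    print(f"{bits:2d}-bit: {pixels:,} bytes of pixels, {palette:,} bytes of palette")
```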
.JPG/.JPEG (Joint Photographic Experts Group). Supports 16,777,216 colors. Compressed, but
uses lossy compression technique. Converting a 24-bit .BMP into .JPG results in image
that is visually identical to source, but pixel values have been changed subtly. Great for
photographs, where lossy compression won’t be noticed. Poor for cartoons, line-art,
text (images with high contrast edges) because lossy compression “fuzzes out” the
edges. Compression “quality” setting between 1…100 is trade-off between file size and
image quality – larger settings give better images but larger files; smaller settings give
smaller files but compression artifacts become more noticeable. Widely used on the
Web.
.GIF (Graphics Interchange Format, developed by CompuServe in 1987 with a revision in 1989).
Compressed, and uses lossless compression, but supports only up to 256 colors (using
palette of true 24-bit RGB colors). Converting a 256-color, 16-color, or 2-color .BMP into
.GIF results in image that is pixel-for-pixel identical to the source. (Converting a 24-bit
.BMP requires loss of color-depth first.) Supports transparency in 1989 version only
(one color not painted to let background show through, simulating non-rectangular
images). Supports simple animations in 1989 version only. Patent entanglements in
mid-1990s threatened its use, but all relevant patents have now expired. Great for
cartoons, line-art, text, but mediocre for photographs. Badly supported by Windows
Paint (messes up palettes), but well supported by other graphics packages. Widely used
on the Web.
.PNG (Portable Network Graphics, proposed in 1996 as a response to deficiencies of both .JPG
and .GIF). Supports up to 48-bit color (16 bits, or 2 bytes, for each primary of R, G, and
B). Uses lossless compression only (there is no lossy mode), which keeps cartoons, line-art,
and text crisp but makes photographs larger than the equivalent .JPG. Supports transparency. No
animation. Format is free for
general use (no patent issues). Initially slow to be adopted, but presently all major
browsers, and Microsoft Word, support the format. Good all-around choice for
photographs or line-art and cartoons. Widely used (now) on the Web.
.TIF/.TIFF (Tagged Image File Format). Typically used on Macs, but now may be found
anywhere. Has many of the same features as earlier formats.
Use of graphic images in HTML:
<IMG SRC="filename">
or
<IMG SRC="filename" WIDTH="__" HEIGHT="__"
TITLE="text to appear on mouse-over"
ALT="text to appear if image cannot be shown">
#14: Friday, October 7, 2011 – Images on the Web. Repeated background images on a Web page:
<BODY BACKGROUND="__"> (deprecated attribute), vs. the equivalent style approach
(<STYLE> BODY {background-image:url('__')} </STYLE>) along with
background-repeat and background-position attributes. Use of simple graphics
editors such as Windows Paint to create speckle patterns for backgrounds and 3D
sculptured buttons. [Example images: speckle background pattern, 3D sculptured button.]
Recommendation: Bookmark W3Schools for on-line HTML and CSS reference:
http://www.w3schools.com/
#15: Tuesday, October 11, 2011 (Monday schedule) – Discussion of legal issues involving images (in
particular, copying an image from someone’s site, putting it on your own, then displaying it with
<IMG SRC="pic.jpg">, versus linking to original image on owner’s site and wasting their
bandwidth <IMG SRC="http://someone_elses_site/pic.jpg">).
Client-side image maps (first example of linking tags together by name):
<IMG SRC="pic.jpg" USEMAP="#MyMap">
<MAP NAME="MyMap">
<AREA SHAPE="rect" COORDS="x1,y1,x2,y2" HREF="__">
<AREA SHAPE="circle" COORDS="x,y,r" HREF="__">
<AREA SHAPE="poly" COORDS="x1,y1,x2,y2,…,x1,y1" HREF="__">
<AREA SHAPE="default" HREF="__">
</MAP>
Rectangles need two <x,y> coordinate pairs (points) to define opposing corners. Circles need
one point (the center) and a radius. Polygons use an arbitrary list of points, where
the last point repeats the first point to close the polygon. Shapes are allowed to overlap
(for example a circle and a rectangle), but if the mouse is clicked in the overlapping region it
appears that the first shape in the list takes priority. In most cases, however, the overlapping
regions are likely to point at the same URL. The “default” shape represents any portion of the
image not covered by any other shape (it is optional and may be omitted).
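The way a browser might decide which area was clicked can be sketched as a simple first-match search. This is an illustration of the observed behavior described above, not browser source code; the function name, the tuple layout, and the file names are invented, and polygons are left out to keep the example short.

```python
def hit_test(areas, x, y, default=None):
    """Return the HREF of the first listed area containing (x, y), else the default."""
    for shape, coords, href in areas:
        if shape == "rect":
            x1, y1, x2, y2 = coords
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return href
        elif shape == "circle":
            cx, cy, r = coords
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                return href
    return default


# A rectangle and a circle that overlap near the rectangle's lower-right corner.
areas = [
    ("rect", (0, 0, 100, 50), "rect.html"),
    ("circle", (100, 50, 30), "circle.html"),
]

print(hit_test(areas, 90, 40))                    # in the overlap: first shape wins
print(hit_test(areas, 120, 60))                   # only inside the circle
print(hit_test(areas, 300, 300, "default.html"))  # outside every shape
```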
Demonstration of using Windows Paint to extract coordinates from image.
Demonstration of my MakeButtons program to create button panels and equivalent HTML.
#16: Wednesday, October 12, 2011 – Demo of Mac running Parallels running Windows 7 running
CamWatcher. Problem with all raster images (.GIF, .JPG, .PNG, etc.) is that all pixels are stored,
and when image is zoomed up in scale what look like smooth lines become jagged – a
mathematically pure line can only be approximated by a grid of pixels (aliasing).
Discussion of .SVG (scalable vector graphics) files – description of objects on screen with interior
color (fill), border color (stroke), and border thickness (stroke-width), along with attributes for
controlling the ends of lines or how one line connects to another. Browsers re-render the files
at requested scales, so images look “perfect” at all sizes, with no jaggies or aliasing. Files are
plain text (emailable, editable with Notepad on a PC or Text Editor on the Mac, etc.), but have
“magic incantations” at the start which determine the character set (UTF-8), rules for
configuration (http://www.w3.org/2000/svg), etc. While not required, SVG files may
also contain a style section that occupies the same conceptual position as the <STYLE> block in
an HTML document, with some minor differences.
Tags in SVG files are always in lower-case, and standalone tags contain a trailing slash (as in
<rect … />). The trailing slash is not required for standalone HTML tags, but is becoming
recommended practice (as in an image tag <IMG … />). Comments are the same as in HTML:
<!-- comment text -->, and comments can span lines.
Here is an SVG file (created by my own Bézier Madness program) with a triangle, rectangle, and
circle on a cyan background:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Generator Copyright (C)2010 Dr. William T. Verts (all rights reserved) -->
<!-- File Created: Thursday, February 24, 2011 9:11:50 AM -->
<!-- Generated from Dr. Bill's Bezier Madness -->
<!-- C:\Users\Bill\Desktop\SVG Demo\MyDiagram.svg -->
<svg
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
version="1.1"
x="0px"
y="0px"
width="320px"
height="240px"
>
<style type="text/css" >
<![CDATA[
polygon {
stroke-linecap:round;
stroke-linejoin:miter;
}
polyline {
stroke-linecap:round;
stroke-linejoin:round;
}
line {
stroke-linecap:round;
stroke-linejoin:round;
}
path {
stroke-linecap:round;
stroke-linejoin:round;
}
]]>
</style>
<!-- Background Color -->
<rect
x="0px"
y="0px"
width="320px"
height="240px"
style="fill:#00FFFF"
/>
<!-- ******************** Object #1 ******************** -->
<!-- Triangle -->
<polygon
points="
150, 20,
20, 60,
190, 100,
150, 20"
style="fill:#00FF00;stroke:#000000;stroke-width:3;"
/>
<!-- ******************** Object #2 ******************** -->
<!-- Rectangle -->
<rect
x="100"
y="120"
width="160"
height="100"
style="fill:#00FF00;stroke:#008080;stroke-width:3;"
/>
<!-- ******************** Object #3 ******************** -->
<!-- Circle -->
<circle
cx="100"
cy="100"
r="40"
style="fill:#FFFF00;stroke:#FF0000;stroke-width:6;"
/>
</svg>
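Because SVG is plain text, a file like this can also be produced by a short program. The sketch below is not the Bézier Madness generator; it is a minimal Python example with a made-up function name that builds just the cyan background and the circle, then parses the result to confirm it is well-formed XML.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_svg(width: int, height: int) -> str:
    """Build a small SVG document: cyan background rectangle plus one circle."""
    ET.register_namespace("", SVG_NS)  # write tags as <svg>, not <ns0:svg>
    root = ET.Element(f"{{{SVG_NS}}}svg", {
        "version": "1.1",
        "width": f"{width}px",
        "height": f"{height}px",
    })
    ET.SubElement(root, f"{{{SVG_NS}}}rect", {
        "x": "0", "y": "0",
        "width": str(width), "height": str(height),
        "style": "fill:#00FFFF",
    })
    ET.SubElement(root, f"{{{SVG_NS}}}circle", {
        "cx": "100", "cy": "100", "r": "40",
        "style": "fill:#FFFF00;stroke:#FF0000;stroke-width:6;",
    })
    return ET.tostring(root, encoding="unicode")


svg_text = make_svg(320, 240)
print(svg_text)

# Parsing the text back raises an error if the markup is not well-formed.
parsed = ET.fromstring(svg_text)
```

Saving svg_text to a file with the .svg extension and opening it in a browser should show the yellow circle on the cyan background.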
#17: Friday, October 14, 2011 – Three HTML topics: Horizontal Rules, Ordered and Unordered Lists, and
Tables. This pretty well covers “standard” or “boring” HTML, not including frames, forms,
JavaScript, icons, etc.
Horizontal Rules: <HR> tag. Draws horizontal line across browser window, default is to have a
small gap at each end. Can be modified using WIDTH attribute, as in <HR WIDTH="200">, but
WIDTH attribute is deprecated in favor of style sheet approach: <HR STYLE="width:200px">
or <HR STYLE="width:3in"> (or if placed up in the style block in the header section:
<STYLE> HR {width:3in} </STYLE>). Length measurements in CSS can be pixels (px),
inches (in), centimeters (cm), percents (%), etc.
Lists have enclosing tags <OL>…</OL> (ordered) or <UL>…</UL> (unordered), and each list
item starts with <LI>. Closing list item with </LI> is recommended, but not strictly required
by most browsers.
Ordered Lists:
<OL>
<LI>List item</LI>
<LI>List item</LI>
<LI>List item</LI>
</OL>
Appears As:
1. List item
2. List item
3. List item
Unordered Lists:
<UL>
<LI>List item</LI>
<LI>List item</LI>
<LI>List item</LI>
</UL>
Appears As:
● List item
● List item
● List item
By default ordered lists use leading Arabic numbers (1,2,3,…) and unordered lists use round
bullets (●,●,●, …) but these may be changed. Using the TYPE attribute (deprecated) or styles,
lists can be formatted as follows:
Symbol          Ordered Lists using TYPE     Using Styles
1,2,3,4,…       <OL TYPE="1">                <OL STYLE="list-style-type:decimal">
A,B,C,D,…       <OL TYPE="A">                <OL STYLE="list-style-type:upper-alpha">
a,b,c,d,…       <OL TYPE="a">                <OL STYLE="list-style-type:lower-alpha">
I,II,III,IV,…   <OL TYPE="I">                <OL STYLE="list-style-type:upper-roman">
i,ii,iii,iv,…   <OL TYPE="i">                <OL STYLE="list-style-type:lower-roman">

Symbol          Unordered Lists using TYPE   Using Styles
●               <UL TYPE="disc">             <UL STYLE="list-style-type:disc">
■               <UL TYPE="square">           <UL STYLE="list-style-type:square">
○               <UL TYPE="circle">           <UL STYLE="list-style-type:circle">
Using styles, individual list items may override the symbol normally used by the <OL> or <UL>
tag, as in <LI STYLE="list-style-type:disc">.
Tables <TABLE>…</TABLE> are composed of rows <TR>…</TR>, which are themselves
composed of data <TD>…</TD>. Table data (cells) contain the information for that cell, and
can be text, graphics, links, etc. Use the <TH>…</TH> for table headers, which format their
data to be boldfaced and centered. Here is a table with three rows and four columns:
<TABLE>
<TR>
<TD>row 1 column 1</TD>
<TD>row 1 column 2</TD>
<TD>row 1 column 3</TD>
<TD>row 1 column 4</TD>
</TR>
<TR>
<TD>row 2 column 1</TD>
<TD>row 2 column 2</TD>
<TD>row 2 column 3</TD>
<TD>row 2 column 4</TD>
</TR>
<TR>
<TD>row 3 column 1</TD>
<TD>row 3 column 2</TD>
<TD>row 3 column 3</TD>
<TD>row 3 column 4</TD>
</TR>
</TABLE>
Tables with the BORDER or BORDER="2" or BORDER="2px" attribute will be rendered on
screen with a lined border around each cell. Tables without a border show just the data on
screen. In a <TD> tag adjacent cells may be merged by using the attributes ROWSPAN or
COLSPAN, but care must be taken to count up the number of cells properly in a row or column.
For example <TD COLSPAN="3"> merges three cells in the current row.
The tags <TABLE>, <TR>, <TH>, and <TD> may all contain a BGCOLOR="___" attribute to
change the color of the table, row, header, or data, respectively (the table color is overridden
by the row color, which in turn is overridden by the color of the header or data cell), but
BGCOLOR is deprecated. The style approach is to use STYLE="background-color:____" in a
particular tag (or in the style block, as appropriate).
Tables have a large number of attributes, both deprecated and not, and many style options. These
include colors, border thickness and color, vertical and horizontal alignment of data within cells,
alignment and width of the table, etc.
#18: Monday, October 17, 2011 – Introduction to Frames. Initial setup page (Contents.html) is a
page of simple links, using traditional HTML techniques:
<HTML>
<HEAD>
<TITLE>Table of Contents</TITLE>
</HEAD>
<BODY BGCOLOR="Yellow">
<H1>Contents</H1>
<A HREF="http://www.yahoo.com">Yahoo</A><BR>
<A HREF="http://www.cnn.com/">CNN</A><BR>
<A HREF="http://www.cs.umass.edu/~verts">Dr. Bill</A><BR>
</BODY>
</HTML>
This page uses nothing that hasn’t been seen before. Clicking on any link causes the browser to
replace the currently displayed page with the linked page. The first change is to insert an
attribute into each link so that each linked page opens in its own new tab or window:
<HTML>
<HEAD>
<TITLE>Table of Contents</TITLE>
</HEAD>
<BODY BGCOLOR="Yellow">
<H1>Contents</H1>
<A HREF="http://www.yahoo.com" TARGET="_blank">Yahoo</A><BR>
<A HREF="http://www.cnn.com/" TARGET="_blank">CNN</A><BR>
<A HREF="http://www.cs.umass.edu/~verts" TARGET="_blank">Dr. Bill</A><BR>
</BODY>
</HTML>
Notice that there is an underscore before the word “blank” in TARGET="_blank", which is an
indicator of a special target, namely a new (blank) window or page.
Now we build the driver for the frames, which can be the index.html default page:
<HTML>
<HEAD>
<TITLE>Test of Frames</TITLE>
</HEAD>
<FRAMESET COLS="200,*">
<FRAME SRC="Contents.html" NAME="TOC">
<FRAME SRC="MainBody.html" NAME="MAIN">
</FRAMESET>
<NOFRAMES>
Put code here for browsers which do not support frames
</NOFRAMES>
</HTML>
In this code, there is no <BODY> tag, but instead the <FRAMESET> tag defines how the screen
is to be split, and how large each split (frame) is to be. In this case, the <FRAMESET
COLS="200,*"> defines two frames, split vertically, one frame 200 pixels wide and the other
the remainder of the width of the browser window.
If we were to split the window into three columns of 300 pixels, 200 pixels, and variable pixels in
width, respectively, the command would be <FRAMESET COLS="300,200,*"> instead.
Similarly, if we wished to split the window into rows instead of into columns, we might use
<FRAMESET ROWS="200,*"> instead. In both cases the * means to use for the last frame
whatever portion of the window is leftover after defining the other frame(s).
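The meaning of the * can be worked through with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: it handles plain pixel counts plus at most one * entry (real framesets also accept percentages), it ignores frame borders and scrollbars, and the 1024-pixel window size and the function name are made up.

```python
def frame_sizes(spec, window):
    """Split `window` pixels according to a COLS or ROWS value.

    Handles only plain pixel counts plus at most one "*" entry,
    which receives whatever is left over.
    """
    parts = [p.strip() for p in spec.split(",")]
    fixed = sum(int(p) for p in parts if p != "*")
    leftover = max(window - fixed, 0)
    return [leftover if p == "*" else int(p) for p in parts]


print(frame_sizes("200,*", 1024))        # [200, 824]
print(frame_sizes("300,200,*", 1024))    # [300, 200, 524]
```

Resizing the browser window changes only the size given to the * frame; the fixed frames keep their stated sizes.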
Each frame links to its own page, BUT those frames must also have names so that the
Contents.html page can link to the correct one. In this case the name TOC will not be used,
but each link in Contents.html will open its linked page in the frame named MAIN.
Finally, the <NOFRAMES> section will contain code to show in the browser if frames are not
supported. If frames are supported, everything works but the code between <NOFRAMES> and
</NOFRAMES> will be ignored. If frames are not supported, the <FRAMESET>, <FRAME>,
<NOFRAMES>, and </NOFRAMES> tags will all be ignored, and the only code rendered by the
browser will be that which remains inside the <NOFRAMES>…</NOFRAMES> section.
Here’s what the final version of Contents.html will contain:
<HTML>
<HEAD>
<TITLE>Table of Contents</TITLE>
</HEAD>
<BODY BGCOLOR="yellow">
<H1>Contents</H1>
<A HREF="http://www.yahoo.com" TARGET="MAIN">Yahoo</A><BR>
<A HREF="http://www.cnn.com/" TARGET="MAIN">CNN</A><BR>
<A HREF="http://www.cs.umass.edu/~verts" TARGET="MAIN">Dr. Bill</A><BR>
</BODY>
</HTML>
By defining MAIN in the <FRAME> tag of one page and using TARGET="MAIN" (instead of
TARGET="_blank") in the target of a link in another page, we have a mechanism for linking
resources in different pages. Note that MAIN is neither a special nor a reserved word: we could
have said <FRAME SRC="…" NAME="Fred"> so long as the link to that frame in
Contents.html was <A HREF="…" TARGET="Fred">.
Finally, we need a default page, linked as MainBody.html in the second <FRAME> tag.
This can be anything meaningful, such as:
<HTML>
<HEAD>
<TITLE>Main Body</TITLE>
</HEAD>
<BODY BGCOLOR="cyan">
<H1>Welcome to my Web page!</H1>
This is the default page that will load automatically
when the main frame page is loaded.
</BODY>
</HTML>
Upon initial loading of the frame driver code, the browser window will look like the following
image. Clicking any link in the yellow contents page in the left frame will replace the right frame
with the linked page (the left frame remains unchanged).
Finally, we can modify the Contents.html page by using buttons instead of text for the links,
AND through JavaScript have each button change from a plain to a highlighted design when the
mouse rolls over the button. The first image below shows no button selected, and the second
shows where the mouse has rolled over the CNN button:
Here is the new HTML code for Contents.html:
<HTML>
<HEAD>
<TITLE>JavaScript RollOver Test</TITLE>
<SCRIPT LANGUAGE="JavaScript">
<!--
function ShowImage (item_name,image_name)
{
eval ('document.' + item_name + '.src = "' + image_name + '"');
return true ;
}
//-->
</SCRIPT>
</HEAD>
<BODY BGCOLOR="#C0C0C0">
<H1>Contents</H1>
<A HREF="http://www.cnn.com"
onMouseOver="ShowImage('Button1', 'Button1_Selected.gif') ;"
onMouseOut ="ShowImage('Button1', 'Button1_Normal.gif') ;"
TARGET="MAIN">
<IMG SRC="Button1_Normal.gif" BORDER=0 NAME="Button1">
</A>
<BR>
<A HREF="http://www.yahoo.com"
onMouseOver="ShowImage('Button2', 'Button2_Selected.gif') ;"
onMouseOut ="ShowImage('Button2', 'Button2_Normal.gif') ;"
TARGET="MAIN">
<IMG SRC="Button2_Normal.gif" BORDER=0 NAME="Button2">
</A>
<BR>
</BODY>
</HTML>
The new section is the <SCRIPT> section in the heading, where JavaScript code (typically user-defined
functions such as ShowImage) can reside. That function is called by events, in this case
the onMouseOver event (when the user brings the mouse over the link) and onMouseOut
(when the user moves the mouse away from the link). The purpose of the code as written is to
define a name for each <IMG> tag, then use that name to assign a default button to that tag
both when the page is initially loaded and when the mouse moves away from the link, and
another “selected” button when the mouse moves onto the link. The “heavy lifting” is
performed by the ShowImage function, which builds in real-time from its arguments (the
image tag name and the image file name) a string that looks like the following:
document._____.src = "_____"
with the defined name of the <IMG> tag in the first slot and the name of the image file (a
.GIF) in the second slot, and then executes (or evaluates) that string as a legal JavaScript
command. We’ll look at JavaScript in some detail in the next lecture.
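The string-building step can be seen in isolation by doing the same concatenation outside the browser. This Python sketch only builds and prints the text; it does not execute it, and the function name build_command is made up for illustration.

```python
def build_command(item_name, image_name):
    """Build the same text that ShowImage hands to eval."""
    return 'document.' + item_name + '.src = "' + image_name + '"'


print(build_command("Button1", "Button1_Selected.gif"))
# document.Button1.src = "Button1_Selected.gif"
print(build_command("Button2", "Button2_Normal.gif"))
# document.Button2.src = "Button2_Normal.gif"
```

Each printed line is a complete JavaScript assignment statement, which is why eval can run it.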
#19: Wednesday, October 19, 2011 – Introduction to JavaScript. As we have seen from the previous
lecture, JavaScript code can be inserted into the heading of an HTML Web page. There are two
ways to do this. The first version below is an early approach, which is now deprecated in favor of
the second version (similar to how we link CSS in a STYLE block):
<SCRIPT LANGUAGE="JavaScript">
<!--
// JavaScript code goes here
//-->
</SCRIPT>

<SCRIPT TYPE="text/javascript">
<!--
// JavaScript code goes here
//-->
</SCRIPT>
In both cases the HTML comment framework <!-- and --> is there so that browsers that do not
support JavaScript fail gracefully. If JavaScript is not supported the <SCRIPT> and
</SCRIPT> tags are ignored, and the HTML comments hide all internal JavaScript code from
the browser – it will not appear in the rendered body of the Web page (the page might not work
correctly, but it won’t have a large pile of code cluttering up the page). If JavaScript is
supported, JavaScript knows to ignore the opening HTML comment, and the // JavaScript
comment hides the closing HTML comment. This framework can appear in both the <HEAD>
section AND in the body wherever code needs to appear. For the examples here, the code will
be strictly in the body of the page, as follows:
<HTML>
<HEAD>
<TITLE>Testing some JavaScript code</TITLE>
</HEAD>
<BODY>
<H1>Testing some JavaScript code</H1>
<SCRIPT TYPE="text/javascript">
<!--
*
//-->
</SCRIPT>
</BODY>
</HTML>
* From now on, rather than show the entire page for every example only the JavaScript code
that goes in the framework marked by the asterisk will be shown.
Here’s the first example:
document.writeln ("Hello, World") ;
This writes into the current document the string Hello, World which will appear just as if
we had written it directly into the HTML code. Similarly:
document.writeln ("Hello, World") ;
document.writeln ("Hello, World") ;
writes the string in twice, but because whitespace is ignored by browsers, the two strings will be
run together on the browser window.
To put them on different lines, we have to write in the break tag manually.
document.writeln ("Hello, World") ;
document.writeln ("<BR>") ;
document.writeln ("Hello, World") ;
While these examples are pretty inefficient, the critical idea is that JavaScript can create HTML
content in real-time as the page is being rendered by the browser.
In the next examples we use variables, which are values in memory that we choose to give
names, and expressions that use those variables:
N = 56 ;
M = 47 ;
document.writeln (N+M) ;
The result of this is to write the number 103 (the sum of 56 and 47) into the Web page as it is
being rendered. Variables may be either strings (surrounded by quotes) or double-precision
floating-point numbers (numbers with fractions), and JavaScript will determine the data type of
a variable as it receives its new value. A variable may be a string at one moment and a number
a bit later. This is called dynamic typing (in contrast to other languages that use static typing,
where a variable always has the same data type throughout the execution of the program).
Note that the code above could have been written without semicolons as:
N = 56
M = 47
document.writeln (N+M)
This is not recommended, as statements can extend over several lines, and several statements
can also appear on the same line, as in:
N = 56 ; M = 47 ; document.writeln (N+M) ;
Always use semicolons to terminate statements (although there are a couple of exceptions).
The code can ask questions of its variables with an if statement, as in the following code:
N = 56 ;
M = 47 ;
if (N > M)
{
document.writeln ("N is bigger") ;
}
else
{
document.writeln ("M is bigger or equal") ;
}
What is written into the HTML code is either the string N is bigger or the string M is
bigger or equal, but which one is written depends on the values of the variables N and M.
(Which one is it here?).
Finally, the power of JavaScript (or any language) comes from the action of a loop, which is code
that executes over and over many times. To do this, we need a variable to count the number of
times we’ve gone through the loop. Here’s the basic framework:
N = 1 ;
while (N <= 10)
{
// do something interesting here
N = N + 1 ;
}
This framework executes the code in between the {…} until variable N becomes greater than
10, and because of the N = N + 1 statement (replace the value of N with the old value of N
plus 1) this happens after the 10th pass through the loop. To execute the loop 1000 times we
need only replace the 10 with 1000. Here is code that puts the numbers 1 through 1000 into
our Web page:
N = 1 ;
while (N <= 1000)
{
document.writeln (N) ;
N = N + 1 ;
}
To do something interesting, we would like to build a Web page that contains all integers
between 1 and 1000 and their square-roots. We can’t do this by hand very easily, but we can do
this with JavaScript, assuming there is a square-root function available. Not knowing this, we
look up “JavaScript square root” on the Web (Google or some other search engine), which again
takes us to http://www.w3schools.com at the page talking about that particular
function. We learn that the square-root function is called sqrt and is part of the Math object,
so to use it we write Math.sqrt(___) and put the item for which we need a square root
inside the parentheses. To make all of our answers appear on the Web page in an unordered
list, we have to write the JavaScript code to create the correct tags in real-time. Here is the final
code:
document.writeln ("<UL>") ;
N = 1 ;
while (N <= 1000)
{
document.writeln ("<LI>Square Root (", N, ") = ", Math.sqrt(N), "</LI>") ;
N = N + 1 ;
}
document.writeln ("</UL>") ;
So, when N = 1 and the square root returns 1 as well, the string written out by the loop is:
<LI>Square Root (1) = 1</LI>
and when N = 2 and the square root returns 1.4142135623730951 the code written out is:
<LI>Square Root (2) = 1.4142135623730951</LI>
Here is the complete HTML page, including the JavaScript code to generate the list of square
roots:
<HTML>
<HEAD>
<TITLE>Testing some JavaScript code</TITLE>
</HEAD>
<BODY>
<H1>Testing some JavaScript code</H1>
<SCRIPT TYPE="text/javascript">
<!--
document.writeln ("<UL>") ;
N = 1 ;
while (N <= 1000)
{
document.writeln ("<LI>Square Root (", N, ") = ",
Math.sqrt(N), "</LI>") ;
N = N + 1 ;
}
document.writeln ("</UL>") ;
//-->
</SCRIPT>
</BODY>
</HTML>
This is a lot easier than writing out 1000 lines of square roots by hand.
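For comparison with the Python material in the later lectures, the same counting loop can be sketched in Python. This is an illustration rather than part of the Web page: it assumes Python 3, collects the lines in a list instead of writing them into a document, and uses a made-up function name.

```python
import math


def square_root_list(limit):
    """Return the lines of an HTML unordered list of square roots from 1 to limit."""
    lines = ["<UL>"]
    n = 1
    while n <= limit:
        lines.append("<LI>Square Root (" + str(n) + ") = " + str(math.sqrt(n)) + "</LI>")
        n = n + 1
    lines.append("</UL>")
    return lines


# Show a short version; square_root_list(1000) builds the full list.
for line in square_root_list(3):
    print(line)
```

One visible difference: Python writes the square root of 1 as 1.0, where JavaScript writes 1.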
#20: Friday, October 21, 2011 – Review for midterm #1.
#21: Monday, October 24, 2011 – Midterm #1, in-class.
#22: Wednesday, October 26, 2011 – Forms and Buttons, sending form data to server. Forms have
names and actions, and contain buttons, checkboxes, radio buttons, text areas, etc. Actions
determine what to do with the form data when a special button (TYPE="submit") is clicked.
Most common action is to post the data to the Web (METHOD="POST"), but this requires a
server script to receive and process the data (ACTION="http://www. … /cgi-bin/echohtml.cgi">).
These scripts are called CGI (Common Gateway Interface) scripts,
and are programs written in languages such as Perl, Python, PHP, etc. Here is a Web page
containing a form showing a lot of the common input methods, and where the form data are
posted to a CGI script called echohtml.cgi on my site (commonly, CGI scripts are in a folder
called cgi-bin):
<HTML>
<HEAD>
<TITLE>Test of HTML Forms</TITLE>
</HEAD>
<BODY BGCOLOR="#00FFFF">
<FORM NAME="MyForm1" METHOD="POST"
ACTION="http://www.cs.umass.edu/~verts/cgi-bin/echohtml.cgi">
<H3>Buttons</H3>
<H4>(User-defined buttons require JavaScript onClick events)</H4>
<INPUT TYPE="button" NAME="Button1" VALUE="Click Me #1"><BR>
<INPUT TYPE="button" NAME="Button2" VALUE="Click Me #2"><BR>
<INPUT TYPE="button" NAME="Button3" VALUE="Click Me #3"><BR>
<INPUT TYPE="button" NAME="Button4" VALUE="Click Me #4"><BR>
<INPUT TYPE="button" NAME="Button5" VALUE="Click Me #5"><BR>
<INPUT TYPE="reset" VALUE="Reset All"> Resets items to default values<BR>
<INPUT TYPE="submit" VALUE="Submit this Form"> Sends data to server<BR>
<H3>Checkboxes</H3>
<INPUT TYPE="checkbox" NAME="Check1">My Checkbox 1<BR>
<INPUT TYPE="checkbox" NAME="Check2" CHECKED="checked">My Checkbox 2<BR>
<INPUT TYPE="checkbox" NAME="Check3">My Checkbox 3<BR>
<INPUT TYPE="checkbox" NAME="Check4">My Checkbox 4<BR>
<INPUT TYPE="checkbox" NAME="Check5">My Checkbox 5<BR>
<H3>Radio Buttons</H3>
<INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option1"
CHECKED="checked">Radio #1 #1<BR>
<INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option2">Radio #1 #2<BR>
<INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option3">Radio #1 #3<BR>
<INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option4">Radio #1 #4<BR>
<BR>
<INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option1"
CHECKED="checked">Radio #2 #1<BR>
<INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option2">Radio #2 #2<BR>
<INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option3">Radio #2 #3<BR>
<INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option4">Radio #2 #4<BR>
<H3>Input Text Boxes</H3>
<INPUT TYPE="text" NAME="Input1" VALUE="Default Text">default text<BR>
<INPUT TYPE="text" NAME="Input2"><BR>
<INPUT TYPE="text" NAME="Input3"><BR>
<INPUT TYPE="password" NAME="Password1"> This is a password box<BR>
<H3>Drop-Down List</H3>
<SELECT NAME="List1">
<OPTION>(A) I don't know
<OPTION SELECTED="selected">(B) No
<OPTION>(C) Yes
</SELECT>
<H3>Multiline Text Area, with default text</H3>
<TEXTAREA NAME="Area1" COLS="50" ROWS="10">
Default Text
</TEXTAREA>
</FORM>
</BODY>
</HTML>
#23: Friday, October 28, 2011 – Forms and Buttons, JavaScript in browser. Forms may also
communicate directly with JavaScript functions in the current page, and not require a CGI script
on a server. In the following Web page there are a number of input text boxes, an output text
box, and a number of buttons. Each button “calls” a JavaScript function in its onClick event
handler. For example, the Add button contains onClick="Add();" which means that the
Add() function defined in the <SCRIPT> area will be called when the button is clicked. The
Add() function calls the N1() and N2() functions, each of which extracts text from one of the
input boxes and returns the number corresponding to the text typed into that box (the built-in
parseFloat function takes in a string of characters and returns the number corresponding to
those characters). The Add() function then adds the two numbers together, converts the
result back to a string of characters (using the String() function), and puts that string into
the Answer text box.
Notice how the N1() function gets its result from document.MyForm1.Input1.value: it
reaches into the current document, finds the form named MyForm1, finds in that form the
input object named Input1, and extracts its value. That value is converted from a string to a
number and is returned as the result of the function. Similarly, the Add() function reaches into
the current document, finds the form named MyForm1, finds in that form the input object
named Answer, and sets its value to the string generated by the sum of N1() and N2().
<HTML>
<HEAD>
<TITLE>Test of HTML Forms</TITLE>
<SCRIPT TYPE="text/javascript">
<!--
function N1      () {return parseFloat(document.MyForm1.Input1.value) ;}
function N2      () {return parseFloat(document.MyForm1.Input2.value) ;}
function N3      () {return parseFloat(document.MyForm1.Input3.value) ;}
function N4      () {return parseFloat(document.MyForm1.Input4.value) ;}
function Add     () {document.MyForm1.Answer.value = String(N1()+N2()) ;}
function Subtract() {document.MyForm1.Answer.value = String(N1()-N2()) ;}
function Multiply() {document.MyForm1.Answer.value = String(N1()*N2()) ;}
function Divide () {document.MyForm1.Answer.value = String(N1()/N2()) ;}
function Distance (x1,y1,x2,y2)
{
return Math.sqrt((x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1)) ;
}
function GetDistance ()
{
document.MyForm1.Answer.value = String(Distance(N1(),N2(),N3(),N4())) ;
}
//-->
</SCRIPT>
</HEAD>
<BODY BGCOLOR="#00FFFF">
<FORM NAME="MyForm1">
<H2>Simple Calculator Using JavaScript</H2>
<H3>Reset</H3>
<INPUT TYPE="reset" VALUE="Reset All"><BR>
<H3>Numeric Inputs</H3>
<INPUT TYPE="text" NAME="Input1">Input 1<BR>
<INPUT TYPE="text" NAME="Input2">Input 2<BR>
<INPUT TYPE="text" NAME="Input3">Input 3<BR>
<INPUT TYPE="text" NAME="Input4">Input 4<BR>
<H3>Commands</H3>
<INPUT TYPE="button" VALUE="Input1 + Input2" onClick="Add();"><BR>
<INPUT TYPE="button" VALUE="Input1 - Input2" onClick="Subtract();"><BR>
<INPUT TYPE="button" VALUE="Input1 * Input2" onClick="Multiply();"><BR>
<INPUT TYPE="button" VALUE="Input1 / Input2" onClick="Divide();"><BR>
<INPUT TYPE="button" VALUE="Distance: <Input1,Input2> to <Input3,Input4>"
onClick="GetDistance();"><BR>
<H3>The Answer</H3>
<INPUT TYPE="text" NAME="Answer"><BR>
</FORM>
</BODY>
</HTML>
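The Distance button is meant to report the straight-line (Pythagorean) distance between the points <Input1,Input2> and <Input3,Input4>, so each squared term has to use the difference along its own axis. A quick way to sanity-check the formula is to try it on a 3-4-5 right triangle. The sketch below does so in Python; it is a stand-alone check, not part of the Web page, and the lowercase function name is made up.

```python
import math


def distance(x1, y1, x2, y2):
    """Straight-line distance between the points (x1, y1) and (x2, y2)."""
    dx = x2 - x1    # difference along the x axis
    dy = y2 - y1    # difference along the y axis
    return math.sqrt(dx * dx + dy * dy)


print(distance(0, 0, 3, 4))    # 5.0, the familiar 3-4-5 right triangle
print(distance(1, 1, 1, 1))    # 0.0, a point is no distance from itself
```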
JavaScript functions are free-format. For example, the Add() function could have been:
function Add()
{
   document.MyForm1.Answer.value = String(
                                          N1()+N2()
                                         ) ;
}
#24: Monday, October 31, 2011 – Snow Day. Nor’easter dumps 10 inches of wet heavy snow over the
weekend, breaking trees throughout region. Power takes days to be restored.
#25: Wednesday, November 2, 2011 – Handed back midterm exam (count = 103, average = 60.3,
minimum score = 26, maximum score = 99). Discussion of infrastructure issues concordant with
the storm on Saturday, October 29, that knocked out power, telephones, and Internet access to
most of Western New England.
#26: Friday, November 4, 2011 – In-class quiz given by TA.
#27: Monday, November 7, 2011 – Telnet. Traditional telnet is a program for connecting over the
Internet (not the Web) to a remote computer for the purpose of issuing it commands. The
original raw telnet was not encrypted, exposing usernames and passwords (and everything else)
to packet sniffers running on machines intermediate between user’s terminal and remote
machine being connected to. Modern programs encrypt all communications so that packet
sniffers cannot make sense of what they intercept. On Macs: open Finder-Applications-Utilities-Terminal
to launch a UNIX command shell on the Mac itself (modern Macs run UNIX), then from
there issue an ssh command to connect to a remote server. On PCs there is no facility available
by default, but Windows users can download a program called PuTTY from
http://www.chiark.greenend.org.uk/~sgtatham/putty/ which provides an
encrypted telnet (and which requires no installation – just drop the program on the desktop).
We have a server dedicated to my classes called elsrv3.cs.umass.edu, located
somewhere in the CMPSCI building. All students have a username and password on this server.
The username should be the same as a student’s UMail username. The initial password is of the
form ELxxxaaa, where xxx is the last three digits of the SPIRE ID number and aaa is the first
three letters of the UMail username. For example, Fred Q. Smith with SPIRE ID 12345678 will
have username fqsmith and password EL678fqs. Passwords do not echo to the screen.
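The rule for the initial password is just string slicing, which can be sketched in Python as follows. The function name is made up, and the only data used is the Fred Q. Smith example from the text.

```python
def initial_password(spire_id, username):
    """Build the ELxxxaaa password from a SPIRE ID and a UMail username.

    Both arguments are strings: xxx is the last three characters of the
    SPIRE ID and aaa is the first three characters of the username.
    """
    return "EL" + spire_id[-3:] + username[:3]


print(initial_password("12345678", "fqsmith"))    # EL678fqs
```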
To connect via Mac:
1. Open Finder-Applications-Utilities-Terminal.
2. Type ssh _____@elsrv3.cs.umass.edu with your UMail username in the blank.
3. When the server responds asking for password, type in your default (ELxxxaaa).
To connect via PuTTY on a Windows system:
1. Launch PuTTY.
2. In the Host Name box enter elsrv3.cs.umass.edu, and make sure Connection Type is ssh.
3. When the server responds asking for username, type it in.
4. When the server asks for password, type it in (initially the ELxxxaaa format).
When first connected to the server, the first action will be to change the default password. The
process requires that the original password be re-entered (ELxxxaaa), and then the new
password is to be entered twice. I cannot recover a lost or forgotten password, but I can reset it
to the default. Once in to the server, everything is a list of UNIX commands, and the first bunch
is to set up the UNIX nest for Web pages: ls -al to list files, mkdir public_html to
create the default Web nest, chmod a+rx public_html to allow access to the nest from
outside, and chmod a+rx . (don’t forget the dot) to allow access into the account from
outside. Type logout to break the connection to the UNIX server. Mac users must then type
logout again to close their local Terminal session.
#28: Wednesday, November 9, 2011 – UNIX. Getting in, changing directories and permissions. Once
logged in, change into the public_html folder with cd public_html (the cd command
means “change directory” and is always cd _____ with either a folder name in the blank or
.. to close the current folder).
File permissions are always of the form rwxrwxrwx, where the individual rwx triplets
represent, respectively, the user (you!), the group (the group of related accounts the user
belongs to), and others (everyone else). The r means read permission (the file contents can be
examined), w means write permission (the file can be changed or deleted), and x means execute
permission (if a program the file can be run, and if a folder the file can be opened). Changing
permissions is through the chmod (“change mode”) command, and always is of the form
chmod _____ filename where the blank contains directives on how the permissions are to
be changed. These directives may be relative or absolute.
Relative: u, g, or o (user, group, others), + or - (add or take away permission), r, w, or x (read,
write, or execute). For example, to add read and execute permission for group and others, the
pattern would be go+rx (group-others-add-read-execute). The pattern ugo may be
abbreviated as a, and multiple patterns may be separated by commas. For example, to add read
and execute permission to user, group, and others, but take away write permission for group
and others on a file called public_html, the command would be chmod ugo+rx,go-w
public_html, or alternately it could be specified as chmod a+rx,go-w public_html.
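To see what the relative directives do, it can help to apply them by hand to a nine-letter permission string. The Python sketch below is a simplified model, not the real chmod program: it understands only u, g, o, a, +, -, r, w, and x, with one + or - per directive, and the starting pattern rw-rw-rw- is made up for the example.

```python
def apply_relative(perm, directives):
    """Apply chmod-style relative directives to a pattern such as 'rw-rw-rw-'."""
    bits = list(perm)
    offset = {"u": 0, "g": 3, "o": 6}    # where each triplet starts
    slot = {"r": 0, "w": 1, "x": 2}      # position of each letter in a triplet
    for directive in directives.split(","):
        # Locate the single + or - that separates "who" from "what".
        op_index = max(directive.find("+"), directive.find("-"))
        who = directive[:op_index].replace("a", "ugo")
        op = directive[op_index]
        what = directive[op_index + 1:]
        for w in who:
            for p in what:
                bits[offset[w] + slot[p]] = p if op == "+" else "-"
    return "".join(bits)


print(apply_relative("rw-rw-rw-", "ugo+rx,go-w"))    # rwxr-xr-x
print(apply_relative("rw-rw-rw-", "a+rx,go-w"))      # rwxr-xr-x
```

Both spellings give the same result, since a is shorthand for ugo.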
Absolute: treat each rwx triplet as a binary number, where a letter (presence of the permission)
is indicated by a 1 and a dash (absence of the permission) is indicated by a 0. Convert each
triplet to decimal (octal, actually). For example, the pattern r-x would be thought of as the
binary number 101, which has the value 5 (1×2² + 0×2¹ + 1×2⁰ = 1×4 + 0×2 + 1×1 = 4+0+1 = 5).
Here are the patterns:
rwx = 111 = 7   (common for personal folders)
rw- = 110 = 6   (common for personal files)
r-x = 101 = 5   (common for public folders)
r-- = 100 = 4   (common for public files)
-wx = 011 = 3   (rare)
-w- = 010 = 2   (rare)
--x = 001 = 1   (rare)
--- = 000 = 0   (common for private files)
Thus, to set the permissions on public_html to rwxr-xr-x, the UNIX command to type in
is chmod 755 public_html.
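The conversion from letters to digits can be written out step by step. This Python sketch follows the rule described above (a letter counts as 1, a dash counts as 0, each triplet is read as a binary number); the function name is made up for illustration.

```python
def to_octal(perm):
    """Convert a pattern such as 'rwxr-xr-x' to chmod digits such as '755'."""
    digits = []
    for start in range(0, 9, 3):          # user, group, others
        triplet = perm[start:start + 3]
        value = 0
        for ch in triplet:
            # Shift the running value left one binary place, then add this bit.
            value = value * 2 + (0 if ch == "-" else 1)
        digits.append(str(value))
    return "".join(digits)


print(to_octal("rwxr-xr-x"))    # 755
print(to_octal("rw-r--r--"))    # 644
```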
#29: Monday, November 14, 2011 – All about FTP. FTP stands for File Transfer Protocol, and is used for
moving files from one computer to another over the Internet. Like Telnet, FTP came out of a
time when traditional tools were not encrypted. Today, we use encrypted versions. There are
many tools available, such as WinSCP for Windows and Fugu for the Mac, both of which require
an installation process (and there is a beta version of Fugu for Mac Lion). Text files on the PC
contain both a carriage-return (CR) and a line-feed (LF) character at the end of each line of text
(dating from a time when printing terminals needed to return the print carriage to the left and
feed the paper up one line). Old-style Macs just used the CR character to separate lines, and
UNIX (including modern Macs) uses just the LF to separate lines. FTP can convert text files to the
proper form for the receiving system. Copying a file from Windows to UNIX in text mode strips
out CR characters, making files smaller. Copying a file from UNIX to Windows puts them back in,
making files larger. Doing this on non-text files results in garbage on the receiving system,
however. This requires that files be transferred with no alterations, in binary mode. It is
possible to use WinSCP or Fugu to do all common file management tasks (setting permissions,
changing names, deleting files, etc.), instead of using Telnet to connect to the server and then
typing in UNIX commands.
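The effect of text mode can be imitated on a few bytes in Python. This is a sketch of the idea only, not of how WinSCP or Fugu are implemented: the function names and sample data are made up, the "image" bytes are not a real picture, and to_windows assumes its input contains only bare LF characters.

```python
def to_unix(data):
    """Text-mode transfer, Windows to UNIX: every CR LF pair becomes a bare LF."""
    return data.replace(b"\r\n", b"\n")


def to_windows(data):
    """Text-mode transfer, UNIX to Windows: every bare LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


windows_text = b"<HTML>\r\n<BODY>Hello</BODY>\r\n</HTML>\r\n"
unix_text = to_unix(windows_text)
print(len(windows_text) - len(unix_text))        # 3: one CR removed per line
print(to_windows(unix_text) == windows_text)     # True: the round trip restores the text

# Made-up binary data that happens to contain the byte pair 0D 0A.
image_bytes = bytes([0x47, 0x49, 0x46, 0x0D, 0x0A, 0x00])
print(to_unix(image_bytes) == image_bytes)       # False: a byte was silently dropped
```

The last line shows why non-text files need binary mode: a byte that happens to look like a CR is removed, and the file no longer matches the original.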
To edit a Web page, there are three basic approaches:
1. Use WinSCP or Fugu to FTP-copy a file to the local PC/Mac, edit the file locally in
   Notepad or Text Editor, then use WinSCP or Fugu to FTP-copy the file back to the server.
2. Use WinSCP to “edit the file directly”: the WinSCP program has a built-in editor, and
   editing a remote file causes WinSCP to automatically FTP-copy the file to the PC before
   editing and FTP-copy the file back to the server when complete, hiding the file copying.
3. Telnet to the server with ssh or PuTTY, then use the UNIX editor emacs directly on the
   server to change the files.
#30: Monday, November 14, 2011 – More on FTP. Demo of ExpressPCB as example of embedded FTP.
ExpressPCB is a program for designing printed circuit boards on a local PC. When the design is
complete, the program will securely FTP the design and the credit card information to a remote
server, the company builds the circuit board according to the design, and then sends the
complete boards via FedEx to the user within three days. This is a program in which FTP is a
critical component, but it is not the program’s primary function. Modern FTP programs always
have the local PC/Mac as one end of the transfer, and a remote UNIX machine as the other. The
alternative is to Telnet to a UNIX machine, and then FTP to a second remote machine (example
using a server in Finland at garbo.uwasa.fi, requiring an anonymous log-in because I do not
have an account at the machine in Finland).
#31: Wednesday, November 16, 2011 – Intro to Python as responder to Web forms. The following code
is a Python program called echohtml.cgi that runs on the UNIX server, as a responder to
Web forms (see the lecture of October 26th):
#!/usr/bin/python
import cgi
form = cgi.FieldStorage()
print "Content-type: text/html\n"
print ""
print "<HTML>"
print "   <HEAD>"
print "      <TITLE>Received Data</TITLE>"
print "   </HEAD>"
print ""
print "   <BODY>"
print "      <OL>"
for key in form.keys():
    val = form[key].value
    print "         <LI>", key, " = ", val
print "      </OL>"
print "   </BODY>"
print "</HTML>"
The first line of the Python program (#!/usr/bin/python) is a directive to the UNIX system
of where on the disk to find the appropriate interpreter to run the program. This is required for
any Python program that is to be run directly by name on a UNIX system.
The next two lines (import cgi and form = cgi.FieldStorage()) are the interface
to the Web data being received. Python programs have a lot of code already created and
written by other people that “do things” that we don’t know how to do. This other code is
stored in libraries which must be imported; the import cgi line says that we will need code
to handle CGI scripting tasks and must import that code from a library called cgi into our
program. The form = cgi.FieldStorage() line says to call a function called
FieldStorage from the cgi library and assign its results to a local variable called form.
The variable form contains all the information from the Web form as key-value pairs (for
example, on an HTML page a checkbox named Check3 containing a checkmark and having no
VALUE attribute will be received from the submitting Web form as Check3=on, where Check3
is the key and on is the value).
So, variable form contains all the data submitted to the CGI script from the Web form as a list
of key-value pairs.
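The key-value idea can be seen by decoding a submitted string by hand. The sketch below assumes Python 3 and its standard urllib.parse module, whereas the script in these notes uses Python 2 and the cgi module, which performs this decoding internally. The sample string is made up, using field names from the form in the October 26 lecture.

```python
from urllib.parse import parse_qsl

# Hypothetical form data as it travels to the server: pairs are joined
# by "&", spaces arrive as "+", and other symbols arrive as %XX codes.
submitted = "Radio1=Radio1Option2&Input1=Hello+World&List1=%28B%29+No"

pairs = parse_qsl(submitted)
for key, value in pairs:
    print(key, "=", value)
```

Each pair comes back as a key and its decoded value, for example Input1 with the value Hello World.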
Most of the rest of the CGI script generates HTML code on-the-fly as Python print statements.
The “heavy lifting” is from the section of code that says:
for key in form.keys():
    val = form[key].value
    print "         <LI>", key, " = ", val
This code steps through the list of keys in the variable form (form.keys()), assigning each in
turn to variable key for the body of the loop. For each value of key, variable val is assigned
from the form variable the corresponding value. The print statement prints out the key-value
pair as a member of an HTML list.
The important lesson about this simple program is that it generates HTML code on-the-fly – the
HTML Web page is dynamic, not static. This is characteristic of CGI scripts. Depending on what
data they receive, they will generate an HTML response that is appropriate for the data, and not
the same page every time.
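As a rough illustration of the same idea outside the Web server, here is a minimal sketch in Python 3 (not the Python 2 script used in class). It assumes the standard urllib.parse.parse_qs function as a stand-in for cgi.FieldStorage, and the function name render_form_items is made up for this example.

```python
from html import escape
from urllib.parse import parse_qs


def render_form_items(query_string):
    """Return HTML list items, one per key-value pair in the submitted data."""
    # parse_qs maps each key to a list of values, because a form may
    # submit the same key more than once.
    form = parse_qs(query_string)
    lines = []
    for key in sorted(form.keys()):
        for val in form[key]:
            # escape() keeps submitted text from being read as HTML tags.
            lines.append("<LI>" + escape(key) + " = " + escape(val) + "</LI>")
    return "\n".join(lines)


# The output depends entirely on the data received, which is what makes the page dynamic.
print(render_form_items("Check3=checked&Name=Bill"))
```

Different submitted data produces a different list, so the same script can answer every form submission.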
The CGI script is a text file, but because it is also a program it must have execute permission in
order to run. The permissions on this file should be rwxr-xr-x (everybody can look at the
contents and run it as a program, but nobody can change the file except the user).
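The permission string can also be read as three octal digits (755). The following Python 3 sketch shows the correspondence using the standard stat module; the helper name permission_string is an assumption made for this example.

```python
import stat

# 0o755 is the octal form of rwxr-xr-x:
# 7 = rwx (user), 5 = r-x (group), 5 = r-x (everybody else).
CGI_MODE = 0o755


def permission_string(mode):
    """Return the nine-character rwx string for a permission mode."""
    # stat.filemode() also reports the file type, so S_IFREG marks a regular
    # file and the leading type character is dropped from the result.
    return stat.filemode(stat.S_IFREG | mode)[1:]


print(permission_string(CGI_MODE))
print(permission_string(0o644))   # a file nobody can execute
```

From Python, calling os.chmod on the script's path with mode 0o755 should have the same effect as typing chmod 755 at the UNIX command line.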
#32: Friday, November 18, 2011 – Python on server. Here is a simple Python program that asks the user
for a number and prints it:
#!/usr/bin/python
N = input("Enter a number --- ")
print N
The first line is the “magic incantation” that starts the Python interpreter. The second line asks
for a number and assigns the value to variable N. The third line prints the value of N. This file
can be entered via emacs directly on the server. Its permissions must be set to rwxr-xr-x for
it to run, and the program can be run by typing its name (test.py in this case) directly at the
UNIX command line. (Note that creating the file on a PC using Notepad will work, but the file
must be transferred to UNIX by WinSCP in text mode and not binary mode. One student found
out the hard way that a file transferred in binary mode looks OK on the UNIX side but will not
run unless the Python interpreter is manually started, as in python test.py instead of just
test.py by itself. The extra CR characters in the file prevent the program from running.)
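To see why the CR characters matter, here is a small Python 3 sketch (the function name to_unix_line_endings is made up for this example). It shows the stray CR at the end of the first line and one way to strip the CRs, which is essentially what a text-mode transfer does.

```python
def to_unix_line_endings(data):
    """Convert Windows CR-LF line endings in a bytes object to UNIX LF."""
    return data.replace(b"\r\n", b"\n")


# The lecture's test.py as Notepad would save it (every line ends in CR-LF).
windows_file = b'#!/usr/bin/python\r\nN = input("Enter a number --- ")\r\nprint N\r\n'
unix_file = to_unix_line_endings(windows_file)

# With the CR still attached, UNIX typically looks for an interpreter whose
# name ends in a CR character, and no such program exists.
first_line = windows_file.split(b"\n")[0]
print(repr(first_line))
print(repr(unix_file.split(b"\n")[0]))
```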
Here is a factorial program in JavaScript, using the same techniques as those already shown.
<HTML>
   <HEAD>
      <TITLE>Test of Factorial in JavaScript</TITLE>
      <SCRIPT TYPE="text/javascript">
      <!--
         function Factorial (N)
         {
            var F = 1 ;
            var I = 1 ;
            while (I <= N)
            {
               F = F * I ;
               I = I + 1 ;
            }
            return (F) ;
         }

         function Interactive ()
         {
            var N = parseFloat(document.MyForm.Input1.value) ;
            document.MyForm.Answer.value = String(Factorial(N)) ;
         }
      //-->
      </SCRIPT>
   </HEAD>
   <BODY BGCOLOR="cyan">
      <FORM NAME="MyForm">
         <INPUT TYPE="text" NAME="Input1"><BR>
         <INPUT TYPE="text" NAME="Answer"><BR>
         <INPUT TYPE="button" VALUE="Compute!" onClick="Interactive();">
      </FORM>
   </BODY>
</HTML>
The Factorial function is “clean” in the sense that it only computes factorials; the interface
between it and the HTML form is through the Interactive function, called in the onClick
event handler of the button. As you can see, the code is lengthy and complicated, as it needs to
deal with HTML, forms, and JavaScript.
Unlike JavaScript, Python is not free-format. Indentation matters.
Here is the source code for the equivalent Factorial.py program:
#!/usr/bin/python

def Factorial(N):
    F = 1
    I = 1
    while (I <= N):
        F = F * I
        I = I + 1
    return F

def Interactive():
    N = input("Enter a number ")
    print Factorial(N)

Interactive()
The bodies of the functions and of the while-loop are indented relative to the rest of the code. As
a result, programs in Python tend to contain more, simpler statements than JavaScript programs
(you cannot pack the code as you can in JavaScript), but each line tends to be cleaner (fewer odd
characters such as curly-braces and semicolons, but note the required use of the colon at the
end of def and while statements). In Python you do not need to connect the input-output to
HTML objects such as text boxes.
A simpler version of the Factorial program does not require functions, as in:
#!/usr/bin/python
N = input("Enter a number ")
F = 1
I = 1
while (I <= N):
    F = F * I
    I = I + 1
print F
(Factorial.py versus JavaScript version). Demo of Lab #5 (factorial that sends email), plus
exhortation to not misuse.
The current assignment extends Factorial to automatically email its answer to a central mail
drop. Here is the complete program for that assignment (replace my name and email address,
underlined in the assignment, with yours).
#!/usr/bin/python
import smtplib

N = input("Enter a number")
F = 1
I = 1
while (I <= N):
    F = F * I
    I = I + 1

From    = "Bill Verts <[email protected]>"
To      = "[email protected]"
Subject = "Factorial of " + str(N) + " from Bill Verts"
Text    = "Factorial of " + str(N) + " is " + str(F)

# A blank line (an extra CR-LF) separates the headers from the message text.
Message = "From: " + From + "\r\n" + \
          "To: " + To + "\r\n" + \
          "Subject: " + Subject + "\r\n" + \
          "\r\n" + \
          Text
print "Sending: " + Message

Server = smtplib.SMTP("localhost")
try:
    Code = Server.sendmail(From, [To], Message)
finally:
    Server.quit()

if Code:
    print "Error sending email"
else:
    print "Email sent successfully"
The only new line at the beginning of the program is the import smtplib line, which
accesses a library of routines for handling email.
Many of the intermediate lines do nothing more than build up a set of strings into named
variables. Those strings look like (and are) the requirements for an email message, including the
From: line, the To: line, the Subject: line, etc. The final string is built into variable Message,
which concatenates (glues together) the individual strings, but also inserts CR and LF characters
(the \r and \n items, respectively) to separate the lines. The “heavy lifting” is done by three of
the remaining lines:
Server = smtplib.SMTP("localhost")
Code = Server.sendmail(From, [To], Message)
Server.quit()
These lines could be placed one after another as you see here, but only as long as everything
works correctly. The first creates a local “server” object, the second actually sends the email,
and the third closes down the server correctly. Wrapping the email-send code in a try-finally
block guarantees that everything is shut down properly even if the email-send fails (if an error
occurs in the try section, the code in the finally section will still run). Finally, the
if Code: line uses the result of sending the email to inform the user whether everything
worked properly.
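The try-finally behavior can be checked without sending any real email. The Python 3 sketch below uses a made-up FakeServer class in place of smtplib.SMTP (an assumption for illustration only, as are the example.com addresses); it sends nothing, but it records whether quit() was called.

```python
class FakeServer:
    """Hypothetical stand-in for smtplib.SMTP that sends nothing."""

    def __init__(self, fail):
        self.fail = fail
        self.closed = False

    def sendmail(self, sender, recipients, message):
        if self.fail:
            raise RuntimeError("simulated send failure")
        return {}   # like smtplib, an empty result means nobody was refused

    def quit(self):
        self.closed = True


def send(server):
    """Send one message, always closing the server afterwards."""
    try:
        code = server.sendmail("from@example.com", ["to@example.com"],
                               "Subject: test\r\n\r\nHello")
    finally:
        server.quit()   # runs whether or not sendmail raised an error
    return code


good = FakeServer(fail=False)
print(send(good), good.closed)

bad = FakeServer(fail=True)
try:
    send(bad)
except RuntimeError as error:
    print("send failed:", error, "| closed:", bad.closed)
```

In both cases the server ends up closed; the only difference is whether a result comes back or an error is reported.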
It should be pretty obvious that sending email in this fashion is an extremely powerful tool,
and it can be easily misused. It is trivial to “pretend” to be someone you are not, and it is easy
to send a spam message to a million addresses. DO NOT MISUSE THIS CODE. You are
representatives of the University: If I catch anyone using this code for spamming, they will
automatically fail the course. If anyone uses this code to harass other people, or to
communicate with anyone under false pretenses, I will turn them in to the police. DON’T
SCREW THIS UP.
#33: Monday, November 21, 2011 – In the Factorial program written in JavaScript, the largest input
number whose result is displayed as an integer is 21 (21! = 51090942171709440000). Larger numbers
return values in scientific notation (22! = 1.124×10^21), but only up to a limit of 170 (170! =
7.25×10^306). In JavaScript, the factorial of any number greater than 170 returns the value
“Infinity” as its result. This is because all numbers in JavaScript are double-precision
floating-point numbers (even those shown as integers), which have a dynamic range of approximately
10^±308 (and only about 15 significant figures).
In contrast, integers in Python are a separate data type from floating-point numbers, and unlike
double-precision floating-point numbers, integers have arbitrary precision. The size of an integer
is limited only by the available memory, at a commensurate reduction in processing speed (the
larger the integer, the longer it takes to compute with it). Here is our first Python factorial
program again:
#!/usr/bin/python

def Factorial(N):
    F = 1
    I = 1
    while (I <= N):
        F = F * I
        I = I + 1
    return F

def Interactive():
    N = input("Enter a number")
    print Factorial(N)

Interactive()
With this program it is possible to compute factorials of any size; the factorial of 200 is
78865786736479050355236321393218506229513597768717326329474253324435944996340
33429203042840119846239041772121389196388302576427902426371050619266249528299
31113462857270763317237396988943922445621451664240254033291864131227428294853
27752424240757390324032125740557956866022603190417032406235170085879617892222
2789623703897374720000000000000000000000000000000000000000000000000 (exactly).
We get this behavior because all variables (N, F, I) and constants (1) in the program are
integers. Changing just one to floating-point is enough to force all dependent calculations to be
done in floating-point as well. For example, changing the single line F = 1 to F = 1.0 will do
the job, as the dependent line F = F * I multiplies a floating-point number by an integer,
forcing the calculation to be done in floating-point. If we do this, the limit of the calculations is
now the same as in the JavaScript version, and we cannot compute any factorials larger than
170! (both Python and JavaScript use double-precision floating-point numbers).
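The difference between the two starting values can be seen side by side. This is a sketch in Python 3 rather than the Python 2 used in class; the extra start parameter is an addition for this example and plays the role of the F = 1 versus F = 1.0 line.

```python
import math


def factorial(n, start):
    """Multiply start by 1, 2, ..., n using the lecture's while-loop."""
    f = start
    i = 1
    while i <= n:
        f = f * i
        i = i + 1
    return f


exact = factorial(200, 1)        # integers throughout: arbitrary precision
largest = factorial(170, 1.0)    # floating-point: still fits in a double
too_big = factorial(171, 1.0)    # floating-point: overflows to infinity

print(len(str(exact)), "digits in 200!")
print(largest)
print(too_big)
```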
We must be very careful about integers and floating-point numbers in Python, particularly with
division. Dividing 6/2 gives 3 as you would expect, but dividing 6/4 gives 1 and not 1.5 because
the 6 and 4 are both integers. To get the fraction, one of the source numbers must be
floating-point. This can be done as 6.0/4, 6/4.0, or 6.0/4.0 – in each case at least one number is
floating-point, which forces all integers in the calculation to be converted to floating-point
before the division. Similarly, 1+5=6 (everything is integer), but 1+5.0=6.0 (the 1 is converted
to 1.0).
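Readers trying these examples in Python 3 should know that the rules for division changed after this course was taught: there the / operator always produces a floating-point result, and the // operator gives the whole-number quotient described above. The sketch below assumes Python 3.

```python
# In Python 3, "/" always gives a floating-point quotient and "//" gives
# the whole-number quotient that the lecture describes for 6/4.
print(6 // 2)     # whole-number division
print(6 // 4)     # the fraction is dropped
print(6 / 4.0)    # one floating-point operand gives the fraction
print(6.0 / 4)
print(1 + 5)      # all integers, so the result is an integer
print(1 + 5.0)    # the 1 is converted to 1.0 first
```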
#34: Wednesday, November 23, 2011 – All about encryption. Single-key encryption (the same key is
used for both encryption and decryption) can be good for large messages, but both sender and
receiver need to have the same key. Code books during WWII said which key to use at a particular
time; they became highly protected resources, highly prized by the other side. Here are
some examples of single-key techniques:
1. Caesar cipher: rotate the alphabet by a fixed amount (rotate by 5 has A → F, B → G, C → H,
…, X → C, Y → D, Z → E). The encryption key is the rotation factor; both sender
and receiver need to know by how much the alphabet was rotated. This form of
encryption is easy to break by brute force – simply try all 25 possible rotations and see
which one makes sense.
2. Permutation cipher: scramble the letters (A → Q, B → E, C → R, D → M, …). The
encryption key is the number of the permutation used, from 26! =
403291461126605635584000000 possibilities. Harder to break by brute force, but can
be broken by applying statistical means for the target language (most letters in English
generally follow the frequency pattern E, T, A, O, I, N, …).
3. XOR (exclusive-OR) cipher. Generate a random sequence of bits, and whenever the
random bit is 1 flip the corresponding bit from the message from 0 to 1 or from 1 to 0,
but whenever the random bit is 0 leave the corresponding message bit alone. This lays
a statistical “noise field” over top of the message. Bits from a computer
random-number generator are not truly random (statistically they appear to be random), but
instead form a pseudorandom sequence depending on a starting seed (the sequence is
repeatable). The encryption key is the starting seed for the random-number generator;
both sender and receiver have to know the seed.
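The first and third techniques are short enough to sketch in Python 3. The function names below are made up for this example, and Python's random module is used only because it is a seeded pseudorandom generator like the one described; it is not considered safe for real encryption.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def caesar(text, shift):
    """Rotate each capital letter by shift places; leave other characters alone."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


def caesar_brute_force(ciphertext):
    """Return all 25 candidate decryptions, keyed by the rotation tried."""
    return {shift: caesar(ciphertext, -shift) for shift in range(1, 26)}


def xor_cipher(data, seed):
    """XOR each byte with a pseudorandom byte; the seed is the shared key."""
    generator = random.Random(seed)   # the same seed repeats the same sequence
    return bytes(b ^ generator.getrandbits(8) for b in data)


secret = caesar("ATTACK AT DAWN", 5)
print(secret)
print(caesar_brute_force(secret)[5])   # one of the 25 candidates makes sense

scrambled = xor_cipher(b"ATTACK AT DAWN", seed=2011)
print(xor_cipher(scrambled, seed=2011))   # flipping the same bits again restores the message
```

Because XOR flips the same bits both times, the same function both encrypts and decrypts, as long as both sides start from the same seed.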
In double-key encryption (also known as public key encryption) the key is formed in two pieces
simultaneously. The math technique is to pick two very large prime numbers and multiply them
together; asking if a large number is prime, divisible only by 1 and itself, is much easier than
factoring numbers into their components. The two pieces of the key are based on the two
prime numbers and their product. Knowing one key does not mean that you can figure out the
other key unless you can factor the product back into the two primes; when it becomes practical
to factor numbers of a certain size, you need only pick larger numbers and the method is secure
once again.
Remember that the two pieces of the key are generated simultaneously. One is made public
and the other is kept private. Keys are symmetric: you can encrypt with either one, and then
decrypt with the other. To send a message to someone, use their public key for encryption;
they will use their private key to decrypt. For someone to send a message to you, they will
encrypt the message with your public key. Only you can decrypt the message because only you
have the private key that matches. To sign a message, use your private key to encrypt the
message; because everyone can decrypt the message with your public key, they know the
message had to come from you.
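A toy version of this scheme fits in a few lines of Python 3. The sketch below assumes an RSA-style construction with two tiny primes (the lecture does not name a specific algorithm, and real keys use primes hundreds of digits long); all variable and function names are made up for this example.

```python
# Toy key pair built from two tiny primes. Anyone who can factor n back
# into p and q can rebuild the private key, which is why real primes are huge.
p, q = 61, 53
n = p * q                    # the product, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (needs Python 3.8 or later)

public_key = (e, n)
private_key = (d, n)


def apply_key(number, key):
    """Encrypt or decrypt a number (smaller than n) with one half of the key pair."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = 65

# Secrecy: encrypt with the receiver's public key, decrypt with the private key.
ciphertext = apply_key(message, public_key)
print(apply_key(ciphertext, private_key))

# Signing: encrypt with the private key; anyone can check with the public key.
signature = apply_key(message, private_key)
print(apply_key(signature, public_key))
```

Either key undoes the other, which is the symmetry described above: the same apply_key function handles encrypting, decrypting, signing, and verifying.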
Generally, messages are encrypted twice: first with the sender’s private key (to sign it), and then
with receiver’s public key (to encrypt it). Only the receiver can get through the outer layer with
their own private key, and then they use the sender’s public key to verify who it came from.
Messages sent this way are small numbers – but these numbers can then be used in large-scale
single-key encryption for big messages. When browsers connect to secure sites, browser and
server first exchange public keys, then use public key encryption to exchange keys for single key
encryption, and finally use single key encryption to communicate the ordering information
(names, addresses, credit card numbers, etc.).
Steganography is not encryption, but a way to hide information in plain sight (the “photographic
microdot” of old WWII movies). To do this digitally, you can for example hide each bit of the
message in the low-order bit of the blue value of pixels in an image – doing so doesn’t change
the picture in a noticeable way. Alternatively, you can hide each bit of the message in the
low-order bit of each sample in an audio file – this doesn’t change the sounds in a noticeable way.
The message can be encrypted first, but doesn’t have to be.
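The low-order-bit trick can be sketched in Python 3 without any image library by treating the picture as a plain list of blue values (0 through 255). The function names and the sample pixel values are made up for this example.

```python
def hide(blue_values, message):
    """Store each bit of message in the low-order bit of successive blue values."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(blue_values):
        raise ValueError("image too small for this message")
    result = list(blue_values)
    for i, bit in enumerate(bits):
        result[i] = (result[i] & 0xFE) | bit   # clear the low bit, then set it
    return result


def reveal(blue_values, length):
    """Read length bytes back out of the low-order bits."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        byte = 0
        for value in blue_values[start:start + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


pixels = [200, 13, 77, 255, 0, 128, 64, 31] * 4   # made-up blue values
stego = hide(pixels, b"Hi!")
print(reveal(stego, 3))
print(max(abs(a - b) for a, b in zip(pixels, stego)))   # largest change to any pixel
```

No blue value changes by more than one step out of 256, which is why the altered picture looks the same to the eye.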
#35: Monday, November 28, 2011 – In-class quiz #2. Here’s the quiz and its solution:
<1>
8 Points – In Python, the following variables X, Y, and Z have the values shown:
X = 9
Y = 5
Z = 4.5
What are the values printed out by the following Python statements?
A.  print X+Y     14
B.  print X-Y     4
C.  print X*Y     45
D.  print X/Y     1 (integer division)
E.  print X+Z     13.5
F.  print X-Z     4.5
G.  print X*Z     40.5
H.  print X/Z     2.0 (floating point division)
<2>
4 Points – How many lines are being printed out by each of the following two sections of Python code? How
does the indentation of the print statement affect your answers?
I = 0                        I = 0
while (I < 10):              while (I < 10):
    I = I + 1                    I = I + 1
    print I                  print I
The code fragment on the left prints ten separate lines (with the value of I ranging from 1 through 10), but
the right-hand code fragment prints only one line (the value of I is 10 at that point). The indentation is the
only difference between the two, where on the left the print statement is indented to be inside the loop
along with the I = I + 1 statement, and on the right the print statement is outside the loop and is
executed only when the loop has completely run its course.
<3>
8 Points – Complete the following Python code to create a Web page containing all numbers from 1 through
1000 and their squares, each in its own <LI>…</LI> tag as part of the unordered list:
#!/usr/bin/python
print "<HTML>"
print "   <HEAD>"
print "      <TITLE>My Spiffy Web Page</TITLE>"
print "   </HEAD>"
print ""
print "   <BODY>"
print "      <UL>"
I = 1
while ( I <= 1000 ):
    print "<LI>", I, " squared is ", I*I, "</LI>"
    I = I + 1
print "      </UL>"
print "   </BODY>"
print "</HTML>"
Mini-Topic: FavIcons (“favorite icons”) are the little custom pictures at the corners of Web
pages. Here is my own home page, showing the icon associated with that page in Firefox next to
the URL and in the tab.
These images are currently 16×16 pixels, with only 16 colors allowed. To edit one of these
images, you need a special program (Windows Paint and Mac PaintBrush won’t work). There
are a number of suitable programs on the Web, including my own icon editor suite. The suite
contains twelve versions of the program, one for each of the common sizes (16×16, 32×32,
48×48, 64×64) and color depths (2, 16, 256 colors) of icon files in use today. The
version suitable for FavIcons (16×16, 16 color) is shown below, with the icon from my Web page
loaded:
The left side is the editor panel, where clicking on a pixel changes it to the selected color in the
tools panel on the right. Regions may be flood-filled, horizontal and vertical lines may be drawn
across the image, and pixels may be set to transparent. Infinite “undo” is supported. The icon is
also shown in its actual size in the tool panel.
Loading a FavIcon into a Web page requires that two links be added to the HTML in the
<HEAD>…</HEAD> section:
<LINK REL="shortcut icon" HREF="http://_______/favicon.ico" />
<LINK REL="apple-touch-icon" HREF="http://_______/favicon.ico" />
The differences between the two links are due to browser differences; in some cases the
favicon.ico file must be at the root directory (inside public_html), but for many
browsers the file can be in other folders. Browsers also tend to balk at showing the FavIcon
unless the cache has been cleared and the browser re-started (and sometimes not even then).
#36: Wednesday, November 30, 2011 – Viruses and other Malware. A virus is a program written by
people to intentionally cause disruption to others’ computer systems. Historically, viruses come
in several major epochs:
1. File-Infector Viruses: programs that attach themselves to the end of MS-DOS .EXE
programs (making them larger on the disk), patching a “jump” instruction to jump to
their own code. When the program is given to someone else (on a floppy disk) and run,
the virus code runs, infecting the new system. These are the easiest to detect, and the
easiest to kill: the virus has a unique “signature” that identifies it to anti-virus programs.
Common from mid 1980s through mid 1990s.
2. Boot-Sector Viruses: programs that insert themselves in track-zero-sector-zero of disks
(floppy and hard). On old MS-DOS systems, a floppy disk containing the MS-DOS
operating system is inserted into disk A: before power-up (on systems with hard disks,
the operating system is on disk C:), and the permanent code in ROM knows to look in
track-zero-sector-zero for the operating system startup code. By replacing the code in
track-zero-sector-zero with the malware code, that code is automatically run when the
computer is turned on. “Well behaved” viruses move the startup code to somewhere
else on the disk and run it when they are done installing themselves, but other viruses
were not so nice. Easy to detect, but harder to kill: sometimes the disk needs to be
reformatted. Common from late 1980s through late 1990s.
3. Macro Viruses: Macros are programs written in a special programming language
associated with an application program such as Microsoft Word or Excel; they automate
complicated or frequently executed tasks. In creating Word and Excel for both PCs and
Macs, Microsoft made certain that the macros executed the same way on both
platforms. Thus, a macro program created to do harm will work the same way on both
PCs and Macs. Some graphical email programs from the late 1990s were designed to
detect the presence of an attachment and automatically load the attachment into its
associated application program. If someone was sent an Excel file attachment, the
email program would automatically load the file into Excel. Unfortunately, this means
that the mere receipt of an email with a Word/Excel document containing a macro virus
would be enough to infect the receiving system, be it a PC or Macintosh. This hole was
quickly patched, and now email programs no longer auto-load attachments into their
applications. Email programs now also automatically scan attachments with an antivirus
scanner. Common from late 1990s through mid 2000s.
4. Buggy Email Exploits: Similarly, malware programs exploit bugs in email programs to
gain control of their hosts. Microsoft Outlook Express has been a common vector. The
macro virus called Melissa in the late 1990s took advantage of bugs in an email program
to send itself to the first 50 addresses in the user’s address book; other viruses send
themselves to everybody in the list. Common from late 1990s through today.
5. Social-Networking Exploits: While many programs (email programs, browsers running
JavaScript, Java applets, etc.) still contain bugs that can be used to take over the system,
it is harder and harder for malware to get a foothold in a system by themselves. They
need help, and often get that help directly from the user of the system through what is
called “phishing”. These are often done through emails that are Web pages where the
“click here” links to a page containing the malware code configured to take over the
system, or where the “click here” links to a page where the user is fooled into entering
private and personal information (name, bank account numbers, social security
numbers, credit card numbers, etc.). Another variant is the Nigerian Spam Letter (not
always from Nigeria) which is a money-laundering scam that promises millions of dollars
for temporary use of someone’s bank account. Common long ago, still common today.
An old Nightline video shows the panic over the Michelangelo virus in March of 1992, back when
the Web was in its infancy, Microsoft Windows 3.1 and MS-DOS were the common operating
systems of the day, and few people knew how to handle malware. At the time, there were
about 1000 different viruses; many more are known today. Virus “payloads” can be relatively
benign (locking the keyboard) or extremely destructive (deleting all files).
In November of 1988, Robert Morris, Jr., released a program now known as the Internet Worm
(not a virus). It exploited (known) bugs in the UNIX sendmail program to send itself to other
computers. Unfortunately, once each new system was infected, it kept replicating in that
system so that all computer resources were used to run the program and nothing else. The
Internet (still pre-Web) was essentially unusable for several days. Morris received probation
and a fine.
A few years earlier, astronomer Clifford Stoll at Berkeley was tracking a hacker from Germany
who was using the Internet to break into computers in the U.S. looking for secrets to sell to the
KGB. His exploits are documented in a 1989 book called “The Cuckoo’s Egg,” which reads like a
real-life spy novel (as written by a hippie). The appendix to that book recounts how the Internet
Worm worked, and how, because of his experiences tracking the hacker, Stoll for a time was a
suspect in its creation.
#37: Friday, December 2, 2011 – What actions can be taken to avoid malware and phishing attacks?
1. Run antivirus programs. Well known programs include McAfee AntiVirus, Norton
AntiVirus, Avast, SpyWare Doctor, etc. Some programs are free, including Ad-Aware,
Spybot Search and Destroy, Malware Bytes, etc. Most are for the PC, but there are
programs for the Mac as well (including iAntiVirus).
2. Run “de-gunker” system cleanup programs. Not anti-virus programs, per se, but
programs to delete files no longer needed. These include the contents of the
trash/recycling bin, temporary files (both system and browser), cookies (including
tracking cookies), leftover software from installing new programs, etc. One of the best
known for the PC is CCleaner (crap cleaner). Often these programs have “secure delete”
features to thoroughly erase files on the disk by writing random data over them, instead
of simply “forgetting” where the file is located.
3. Clear cookies, cache, and history in your browser (and empty the trash). In a browser,
the cache is where downloaded files are stored in case they are needed later – rather
than download a file again, the browser can pull it from its cache much more quickly.
Unfortunately, as more and more files are stored in the cache the worse the system
performance will get, and there is always the possibility that an “embarrassing” file is
stored there. Clearing the cache on a regular basis keeps system performance up, gets
rid of “stale” copies of a frequently changing Web page, and gets rid of problematic
files. Similarly, cookies have legitimate uses so that Web sites can track a user’s
preferences, but “tracking cookies” can share browsing habits among different sites.
Clearing cookies regularly reduces problems with privacy, as does clearing the browser
history. Many browsers have a “private browsing” mode that clears these items
automatically.
4. Never click a “click here” button. No legitimate company will send emails saying
“there’s a problem with your account; click here to visit our site.” Instead, all reputable
companies will say “there’s a problem with your account; please go visit our site.”
Never enter personal or private information into a Web page unless you manually
activated that page AND it is a secure page. Never put passwords, ID numbers, credit
card numbers, etc., in emails – they are NOT typically encrypted. If you receive a
suspect email configured as a Web page (common these days), some email programs let
you see the underlying HTML code – read the code to see if links are to where they
should (i.e., if the Web page shows a link to eBay, the underlying code should also show
a link to <A HREF="http://www.ebay.com/"> and not to some bogus site such
as <A HREF="http://www.hotsex.com/">, or worse, to an anonymous IP
address).
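Reading the underlying HTML by eye can be tedious, so here is a hedged Python 3 sketch of the same check using the standard html.parser and urllib.parse modules. The class and function names, the sample email text, and the IP address are all made up for this example; it only lists where each link really goes next to what the reader is shown.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <A> tag in an HTML message."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":                        # the parser lowercases tag names
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:            # only keep text inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def link_hosts(html):
    """Return (host, visible text) for each link so the two can be compared."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [(urlparse(href).hostname, text) for href, text in collector.links]


email_body = ('<P>There is a problem with your account: '
              '<A HREF="http://192.0.2.7/login">www.ebay.com</A></P>')
for host, text in link_hosts(email_body):
    print("shown:", text, "| actually goes to:", host)
```

A link whose visible text names one site while its host is something else (here, a bare IP address) is exactly the warning sign described in item 4.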
Please make certain that your beloved and aged friends and relatives practice these tips as well.
People who are largely unfamiliar with social engineering exploits are commonly fooled by
“please click here” scams.
Mini-Topic: On-line polls. Everyone familiar with the Web has seen polls, but most people do
not realize how worthless they are. Many polls are badly designed. For example, a poll may
have answers Yes-No-Other (where Other is irrelevant), Yes-Yes-No or Yes-No-No (where one
vote is split among two alternatives or where two questions are being asked in one poll), or may
have answers that do not cover all possibilities.
Polls that are open to everyone can be crashed by concerted efforts of a large number of people
(look up the phrase: “to Pharyngulate a poll”). Polls that are not open, but instead require a
registration log-in, often are targeted to a specific group of people with similar interests. In
those cases, the poll answers will tend to agree with the attitudes of those registrants only.
Here are a couple of examples of bad polls from http://fails.failblog.org/:
#38: Monday, December 5, 2011 – This day was mostly devoted to covering the requirements of
assignment 6 and extra-credit assignment N. Assignment 6 uses a Web page in HTML to send
<FORM> data to a server-based Python script (provided, but requiring modifications).
Assignment N requires the creation of a client-side image map on a transparent .GIF image.
While Windows Paint and Mac PaintBrush can be used to create the basic image and get most of
the coordinates of the objects on screen, they cannot be used to generate transparent .GIF
images. My own Bézier Madness program allows users to create and edit drawings, save those
drawings as transparent .GIF images, and extract the coordinates of the objects (either from the
saved .BEZ description file or by creating a .SVG image and examining its contents). The
extra-extra credit portion of assignment N is to create a FavIcon (using my icon editor suite or an
equivalent program).
#39: Wednesday, December 7, 2011 – This was a review of the big themes in the course. In this
semester we covered essentially five different “languages” (some true computer languages,
others languages in name only). Those languages include HTML, CSS, JavaScript, UNIX, and
Python.
The items on the client side include HTML, CSS, and JavaScript. In HTML, we covered an
amalgam of HTML 3 and HTML 4, noting several common tags and attributes now considered
deprecated (<CENTER>, BGCOLOR, etc.), but the Web is slowly migrating towards HTML 5. In
CSS we covered just the basic ideas of CSS 1 (i.e., the “cascade”), but CSS 2 is available now and
use of CSS 3 is growing. JavaScript is used in the browser to create HTML content on the fly (by
writing HTML code directly into the current document) and to permit interaction with <FORM>
data.
On the server side, we covered the basics of UNIX (the file structure, permissions, directories,
and how to run programs), and the Python programming language. UNIX (and Linux) is
commonly the operating system that runs Web servers today. Python is a general-purpose
programming language, with libraries that can do many things (such as send emails), but it is
also used to respond to Web pages sending <FORM> data by dynamically generating HTML
responses. Python is not the only language pressed into service in these roles: Perl and PHP are
two such, along with server-side JavaScript and many others.
#40: Friday, December 9, 2011 – Return Quiz 2, go over quiz answers, course evaluations. In preparation
for the final exam, it should be emphasized that there will be questions on programming in
JavaScript and Python, as well as HTML and CSS (of course). There will be at least one question
that asks “What is written into the current document by the JavaScript code?” (similar to the
midterm and quizzes) as opposed to “What shows up on screen?”. I encourage students to
download and print out the sample code from the class site (HTML pages, JavaScript samples,
etc.).