CmSc 365 Theory of Computation

Grammars. Numeric functions
(Chapter 4, Sections 4.6, 4.7)
1. Grammars
Grammars are language generators.
They consist of an alphabet of terminal symbols, an alphabet of non-terminal symbols, a
start symbol and a set of rules. Each language generated by some grammar can be
recognized by some automaton. Languages (and the corresponding grammars) can be
classified according to the minimal automaton sufficient to recognize them. This
classification, known as the Chomsky Hierarchy, was defined by Noam Chomsky, a
distinguished linguist who made major contributions to the field.
The Chomsky Hierarchy comprises four types of languages and their associated
grammars and machines.
Type   | Languages                                      | Grammars                                      | Machines                                                | Example
Type 3 | Regular languages                              | Regular grammars: right-linear or left-linear | Deterministic or nondeterministic finite-state automata | $a^*$
Type 2 | Context-free languages                         | Context-free grammars                         | Nondeterministic pushdown automata                      | $a^n b^n$
Type 1 | Context-sensitive languages                    | Context-sensitive grammars                    | Linear-bounded automata                                 | $a^n b^n c^n$
Type 0 | Recursive and recursively enumerable languages | Unrestricted grammars                         | Turing machines                                         | Any computable function
Regular expressions do not have non-terminal symbols; instead, they have rules
for building expressions.
Context-free grammars use terminal and non-terminal symbols. Their rules have one
restriction: the left-hand side consists of exactly one non-terminal symbol.
Unrestricted grammars do not have this restriction: their left-hand sides may contain
any string of terminal and/or non-terminal symbols, provided there is at least one
non-terminal symbol.
The types of languages form a strict hierarchy; that is,
regular languages $\subset$ context-free languages $\subset$ context-sensitive languages $\subset$ recursive languages $\subset$ recursively enumerable languages.
The distinction between the types of languages can be seen by examining the structure
of the rules of their grammars, or the nature of the automata that can be used to
recognize them.

Type 3 - Regular Languages
As we have discussed, a regular language is one which can be generated by a
regular grammar, described by a regular expression, or accepted by a finite-state
automaton (FSA).
There are two kinds of regular grammar:
Right-linear (right-regular), with rules of the form
$A \to aB$ or $A \to a$, where $A$ and $B$ are single non-terminal symbols and $a$ is
a terminal symbol.
Parse trees with these grammars are right-branching.
Left-linear (left-regular), with rules of the form
$A \to Ba$ or $A \to a$
Parse trees with these grammars are left-branching.
Examples of regular languages are pattern-matching languages (regular
expressions).
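The right-linear form maps directly onto a finite-state automaton: each non-terminal becomes a state, a rule $A \to aB$ becomes a transition from $A$ to $B$ on $a$, and a rule $A \to a$ becomes a transition on $a$ into an extra accepting state. The Python sketch below simulates that automaton. The grammar (binary strings ending in "01") and the names `RULES`, `FINAL` and `accepts` are hypothetical, chosen only for illustration.

```python
# Hypothetical right-linear grammar for binary strings ending in "01":
#   S -> 0S | 1S | 0A
#   A -> 1
# Each rule is (lhs, terminal, next non-terminal); None encodes a rule A -> a.
RULES = [
    ("S", "0", "S"), ("S", "1", "S"), ("S", "0", "A"),
    ("A", "1", None),
]
FINAL = "#"  # extra accepting state entered by rules of the form A -> a

def accepts(word, rules=RULES, start="S"):
    """Simulate the NFA whose states are the grammar's non-terminals."""
    current = {start}  # all states reachable after the prefix read so far
    for symbol in word:
        current = {
            (rhs if rhs is not None else FINAL)
            for lhs, terminal, rhs in rules
            if lhs in current and terminal == symbol
        }
    return FINAL in current
```

Because this grammar is nondeterministic (from S on 0 it may go to S or to A), the simulation keeps the set of all reachable states, which amounts to doing the subset construction on the fly.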

Type 2 - Context-Free Languages
A Context-Free Grammar (CFG) is one whose production rules are of the form:
$$A \to \alpha$$
where $A$ is any single non-terminal, and $\alpha$ is any combination of terminals and
non-terminals.
The minimal automaton that recognizes context-free languages is a pushdown
automaton. It uses a stack: when a non-terminal symbol is expanded, it is replaced on the
stack by the right-hand side of the corresponding grammar rule.
Examples of CFLs are some simple programming languages.
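The stack discipline can be sketched in Python as a top-down recognizer: the stack starts with the start symbol, a non-terminal on top is replaced by the right-hand side of one of its rules, and a terminal on top must match the next input symbol. Nondeterminism is handled by exploring every choice. The grammar $S \to aSb \mid \varepsilon$ (for $a^n b^n$) and the name `pda_accepts` are assumptions for illustration; the pruning rule is enough to make this grammar's search finite, but not for arbitrary grammars (for example, left-recursive ones).

```python
# Hypothetical grammar: S -> aSb | epsilon, which generates { a^n b^n : n >= 0 }.
GRAMMAR = {"S": ["aSb", ""]}

def pda_accepts(word, grammar=GRAMMAR, start="S"):
    """Top-down pushdown recognizer; the stack is a string whose top is index 0."""
    todo = [(0, start)]  # configurations: (input position, stack contents)
    seen = set()
    while todo:
        pos, stack = todo.pop()
        if (pos, stack) in seen:
            continue
        seen.add((pos, stack))
        if not stack:  # empty stack: accept only if the whole input was read
            if pos == len(word):
                return True
            continue
        # Prune: every terminal on the stack still has to match an input symbol.
        if sum(1 for sym in stack if sym not in grammar) > len(word) - pos:
            continue
        top, rest = stack[0], stack[1:]
        if top in grammar:  # non-terminal: try each right-hand side
            for rhs in grammar[top]:
                todo.append((pos, rhs + rest))
        elif pos < len(word) and word[pos] == top:  # terminal: consume input
            todo.append((pos + 1, rest))
    return False
```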

Type 1 - Context-Sensitive Languages
Context-Sensitive grammars may have more than one symbol on the left-hand side of
their grammar rules, provided that at least one of them is a non-terminal and the
number of symbols on the left-hand side does not exceed the number of symbols on
the right-hand side. Their rules have the form:
$$\alpha A \beta \to \alpha \gamma \beta$$
where $A$ is a single non-terminal symbol, and $\alpha$, $\beta$ and $\gamma$ are any combination of
terminals and non-terminals, with $\gamma$ non-empty.
Since we allow more than one symbol on the left-hand side, we refer to the
symbols other than the one being replaced ($\alpha$ and $\beta$) as the context of the replacement.
The automaton which recognizes a context-sensitive language is called a linear-bounded
automaton: an FSA with a memory that stores symbols in a list whose length is
bounded by the length of the input.
Since the number of symbols on the left-hand side is always smaller than or equal
to the number of symbols on the right-hand side, the length of the derived string
never decreases when a grammar rule is applied. Hence, when checking whether an
input string can be derived, no intermediate string longer than the input needs to be
considered, and a linear-bounded automaton only needs a finite list, as long as the
input, as its store.
Examples of context-sensitive languages are most programming languages.
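Because rules never shorten the string, membership can be decided by exploring every sentential form no longer than the input. The Python sketch below does this breadth-first. Its grammar for $a^n b^n c^n$ ($n \ge 1$) is an assumption for illustration and is written in the non-contracting (monotonic) style, with a rule $CB \to BC$ that is not literally of the form $\alpha A \beta \to \alpha \gamma \beta$; non-contracting grammars generate the same languages as context-sensitive ones.

```python
from collections import deque

# Hypothetical non-contracting grammar for { a^n b^n c^n : n >= 1 }.
RULES = [
    ("S", "aSBC"), ("S", "aBC"),
    ("CB", "BC"),                # reorder non-terminals: all B's before all C's
    ("aB", "ab"), ("bB", "bb"),  # turn B into b only next to a or b
    ("bC", "bc"), ("cC", "cc"),  # turn C into c only next to b or c
]

def derivable(word, rules=RULES, start="S"):
    """Decide membership by exploring every sentential form no longer than `word`."""
    limit = len(word)
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == word:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:  # rewrite each occurrence of lhs in turn
                new = form[:i] + rhs + form[i + len(lhs):]
                # Rules never shorten a form, so a form longer than word can never become word.
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False  # finitely many short forms, all explored: a definite "no"
```

Unlike a search over an unrestricted grammar, this one always terminates, because only finitely many strings over the grammar's symbols are at most as long as the input.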

Type 0 - Unrestricted (Free) Languages
Unrestricted grammars have no restrictions on their grammar rules, except that
there must be at least one non-terminal on the left-hand side.
The rules have the form $\alpha \to \beta$, where $\alpha$ and $\beta$ are arbitrary strings of
terminal and non-terminal symbols and $\alpha \neq \varepsilon$ (the empty string).
The type of automaton which can recognize such a language is a Turing machine,
whose memory (the tape) is unbounded.
Examples of unrestricted languages are almost all natural languages.
Turing Machines and Grammars

A language is recursively enumerable if there exists a Turing machine that
accepts every string of the language, and does not accept strings that are not in the
language.

"Does not accept" is not the same as "reject" -- the Turing machine could go into
an infinite loop instead, and never get around to either accepting or rejecting the
string.
The languages generated by unrestricted grammars are precisely the recursively
enumerable languages.
Theorem. Any language generated by an unrestricted grammar is recursively
enumerable.
Theorem. A language is generated by an unrestricted grammar if and only if it is
recursively enumerable.
Are all languages recursively enumerable? The answer is no.
Regular languages and context-free languages are recursive languages. This means that
there is a Turing machine that always halts and says whether or not a string belongs to
the language.
The complement of a recursive language is also a recursive language. This follows from
the fact that we can swap the "yes" and "no" answers of a machine that decides the
language.
Recursive languages are also recursively enumerable languages: we can replace the
halting "no" configurations with configurations that never halt.
However, there are recursively enumerable languages that are not recursive languages.
They are generated by unrestricted grammars. A Turing machine semidecides such
languages: it halts and says "yes" when a string belongs to the language. However, if the
string does not belong to the language, the machine may never stop.
The complement of a recursively enumerable language that is not recursive is not
recursively enumerable: we cannot change a non-existing answer to a "yes" answer. (If
both a language and its complement were recursively enumerable, running their two
semideciders side by side would decide the language, making it recursive.)
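The idea behind the first theorem above can be sketched as a breadth-first search over derivations: the search halts with "yes" once the string appears, but nothing bounds the length of the sentential forms, so on a non-member it may run forever. The Python sketch below uses a hypothetical unrestricted grammar (two-symbol left-hand sides and an erasing rule) and an artificial `max_steps` budget so that it always returns; `None` stands for "no answer yet", not for rejection.

```python
from collections import deque

# Hypothetical unrestricted grammar: two-symbol left-hand sides and an erasing rule.
# It generates the strings over {a, b} with equally many a's and b's.
RULES = [
    ("S", "ABS"), ("S", ""),     # S -> ABS | epsilon
    ("AB", "BA"), ("BA", "AB"),  # non-terminals may swap places
    ("A", "a"), ("B", "b"),
]

def semi_decide(word, rules=RULES, start="S", max_steps=100_000):
    """Breadth-first search over derivations.

    Returns True as soon as `word` is derived. A true semidecider would keep
    searching forever on non-members; this sketch stops after `max_steps`
    sentential forms and returns None ("no answer yet").
    """
    seen = {start}
    queue = deque([start])
    steps = 0
    while queue and steps < max_steps:
        form = queue.popleft()
        steps += 1
        if form == word:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if new not in seen:  # no length bound: rules may shrink strings
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    # An exhausted queue means every derivation was explored: a definite "no".
    return None if queue else False
```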
Languages that are not recursively enumerable cannot be generated by a grammar: there
is no grammar that can describe them. Each formal grammar has a finite description and
therefore can be considered as a string. Thus, the set of all formal grammars is countably
infinite. The set of all languages over an alphabet is the power set of the set of all strings
over that alphabet. We have shown that the power set of a countably infinite set is not
countable. Therefore there is no one-to-one correspondence between grammars and
languages, and some languages have no grammar at all.
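The countability step can be made concrete: listing all strings over a finite alphabet by length, and alphabetically within each length (shortlex order), gives every finite string, and hence every grammar written down as a string, a natural-number position. The Python sketch below is illustrative only; encoding grammars as strings over a fixed alphabet is an assumption, and `all_strings` and `index_of` are hypothetical names.

```python
from itertools import count, islice, product

def all_strings(alphabet):
    """Yield every finite string over `alphabet` in shortlex order."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def index_of(s, alphabet):
    """Position of `s` in the shortlex enumeration (0-based)."""
    k = len(alphabet)
    shorter = sum(k ** n for n in range(len(s)))  # all shorter strings come first
    rank = 0
    for ch in s:  # rank among strings of the same length, read as base-k digits
        rank = rank * k + alphabet.index(ch)
    return shorter + rank

print(list(islice(all_strings("ab"), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Because every string gets its own index, the grammars can be listed as $G_0, G_1, G_2, \ldots$; Cantor's diagonal argument then shows that no such list can cover every subset of the strings.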
2. Numerical functions
Recursive language - a language that can be decided by a Turing machine
Recursive function - a function that can be computed by a Turing machine
Why do we use the word "recursive"?
It turns out that functions computable by a Turing machine can be represented by means
of very simple, basic functions using composition and recursive definition.
Three basic numerical functions, so simple that their computability is obvious:
1. Zero function: maps a tuple to zero
$z_k(n_1, n_2, \ldots, n_k) = 0$ for any $k$
2. Identity function: maps a tuple to one of the numbers within the tuple:
$\mathrm{id}_{j,k}(n_1, n_2, \ldots, n_k) = n_j$, $0 < j \le k$
Example: $\mathrm{id}_{3,5}(1, 3, 5, 7, 9) = 5$
3. Successor function: defines the natural numbers:
s(0) = 1
s(n) = n+1
Using these three functions we can define more complex functions.
Examples:
Addition:
plus(m,0) = m
plus(m,n+1) = s(plus(m,n))
Multiplication:
mult(m,0) = 0
mult(m,n+1) = plus(m,mult(m,n))
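The three basic functions and the two recursive definitions translate almost directly into Python. In the sketch below, `compose` and `prim_rec` are hypothetical helper names for composition and for the recursion scheme $h(m, 0) = g(m)$, $h(m, n+1) = f(m, n, h(m, n))$ that plus and mult follow; the recursion is unrolled into a loop, which computes the same values without deep Python call stacks.

```python
from typing import Callable

Fn = Callable[..., int]

def zero(*args: int) -> int:
    """z_k(n_1, ..., n_k) = 0 for any k."""
    return 0

def ident(j: int, k: int) -> Fn:
    """id_{j,k}(n_1, ..., n_k) = n_j, with 0 < j <= k (positions counted from 1)."""
    if not 0 < j <= k:
        raise ValueError("need 0 < j <= k")
    def f(*args: int) -> int:
        if len(args) != k:
            raise ValueError(f"expected {k} arguments")
        return args[j - 1]
    return f

def succ(n: int) -> int:
    """s(n) = n + 1."""
    return n + 1

def compose(f: Fn, *gs: Fn) -> Fn:
    """Composition: x -> f(g_1(x), ..., g_m(x))."""
    return lambda *args: f(*(g(*args) for g in gs))

def prim_rec(base: Fn, step: Fn) -> Fn:
    """h(m, 0) = base(m); h(m, n+1) = step(m, n, h(m, n))."""
    def h(m: int, n: int) -> int:
        acc = base(m)
        for i in range(n):  # builds h(m, 1), h(m, 2), ..., h(m, n) in turn
            acc = step(m, i, acc)
        return acc
    return h

# plus(m, 0) = m;  plus(m, n+1) = s(plus(m, n))
plus = prim_rec(ident(1, 1), compose(succ, ident(3, 3)))
# mult(m, 0) = 0;  mult(m, n+1) = plus(m, mult(m, n))
mult = prim_rec(zero, compose(plus, ident(1, 3), ident(3, 3)))
```

For example, `mult(4, 5)` evaluates to 20, built only from the zero, identity and successor functions.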
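It can be proved that all computable functions can be obtained from these primitive
functions by composition, recursive definition and one further operation, unbounded
minimization (searching for the least number that satisfies a condition); composition
and recursive definition alone do not reach every computable function. Conversely, all
functions that can be obtained this way are computable.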
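Question: is $f(x) = x^2$, for real $x$, a computable function? The answer is: no.
The reason: we cannot represent all real numbers. There are uncountably many real
numbers but only countably many finite strings, so most real numbers cannot be written
as a finite input for a Turing machine.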