Generic Programming
Edgar Müller
Technische Universität München
4.6.2011
Abstract
This article is a short and concise introduction to the topic of generic programming. We first try to cover the theoretical foundations needed to understand the various concepts and then deal with different types of genericity. We then take a look at the implementation details of generic programming concepts in popular programming languages, especially the template mechanism in C++. Along the way, we present further examples written in C++, Java and Haskell; a basic understanding of these languages might therefore be helpful.
1 Introduction
Generic programming has been a main topic of interest, both to theoreticians and practitioners, since Ada introduced such concepts, known as 'generics', in 1983. Its popularity has risen over the last years with the introduction of templates in C++ and generics in Java 1.5. Although the term 'generic programming' does not have an exact definition, most parties agree that its concepts help to improve the modularity, extensibility and reusability of code. We will highlight why this is the case and what kinds of techniques exist.
2 Motivating examples
To motivate the concepts of generic programming, we give two examples, each one covering a different kind of genericity. In both cases we will see that writing code without using concepts from generic programming can quickly become tedious and error-prone. We will explain why this is often the case and show that the term 'generic programming' describes more than one technique to improve the overall quality of code.
1. Square function (genericity by type, taken from [3])
Suppose we want to implement a square function in C++. A first, naive approach might look like this:
int sqr(int x) { return x * x; }
As one can see, this works only for integers. If we want floats to work with sqr too, we have to provide almost the same code once again:
float sqr(float y) { return y * y; }
The solution is to use generic parameters. C++ supports the idea of generic parameters with templates¹. The compiler then generates code for each parameter type the function is called with. In this sense, using function templates, as we do in the listing below, can be considered as declaring a whole family of functions².
template <class T> T sqr(T x) { return x * x; }
Here T is a placeholder for the actual type to be used. Now we can use the sqr function with every data type that supports the * operator³. The next example shows the usage of the redefined sqr function.
int foo = sqr(5);       // foo = 25, T = int
float bar = sqr(6.0f);  // bar = 36.0f, T = float (a plain 6.0 would deduce T = double)
2. Higher-order functions (genericity by function, taken from [7])
Next, we want to demonstrate a feature that is typically found in functional languages: the concept of higher-order functions, which can also be considered a form of genericity. We want to motivate the usefulness of such functions by showing a common pattern. Say, for example, we want to provide functions for lower-casing and upper-casing a string. We probably get something like:
import Data.Char (toLower, toUpper)

stringToUpper :: [Char] -> [Char]
stringToUpper [] = []
stringToUpper (x:xs) = toUpper x : stringToUpper xs

stringToLower :: [Char] -> [Char]
stringToLower [] = []
stringToLower (x:xs) = toLower x : stringToLower xs
Listing 1: Recursive function definitions for upper-/lowercasing a string
1 For a very short introduction to the syntax of C++ templates, please see appendix A.
2 We could also introduce the type parameter T by using the typename keyword instead of the class keyword; the two exist for historical reasons, see http://blogs.msdn.com/b/slippman/archive/2004/08/11/212768.aspx.
3 This is called operator overloading, a concept we go into in more detail in section 4.2.
Both functions follow the same common pattern: first modify the head of the list and then recur on the rest of the list. In fact, the only difference between stringToUpper and stringToLower is the function that gets applied to each list element. What we want is to pass that function somehow, since it defines the characteristic behavior. Because functions are first-class citizens (meaning that they can be treated like ordinary values), we can abstract the pattern and introduce a second parameter that represents the function to be applied to each list element. What we get is the well-known map function (note that this example also involves 'genericity by type', since we use type variables).
import Prelude hiding (map)  -- we define our own map below

map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs
Listing 2: Definition of the higher-order map function
Using point-free style⁴, stringToLower and stringToUpper can now be redefined more concisely than before:
stringToLower = map toLower
stringToUpper = map toUpper
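To connect this to the C++ examples used elsewhere in this article, the same abstraction can be sketched as a C++ function template that receives the element function as an argument. This is a minimal illustration, not taken from [7]; the names map_list, to_upper, to_lower, string_to_upper and string_to_lower are hypothetical, strings are modelled as std::vector<char> to mirror Haskell's list of characters, and only ASCII input is assumed.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <type_traits>
#include <vector>

// Higher-order, generic map: applies f to every element of xs.
// A is deduced from the list, the result element type B from what f returns,
// mirroring the type variables in  map :: (a -> b) -> [a] -> [b].
template <class A, class F>
auto map_list(F f, const std::vector<A>& xs) {
    using B = std::decay_t<std::invoke_result_t<F&, const A&>>;
    std::vector<B> ys;
    ys.reserve(xs.size());
    for (const A& x : xs) {
        ys.push_back(f(x));
    }
    return ys;
}

// Element functions standing in for Haskell's toUpper / toLower (ASCII assumed).
char to_upper(char c) { return static_cast<char>(std::toupper(static_cast<unsigned char>(c))); }
char to_lower(char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); }

// The specialised functions only supply the element function.
std::vector<char> string_to_upper(const std::vector<char>& s) { return map_list(to_upper, s); }
std::vector<char> string_to_lower(const std::vector<char>& s) { return map_list(to_lower, s); }
```

The C++ standard library already offers this pattern as std::transform; the hand-written version is shown only to make the function parameter f explicit.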
As we have seen in the two examples above, the concepts of generic programming help to reduce the amount of tedious code, so-called boilerplate code, that one needs to write. Furthermore, they make code less fragile because, in most cases, only one function has to be changed instead of several. Lastly, not being forced to write boilerplate code helps one stay focused on the task at hand.
3 Theoretical foundations
In this section we introduce the concepts one needs to understand in order to fully grasp generic programming. Before going into details, we give a short and somewhat informal definition of a type system, to get a rough idea of what such a system is responsible for. Benjamin C. Pierce's definition, which is used here, reads as follows:
A type system is a tractable syntactic method for proving the absence of
certain program behaviors by classifying phrases according to the kinds of
values they compute.
For a discussion of this definition, see [12].
4 In functional programming it is typical to omit the actual arguments; see http://www.haskell.org/haskellwiki/Pointfree for a detailed explanation.
3.1 Static vs. dynamic typing
Basically, there are two approaches to checking the types of a program: types can be checked either at compile time or at run time. The former approach has the advantage that type errors are detected directly at compile time; in this sense, a certain range of errors can already be avoided while writing a program. Alternatively, it is possible to check for type errors when a program is run⁵. This approach is generally more flexible, since data types need not be known at compile time. It also benefits rapid application development, because it often speeds up development. For example, assume you want to change a Haskell program in which all functions have been annotated with their type signatures⁶. Now, if a function has to be changed, the corresponding type signatures also have to be changed, which costs the programmer additional time. But, of course, with run-time checking the guarantee that the program is type-safe might get lost.
With dynamic typing, components can be written independently of their types. A component either understands the message that has been passed to it or not; in the latter case, each language defines special semantics to handle such conditions [9].
What is important about parameterized types (see section 3.3) and dynamic typing is that the two approaches are complementary: parameterized types are only needed in statically typed languages [6].
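C++ is statically checked, but since C++17 its std::any type makes it possible to sketch what a dynamically checked operation looks like: the type of the stored value is only inspected when the operation actually runs. This is an illustration under that assumption; static_add, dynamic_add and dynamic_add_fails are hypothetical names.

```cpp
#include <any>
#include <cassert>
#include <string>

// Statically checked: the compiler rejects a call such as
//   static_add(2, std::string("2"));
// before the program ever runs.
int static_add(int a, int b) { return a + b; }

// Dynamically checked: any value may be passed in; the type test happens
// at run time. std::any_cast<int> throws std::bad_any_cast if a value
// does not actually hold an int.
int dynamic_add(const std::any& a, const std::any& b) {
    return std::any_cast<int>(a) + std::any_cast<int>(b);
}

// Returns true if dynamic_add reports a run-time type error for a and b.
bool dynamic_add_fails(const std::any& a, const std::any& b) {
    try {
        dynamic_add(a, b);
        return false;
    } catch (const std::bad_any_cast&) {
        return true;
    }
}
```

The call dynamic_add(2, std::string("2")) compiles without complaint; the mismatch only surfaces as an exception when the call executes.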
3.2 Strong typing vs. weak typing
Strong typing generally describes a concept that is used to avoid certain type errors, e.g. when an operation is applied to arguments that don't fit together, such as adding a number and a string. The term does not have an exact definition, but its main characteristic is that no implicit type conversions are allowed. Weak typing, on the other hand, allows such conversions. Luca Cardelli's article 'Typeful Programming' describes strong typing as the absence of unchecked run-time type errors [1]. As an example, consider the following pseudo-code and its outcome⁷.
a = 2
b = '2'
concatenate(a, b)  // returns '22'
add(a, b)          // returns 4
Listing 3: Weak typing example
With strong typing, the given example causes type errors. Therefore, one first has to convert the values by using the conversion functions str and int, which convert their argument to a string or an int, respectively (note that this is not the same as casting).
a = 2
b = '2'
concatenate(a, b)       // type error
add(a, b)               // type error
concatenate(str(a), b)  // returns '22'
add(a, int(b))          // returns 4
Listing 4: Strong typing example
5 Actually, a better term would be 'dynamically checked' instead of 'dynamically typed' [12].
6 Haskell is statically typed, and it is considered good style to annotate each function with its type signature.
7 Examples taken from http://en.wikipedia.org/w/index.php?title=Strong_typing&oldid=427717870
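For the int/std::string combination used in Listings 3 and 4, C++ behaves like the strongly typed pseudo-code: mixing the two types is rejected at compile time, and explicit conversion functions play the role of str and int. (C++ as a whole is not strictly strongly typed; it does allow some implicit conversions, e.g. from int to double.) The helpers concatenate, add, listing4_concat and listing4_add are hypothetical, named after the pseudo-code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers named after the pseudo-code.
std::string concatenate(const std::string& x, const std::string& y) { return x + y; }
int add(int x, int y) { return x + y; }

// concatenate(a, b) or add(a, b) with int a and std::string b would not
// compile: there is no implicit conversion between int and std::string.

std::string listing4_concat() {
    int a = 2;
    std::string b = "2";
    return concatenate(std::to_string(a), b);  // explicit conversion, like str(a)
}

int listing4_add() {
    int a = 2;
    std::string b = "2";
    return add(a, std::stoi(b));  // explicit conversion, like int(b)
}
```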
3.3 Parameterized types (generics) vs. inheritance vs. object composition
In this section we compare different techniques for enhancing the reusability and extensibility of software components [10].
Parameterized types, also called parametric polymorphism, allow easy reuse of container objects that can hold elements of a certain type; e.g. a list can be parameterized over the type of elements it may contain [6]. To declare such a type, one instantiates the type parameter. If we want to declare a list of String objects in Java, for example, we write List<String>.
Parameterized types are only one of three ways to compose behavior in object-oriented languages; the other two are object composition and inheritance. To illustrate all three approaches, assume we want to parameterize a sorting routine by the operation it uses to compare elements [6]. We can then choose among these possibilities:
• an operation implemented by a subclass (inheritance, e.g. the template method pattern; in Java: an abstract method):
  + can be used to provide default implementations
  - can't be changed at run time
• an additional parameter object that is passed to the sorting routine (object composition, e.g. the strategy pattern):
  + behavior can be composed at run time
  - might be less efficient due to indirection
• an additional argument that determines the function used to compare the elements (parameterized types):
  + easy reuse
  - can't be changed at run time
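The three options can be sketched in C++ for a routine that sorts a std::vector<int>. This is a minimal illustration, not code from [6]; the names Sorter, DescendingSorter, sort_with_strategy and sort_with are hypothetical, and all three variants delegate the actual sorting to std::sort.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// 1) Inheritance (template method): the comparison is an abstract operation
//    that subclasses implement; it is fixed once the subclass is chosen.
class Sorter {
public:
    virtual ~Sorter() = default;
    void sort(std::vector<int>& v) const {
        std::sort(v.begin(), v.end(), [this](int a, int b) { return less(a, b); });
    }
protected:
    virtual bool less(int a, int b) const = 0;
};

class DescendingSorter : public Sorter {
protected:
    bool less(int a, int b) const override { return a > b; }
};

// 2) Object composition (strategy): the comparison object is passed in and
//    can be chosen at run time; std::function adds an indirection per call.
void sort_with_strategy(std::vector<int>& v, const std::function<bool(int, int)>& cmp) {
    std::sort(v.begin(), v.end(), cmp);
}

// 3) Parameterized types: the comparison's type is a template parameter,
//    fixed when the template is instantiated.
template <class T, class Compare>
void sort_with(std::vector<T>& v, Compare cmp) {
    std::sort(v.begin(), v.end(), cmp);
}
```

In the third variant the comparison type is known at compile time, which typically lets the compiler inline it; the std::function in the second variant can be swapped at run time at the cost of an indirect call.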
4 Types of genericity
We have already seen examples of genericity by type and by function in section 2 of this document. We will now try to cover more common aspects of genericity; in particular, we take a deeper look at genericity by type.
4.1 Genericity by value
This might be the simplest form of genericity. Take, for example, a function that prints a triangle. A naive approach would be to hard-wire the behavior directly, for example [7]:
System.out.println("*");
System.out.println("**");
System.out.println("***");
System.out.println("****");
Listing 5: Naive approach to print a triangle
Instead, we can introduce an additional parameter to abstract the intended
behavior:
void triangle(int side) {
    // row i (1-based) consists of i stars, so the last row has `side` stars
    for (int row = 1; row <= side; row++) {
        for (int col = 1; col <= row; col++) {
            System.out.print("*");
        }
        System.out.println();
    }
}
Listing 6: Function to print a triangle
This way, the program becomes more reusable: if we would like to change the side length of the triangle to be printed, all we have to do is change the value of the actual parameter passed to the triangle function.
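The same idea can be sketched in C++. As an assumption made for testability, this version returns the triangle as a string instead of printing it, and it adds a second, hypothetical parameter symbol to show that further values can be abstracted in the same way.

```cpp
#include <cassert>
#include <string>

// Genericity by value: side length and symbol are parameters instead of
// being hard-wired into the code.
std::string triangle(int side, char symbol = '*') {
    std::string out;
    for (int row = 1; row <= side; ++row) {
        out += std::string(row, symbol);  // row copies of symbol
        out += '\n';
    }
    return out;
}
```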
4.2 Genericity by type
We have seen what parametric polymorphism is and how it is used in object-oriented languages. But this concept also applies to functional programming languages, where the term 'generics' isn't used much. A classical example is the declaration of a list data type, where we can make use of a type variable; e.g. in Haskell we can declare such a type as follows:
data List a = Nil | Cons a (List a)
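A rough C++ counterpart of this declaration can be sketched with a class template. This is only an analogy: an empty std::shared_ptr stands in for Nil and a heap-allocated Cell for Cons; the names Cell, List, nil, cons and length are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// data List a = Nil | Cons a (List a), with the type variable a as T.
template <class T>
struct Cell;

template <class T>
using List = std::shared_ptr<const Cell<T>>;  // empty pointer == Nil

template <class T>
struct Cell {  // one Cons cell: head element and the rest of the list
    T head;
    List<T> tail;
};

template <class T>
List<T> nil() { return nullptr; }

template <class T>
List<T> cons(T head, List<T> tail) {
    return std::make_shared<Cell<T>>(Cell<T>{std::move(head), std::move(tail)});
}

// A function over lists that works for every element type T.
template <class T>
int length(const List<T>& xs) {
    int n = 0;
    for (const Cell<T>* p = xs.get(); p != nullptr; p = p->tail.get()) {
        ++n;
    }
    return n;
}
```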
Related to parametric polymorphism is subtype polymorphism, often found in object-oriented languages. Let's say we have an addObserver function, as is often found when implementing the observer pattern [6]:
public void addObserver(Observer observer) {
    observers.addElement(observer);
}
This function accepts all arguments that are either an Observer or a subtype of Observer (or, if Observer is an interface, all types that implement the Observer interface). Subtype polymorphism is often also called inclusion polymorphism.
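A C++ sketch of the same situation follows, with an abstract base class taking the place of the Java interface. The names Observer, Subject, Logger and Counter are hypothetical; Subject stores non-owning pointers, so the observers must outlive it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Abstract base class playing the role of the Java interface.
class Observer {
public:
    virtual ~Observer() = default;
    virtual void update(const std::string& event) = 0;
};

class Subject {
public:
    // Accepts any subtype of Observer (inclusion polymorphism).
    void addObserver(Observer& observer) { observers_.push_back(&observer); }

    void notify(const std::string& event) {
        for (Observer* o : observers_) {
            o->update(event);  // dispatched to the subtype's update at run time
        }
    }

private:
    std::vector<Observer*> observers_;  // non-owning
};

// Two otherwise unrelated implementations of the same interface.
class Logger : public Observer {
public:
    std::vector<std::string> lines;
    void update(const std::string& event) override { lines.push_back("log: " + event); }
};

class Counter : public Observer {
public:
    int count = 0;
    void update(const std::string&) override { ++count; }
};
```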
The crucial difference between parametric and subtype polymorphism is that with the former, values and functions take type parameters, either implicitly or explicitly, while with the latter, types are organized into a hierarchy in which a subtype of a certain kind is substitutable for all its supertypes [7].
To illustrate both concepts further, here is a C++ example taken from [3]:
#include <iostream>
using namespace std;

class A {
public:
    void hello() const {
        cout << "Instance of A" << endl;
    }
};

class B {
public:
    void hello() const {
        cout << "Instance of B" << endl;
    }
};

// T only needs a hello() member; A and B are unrelated types
template <class T>
void test(const T& v) {
    v.hello();
}

int main() {
    A a;
    B b;
    test(a);
    test(b);
}
Listing 7: Demonstration of parametric polymorphism in C++
While test(a) prints "Instance of A", test(b) prints "Instance of B". Now we want to achieve the same behavior, but this time we use a parent class from which the two child classes A and B inherit:
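#include <iostream>
using namespace std;

class ClassWithHello {
public:
    virtual ~ClassWithHello() = default;
    virtual void hello() const = 0;  // must be virtual to be overridden
};

class A : public ClassWithHello {
public:
    void hello() const override {
        cout << "Instance of A" << endl;
    }
};

class B : public ClassWithHello {
public:
    void hello() const override {
        cout << "Instance of B" << endl;
    }
};

// accepts any subtype of ClassWithHello; the call is dispatched at run time
void test(const ClassWithHello& v) {
    v.hello();
}

int main() {
    A a;
    B b;
    test(a);
    test(b);
}
Listing 8: Demonstration of subtype polymorphism in C++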
Another aspect we haven't touched upon yet is ad-hoc polymorphism. Cardelli and Wegner give us the following classification: there is universal polymorphism, to which parametric and subtype polymorphism belong, and there is ad-hoc polymorphism, to which overloading and coercion belong [2] (see figure 1).
Basically, overloading means that it is possible to define multiple functions with the same name and that these functions can have different meanings in different contexts. Coercion describes a function with just one meaning, where semantic conversions are applied to arguments where necessary, i.e. implicit type conversions take place, as for example in 42 + 0.42, where 42 is automatically converted to a floating-point number.
Figure 1: Classification of different polymorphism classes
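Overloading and coercion can both be observed in a few lines of C++. The sketch below is illustrative: twice, half and the modular-arithmetic type Mod7 are hypothetical names. The last part revisits footnote 3: once operator* is overloaded for Mod7, the generic sqr template from section 2 accepts it as well.

```cpp
#include <cassert>
#include <string>

// Overloading: two functions share the name twice; the compiler selects
// one by the static argument types.
int twice(int x) { return 2 * x; }
std::string twice(const std::string& s) { return s + s; }

// Coercion: a single function; an int argument is implicitly converted to double.
double half(double x) { return x / 2; }

// Operator overloading: integers modulo 7 (hypothetical type).
struct Mod7 {
    int value;  // kept in the range 0..6
};
Mod7 operator*(Mod7 a, Mod7 b) { return Mod7{(a.value * b.value) % 7}; }

// The generic sqr from section 2 now also works for Mod7.
template <class T>
T sqr(T x) { return x * x; }
```

For example, half(3) converts 3 to 3.0 before the call, and sqr(Mod7{3}) yields 9 mod 7, i.e. a Mod7 with value 2.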
4.3 Genericity by structure
Genericity by structure is defined as 'a methodology for program design and implementation that separates data structures and algorithms through the use of abstract requirement specifications' [7]. For example, C++'s STL provides containers, iterators and algorithms for a variety of data types, with iterators forming the interface between containers and algorithms. The exact requirements on parameters passed to algorithms (such as an iterator) are called a concept in C++. A concept essentially encapsulates the operations required on a formal parameter (in some sense, this is comparable to an interface in Java). Containers and algorithms can be instantiated by passing an actual structure (again, compared to Java, this is the class implementing the required interface).
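A minimal sketch of an STL-style algorithm follows. It is essentially a hand-written version of the standard std::count_if; the name count_matching is hypothetical. Its only requirements on Iterator are the operations it uses (!=, ++ and *), so the same code works with std::vector, std::list and plain arrays.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// The algorithm knows nothing about the container behind the iterators;
// it only relies on !=, ++ and * (an input-iterator-like requirement).
template <class Iterator, class Predicate>
int count_matching(Iterator first, Iterator last, Predicate pred) {
    int n = 0;
    for (; first != last; ++first) {
        if (pred(*first)) {
            ++n;
        }
    }
    return n;
}
```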
Haskell's type classes resemble concepts in C++. Concepts, in that sense, can be seen as constraints in the type signature of a function, as the following sort example with type classes demonstrates:
sort :: Ord a => [a] -> [a]
This Haskell type signature states that in order to be able to sort a list, the elements of that list must be comparable. The Ord type class definition contains all the operations that must be supported (below is a reduced definition with a single operation; all other comparison operations can be implemented based on it):
class (Eq a) => Ord a where
  (<=) :: a -> a -> Bool
Data types can be made members of a type class by making them an instance of the class. Here we state that Integers are comparable with each other:
instance Ord Integer where
  m <= n = isNonNegative (n - m)
Note that both Haskell's type classes and C++'s STL are ad-hoc polymorphic rather than universal [7].
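The Ord constraint can be approximated in C++17 with a type trait checked by static_assert, as sketched below; since C++20, the language also lets one state such a requirement directly as a concept. The names is_ord, sort_list, Date and Opaque are hypothetical, and insertion sort is used only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// is_ord<T>: does T support a <= b with a result convertible to bool?
// This plays the role of the reduced Ord class above.
template <class T, class = void>
struct is_ord : std::false_type {};

template <class T>
struct is_ord<T, std::void_t<decltype(std::declval<const T&>() <= std::declval<const T&>())>>
    : std::is_convertible<decltype(std::declval<const T&>() <= std::declval<const T&>()), bool> {};

template <class T>
constexpr bool is_ord_v = is_ord<T>::value;

// Corresponds to  sort :: Ord a => [a] -> [a].
template <class T>
std::vector<T> sort_list(std::vector<T> xs) {
    static_assert(is_ord_v<T>, "sort_list requires an element type with <=");
    for (std::size_t i = 1; i < xs.size(); ++i) {
        for (std::size_t j = i; j > 0 && !(xs[j - 1] <= xs[j]); --j) {
            std::swap(xs[j - 1], xs[j]);
        }
    }
    return xs;
}

// "Making a type an instance": here, by defining <= for it.
struct Date {
    int year;
    int month;
};
bool operator<=(const Date& a, const Date& b) {
    return a.year < b.year || (a.year == b.year && a.month <= b.month);
}

struct Opaque {};  // provides no <=, so sort_list<Opaque> is rejected at compile time
```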
4.4 Other forms of genericity
There are more forms of genericity not covered here, such as 'genericity by stage', 'genericity by property' or 'genericity by shape'; see [7] for a reference.
5 Generic programming implementation details
One often-heard concern people have with generic programming is efficiency [4]. C++ addresses this concern by generating dedicated code for each data type that a generic function is instantiated with. In some cases this might result in code bloat, but it avoids run-time overhead. In Java, all generics-related type information is removed at compile time (type erasure), i.e. ArrayList<Integer> and ArrayList<String> share the same compiled class. This isn't done for performance reasons, but rather for backward compatibility. As a consequence, the type arguments can't be checked at run time: Java considers the types ArrayList<Integer> and ArrayList<String> to be the same at run time.
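The difference can be made visible with a static data member. In the sketch below (the name InstanceCounter is hypothetical), InstanceCounter<int> and InstanceCounter<std::string> are distinct classes, each with its own counter. In Java, by contrast, a static field of a generic class is shared by all parameterizations, because only one class exists at run time.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>
#include <vector>

// Every instantiation is a separate class with its own static data.
template <class T>
struct InstanceCounter {
    static int created;
    InstanceCounter() { ++created; }
};

template <class T>
int InstanceCounter<T>::created = 0;
```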
In dynamic languages, where type checking occurs at run time, the executing interpreter must track the type information of variables and expressions. This is also known as tagging [12]. For each operation that is executed, the interpreter first has to inspect the tags to ensure type safety. This results in a performance overhead, since additional memory is needed to track the type information and the type checks have to be executed at run time.
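The following C++ sketch imitates what such an interpreter does internally; it is an illustration, not how any particular interpreter is implemented. Every Value carries a tag, and the hypothetical add operation inspects both tags before doing any work, reporting a type error at run time if they do not fit together.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Run-time type tags, as a dynamically checked interpreter might keep them.
enum class Tag { Int, Str };

// Every value carries its tag alongside the payload (extra memory per value).
struct Value {
    Tag tag;
    long i;         // meaningful only if tag == Tag::Int
    std::string s;  // meaningful only if tag == Tag::Str
};

Value make_int(long v) { return Value{Tag::Int, v, {}}; }
Value make_str(std::string v) { return Value{Tag::Str, 0, std::move(v)}; }

// Each operation first inspects the tags (extra work per operation)
// and only then performs the actual computation.
Value add(const Value& a, const Value& b) {
    if (a.tag == Tag::Int && b.tag == Tag::Int) {
        return make_int(a.i + b.i);
    }
    if (a.tag == Tag::Str && b.tag == Tag::Str) {
        return make_str(a.s + b.s);  // concatenation for two strings
    }
    throw std::runtime_error("type error: incompatible operands for add");
}
```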
In functional languages like Haskell, there is one sequence of compiled instructions for each polymorphic function. Pointers to data are used when passing parameters to functions and in the representation of data structures, so that the compiled code does not need to know the type of the data it handles. This concept is also called uniform data representation [11].
As an example of this principle, let's consider a pair function, which produces a tuple of the arguments passed:
pair :: t -> t1 -> (t, t1)
pair x y = (x, y)
When the pair function is called, it receives two pointers, whose representation is independent of the type they point to. The compiler then implements the entire computation by just manipulating the pointers instead of the data they point to. Uniform data representation often leads to smaller code size, especially when compared to C++'s approach, but on the other hand it can be less efficient, since a lot of pointer manipulation must be done [11].
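C++ does not use uniform data representation by default, but the idea can be imitated by passing untyped pointers, as in the sketch below (pair_boxed and pair_of are hypothetical names). pair_boxed is compiled once and only copies pointers; pair_of is a template and is instantiated separately for each combination of argument types. Unlike in Haskell, where the type checker guarantees that the pointers are later used at the right types, the casts back from const void* are unchecked here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Imitation of uniform representation: one compiled function for all
// argument types, because it only ever handles pointers.
using Boxed = const void*;

std::pair<Boxed, Boxed> pair_boxed(Boxed x, Boxed y) {
    return {x, y};  // copies two pointers, whatever they point to
}

// The C++ default for comparison: a template, instantiated per
// combination of argument types and working directly on the values.
template <class T, class T1>
std::pair<T, T1> pair_of(T x, T1 y) {
    return {x, y};
}
```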
5.1 A closer look at the C++ template mechanism
We have already seen that templates in C++ are used to realize parametric polymorphism. We will now take a closer look at how templates work internally; most of this is based on [8]. We use an example to illustrate the inner workings. Suppose we have declared a class A with a template parameter T:
template <class T>
class A {
private:
    T x, y;
public:
    A(T p, T q) {
        x = p;
        y = q;
    }
    void f() { /* ... */ }
    void g() { /* ... */ }
};
Listing 9: Declaration of the template-based class A
Our main program might look like this:
int main() {
    A<int> intA(4, 2);
    A<float> floatA(4.2, 8.4);
}
What happens during compilation of the above main program is that the compiler generates two distinct class types and assigns each of them an internal name⁸. This internal name and the generated class definition are then used to replace all occurrences of the template instantiation; e.g. the type A<int> is replaced by the generated class below:
class A_int {
private:
    int x, y;
public:
    A_int(int p, int q) {
        x = p;
        y = q;
    }
};
Listing 10: Internally generated class for A<int>
8 Thus compilation takes longer, but, as mentioned earlier, this has no impact on run-time performance.
So the actual types of the variables declared in the main program above are A_int and, analogously, A_float. Template code is always generated on demand, i.e. as long as no instantiation or function call is encountered, no code is generated. Suppose we change the main program as follows:
int main() {
    A<int> intA(4, 2);
    A<float> floatA(4.2, 8.4);
    intA.f();
}
The compiler will generate code for the constructors (and the implicitly declared destructors) of A<int> and A<float> and for the function A<int>::f, since it is called explicitly, but it won't generate code for the function g, since it isn't called anywhere.
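On-demand instantiation can be observed directly: a member function that would not compile for some type argument does no harm as long as it is never called for that type. The class Holder below is a hypothetical example.

```cpp
#include <cassert>
#include <string>

template <class T>
struct Holder {
    T value;

    T get() const { return value; }

    // Requires unary minus on T. For T = std::string this body would not
    // compile, but it is only instantiated if negated() is actually called.
    T negated() const { return -value; }
};
```

Holder<std::string> compiles as long as nobody calls negated() on it, whereas Holder<int> can use both members.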
As a consequence, the definition of a template can't simply be split into an interface (the header file) and an implementation (the cpp file). When the implementation file is compiled, it contains no use of, say, A<int>, so no code for the constructor A<int>::A is generated there. When the main program is compiled, only the declaration from the header is visible, so the compiler merely emits a call to the constructor, and the linker then reports an unresolved reference. Listing 11 illustrates this:
// file A.h
template <class T>
class A {
private:
    T x, y;
public:
    A(T p, T q);  // declaration only
};

// file A.cpp
#include "A.h"
template <class T>
A<T>::A(T p, T q) {
    x = p;
    y = q;
}

// file main.cpp
#include "A.h"
int main() {
    A<int> a(4, 2);  // compiles, but A<int>::A(int, int) is never instantiated
    return 0;
}
Listing 11: The call a(4, 2) will produce a linker error
As a consequence, when using templates, interface and implementation must reside in the same file, which can be seen as a drawback with respect to modularity.
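A minimal sketch of the usual remedy, shown as one file for brevity: the out-of-class definitions from A.cpp are moved into A.h, so every translation unit that includes the header can instantiate them. The member sum is a hypothetical addition that only serves to make the result observable.

```cpp
#include <cassert>

// --- file A.h: declarations AND definitions ---
template <class T>
class A {
private:
    T x, y;
public:
    A(T p, T q);
    T sum() const;  // hypothetical member, for demonstration only
};

// The definitions formerly in A.cpp now live in the header, so the
// compiler can instantiate A<int>::A(int, int) wherever A.h is included.
template <class T>
A<T>::A(T p, T q) : x(p), y(q) {}

template <class T>
T A<T>::sum() const { return x + y; }

// --- file main.cpp would only need #include "A.h" ---
```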
The template mechanism has some other minor drawbacks, which are not covered here; they are related to static variables, the type safety of template arguments and the use of templates with a template argument that inherits from another class. Please see [8] for a discussion of these issues.
6 Conclusion
This article tried to cover the most important aspects of generic programming in a way that is accessible to novices. We did this by first giving some motivating examples and then covering the theoretical foundations needed to understand the various concepts. We also covered the most important implementation details.
Nevertheless, this article only scratches the surface of a very broad topic. In particular, we didn't make use of any formal theory that can be used to describe the concepts of generic programming, as is done, for example, with the help of category theory in [5].
In summary, the success of generic programming over the last years is no coincidence: it helps to reduce boilerplate code and makes software easier to write and maintain. In some cases it has minor drawbacks, as for example with C++, but these hardly compromise the overall positive picture. It can be expected that interest in generic programming will remain high, since it is still an active research area.
A C++ template syntax in comparison with Java's generics
Since nowadays most people know Java, we compare Java's generics syntax with the template syntax of C++, so that a basic understanding of declaring generic C++ types can be gained. Basically, there are two different kinds of templates: class templates and function templates. The former are used to parameterize data structures, as in template <class T> class Stack⁹. Function templates are used for declaring generic functions, e.g. template <class T> T max(T a, T b)¹⁰.
Here we have only covered the syntactic differences between C++ templates and Java's generics. There are, of course, many more differences between the two mechanisms.
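Filled in, the two declarations from this appendix might look as follows. This is a sketch; the member functions of Stack are assumptions, not taken from the text. One difference goes beyond syntax: the C++ max only requires that T supports <, which is checked when the template is instantiated, whereas a Java version needs a bound such as <T extends Comparable<T>> before it can compare a and b.

```cpp
#include <cassert>
#include <vector>

// Class template; Java: class MyStack<T> { ... }
template <class T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }

    // Precondition: the stack is not empty.
    T pop() {
        T top = items_.back();
        items_.pop_back();
        return top;
    }

    bool empty() const { return items_.empty(); }

private:
    std::vector<T> items_;
};

// Function template; Java: <T extends Comparable<T>> T max(T a, T b)
template <class T>
T max(T a, T b) {
    return a < b ? b : a;
}
```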
9 An equivalent declaration of a stack class in Java would look like class MyStack<T>.
10 Again, the equivalent declaration of a function in Java would look like <T> T max(T x, T y).
References
[1] Luca Cardelli. Typeful programming. Technical report, 1989.
[2] Luca Cardelli and Peter Wegner. On understanding types, data abstraction, and polymorphism. ACM Comput. Surv., 17:471–523, December 1985.
[3] Krzysztof Czarnecki and Ulrich W. Eisenecker. Generative programming: methods, tools, and applications. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000.
[4] James C. Dehnert and Alexander A. Stepanov. Fundamentals of generic programming. In Selected Papers from the International Seminar on Generic Programming, pages 1–11, London, UK, 2000. Springer-Verlag.
[5] Gabriel Dos Reis and Jaakko Järvi. What is generic programming? In Proceedings of the First International Workshop of Library-Centric Software Design (LCSD '05), an OOPSLA '05 workshop, October 2005.
[6] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns. Addison-Wesley, Boston, MA, 1995.
[7] Jeremy Gibbons. Datatype-Generic Programming. EE Times (http://www.eetimes.com), September 2004.
[8] Arijit Khan and Shatrugna Sadhu. A comparative analysis of generic programming paradigms in C++, Java and C#. Technical report, Department of Computer Science, University of California, Santa Barbara, 2009.
[9] Erik Meijer and Peter Drayton. Static Typing Where Possible, Dynamic Typing When Needed. 2005.
[10] Bertrand Meyer. Genericity versus inheritance. SIGPLAN Not., 21:391–405, June 1986.
[11] John C. Mitchell. Concepts in programming languages. Cambridge Univ. Press, 2003.
[12] Benjamin C. Pierce. Types and programming languages. MIT Press, Cambridge, MA, USA, 2002.
