Abstract Data Type

Data Structure &
Abstract Data Type
C and Data Structures
Baojian Hua
Data Types

A data type consists of:



A collection of data elements (a type)
A set of operations on these data elements
Data types in languages:

predefined:



any language defines a group of predefined data types
(In C) int, char, float, double, …
user-defined:


allow programmers to define their own (new) data types
(In C) structure, union, …
Data Type Examples

Predefined:




type: int
elements: …, -2, -1, 0, 1, 2, …
operations: +, -, *, /, %, …
User-defined:



type: complex
elements: 1+3i, -5+8i, …
operations: newComplex, add, sub, distance, …
Concrete Data Types (CDT)

A concrete data type:


both data type declarations and concrete
representations are available
Almost all C predefined types are CDT

For instance, on most platforms “int” is a 32-bit
double word, with operations +, -, …
Abstract Data Types (ADT)

An abstract data type:



separates data type declaration from
representation
separates function declaration (prototypes) from
implementation
Example of abstract data types in languages



interfaces in Java
signatures in ML
(roughly) header files & typedef in C
Data Structures

Data structure studies the organization of
data in computers, consisting of




the (abstract) data types (definition and repr’)
relationship between elements of this type
operations on data types
Algorithms:

operations on data structures



tradeoffs: efficiency and simplicity, etc.
subtle interplay with data structure design
Slogan: program = data structures+algorithm
What will this part cover?

Linear structures:
  linked list, stack, queue, extensible array,
  descriptor-based string
Tree & forest:
  binary tree, binary search tree
Graph
Hash
Searching
More on Modules, CDT
and ADT

Suppose we need a data type to
represent complex number c:



a data type “complex”
elements: 3+4i, -5-8i, …
operations:


newComplex, add, sub, distance, …
How to represent this data type in C
(CDT, ADT or …)?
Complex Number
// Recall the definition of a complex number c:
// c = x + yi, where x, y ∈ R, and i = sqrt(-1).
// Some typical operations:
complex newComplex (double x, double y);
complex complexAdd (complex c1, complex c2);
complex complexSub (complex c1, complex c2);
complex complexMult (complex c1, complex c2);
double complexDistance (complex c1, complex c2); // |c1 - c2| is a real number
double complexModus (complex c); // the modulus |c| is a real number
complex complexDivide (complex c1, complex c2);
// Next, we discuss several variants of representations:
// CDT, ADT.
CDT of Complex:
Interface—Types
// In a file “complex.h”:
#ifndef COMPLEX_H
#define COMPLEX_H
struct complexStruct
{
double x;
double y;
};
typedef struct complexStruct complex;
complex newComplex (double x, double y);
// other function prototypes are similar
…
#endif
Client Code
// With this interface, we can write client code
// that manipulates complex numbers. File “main.c”:
#include "complex.h"
int main ()
{
complex c1, c2, c3;
c1 = newComplex (3.0, 4.0);
c2 = newComplex (7.0, 6.0);
c3 = complexAdd (c1, c2);
complexOutput (c3);
return 0;
}
Do we know the concrete representation of c1, c2, c3? How?
CDT Complex:
Implementation
// In a file “complex.c”:
#include "complex.h"
complex newComplex (double x, double y)
{
complex c;
c.x = x;
c.y = y;
return c;
}
// other functions are similar. See Lab2
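The remaining CDT operations follow the same pattern as newComplex. Below is a minimal sketch of three of them, assuming the struct layout from the CDT header above. The function bodies are an illustration written for this section, not the Lab2 solution.

```c
/* Same layout as the CDT header: the struct is visible to clients. */
struct complexStruct
{
  double x;   /* real part */
  double y;   /* imaginary part */
};
typedef struct complexStruct complex;

complex newComplex (double x, double y)
{
  complex c;
  c.x = x;
  c.y = y;
  return c;   /* returned by value: no heap allocation in the CDT version */
}

/* (x1 + y1 i) + (x2 + y2 i) = (x1 + x2) + (y1 + y2) i */
complex complexAdd (complex c1, complex c2)
{
  return newComplex (c1.x + c2.x, c1.y + c2.y);
}

/* (x1 + y1 i) - (x2 + y2 i) = (x1 - x2) + (y1 - y2) i */
complex complexSub (complex c1, complex c2)
{
  return newComplex (c1.x - c2.x, c1.y - c2.y);
}

/* (x1 + y1 i)(x2 + y2 i) = (x1 x2 - y1 y2) + (x1 y2 + y1 x2) i */
complex complexMult (complex c1, complex c2)
{
  return newComplex (c1.x * c2.x - c1.y * c2.y,
                     c1.x * c2.y + c1.y * c2.x);
}
```

Because the struct is passed and returned by value, every call copies two doubles. That is cheap here but is one of the costs that changes once the type becomes a pointer in the ADT version.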
Problem #1
int main ()
{
complex c;
c = newComplex (3.0, 4.0);
// Want to do this: c = c + (5+6i);
// Oops, this is legal:
c.x += 5;
c.y += 6;
return 0;
}
Problem #2
#ifndef COMPLEX_H
#define COMPLEX_H
struct complexStruct
{
// switch to a fancier representation? This breaks “main”…
double a[2];
};
typedef struct complexStruct complex;
complex newComplex (double x, double y);
// other function prototypes are similar
…
#endif
Problems with CDT?

Operations are transparent.
  user code has no idea of the algorithm
  Good!
Data representation dependence
  Problem #1: user code can access data directly
    kicks away the interface
    safe?
  Problem #2: makes code rigid
    easy to change or evolve?
ADT of Complex:
Interface—Types
// In file “complex.h”:
#ifndef COMPLEX_H
#define COMPLEX_H
// note that “struct complexStruct” not given
typedef struct complexStruct *complex;
complex newComplex (double x, double y);
// other function prototypes are similar
…
#endif
Client Code
// With this interface, we can write client code
// that manipulates complex numbers. File “main.c”:
#include "complex.h"
int main ()
{
complex c1, c2, c3;
c1 = newComplex (3.0, 4.0);
c2 = newComplex (7.0, 6.0);
c3 = complexAdd (c1, c2);
complexOutput (c3);
return 0;
}
Can we still know the concrete representation of c1, c2, c3? Why?
ADT Complex:
Implementation#1—Types
// In a file “complex.c”:
#include "complex.h"
// We may choose to define complex type as:
struct complexStruct
{
double x;
double y;
};
// which is hidden in implementation.
ADT Complex:
Implementation Continued
// In a file “complex.c”:
#include <stdlib.h> // for malloc
#include "complex.h"
complex newComplex (double x, double y)
{
complex c;
c = (complex)malloc (sizeof (*c));
c->x = x;
c->y = y;
return c;
}
// other functions are similar. See Lab2
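To make the ADT version concrete, here is a minimal single-file sketch that combines the interface typedef with the hidden struct and adds complexAdd. The accessors complexReal and complexImag are hypothetical names introduced only for this illustration, because clients of an ADT cannot read c->x directly. The assert on the malloc result is a shortcut for the sketch, not a recommended error-handling policy.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface part (would live in "complex.h"): only the pointer type is public. */
typedef struct complexStruct *complex;

/* Implementation part (would live in "complex.c"): the hidden representation. */
struct complexStruct
{
  double x;
  double y;
};

complex newComplex (double x, double y)
{
  complex c = malloc (sizeof (*c));
  assert (c != NULL);   /* sketch only: real code should report the failure */
  c->x = x;
  c->y = y;
  return c;
}

complex complexAdd (complex c1, complex c2)
{
  assert (c1 != NULL && c2 != NULL);
  /* Allocates a fresh result; the arguments are left unchanged. */
  return newComplex (c1->x + c2->x, c1->y + c2->y);
}

/* Hypothetical accessors: the only way for clients to observe the parts. */
double complexReal (complex c) { return c->x; }
double complexImag (complex c) { return c->y; }
```

With this split, changing the struct to `double a[2]` only requires editing the implementation part; client code that uses the functions keeps compiling unchanged.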
ADT Summary

Yes, that’s ADT!

  Algorithm is hidden
  Data representation is hidden
    user code may never access it
    thus, client code is independent of the implementation
See Lab2 for another data type “nat”

CDT or ADT
Polymorphism


To explain polymorphism, we start with a new
data type “tuple”
A tuple is of the form: (x, y)



xA, yB (aka: A*B)
A, B unknown in advance and may be different
Example:

A=int, B=int:


(2, 3), (4, 6), (9, 7), …
A=char *, B=double:

(“Bob”, 145.8), (“Alice”, 90.5), …
Polymorphism

From the data type point of view:

two types: A, B
operations:
newTuple (x, y); // create a new tuple with x and y
equals (t1, t2); // equality testing
first (t); // get the first element of t
second (t); // get the second element of t
…
How to represent this type in computers
(using C)?
Monomorphic Version

Next, we first consider a monomorphic tuple type
called “intTuple”:



both the first and second components are of “int” type
(2, 3), (8, 9), …
The intTuple ADT:



type: intTuple
elements: (2, 3), (8, 9), …
Operations:





intTuple newIntTuple (int x, int y);
int first (intTuple t);
int second (intTuple t);
int equals (intTuple t1, intTuple t2);
…
“intTuple” CDT
// in a file “intTuple.h”
#ifndef INT_TUPLE_H
#define INT_TUPLE_H
struct intTupleStruct
{
int x;
int y;
};
typedef struct intTupleStruct intTuple;
intTuple newIntTuple (int n1, int n2);
int first (intTuple t);
…
#endif
Or the “intTuple” ADT
// in a file “intTuple.h”
#ifndef INT_TUPLE_H
#define INT_TUPLE_H
typedef struct intTupleStruct *intTuple;
intTuple newIntTuple (int n1, int n2);
int first (intTuple t);
int tupleEquals (intTuple t1, intTuple t2);
…
#endif
// We only discuss “tupleEquals ()”. All other
// functions are left to you.
tupleEquals()
// in a file “intTuple.c”
int tupleEquals (intTuple t1, intTuple t2)
{
return ((t1->x == t2->x) && (t1->y==t2->y));
}
// (Figure: t1 and t2 each point to a struct with fields x and y.)
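For reference, one possible way to fill in the remaining intTuple ADT functions is sketched below. It assumes the struct layout with fields x and y that tupleEquals relies on, and it adds a second function by analogy with first. This is an illustration, not the intended exercise solution.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface part (would live in "intTuple.h"). */
typedef struct intTupleStruct *intTuple;

/* Implementation part (would live in "intTuple.c"). */
struct intTupleStruct
{
  int x;
  int y;
};

intTuple newIntTuple (int n1, int n2)
{
  intTuple t = malloc (sizeof (*t));
  assert (t != NULL);   /* sketch only: real code should report the failure */
  t->x = n1;
  t->y = n2;
  return t;
}

int first (intTuple t)
{
  return t->x;
}

int second (intTuple t)
{
  return t->y;
}

/* Component-wise comparison of the two ints stored in each tuple. */
int tupleEquals (intTuple t1, intTuple t2)
{
  return ((t1->x == t2->x) && (t1->y == t2->y));
}
```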
Polymorphism

Now, we consider a polymorphic tuple type
called “tuple”:




“poly”: may take various forms
Every element of tuple may be of different types
(2, 3.14), (“8”, ‘a’), (‘\0’, 99), …
The “tuple” ADT:


type: tuple
elements: (2, 3.14), (“8”, ‘a’), (‘\0’, 99), …
The Tuple ADT

What about operations?





tuple newTuple (??? x, ??? y);
??? first (tuple t);
??? second (tuple t);
int equals (tuple t1, tuple t2);
…
Polymorphic Type

To cure this, C offers a polymorphic
type “void *”




“void *” is a pointer which can point to
“any” concrete type (i.e., it’s compatible
with any object pointer type), very poly…
think of a box or a mask
cannot be used directly; needs an ugly cast
similar to constructs in other languages,
such as “Object” in Java
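The "box" idea and the "ugly cast" can be seen in a few lines. The helpers boxInt and unboxInt below are hypothetical names used only for this sketch. They put an int on the heap, hand it around as a void *, and cast it back before use.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;

/* Hypothetical helper: put an int in a heap-allocated "box". */
poly boxInt (int n)
{
  int *p = malloc (sizeof (*p));
  assert (p != NULL);   /* sketch only: real code should report the failure */
  *p = n;
  return p;             /* int * converts to void * without a cast */
}

/* Hypothetical helper: open the box. The caller must know it holds an int;
   neither the compiler nor the runtime checks this. */
int unboxInt (poly p)
{
  assert (p != NULL);
  return *(int *)p;     /* the cast is required: a void * cannot be dereferenced */
}
```

Passing a box that actually holds a double to unboxInt compiles without complaint, which is exactly the lack of checking that the tuple design has to live with.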
The Tuple ADT

What about operations?





tuple newTuple (void *x, void *y);
void *first (tuple t);
void *second (tuple t);
int equals (tuple t1, tuple t2);
…
“tuple” Interface
// in a file “tuple.h”
#ifndef TUPLE_H
#define TUPLE_H
typedef void *poly;
typedef struct tupleStruct *tuple;
tuple newTuple (poly x, poly y);
poly first (tuple t);
poly second (tuple t);
int equals (tuple t1, tuple t2);
#endif // TUPLE_H
Client Code
// in a file “main.c”
#include <stdlib.h>
#include "complex.h" // need the ADT version
#include "tuple.h"
int main ()
{
  complex c1 = newComplex (1.0, 2.0);
  int *ip = (int *)malloc (sizeof (*ip));
  tuple t1 = newTuple (c1, ip);
  return 0;
}
“tuple” ADT Implementation
// in a file “tuple.c”
#include <stdlib.h>
#include "tuple.h"
struct tupleStruct
{
poly x;
poly y;
};
tuple newTuple (poly x, poly y)
{
tuple t = (tuple)malloc (sizeof (*t));
t->x = x;
t->y = y;
return t;
}
// (Figure: t points to a heap-allocated struct with fields x and y.)
“tuple” ADT Implementation
// in a file “tuple.c”
#include <stdlib.h>
#include "tuple.h"
struct tupleStruct
{
poly x;
poly y;
};
poly first (tuple t)
{
return t->x;
}
Client Code
#include <stdlib.h>
#include "complex.h" // ADT version
#include "tuple.h"
int main ()
{
  complex c1 = newComplex (1.0, 2.0);
  int *ip = (int *)malloc (sizeof (*ip));
  tuple t1 = newTuple (c1, ip);
  complex c2 = (complex)first (t1); // type cast
  return 0;
}
“equals”?
struct tupleStruct
{
poly x;
poly y;
};
// The #1 try:
int equals (tuple t1, tuple t2)
{
return ((t1->x == t2->x)
&& (t1->y == t2->y));
// Wrong!! This compares addresses, not the values pointed to.
}
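A small experiment shows why the first try fails. The sketch below keeps the same pointer comparison under the hypothetical name pointerEquals and applies it to two tuples whose first components hold equal values at different addresses.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;
typedef struct tupleStruct *tuple;

struct tupleStruct
{
  poly x;
  poly y;
};

tuple newTuple (poly x, poly y)
{
  tuple t = malloc (sizeof (*t));
  assert (t != NULL);   /* sketch only: real code should report the failure */
  t->x = x;
  t->y = y;
  return t;
}

/* The #1 try under a hypothetical name: compares the stored addresses only. */
int pointerEquals (tuple t1, tuple t2)
{
  return ((t1->x == t2->x)
          && (t1->y == t2->y));
}
```

If `int a = 2, b = 2;` are two separate variables, then tuples built from `&a` and `&b` represent the same value (2, …) but pointerEquals reports them as different, because `&a != &b`.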
“equals”?
struct tupleStruct
{
poly x;
poly y;
};
// The #2 try:
int equals (tuple t1, tuple t2)
{
return (*(t1->x) == *(t2->x)
&& *(t1->y) == *(t2->y));
// Problem?
}
“equals”?
struct tupleStruct
{
poly x;
poly y;
};
// The #3 try:
int equals (tuple t1, tuple t2)
{
return (equalsXXX (t1->x, t2->x)
&& equalsYYY (t1->y, t2->y));
// but what are “equalsXXX” and “equalsYYY”?
}
Function as Arguments
// So in the body of “equals” function, instead
// of guessing the types of t->x and t->y, we
// require the callers of “equals” supply the
// necessary equality testing functions.
// The #4 try:
typedef int (*tf)(poly, poly);
int equals (tuple t1, tuple t2, tf eqx, tf eqy)
{
return (eqx (t1->x, t2->x)
&& eqy (t1->y, t2->y));
}
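To see the fourth try work end to end, here is a self-contained sketch with two caller-supplied equality functions. The names intEquals and doubleEquals are assumptions made for this example; each one casts its poly arguments back to the type the caller knows is stored there.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct tupleStruct *tuple;

struct tupleStruct
{
  poly x;
  poly y;
};

tuple newTuple (poly x, poly y)
{
  tuple t = malloc (sizeof (*t));
  assert (t != NULL);   /* sketch only: real code should report the failure */
  t->x = x;
  t->y = y;
  return t;
}

/* The tuple module never inspects the components; it delegates to eqx and eqy. */
int equals (tuple t1, tuple t2, tf eqx, tf eqy)
{
  return (eqx (t1->x, t2->x)
          && eqy (t1->y, t2->y));
}

/* Hypothetical caller-side equality on boxed ints. */
int intEquals (poly a, poly b)
{
  return *(int *)a == *(int *)b;
}

/* Hypothetical caller-side equality on boxed doubles (exact comparison). */
int doubleEquals (poly a, poly b)
{
  return *(double *)a == *(double *)b;
}
```

A call then looks like `equals (t1, t2, intEquals, doubleEquals)` for tuples of type int*double. Supplying the functions in the wrong order still compiles, so the caller carries the whole burden of type correctness.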
Change to “tuple” Interface
// in file “tuple.h”
#ifndef TUPLE_H
#define TUPLE_H
typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct tupleStruct *tuple;
tuple newTuple (poly x, poly y);
poly first (tuple t);
poly second (tuple t);
int equals (tuple t1, tuple t2, tf eqx, tf eqy);
#endif // TUPLE_H
Client Code
// in file “main.c”
#include <stdlib.h>
#include "complex.h"
#include "tuple.h"
int main ()
{
complex c = newComplex (1.0, 2.0);
int *ip = (int *)malloc (sizeof (int));
tuple t1 = …, t2 = …;
equals (t1, t2, complexEquals, intEquals);
return 0;
}
Moral

void* serves as polymorphic type in C

Pros:
  masks all pointer types (think Object type in Java)
  code reuse: write once, use in arbitrary contexts
  we’ll see more examples later in this course
Cons:
  Polymorphism doesn’t come for free
    boxed data: data is heap-allocated (to cope with void *)
    no static or runtime checking (at least in C)
  clumsy code
    extra function pointer arguments
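The C standard library uses the same recipe: qsort from <stdlib.h> sorts arrays of any element type by taking void * data plus a comparison function pointer. The sketch below shows it on ints; compareInts and sortInts are hypothetical names chosen for this example.

```c
#include <stdlib.h>

/* Comparison callback in the shape qsort expects: negative, zero, or positive. */
static int compareInts (const void *a, const void *b)
{
  int x = *(const int *)a;   /* the "ugly cast" back to the real type */
  int y = *(const int *)b;
  return (x > y) - (x < y);  /* avoids the overflow that x - y could cause */
}

/* Hypothetical wrapper: sort n ints in ascending order. */
void sortInts (int *a, size_t n)
{
  qsort (a, n, sizeof (a[0]), compareInts);
}
```

Both the pros and the cons listed above are visible: one sorting routine serves every element type, but the element size and the comparison function must be passed by hand and nothing checks that they match the array.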
Data Carrying Functions


Why can’t we make use of data of type
“void *”, such as data passed as function
arguments?
Better idea:


Make data carry functions themselves,
instead of making external function calls
such data are called objects
Function Pointer in Data
int equals (tuple t1, tuple t2)
{
  // note that if t1->x and t1->y carried their own
  // equality testing functions, then the code
  // could just be written:
  return (t1->x->equals (t1->x, t2->x)
          && t1->y->equals (t1->y, t2->y));
}
// (Figure: the x and y fields of t1 each point to data whose first
// field, equals, points to that data's own equality function.)
Function Pointer in Data
// To cope with this, we should modify other
// modules. For instance, the “complex” ADT:
struct complexStruct
{
  int (*equals) (poly, poly);
  double a[2];
};
complex newComplex (double x, double y)
{
  complex c = (complex)malloc (sizeof (*c));
  c->equals = complexEquals;
  …;
  return c;
}
The Call
int equals (tuple t1, tuple t2)
{
return (t1->x->equals (t1->x, t2->x)
&& t1->y->equals (t1->y,t2->y));
}
// (Figure: t1 and t2 each hold x and y pointers to complex values; each
// value stores its equals pointer followed by a[0] and a[1].)
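As written on the slides, `t1->x->equals` does not compile when x has type void *, so some cast is needed. One way to make the idea compile is sketched below, under the assumption that every stored value begins with a common header struct holding the equals pointer. The names objectHeader and elemEquals are hypothetical and not part of the course code.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;

/* Hypothetical common header: must be the first member of every "object". */
struct objectHeader
{
  int (*equals) (poly, poly);
};

struct complexStruct
{
  struct objectHeader header;   /* first member, so the casts below are valid */
  double a[2];
};
typedef struct complexStruct *complex;

static int complexEquals (poly p, poly q)
{
  complex c1 = p, c2 = q;
  return c1->a[0] == c2->a[0] && c1->a[1] == c2->a[1];
}

complex newComplex (double x, double y)
{
  complex c = malloc (sizeof (*c));
  assert (c != NULL);           /* sketch only */
  c->header.equals = complexEquals;
  c->a[0] = x;
  c->a[1] = y;
  return c;
}

struct tupleStruct
{
  poly x;
  poly y;
};
typedef struct tupleStruct *tuple;

tuple newTuple (poly x, poly y)
{
  tuple t = malloc (sizeof (*t));
  assert (t != NULL);           /* sketch only */
  t->x = x;
  t->y = y;
  return t;
}

/* Look up the equality function that the data itself carries. */
static int elemEquals (poly p, poly q)
{
  struct objectHeader *h = p;   /* a struct pointer also points to its first member */
  return h->equals (p, q);
}

int equals (tuple t1, tuple t2)
{
  return elemEquals (t1->x, t2->x) && elemEquals (t1->y, t2->y);
}
```

The tuple module no longer needs extra function pointer arguments, at the price of a convention that every stored type must follow: plain boxed ints without the header would break elemEquals.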
Client Code
// in file “main.c”
#include "complex.h"
#include "tuple.h"
int main ()
{
complex c1 = newComplex (1.0, 2.0);
complex c2 = newComplex (1.0, 2.0);
tuple t1 = newTuple (c1, c2);
tuple t2 = newTuple (c1, c2);
equals (t1, t2); // dirty simple!
return 0;
}
:-P
Object

Data elements with function pointers are the
simplest form of objects

  object = virtual functions + private data

With such facilities, we can in principle model
object-oriented programming

  In fact, early C++ compilers compiled to C
  That’s partly why I don’t love object-oriented
  languages
See Lab #2 for a more production-quality
implementation of objects
Summary

Abstract data types enable modular
programming

  clear separation between interface and
  implementation
  interface and implementation should be
  designed and evolve together
Polymorphism enables code reuse
Object = data + function pointers