
Requirement Specification
Introduction
For tracking of humanoids, or of parts of the humanoid body, a motion capturing software will be developed. In order to create a standardized motion capturing software, a framework should be implemented which has the following main specifications:

- It should be able to deal with multiple depth cameras.
- It should combine a model based and a machine learning based approach for tracking.
- The model based optimization for tracking should be independent of the actual implementation of the cost function, and the cost function should be independent of the actual implementation of the model rendering.
- The machine learning should be able to perform as standalone tracking or as prior term for the optimization.
- The rendering should also be used for model based learning of the machine learning approach.

To fulfill these specifications, abstract classes which provide general interfaces need to be defined.
Basic Idea
The basic idea of the framework to be developed is to have two different approaches which can be combined: a model based one and a machine learning based one.
The model based approach uses a 3D model which is defined by a mesh and an underlying skeleton. This can be a human model or anything else. For the creation of human models the "Blender" based software "MakeHuman" is used. It creates humanoid models based on key parameters like height, weight and gender. An example of a created mesh with the underlying skeleton can be seen in figure 1 [1]. The simulated images are then compared to captured images to obtain a measure for their similarity (over the depth values). With an optimization algorithm, the parameters of the skeleton (joint angles) are adjusted until the similarity measure (the costs) approaches the optimum. Once this is done, the optimization starts with the next captured frame. The simulation/rendering of the images is done using OpenGL and Assimp. Assimp is a library which offers direct access to plenty of 3D model formats, like those created by MakeHuman.
The machine learning approach should be
implemented in parallel and has two functionalities.
First, it should be included into the cost function as a
prior term which forces the optimizer to tend to more
likely poses. Second, it should be able to estimate a
set of parameters (joint angles) for the model on its
own. These parameters can serve as initial point for
the model based optimization.
Figure 1: MakeHuman Model

[1] http://s34.photobucket.com/user/subzero2006/media/rigproblem_zps23038c05.png.html
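The per-frame render, compare and optimize loop described above can be sketched as follows. This is a toy illustration only: the hypothetical `renderCost` stands in for the real render-and-compare similarity measure, and a simple coordinate search stands in for PSO or NLOpt.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// "renderCost" stands in for the real render-and-compare cost; here it is
// a toy quadratic whose minimum plays the role of the best-fitting pose.
double renderCost(const std::vector<double>& params, const std::vector<double>& target) {
    double cost = 0.0;
    for (std::size_t i = 0; i < params.size(); ++i) {
        double d = params[i] - target[i];
        cost += d * d;  // squared depth difference stands in for image similarity
    }
    return cost;
}

// One optimization run on a single frame: simple coordinate-wise search.
std::vector<double> optimizeFrame(std::vector<double> params,
                                  const std::vector<double>& target,
                                  int iterations, double step) {
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < params.size(); ++i) {
            double base = renderCost(params, target);
            params[i] += step;
            if (renderCost(params, target) > base) params[i] -= 2 * step;  // try the other direction
            if (renderCost(params, target) > base) params[i] += step;      // revert if still worse
        }
    }
    return params;  // would serve as initial point for the next frame
}
```

In the real framework the target is not known; it is implicit in the captured depth images, and the cost is evaluated by rendering the model and comparing the images.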
Implementation Hints

- Apply the Model-View-Controller structure
- Use the Observer pattern for Logger and Visualizer
Class Overview
In order to realize these ideas and specifications the following classes should be implemented:
Class: Optimizer (abstract)
- Runs the optimization
- Has the cost function object assigned to compute the costs
- Abstract to ensure usage of different optimization approaches: Particle Swarm Optimization; Non Linear Optimization Toolbox NLOpt

Class: CostFunction (abstract)
- Used by the Optimizer
- Has a method Costs = Compute(Images, Parameters)
- Also holds the constraints
- Should also have a method Grad = Gradient(Parameters)
- Needs a function pointer to the MachineLearner class for the prior term
- One inherited class is CostFunction_Render: this uses the OpenGL based 3D model rendering
- CostFunction_Render has a Renderer object assigned

Class: Renderer
- Renders simulated images for each camera based on the given model
- Has a Model object assigned
- Also has Camera objects assigned (as many as real cameras are used)
- Maybe abstract to realize different rendering methods (CPU, GPU)

Class: Model
- Holds the skeleton and the mesh of the humanoid model
- Computes the transformation of the skeleton based on the parameters
- Passes the transformation matrices for each bone to the Renderer

Class: Camera
- Contains the parameters of the cameras used to capture the images
- Also provides the interface to get new frames
- Maybe remote cameras (via network/IP) also need to be implemented

Class: MachineLearner (abstract)
- Needs to implement a method Prob = Probability(Parameters, Images)
- Abstract to take different machine learning approaches into account
- Should also have a method Params = MostLikely(Images) to compute a set of most likely parameters based on a given set of images from one or several cameras
- Should be able to use the model based rendering for learning

Class: Visualizer
- Should implement the displaying of important data
- 3D view of the model including the registered point cloud of the cameras

Class: Logger
- Data logger for debug purposes
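A minimal sketch of how the abstract interfaces from the table could look in C++. All signatures are assumptions derived from the table; a plain `Image` placeholder stands in for cv::Mat to keep the sketch self-contained, and the toy `QuadraticCost` only illustrates the inheritance, not a real image similarity.

```cpp
#include <cassert>
#include <vector>

// Placeholder for cv::Mat so the sketch stays self-contained.
struct Image { /* pixel data omitted */ };

// Abstract cost function, as in the table: Costs = Compute(Images, Parameters).
class CostFunction {
public:
    virtual ~CostFunction() = default;
    virtual std::vector<double> Compute(const std::vector<Image>& images,
                                        const std::vector<std::vector<double>>& parameters) = 0;
};

// Abstract optimizer that has a cost function object assigned.
class Optimizer {
public:
    explicit Optimizer(CostFunction* cf) : usedCostFunction(cf) {}
    virtual ~Optimizer() = default;
    virtual void Run(const std::vector<Image>& images, std::vector<double>* parameters) = 0;
protected:
    CostFunction* usedCostFunction;  // shared by Optimizer_PSO, Optimizer_NLOpt, ...
};

// Toy concrete cost: sum of squared parameters (minimum at zero).
class QuadraticCost : public CostFunction {
public:
    std::vector<double> Compute(const std::vector<Image>&,
                                const std::vector<std::vector<double>>& parameters) override {
        std::vector<double> costs;
        for (const auto& p : parameters) {
            double c = 0.0;
            for (double v : p) c += v * v;
            costs.push_back(c);
        }
        return costs;
    }
};
```

Because both classes are abstract, PSO and NLOpt optimizers, or render based and GPU based cost functions, can be exchanged without touching the rest of the framework.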
Basic Program Flow
In the following, the basic flow of the program is explained by UML2 activity diagrams:
- Main Tracking Loop
- Single Optimization Loop
- Computation of Cost Function
- Rendering of Images
Camera Class
General Purpose
This class has two major purposes. First, it should define an easy-to-use interface to get the RGBD images of the corresponding real camera. Second, it should provide all camera specific parameters which are needed for the rendering of a simulated scene.
Annotation
Since a unified camera framework already exists, it is recommended that the newly developed class
is just the frontend to the already existing framework.
Dependencies or Restrictions
Used by the Renderer class
Properties and Methods

Properties
- ExtrinsicParameter (Struct): Contains all extrinsic camera parameters; used by the Renderer.
- IntrinsicParameter (Struct): Contains all intrinsic camera parameters; used by the Renderer.
- ImageType (Enumeration): To select in the GetImage method which image to return; values COLOR, DEPTH, MASK.
- ParameterType (Enumeration): To select which parameter matrix to return; values VIEW_MAT, PROJECT_MAT.
Methods

GetImage(…)
This method should provide the interface to the captured images.
- Type (in, Camera::ImageType): To select which image to return.
- Image (out, cv::Mat): The recorded image.
GetParameterMatrix(…)
This method should return the camera matrix to be used by OpenGL for setting up the simulated camera. The matrices must be set as ModelView matrix or Projection matrix in OpenGL.
- Type (in, Camera::ParameterType): To select which parameter matrix to return.
- Matrix (out, cv::Mat): The transformation matrix.
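A hypothetical sketch of this interface in C++. cv::Mat is replaced by trivial placeholders (`Image`, `Matrix`) to stay self-contained, and the identity calibration is a stand-in; the real class would forward to the existing camera framework.

```cpp
#include <cassert>
#include <stdexcept>

struct Image  { bool valid = false; };     // placeholder for a recorded cv::Mat
struct Matrix { double data[16] = {0}; };  // placeholder for a 4x4 cv::Mat

class Camera {
public:
    enum class ImageType { COLOR, DEPTH, MASK };
    enum class ParameterType { VIEW_MAT, PROJECT_MAT };

    Camera() {
        // Identity matrices stand in for real extrinsic/intrinsic calibration.
        for (int i = 0; i < 4; ++i) {
            viewMatrix.data[i * 5] = 1.0;
            projectionMatrix.data[i * 5] = 1.0;
        }
    }

    // Would fetch the selected frame (COLOR/DEPTH/MASK) from the framework.
    Image GetImage(ImageType /*type*/) { return Image{true}; }

    // Returns the matrix OpenGL needs: ModelView (from the extrinsics) or
    // Projection (from the intrinsics), selected via the enumeration.
    const Matrix& GetParameterMatrix(ParameterType type) const {
        switch (type) {
            case ParameterType::VIEW_MAT:    return viewMatrix;
            case ParameterType::PROJECT_MAT: return projectionMatrix;
        }
        throw std::logic_error("unknown parameter type");
    }

private:
    Matrix viewMatrix;        // built from ExtrinsicParameter
    Matrix projectionMatrix;  // built from IntrinsicParameter
};
```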
Additional Ideas
Open Questions
Renderer Class
General Purpose
This class should render the predefined 3D model to provide simulated depth images, or maybe also color images. Therefore it uses references to the Camera objects to adjust the camera parameters and to know from how many different viewpoints the images need to be computed.
Annotation
The Renderer needs a model for rendering. This should be provided via the "Assimp" library. It is not yet known where exactly the transformations and so on will be computed. Up to now, it is assumed that the vertices are transformed on the GPU and thus in this class. The computation of the transformation matrices of the skeleton under the mesh will be implemented in the Model class.
The Renderer should be able to render multiple different images at once (in one single Render() call).
The class object should also be static to ensure the availability of the returned reference to an image. Otherwise it could be possible that the Renderer object is deleted before the last access to an image.
Dependencies or Restrictions
It uses the Model and Camera class.
It needs a static method for initialization of OpenGL and some other OpenGL specific static methods.
The Renderer object created in the main function should be declared as static too.
Properties

- MyModel (Model*): Pointer to the assigned Model object used for rendering.
- MyCameras (std::vector<Camera*>): Pointers to the assigned Camera objects, whose parameters configure the simulated views.
- RenderedImages (std::vector<std::vector<cv::Mat>>, size C×M): The simulated images; M: one per parameter set, C: one per camera.
Methods

Render(…)
This method should return the simulated images. It has to pass the parameters to the model for transformation. The transformation matrices are then passed to OpenGL to adjust the vertices. Afterwards the surface is rendered and copied to the CPU memory. It has to be able to simulate multiple different parameter sets simultaneously, and also for each camera view.
- Type (in, Camera::ImageType): To select which image to render.
- Parameters (in, std::vector<std::vector<double>>, size M×N): The parameters for transformation of the model; N: number of parameters (DOF of the model), M: number of different parameter sets to simulate (different poses).
GetImages(…)
- Type (in, Camera::ImageType): To select which image type.
- RenderedImages (return, const std::vector<std::vector<cv::Mat>>&): Const reference to all images; M: for each parameter set, C: for each camera.

GetImages(…)
- Type (in, Camera::ImageType): To select which image type.
- IndexParameterSet (in, int): Select which parameter set of the M different ones.
- IndexCamera (in, int): Select which camera of the C different ones.
- RenderedImages (return, const cv::Mat&): Const reference to the selected image.
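The bookkeeping behind Render() and the two GetImages() overloads could look like the following sketch. It is hypothetical: a plain int tag stands in for a rendered cv::Mat, and the tag value only encodes which (camera, parameter set) pair produced it, so the indexing scheme is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = int;  // placeholder for a rendered cv::Mat

class Renderer {
public:
    explicit Renderer(int cameras) : numCameras(cameras) {}

    // Simulates M parameter sets for every camera in one single call.
    void Render(const std::vector<std::vector<double>>& parameterSets) {
        renderedImages.assign(numCameras, std::vector<Image>());
        for (int c = 0; c < numCameras; ++c)
            for (std::size_t m = 0; m < parameterSets.size(); ++m)
                // Real code would render here; the tag encodes (camera, set).
                renderedImages[c].push_back(c * 100 + static_cast<int>(m));
    }

    // Const reference to all images: one vector of M images per camera.
    const std::vector<std::vector<Image>>& GetImages() const { return renderedImages; }

    // Const reference to a single selected image.
    const Image& GetImages(int indexParameterSet, int indexCamera) const {
        return renderedImages[indexCamera][indexParameterSet];
    }

private:
    int numCameras;
    std::vector<std::vector<Image>> renderedImages;  // [C][M]
};
```

Returning const references matches the requirement that the Renderer object outlives the last access to an image.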
Additional Ideas
Maybe this class should be abstract to allow the use of different rendering methods, for example a GPU based and a CPU based one.
Open Questions
Model Class
General Purpose
This class should contain the model information: the skeleton consisting of bones, the vertices for the mesh, the bone hierarchy and so forth. It should load the model and compute the transformation matrices for each bone. The transformation of the vertices will be performed on the GPU.
Annotation
Dependencies or Restrictions
The class should be independent of OpenGL.
Properties

- MyScene (Assimp::aiScene): Contains all model information: the bones, the hierarchy, the mesh…
- ParameterAssignment (std::map<int, std::pair<string, TrafoType>>): Assigns each index in the parameter vector to a bone of the model and a transformation type.
- BoneTransformations (std::vector<Assimp::aiMatrix4x4>): Transformation matrix for each bone.
- BoneAssignment (std::map<int, string>): Assigns the index of each bone transformation to a bone name.
- TrafoType (Enumeration): Which type of transformation: rotation or translation, and which axis; values RX, RY, RZ, TX, TY, TZ.
Methods

Transform(…)
This method computes the transformation matrices of the bones from the given parameter vector, using the ParameterAssignment mapping. The resulting matrices are stored in BoneTransformations and can then be passed to the Renderer.
- Parameters (in, std::vector<double>, size N): The parameters for transformation of the model; N: number of parameters (DOF of the model).
GetTransformation(…)
This method returns a reference to the transformation matrix of the selected bone.
- BoneIndex (in, int): The index of the considered bone in BoneTransformations.
- Transformation (return, const aiMatrix4x4&): The transformation of the bone.
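One way Transform(…) could map a single parameter value to an elementary bone transformation, following the TrafoType idea above, is sketched below. A row-major 16-value vector stands in for Assimp's aiMatrix4x4; the function name and matrix layout are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

enum class TrafoType { RX, RY, RZ, TX, TY, TZ };

using Mat4 = std::vector<double>;  // 16 values, row-major; stand-in for aiMatrix4x4

Mat4 identity() {
    Mat4 m(16, 0.0);
    m[0] = m[5] = m[10] = m[15] = 1.0;
    return m;
}

// Builds the elementary rotation or translation for one parameter value.
// Rotations are right-handed about the named axis; translations fill the
// last column of the row-major matrix.
Mat4 elementaryTrafo(TrafoType type, double value) {
    Mat4 m = identity();
    double c = std::cos(value), s = std::sin(value);
    switch (type) {
        case TrafoType::RX: m[5] = c;  m[6] = -s; m[9] = s;  m[10] = c; break;
        case TrafoType::RY: m[0] = c;  m[2] = s;  m[8] = -s; m[10] = c; break;
        case TrafoType::RZ: m[0] = c;  m[1] = -s; m[4] = s;  m[5]  = c; break;
        case TrafoType::TX: m[3]  = value; break;
        case TrafoType::TY: m[7]  = value; break;
        case TrafoType::TZ: m[11] = value; break;
    }
    return m;
}
```

Transform(…) would look up each parameter index in ParameterAssignment, build the elementary matrix, and compose it into the bone's entry in BoneTransformations along the bone hierarchy.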
Additional Ideas
The class could also be abstract to adapt to other models besides the Assimp models. But Assimp supports a large set of different file types, so it is assumed to be sufficient.
Open Questions
It is not clear yet which other members are needed for the transformation and bone hierarchy.
Maybe the Assimp objects are sufficient but maybe some more member variables are needed for
easier data access. This could also be done by pointers to the Assimp objects.
CostFunction Class (abstract)
General Purpose
This class should compute the costs which should be minimized by the optimization algorithm. The
costs represent a measure for the similarity of the model based simulated images and the captured
images. It will provide the interface to inherited classes.
Annotation
For the optimization the class needs to implement a method for computation of the gradient. Also
methods for nonlinear constraints are needed as well as the bounds for the parameter space. The
costs itself can be computed using the amount of overlapping pixels or the depth distance and so
forth.
Considering the nonlinear constraints (NLC): it might be requested to add multiple nonlinear constraints. An NLC is represented as a function f(x) which has to fulfill the inequality f(x) > 0. Thus a vector of functions is needed as a member of the class. Maybe an extra class NonLinearConstraints should be introduced.
Dependencies or Restrictions
The inherited CostFunction_Renderer class has a Renderer object assigned to obtain the simulated
images. There should be one subclass which performs the computation of the costs (just the image
similarity) on the GPU. But it is important, that the Renderer is still able to output the simulated
images to the CPU memory.
Properties

- UpperBounds (std::vector<double>): Gives the upper bound for each parameter.
- LowerBounds (std::vector<double>): Gives the lower bound for each parameter.
- NLConstraints (std::vector<NonLinearConstraints>): All nonlinear constraints to consider during the optimization.
- ExternalPrior (void*): Function pointer to a function/method which computes an additional term for the cost function. Definition: double Prior(std::vector<double> Parameters, std::vector<cv::Mat> Images).
- UsedRenderer (Renderer*, only in the inherited class): Pointer to a Renderer object to obtain the simulated images.
Methods

Compute(…)
Computes the actual costs for multiple different parameter vectors.
- Parameters (in, std::vector<std::vector<double>>): Set of parameter vectors for which the costs should be computed; multiple different parameter vectors at once (used by the PSO).
- Images (in, std::vector<cv::Mat>): The captured images of all cameras.
- Data (in, void*): Additional data to pass; needs to be cast in the method to a known data type (e.g. a struct); can be used to include previous parameters.
- Costs (out, std::vector<double>): The costs for each parameter vector.
Gradient(…)
Computes the gradient of the cost function for multiple different parameter vectors.
- Parameters (in, std::vector<std::vector<double>>, size M×N): Set of parameter vectors for which the gradient should be computed; multiple different parameter vectors at once; M: number of different vectors, N: number of parameters/values per vector.
- Grad (out, std::vector<std::vector<double>>, size M×N): The gradient for each parameter vector.
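When no analytic gradient of the render based cost is available, Gradient(…) could fall back to central finite differences over the cost. The sketch below assumes the cost is available as a callback; in the real class this would be a call to Compute(…) with the captured images.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Central finite-difference gradient of a scalar cost function.
// Each parameter is perturbed by +/- eps and the slope is estimated.
std::vector<double> numericGradient(
        const std::function<double(const std::vector<double>&)>& cost,
        std::vector<double> params, double eps = 1e-6) {
    std::vector<double> grad(params.size());
    for (std::size_t i = 0; i < params.size(); ++i) {
        double orig = params[i];
        params[i] = orig + eps;
        double fp = cost(params);   // cost at p + eps*e_i
        params[i] = orig - eps;
        double fm = cost(params);   // cost at p - eps*e_i
        params[i] = orig;           // restore
        grad[i] = (fp - fm) / (2.0 * eps);  // central difference
    }
    return grad;
}
```

Note that for a render based cost this needs 2N renders per gradient, which is another argument for batching multiple parameter sets into one Render() call.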
Additional Ideas
Maybe a motion model is useful as an additional term in the cost function. For example, a constant acceleration model can be assumed, and based on the last two poses/parameter vectors the new parameters can be estimated (almost like a Kalman filter).
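Such a motion model could be sketched as the extrapolation below. Note that a constant acceleration extrapolation actually needs the last three poses; with only two, it degenerates to constant velocity. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Extrapolates the next parameter vector from the last three poses,
// assuming constant acceleration per parameter (joint angle).
std::vector<double> predictNext(const std::vector<double>& pPrev2,
                                const std::vector<double>& pPrev1,
                                const std::vector<double>& pCurr) {
    std::vector<double> next(pCurr.size());
    for (std::size_t i = 0; i < pCurr.size(); ++i) {
        double v = pCurr[i] - pPrev1[i];         // current velocity
        double a = v - (pPrev1[i] - pPrev2[i]);  // acceleration
        next[i] = pCurr[i] + v + a;              // extrapolate one frame ahead
    }
    return next;
}
```

The deviation of a candidate pose from this prediction could then be penalized in the cost function.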
Open Questions
It is not known yet how to deal with the nonlinear constraints and whether it is meaningful to
implement them as class or not.
Optimizer Class (abstract)
General Purpose
This class should perform the optimization of the model parameters on each frame. Therefore it uses a CostFunction object. Currently two approaches are pursued: one inherited class should implement a particle swarm optimization and the other inherited class should use an optimization library called "NLOpt".
Annotation
Dependencies or Restrictions
Up to now two inherited classes will be implemented: Optimizer_PSO and Optimizer_NLOpt.
Properties

- UsedCostFunction (CostFunction*): Pointer to a CostFunction object.
Methods

Run(…)
Runs the optimization on one frame.
- Images (in, std::vector<cv::Mat>): The captured images of all cameras.
- Data (in, void*): Additional data to pass; needs to be cast in the method to a known data type (e.g. a struct); can be used to include previous parameters.
- Parameters (in/out, std::vector<double>*): In: the initial parameters; out: the fitted parameters.
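A hypothetical minimal concrete optimizer illustrating the Run(…) contract: Parameters is in/out, starting from the initial guess and holding the fitted result afterwards. Plain gradient descent stands in here for NLOpt or PSO, and the cost/gradient callbacks stand in for the assigned CostFunction object.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class GradientDescentOptimizer {
public:
    using Cost = std::function<double(const std::vector<double>&)>;
    using Grad = std::function<std::vector<double>(const std::vector<double>&)>;

    GradientDescentOptimizer(Cost c, Grad g)
        : cost(std::move(c)), grad(std::move(g)) {}

    // Run on one frame: parameters go in as the initial guess and come
    // out fitted, matching the in/out semantics of the specification.
    void Run(std::vector<double>* parameters, int iterations, double stepSize) {
        for (int it = 0; it < iterations; ++it) {
            std::vector<double> g = grad(*parameters);
            for (std::size_t i = 0; i < g.size(); ++i)
                (*parameters)[i] -= stepSize * g[i];  // descend along the gradient
        }
    }

private:
    Cost cost;  // stands in for the assigned CostFunction::Compute
    Grad grad;  // stands in for CostFunction::Gradient
};
```

The inherited Optimizer_NLOpt would delegate this loop to the library, while Optimizer_PSO would maintain a particle population instead of a single parameter vector.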
Additional Ideas
Open Questions
Class Diagram
Work Package Estimation

- 1. Basic Framework: 6d × 8h/d = 48h
- 2. Camera Interface: 9d × 8h/d = 72h
- 3. NLOpt Optimizer: 2d × 8h/d = 16h
- 4. PSO Optimizer: 3d × 8h/d = 24h
- 5. Model Class: 9d × 8h/d = 72h
- 6. GPU Renderer: 12d × 8h/d = 96h
- 7. GPU CostFunc: 7d × 8h/d = 56h
- 8. MachineLearner: 9d × 8h/d = 72h

SUM: 456h × 1.3 ≈ 593h
Basic Framework
- Invent basic structure: 2d
- Define requirements: 2d
- Code class bodies: 2d

Camera Interface
- Familiarization with existing FW: 2d
- Familiarization with OpenGL camera trafo: 2d
- Coding: 3d
- Testing: 2d

NLOpt Optimizer
- Familiarization with NLOpt: 0.5d
- Coding (Gradient): 1d
- Testing: 0.5d

PSO Optimizer
- Familiarization with PSO: 0.5d
- Coding: 2d
- Testing: 0.5d

Model Class
- Familiarization with Assimp: 2d
- Familiarization with Q2 model and trafo: 2d
- Familiarization with trafo for OpenGL: 1d
- Coding: 3d
- Testing: 1d

GPU Renderer
- Familiarization with OpenGL: 2d
- Familiarization with CUDA: 2d
- Struggling with CUDA: 3d
- Coding: 3d
- Testing: 2d

GPU CostFunction
- Struggling with CUDA: 2d
- Coding: 1d
- Adjustment of cost function: 2d
- Testing: 2d

MachineLearner
- Selection of an approach: 2d
- Familiarization with selected approach: 2d
- Specification of requirements for the approach: 2d
- Coding: 2d
- Testing: 1d