Requirement Specification

Introduction
A motion capturing software will be developed for tracking humanoids or parts of the humanoid body. In order to create a standardized motion capturing software, a framework should be implemented with the following main specifications:
- It should be able to deal with multiple depth cameras.
- It should combine a model based and a machine learning based approach for tracking.
- The model based optimization for tracking should be independent of the actual implementation of the cost function, and the cost function should be independent of the actual implementation of the model rendering.
- The machine learning should be able to perform as standalone tracking or as prior term for the optimization.
- The rendering should also be usable for model based training of the machine learning approach.
To fulfill these specifications, abstract classes which provide general interfaces need to be defined.

Basic Idea
The basic idea of the framework to develop is to have two different approaches which can be combined: a model based one and a machine learning based one.

The model based approach uses a 3D model which is defined by a mesh and an underlying skeleton. This can be a human model or anything else. For the creation of human models the "Blender" based software "MakeHuman" is used. It creates humanoid models based on key parameters like height, weight and gender. An example of a created mesh with the underlying skeleton can be seen in figure 1. The simulated images are compared to the captured images to obtain a measure for the similarity (of the depth values). With an optimization algorithm the parameters of the skeleton (joint angles) are adjusted until the similarity measure (the costs) approaches the optimum. Once this is done, the optimization continues with the next captured frame. The simulation/rendering of the images is done using OpenGL and Assimp. Assimp is a library which offers direct access to plenty of 3D model formats, including those created by MakeHuman.

The machine learning approach should be implemented in parallel and has two functionalities. First, it should be included into the cost function as a prior term which forces the optimizer to tend towards more likely poses. Second, it should be able to estimate a set of parameters (joint angles) for the model on its own. These parameters can serve as initial point for the model based optimization.

Figure 1: MakeHuman Model (source: http://s34.photobucket.com/user/subzero2006/media/rigproblem_zps23038c05.png.html)
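As an illustration of the comparison step, the similarity between one rendered and one captured depth image could be as simple as the mean absolute depth distance over the pixels covered in both images. The following is a minimal sketch under that assumption; the function name DepthCost and the use of 32 bit float OpenCV depth images are illustrative choices, not part of the specification.

    #include <limits>
    #include <opencv2/core.hpp>

    // Minimal sketch of a depth based similarity measure (lower cost = more similar).
    // Assumes both images are CV_32FC1 depth maps where 0 marks invalid/background pixels.
    double DepthCost(const cv::Mat& rendered, const cv::Mat& captured)
    {
        CV_Assert(rendered.size() == captured.size() && rendered.type() == CV_32FC1
                  && captured.type() == CV_32FC1);

        // Pixels covered by the model in the rendered image and by data in the captured one.
        cv::Mat overlap = (rendered > 0.0f) & (captured > 0.0f);
        if (cv::countNonZero(overlap) == 0)
            return std::numeric_limits<double>::max();   // no overlap: worst possible cost

        cv::Mat diff;
        cv::absdiff(rendered, captured, diff);
        return cv::mean(diff, overlap)[0];               // mean depth distance on the overlap
    }

Other terms mentioned later in this document, such as the amount of overlapping pixels or a prior from the machine learning, could be added to such a measure in the same way.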
Implementation Hints
- Apply a Model-View-Controller structure.
- Use the Observer pattern for the Logger and the Visualizer.

Class Overview
In order to realize these ideas and specifications, the following classes should be implemented:

Optimizer (abstract)
- Runs the optimization.
- Has the cost function object assigned to compute the costs.
- Abstract to ensure usage of different optimization approaches: Particle Swarm Optimization and the nonlinear optimization toolbox NLOpt.

CostFunction (abstract)
- Used by the Optimizer; has a method Costs=Compute(Images, Parameters) and should also have a method Grad=Gradient(Parameters).
- Also holds the constraints.
- Needs a function pointer to the MachineLearner class for the prior term.
- One inherited class is CostFunction_Render, which uses the OpenGL based 3D model rendering and has a Renderer object assigned.

Renderer
- Renders simulated images for each camera based on the given model.
- Has a Model object assigned.
- Has Camera objects assigned (as many as real cameras are used).
- Maybe abstract to realize different rendering methods (CPU, GPU).

Model
- Holds the skeleton and the mesh of the humanoid model.
- Computes the transformation of the skeleton based on the parameters.
- Passes the transformation matrices for each bone to the Renderer.

Camera
- Contains the parameters of the cameras used to capture the images.
- Also provides the interface to get new frames.
- Maybe remote (via network/IP) cameras need to be implemented as well.

MachineLearner (abstract)
- Needs to implement a method Prob=Probability(Parameters, Images).
- Abstract to take different machine learning approaches into account.
- Should also have a method Params=MostLikely(Images) to compute a set of most likely parameters based on a given set of images from one or several cameras.
- Should be able to use the model based rendering for learning.

Visualizer
- Should implement the displaying of important data: a 3D view of the model including the registered point clouds of the cameras.

Logger
- Data logger for debug purposes.

Basic Program Flow
In the following, the basic flow of the program is explained by UML2 activity diagrams for (see the code sketch after this list for the main tracking loop):
- Main Tracking Loop
- Single Optimization Loop
- Computation of Cost Function
- Rendering of Images
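As a code level illustration of the Main Tracking Loop, the following sketch wires the classes from the overview together for one frame. The interface declarations are reduced to what is needed here; the concrete signatures (e.g. passing the images as std::vector<cv::Mat>) are assumptions based on the method tables given later.

    #include <vector>
    #include <opencv2/core.hpp>

    // Reduced interface declarations, mirroring the class overview.
    // The concrete signatures are assumptions, not part of the specification.
    struct Camera {
        enum ImageType { COLOR, DEPTH, MASK };
        virtual cv::Mat GetImage(ImageType type) = 0;
    };
    struct MachineLearner {
        virtual std::vector<double> MostLikely(const std::vector<cv::Mat>& images) = 0;
    };
    struct Optimizer {
        virtual void Run(const std::vector<cv::Mat>& images, void* data,
                         std::vector<double>* parameters) = 0;
    };

    // One pass of the main tracking loop: capture, predict, refine.
    void TrackFrame(const std::vector<Camera*>& cameras, MachineLearner* learner,
                    Optimizer* optimizer, std::vector<double>& parameters)
    {
        // 1. Grab the current depth image of every camera.
        std::vector<cv::Mat> images;
        for (Camera* cam : cameras)
            images.push_back(cam->GetImage(Camera::DEPTH));

        // 2. Optionally let the MachineLearner propose a likely pose as starting point.
        if (learner)
            parameters = learner->MostLikely(images);

        // 3. Refine the pose with the model based optimization (parameters is in/out).
        optimizer->Run(images, /*data=*/nullptr, &parameters);
    }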
Camera Class

General Purpose
This class has two major purposes. First, it should define an easy to use interface to get the RGBD images of the corresponding real camera. Second, it should provide all camera specific parameters which are needed for the rendering of a simulated scene.

Annotation
Since a unified camera framework already exists, it is recommended that the newly developed class is just a frontend to this existing framework.

Dependencies or Restrictions
Used by the Renderer class.

Properties
- ExtrinsicParameter (Struct): Contains all extrinsic camera parameters; used by the Renderer.
- IntrinsicParameter (Struct): Contains all intrinsic camera parameters; used by the Renderer.
- ImageType (Enumeration): Selects in the GetImage method which image to return: COLOR, DEPTH, MASK.
- ParameterType (Enumeration): Selects which parameter matrix to return: VIEW_MAT, PROJECT_MAT.

Methods
GetImage(...)
This method should provide the interface to the captured images.
- Type (In, Camera::ImageType): Selects which image to return.
- Image (Out, cv::Mat): The recorded image.

GetParameterMatrix(...)
This method should return the camera matrix used by OpenGL to set up the simulated camera. The matrices must be set as the ModelView matrix or the Projection matrix in OpenGL.
- Type (In, Camera::ParameterType): Selects which parameter matrix to return.
- Matrix (Out, cv::Mat): The transformation matrix.

Additional Ideas

Open Questions

Renderer Class

General Purpose
This class should render the predefined 3D model to provide simulated depth images and maybe also color images. It uses references to the Camera objects to adjust the camera parameters and to know from how many different viewpoints the images need to be computed.

Annotation
The Renderer needs a model for rendering, which should be provided via the "Assimp" library. It is not yet known where exactly the transformations will be computed. Up to now, it is assumed that the vertices are transformed on the GPU and thus in this class. The computation of the transformation matrices of the skeleton under the mesh will be implemented in the Model class. The Renderer should be able to render multiple different images at once (in one single Render() call). The class object should also be static to ensure the availability of the returned reference to an image; otherwise it could be possible that the Renderer object is deleted before the last access to an image.

Dependencies or Restrictions
It uses the Model and Camera classes. It needs a static method for the initialization of OpenGL and some other OpenGL specific static methods. The Renderer object created in the main function should be declared as static too.

Properties
- MyModel (Model*): Pointer to the Model object used for rendering.
- MyCameras (std::vector<Camera*>): Pointers to the Camera objects of all used cameras.
- RenderedImages (std::vector<std::vector<cv::Mat>>, outer size C, inner size M): The simulated images; one entry per camera (C) and per parameter set (M).

Methods
Render(...)
This method should produce the simulated images. It has to pass the parameters to the model for transformation. The transformation matrices are then passed to OpenGL to adjust the vertices. Afterwards the surface is rendered and copied to CPU memory. It has to be able to simulate multiple different parameter sets simultaneously, and this for each camera view.
- Type (In, Camera::ImageType): Selects which image type to render.
- Parameters (In, std::vector<std::vector<double>>, M vectors of N values): The parameters for the transformation of the model; N: number of parameters (DOF of the model), M: number of different parameter sets to simulate (different poses).

GetImages(...)
- Type (In, Camera::ImageType): Selects which image type.
- RenderedImages (Return, const std::vector<std::vector<cv::Mat>>&): Const reference to all images; one entry per camera (C) and per parameter set (M).

GetImages(...)
- Type (In, Camera::ImageType): Selects which image type.
- IndexParameterSet (In, int): Selects which of the M parameter sets.
- IndexCamera (In, int): Selects which of the C cameras.
- RenderedImages (Return, const cv::Mat&): Const reference to the selected image.

Additional Ideas
Maybe this class should be abstract to allow the use of different rendering methods, for example a GPU based and a CPU based one.

Open Questions
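A possible shape of the Render() method, under the assumptions above (vertex skinning on the GPU, depth read back to CPU memory), is sketched below. UploadBoneMatrices(), DrawMesh(), SetProjection()/SetModelView() and the width/height members are placeholders and not part of the specification; note that glReadPixels() returns normalized depth buffer values, which still have to be linearized before they can be compared to metric camera depths.

    // Hypothetical body of Renderer::Render(): simulate one depth image per
    // camera (C) and per parameter set (M) and store it in RenderedImages.
    void Renderer::Render(Camera::ImageType type,
                          const std::vector<std::vector<double>>& parameterSets)
    {
        for (size_t c = 0; c < MyCameras.size(); ++c) {
            // Configure the simulated camera like the real one.
            SetProjection(MyCameras[c]->GetParameterMatrix(Camera::PROJECT_MAT));
            SetModelView(MyCameras[c]->GetParameterMatrix(Camera::VIEW_MAT));

            for (size_t m = 0; m < parameterSets.size(); ++m) {
                MyModel->Transform(parameterSets[m]);  // bone matrices for this pose
                UploadBoneMatrices(*MyModel);          // e.g. one glUniformMatrix4fv per bone
                glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
                DrawMesh(*MyModel);                    // draw call for the skinned mesh

                // Copy the depth buffer back to CPU memory (normalized [0,1] values,
                // linearization to metric depth omitted here).
                cv::Mat depth(height, width, CV_32FC1);
                glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, depth.data);
                RenderedImages[c][m] = depth;
            }
        }
    }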
Model Class

General Purpose
This class should contain the model information: the skeleton consisting of bones, the vertices of the mesh, the bone hierarchy and so forth. It should load the model and compute the transformation matrices for each bone. The transformation of the vertices will be performed on the GPU.

Annotation

Dependencies or Restrictions
The class should be independent of OpenGL.

Properties
- MyScene (Assimp::aiScene): Contains all model information: the bones, the hierarchy, the mesh, ...
- ParameterAssignment (std::map<int, std::pair<string, TrafoType>>): Assigns an index in the parameter vector to a bone of the model and a transformation type.
- BoneTransformations (std::vector<Assimp::aiMatrix4x4>): The transformation matrix for each bone.
- BoneAssignment (std::map<int, string>): Assigns the index of each bone transformation to a bone name.
- TrafoType (Enumeration): The type of transformation (rotation or translation) and its axis: RX, RY, RZ, TX, TY, TZ.

Methods
Transform(...)
This method should compute the transformation matrix of each bone from the given parameter vector and store it in BoneTransformations, so that the matrices can be passed to the Renderer.
- Parameters (In, std::vector<double>, N values): The parameters for the transformation of the model; N: number of parameters (DOF of the model).

GetTransformation(...)
This method returns a reference to the transformation matrix of the selected bone.
- BoneIndex (In, int): The index of the considered bone in BoneTransformations.
- Transformation (Return, const aiMatrix4x4&): The transformation of the bone.

Additional Ideas
The class could also be abstract to adapt to other models besides the Assimp models. However, Assimp supports a large set of different file types, so it is assumed to be sufficient.

Open Questions
It is not yet clear which other members are needed for the transformation and the bone hierarchy. Maybe the Assimp objects are sufficient, but maybe some additional member variables are needed for easier data access. This could also be done with pointers to the Assimp objects.
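A sketch of how Transform() could combine ParameterAssignment, the Assimp node hierarchy and BoneTransformations is given below. It assumes one rotation or translation per listed parameter, treats MyScene as the imported aiScene, and omits the bone offset matrices additionally needed for skinning; the concrete composition order is an assumption, not part of the specification.

    #include <assimp/scene.h>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical sketch of Model::Transform(): build a local transformation per
    // assigned bone from the parameter vector, then accumulate the node hierarchy.
    void Model::Transform(const std::vector<double>& parameters)
    {
        // 1. Parameter driven local transformation of each assigned bone.
        std::map<std::string, aiMatrix4x4> local;
        for (const auto& entry : ParameterAssignment) {
            const std::string& bone  = entry.second.first;
            const float        value = static_cast<float>(parameters[entry.first]);
            aiMatrix4x4 m;
            switch (entry.second.second) {
                case RX: aiMatrix4x4::RotationX(value, m); break;
                case RY: aiMatrix4x4::RotationY(value, m); break;
                case RZ: aiMatrix4x4::RotationZ(value, m); break;
                case TX: aiMatrix4x4::Translation(aiVector3D(value, 0, 0), m); break;
                case TY: aiMatrix4x4::Translation(aiVector3D(0, value, 0), m); break;
                case TZ: aiMatrix4x4::Translation(aiVector3D(0, 0, value), m); break;
            }
            local[bone] = local.count(bone) ? local[bone] * m : m;
        }

        // 2. Walk from each bone up to the root:
        //    global(bone) = bindPose(root) * ... * bindPose(bone) * local(bone).
        for (const auto& assignment : BoneAssignment) {
            aiMatrix4x4 global;  // identity
            for (const aiNode* n = MyScene.mRootNode->FindNode(assignment.second.c_str());
                 n != nullptr; n = n->mParent) {
                aiMatrix4x4 step = n->mTransformation;
                if (local.count(n->mName.C_Str()))
                    step = step * local[n->mName.C_Str()];
                global = step * global;  // parent contributions end up leftmost
            }
            BoneTransformations[assignment.first] = global;  // skinning offset matrix omitted
        }
    }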
CostFunction Class (abstract)

General Purpose
This class should compute the costs which are to be minimized by the optimization algorithm. The costs represent a measure for the similarity between the model based simulated images and the captured images. The class provides the interface for the inherited classes.

Annotation
For the optimization the class needs to implement a method for the computation of the gradient. Methods for nonlinear constraints are needed as well, together with the bounds of the parameter space. The costs themselves can be computed using the amount of overlapping pixels, the depth distance and so forth.

Considering the nonlinear constraints (NLC): it might be requested to add multiple nonlinear constraints. An NLC is represented as a function f(x) which has to fulfill f(x) > 0. Thus a vector of such functions is needed as a member of the class. Maybe an extra class NonLinearConstraints should be introduced.

Dependencies or Restrictions
The inherited CostFunction_Render class has a Renderer object assigned to obtain the simulated images. There should be one subclass which performs the computation of the costs (just the image similarity) on the GPU. It is important, however, that the Renderer is still able to output the simulated images to CPU memory.

Properties
- UpperBounds (std::vector<double>): The upper bound for each parameter.
- LowerBounds (std::vector<double>): The lower bound for each parameter.
- NLConstraints (std::vector<NonLinearConstraints>): All nonlinear constraints to consider during the optimization.
- ExternalPrior (void*): Function pointer to a function/method which computes an additional prior term for the cost function. Definition: double Prior(std::vector<double> Parameters, std::vector<cv::Mat> Images).
- UsedRenderer (Renderer*, only in the inherited class): Pointer to a Renderer object to obtain the simulated images.

Methods
Compute(...) - Computes the actual costs for multiple different parameter vectors.
- Parameters (In, std::vector<std::vector<double>>): Set of parameter vectors for which the costs should be computed; multiple different parameter vectors at once (used by the PSO).
- Images (In, std::vector<cv::Mat>): The captured images of all cameras.
- Data (In, void*): Additional data to pass; needs to be cast inside the method to a known data type (e.g. a struct); can be used to include previous parameters.
- Costs (Out, std::vector<double>): The costs for each parameter vector.

Gradient(...) - Computes the gradient of the cost function for multiple different parameter vectors.
- Parameters (In, std::vector<std::vector<double>>, M vectors of N values): Set of parameter vectors for which the gradient should be computed; multiple different parameter vectors at once; M: number of different vectors, N: number of parameters/values per vector.
- Grad (Out, std::vector<std::vector<double>>, M vectors of N values): The gradient for each parameter vector.

Additional Ideas
Maybe a motion model is useful as an additional term in the cost function. For example, a constant acceleration model can be assumed, and based on the last two poses/parameter vectors the new parameters can be predicted (similar to a Kalman filter).

Open Questions
It is not yet known how to deal with the nonlinear constraints and whether it is meaningful to implement them as a class.

Optimizer Class (abstract)

General Purpose
This class should perform the optimization of the model parameters on each frame. For this it uses a CostFunction object. Currently two approaches are pursued: one inherited class should implement a particle swarm optimization and the other should use the optimization library "NLOpt".

Annotation

Dependencies or Restrictions
Up to now two inherited classes will be implemented: Optimizer_PSO and Optimizer_NLOpt.

Properties
- UsedCostFunction (CostFunction*): Pointer to a CostFunction object.

Methods
Run(...) - Runs the optimization on one frame.
- Images (In, std::vector<cv::Mat>): The captured images of all cameras.
- Data (In, void*): Additional data to pass; needs to be cast inside the method to a known data type (e.g. a struct); can be used to include previous parameters.
- Parameters (InOut, std::vector<double>*): In: the initial parameters. Out: the fitted parameters.

Additional Ideas

Open Questions
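For the NLOpt based optimizer, the glue between Run() and the NLOpt C++ API could look roughly like the sketch below. The callback struct, the choice of the LD_SLSQP algorithm and the stopping criterion are assumptions; the CostFunction methods are called with the in/out signatures from the tables above, and the nonlinear constraints as well as the prior term are omitted.

    #include <nlopt.hpp>
    #include <vector>
    #include <opencv2/core.hpp>

    namespace {
    // Everything the NLOpt callback needs (hypothetical helper struct).
    struct NloptData {
        CostFunction*               costFunction;
        const std::vector<cv::Mat>* images;
        void*                       data;
    };

    // Adapter with the callback signature expected by nlopt::opt::set_min_objective().
    double Objective(const std::vector<double>& x, std::vector<double>& grad, void* p)
    {
        auto* d = static_cast<NloptData*>(p);
        if (!grad.empty()) {                       // gradient based algorithms request it
            std::vector<std::vector<double>> g;
            d->costFunction->Gradient({x}, g);
            grad = g.front();
        }
        std::vector<double> costs;
        d->costFunction->Compute({x}, *d->images, d->data, costs);
        return costs.front();
    }
    } // namespace

    void Optimizer_NLOpt::Run(const std::vector<cv::Mat>& images, void* data,
                              std::vector<double>* parameters)
    {
        NloptData cbData{UsedCostFunction, &images, data};

        nlopt::opt opt(nlopt::LD_SLSQP, parameters->size());  // local, gradient based
        opt.set_lower_bounds(UsedCostFunction->LowerBounds);
        opt.set_upper_bounds(UsedCostFunction->UpperBounds);
        opt.set_min_objective(Objective, &cbData);
        opt.set_xtol_rel(1e-4);                                // assumed stopping criterion

        double minCost = 0.0;
        opt.optimize(*parameters, minCost);  // parameters: initial pose in, fitted pose out
    }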
Class Diagram

Work Package Estimation
Package              Duration
1. Basic Framework   6 d * 8 h/d = 48 h
2. Camera Interface  9 d * 8 h/d = 72 h
3. NLOpt Optimizer   2 d * 8 h/d = 16 h
4. PSO Optimizer     3 d * 8 h/d = 24 h
5. Model Class       9 d * 8 h/d = 72 h
6. GPU Renderer      12 d * 8 h/d = 96 h
7. GPU CostFunc      7 d * 8 h/d = 56 h
8. MachineLearner    9 d * 8 h/d = 72 h
Sum: 456 h * 1.3 ≈ 593 h

Basic Framework
Tasks: Invent basic structure (2 d), Define requirements (2 d), Code class bodies (2 d)

Camera Interface
Tasks: Familiarization with the existing framework (2 d), Familiarization with the OpenGL camera transformation (2 d), Coding (3 d), Testing (2 d)

NLOpt Optimizer
Tasks: Familiarization with NLOpt (0.5 d), Coding (gradient) (1 d), Testing (0.5 d)

PSO Optimizer
Tasks: Familiarization with PSO (0.5 d), Coding (2 d), Testing (0.5 d)

Model Class
Tasks: Familiarization with Assimp (2 d), Familiarization with the Q2 model and transformation (2 d), Familiarization with the transformation for OpenGL (1 d), Coding (3 d), Testing (1 d)

GPU Renderer
Tasks: Familiarization with OpenGL (2 d), Familiarization with CUDA (2 d), Struggling with CUDA (3 d), Coding (3 d), Testing (2 d)

GPU CostFunction
Tasks: Struggling with CUDA (2 d), Coding (1 d), Adjustment of the cost function (2 d), Testing (2 d)

MachineLearner
Tasks: Selection of an approach (2 d), Familiarization with the selected approach (2 d), Specification of requirements for the approach (2 d), Coding (2 d), Testing (1 d)