
Active underwater object recognition
from multibeam sonar imagery
Ivor Rendulić
Laboratory for Underwater Systems and Technologies
Faculty of Electrical Engineering and Computing, University of Zagreb
Email: [email protected]
Abstract—Automatic object recognition is very hard to achieve underwater. Water turbidity and low lighting often give optical cameras a very limited range and result in poor quality images. Multibeam sonars, sometimes referred to as “acoustic cameras”, are not affected by these optical visibility problems. However, the images they produce can be quite noisy and lacking in detail.
In order to cope with low-detail images of objects, multiple views from different perspectives can be very helpful and provide the additional information needed to successfully recognize an object. The area of active object recognition deals with how to manipulate the sensor and from which perspective to approach the object in order to reduce the uncertainty of the recognition estimate. Having the sonar mounted on an Autonomous Underwater Vehicle (AUV) makes this scenario a perfect candidate for an active object recognition task.
Another issue with building a recognition system for a multibeam sonar is the unavailability of training data. Using synthetic 3D models and a sonar simulator to create images from different views will be considered as an alternative to recording large amounts of real data.
Index Terms—sonar, multibeam, active object recognition, underwater
I. INTRODUCTION
With the advancements in algorithms and processing power, object recognition from images or video has become very accurate. Old benchmarks (such as MNIST [1], a database for handwritten digit recognition) have become too easy, as even fairly simple models today achieve almost perfect accuracy [2]. The focus has shifted to the far more challenging general object recognition with a large number of object classes. The most popular modern benchmark is performance on the ImageNet test set [3], which has 1000 different classes.
These types of recognition tasks rely on having a large amount
of training data available from different viewpoints of the
object, and have to classify an unknown object solely based
on a single image of it.
The notion of active object recognition [4], which will be explained in more detail in later sections, implies that the recognition system also has some control over the sensor input. In the observed case, an underwater vehicle with a multibeam sonar can re-position itself to obtain a better viewing angle of the target and improve the probability of correct classification.
The rest of the paper is organized as follows. In Section II a general overview of related work dealing with next-best view planning and active object recognition will be given. Next-best view planning is one of the key steps in active object recognition, in which the best possible move for the system is calculated. An overview of the envisioned scenario and the motivation for using an active object recognition system will be given in Section III. After that, the system will be broken down into its main components in Section IV.
In Section V object representation and recognition from sonar
imagery will be discussed, as well as the need for defining
similarity between images. Using synthetic data for training
the classifier will be presented in Section VI.
In Section VII, algorithms for matching the observed images with the known training data, and for planning a path that maximizes the object recognition rate, will be explored. A next-best view approach will be used to find the path that best discriminates among the candidates for classification. Hidden Markov Models and Conditional Random Fields, along with well-known algorithms for their optimal solution, will be presented for the matching problem. Finally, in Section VIII a conclusion on all the presented methods will be given, together with plans for future work and an initial prototype of the system.
II. GENERAL OVERVIEW OF RELATED WORK
In this section a general overview of published work related to the subject of interest will be given. Later in the paper some of these works will be referenced again in more detail, in the specific sections related to their topic.
A. Active 3D model acquisition
Active planning with visual sensors is typically used for two major purposes. The first is 3D object synthesis, where a virtual 3D model of an object is built from scans of that object. The sensor (e.g. a camera) is usually mounted on a robot arm or a similar device which allows accurate positioning at the desired viewpoint. This problem is closely related to the aforementioned notion of next-best view planning, as it is desirable that the whole process is performed as quickly as possible. A calculation of the best scanning positions that will capture the whole object is then required.
Examples of next-best view planning for object scanning can be found in [5], where the author develops a method that uses a range camera to scan objects and build CAD models. The work is further improved in [6]. The algorithm is based on splitting the viewing volume into seen and unseen regions, and a novel representation called positional space is introduced. Compared to other algorithms at the time, it showed very good performance.
More recent work on next-best view planning for building 3D models can be found in [7], where stereo camera images and videos are used to improve 3D models. The authors focus on selecting viewpoints that minimize the uncertainty of the incrementally developed model.
B. Active object recognition with synthetic model training data
The other category of active sensor positioning problems with visual sensors is the one of interest for the problem described in this paper - active 3D object recognition - and the research performed in that area will be presented in more detail. The difference compared to active 3D model building lies in the criteria governing the process: instead of looking for a quick way of mapping the entire object, the goal is to recognize the object as quickly as possible.
In the following subsections a quick overview of the relevant
literature will be given.
1) Representation: Scale-invariant Feature Transform
(SIFT) [8] and similar related features have been extremely
popular for representation of objects in images.
In recent years, with advancements in deep artificial neural networks, automatically learned representations of data [9] have shown much better results in object recognition compared to traditional, hand-crafted features such as SIFT. Automatically learned features will also be considered for representing sonar data.
Another set of features, used specifically to describe shapes, are shape contexts [10]. Their use in [11] on silhouettes, which look similar to sonar images, showed good potential for the task.
2) Recognition: As for the image recognition part, for years the dominant algorithms were Support Vector Machines [12] and boosted Haar cascades [13]. They too have been replaced in recent years by various deep neural network architectures, which achieve much lower error rates.
3) Using synthetic model data: In 3D object recognition, and especially in cases where it is also important to detect the relative orientation of the object, using artificial data can be very helpful. Although such data will never be completely faithful to real data, it can be collected quickly and in large quantities using 3D models of objects. An additional advantage is perfectly accurate information on orientation and distance from the camera, which can also be obtained in simulators or 3D modeling tools. There are many papers dealing with the use of artificial 3D model data to train object recognition systems. While both [14] and [15] provide a rather dated overview of the field, some of the concepts they describe are still valid.
In [16] the authors tested the use of synthetic data obtained from 3D models, and concluded that much more training data is needed compared to the use of real images. A more complex system is built in [17], where 3D models are used to train a system capable of detection, recognition and pose estimation of objects in cluttered scenes. Using a Kinect camera, their system works in real time.
In [18] the authors focus on pose estimation and attempt to
get a precise estimate of IKEA furniture pose based on 3D
models.
Finally, a complete active object recognition system with synthetic 3D object models was developed in [11]. This paper will be referenced in many subsequent sections, as it covers a great deal of the problem addressed here.
All these model-based approaches share a view clustering step.
Since it is possible to sample the 3D model from any desired
view, it is important to limit the number of views. This is
usually done by clustering visually similar views.
4) Planning the next step for active object recognition: In [19] the authors deal with the closely related problem of improving object recognition when multiple model views are available. Although this technically does not fall under active object recognition, as it focuses solely on combining multiple views and does not influence their acquisition, it provides an alternative perspective and ideas that can be used in an active object recognition system.
In [20] the authors discuss the fusion of multiple views specifically for active object recognition.
Some of the ground-breaking research in planning for active object recognition was done in [21], where a Bayesian approach to the problem was introduced with the goal of minimizing uncertainty in every step of the process. Similar approaches were used in [22].
III. ENVISIONED SCENARIO
In this section the scenario for the proposed active recognition system, sketched in Figure 1, will be presented.
A multibeam sonar is mounted on an autonomous underwater vehicle (AUV). The vehicle is BUDDY, an AUV developed at the Laboratory for Underwater Systems and Technologies (LABUST) for the FP7 project “CADDY - Cognitive Autonomous Diving Buddy” [23]. The sonar used is a Soundmetrics ARIS 3000.
While scanning the seabed, if the AUV encounters a possible object of interest (e.g. a sea mine), it will take additional looks at it to better estimate whether it is indeed a mine or not. At first, the operator will manually mark the object of interest while the AUV is scanning, but later this step should also be automated.
Fig. 1. Visualization of the envisioned scenario
IV. OVERVIEW OF THE PROPOSED SYSTEM
In this section an overview of both the training system and
the real-time active recognition system will be given.
The training phase, shown in Figure 2, consists of several stages. First, synthetic images are generated with a sonar simulator from different views of the 3D models. Then, a transformation of the image is performed to obtain the chosen object representation.
Views are then clustered based on similarity in order to reduce the dimensionality of the view space and make the recognition stage easier. Finally, a classifier is trained to enable recognition of objects and of views of the object.
Fig. 2. Block diagram of the training phase with 3D models
In the active recognition phase, shown in Figure 3, the stages are the following. First, the sonar image is transformed into the chosen representation, the same as in the training phase. Then, the sequence of recorded sonar images is matched to the training data from 3D models, simultaneously improving the position and classification estimates with every new image. Finally, based on the calculated estimates, the next action is planned. This can be the next view to which the system should position itself, or outputting the recognized object if the confidence is high and enough data has already been collected.
Fig. 3. Block diagram of the active recognition process
V. SONAR IMAGE
An image can be obtained from the sonar in two different modes. The first is the native sonar geometry, where each vertical line corresponds to one of the beams. This type of image has a rectangular shape, but objects appear distorted in it because the beams are not actually parallel.
A natural image is obtained by mapping to Cartesian space and has the shape of a circular sector. This causes some issues at the edges during processing, as most algorithms work on rectangular images, but it offers a natural view of the objects.
Images of a straight wall can be seen in Figure 4 (native sonar polar geometry) and Figure 5 (mapped to Cartesian geometry). The straight line appears distorted in the polar geometry.
Fig. 4. Native sonar image
Fig. 5. Cartesian sonar image
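To make the geometry concrete, below is a rough Python/OpenCV sketch (not part of the original system) of resampling a beam-by-range sonar frame into a Cartesian sector. It assumes the polar image is an 8-bit array with rows as range bins and columns as evenly spaced beams; the field of view, maximum range and resolution values are hypothetical placeholders, not ARIS 3000 specifications.

```python
import numpy as np
import cv2

def sonar_to_cartesian(polar_img, fov_deg=30.0, max_range_m=5.0, px_per_m=100):
    """Map a beam-by-range sonar image (rows = range bins, columns = beams)
    into a Cartesian image shaped like a circular sector.

    Assumed layout: beam 0 is the leftmost beam, beams are evenly spaced
    over the field of view, and range grows with the row index."""
    n_ranges, n_beams = polar_img.shape
    half_fov = np.deg2rad(fov_deg) / 2.0

    # Output canvas: wide enough for the sector at maximum range.
    width = int(2 * max_range_m * np.sin(half_fov) * px_per_m) + 1
    height = int(max_range_m * px_per_m) + 1

    # For every output pixel, compute the (range, bearing) it corresponds to.
    xs = (np.arange(width) - width / 2.0) / px_per_m
    ys = np.arange(height) / px_per_m
    xg, yg = np.meshgrid(xs, ys)
    rng = np.hypot(xg, yg)                 # metres from the sonar head
    brg = np.arctan2(xg, yg)               # bearing relative to boresight

    # Convert (range, bearing) back to fractional (row, column) indices in the
    # polar image; cv2.remap interpolates and leaves the rest black.
    map_x = ((brg + half_fov) / (2 * half_fov) * (n_beams - 1)).astype(np.float32)
    map_y = (rng / max_range_m * (n_ranges - 1)).astype(np.float32)
    cart = cv2.remap(polar_img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT, borderValue=0)

    # Mask out pixels that fall outside the sector.
    cart[(rng > max_range_m) | (np.abs(brg) > half_fov)] = 0
    return cart
```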
A. Feature representation
Due to the appearance and quality of sonar images, some
standard feature representations normally used on camera
images might not be as appropriate. The sonar image is grayscale and extremely noisy. This is partly due to processing inaccuracies and partly due to small particles in the water, which can be highly visible in the sonar image but would be barely noticeable in an optical camera image.
1) Classical image features: The first set of features tested were those described in [24]. The features were tested in terms of optical flow calculation, to see how consistently they behave on two sonar images taken one right after the other, with very little movement in between. In low-frequency sonar mode, with objects at a distance of around 5 meters, they performed poorly, as the noise in the sonar image strongly influenced the features.
Figure 6 displays the original sonar image (left), the image after applying a Gaussian blur to filter out the noise (middle), and the image after an adaptive threshold algorithm used as a contour detection step (right).
Fig. 6. Contour detection in sonar images
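The contour detection step illustrated in Figure 6 could be sketched roughly as follows with OpenCV. The blur kernel, block size, offset and minimum contour area are illustrative values only, not the parameters actually used for the figure.

```python
import cv2

def detect_contours(sonar_img, blur_ksize=9, block_size=31, offset=-5, min_area=50.0):
    """Smooth the noisy 8-bit sonar image, binarize it with an adaptive
    threshold and return the contours of sufficiently large blobs."""
    # Gaussian blur suppresses the speckle-like noise of the sonar image.
    smoothed = cv2.GaussianBlur(sonar_img, (blur_ksize, blur_ksize), 0)

    # Adaptive thresholding copes with the uneven intensity across the image.
    binary = cv2.adaptiveThreshold(smoothed, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, block_size, offset)

    # Extract external contours (OpenCV 4 return signature) and drop tiny
    # ones, which are mostly residual noise.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) >= min_area]
```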
SIFT features [8] have also been tested, producing similarly inconsistent results. They were also tested on images of a human hand recorded at a distance of 1 meter, with the sonar operating in high-frequency mode. This improved matters, as the image quality is much better in that setting. However, the results were still far from those obtainable with normal video cameras.
The plan for the next step is to test the previously mentioned shape descriptors [10]. Based on the idea behind them, they could work well on a smoothed sonar image, after extracting the contour of the object of interest.
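For reference, a simplified sketch of a shape-context style descriptor in the spirit of [10] is given below. It is not the authors' exact formulation (no tangent-angle normalization or matching cost is included); it only illustrates the log-polar histogram idea applied to sampled contour points.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """Simple shape-context-like descriptor for every 2-D contour point:
    a log-polar histogram of where all the other points lie relative to it."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]            # pairwise offset vectors
    dist = np.hypot(diff[..., 0], diff[..., 1])
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Normalize distances by the mean pairwise distance for scale invariance.
    mean_dist = dist[dist > 0].mean()
    with np.errstate(divide="ignore"):
        log_r = np.log(dist / mean_dist)

    # Shared log-distance and angle bin edges for all points.
    r_edges = np.linspace(log_r[dist > 0].min(), log_r[dist > 0].max(), n_r + 1)
    t_edges = np.linspace(0, 2 * np.pi, n_theta + 1)

    descr = np.zeros((n, n_r, n_theta))
    for i in range(n):
        mask = dist[i] > 0                               # skip the point itself
        hist, _, _ = np.histogram2d(log_r[i][mask], angle[i][mask],
                                    bins=[r_edges, t_edges])
        descr[i] = hist / mask.sum()                     # normalize to a distribution
    return descr.reshape(n, -1)
```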
2) Contours: Contours are often used with multibeam sonar images. In particular, if there are not many objects surrounding the object of interest, it can appear highly prominent in the image. This approach was tested at LABUST in object tracking algorithms, where the sonar image was fused in an Extended Kalman Filter with Ultra-Short Baseline (USBL) acoustic tracking measurements [25].
In [26] the authors use contour-based detection, combined with background removal, to detect objects with a multibeam sonar.
Contour-based algorithms can also be used to build masks for approximate isolation of the object of interest.
B. Recognition algorithms
The recognition or classification step can work either directly on the raw image (or a part of it), or on computed features as mentioned in the previous subsection.
Some of the popular classification algorithms include Support
Vector Machines (SVM) [12], boosted Haar cascades [13],
Logistic Regression, Random Forest and Artificial Neural
Networks.
Support Vector Machines, in their simplest version, calculate an optimal separating hyperplane for linearly separable data. The hyperplane is optimal in the sense of maximizing the margin between the two classes, which it achieves by lying right in the middle between them. This is the simplest case of SVM, which has also been extended to non-linear cases (by using non-linear kernels instead of the dot product).
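As an illustration only, such a classifier could be trained with scikit-learn along the following lines; the feature vectors here are random placeholders standing in for contour-derived features, and the kernel and regularization values are arbitrary.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 200 feature vectors (e.g. contour-derived features), 3 classes.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 16)), rng.integers(0, 3, size=200)

# The RBF kernel corresponds to the non-linear extension mentioned above;
# probability=True gives per-class confidence estimates for later fusion.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=10.0, gamma="scale", probability=True))
clf.fit(X[:150], y[:150])
print(clf.predict_proba(X[150:]).shape)   # (50, 3) class-probability matrix
```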
The boosted Haar cascades algorithm was probably the most widely used algorithm for object detection and recognition in images and video during the last decade. It is based on two important concepts: the integral image representation, which can be computed very efficiently, and the boosting of many simple classifiers, each working on a subset of features.
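A minimal sketch of the integral image idea is shown below; it is a generic summed-area table, not the exact implementation from [13].

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] holds the sum of all pixels above and
    to the left of (y, x), inclusive."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle using four table look-ups,
    which is what makes Haar-like features so cheap to evaluate."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))   # 5 + 6 + 9 + 10 = 30
```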
Both the SVM (with a set of features calculated from the contours) and the boosted Haar cascades algorithm have been used in [27] for detecting a human hand and recognizing its gesture. They performed well, but only at short distances and with the sonar in high-frequency mode.
Artificial neural networks with many hidden layers and different structures (the so-called “Deep Learning”) have been the most successful and most widely used approach in the last several years, not only for object recognition in images but for all kinds of machine learning problems.
A type of network that could be very appropriate for the task is described in [28]. In that paper the authors present a way of efficiently using a convolutional neural network not only to recognize objects, but also to perform the detection step by learning to predict object boundaries. That way, detection could be automated within the same neural network that performs the recognition.
Neural networks are used with sonar data in [29]. However, the authors use a simple feed-forward network as a classifier and feed it with features obtained through image processing techniques. To the best of the author's knowledge, testing deep learning techniques such as deep convolutional neural networks on sonar data has not yet been reported in the literature.
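To make the idea concrete, the following is a small, untested PyTorch sketch of the kind of convolutional network that could be tried on single-channel sonar patches. The layer sizes and the combined object-and-view output are assumptions made for illustration, not a validated architecture.

```python
import torch
import torch.nn as nn

class SonarNet(nn.Module):
    """Small convolutional network for single-channel sonar image patches,
    outputting one score per (object class, view cluster) combination."""
    def __init__(self, n_outputs):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_outputs)

    def forward(self, x):                  # x: (batch, 1, H, W) grayscale patches
        return self.classifier(self.features(x).flatten(1))

model = SonarNet(n_outputs=40)             # e.g. 8 objects x 5 view clusters each
logits = model(torch.randn(4, 1, 96, 96))  # toy batch of four 96x96 patches
print(logits.shape)                        # torch.Size([4, 40])
```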
C. Similarity between images
A useful concept in active image recognition is a measure of similarity between two images. With this information the system can plan the next view to be at a location that best discriminates among the current candidates.
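One simple, illustrative choice for such a measure is the normalized cross-correlation of two equally sized views, sketched below; in practice the similarity could just as well be computed between feature vectors or contour descriptors.

```python
import numpy as np

def view_similarity(img_a, img_b):
    """Similarity of two equally sized grayscale views as their normalized
    cross-correlation: 1.0 means identical up to brightness gain and offset."""
    a = img_a.astype(float).ravel()
    b = img_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```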
An example of why this is important can be seen in the objects displayed in Figure 7. If the vehicle approaches the objects from the left, it is impossible to distinguish the two. The best way for the active object recognition system to act is to move to the side from which the two objects differ the most.
Fig. 7. Two objects appearing the same if approached from the left side
D. Clustering views
The question arises of how densely to sample the 3D view space. For now, sampling points on a constant-radius sphere around the object will be considered.
Some objects have finer detail than others and therefore require finer sampling of the view space. For example, a ball appears the same from any view the sonar might take of it, while an underwater shipwreck will look very different from different sides. Because of that, and by using the similarity of the resulting object images, multiple views can be iteratively clustered until a desired difference between clusters is achieved. This should, ideally, lead to a ball being represented by a single cluster, while a more complex object can be represented as accurately as desired.
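A possible sketch of this clustering is given below, using average-linkage hierarchical clustering over pairwise view dissimilarities. The cosine dissimilarity between raw pixel vectors used here is only a placeholder for whichever image similarity measure is finally adopted, and the distance threshold plays the role of the desired difference between clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_views(view_images, max_within_cluster_distance=0.2):
    """Group visually similar synthetic views of one object. A simple object
    such as a ball should collapse into a single cluster, while a complex
    wreck keeps many clusters."""
    flat = np.stack([v.astype(float).ravel() for v in view_images])
    flat /= np.linalg.norm(flat, axis=1, keepdims=True) + 1e-9

    # Pairwise dissimilarity: 1 - cosine similarity between normalized views
    # (any of the image similarity measures discussed above could be used).
    dist = np.clip(1.0 - flat @ flat.T, 0.0, None)
    np.fill_diagonal(dist, 0.0)

    # Average-linkage hierarchical clustering, cut at the desired dissimilarity.
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=max_within_cluster_distance, criterion="distance")

# Toy example: 20 random "views" of size 32x32.
labels = cluster_views([np.random.rand(32, 32) for _ in range(20)])
print(labels)
```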
VI. USING 3D MODELS AS TRAINING DATA
In order to achieve high precision in classification, a large amount of training data is necessary, and this need grows rapidly with the number of classes. With massive amounts of labeled image data becoming available, object recognition from camera images can be done with virtually as much data as the training system can handle.
Unfortunately, in the case of multibeam sonar imagery, the situation is far worse. Sonars are expensive, both to purchase and to operate, and it is almost impossible to find recorded images suitable for training the desired classifier.
Because of the challenges in acquiring real data, and the relative simplicity of sonar images, an alternative way might be to use 3D models of objects and a sonar simulator to scan them.
The tool that will be used to build a sonar simulator is UWSim [30], a popular simulator for underwater applications. It offers a virtual range sensor primitive, which can be used to build an array that resembles a multibeam sonar.
An alternative (or complementary) way might be to use real images to generate a much larger set of synthetic images for training. This can be done in a simple way, by introducing noise or transformations to an image. Another approach is developed in [31], where the authors use a real image to find the rendering parameters for creating synthetic images.
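A minimal sketch of the simple augmentation route (noise and small geometric transformations applied to a real sonar image) could look as follows; the rotation, scale and noise ranges are placeholder values.

```python
import numpy as np
import cv2

def augment(img, rng=np.random):
    """Create one noisy, slightly transformed variant of a real sonar image,
    to enlarge a small set of recordings into a bigger training set."""
    h, w = img.shape
    # Random small rotation and scaling around the image centre.
    angle = rng.uniform(-10, 10)
    scale = rng.uniform(0.9, 1.1)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    out = cv2.warpAffine(img, M, (w, h), borderValue=0)
    # Additive Gaussian noise roughly mimics the speckle seen in real sonar data.
    noise = rng.normal(0, 8, size=out.shape)
    return np.clip(out.astype(float) + noise, 0, 255).astype(np.uint8)

variants = [augment(np.zeros((64, 64), dtype=np.uint8)) for _ in range(5)]
```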
VII. PATH PLANNING FOR ACTIVE RECOGNITION
A. Matching images to objects and views
Given an image obtained from the sonar (or a sequence of
images) and the information of the vehicle’s path, the system
needs to find the likely candidates and match the sonar and
positioning data with object models. This is the crucial step
in the system.
In [11] the authors use Conditional Random Fields (CRF) to perform the matching. More specifically, given a sequence of images F = {f1, ..., fT}, they seek the matching sequence of object views from the training data, V = {v1, ..., vT} (which is primarily parametrized by the object itself and the viewpoint in the view space, along with a few parameters they define). They introduce the CRF to calculate the conditional
probability of V given F:

P(V|F) = \frac{1}{Z(F)} \prod_{i=1}^{T} P(v_i|F)\, P(v_i, v_{i-1}|F).   (1)
In the equation above, Z(F) is the partition function and T is the number of images in the sequence.
That approach seems very intuitive for the problem, and leads to an efficient CRF solution with a forward-backward algorithm. The next step is to extend it with the position probability P(S), S = {s1, ..., sT}, which can be obtained from the vehicle's navigation filter. The probability distribution over model views then depends on both the recorded sonar images F and the position of the vehicle, so the new target probability is P(V|F, S).
An alternative way of modeling the problem is a Hidden Markov Model (HMM), which also describes the process quite intuitively. The observations are the obtained sonar images, while the underlying process has the object views as states. The state transition probabilities are influenced by the position estimates.
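Under this HMM view of the problem, the filtering step can be sketched as a standard forward pass; the observation likelihoods and the motion-shaped transition matrix are assumed to come from the classifier and the navigation filter respectively, and this is not the CRF formulation of [11].

```python
import numpy as np

def forward_filter(obs_likelihood, transition, prior):
    """Forward pass of an HMM over object views.

    obs_likelihood: (T, N) array, P(sonar image_t | view state n), e.g. taken
                    from the classifier's per-view scores.
    transition:     (N, N) array, P(view_t = j | view_{t-1} = i), shaped by the
                    vehicle's estimated motion between pings.
    prior:          (N,) initial distribution over views.
    Returns the filtered posterior over views after every image."""
    T, N = obs_likelihood.shape
    alpha = np.zeros((T, N))
    belief = prior * obs_likelihood[0]
    alpha[0] = belief / belief.sum()
    for t in range(1, T):
        predicted = alpha[t - 1] @ transition       # propagate through motion
        belief = predicted * obs_likelihood[t]      # weight by the new image
        alpha[t] = belief / belief.sum()
    return alpha

# Toy example: three views, uniform motion model, two images.
L = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
A = np.full((3, 3), 1.0 / 3.0)
print(forward_filter(L, A, prior=np.full(3, 1.0 / 3.0)))
```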
B. Planning the next action
The final step in the system is deciding which action to take next. In the trivial case, when the certainty about some object is high enough, no further views of the object are needed.
In the non-trivial case, there is a subset of object candidates M = {m1, ..., mC} which are all still above some likelihood threshold and cannot be discarded. The goal is then to move in the direction where the candidates are the least similar. In a simple case, there are two candidate objects m1 and m2, and a similarity map between each pair of views is available. Given the current estimates of the object orientations, there is a view vi which gives the smallest similarity measure D(m1, m2|vi). Any number of criteria could be devised for calculating the next step; in the simplest one the goal is to take the shortest path towards vi.
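A toy sketch of this selection step is given below; the similarity_map structure is hypothetical, and the criterion generalizes the two-candidate case by scoring each view with the worst-case (largest) similarity over all remaining candidate pairs.

```python
import numpy as np

def next_best_view(similarity_map, candidates, views):
    """Pick the view that best separates the remaining candidate objects.

    similarity_map[(m_a, m_b)][v] is assumed to hold the similarity of objects
    m_a and m_b when both are seen from view v (keys ordered as in candidates)."""
    best_view, best_score = None, np.inf
    for v in views:
        # Worst case over candidate pairs: how alike the two most confusable
        # objects still look from this view. Smaller is better.
        score = max(similarity_map[(a, b)][v]
                    for i, a in enumerate(candidates)
                    for b in candidates[i + 1:])
        if score < best_score:
            best_view, best_score = v, score
    return best_view   # the shortest path towards this view is then planned

# Toy example with two candidates and three views; view 1 separates them best.
sim = {("mine", "rock"): {0: 0.9, 1: 0.4, 2: 0.7}}
print(next_best_view(sim, ["mine", "rock"], views=[0, 1, 2]))   # -> 1
```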
VIII. CONCLUSION
In this paper an overview of the literature on active object recognition and on using synthetic 3D models to train recognition systems was given. The use case and the motivation for such a system, with an Autonomous Underwater Vehicle and a multibeam sonar, were presented. Given the many specifics of sonar imagery, an analysis of representation and recognition algorithms was made. Testing and analysis of popular approaches often used with optical camera images showed that some of them do not work as well on sonar images, so alternative approaches have to be taken. Novel approaches, such as deep convolutional neural networks and similar deep learning structures, have yet to be tested on sonar imagery and are not yet reported in the literature.
Active object recognition with synthetic 3D model training data has been researched before and provides a good base for the envisioned system. Using an AUV introduces additional positional uncertainty, which will have to be included in the model in the critical step where the sensor data is matched with the known training data. The use of probabilistic approaches, such as Conditional Random Fields or Hidden Markov Models, was considered and is promising for the task, giving a reasonable description of the model and efficient algorithms for its optimal solution.
As future work, a sonar simulator has to be built to create a database of synthetic training images from 3D models. More recognition algorithms will be tested, together with the clustering step, to see how well the recognition process scales with an increasing number of objects and views. Those two steps will form the foundation for building the entire active recognition system.
REFERENCES
[1] Y. LeCun, C. Cortes, and C. J. Burges, “The MNIST database of handwritten digits,” 1998.
[2] L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, L. D. Jackel,
Y. LeCun, U. A. Muller, E. Sackinger, P. Simard et al., “Comparison
of classifier methods: a case study in handwritten digit recognition,”
in International conference on pattern recognition. IEEE Computer
Society Press, 1994, pp. 77–77.
[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet:
A large-scale hierarchical image database,” in Computer Vision and
Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE,
2009, pp. 248–255.
[4] D. Wilkes and J. K. Tsotsos, “Active object recognition,” in Computer
Vision and Pattern Recognition, 1992. Proceedings CVPR’92., 1992
IEEE Computer Society Conference on. IEEE, 1992, pp. 136–141.
[5] R. Pito and R. K. Bajcsy, “Solution to the next best view problem for automated CAD model acquisition of free-form objects using range
cameras,” in Photonics East’95. International Society for Optics and
Photonics, 1995, pp. 78–89.
[6] R. Pito, “A solution to the next best view problem for automated surface acquisition,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 21, no. 10, pp. 1016–1030, 1999.
[7] E. Dunn and J.-M. Frahm, “Next best view planning for active model
improvement.” in BMVC, 2009, pp. 1–11.
[8] D. G. Lowe, “Object recognition from local scale-invariant features,” in
Computer vision, 1999. The proceedings of the seventh IEEE international conference on, vol. 2. IEEE, 1999, pp. 1150–1157.
[9] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A
review and new perspectives,” IEEE transactions on pattern analysis
and machine intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[10] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object
recognition using shape contexts,” IEEE transactions on pattern analysis
and machine intelligence, vol. 24, no. 4, pp. 509–522, 2002.
[11] A. Toshev, A. Makadia, and K. Daniilidis, “Shape-based object recognition in videos using 3D synthetic object models,” in Computer Vision and
Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE,
2009, pp. 288–295.
[12] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning,
vol. 20, no. 3, pp. 273–297, 1995.
[13] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition,
2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society
Conference on, vol. 1. IEEE, 2001, pp. I–511.
[14] A. R. Pope, “Model-based object recognition,” A Survey of Recent
Techniques, Technical Report, 1994.
[15] V. Blanz, B. Schölkopf, H. Bülthoff, C. Burges, V. Vapnik, and T. Vetter,
“Comparison of view-based object recognition algorithms using realistic
3d models,” in International Conference on Artificial Neural Networks.
Springer, 1996, pp. 251–256.
[16] B. Heisele, G. Kim, and A. Meyer, “Object recognition with 3D models.”
in BMVC. Citeseer, 2009, pp. 1–11.
[17] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige,
and N. Navab, “Model based training, detection and pose estimation of
texture-less 3D objects in heavily cluttered scenes,” in Asian conference
on computer vision. Springer, 2012, pp. 548–562.
[18] J. J. Lim, H. Pirsiavash, and A. Torralba, “Parsing IKEA objects: Fine
pose estimation,” in 2013 IEEE International Conference on Computer
Vision. IEEE, 2013, pp. 2992–2999.
[19] V. Ferrari, T. Tuytelaars, and L. Van Gool, “Integrating multiple model
views for object recognition,” in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society
Conference on, vol. 2. IEEE, 2004, pp. II–105.
[20] F. Deinzer, J. Denzler, and H. Niemann, “On fusion of multiple views
for active object recognition,” in Joint Pattern Recognition Symposium.
Springer, 2001, pp. 239–245.
[21] J. Denzler and C. M. Brown, “Information theoretic sensor data selection
for active object recognition and state estimation,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 145–157,
2002.
[22] N. Govender, J. Warrell, P. Torr, and F. Nicolls, “Probabilistic object and
viewpoint models for active object recognition,” in AFRICON, 2013.
IEEE, 2013, pp. 1–7.
[23] CADDY FP7 project. [Online]. Available: http://caddy-fp7.eu/
[24] J. Shi and C. Tomasi, “Good features to track,” in Computer Vision
and Pattern Recognition, 1994. Proceedings CVPR’94., 1994 IEEE
Computer Society Conference on. IEEE, 1994, pp. 593–600.
[25] F. Mandić, I. Rendulić, N. Mišković, and D. Nad, “Underwater object
tracking using sonar and USBL measurements,” Journal of Sensors, vol.
2016, 2016.
[26] E. Galceran, V. Djapic, M. Carreras, and D. P. Williams, “A real-time
underwater object detection algorithm for multi-beam forward looking
sonar,” IFAC Proceedings Volumes, vol. 45, no. 5, pp. 306–311, 2012.
[27] F. Gustin, I. Rendulic, and N. Miskovic, “Hand gesture recognition from
multibeam sonar imagery,” in Conference on Control Applications in
Marine Systems. IFAC, 2016, in press.
[28] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using
convolutional networks,” arXiv preprint arXiv:1312.6229, 2013.
[29] J. Han, P. Yang, and L. Zhang, “Object recognition system of sonar
image based on multiple invariant moments and BP neural network,” International Journal of Signal Processing, Image Processing and Pattern
Recognition, vol. 7, no. 5, pp. 287–298, 2014.
[30] M. Prats, J. Pérez, J. J. Fernández, and P. J. Sanz, “An open source
tool for simulation and supervision of underwater intervention missions,”
in 2012 IEEE/RSJ International Conference on Intelligent Robots and
Systems. IEEE, 2012, pp. 2577–2582.
[31] A. Rozantsev, V. Lepetit, and P. Fua, “On rendering synthetic images for
training an object detector,” Computer Vision and Image Understanding,
vol. 137, pp. 24–37, 2015.