Finding Baseball/Softball Fields in Aerial Photos
Jonathan Dautrich, [email protected]
Data Mining, UC Riverside, Professor Eamonn Keogh
November 29, 2009

1 Introduction & Strategy

This work applies data mining concepts to searching for baseball and softball fields (referred to simply as fields) in aerial/satellite photographs. Images were captured from Google Earth at a resolution of 977×1280 from an altitude of 4000 m. Ground truth was obtained by manually labeling the fields in each image. Four images were obtained: two were used for training and tweaking the classifier, and one was held out for final testing only. (The remaining image was discarded because it contained very few fields.) The images contained fields ranging from clearly defined grass-and-dirt infields to simple all-dirt circular infields. Note that fields at the very edge of an image were not labeled, as the detector only considered fields a sufficient pixel distance from the image edge.

2 Implementation Details

Dirt and grass regions were manually sampled. For grass, the Euclidean distance was computed between the RGB values of a pixel and those of the closest grass sample point. If the distance was less than a threshold, the pixel was labeled grass. The threshold was defined as the mean plus the standard deviation of the nearest-neighbor distances between the grass sample point values. The same process was used for labeling dirt, so some pixels could be classified as both dirt and grass. The labelings were binary, giving an image-sized boolean dirt map and grass map that mark each point as dirt/grass or not, respectively. (A short code sketch of this labeling step is given after Figure 1.)

The initial approach was to search for fixed-size circles of dirt (e.g., R = 4 pixel radius) for infields, then to consider a 90 degree wedge of a concentric circle with radius 4R (Figure 1). The area within this wedge but outside the circle was considered outfield and expected to be grass. Eight different wedge rotations were considered for each point. For a point to be labeled a field, some fixed fraction of its infield points had to be dirt, and some fixed fraction of its outfield points had to be grass, for at least one of its wedges.

This approach had several problems. First, it gave many false positives in regions labeled both grass and dirt. In an attempt to correct for this, a new fraction was introduced limiting the number of infield pixels that could be labeled both grass and dirt. Second, having three fixed parameters of this nature made the algorithm too complicated to tune, leaving poor results. Third, the algorithm failed to identify fields with grass infields, since it expected only full dirt circles. Fourth, its rigid restriction on infield size and its not-quite-natural specification of the outfield wedge failed to make use of the full amount of information normally used to identify a field. Results for this first approach were quite poor.

Figure 1: The two methods (the two layouts checked for) used for finding fields in images. Green is grass, tan is dirt. In both methods, the point to be labeled is at the center of the concentric circles defining the wedges; the first method's circles are labeled R = 4 and 4R, the second method's (2/3)R, R, and 2R.
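Before moving on to the second approach, here is a minimal MATLAB sketch of the dirt/grass labeling step described at the start of this section. The function is hypothetical (it is not the project's LabelPixels or PrepareTrainingData code); it assumes the sample points for one material are supplied as an N-by-3 matrix of RGB values, with at least two samples, and returns the boolean map for that material.

function [labelMap, threshold] = labelMaterial(img, samples)
% labelMaterial - hypothetical sketch of the per-pixel labeling step.
%   img:     H-by-W-by-3 RGB image (uint8 or double)
%   samples: N-by-3 matrix of RGB values sampled from one material (grass or dirt)
% Returns a logical H-by-W map marking pixels of that material and the
% distance threshold that was used.

img = double(img);
samples = double(samples);
[h, w, ~] = size(img);
pixels = reshape(img, h * w, 3);        % one row of RGB values per pixel
n = size(samples, 1);

% Distance from every pixel to its closest sample point.
nearestPix = inf(h * w, 1);
for i = 1:n
    d = sqrt(sum((pixels - samples(i, :)).^2, 2));
    nearestPix = min(nearestPix, d);
end

% Threshold = mean plus standard deviation of the nearest-neighbor
% distances among the sample points themselves (each sample to its
% closest *other* sample).
nnDist = zeros(n, 1);
for i = 1:n
    d = sqrt(sum((samples - samples(i, :)).^2, 2));
    d(i) = inf;                         % ignore the sample's distance to itself
    nnDist(i) = min(d);
end
threshold = mean(nnDist) + std(nnDist);

% A pixel is labeled as the material if it is closer than the threshold
% to at least one sample point.
labelMap = reshape(nearestPix < threshold, h, w);
end

Calling this once with the grass samples and once with the dirt samples would yield the boolean grass and dirt maps; as noted above, a pixel may end up in both.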
The second approach involved several major adjustments. Fields were identified by 90 degree wedges with various radii and the same origin. Three radii were specified: one for the grass portion of the infield, another for the dirt portion of the infield, and a third for the grass portion of the outfield. A range of pixel radii was considered for each point, consistent with measurements of fields at this altitude. Quick manual measurement of various fields showed that, in general, the dirt infield radius R ranges from 6 to 12 pixels, the grass infield radius is either zero or approximately (2/3)R, and the outfield radius is roughly 2R (Figure 1). Wedges were again considered at each of the eight rotations (0, 45, 90, ... degrees). The relevant fraction was computed for each wedge segment, for example, the fraction of pixels between the infield grass radius and the infield dirt radius that are dirt. The three fractions were then combined by multiplying them together, allowing a single threshold to be specified for the field test (averaging the fractions and specifying a threshold less than 2/3 would have allowed all-grass regions to be detected as fields). Another key addition was to require that the largest of the eight wedge fractions, if it exceeded the threshold, also be greater than the mean plus the standard deviation of the fractions over all eight wedges. This reduced the likelihood of labeling a large dirt-and-grass region as a field. This approach gave much better results, allowing both types of fields (grass and dirt-only infields) to be detected much more consistently.
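The wedge scoring just described can be illustrated with the following MATLAB sketch. It is a hypothetical illustration rather than the project's actual FieldFinder.m code: it evaluates a single candidate point for a single dirt infield radius R, assumes a (2/3)R grass infield (an empty segment, as in the zero-radius variant, is treated as a fraction of 1, which is an assumption), and uses made-up names such as scoreCandidate.

function isField = scoreCandidate(dirtMap, grassMap, row, col, R, threshold)
% scoreCandidate - hypothetical sketch of the second method's scoring for a
% single candidate point and a single dirt-infield radius R.
%   dirtMap, grassMap: logical maps from the pixel-labeling step
%   (row, col):        candidate field origin
%   R:                 dirt infield radius in pixels (roughly 6 to 12)
%   threshold:         product-of-fractions threshold

rGrassIn = (2/3) * R;     % grass infield radius (could also be 0)
rOut     = 2 * R;         % outfield radius

% Square neighborhood around the candidate, expressed as offsets from the
% origin (the candidate is assumed to be far enough from the image edge).
[h, w] = size(dirtMap);
rows = max(1, row - ceil(rOut)):min(h, row + ceil(rOut));
cols = max(1, col - ceil(rOut)):min(w, col + ceil(rOut));
[cc, rr] = meshgrid(cols - col, rows - row);
dist  = sqrt(rr.^2 + cc.^2);
angle = mod(atan2d(-rr, cc), 360);      % 0..360 degrees around the origin

dirt  = dirtMap(rows, cols);
grass = grassMap(rows, cols);

scores = zeros(1, 8);
for k = 0:7
    a0 = 45 * k;                                       % wedge start angle
    inWedge = mod(angle - a0, 360) < 90;               % 90 degree wedge

    seg1 = inWedge & dist <= rGrassIn;                 % grass infield
    seg2 = inWedge & dist > rGrassIn & dist <= R;      % dirt infield
    seg3 = inWedge & dist > R & dist <= rOut;          % grass outfield

    % Combine the three region fractions by multiplying them.
    scores(k + 1) = fracOr1(grass, seg1) * fracOr1(dirt, seg2) * fracOr1(grass, seg3);
end

% The best wedge must beat the fixed threshold and also stand out from the
% other rotations (mean plus standard deviation of all eight scores).
best = max(scores);
isField = best > threshold && best > mean(scores) + std(scores);
end

function f = fracOr1(map, mask)
% Fraction of masked pixels set in map; an empty segment counts as 1
% (an assumption made for the zero-radius grass infield case).
if any(mask(:))
    f = mean(map(mask));
else
    f = 1;
end
end

In the full detector this evaluation would be repeated for every candidate pixel and over the range of radii, presumably keeping a point if any combination passes.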
3 Performance

Measurement of true and false positive rates was first done by identifying blocks of adjacent pixels labeled as fields and considering their centers as field locations. A true positive occurred if an actual field center had a detected center within 10 pixels of it. A false positive occurred for every detected center more than 10 pixels away from any actual center. This could cause abnormalities when large regions were detected as fields, since such a region would be counted as only one field, or one false positive. The alternative approach used instead specified that any actual field center with an identified field point within 10 pixels is counted as a true positive, and any identified field point more than 10 pixels away from every actual field center is counted as a false positive. This approach does not seem entirely fair either, but it appeared more reasonable and gave results consistent with the expected true/false positive rate pairs of (0, 0) and (1.0, 1.0).

The results shown here (Figure 2) are for finding fields in image 4 based on dirt/grass labelings obtained from images 1 and 2, using the alternative accuracy approach. Image 4 was not used for training or parameter tweaking. Detection was run after parameters had been fixed for method one, with a couple of additional tests using different parameters once the results were seen to be so poor. The second method was not run on image 4 until the run that collected the results shown here.

Figure 2: ROC curves for tests on image 4 using dirt/grass from images 1 and 2, with the first (left) and second (right) accuracy measures.

The results of running detection on image 2 using dirt/grass labelings from images 1 and 2 are also shown (Figure 3). It is likely that the variability in dirt/grass color between aerial photos taken of different areas and under different atmospheric conditions makes it difficult to use dirt/grass samples from one area to closely identify dirt/grass patches in another. Finally, we show the results of running detection on image 4 using dirt/grass labelings taken from image 4 itself (Figure 4). These results are much better and more consistent, indicating that the major deficiency is the quality of the dirt/grass samples.

Figure 3: ROC curves for tests on image 2 using dirt/grass from images 1 and 2, with the first (left) and second (right) accuracy measures.

Figure 4: ROC curves for tests on image 4 using dirt/grass from image 4 itself, with the first (left) and second (right) accuracy measures.

Overall, results were rather poor, largely because dirt and grass regions were not detected consistently. Finding baseball fields in this setting seems to be a difficult problem when starting by identifying dirt and grass using RGB color values. A better approach to consider may involve some contrast considerations, trying to do edge detection along infield/outfield borders. Assigning pixels a probability of being grass or dirt, instead of a discrete yes or no, should also be a feasible improvement.

4 Matlab Notes

The Matlab code was vectorized where possible, but could doubtless still use several improvements in brevity and speed. In particular, the FieldFinder.m file contains a few dozen relevant lines of code, many of which are used in simple loops or selection structures. Detection via GatherResults on image 4, using grass/dirt points from images 1 and 2 and the 21 thresholds [0:0.05:1], took approximately 16 hours.

Below are complete labeling and detection execution instructions: labeling field points on images 1, 2, and 3; labeling grass and dirt points on images 1 and 2; and running detection for a single threshold (0.4) on image 3 (use GatherResults to cover multiple thresholds). See the files themselves for explanations and sample executions of the remaining files/functions, which are utilized by the ones below:

>> fieldThreshold = 0.4;
>> LabelImage('Fields-01');
>> LabelImage('Fields-02');
>> LabelImage('Fields-03');
>> LabelPixels('Fields-01','Dirt');
>> LabelPixels('Fields-01','Grass');
>> LabelPixels('Fields-02','Dirt');
>> LabelPixels('Fields-02','Grass');
>> [dirtThreshold grassThreshold] = PrepareTrainingData({'Fields-01','Fields-02'},1,1);
>> [dirtMap, grassMap] = ObtainMaps('Fields-02', dirtThreshold, grassThreshold);
>> [fieldMap, truePositive, falsePositive] = RunDetection('Fields-02',dirtMap,grassMap,fieldThreshold);
>> DisplayMaps(fieldMap, dirtMap, grassMap);

This should display the image with grass regions in yellow, dirt regions in blue, dirt-and-grass regions in red, and field points in white, where later colors in the list take precedence.
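As an illustration of this display, here is a short hypothetical sketch (not the actual DisplayMaps, which takes only the three maps); it overlays the maps on an explicitly supplied image with the precedence described above.

function ShowOverlay(img, fieldMap, dirtMap, grassMap)
% ShowOverlay - hypothetical sketch of a DisplayMaps-style overlay.
% Later rows in the list below take precedence over earlier ones:
% grass (yellow) < dirt (blue) < dirt-and-grass (red) < field (white).

out = im2double(img);                       % H-by-W-by-3, values in [0, 1]
layers = {grassMap,            [1 1 0];     % yellow
          dirtMap,             [0 0 1];     % blue
          dirtMap & grassMap,  [1 0 0];     % red
          fieldMap,            [1 1 1]};    % white

for i = 1:size(layers, 1)
    mask = layers{i, 1};
    rgb  = layers{i, 2};
    for c = 1:3
        channel = out(:, :, c);
        channel(mask) = rgb(c);             % paint this layer's color
        out(:, :, c) = channel;
    end
end

imshow(out);
end

It could be called with the aerial image loaded via imread and the maps returned by ObtainMaps and RunDetection above.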
Figure 5: Image 4 with detected field locations colored yellow and surrounded by blue circles, after detection using dirt/grass data also taken from image 4, with threshold 0.25.