An example of addressing sampling bias in geosciences – orientation data.
Harmon Maher, Dept. of Geography and Geology, University of Nebraska at Omaha. 2013

What is bias?

Typically the data that is collected, the sample population, is hoped to be representative of a larger population (sometimes called the universal population). In this way patterns or behavior seen in the sample population can be extrapolated to a larger entity. This is why careful thought must be given to the sampling plan right from the start of a project. Sampling bias exists when the sample population is, in some consistent manner, not representative of the larger universal population.

Here we will focus on orientation data as a way to explore sampling bias. For a given region and rock body it is typical to collect a number of orientation readings on a structure of interest (such as joints), plot them, find an average value (a measure of central tendency), and then use that sample average to characterize the orientation of that structural feature. We will see below that there is significant potential for sampling bias when working with orientation data.

Such sampling bias is naturally different from a personal bias toward a certain hypothesis, which can also influence data collection on a conscious or subconscious level. The best antidote for such personal bias is the knowledge that other scientists will be eager to point it out to your detriment, and so it is worthwhile to avoid it. As often occurs in science, the word bias as used here in association with sampling is a case where the scientific definition differs from the common one.

Why is it important?

Bias is to be avoided for the obvious reason that it leads to incorrect answers or models. For example, orientation data can be a primary input into groundwater flow models where fracture permeability is important.
If your sample average orientation is not representative of the rock body, your groundwater flow model will give incorrect results, and a contaminant plume could have a much different footprint than predicted. In other words, bias influences accuracy.

Another reason it is important is that, if recognized, bias can be avoided. Ideally this would involve changing the sampling plan in order to eliminate the bias. However, in geology, where access to the three-dimensional rock body of interest is often limited to two-dimensional outcrop surfaces, bias may be unavoidable. In these cases, there may be a data weighting that can correct for the bias. This is discussed below.

One reason sampling bias can be quite tricky is that there typically is what may be considered a natural bias in what types of rocks get exposed. For example, more fractured rock typically weathers more, and therefore crops out less, than less fractured rock. Therefore, studies of outcrop fracture density will in general underestimate the density of fractures (unless the exposure surface is an artificial, construction-related one). It is also not unreasonable to believe that the pattern of orientations in more fractured rock is more complex than that in nearby less fractured rock. In other words, the fracture density and orientation pattern can covary spatially in a study area of interest, and in that area there is a natural bias where the outcrops are biased towards what is going on in the less fractured rock.

One response to recognizing such bias is to build it into, and appropriately limit, the generalizations made from the data. In this particular case, for example, the conclusions may state that the resultant groundwater flow model may not capture zones of higher fracture density that may exist. Of course such limitations make the conclusions less useful.
Another approach could be to analyze for a relationship between fracture density, fracture orientation pattern, and outcrop pattern in the exposed portions, and use this as a basis to extrapolate to the unexposed areas. This would be an interesting, albeit challenging, endeavor. There is no general formula for dealing with biases, and the response is on a case-by-case basis. The key thing is to give very careful thought to potential outcrop biases and related sampling biases.

Exposure bias when collecting orientation data?

When doing structural analysis, fairly good and large outcrops are required, and it is not uncommon that they come in two general types: cliff outcrops and pavement outcrops. Each type has its own potential bias simply because of its orientation.

We will start with cliff outcrops, and consider the simpler case where the fractures of interest are subvertical. This simplifies the geometry and is also a fairly common situation geologically. Usually, cliff outcrops are fairly straight; they have a preferred orientation of their own. That preferred orientation is very typically not independent of the fracture orientation, which controls mass wasting processes. Instead there is typically one fracture direction close to parallel to the cliff face, and another highly oblique to the cliff face. As you travel along the cliff you will see the fracture direction that is highly oblique to the cliff face much more frequently than the fracture direction parallel to the cliff face. Thus, if the sampling plan is one of measuring available fracture orientations, the oblique direction gets sampled and characterized much more thoroughly than the other. Looking at the orientation plots without considering the sampling bias easily leads to misinterpreting the oblique fracture set as the dominant fracture set, when that may not be the case.
Figure 1: The red lines represent an underlying vertical fracture set, and the blue lines traverses of equal line length.

Figure 1 shows the geometric nature of this bias, which also provides a basis for correcting for it. The red lines represent an underlying subvertical fracture map pattern (view looking down from above) that is being sampled. Imagine that the various blue lines in the circle represent different cliff directions of constant length that you are sampling along. Count how many times you would encounter and measure a fracture for each different traverse. The closer the cliff traverse is to parallel to the fracture direction, the fewer times you encounter a member of the fracture set. If you consider the average spacing between fractures (measured perpendicular to the fracture direction) and the acute angle between the prevailing fracture direction and the traverse, you can write a simple trig function that predicts the relationship between traverse direction and fracture encounter frequency. This in turn can be used as a weighting function. The geometry and function are displayed in Figure 2, and the graph there shows the results. Note that the weighting factor is non-linear, and that when the angle is less than 30 degrees or so it begins to become quite significant. Such small angles are not an uncommon situation.

[Graph: weighting factor (vertical axis, 0 to 25) versus angle between strike and traverse (horizontal axis, 0 to 100 degrees)]

Figure 2: The geometry and a graph of the weighting factor to remove the bias due to traverse direction.

The basic idea behind weighting is that not every reading is counted the same. Typically, when plotting and analyzing orientation data, each reading is treated the same. In a circular histogram (such as a rose diagram), each reading is given a unit vector value.
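The trig relationship described above is not written out in the text, so the following is a minimal sketch of one reading of it. It assumes a straight traverse of length L across a set of parallel vertical fractures with perpendicular spacing s. Under that assumption the expected number of encounters is L·sin(θ)/s, where θ is the acute angle between fracture strike and traverse, and the weighting factor that restores equal representation is 1/sin(θ). The function names and example values are illustrative and are not from the original.

```python
import math

def expected_encounters(traverse_length, spacing, angle_deg):
    """Expected fracture crossings along a straight traverse.

    Assumes parallel, vertical fractures with constant perpendicular spacing;
    angle_deg is the acute angle between fracture strike and traverse.
    """
    return traverse_length * math.sin(math.radians(angle_deg)) / spacing

def weighting_factor(angle_deg):
    """Weight that offsets the reduced encounter rate: 1 / sin(angle)."""
    if not 0.0 < angle_deg <= 90.0:
        raise ValueError("angle must be in (0, 90] degrees")
    return 1.0 / math.sin(math.radians(angle_deg))

if __name__ == "__main__":
    # Hypothetical values: a 100 m traverse across fractures spaced 2 m apart.
    for angle in (90, 60, 30, 10, 5):
        n = expected_encounters(100.0, 2.0, angle)
        w = weighting_factor(angle)
        print(f"{angle:3d} deg  encounters = {n:5.1f}  weight = {w:5.2f}")
```

Under these assumptions the product of encounters and weight is the same for every traverse direction, which is the sense in which the weighting removes the directional bias. The weight grows quickly at small angles, in line with the non-linear shape described for the graph.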
In the case considered here, a fracture reading at 30 degrees to the cliff face would have a weighting factor of 2 and so would count twice as much as one at 90 degrees to the cliff face.

In reality, the cliff face won’t be perfectly straight. In this case a more nuanced approach that takes into account the variability of the cliff face orientation can be used. In general this situation will lead to smaller weighting factors, as a planar cliff face would provide the strongest bias effect. Consider the end-member case of two cliff faces at right angles to each other and of equal length. In this case, for the special case considered here of vertical fractures, the bias is largely removed, because a fracture set that is nearly parallel to one face is nearly perpendicular to the other. If the outcrop permits, then a sampling plan of two traverses of equal length at right angles to each other can largely eliminate this bias. For cases where this is not possible, one approach would be to break up the traverse into different approximately planar portions of the cliff surface, connect each fracture strike reading to the orientation of the cliff face where it was measured, and make the corresponding corrections. This is fairly easily done in Excel.

An important by-product of the directional bias that a linear traverse (e.g. along a cliff face) produces concerns the determination of the orientation distribution (normal or otherwise), and specifically measures of dispersion around a central tendency. A traverse that is closer to parallel to a fracture direction will yield fewer readings of that set, and the greater the number of readings, the better a distribution shape and measures of dispersion are defined. The result is that the distribution and measures of dispersion are less well defined than for a traverse that is closer to perpendicular to the fracture direction. Measures of fracture population dispersion are often not reported in the geologic literature, but as fracture analysis becomes more sophisticated they will likely become more important.
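The segment-by-segment correction described above is attributed to a spreadsheet, but the same bookkeeping can be sketched in a few lines of Python. The sketch below assumes each reading is stored as a (fracture strike, local cliff-face strike) pair in degrees and uses 1/sin of the acute angle between them as the weight. The weight cap (so that readings nearly parallel to the face do not dominate), the function names, and the use of doubled angles to average axial (0 to 180 degree) data are assumptions made for illustration and are not part of the original.

```python
import math

def acute_angle(strike_deg, face_deg):
    """Acute angle (0-90 degrees) between two horizontal lines given as strikes."""
    d = abs(strike_deg - face_deg) % 180.0
    return min(d, 180.0 - d)

def reading_weight(strike_deg, face_deg, max_weight=10.0):
    """1/sin(angle) weight for one reading, capped at max_weight (an assumed limit)."""
    s = math.sin(math.radians(acute_angle(strike_deg, face_deg)))
    return max_weight if s * max_weight < 1.0 else 1.0 / s

def weighted_axial_stats(readings, max_weight=10.0):
    """Weighted mean strike and mean resultant length for (strike, face) pairs.

    Strikes are axial data, so angles are doubled before averaging.
    Returns (mean_strike_deg, resultant_length); a resultant length near 1
    means tightly clustered readings, near 0 means widely dispersed ones.
    """
    c = s = total = 0.0
    for strike, face in readings:
        w = reading_weight(strike, face, max_weight)
        c += w * math.cos(math.radians(2.0 * strike))
        s += w * math.sin(math.radians(2.0 * strike))
        total += w
    mean = (0.5 * math.degrees(math.atan2(s, c))) % 180.0
    return mean, math.hypot(c, s) / total

# Hypothetical traverse: two roughly planar cliff segments striking 090 and 070.
readings = [(5, 90), (175, 90), (10, 90), (80, 90), (85, 70), (0, 70)]
mean_strike, resultant = weighted_axial_stats(readings)
print(f"weighted mean strike = {mean_strike:.1f}, resultant length = {resultant:.2f}")
```

Calling `weighted_axial_stats(readings, max_weight=1.0)` gives every reading a weight of 1, which is the unweighted result. Comparing the two shows how much the traverse direction alone shapes the apparent mean and spread.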
To see why dispersion matters, imagine two cases where the mean orientation is the same, but the standard deviation is quite different. The degree of anisotropy is much better developed for the case where the standard deviation is much smaller. In turn the degree of anisotropy will strongly influence properties such as seismic anisotropy, strength anisotropy, and directional fluid flow. Interestingly, the influence may be more nuanced than one might think. For example, very planar fractures do not intersect each other often and so have lower fracture connectivity than a suite of fractures that is more braided in character and has a much higher connectivity.

Now let’s consider pavement outcrops, which may occur at construction sites, as wave-cut terraces along lake shores, or over rocks that are more resistant to erosion, such as granites. If we first consider the simpler case where the fractures are again subvertical, we can then consider whether there is a bias simply due to the shape of the outcrop surface. The answer depends on your sampling plan. If you measure the line length of all the fractures exposed there will be no bias due to the shape of the outcrop. To explore this, consider a grid with a shape superimposed on it, as in Figure 3. The grid lines can be thought of as representing two equally spaced (equally dense) and geometrically perfect fracture sets. If you measure the total length of the lines enclosed in the shape in one direction versus the other you will find that they are very similar, and so the line length represents the proportionality of the grid overall. You can do more experiments along these lines to see if the line length accurately represents the fracture density.
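The grid experiment just described can also be run numerically. The following is a minimal sketch, assuming a tilted ellipse as a stand-in for the irregular outcrop outline and two perpendicular sets of equally spaced lines as the fracture sets. The shape, its dimensions, and the function names are hypothetical choices and are not taken from Figure 3.

```python
import math

def inside_outcrop(x, y, a=10.0, b=3.0, tilt_deg=25.0):
    """Hypothetical outcrop outline: an ellipse (semi-axes a, b) tilted from the grid."""
    t = math.radians(tilt_deg)
    # Rotate the point into the ellipse's own axes before testing it.
    xr = x * math.cos(t) + y * math.sin(t)
    yr = -x * math.sin(t) + y * math.cos(t)
    return (xr / a) ** 2 + (yr / b) ** 2 <= 1.0

def exposed_length(direction_deg, spacing=0.5, half_extent=12.0, step=0.01):
    """Total trace length of one set of parallel lines that falls inside the outline."""
    d = math.radians(direction_deg)
    ux, uy = math.cos(d), math.sin(d)   # unit vector along the fractures
    nx, ny = -uy, ux                    # unit vector across the fractures
    n_lines = int(half_extent / spacing)
    n_steps = int(half_extent / step)
    total = 0.0
    for k in range(-n_lines, n_lines + 1):
        ox, oy = k * spacing * nx, k * spacing * ny   # a point on the k-th line
        for i in range(-n_steps, n_steps + 1):
            # Walk along the line in small steps and add up the exposed part.
            if inside_outcrop(ox + i * step * ux, oy + i * step * uy):
                total += step
    return total

length_a = exposed_length(0.0)    # first fracture set
length_b = exposed_length(90.0)   # second set, at right angles to the first
print(f"set A: {length_a:.1f}  set B: {length_b:.1f}  ratio: {length_a / length_b:.3f}")
```

Swapping `inside_outcrop` for a different outline, or giving the two sets different spacings, is one way to carry out the further experiments suggested above.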
Figure 3: Grid with a superimposed elongate and irregular shape outlined in red, representing an outcrop exposure (within the red area) and an underlying fracture pattern of geometrically perfect and identical fracture sets at right angles to each other.

However, if you measure fractures along traverses within the outcrop surface, a potential for bias exists. This is basically the same bias that is described above for a line traverse, where depending on the relative orientations of the traverse and a fracture set direction, some fractures will be seen much less often. To correct for this one can either: a) weight readings in a similar manner to the above, or b) use an equal line length of two perpendicular traverse directions within the outcrop area.

Now let’s consider the case where you have a pavement outcrop and the fractures are not vertical, but are dipping along a traverse. To again make things simpler, let’s assume that the traverse is perpendicular to the strike direction. The geometry now is similar to that considered for the cliff section, but rotated so that a view into the cliff is now a view down. Fractures with a low dip will be seen less frequently than vertical fractures in this case. The function will be the same as in the cliff face example described above, except that the dip angle is used instead of the acute angle between the cliff face direction and the fracture strike. For the general case where the fracture strike is not perpendicular to the traverse, you can make a correction for both the strike orientation and the dip angle. Again, this is easily done in the Excel environment.

One question is whether it is necessary to make such weighting corrections. A significant consideration is whether the bias influences conclusions. For example, most of the examples above relate to some inference about the proportion of one preferred direction versus another (e.g. dominant versus subordinate).
If your conclusions are focused only on the direction, then the bias may not influence them. However, in some cases you may miss a preferred direction in the data noise if a fracture set is close to a “blind spot” (e.g. a fracture direction close to the traverse direction), or at the very least, with a smaller n for that fracture direction, the characterization of its mean direction (and variance) will suffer in precision. Additionally, it is always good to describe the sampling biases, as others may use your data in additional analyses and to draw other conclusions, and they should be alerted to the inherent biases in the data set. Even though they bear the ultimate responsibility for their analysis and conclusions, why not make it easier for them?

One argument for correcting such sampling biases when possible is that, given the relative ease with which this can be done once the bias is recognized, your data becomes that much more valuable with a modest amount of increased effort.
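For the dipping-fracture case on pavement outcrops discussed earlier, the combined correction for strike and dip is not spelled out. One common formulation, sketched here under the assumptions of a straight horizontal traverse and planar fractures, weights each reading by 1/(sin θ · sin δ), where θ is the angle between fracture strike and traverse and δ is the dip. This reduces to the cliff-face weighting when δ is 90 degrees. The function name and the weight cap are illustrative assumptions.

```python
import math

def strike_dip_weight(strike_deg, dip_deg, traverse_deg, max_weight=10.0):
    """Weight for one fracture reading on a horizontal traverse.

    Assumes planar fractures and a straight, horizontal traverse. The
    encounter rate scales with sin(theta) * sin(dip), where theta is the
    angle between fracture strike and traverse azimuth, so the weight is
    the reciprocal, capped at max_weight (an assumed, adjustable limit).
    """
    theta = math.radians(strike_deg - traverse_deg)
    rate = abs(math.sin(theta)) * math.sin(math.radians(dip_deg))
    return max_weight if rate * max_weight < 1.0 else 1.0 / rate

# Hypothetical readings as (strike, dip) along a traverse with azimuth 090.
for strike, dip in [(0, 90), (0, 30), (60, 90), (60, 30), (85, 20)]:
    print(strike, dip, round(strike_dip_weight(strike, dip, 90.0), 2))
```

The cap is there because a reading close to the blind spot would otherwise receive a very large weight and could dominate the plot on the strength of a single measurement.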