An important decision at the start of any seismic survey is where to place stations. The goal is typically to cover a large area with a small number of stations that are not unduly influenced by noise. For example, when deploying sparse networks for induced seismicity monitoring there are a number of factors to consider, such as site access, possible noise sources and areal coverage. The typical approach to determining station placement is a somewhat ad hoc process of weighing up these factors and putting points on the map. But what if there was a better way?
Well, amongst the recent coverage of the Coronavirus I chanced across this photo of a press conference in San Francisco where reporters are practicing social distancing, which (of course) got me thinking about a possible solution to the station placement problem.
Now, before I start, I should say that in the following text I am using social distancing as a (rather limited) metaphor for the purpose of informing geoscientists about station placement. I am not an epidemiologist, nor am I providing advice on health care or on minimizing exposure to viral agents. There are plenty of experts out there much better equipped to advise on social distancing than I am. As always, you are encouraged to listen to the advice of the authorities and experts in your area.
Social Distancing as an Optimisation Problem
First, let’s consider a simpler version of the problem. You have a large room into which you have placed N people who wish to keep as far away from each other as possible. They can move anywhere in the room to achieve this and, for the sake of keeping things simple, they can do so instantly. So where should they all move to?
Of course, there are multiple (and possibly infinitely many) solutions to this problem, depending on the shape of the room. However, we get something more useful out of the exercise if we consider how our people move away from each other after placement in the room in order to arrive at an optimal solution. Essentially, each person is going to move such that they maximize the space around themselves.
This objective can be rephrased as a clustering problem. Our survey area (or floor space) is represented as a set of points, and these points need to be divided up as equally as possible amongst our N stations (or people). In other words, we need to classify each point in the survey area/room into groups of roughly equal size; the group centers will then be a good guess for where our stations/introverts should go.
Let’s start with a square room/survey area that we have divided up into a 20×20 grid (blue dots). I will refer to these as the background points. As an initial seed, I have also placed 6 stations/people randomly throughout the area (red dots).
As you can see, the initial configuration of stations/people is in no way optimal, but we can apply K-means clustering to classify the background points and take the cluster centers as improved station/people positions. If you are not familiar with K-means clustering, this Wikipedia article describes it in gruesome detail. However, for our purposes, it will be sufficient to say that K-means clustering provides a good way to divide up points into roughly equal blob-like groups, which in this case is what we want. Bear in mind that the result depends on the initial seed, so different random starts can give different layouts. The result of K-means might look something like this:
Here I have colored the background grid to show which station each background point belongs to. The red points (the cluster centroids) now provide a reasonable solution to where we should place our stations (or where our people should move to) such that they each cover as much of the survey area or room as possible.
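For anyone who wants to try this, the square-room example can be sketched in a few lines of plain Python. This is a minimal sketch, not the code used to make the figures: the function names (`kmeans`, `nearest`), the unit grid spacing, the random seed and the iteration cap are all my own assumptions for illustration. A library implementation such as scikit-learn's `KMeans` should do the same job with less typing.

```python
import random

def nearest(p, centres):
    """Index of the centre closest to point p (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda j: (p[0] - centres[j][0]) ** 2 + (p[1] - centres[j][1]) ** 2)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm). Returns (centres, labels)."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Initial seed: k stations dropped at random inside the bounding box.
    centres = [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
               for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: each background point joins its nearest station.
        labels = [nearest(p, centres) for p in points]
        # Update step: each station moves to the mean of its points.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                # A station that attracted no points stays where it is.
                new_centres.append(centres[j])
        if new_centres == centres:  # nothing moved, so we have converged
            break
        centres = new_centres
    # Recompute labels so they always match the returned centres.
    return centres, [nearest(p, centres) for p in points]

# 20x20 background grid with an assumed unit spacing.
grid = [(i + 0.5, j + 0.5) for i in range(20) for j in range(20)]
stations, labels = kmeans(grid, k=6)
for s in stations:
    print(round(s[0], 2), round(s[1], 2))
```

Changing `seed` gives a different starting configuration and often a different final layout, so it is worth running a few seeds and comparing.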
A (slightly) more complex example
Now you are probably thinking, not all rooms are square and not all points in a survey area are equal. For example, there might be roads, rivers, or other sites which we would like to avoid in our survey because they create a lot of noise. Similarly, some areas might be preferred for station placement because access is relatively easy compared to other areas. Well, we can account for this by altering the background grid/points and adding weights to the clustering.
Say, for the sake of argument, our study area has a road going through it and we don’t want to place any stations within a certain distance of the road, and let’s say there is a particular area where we really want to sample. In this case, our initial configuration might look something like this:
As before, I have placed the six initial station positions (red dots) randomly. The road is drawn as a black line and I have removed all the points within a certain distance of it. The remaining points in the background grid are colored according to their weight. Most of the weights are 1, but they increase as you approach the point we particularly want to sample on the middle right of the image. The result of the clustering (applied with weights) is as follows:
Note that the “area” assigned to each station is no longer equal. For example, the bottom left corner is more sparsely occupied, and the stations have gravitated towards the area I specified with a high weight. Nevertheless, the area is fully covered and the result provides some guidance on where stations should be placed in this setting.
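The road-and-weights example can be sketched the same way. Everything specific here is an assumption made up for illustration: the road is a straight diagonal line, the exclusion buffer is 1.5 grid units, the weights are 1 plus a Gaussian bump around a target point at (17, 10), and the names `weighted_kmeans`, `dist_to_road`, `ROAD_BUFFER` and `TARGET` are hypothetical. The only change to the algorithm is that each station moves to the weighted mean of its points rather than the plain mean.

```python
import math
import random

def dist2(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda j: dist2(p, centres[j]))

def weighted_kmeans(points, weights, k, n_iter=100, seed=0):
    """K-means where each centre is the weighted mean of its points."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # seeds drawn from the allowed points
    for _ in range(n_iter):
        labels = [nearest(p, centres) for p in points]
        new_centres = []
        for j in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            if members:
                wsum = sum(w for _, w in members)
                new_centres.append((sum(p[0] * w for p, w in members) / wsum,
                                    sum(p[1] * w for p, w in members) / wsum))
            else:
                new_centres.append(centres[j])  # empty cluster: leave in place
        if new_centres == centres:
            break
        centres = new_centres
    return centres, [nearest(p, centres) for p in points]

ROAD_BUFFER = 1.5       # assumed exclusion distance, in grid units
TARGET = (17.0, 10.0)   # assumed point of interest, middle right

def dist_to_road(p):
    """Distance to a hypothetical straight road along the line y = x."""
    return abs(p[0] - p[1]) / math.sqrt(2)

grid = [(i + 0.5, j + 0.5) for i in range(20) for j in range(20)]
# Remove background points that are too close to the road.
kept = [p for p in grid if dist_to_road(p) > ROAD_BUFFER]
# Weights are close to 1 far from TARGET and rise towards 5 near it.
weights = [1.0 + 4.0 * math.exp(-dist2(p, TARGET) / (2 * 3.0 ** 2)) for p in kept]

stations, labels = weighted_kmeans(kept, weights, k=6)
# A centroid can still land inside the buffer if its cluster straddles
# the road, so snap each one to the nearest allowed background point.
snapped = [min(kept, key=lambda p: dist2(p, c)) for c in stations]
print(snapped)
```

The final snapping step is my own addition rather than part of K-means; without it a cluster that spans both sides of the road can put its centre on the road itself. If you prefer a library, scikit-learn's `KMeans.fit` accepts a `sample_weight` argument that plays the same role as `weights` here.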
Summary
If you have read this far then hopefully you realize that the examples I have shown are pretty simplistic, but the reasoning can be extended to more complex and more realistic geometries. Similarly, the particular use case I had in mind when writing this was the deployment of sparse station networks for induced seismicity monitoring. However, I am sure there are plenty of other applications, and if you have one feel free to message me.
There are also numerous bells and whistles which could be added to improve the results, including Monte-Carlo style simulations, magnitude of completeness, synthetic modeling and more sophisticated clustering algorithms. If you think of some, feel free to get in touch.
Finally, I would add that the golden rule of analysis/geophysics still applies: quality results require quality inputs. The computed station positions from this type of technique are only going to be as good as the data and constraints that go into them. In other words, this is not a substitute for knowledge of your field area. Instead, think of it as a way to incorporate that knowledge quantitatively into determining station positions.
Further reading
In the method here, we are effectively computing a Voronoi diagram for points in the survey area using K-means clustering. If you are looking for more background on K-means I have already mentioned Wikipedia, but any basic statistics book or course should also cover it. I also found these two YouTube videos
which provide quite simple (<10 minutes) descriptions of the K-means algorithm.
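To make the Voronoi connection concrete: once the station positions are fixed, labelling each background point with its nearest station is the same as asking which Voronoi cell it falls in. The short sketch below does exactly that and counts the points per cell as a rough check on how evenly the area is shared out. The station coordinates are hypothetical values chosen for illustration, not output from the examples above, and `voronoi_labels` is a name I made up.

```python
from collections import Counter

def voronoi_labels(points, stations):
    """Label each point with the index of its nearest station (its Voronoi cell)."""
    return [min(range(len(stations)),
                key=lambda j: (p[0] - stations[j][0]) ** 2 + (p[1] - stations[j][1]) ** 2)
            for p in points]

# 20x20 background grid with an assumed unit spacing.
grid = [(i + 0.5, j + 0.5) for i in range(20) for j in range(20)]

# Hypothetical station positions, for illustration only.
stations = [(3.5, 5.0), (3.5, 15.0), (10.0, 5.0),
            (10.0, 15.0), (16.5, 5.0), (16.5, 15.0)]

labels = voronoi_labels(grid, stations)
cell_sizes = Counter(labels)  # number of background points per station
for j in sorted(cell_sizes):
    print(j, cell_sizes[j])
```

If one cell is much larger than the rest, that station is being asked to cover a disproportionate share of the area.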
In general, the material covered here falls under the heading of spatial sampling and survey design, and the body of statistical literature on this is extensive. I am not even going to pretend I can do it justice. In fact, I would be very surprised if the K-means approach I have outlined here is not mentioned somewhere else. However, one alternative approach I am aware of for producing spatially balanced samples uses recursive partitioning schemes, such as that of Stevens and Olsen (2004), “Spatially balanced sampling of natural resources”, Journal of the American Statistical Association.