Skip to main content


Showing posts from 2010

Smooth as...

For a current project I needed a polygon smoothing algorithm in Java that:
generated a curve passing through all polygon vertices
had some way of controlling tightness of fit to the polygon
fell within the domain of my very limited mathematical grasp
Luckily for me, Maxim Shemanarev had not only worked out a Bezier curve algorithm that satisfied all of the above, but he also published a beautifully clear explanation of how it worksnote 1. Maxim also provided some illustrative C code, but my Java implementation below is an independent effort (ie. all mistakes are my fault). It uses geometry objects from the JTS library.

Below is an example of the algorithm in action. The original polygon is in black. The smoothed polygon composed of fitted cubic Bezier segments is in blue. The red points and lines indicate positions of the Bezier control points calculated with Maxim's algorithm. There are a pair of control points for each vertex of the input polygon.

The tightness of fit of the smooth…

Monte Carlo testing of classification groups

This is another article on the theme of defining groups in a hierarchical classification. A previous article described homogeneity analysis to visualize how any well any number of groups, defined at the same level accounts for the variability in the dataset, as measured by within-group pairwise distances. Here we will look at testing whether splitting a particular group into two daughter groups is supported by the data.

Suppose we have a group G containing N objects which we are considering splitting into two groups G1 and G2 consisting of N1 and N2 objects respectively. A Monte Carlo approach to testing whether this is worth doing is as follows:

1. Calculate the mean within-group pairwise distance that would result from defining G1 and G2.

2. Randomly allocate objects to two groups of sizes N1 and N2 and calculate the resulting mean within-group pairwise distance.

3. Repeat step 2 a large number of times (e.g. 999) to create a vector of Monte Carlo means. Add the actual mean from step 1 …

Fun with the proto package: building an MCMC sampler for Bayesian regression

The proto package is my latest favourite R goodie. It brings prototype-based programming to the R language - a style of programming that lets you do many of the things you can do with classes, but with a lot less up-front work. Louis Kates and Thomas Petzoldt provide an excellent introduction to using proto in the package vignette.

As a learning exercise I concocted the example below involving Bayesian logistic regression. It was inspired by an article on Matt Shotwell's blog about using R environments rather than lists to store the state of a Markov Chain Monte Carlo sampler. Here I use proto to create a parent class-like object (or trait in proto-ese) to contain the regression functions and create child objects to hold both data and results for individual analyses.

First here's an example session...

# Make up some data with a continuous predictor and binary response
nrec <- 500
x <- rnorm(nrec)
y <- rbinom(nrec, 1, plogis(2 - 4*x))

# Predictor matrix with a col of 1s for i…

Homogeneity analysis of hierarchical classifications

I've spent more years than I care to remember analysing vegetation survey data (typically species abundances in plots) using a variety of software including my own algorithms coded in FORTRAN and C++. A recent query on the r-help list, about how to determine the number of groups to define in a hierarchical classification produced with the hclust function, prompted me to unearth one of these algorithms, homogeneity analysis1, which can help to visualize how different levels of grouping partition the variability in a distance matrix.

This algorithm is extremely simple. The classification is progressively divided into groups, with all groups being defined at the same dendrogram height. At each level of grouping, the average of within-group pairwise distances is calculated. Homogeneity is then defined as:
H = 1 - Davwithin-group - Davtotal

where Davtotal is the average pairwise distance in the dataset as a whole.

For data were there exist well-defined clusters of values, a plot of h…

Meeting in the middle; or fudging model II regression with nls

My colleague Karen needed an equation to predict trunk diameter given tree height, which she hoped to base on measurements of trees in semi-arid Australian woodlands. This is the dark art of allometry and a quick google found a large number of formulae that have been used in different studies of tree dimensions. No problem: I started to play with a few of them and eventually settled on this one:
dbh = exp( b0 + b1 / (b2 + h) )
where: dbh is trunk diameter at breast height; h is tree height.

Karen also needed to do reverse predictions, ie. predict a tree's height given its trunk diameter. Again no problem, the inverse equation is simply:
height = b1 / (log( dbh ) - b0) - b2

But then, the pièce de résistance: the forward predictions had to agree with the reverse predictions, ie. if plugging height h into the forward equation gave trunk diameter d, then plugging d into the reverse equation should get you back to h. Karen pointed out that this seemed to be a Model II regression problem bec…

Circle packing with R

To visualize the results of a simulation model of woodland trees within R, I needed an algorithm that could arrange a large number of circles within a rectangle such that no two circles overlapped by more than a specified amount. A colleague had approached this problem earlier by sorting the circles in order of descending size, then randomly dropping each one into the rectangle repeatedly until it landed in a position with acceptable overlap.

I suspected a faster and more robust algorithm could be constructed using some kind of "jiggling the circles" approach. Luckily for me, I discovered that Sean McCullough had written a really nice example of circles packing into a cluster using the Processing language. Sean's program is based on an iterative pair-repulsion algorithm in which overlapping circles move away from each other. Based on this, and modifying the algorithm a little, I came up with an R function to produce constrained random layouts of a given set of circles. He…

Another clip round the ear

In the last episode I presented some code to triangulate a polygon using the ear-clipping algorithm. In this instalment the code has been extended to deal with polygons that have holes. Here are two example results...

No, they're not the prettiest triangulations ever seen, but they could be improved with some post-processing.

Here is the Java code used to generate them. Note, you'll need the JTS library if you want to run this. Warning: this code isn't pretty, even by my standards, but it might be useful if you want to play with the algorithm.

import com.vividsolutions.jts.algorithm.CGAlgorithms;
import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.Envelope;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.LineString;
import com.vividsolutions.jts.geom.Polygon;
import java.awt.Color;
import java.util.ArrayList;
import java.util.Arr…

A clip round the ear

Inspired by a recent discussion on the JTS user list about decomposing simple polygons into triangles, here is a program that demonstrates the brute force, ear-clipping algorithm. To run this you will need the JTS library which you can get from the project web site or as a maven artifact (version 1.11) from the Maven central repository.

import com.vividsolutions.jts.algorithm.Angle;
import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.LineString;
import com.vividsolutions.jts.geom.Polygon;
import java.util.ArrayList;
import java.util.List;

* Demonstrates brute force approach to the ear clipping algorithm
* to triangulate a polygon. This demo doesn't yet cope with
* polygons that have holes.
* @author Michael Bedward
public class EarClipping {
private static final double EPS = 1.0E-4;

GeometryFactory gf = new G…

Cubic spline interpolation

I spent a while today casting around for a simple, fast and just-good-enough cubic interpolation algorithm to use in the GeoTools library where it is to serve as a 'filter function' that can be used when rendering maps.

The ACM digitial library has a mind-boggling collection of references on interpolation methods, most of which were fascinating but also over the top for this application.

In the end I put together the following code based on a Wikipedia article about cubic Hermite splines. Here it is...

* Cubic hermite spline interpolation. This is adapted from the description of the
* algorithm at:
* Tangent caculations are done with simple finite differencing in the interests
* of speed.
* <p>
* The input arrays xi and yi contain the coordinates of four interpolation
* points defining three segments with the middle segment containing the point
* for which we seek an interpolated value.
* @param x x ordinate of the po…