An introduction to using Pyshepseg
Introduction
Pyshepseg is a tool for performing segmentation of imagery. This tool is useful when you want to segment an image into areas that are similar spectrally for the purpose of mapping areas.
pyshepseg is a tool that needs a bit of work to understand before you can
start using it. This article should help you get running with the basic
functionality.
Installation
pyshepseg can be installed via conda-forge or from source in github.
The examples below assume you have the KEA GDAL driver installed (libgdal-kea on conda forge).
Segment a smallish image
The most simple case is loading an image into memory, segmenting it with
pyshepseg and saving it again. Since this requires all of the image being
read into memory you may want to select a small one (or only segment part
of it). Here is an example of using pyshepseg.shepseg.doShepherdSegmentation to achieve this:
from pyshepseg import shepseg
from osgeo import gdal
inds = gdal.Open('56JMQ.kea')
img = inds.ReadAsArray()
segResult = shepseg.doShepherdSegmentation(img)
ysize, xsize = segResult.segimg.shape
driver = gdal.GetDriverByName('KEA')
outds = driver.Create('out.kea', xsize, ysize, 1, gdal.GDT_UInt32)
band = outds.GetRasterBand(1)
band.WriteArray(segResult.segimg)
This script reads in all bands of 56JMQ.kea into memory and uses them to do a segmentation and
writes back the result to out.kea. This uses the default parameters for the segmentation, see
the docstring of pyshepseg.shepseg.doShepherdSegmentation for information about how to tweak the segmentation for your
situation.
segResult is an instance of pyshepseg.shepseg.SegmentationResult that contains information about the segmentation. Note that
you can save the kmeans object calculated for this segmentation and pass it into another
segmentation run.
Segment a large image
What happens if you have an image that is larger than can be fitted into memory? That’s
where the pyshepseg.tiling.doTiledShepherdSegmentation function comes in. It is similar to pyshepseg.shepseg.doShepherdSegmentation
but handles splitting the image into tiles, processing them and stitching the result together.
Here is an example of processing the same image:
from pyshepseg import tiling
segresult = tiling.doTiledShepherdSegmentation('56JMQ.kea', 'out.kea')
Again, see the docstring for pyshepseg.tiling.doTiledShepherdSegmentation for tweaking the parameters for your situation. Note that the overlap must be sufficently large so that the segments on tile boundaries can be merged.
segresult is an instance of pyshepseg.tiling.TiledSegmentationResult which contains information about the tiled segmentation.
Note that because of the overlap, doing a tiles segmentation is less efficient than an whole image segmentation. However, it does allow you to segment very large images. For information about running the tiled segmentation in parallel, please see our previous blog post.
Calculating per-segment statistics
pyshepseg also has support for calculating per-segment statistics from another image for each segment with the tilingstats module. This module processes the image in a tiled manner meaning that very large images
can be processed. There are two functions, one for non-spatial statistics and another one for spatial statistics. Here is an example calculating the mean (for each segment) of the first band that we used for the segmentation:
from pyshepseg import tilingstats
tilingstats.calcPerSegmentStatsTiled('56JMQ.kea', 1, 'out.kea', [('Band1_mean', 'mean')])
When you open out.kea you will see that there is a new raster attribute column Band1_mean
containing the means of 56JMQ.kea band 1 for each segment. See the docstring of
pyshepseg.tilingstats.calcPerSegmentStatsTiled for more information on the types of statistics
that can be calculated. Needless to say you can calculate statistics on files that weren’t
used for the segmentation (soil maps, landuse etc) but they all must match on extent and pixel size.
Spatial statistics can be calculated in a similar manner, however instead of a statistic name you
must supply a callback written in Numba. There are some functions for common spatial statistics
already in the tilingstats module:
- userFuncVariogram for calculating the variogram at a given distance.
- userFuncNumEdgePixels for calculating the length of the boundary of a segment.
- userFuncMeanCoord for calculating the mean coordinate of a segment.
Here is an example of calculating the variogram at distance = 1 of each segment against the input data:
from pyshepseg import tilingstats
from osgeo import gdal
tilingstats.calcPerSegmentSpatialStatsTiled('56JMQ', 1, 'out.kea',
[('Band1_var', gdal.GFT_Real)],
tilingstats.userFuncVariogram, 1)
This creates a new raster attribute table column Band1_var with the variogram value. Note how you specify
the types of the the columns to be created, and the callback fills in the values for these. See the docstring
for for more information about the parameters.
Conclusion
pyshepseg is a very powerful tool once you understand how to call it. It has the ability to process images
much larger than memory available, and also calculate statistics on the segments in an efficient manner.