Getting information about a file with rios.fileinfo
Introduction
When writing a script to process GDAL/OGR data, it is often necessary to determine various properties of the input file(s). These properties can be used both to check that the inputs are valid and to choose sensible behaviour based on them.
Getting information on a raster file
For rasters, the main class is ImageInfo. This can be used as in the example below:
from rios.fileinfo import ImageInfo
...
info = ImageInfo('abc.kea')
print(info)
This produces output similar to the following. The names in the first column are the fields of the ImageInfo object, which are described in more detail in the documentation.
nrows 779
ncols 772
rasterCount 1
xMin 413385.0
xMax 644985.0
yMin -3151485.0
yMax -2917785.0
xRes 300.0
yRes 300.0
lnames ['Band 1']
layerType thematic
dataType 1
dataTypeName Byte
nodataval [0.0]
transform (413385.0, 300.0, 0.0, -2917785.0, 0.0, -300.0)
projection PROJCS["WGS 84 / UTM zone 56N",...
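As an example of checking that inputs are valid, the sketch below tests whether two rasters lie on the same pixel grid by comparing the nrows, ncols, transform and projection fields listed above. The function same_grid is hypothetical and not part of RIOS, and SimpleNamespace objects stand in for ImageInfo objects so that the example runs without RIOS or GDAL installed.

```python
from types import SimpleNamespace


def same_grid(info_a, info_b, tol=1e-6):
    """Return True if two rasters have the same shape, geotransform and projection.

    Hypothetical helper: assumes each argument has the nrows, ncols,
    transform and projection attributes shown in the ImageInfo output.
    """
    if (info_a.nrows, info_a.ncols) != (info_b.nrows, info_b.ncols):
        return False
    # Compare the six geotransform coefficients within a small tolerance.
    if any(abs(p - q) > tol for p, q in zip(info_a.transform, info_b.transform)):
        return False
    # Naive check: exact string comparison of the projection WKT.
    return info_a.projection == info_b.projection


# Stand-ins for ImageInfo objects; the WKT string is a truncated placeholder.
wkt = 'PROJCS["WGS 84 / UTM zone 56N", ...]'
a = SimpleNamespace(nrows=779, ncols=772, projection=wkt,
                    transform=(413385.0, 300.0, 0.0, -2917785.0, 0.0, -300.0))
b = SimpleNamespace(nrows=779, ncols=772, projection=wkt,
                    transform=(413385.0, 300.0, 0.0, -2917785.0, 0.0, -300.0))
# Same size, but with the origin shifted one pixel (300 m) east.
shifted = SimpleNamespace(nrows=779, ncols=772, projection=wkt,
                          transform=(413685.0, 300.0, 0.0, -2917785.0, 0.0, -300.0))

print(same_grid(a, b))        # True
print(same_grid(a, shifted))  # False
```

With RIOS installed, the same function could be called as same_grid(ImageInfo('a.kea'), ImageInfo('b.kea')), where the file names are hypothetical. Comparing WKT strings exactly is strict: two equivalent projections written differently would be reported as different.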
Using ImageInfo is much easier than opening the file with GDAL, working out the bounds from the geotransform and then iterating through the bands to collect the band-specific information.
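For comparison, the sketch below shows the arithmetic needed to work out the bounds from a GDAL-style geotransform by hand, using the transform, ncols and nrows values printed above. It is a plain-Python illustration that assumes a north-up image whose rotation terms are zero; the function name is hypothetical and the code is not taken from the RIOS source.

```python
def bounds_from_geotransform(transform, ncols, nrows):
    """Return (xMin, xMax, yMin, yMax) for a north-up raster.

    transform follows GDAL's geotransform layout:
    (top-left x, pixel width, row rotation, top-left y, column rotation, pixel height).
    The two rotation terms are assumed to be zero.
    """
    x0, pixel_width, _, y0, _, pixel_height = transform
    x_min = x0
    x_max = x0 + ncols * pixel_width
    y_max = y0
    # pixel_height is negative for north-up images, so this moves south.
    y_min = y0 + nrows * pixel_height
    return x_min, x_max, y_min, y_max


# Values from the ImageInfo output above.
transform = (413385.0, 300.0, 0.0, -2917785.0, 0.0, -300.0)
print(bounds_from_geotransform(transform, ncols=772, nrows=779))
# (413385.0, 644985.0, -3151485.0, -2917785.0)
```

The result matches the xMin, xMax, yMin and yMax fields reported by ImageInfo; xRes and yRes correspond to the pixel width and the absolute value of the pixel height.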
Statistics
There is also an ImageFileStats class that can be used to obtain statistics on each band in a raster. Indexing it returns an ImageLayerStats object for each band, so stats[0] below refers to the first band:
from rios.fileinfo import ImageFileStats
...
stats = ImageFileStats('abc.kea')
print(stats[0])
This prints a summary of the statistics for that band, similar to the following:
Mean: 5.444528324476045, Stddev: 6.186597095823568, Min: 0.0, Max: 20.0, Median: 3, Mode: 3
The object has more fields than are shown here; refer to the documentation for details.
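To make the meaning of these fields concrete, the sketch below derives the same six quantities from a band histogram (a list of pixel counts indexed by pixel value), which is one common way to summarise integer-valued data. It is a hypothetical helper and not necessarily how ImageFileStats computes its values: it assumes the population standard deviation, takes the median as the lowest value at which the cumulative count reaches half the total, and breaks ties for the mode in favour of the lowest value.

```python
import math


def stats_from_histogram(counts):
    """Summarise integer pixel values from a histogram.

    counts[v] is the number of pixels with value v. Assumes at least
    one non-zero count.
    """
    total = sum(counts)
    present = [v for v, c in enumerate(counts) if c > 0]
    mean = sum(v * c for v, c in enumerate(counts)) / total
    # Population variance: average squared deviation, weighted by count.
    variance = sum(c * (v - mean) ** 2 for v, c in enumerate(counts)) / total
    # Median: first value at which the cumulative count reaches half the pixels.
    cumulative = 0
    median = None
    for v, c in enumerate(counts):
        cumulative += c
        if cumulative >= total / 2:
            median = v
            break
    # Mode: the most frequent value; max() keeps the first (lowest) on a tie.
    mode = max(range(len(counts)), key=lambda v: counts[v])
    return {
        "Mean": mean,
        "Stddev": math.sqrt(variance),
        "Min": present[0],
        "Max": present[-1],
        "Median": median,
        "Mode": mode,
    }


# Hypothetical histogram: 2 pixels of value 0, 5 of 1, 3 of 2 and none of 3.
print(stats_from_histogram([2, 5, 3, 0]))
```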
Raster Attribute Tables
The RatStats class summarises the statistics of each numeric column in the Raster Attribute Table (RAT) of a thematic raster. Each column is accessed as an attribute of the object, using the column name:
from rios.fileinfo import RatStats
...
rat = RatStats('abc.kea')
print(rat.Histogram)
Only numeric columns are available in this way. Each one is a ColumnStats object, and printing it gives a summary similar to the following:
Count: 419147.0, Mean: 95852.37478259417, Stddev: 88020.35324744815, Min: 5.0, Max: 191676.0, Median: None, Mode: None
If your RAT has many columns, you can speed up access by passing the columnlist parameter to the RatStats constructor, giving only the subset of column names you need.
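The sketch below illustrates the numeric-only rule and the idea behind columnlist on an in-memory stand-in for a RAT: a dictionary mapping hypothetical column names to lists of values. Only numeric columns are summarised, and when a column list is given every other column is skipped, which is why restricting the list saves work. The function, its arguments and the "Rows" key are hypothetical and not the RIOS API; in particular, "Rows" is simply the number of values and may not match how ColumnStats defines Count.

```python
import math
from numbers import Number


def rat_column_stats(rat, columnlist=None):
    """Summarise the numeric columns of a RAT held as {name: [values]}.

    Hypothetical helper: if columnlist is given, only those columns are read.
    """
    names = columnlist if columnlist is not None else list(rat)
    summary = {}
    for name in names:
        values = rat[name]
        # Skip empty or non-numeric (e.g. string) columns.
        if not values or not all(isinstance(v, Number) for v in values):
            continue
        n = len(values)
        mean = sum(values) / n
        stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        summary[name] = {"Rows": n, "Mean": mean, "Stddev": stddev,
                         "Min": min(values), "Max": max(values)}
    return summary


rat = {
    "Histogram": [5, 10, 15],
    "ClassName": ["water", "forest", "urban"],  # non-numeric, so skipped
    "Area": [1.5, 2.5, 3.5],
}
print(sorted(rat_column_stats(rat)))                           # ['Area', 'Histogram']
print(sorted(rat_column_stats(rat, columnlist=["Histogram"])))  # ['Histogram']
```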
Vector files
Lastly, fileinfo can also summarise vector files with the VectorFileInfo class. As with the ImageFileStats class, an object of this type is indexed, in this case by layer number:
from rios.fileinfo import VectorFileInfo
...
vinfo = VectorFileInfo('poly.shp')
print(vinfo[0])
This produces the summary below. As usual, the documentation describes the fields in more detail.
featureCount: 1
xMin: 484925.81632653065
xMax: 512179.46064139943
yMin: -3023733.5422740523
yMax: -2993073.192419825
geomType: 3
geomTypeStr: Polygon
fieldCount: 1
fieldNames: ['FID']
fieldTypes: [12]
fieldTypeNames: ['Integer64']
spatialRef: PROJCS["WGS 84 / UTM zone 56N",
GEOGCS["WGS 84",
DATUM["WGS_1984",
...
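In the output above, geomType and fieldTypes are OGR's numeric codes, while geomTypeStr and fieldTypeNames are the corresponding names. The sketch below spells out that correspondence for common codes using hand-written tables, so it runs without GDAL; the table and function names are hypothetical. With the GDAL Python bindings installed, ogr.GeometryTypeToName and ogr.GetFieldTypeName perform these lookups directly.

```python
# Hypothetical tables mirroring part of OGR's OGRwkbGeometryType
# and OGRFieldType enumerations (deprecated and 3D variants omitted).
OGR_GEOMETRY_TYPES = {
    0: "Unknown",
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}

OGR_FIELD_TYPES = {
    0: "Integer",
    1: "IntegerList",
    2: "Real",
    3: "RealList",
    4: "String",
    5: "StringList",
    8: "Binary",
    9: "Date",
    10: "Time",
    11: "DateTime",
    12: "Integer64",
    13: "Integer64List",
}


def describe_layer(geom_type, field_types):
    """Translate OGR numeric codes into names; unlisted codes become 'Unknown'."""
    geom_name = OGR_GEOMETRY_TYPES.get(geom_type, "Unknown")
    field_names = [OGR_FIELD_TYPES.get(t, "Unknown") for t in field_types]
    return geom_name, field_names


# Codes from the VectorFileInfo output above.
print(describe_layer(3, [12]))  # ('Polygon', ['Integer64'])
```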
Conclusion
The classes in rios.fileinfo make it possible to find information about a file in one or two lines of code. Without them, the user would have to write more complex code and understand the details of the various GDAL/OGR function calls.