Considerations about the API
This document comments on the issues raised in "201307091316-Re: Some API methods [gvSIG] [GSoC 2013]".
I'm going to go over many of the points you raised...
I'll leave aside the loadRasterLayer function, because it will surely be needed.
As for the getDataMatrix method, we don't want it in the scripting API. In fact, something similar exists in the Java API and we believe it should disappear, because from the user's point of view there should be no difference between a layer and a "DataMatrix": they should be the same thing.
The createDataMatrix method would be necessary, but it would really be something like createLayer, and it will probably need some extra parameters: at the very least the data type it has to work with, because a raster file can hold integers, doubles, bytes... and to create one you have to specify the type. It could also happen that each band of the raster has a different type... but we can look at that when we get to raster creation.
As for copyMatrix, although it may seem interesting, we must be very careful with it. I have not worked with MATLAB, but in GIS we can be working with large rasters (even a small one can be several gigabytes), so something like:

    dataMatrix2 = copyMatrix(dataMatrix1)

could exhaust your computer's memory. You should never assume that the data is loaded into memory the way it would be in an ordinary array, and you have to be careful with the operations you perform, because they can double the disk space used or take a lot of time. I really don't see the use of that method.
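To put numbers on the copyMatrix concern: an uncompressed raster takes roughly width × height × bands × bytes per sample. The helper below is a hypothetical illustration of that arithmetic, not part of any gvSIG API, and the 50,000 × 50,000 image is just an example size.

```python
# Bytes per sample for a few common raster data types.
BYTES_PER_SAMPLE = {"byte": 1, "int32": 4, "float64": 8}


def rasterSizeBytes(width, height, bands, dataType):
    """Approximate uncompressed size of a raster, in bytes."""
    return width * height * bands * BYTES_PER_SAMPLE[dataType]


# A 50,000 x 50,000 pixel RGB image stored as bytes:
print(rasterSizeBytes(50000, 50000, 3, "byte") / 1e9)     # 7.5 (GB)
# The same image stored as doubles:
print(rasterSizeBytes(50000, 50000, 3, "float64") / 1e9)  # 60.0 (GB)
```

A naive copy of the byte version alone needs another 7.5 GB, which is why the data has to be processed in pieces rather than duplicated.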
The getData method is obviously needed.
The other methods, applyFilter, getHistogram, applyLocalOperation and applyNeighbourOperation, I find interesting, but they need more analysis:
- You introduce the concept of a filter... Should it be an entity in the scripting API model? What methods should a filter object have?
- Something similar applies to the histogram, but here we have to work out which operations would be interesting and try to define them.
- The apply...Operation methods, ugh! I find them very interesting. We will surely run into memory problems when applying them; we'll have to see how to solve that.
I'll leave histograms aside; I think they will be left out entirely due to time constraints.
Let's have a look at filters and operations. As I said before, we work with "big" data. Because of the sheer volume of the data, the source is usually opened read-only: you scan it and generate a new raster, always on disk. To do something like:

    modifiedRaster = applyFilter(layer, "filtername")
We would need more information. We would have to:
- Define the resulting layer: at the very least its format and location.
- Create it.
- Apply the filter to the source layer, saving the results in the target layer.
- Close the resulting layer, if necessary.
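The four steps above can be sketched as a streaming loop. This is only an illustration under assumptions: the target is a headerless raw file with band-interleaved-by-pixel byte samples (not a real raster format), the source is any iterable of rows, and applyFilterToFile and readRows are hypothetical names. The point is that only one row is held in memory at a time.

```python
import array
import os
import tempfile


def applyFilterToFile(sourceRows, pixelFilter, targetPath, typecode="B"):
    """Stream a per-point filter over a raster into a new raw file.

    sourceRows: iterable of rows; each row is a list of per-band tuples.
    typecode: array-module code for the target samples ("B" = unsigned byte).
    """
    # Steps 1 and 2: define the target (path + sample type) and create it.
    target = open(targetPath, "wb")
    try:
        # Step 3: apply the filter, holding only one row in memory at a time.
        for row in sourceRows:
            out = array.array(typecode)
            for values in row:
                out.extend(pixelFilter(values))
            out.tofile(target)
    finally:
        # Step 4: close the resulting layer.
        target.close()


def readRows():
    # Stand-in for reading the source layer one row at a time (read-only).
    yield [(10, 20, 30), (40, 50, 60)]
    yield [(0, 0, 0), (245, 245, 245)]


path = os.path.join(tempfile.mkdtemp(), "shine.raw")
applyFilterToFile(readRows(), lambda values: [v + 10 for v in values], path)
with open(path, "rb") as f:
    print(list(bytearray(f.read())))
# [20, 30, 40, 50, 60, 70, 10, 10, 10, 255, 255, 255]
```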
It is important to keep this in mind, because it shows that the input parameters defined so far are not enough.
We could have something like:

    target = createLayer(filename, type)
    source.filter("filtername", target)
And if we want to simplify this, we could have:

    source.filter("filtername", targetFileName, targetType)
It's not the cleanest approach, but we must not forget that we are talking about a scripting API, meant to be used by end users.
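If filter accepts a "filtername" string like the one above as well as plain functions, the implementation needs a way to turn a name into a function. One possible sketch uses a module-level registry; FILTERS, registerFilter and resolveFilter are illustrative names, not existing gvSIG functions.

```python
# Hypothetical registry of predefined filters, keyed by name.
FILTERS = {}


def registerFilter(name, pixelFilter):
    FILTERS[name] = pixelFilter


def resolveFilter(filterOrName):
    """Return a callable filter given either a registered name or a callable."""
    if callable(filterOrName):
        return filterOrName
    try:
        return FILTERS[filterOrName]
    except KeyError:
        raise ValueError("unknown filter: %r" % (filterOrName,))


registerFilter("shine", lambda values: tuple(v + 10 for v in values))
print(resolveFilter("shine")((1, 2, 3)))                # (11, 12, 13)
print(resolveFilter(lambda values: values)((1, 2, 3)))  # (1, 2, 3)
```

With this, the same filter method can serve both the name-based form and the function-based form without growing the API.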
Let's see the filters...
...for example, we could have a three-band RGB raster of byte type and want to increase its brightness. We could do something like this:

    def shineFilter(values):
        return (values[0]+10, values[1]+10, values[2]+10)

    raster.filter(shineFilter, targetFileName, targetType)
The filter method will invoke the given function for each raster point, passing it a tuple with the values of every band at that point. The function returns a tuple with the values the bands should have at that point in the resulting raster. This seems quite powerful and keeps the API small and easy to handle.
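A minimal in-memory sketch of that contract, assuming a raster is just a list of rows of per-band tuples (filterRaster and clampByte are hypothetical helpers; a real implementation would stream rows to disk). It also shows a detail the byte example raises: 250 + 10 does not fit in a byte, so this version clamps to 0..255. Whether the API should clamp, wrap or report an error is still to be decided.

```python
def filterRaster(raster, pixelFilter):
    """Apply pixelFilter to every point and return the new raster.

    raster: list of rows; each row is a list of tuples, one value per band.
    """
    return [[tuple(pixelFilter(values)) for values in row] for row in raster]


def clampByte(x):
    # Keep a result inside the 0..255 range of a byte band.
    return max(0, min(255, x))


def shineFilter(values):
    return (clampByte(values[0] + 10),
            clampByte(values[1] + 10),
            clampByte(values[2] + 10))


raster = [[(10, 20, 30), (250, 0, 5)]]
print(filterRaster(raster, shineFilter))
# [[(20, 30, 40), (255, 10, 15)]]
```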
We can provide a set of predefined filters so that users don't have to define them again and again. For example, we could define a ShineFilter class along these lines:
    # read for info -> http://docs.python.org/2/library/functions.html?highlight=callable#callable
    class ShineFilter(object):
        def __init__(self, shineDelta):
            self._shineDelta = shineDelta

        def __call__(self, values):
            delta = self._shineDelta
            return (values[0]+delta, values[1]+delta, values[2]+delta)
And the user could do something like this:
    raster.filter(ShineFilter(10), targetFileName, targetType)
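Because an instance with `__call__` behaves like a function, the same object can be passed wherever filter expects a function. A variant of the class above that does not assume exactly three bands (a suggestion, not part of the proposal) could look like this:

```python
class ShineFilter(object):
    """Predefined filter: adds shineDelta to every band of a point."""

    def __init__(self, shineDelta):
        self._shineDelta = shineDelta

    def __call__(self, values):
        # One output value per input band, so 1, 3 or 4 bands all work.
        return tuple(v + self._shineDelta for v in values)


shine = ShineFilter(10)
print(callable(shine))        # True
print(shine((1, 2, 3)))       # (11, 12, 13)
print(shine((1, 2, 3, 4)))    # (11, 12, 13, 14)
```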
Another interesting feature would be support for kernel-type operations on rasters. These operations do not take as input only the value of a single point of the source raster, but also involve the values of the surrounding points. They would be similar to the filter I described before, but instead of a tuple with the band values, for each point the function would receive a 3x3 matrix with the current point at its center. As in the previous case, each point of the matrix would hold a tuple with the values of each band. Using this function to do the same thing we did before with the shine filter, we would have something like:
    def kernelShine(values):
        # values -> row, col, band
        delta = 10
        return (values[1][1][0]+delta, values[1][1][1]+delta, values[1][1][2]+delta)

    raster.kernel(kernelShine, targetFileName, targetType)
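The mechanics of kernel can be sketched in memory: for each point, build the 3x3 window indexed as values[row][col][band] and pass it to the user function. Assumptions: the raster is a list of rows of per-band tuples, kernelRaster is a hypothetical name, and border points reuse the nearest edge point for missing neighbours (skipping the border or padding with a no-data value would be equally valid choices).

```python
def kernelRaster(raster, kernelFunction):
    """Apply kernelFunction to the 3x3 neighbourhood of every point.

    raster: list of rows; each row is a list of per-band tuples.
    kernelFunction receives window[row][col][band], with the current point
    at window[1][1]. Neighbours outside the raster are replaced by the
    nearest edge point.
    """
    height = len(raster)
    width = len(raster[0])
    result = []
    for r in range(height):
        outRow = []
        for c in range(width):
            window = []
            for dr in (-1, 0, 1):
                rr = min(max(r + dr, 0), height - 1)  # clamp row index
                window.append([raster[rr][min(max(c + dc, 0), width - 1)]
                               for dc in (-1, 0, 1)])
            outRow.append(tuple(kernelFunction(window)))
        result.append(outRow)
    return result


def kernelShine(values):
    # values -> row, col, band
    delta = 10
    return (values[1][1][0] + delta, values[1][1][1] + delta,
            values[1][1][2] + delta)


raster = [[(1, 2, 3), (4, 5, 6)]]
print(kernelRaster(raster, kernelShine))
# [[(11, 12, 13), (14, 15, 16)]]
```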
It would also allow us to make the value of a point depend on the points around it. For example, we could alter the brightness as a function of the preceding point (the one to its left in the same row):
    def kernelShine(values):
        # values -> row, col, band
        delta = 10
        x = values[1][0][0]  # first band of the point to the left
        if x < 25:
            delta = 30
        elif x < 50:
            delta = 20
        else:
            delta = 10
        return (values[1][1][0]+delta, values[1][1][1]+delta, values[1][1][2]+delta)

    raster.kernel(kernelShine, targetFileName, targetType)
We can combine what I said about filters with the functions you propose, applyLocalOperation and applyNeighbourOperation, using methods such as:

    layer1.localOperation(operationFunction, layer2, targetFileName, targetType)

or:

    layer1.kernelLocalOperation(operationFunction, layer2, targetFileName, targetType)
Where operationFunction, instead of receiving a single parameter, should receive two, one for each input raster. We could then offer functions such as:

    def AddOperation(values1, values2):
        return (values1[0]+values2[0], values1[1]+values2[1], values1[2]+values2[2])
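Applying such a two-argument function means walking both rasters in step. A hypothetical in-memory sketch, assuming both rasters have exactly the same rows and columns; real layers may differ in extent or cell size, and how to align or resample them is left open here.

```python
def operation(operationFunction, raster1, raster2):
    """Combine two rasters point by point into a new raster.

    Both rasters are lists of rows of per-band tuples and must have the
    same number of rows and columns.
    """
    if len(raster1) != len(raster2) or any(
            len(row1) != len(row2) for row1, row2 in zip(raster1, raster2)):
        raise ValueError("rasters must have the same dimensions")
    return [[tuple(operationFunction(values1, values2))
             for values1, values2 in zip(row1, row2)]
            for row1, row2 in zip(raster1, raster2)]


def AddOperation(values1, values2):
    return (values1[0] + values2[0], values1[1] + values2[1],
            values1[2] + values2[2])


layer1 = [[(1, 2, 3), (4, 5, 6)]]
layer2 = [[(10, 20, 30), (40, 50, 60)]]
print(operation(AddOperation, layer1, layer2))
# [[(11, 22, 33), (44, 55, 66)]]
```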
They would need to be a bit more elaborate; in fact, there is something else to keep in mind. In the brightness example I assumed three bands, but a raster can have any number of bands, so a function such as AddOperation should take that into account.
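One way to make AddOperation independent of the band count is to iterate over the tuples instead of indexing fixed positions. A sketch (raising an error when the band counts differ is an assumption; the API might instead require matching layers up front):

```python
def AddOperation(values1, values2):
    """Add two points band by band, whatever the number of bands."""
    if len(values1) != len(values2):
        raise ValueError("both rasters must have the same number of bands")
    return tuple(a + b for a, b in zip(values1, values2))


print(AddOperation((1, 2), (10, 20)))            # (11, 22)
print(AddOperation((1, 2, 3, 4), (1, 1, 1, 1)))  # (2, 3, 4, 5)
```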
With applyNeighbourOperation we could do something like:
    maxRGBValues = [0, 0, 0]

    def maxRGB(values):
        if values[0] > maxRGBValues[0]:
            maxRGBValues[0] = values[0]
        if values[1] > maxRGBValues[1]:
            maxRGBValues[1] = values[1]
        if values[2] > maxRGBValues[2]:
            maxRGBValues[2] = values[2]

    layer.neighbourOperation(maxRGB)
    print(maxRGBValues)
The user defines a function and passes it to neighbourOperation, which invokes it for each point. This could be done more elegantly with:
    class MaxRGB(object):
        def __init__(self):
            self._max = [0, 0, 0]

        def __call__(self, values):
            if values[0] > self._max[0]:
                self._max[0] = values[0]
            if values[1] > self._max[1]:
                self._max[1] = values[1]
            if values[2] > self._max[2]:
                self._max[2] = values[2]

        def getValues(self):
            return self._max

    maxValues = MaxRGB()
    layer.neighbourOperation(maxValues)
    print(maxValues.getValues())
Although it is more elegant, we have only changed the user side, not the API itself, and I think we would be burdening users if they had to create a class just to perform a calculation.
What I think we should have is a set of classes, as you suggest, with operations such as MAX, MIN or MEDIAN already implemented.
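A sketch of what such predefined classes might look like, together with a minimal walk over an in-memory raster (a list of rows of per-band tuples). MaxValues, MinValues and ByteMedian are hypothetical names. MAX and MIN need only one running value per band; a median normally needs all the values, which clashes with the memory concerns above, so this sketch restricts it to byte data and keeps 256 counters per band instead.

```python
def walk(raster, operation):
    """Invoke operation(values) once for every point; nothing is written."""
    for row in raster:
        for values in row:
            operation(values)


class _BandReducer(object):
    """Keeps one running value per band, merged with self._combine."""

    def __init__(self):
        self._values = None

    def __call__(self, values):
        if self._values is None:
            self._values = list(values)
        else:
            self._values = [self._combine(a, b)
                            for a, b in zip(self._values, values)]

    def getValues(self):
        return self._values


class MaxValues(_BandReducer):
    _combine = staticmethod(max)


class MinValues(_BandReducer):
    _combine = staticmethod(min)


class ByteMedian(object):
    """Per-band (lower) median of byte data, using 256 counters per band."""

    def __init__(self, bandsCount):
        self._counts = [[0] * 256 for _ in range(bandsCount)]
        self._n = 0

    def __call__(self, values):
        for band, value in enumerate(values):
            self._counts[band][value] += 1
        self._n += 1

    def getValues(self):
        if self._n == 0:
            return None
        target = (self._n - 1) // 2  # sorted position of the lower median
        result = []
        for counts in self._counts:
            seen = 0
            for value, count in enumerate(counts):
                seen += count
                if seen > target:
                    result.append(value)
                    break
        return result


raster = [[(10, 200, 3), (50, 20, 7)],
          [(30, 90, 1), (5, 60, 9)]]
maxValues, minValues, median = MaxValues(), MinValues(), ByteMedian(3)
for stat in (maxValues, minValues, median):
    walk(raster, stat)
print(maxValues.getValues())  # [50, 200, 9]
print(minValues.getValues())  # [5, 20, 1]
print(median.getValues())     # [10, 60, 3]
```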
I'm going to rename applyNeighbourOperation to walk.
So, basically, we have seen that we need three types of operations:
- filter: creates a new raster by applying a function to each point of a single raster.
- operation: takes two rasters as input and creates a new raster from them.
- walk: invokes a function on every point of a raster in order to compute something from its values.
Each of these operations can take as input either the value of a single raster point or that point together with its surroundings (kernel).
Before continuing, we should give these ideas some shape.
When performing analysis we identify entities... We have identified a "Layer" entity, so the first thing to do is define what it is and which operations and attributes it should have. We will not have isolated functions; instead we'll have entities such as "RasterLayer" with a series of operations such as getData, filter or getHistogram.
It is good practice to use factories to create instances of our objects, so instead of having loadLayer or createLayer as operations on the layer, we can have them as functions that act as factories for RasterLayer.
Defining an API involves identifying the entities involved, their attributes and the relationships between them, in the context of what we need to do. So far we have focused on the RasterLayer entity and the operations it can have.
So we'll have:
    loadRasterLayer(rasterfile [, mode="r"])
    createRasterLayer(rasterfile ....)

    class RasterLayer(Layer):
        int getBandsCount()
        long getWidth()
        long getHeight()
        int getDataType()
        Object getData(band, row, column)
        filter(filter, targetFileName, targetType)
        operation(operation, layer2, targetFileName, targetType)
        walk(operation)
        kernel(filter, targetFileName, targetType)
        kernelOperation(operation, layer2, targetFileName, targetType)
        kernelWalk(operation)
The getDataType method returns one of a set of integer constants (in Java this could be an enum) identifying the data type of the raster layer (byte, double, integer...).
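As a starting point for the prototype, part of the listing above could be mirrored by a small in-memory Python class. This is only a sketch under assumptions: the constant values are arbitrary, Layer is an empty placeholder, setData exists only to fill the example, only the accessors and walk are shown, and createMemoryRasterLayer is a hypothetical factory used because the parameters of createRasterLayer are still open.

```python
# Hypothetical data type constants (values are arbitrary; on the Java
# side an enum could play this role).
TYPE_BYTE = 0
TYPE_INTEGER = 1
TYPE_DOUBLE = 2


class Layer(object):
    """Placeholder for the generic layer base class."""


class RasterLayer(Layer):
    """In-memory stand-in for a raster layer (illustration only)."""

    def __init__(self, width, height, bandsCount, dataType, fill=0):
        self._width = width
        self._height = height
        self._bandsCount = bandsCount
        self._dataType = dataType
        # Stored as _data[band][row][column].
        self._data = [[[fill] * width for _ in range(height)]
                      for _ in range(bandsCount)]

    def getBandsCount(self):
        return self._bandsCount

    def getWidth(self):
        return self._width

    def getHeight(self):
        return self._height

    def getDataType(self):
        return self._dataType

    def getData(self, band, row, column):
        return self._data[band][row][column]

    def setData(self, band, row, column, value):
        # Not in the proposed API; only used to fill this sketch.
        self._data[band][row][column] = value

    def walk(self, operation):
        # Invoke operation with a tuple of band values for every point.
        for row in range(self._height):
            for column in range(self._width):
                operation(tuple(self.getData(band, row, column)
                                for band in range(self._bandsCount)))


def createMemoryRasterLayer(width, height, bandsCount, dataType):
    """Factory function: scripts never instantiate RasterLayer directly."""
    return RasterLayer(width, height, bandsCount, dataType)


layer = createMemoryRasterLayer(2, 1, 3, TYPE_BYTE)
layer.setData(0, 0, 1, 7)  # band 0, row 0, column 1
collected = []
layer.walk(collected.append)
print((layer.getWidth(), layer.getHeight(), layer.getBandsCount()))  # (2, 1, 3)
print(collected)  # [(0, 0, 0), (7, 0, 0)]
```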
All these ideas are part of the analysis process; they help us identify what we have, and with that we can start working. We need to do two things:
- On one hand, write the structured documentation (for the single class we have so far and its operations).
- On the other, try to implement a prototype.
For the documentation, you can follow a structure similar to that of Javadoc documents:
- Name and description of the entity
- Summary list of the methods, each with a simple description (no more than one line).
- Detailed list of the methods: full description, what they do, parameters, parameter types, etc.
Keep in mind that these docs will later be displayed as HTML, most probably from a Java panel, so try not to use heavy formatting.
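In a Python-based scripting API, the three levels listed above map naturally onto plain-text docstrings: the class docstring carries the name, the description and a one-line summary of each method, and each method's docstring carries the detailed description. A sketch for two of the methods (0-based indices are an assumption, not something decided yet):

```python
class RasterLayer(object):
    """RasterLayer

    A layer whose data is a raster: a grid of points with one value per band.

    Methods:
      getBandsCount()             Number of bands of the raster.
      getData(band, row, column)  Value of one band at one point.
    """

    def getBandsCount(self):
        """Return the number of bands of the raster.

        Returns:
          int: number of bands.
        """
        raise NotImplementedError

    def getData(self, band, row, column):
        """Return the value stored in a band at a given point.

        Parameters:
          band (int): band index, starting at 0.
          row (long): row index, starting at 0.
          column (long): column index, starting at 0.

        Returns:
          A number whose type depends on getDataType().
        """
        raise NotImplementedError
```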
In a separate document, I'll start discussing how to begin the prototype.