Creating your own service
The IQmulus system has been built as an infrastructure that supports creating your own processing algorithm (service) and adding it to the system even after the project has been finalized.
Check out our new description for developers.
Guidelines for developing processing services
This guide presents an overview for software developers who would like to develop processing services that execute on the IQmulus platform. It focuses on the requirements for deploying and executing a service in IQmulus and does not deal with software development or implementation issues.
Requirements
It is assumed that the developer has access to an installed and running IQmulus infrastructure, which includes
- Service catalog (Artifactory[1]), which stores processing services together with their metadata.
- DSL toolchain (Interpreter, JobManager and JobTracker), which enables processing and execution of services and workflows.
- User interface (“Thin client” and “Fat client”), which enables the user to edit workflows, browse, upload and visualize 2D/3D data.
- Data catalog (GeoNetwork[2]), which enables data query using metadata.
- Apache Hadoop framework[3] (including YARN, MapReduce and HDFS), which enables the execution of MapReduce jobs and data sharing among the nodes of the IQmulus system.
Service implementation
The service must be provided as an executable binary or a runnable script. Any external programming libraries can be included with the service or, if possible, installed on the execution platform. The service must support automated, non-interactive execution based on the specified input arguments. The current IQmulus infrastructure supports three types of services:
- Standalone console applications implemented in any programming language that supports execution on the installed operating system (currently Ubuntu 14.04). This includes applications running on supported virtual machines, e.g. the JVM or Mono.
- Hadoop MapReduce jobs implemented in Java.
- Spark applications implemented in Java, Scala, Python, or R.
In the case of a console application, the parameters of execution, including input and output, are specified as command-line arguments. Arguments must strictly be key/value pairs. An example:
RasterProcessing -input /path/to/data/ -output /path/to/output/ -method enhancement -band 1
There is no restriction on the form of the key or the value, as long as the arguments are separated by single spaces. Keys without values or values without keys are not permitted. Each key may appear only once and can have only a single value assigned to it; if multiple values are needed for a parameter, they must be passed as a single string containing spaces. An argument cannot be specified multiple times in the execution environment.
Examples of unsupported arguments:
- Argument specified multiple times: RasterProcessing -band 1 -band 3 -band 6
- Arguments without key: RasterProcessing /path/to/data/
- Arguments without value: RasterProcessing -multiThreading
The return value of the console application must be 0 for correct execution and non-zero if any error occurred during processing. The error code is returned by the system. There is no exact requirement for error codes; it is the developer's task to make the issue identifiable from any non-zero code.
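As an illustration only, a minimal console service in Python could enforce the key/value convention and the exit-code contract as follows; the parameter names follow the RasterProcessing example above, and the processing itself is a placeholder:

#!/usr/bin/env python
import sys

def parse_arguments(argv):
    """Accept strictly alternating key/value pairs: -key value -key value ..."""
    if len(argv) % 2 != 0:
        raise ValueError("arguments must be key/value pairs")
    arguments = {}
    for i in range(0, len(argv), 2):
        key, value = argv[i], argv[i + 1]
        if not key.startswith("-"):
            raise ValueError("expected a key, got: %s" % key)
        if key in arguments:
            raise ValueError("key given multiple times: %s" % key)
        arguments[key] = value
    return arguments

def enhance_raster(input_path, output_path, method, band):
    """Placeholder for the actual processing logic."""
    pass

def main():
    try:
        arguments = parse_arguments(sys.argv[1:])
        enhance_raster(arguments["-input"], arguments["-output"],
                       arguments["-method"], arguments["-band"])
    except Exception as error:
        sys.stderr.write("Error: %s\n" % error)
        return 1  # any non-zero code indicates an error to the system
    return 0      # 0 indicates correct execution

if __name__ == "__main__":
    sys.exit(main())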
Data support
The supported data types for input and output in the IQmulus infrastructure are listed in Table 1. Input and output data must match these types in order for the data to be properly registered in the data catalog.
Another mandatory property of the service is support for reading data from HDFS. As HDFS is a block-based file system, input must be read sequentially; seeking within the file is prohibited. It is not mandatory for the service to support sequential writing of the output (most spatial programming libraries do not support this), but this must be indicated in the metadata (see the next section).
For services that do not operate directly on HDFS (unlike, e.g., MapReduce jobs), access to the distributed file system is provided by mounting it locally at the path /mnt/hdfs (by default). The service can assume that files under this path support sequential reading/writing only.
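For illustration, a sketch of reading an input file sequentially from the mounted path (the chunk size is an arbitrary choice, and the input path is hypothetical):

def read_sequentially(path, chunk_size=1 << 20):
    """Stream the file in order; HDFS-backed paths must not be seek()-ed."""
    with open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Hypothetical input under the default mount point:
for chunk in read_sequentially("/mnt/hdfs/path/to/input.las"):
    pass  # feed each chunk to the processing step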
Service metadata
To allow the JobManager to automatically deploy and execute services, the service metadata must be provided along with the service binaries. The metadata is transferred to the service catalog by the JobManager when the service is deployed in the cloud.
The metadata is provided in JSON format with the following structure:
{ serviceDescription, "parameters": [ { parameterDescription }, {…}, … ] }
The service description provides general information about the service as defined in Table 2. In addition to the general service properties, the parameters of the service are also defined, including the input and output parameters. At least one input and one output parameter must be specified. The order of the parameters must match the order in which the service accepts them on the command line. The parameter description is defined in Table 3.
The metadata file should be named metadata.json.
An example metadata file:
{
  "id": 42,
  "name": "RasterProcessing",
  "description": "Enhances a raster image using the specified method.",
  "type": "Radiometric Enhancement",
  "path": "bin/RadiometricEnhancement",
  "language": "mono",
  "language_arguments": "--server",
  "required_coordinate_units": ["m", "degree"],
  "os": "linux",
  "hdfs_support": true,
  "parameters": [{
    "id": "input_image",
    "name": "InputImage",
    "description": "The image to enhance.",
    "type": "input",
    "cardinality": "1..1",
    "data_type": "image",
    "label": "-input"
  }, {
    "id": "output_image",
    "name": "OutputImage",
    "description": "The result of the enhancement.",
    "type": "output",
    "cardinality": "1..1",
    "data_type": "image",
    "label": "-output"
  }, {
    "id": "method",
    "name": "method",
    "description": "Name of the enhancement method to apply.",
    "type": "argument",
    "cardinality": "1..1",
    "data_type": "string",
    "label": "-method"
  }]
}
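To make the role of the parameter order and the label field concrete, the following sketch (not part of the platform) assembles the command line from the metadata above; the concrete values are hypothetical:

import json

def build_command(metadata, values):
    """One label/value pair per parameter, in the declared order."""
    command = [metadata["name"]]
    for parameter in metadata["parameters"]:
        command += [parameter["label"], values[parameter["id"]]]
    return command

with open("metadata.json") as f:
    metadata = json.load(f)

print(" ".join(build_command(metadata, {
    "input_image": "/path/to/data/",
    "output_image": "/path/to/output/",
    "method": "enhancement",
})))
# -> RasterProcessing -input /path/to/data/ -output /path/to/output/ -method enhancement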
Service installation and execution
IQmulus uses the Docker platform[4] for installing and executing services. Docker provides lightweight virtualization on top of the operating system, enabling installation via a Dockerfile, which contains the required installation steps. Please refer to the Docker website for a complete description.
Each service must provide a Dockerfile which contains all instructions required for executing the service on a bare system (i.e. Ubuntu 14.04). The content of the Dockerfile depends on the execution environment of the service. Each Dockerfile must ensure that the service executable is copied into the /opt/iqmulus/bin directory with the COPY command.
If installations are made (e.g. external libraries, frameworks), they should be performed within a single Docker RUN command by concatenating the instructions with &&.
An example Dockerfile:
FROM ubuntu:trusty
# All service files live under /opt/iqmulus/
WORKDIR /opt/iqmulus/
# Install external dependencies in a single RUN command
RUN apt-get update -y && apt-get install -y --no-install-recommends mono-complete && apt-get install -y --no-install-recommends unzip
# The service executable must be copied into /opt/iqmulus/bin
COPY bin /opt/iqmulus/bin
Service logging
To provide the system with correct information about the execution, the following logging scheme should be applied. If it is omitted, the service can still be executed, but determining the reason for any error that occurs during execution will be hindered, as the proper messages will not be forwarded by the DSL toolchain.
The JobTracker collects logs from the standard output and standard error. Log messages must be formed according to the following syntax:
<*))))><{dateTime|~|rawTimeStamp|~|logLevelDomainValue|~|logSourceDomainValue|~|serviceIdentifier|~|processName|~|functionName|~|message|~|[[datasetName, datasetSize],...]}
The initial “fish” symbol (<*))))><) delineates log messages from other output of the service. The {} and |~| symbols delimit the message and its components; these symbols must not be used in the log message text itself. A full description of the log items is presented in Table 4.
If the execution of the service can be broken down into discrete reading, processing and writing phases, this should be indicated by log messages marking the start and end of each phase. The start is indicated by the phase keyword (e.g. processing); the end by the same keyword with the end of prefix (e.g. end of processing).
Example log messages (line breaks are for readability only):
<*))))><{2015-02-17 20:38:00|~|1424201901|~|info|~|service|~|42|~|
RasterProcessing|~|initialization|~|Service started.|~|[]}
<*))))><{2015-02-17 20:38:15|~|1424201916|~|info|~|service|~|42|~|
RasterProcessing|~|reading|~|Reading started.|~|
[[input1.tif, 1809927],[input2.tif, 1994632]]}
<*))))><{2015-02-17 20:38:18|~|1424201919|~|fatal|~|service|~|42|~|
RasterProcessing|~|end of reading|~|Error occurred: input2.tif
contains invalid data. Aborted execution.|~|[[input1.tif, 1809927],
[input2.tif, 1994632]]}
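As a sketch, a helper emitting messages in this syntax could look as follows in Python; the service identifier and process name are taken from the examples above:

import time

FISH = "<*))))><"

def log(level, function, message, datasets=(),
        source="service", service_id=42, process="RasterProcessing"):
    """Emit one log message in the IQmulus syntax on standard output."""
    now = time.time()
    date_time = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    dataset_list = ",".join(
        "[%s, %d]" % (name, size) for name, size in datasets)
    print("%s{%s|~|%d|~|%s|~|%s|~|%s|~|%s|~|%s|~|%s|~|[%s]}" % (
        FISH, date_time, int(now), level, source,
        service_id, process, function, message, dataset_list))

# Example calls mirroring the messages above:
log("info", "initialization", "Service started.")
log("info", "reading", "Reading started.",
    datasets=[("input1.tif", 1809927), ("input2.tif", 1994632)])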
Service deployment
Services must be packaged as a ZIP file, which includes the service executable(s), the metadata and the Dockerfile. The file may also include any additional libraries or resources used by the service, but should not contain any data files. The service metadata (named metadata.json) and the Dockerfile should be in the root directory of the ZIP file. There are no further restrictions on the directory structure within the ZIP.
The ZIP file must be uploaded to Artifactory as a new Maven artifact with a version number. The version should follow the generic Maven versioning scheme (e.g. 1.0.0).
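As an illustration, the packaging step could be scripted along the following lines; apart from metadata.json and the Dockerfile, the file and archive names are hypothetical:

import zipfile

def package_service(archive_path, files):
    """Create the deployment ZIP; metadata.json and the Dockerfile go to the root."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source, target in files:
            archive.write(source, arcname=target)

package_service("RasterProcessing-1.0.zip", [
    ("metadata.json", "metadata.json"),  # must be in the ZIP root
    ("Dockerfile", "Dockerfile"),        # must be in the ZIP root
    ("bin/RadiometricEnhancement", "bin/RadiometricEnhancement"),
])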
Table 1: Supported data types in IQmulus
| Short name | Representation of | Format | File extension |
|---|---|---|---|
| Arrays and Structs | scalar parameters, arrays and matrices | Matlab | .mat |
| Vector Data | e.g., 2D or 3D cartographic features, working areas or user inputs | ESRI Shapefile | .shp .shx .dbf |
| 2D multi-channel image | geo-referenced 2D image with multiple channel values (e.g., depth image) | GeoTIFF | .tiff, .tiff+xml |
| Structured Point Cloud | sets of 3D points for which information about the spatial arrangement of the points is known (e.g., knowledge about the sensor technology) | LAS | .las |
| Gridded Point Cloud | e.g., grids, rasters or DEMs | GeoTIFF | .tiff |
| Voxel Grid | 3D volumetric scalar field | Matlab | .mat |
| Point Cloud | set of 3D points acquired, e.g., with Lidar, or resulting from processing of 3D measurements | LAS | .las |
| Enriched Point Cloud | 3D point locations with additional information, such as surface normals, uncertainty, point-based scalar values | LAS | .las |
| Volumetric Point Cloud | 3D point locations and rays joining the 3D measured points to the corresponding sensor's observation viewpoint | PLY | .ply |
| Spline Curve | polynomial representation and/or control points | G2-format | .g2 |
| Spline Surface | polynomial representation and/or control points | G2-format | .g2 |
| Implicits | approximation of scalar fields over surfaces or volumes | Matlab | .mat |
| Triangulation | surfaces or volumes represented as triangle meshes, either 2D or 3D | PLY | .ply |
| Meteo Data | gridded binary representation of weather data | GRIB | .grib |
| Abstract Dataset Collection | virtual dataset representing the union of datasets of homogeneous type and format | | .txt or .xml |
Table 2: Service description in the metadata
| Key | Type | Description |
|---|---|---|
| id | integer | The number of the service. E.g. 42. |
| name | string | Human-readable name of the service without special characters and without spaces, camel case with a capital first letter. E.g. MyService, RasterProcessingService. |
| description | string | Human-readable description of the service for use in the IQmulus user interface. |
| type | string | Service type describing the usage of the service. E.g. Interpolation, Radiometric enhancement. |
| path | string | Relative path to the service executable in the ZIP file uploaded to the service catalog. E.g. bin/myexecutable.py. |
| language | string | Programming language or platform the service has been developed in or for. E.g. C++, shell, python, mono, hadoop. |
| language_arguments | string | Additional arguments passed on to the platform the service is deployed on (if applicable). This field is optional. |
| os | string | Operating system required by the service from the execution environments available in the system. Valid values are linux and windows. |
| required_coordinate_units | array | Array of spatial coordinate units the service accepts. Valid units are m (meter) and degree. E.g. ["m"] (service only accepts metric spatial coordinates), ["m", "degree"] (service accepts metric and lat/lon spatial coordinates), ["degree"] (service only accepts lat/lon coordinates). This field is optional and defaults to ["m", "degree"]. |
| hdfs_support | boolean | Indicates whether the service can write its output directly to the Hadoop Distributed File System. This field is optional; the default value is false. |
Table 3: Parameter description in the metadata
| Key | Type | Description |
|---|---|---|
| id | string | Arbitrary parameter identifier that must be unique within the service. |
| name | string | Human-readable name of the parameter without special characters and without spaces, camel case with a lower case first letter. E.g. interpolationMethod, input. |
| description | string | Human-readable description of the parameter for use in the IQmulus user interface. |
| type | string | Parameter type. Valid values: input, output, argument. At least one input and one output parameter must be specified for the service. |
| cardinality | string | Cardinality of the parameter: min..max. Valid values are 1..1 (parameter is mandatory) and 0..1 (parameter is optional). |
| data_type | string | Type of the parameter. Either a primitive type (valid values: integer, float, string, boolean, directory) or a dataset type. A dataset type must match one of the short names in Table 1. |
| file_suffix | string | Output file extension or suffix (e.g. .tif or _fixed.txt). Only valid if the type is output. |
| default | any | The default value used when the parameter is not specified. This field is optional. |
| label | string | The name of the parameter on the command line. E.g. --input, --output, --tolerance, -i, -o. |
| dependencies | array | Array of identifiers of parameters this parameter depends on. For example, the tolerance parameter may only be given when the method parameter is also given. |
| conflicts | array | Array of identifiers of parameters that conflict with this parameter. For example, -lazy must not be given if -strict is given. |
Table 4: Items of the log message

| Key | Description |
|---|---|
| dateTime | Human-readable date and UTC time in a generic format. E.g. 2016-08-23 13:11:00. |
| rawTimeStamp | The Unix time, i.e. the number of seconds elapsed since 00:00:00 UTC, Thursday, 1 January 1970. |
| logLevelDomainValue | The level of the log message. Supported levels are: fatal, error, warn, info, debug, trace. |
| logSourceDomainValue | The source of the log message. The value should be service for all services. |
| serviceIdentifier | The unique service identifier. E.g. 42. |
| processName | The name of the process currently executed. |
| functionName | The name of the substep/function within a process. Mandatory values are reading, end of reading, processing, end of processing, writing and end of writing, with additional optional substeps in additional logs (e.g. calculateExtent, processFeatures). |
| message | The log message text. |
| datasetName, datasetSize | The alphanumeric name of the currently processed dataset and the size of the dataset in kB. Any number of datasets can be specified in the array. If no dataset is processed, an empty list can be passed. |
[1] Artifactory website: https://www.jfrog.com/open-source/
[2] GeoNetwork website: http://geonetwork-opensource.org/
[3] Apache Hadoop website: http://hadoop.apache.org/
[4] Docker website: https://www.docker.com/